  1. Nov 2025
    1. Reviewer #2 (Public review):

      Summary:

      The submitted manuscript aims to characterize the role of mast cells in TB granuloma. The manuscript reports heterogeneity in mast cell populations present within the granulomas of tuberculosis patients. With the help of previously published scRNAseq data, the authors identify transcriptional signatures associated with distinct subpopulations.

      Strengths:

      (1) The authors have carried out a sufficient literature review to establish the background and significance of their study.

      (2) The manuscript utilizes a mast cell-deficient mouse model, which demonstrates improved lung pathology during Mtb infection, suggesting mast cells as a potential novel target for developing host-directed therapies (HDT) against tuberculosis.

      Weaknesses:

      (1) The manuscript requires significant improvement, particularly in the clarity of the experimental design, as well as in the interpretation and discussion of the results. Enhanced focus on these areas will provide better coherence and understanding for the readers.

      (2) The results discussed in the paper add only a slight novel aspect to the field of tuberculosis. While the authors have used multiple models to investigate the role of mast cells in TB, the majority of the results discussed in Figures 1-2 are already known and are a re-validation of previous literature.

      (3) The claims made in the manuscript are only partially supported by the presented data. Additional extensive experiments are necessary to strengthen the findings and enhance the overall scientific contribution of the work.

      Comments on revisions:

      While most of the comments have been addressed by the authors, a few important concerns pertaining to the data interpretation remain unanswered.

      (1) The discrepancy between published studies and the current study on the function of mast cells during TB remains. The authors could not justify the reason behind the differences in results obtained during Mtb infection in humans vs macaques.

      (2) To address the concern regarding immune alterations in mast cell-deficient mice, the authors carried out adoptive transfer of mast cells to WT mice. However, they do not observe any changes in mycobacterial lung burden and inflammation, diluting their conclusions throughout the study.

      (3) Additionally, as the authors propose mast cells as players in LTBI to PTB conversion, the adoptive transfer experiment could be conducted in a low-dosage model of TB. This would aid in assessing its role in TB reactivation.

    2. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review): 

      Summary: 

      The study by Gupta et al. investigates the role of mast cells (MCs) in tuberculosis (TB) by examining their accumulation in the lungs of M. tuberculosis-infected individuals, non-human primates, and mice. The authors suggest that MCs expressing chymase and tryptase contribute to the pathology of TB and influence bacterial burden, with MC-deficient mice showing reduced lung bacterial load and pathology. 

      Strengths: 

      (1) The study addresses an important and novel topic, exploring the potential role of mast cells in TB pathology. 

      (2) It incorporates data from multiple models, including human, non-human primates, and mice, providing a broad perspective on MC involvement in TB. 

      (3) The finding that MC-deficient mice exhibit reduced lung bacterial burden is an interesting and potentially significant observation. 

      Weaknesses: 

      (1) The evidence is inconsistent across models, leading to divergent conclusions that weaken the overall impact of the study. 

      The strength of the study is the use of multiple models including mouse, nonhuman primate as well as human samples. The conclusions have now been refined to reflect the complexity of the disease and the use of multiple models.

      (2) Key claims, such as MC-mediated cytokine responses and conversion of MC subtypes in granulomas, are not well-supported by the data presented.

      To address the reviewer’s comments, we will carry out further experimentation to strengthen the link between MC subtypes and cytokine responses.

      (3) Several figures are either contradictory or lack clarity, and important discrepancies, such as the differences between mouse and human data, are not adequately discussed. 

      We will further clarify the figures and streamline the discussions between the different models used in the study. 

      (4) Certain data and conclusions require further clarification or supporting evidence to be fully convincing. 

      We will either provide clarification or supporting evidence for some of the key conclusions in the paper. 

      Reviewer #2 (Public review): 

      Summary: 

      The submitted manuscript aims to characterize the role of mast cells in TB granuloma. The manuscript reports heterogeneity in mast cell populations present within the granulomas of tuberculosis patients. With the help of previously published scRNAseq data, the authors identify transcriptional signatures associated with distinct subpopulations. 

      Strengths: 

      (1) The authors have carried out a sufficient literature review to establish the background and significance of their study. 

      (2) The manuscript utilizes a mast cell-deficient mouse model, which demonstrates improved lung pathology during Mtb infection, suggesting mast cells as a potential novel target for developing host-directed therapies (HDT) against tuberculosis. 

      Weaknesses: 

      (1) The manuscript requires significant improvement, particularly in the clarity of the experimental design, as well as in the interpretation and discussion of the results. Enhanced focus on these areas will provide better coherence and understanding for the readers. 

      The strength of the study is the use of multiple models including mouse, nonhuman primate as well as human samples. The conclusions have now been refined to reflect the complexity of the disease and the use of multiple models.

      (2) Throughout the manuscript, the authors have mislabelled the legends for WT B6 mice and mast cell-deficient mice. As a result, the discussion and claims made in relation to the data do not align with the corresponding graphs (Figure 1B, 3, 4, and S2). This discrepancy undermines the accuracy of the conclusions drawn from the results. 

      We apologize for the discrepancy, which will be corrected in the revised manuscript.

      (3) The results discussed in the paper do not add a significant novel aspect to the field of tuberculosis, as the majority of the results discussed in Figure 1-2 are already known and are a re-validation of previous literature.

      This is the first study which has used mouse, NHP and human TB samples from Mtb infection to characterize and validate the role of MC in TB. We believe the current study provides significant novel insights into the role of MC in TB. 

      (4) The claims made in the manuscript are only partially supported by the presented data. Additional extensive experiments are necessary to strengthen the findings and enhance the overall scientific contribution of the work.

      We will either provide clarification or supporting evidence for some of the key conclusions in the paper.

      Reviewer #1 (Recommendations for the authors):

      In the study by Gupta et al., the authors report an accumulation of mast cells (MCs) expressing the proteases chymase and tryptase in the lungs of M. tuberculosis-infected individuals and non-human primates, as compared to healthy controls and latently infected individuals. They also report that MCs appear to play a pathological role in mice. Notably, MC-deficient mice show reduced lung bacterial burden and pathology during infection.

      While the topic is of interest, the study is overall quite preliminary, and many conclusions are not well supported by the presented data. The reliance on three different models, each suggesting divergent outcomes, weakens the ability to draw definitive conclusions. Specifically, the claim that "MCs (...) mediate cytokine responses to drive pathology and promote Mtb susceptibility and dissemination during TB" is not substantiated by the data.

      Major comments

      (1) In human samples, the authors conclude that "While MCTCs accumulated in early immature granulomas within TB lesions, MCCs accumulated in late granulomas in TB patients" and that MCTs "likely convert first to MCTCs in early granulomas before becoming MCCs in late mature granulomas with necrotic cores." However, Figure 1B shows the opposite. Furthermore, the assertion that MCTs "convert" into MCTCs is not justified by the data.

      Corrections have been made to the figures to ensure clarity for the reader. We demonstrate accumulation of tryptase-expressing MCs in healthy individuals, while MCs expressing both tryptase and chymase were seen in early granulomas, and only chymase-expressing MCs were observed in late granulomas, which reflect more advanced disease pathology. We have removed the line as advised by the reviewer.

      (2) In Figure 2 I and J, the panels do not demonstrate co-expression of chymase and tryptase in clusters 0, 1, and 3 in PTB samples, which contradicts the histology data. This discrepancy is left unaddressed and raises concerns about the conclusions drawn from Figures 1 and 2.

      We thank the reviewer for pointing this out. We revisited the data and now show the dual-expressing cells in the data (Figure 2H). The discrepancy stemmed from the cross-species nature of the dataset. There is considerable diversity in sequence similarity and tryptase function between humans and NHPs (Trivedi et al., 2007). We now explain this in the relevant section (lines 313-364). Briefly, TPSG1 (encoding γ tryptase) and TPSD1 (encoding δ tryptase) have the same gene names in humans and NHPs, whereas the more widely expressed TPSAB1 (encoding α/β tryptase) has a different gene name in NHPs; the names are not shared because the NHP genes are still annotated as predicted putative proteins. The putative NHP genes that map to human TPSAB1 are LOC699599 for M. mulatta and LOC102139613 for M. fascicularis. Thus, searching for the TPSAB1 gene yielded no result in our previous analysis, but examining these orthologous gene names now reproduces the results we see in the histology data. To strengthen our findings, we have now analyzed an additional single-cell dataset from the lungs of the NHP M. fascicularis (Figure 2J-L) and found co-expression of chymase and tryptase, adding an important validation to our histological findings.
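
      The lookup problem described above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the table layout, the function names and the toy feature list are hypothetical, and only the two LOC identifiers are taken from the response. It shows why a search by the human symbol alone finds nothing, while a search through a species-specific mapping does.

```python
from typing import Dict, List

# The two LOC identifiers are the ones quoted in the response; the table
# layout and the function names are hypothetical, chosen for illustration.
ORTHOLOG_TABLE: Dict[str, Dict[str, str]] = {
    "TPSAB1": {
        "M. mulatta": "LOC699599",
        "M. fascicularis": "LOC102139613",
    },
}


def resolve_symbol(human_symbol: str, species: str) -> str:
    """Return the species-specific identifier for a human gene symbol.

    Falls back to the human symbol when no override is recorded, which covers
    genes such as TPSG1 and TPSD1 that share a name across species.
    """
    return ORTHOLOG_TABLE.get(human_symbol, {}).get(species, human_symbol)


def genes_present(
    human_symbols: List[str], species: str, features: List[str]
) -> Dict[str, bool]:
    """Report, per human symbol, whether its resolved identifier is in `features`."""
    feature_set = set(features)
    return {s: resolve_symbol(s, species) in feature_set for s in human_symbols}


# Toy feature list standing in for the gene names of an M. mulatta dataset
# (an assumption for the example, not real data).
toy_features = ["LOC699599", "TPSG1", "TPSD1"]

# Searching by the human symbol alone misses the tryptase gene.
naive_hit = "TPSAB1" in toy_features

# Searching through the mapping finds all three genes.
mapped_hits = genes_present(["TPSAB1", "TPSG1", "TPSD1"], "M. mulatta", toy_features)
```

      In this sketch the fallback to the human symbol is a design choice that keeps the table small: only genes whose names differ between species need an entry.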

      (3) Figure 2 serves more as a resource and contributes little to the core findings of the study. It might be better suited as supplementary material.

      We thank the reviewer for the suggestion; however, we believe that Figure 2 serves as an independent validation in a different species (NHP), showing heterogeneity in MCs across species in a TB model. The figure adds value because only a handful of studies describe MCs at the single-cell level (Tauber et al., 2023; Derakhshan et al., 2022; Cildir et al., 2021), and none of these are in TB apart from one from our group showing an MC cluster in Mtb-infected macaques (Esaulova et al., 2021). We feel strongly that dissecting MCs as specifically done here provides an important insight into the transcriptional heterogeneity of these cells linked to disease states. We have also added an additional NHP lung single-cell dataset (Gideon et al., 2022) to complement our analysis, thus adding another validation and strengthening these findings. We therefore believe the figure should be retained as an integral part of the main paper.

      (4) In lines 275-277, the data referenced should be shown to support the claims.

      We thank the reviewer for the suggestion. The text originally noted by the reviewer now appears in the revised manuscript at lines 370-372, and the corresponding data have now been included as supplementary Figure S3.

      (5) In Figure 3B, the difference between the two mouse strains becomes non-significant by day 150 pi, weakening the overall conclusion that MCs contribute to the bacterial burden.

      At 100 dpi, MC-deficient mice exhibit lower Mtb CFU in both the lung and spleen, indicating improved protection. By 150 dpi, lung CFU differences are no longer significant; however, dissemination to the spleen remains reduced in MC-deficient mice. Thus, the overall conclusion that MCs contribute to increased bacterial burden remains valid, particularly with respect to dissemination. This conclusion is further supported by new data showing that adoptive transfer of MCs into B6 Mtb-infected mice increased Mtb dissemination to the spleen (Figure 5E). 

      (6) Figures 3D and E are not particularly convincing.

      Figures 3D and 3E illustrate lung inflammation in MC-deficient mice compared to wild-type mice and show that MC-deficient mice exhibit significantly less inflammation at 150 dpi, supporting the role of MCs in driving lung inflammation.

      (7) In Figures 4 and S3, the color coding in panels A-F appears incorrect but is accurate in G. This inconsistency is confusing.

      We thank the reviewer for noting this. The color coding has been corrected to ensure consistency across all figures.

      (8) In the mouse model, MCs seem to disappear during infection, in contrast to observations in human and macaque samples. This discrepancy is not discussed in the paper.

      We thank the reviewer for this important observation. In response, we performed a new analysis of lung MCs at baseline in wild-type and MC-deficient mice. Our data show that naïve wild-type lungs contain a small population of MCs, which is further reduced in MC-deficient mice. Following Mtb infection, MCs progressively accumulate in wild-type mice, whereas this accumulation is significantly impaired in MC-deficient mice. These new data are now included in Figure 4A and the text has been updated accordingly (lines 395-403).

      (9) In lines 306-307, data should be shown to support the claims.

      We thank the reviewer for the suggestion. The text originally noted by the reviewer now appears in the revised manuscript at lines 399-400, and the corresponding data have now been included as supplementary Figure S4.

      Minor comments

      (1) What does "granuloma-associated" cells mean in samples from healthy controls?

      We thank the reviewer for this point. The language has been revised to accurately refer to cells in the lung parenchyma in Figure 1, rather than “granuloma-associated” cells.

      (2) In line 229, it is unclear what "these cells" refers to.

      The phrase “these cells” refers to tryptase-expressing mast cells. This has now been clarified in the revised manuscript (lines 276-277).

      (3) The citation of Figure 3A in lines 284-285 is misplaced in the text and should be corrected.

      The figure citation has been corrected in the text in the revised manuscript (lines 376-379).

      Reviewer #2 (Recommendations for the authors):

      (1) The data presented in Figure 1 seems to be a re-validation of the already known aspects of mast cells in TB granulomas. While distinct roles for mast cells in regulating Mtb infection have been reported, the manuscript appears to be a failed opportunity to characterize the transcriptional signatures of the distinct subsets and identify their role in previously reported processes towards controlling TB disease progression.

      We thank the reviewer for the insight. It was not our intent to investigate the bulk transcriptome: bulk transcriptomic sequencing requires a high number of cells to yield enough RNA, which is technically challenging given the low abundance of mast cells during TB infection (Figure 2). This motivated Figure 2, in which we used a more sensitive transcriptomic analysis to identify the different transcriptional states across the distinct TB disease states. We believe that this analysis captures the essence of what the reviewer suggests and provides meaningful insights into mast cell heterogeneity during TB.

      (2) The experiments lack uniformity with respect to the strains of Mtb used for experimentation. For example, Mtb strain HN878 was used for aerosol infection of mice while Mtb CDC1551 was used for macaques. If there were experimental constraints with respect to the choice, these should be mentioned.

      We thank the reviewer for this comment. The Mtb strain usage has been consistent within each species: HN878 for mice and CDC1551 for non-human primates (NHPs), in line with prior studies from our lab. The species-specific choice reflects the differences in pathogenicity of these strains in mice versus NHPs. CDC1551, which exhibits lower virulence, allows the development of a macaque model that recapitulates human latent to chronic TB when administered via aerosol at low to moderate doses (Kaushal et al., 2015; Sharan et al., 2021; Singh et al., 2025). In contrast, the more virulent HN878 strain leads to severe disease and high mortality in NHPs and is therefore not suitable for these models. Using CDC1551 in macaques provides a controlled and clinically relevant platform to study immunological and pathophysiological mechanisms of TB, justifying its use in the current study. This explanation has now been added to the manuscript method section (lines 109-114).

      (3) In lines 84-85, the authors state that "Chymase positive MCs contribute to immune pathology and reduced Mtb control". Previous reports including Garcia-Rodriguez et al., 2021 associate high MCTCs with improved lung function. Additionally, in the macaque model of latent TB infection reported in the manuscript, the number of chymase-expressing MCs seems to significantly decrease. The authors should justify the same.

      We thank the reviewer for this comment. In Garcia-Rodriguez et al., 2021, chymase-expressing MCs accumulate in fibrotic lung lesions. Fibrosis is a result of excessive inflammation in TB infection and is associated with lung damage. Similarly, in idiopathic pulmonary fibrosis, higher density and percentage of chymase-expressing MCs correlate positively with fibrosis severity (Andersson et al., 2011). In our study, although fibrosis was not directly assessed, chymase-positive MCs increased in late lung granulomas, consistent with advanced inflammatory disease. Therefore, our conclusion that chymase-producing MCs contribute to lung pathology is justified and aligns with prior observations.

      (4) The manuscript would benefit from a brief description of the experimental conditions for the previously published scRNAseq data used in the current study.

      We thank the reviewer for the suggestion, and the information has been included in the final manuscript (lines 294-297) and represented as Figure 2A.

      (5) The authors have not mentioned the criteria used to categorize early and late granulomas in TB patients. A lucid description of the same is necessary.

      Based on the reviewer’s comment, a detailed categorization of early and late granulomas in TB patients is now included in the revised manuscript (lines 256-260). Early granulomas: discrete conglomerates of immune cells and resident stromal cells with defined borders and no central necrosis. Late granulomas: large and dense clusters of immune cells and resident cells with an evident necrotic center containing bacteria and dead neutrophils, and infiltrating lymphocytic cells on the periphery of the necrotic center. MCs were measured in the periphery and inside early granulomas, while in the late granulomas, they were mainly quantified in the periphery.

      (6) The authors mention that "While MCTCs accumulated in early immature granulomas within TB lesions, MCCs accumulated in late granulomas in TB patients". While this is evident from the representative images, the quantification in Figure 1B seems to indicate otherwise.

      We thank the reviewer for pointing this out. The labeling in the quantitative analysis shown in Figure 1B has been corrected in the revised manuscript to accurately reflect the accumulation of MCTCs in early granulomas and MCCs in late granulomas.

      (7) The labelling followed in Figures 3, 4 and S2 does not match the discussion. Such errors should be rectified to minimize any ambiguity within the text of the manuscript.

      We thank the reviewer for noting this. The color coding has been corrected to ensure consistency across all figures.

      (8) The mast cell-deficient mouse model has a differential number of immune cells at the site of granuloma as reported in the manuscript. This could contribute to the altered mycobacterial survival and inflammatory cytokine production in the lung and hence might not be a direct effect of mast cell depletion. The authors can consider reconstituting mast cell populations to analyze the mast cell function.

      We thank the reviewers for this suggestion. In the revised manuscript, we have adoptively transferred MCs into WT mice before Mtb challenge to assess if this would increase inflammation and Mtb CFU in the lung and spleen. Our results show that while lung inflammation was not impacted, we found that the dissemination to the spleen and the frequency of neutrophils in the lung were increased in WT mice that received MCs (Figure 5, lines 429-443).

      (9) In lines 295-297, the authors state "MCs continued to accumulate in the lung up to 100 dpi in CgKitWsh mice, following which the MC numbers decreased at later stages". However, the quantification in Figure 4A does not reflect the same. This should be addressed.

      In response to the reviewers' comments, we conducted a new analysis of lung MCs at baseline, comparing wild-type and MC-deficient mice. The revised data show that MC-deficient mice have fewer mast cells at baseline compared to B6 mice. Furthermore, mast cell numbers increase during infection, peaking at 100 days post-infection (dpi) and subsequently stabilizing by 150 dpi. The revised data have been included in Figure 4A and in the text (lines 395-403).

      (10) Additionally, while the scRNAseq data reflects a lower production of TNF in pulmonary TB granulomas, the mice deficient in mast cells are discussed to have a lower production of proinflammatory cytokines.

      Mast cells increasing and contributing to TB pathogenesis is the theme of the paper, and as such we see an increase in the IFNG pathway genes and a similar reduction in the production of pro-inflammatory cytokines. The relative decrease in TNF pathway gene expression can be reconciled by the fact that lower TNF gene expression in PTB could also represent loss of Mtb control and increased pathogenesis (Yuk et al., 2024), whereas this expression is maintained in the LTBI/HC clusters. A higher bacterial burden of Mtb can also decrease host TNF production, which is in line with what we observe here (Olsen et al., 2016, Reed et al., 2004, Kurtz et al., 2006).

      (11) The authors have not annotated Figure 2 I and J in the text while describing their results and interpretation.

      We thank the reviewer for noting this. Figure 2 has been revised, and the results pointed out by the reviewer have been added to the revised manuscript.

      (12) In line 284, the authors have discussed the results pertaining to Figure 3B, however, mentioned it as Figure 3A in the text.

      We thank the reviewer for noting this. The corrections have been made in the revised manuscript (lines 379-384).

      References

      ANDERSSON, C. K., ANDERSSON-SJOLAND, A., MORI, M., HALLGREN, O., PARDO, A., ERIKSSON, L., BJERMER, L., LOFDAHL, C. G., SELMAN, M., WESTERGREN-THORSSON, G. & ERJEFALT, J. S. 2011. Activated MCTC mast cells infiltrate diseased lung areas in cystic fibrosis and idiopathic pulmonary fibrosis. Respir Res, 12, 139.

      CILDIR, G., YIP, K. H., PANT, H., TERGAONKAR, V., LOPEZ, A. F. & TUMES, D. J. 2021. Understanding mast cell heterogeneity at single cell resolution. Trends Immunol, 42, 523-535.

      DERAKHSHAN, T., BOYCE, J. A. & DWYER, D. F. 2022. Defining mast cell differentiation and heterogeneity through single-cell transcriptomics analysis. J Allergy Clin Immunol, 150, 739-747.

      ESAULOVA, E., DAS, S., SINGH, D. K., CHORENO-PARRA, J. A., SWAIN, A., ARTHUR, L., RANGEL-MORENO, J., AHMED, M., SINGH, B., GUPTA, A., FERNANDEZ-LOPEZ, L. A., DE LA LUZ GARCIA-HERNANDEZ, M., BUCSAN, A., MOODLEY, C., MEHRA, S., GARCIA-LATORRE, E., ZUNIGA, J., ATKINSON, J., KAUSHAL, D., ARTYOMOV, M. N. & KHADER, S. A. 2021. The immune landscape in tuberculosis reveals populations linked to disease and latency. Cell Host Microbe, 29, 165-178 e8.

      GARCIA-RODRIGUEZ, K. M., BINI, E. I., GAMBOA-DOMINGUEZ, A., ESPITIA-PINZON, C. I., HUERTA-YEPEZ, S., BULFONE-PAUS, S. & HERNANDEZ-PANDO, R. 2021. Differential mast cell numbers and characteristics in human tuberculosis pulmonary lesions. Sci Rep, 11, 10687.

      GIDEON, H. P., HUGHES, T. K., TZOUANAS, C. N., WADSWORTH, M. H., 2ND, TU, A. A., GIERAHN, T. M., PETERS, J. M., HOPKINS, F. F., WEI, J. R., KUMMERLOWE, C., GRANT, N. L., NARGAN, K., PHUAH, J. Y., BORISH, H. J., MAIELLO, P., WHITE, A. G., WINCHELL, C. G., NYQUIST, S. K., GANCHUA, S. K. C., MYERS, A., PATEL, K. V., AMEEL, C. L., COCHRAN, C. T., IBRAHIM, S., TOMKO, J. A., FRYE, L. J., ROSENBERG, J. M., SHIH, A., CHAO, M., KLEIN, E., SCANGA, C. A., ORDOVAS-MONTANES, J., BERGER, B., MATTILA, J. T., MADANSEIN, R., LOVE, J. C., LIN, P. L., LESLIE, A., BEHAR, S. M., BRYSON, B., FLYNN, J. L., FORTUNE, S. M. & SHALEK, A. K. 2022. Multimodal profiling of lung granulomas in macaques reveals cellular correlates of tuberculosis control. Immunity, 55, 827-846 e10.

      KAUSHAL, D., FOREMAN, T. W., GAUTAM, U. S., ALVAREZ, X., ADEKAMBI, T., RANGEL-MORENO, J., GOLDEN, N. A., JOHNSON, A. M., PHILLIPS, B. L., AHSAN, M. H., RUSSELL-LODRIGUE, K. E., DOYLE, L. A., ROY, C. J., DIDIER, P. J., BLANCHARD, J. L., RENGARAJAN, J., LACKNER, A. A., KHADER, S. A. & MEHRA, S. 2015. Mucosal vaccination with attenuated Mycobacterium tuberculosis induces strong central memory responses and protects against tuberculosis. Nat Commun, 6, 8533.

      KURTZ, S., MCKINNON, K. P., RUNGE, M. S., TING, J. P. & BRAUNSTEIN, M. 2006. The SecA2 secretion factor of Mycobacterium tuberculosis promotes growth in macrophages and inhibits the host immune response. Infect Immun, 74, 6855-64.

      OLSEN, A., CHEN, Y., JI, Q., ZHU, G., DE SILVA, A. D., VILCHEZE, C., WEISBROD, T., LI, W., XU, J., LARSEN, M., ZHANG, J., PORCELLI, S. A., JACOBS, W. R., JR. & CHAN, J. 2016. Targeting Mycobacterium tuberculosis Tumor Necrosis Factor Alpha-Downregulating Genes for the Development of Antituberculous Vaccines. mBio, 7.

      REED, M. B., DOMENECH, P., MANCA, C., SU, H., BARCZAK, A. K., KREISWIRTH, B. N., KAPLAN, G. & BARRY, C. E., 3RD 2004. A glycolipid of hypervirulent tuberculosis strains that inhibits the innate immune response. Nature, 431, 84-7.

      SHARAN, R., SINGH, D. K., RENGARAJAN, J. & KAUSHAL, D. 2021. Characterizing Early T Cell Responses in Nonhuman Primate Model of Tuberculosis. Front Immunol, 12, 706723.

      SINGH, D. K., AHMED, M., AKTER, S., SHIVANNA, V., BUCSAN, A. N., MISHRA, A., GOLDEN, N. A., DIDIER, P. J., DOYLE, L. A., HALL-URSONE, S., ROY, C. J., ARORA, G., DICK, E. J., JR., JAGANNATH, C., MEHRA, S., KHADER, S. A. & KAUSHAL, D. 2025. Prevention of tuberculosis in cynomolgus macaques by an attenuated Mycobacterium tuberculosis vaccine candidate. Nat Commun, 16, 1957.

      TAUBER, M., BASSO, L., MARTIN, J., BOSTAN, L., PINTO, M. M., THIERRY, G. R., HOUMADI, R., SERHAN, N., LOSTE, A., BLERIOT, C., KAMPHUIS, J. B. J., GRUJIC, M., KJELLEN, L., PEJLER, G., PAUL, C., DONG, X., GALLI, S. J., REBER, L. L., GINHOUX, F., BAJENOFF, M., GENTEK, R. & GAUDENZIO, N. 2023. Landscape of mast cell populations across organs in mice and humans. J Exp Med, 220.

      TRIVEDI, N. N., TONG, Q., RAMAN, K., BHAGWANDIN, V. J. & CAUGHEY, G. H. 2007. Mast cell alpha and beta tryptases changed rapidly during primate speciation and evolved from gamma-like transmembrane peptidases in ancestral vertebrates. J Immunol, 179, 6072-9.

      YUK, J. M., KIM, J. K., KIM, I. S. & JO, E. K. 2024. TNF in Human Tuberculosis: A Double-Edged Sword. Immune Netw, 24, e4.

    1. eLife Assessment

      This important study demonstrates a reduction in airway hyperresponsiveness (one of the mechanisms of allergic asthma) in the absence of IgM in a house dust mite-induced mouse model of allergic asthma. While this result suggests a new mechanistic role for IgM, the proposed new function is not as yet robustly supported by the current experiments and thus the evidence remains incomplete. A connection between the findings and human disease is not established so far, but the study will be of interest to clinical immunologists.

    2. Reviewer #4 (Public review):

      Summary:

      The authors sought to determine the role of IgM in a house dust mite (HDM)-induced Th2 allergic model. Specifically, they examined the effect of IgM deficiency by comparing airway hyperresponsiveness (AHR) and Th2 immune responses between wild-type (WT) and IgM knockout (KO) mice exposed to HDM. They found and reported a reduction in AHR among the KO mice. This finding was followed by experiments investigating the role of IgM in airway smooth muscle (ASM) contraction using a human cell line, based on two genes that were reportedly differentially expressed between lung tissues from WT and IgM KO mice following HDM exposure.

      Strengths:

      Knocking out IgM produced a clear phenotype of reduced airway hyperresponsiveness (AHR), suggesting a previously unreported role for IgM in this process. The authors conducted extensive experiments to elucidate this novel role of IgM.

      Weaknesses:

      Although a few differentially expressed genes (DEGs) are reported between WT HDM vs. IgM KO HDM and WT PBS vs. IgM KO PBS, the principal component analysis (PCA) did not show any group-specific clustering based on these DEGs. This undermines the strength of the authors' reliance on these results as the foundation for subsequent experiments.

      Furthermore, if IgM does indeed have a demonstrable effect on airway smooth muscle (ASM), this could be more convincingly shown using in vitro muscle contraction assays with alternative methods.

    3. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewer #1 (Public Review): 

      Summary:

      The authors of this study sought to define a role for IgM in responses to house dust mites in the lung. 

      Strengths: 

      Unexpected observation about IgM biology 

      Combination of experiments to elucidate function 

      Weaknesses: 

      Would love more connection to human disease 

      We thank the reviewer for these comments. At the time of this publication, we have not made a concrete link with human disease. There is, however, some anecdotal evidence of diseases such as autoimmune glomerulonephritis, Hashimoto’s thyroiditis, bronchial polyp, SLE, celiac disease and other diseases in people with low IgM. Allergic disorders are also common in people with IgM deficiency; other studies have reported rates as high as 33-47%. The mechanisms for the high incidence of allergic diseases are unclear as, generally, these patients have normal IgG and IgE levels. IgM deficiency may represent a heterogeneous spectrum of genetic defects, which might explain the heterogeneous nature of disease presentations.

      Reviewer #2 (Public Review): 

      Summary: 

      The manuscript by Hadebe and colleagues describes a striking reduction in airway hyperresponsiveness in IgM-deficient mice in response to HDM, OVA and papain across the B6 and BALB/c backgrounds. The authors suggest that the deficit is not due to improper type 2 immune responses, nor an aberrant B cell response, despite a lack of class switching in these mice. Through RNA-Seq approaches, the authors identify few differences between the lungs of WT and IgM-deficient mice, but see that two genes involved in actin regulation are greatly reduced in IgM-deficient mice. The authors target these genes by CRISPR-Cas9 in in vitro assays of smooth muscle cells to show that these may regulate cell contraction. While the study is conceptually interesting, there are a number of limitations, which stop us from drawing meaningful conclusions.

      Strengths:

      Fig. 1. The authors clearly show that IgMKO mice have strikingly reduced AHR in the HDM model, despite the presence of a good cellular B cell response.

      Weaknesses: 

      Fig. 2. The authors characterize the CD4 T cell response to HDM in IgMKO mice. They have restimulated medLN cells with anti-CD3 for 5 days to look for IL-4 and IL-13, and find no discernible difference between WT and KO mice. The absence of PBS-treated WT and KO mice in this analysis means it is unclear if HDM-challenged mice are showing IL-4 or IL-13 levels above that seen at baseline in this assay.

      We thank the Reviewer for this comment. We would like to mention that a very minimal level of IL-4 and IL-13 in PBS mice was detected. We have added a dotted line to Figure 2B to show cytokine levels in unstimulated or naïve samples. Please see Author response image 1 below for the anti-CD3-stimulated cytokine ELISA data. The levels of these cytokines are very low (not detectable) and do not differ between control WT and IgM-KO mice challenged with PBS; this is also true for PMA/ionomycin-stimulated cells.

      Author response image 1.

      The choice of 5 days is strange, given that the response the authors want to see is in already primed cells. A 1-2 day assay would have been better. 

      We agree with the reviewer that a shorter stimulation period would work. Over the years we have settled for 5-day re-stimulation for both anti-CD3 and HDM. We have tried other time points, but we consistently get better secretion of cytokines after 5 days. 

      It is concerning that the authors state that HDM restimulation did not induce cytokine production from medLN cells, since countless studies have shown that restimulation of medLN would induce IL-13, IL-5 and IL-10 production from medLN. This indicates that the sensitization and challenge model used by the authors is not working as it should. 

      We thank the reviewer for this observation. In our recent paper showing how antigen load affects B cell function, we used very low levels of HDM to sensitise and challenge mice (1 µg and 3 µg respectively). See the article below, Hadebe et al., 2021 JACI. This is because labs that have used these low HDM levels also suggested that antigen load impacts B cell function, especially in their role in germinal centres. We believe the reason we see low or undetectable levels of cytokines is this low antigen load for sensitisation and challenge. In other manuscripts we have published or are about to publish, we have shown that normal HDM sensitisation load (1 µg or 100 µg) and challenge (10 µg) do induce cytokine release upon restimulation with HDM. See the article below by Khumalo et al, 2020 JCI Insight (Figure 4A).

      Sabelo Hadebe*, Jermaine Khumalo, Sandisiwe Mangali, Nontobeko Mthembu, Hlumani Ndlovu, Amkele Ngomti, Martyna Scibiorek, Frank Kirstein, Frank Brombacher*. Deletion of IL-4Ra signalling on B cells limits hyperresponsiveness depending on antigen load. doi.org/10.1016/j.jaci.2020.12.635

      Jermaine Khumalo, Frank Kirstein, Sabelo Hadebe*, Frank Brombacher*. IL-4Rα signalling in regulatory T cells is required for dampening allergic airway inflammation through inhibition of IL-33 by type 2 innate lymphoid cells. JCI Insight. 2020 Oct 15;5(20):e136206. doi: 10.1172/jci.insight.136206

      The IL-13 staining shown in panel c is also not definitive. One should be able to optimize their assays to achieve a better level of staining, to my mind. 

      We agree with the reviewer that much higher IL-13-producing CD4 T cells should be observed. We don’t think this is a technical glitch or non-optimal set-up, as we see much higher levels of IL-13-producing CD4 T cells when using higher doses of HDM to sensitise and challenge, say between 7-20% in WT mice (see Author response image 2 of lung stimulated with PMA/ionomycin+Monensin; please note this is for illustration purposes only and is not linked to the current manuscript, it’s merely to demonstrate a point from other experiments we have conducted in the lab).

      Author response image 2.

      In d-f, the authors perform a serum transfer, but they only do this once. The half life of IgM is quite short. The authors should perform multiple naïve serum transfers to see if this is enough to induce FULL AHR. 

      We thank the reviewer for this comment. We apologise if this was not clear enough in the figure legend and methods; we did transfer serum 3x (a day before sensitisation, on the day of sensitisation and a day before the challenge) to circumvent the short half-life of IgM. In our subsequent experiments, we have now used busulfan to deplete all bone marrow in IgM-deficient mice and replace it with WT bone marrow, and this method restores AHR (Figure 3B).

      This now appears in line 515 to 519 and reads

      Adoptive transfer of naïve serum

      Naïve wild-type mice were euthanised and blood was collected via cardiac puncture before being spun down (5500rpm, 10min, RT) to collect serum. Serum (200µL) was injected intraperitoneally into IgM-deficient mice. Serum was injected intraperitoneally at day -1, 0, and a day before the challenge with HDM (day 10).

      The presence of negative values of total IgE in panel F would indicate some errors in calculation of serum IgE concentrations. 

      We thank the reviewer for this observation. For better clarity, we have now indicated these values as undetected in Figure 2F, as they were below our detection limit.

      Overall, it is hard to be convinced that IgM-deficiency does not lead to a reduction in Th2 inflammation, since the assays appear suboptimal. 

      We disagree with the reviewer in this instance, because we have shown in 3 different models, in 2 different strains and at 2 doses of HDM (high and low) that no matter what you do, Th2 remains intact. Our reason for choosing low dose HDM was based on our previous work and that of others, which showed that depending on antigen load, B cells can either be redundant or have functional roles. Since our interest was to tease out the role of B cells and specifically IgM, it was important that we look at a scenario where B cells are known to have a function (low antigen load). We found similar results at high HDM load: effects on AHR were not as strong, but Th2 was not changed; in fact, in some instances Th2 was higher in IgM-deficient mice.

      Fig. 3. Gene expression differences between WT and KO mice in PBS and HDM challenged settings are shown. PCA analysis does not show clear differences between all four groups, but genes are certainly up and downregulated, in particular when comparing PBS to HDM challenged mice. In both PBS and HDM challenged settings, three genes stand out as being upregulated in WT v KO mice. These are Baiap2l1, Erdr1 and Chil1.

      Noted

      Fig. 4. The authors attempt to quantify BAIAP2L1 in mouse lungs. It is difficult to know if the antibody used really detects the correct protein. A BAIAP2L1-KO is not used as a control for staining, and I am not sure if competitive assays for BAIAP2L1 can be set up. The flow data is not convincing. The immunohistochemistry shows BAIAP2L1 (in red) in many, many cells, essentially throughout the section. There is also no discernible difference between WT and KO mice, which one might have expected based on the RNA-Seq data. So, from my perspective, it is hard to say if/where this protein is located, and whether there truly exists a difference in expression between WT and KO mice.

      We thank the reviewer for this comment. We are certain that the antibody does detect BAIAP2L1; we have used it in 3 assays, which we admit may show varying specificities since it’s a polyclonal antibody. However, in our western blot (Figure 5A), the antibody detects a band at 56.7kDa, apart from what we think are isoforms. We agree that BAIAP2L1 is expressed by many cell types, including CD45+ cells and alpha smooth muscle negative cells, and we show this in our Figure 5 – figure supplement 1A and B. Where we think there is a difference in expression between WT and IgM-deficient mice is in alpha-smooth muscle-positive cells. We have tested antibodies from different companies (Proteintech and Abcam), and we find similar results. We do not have access to BAIAP2L1 KO mice, so to test specificity we have used single stain controls with or without secondary antibody and isotype control, which show no binding in western blot and immunofluorescence assays, and fluorescence minus one controls in flow cytometry. On that basis we are convinced that the signal we are seeing is specific to BAIAP2L1.

      Here we have also added additional Flow cytometry images using anti-BAIAP2L1 (clone 25692-1-AP) from Proteintech

      Author response image 3.

      Figure similar to Figure 5C and Figure 5 -figure supplement 1A and B.

      Fig. 5 and 6. The authors use a single cell contractility assay to measure whether BAIAP2L1 and ERDR1 impact on bronchial smooth muscle cell contractility. I am not familiar with the assay, but it looks like an interesting way of analysing contractility at the single cell level.

      The authors state that targeting these two genes with Cas9gRNA reduces smooth muscle cell contractility, and the data presented for contractility supports this observation. However, the efficiency of Cas9-mediated deletion is very unclear. The authors present a PCR in supp fig 9c as evidence of gene deletion, but it is entirely unclear with what efficiency the gene has been deleted. One should use sequencing to confirm deletion. Moreover, if the antibody was truly working, one should be able to use the antibody used in Fig 4 to detect BAIAP2L1 levels in these cells. The authors do not appear to have tried this.

      We thank the reviewer for these observations. We are in the process of optimising this using new polyclonal BAIAP2L1 antibodies from other companies, since the one we have tried doesn’t seem to work well on human cells via western blot. So hopefully in our new version, we will be able to demonstrate this by immunofluorescence or western blot.

      Other impressions: 

      The paper is lacking a link between the deficiency of IgM and the effects on smooth muscle cell contraction.

      The levels of IL-13 and TNF in lavage of WT and IGMKO mice could be analysed. 

      We have measured Th2 cytokine IL-13 in BAL fluid and found no differences between IgM-deficient mice and WT mice challenged with HDM (Author response image 4 below). We could not detect TNF-alpha in the BAL fluid; it was below the detection limit.

      Figure legend. IL-13 levels are not changed in IgM-deficient mice in the lung. Bronchoalveolar lavage fluid in WT or IgM-deficient mice sensitised and challenged with HDM. TNF-a levels were below the detection limit.

      Author response image 4.

      Moreover, what is the impact of IgM itself on smooth muscle cells? In the Fig. 7 schematic, are the authors proposing a direct role for IgM on smooth muscle cells? Does IgM in cell culture media induce contraction of SMC? This could be tested and would be interesting, to my mind. 

      We thank the Reviewer for these comments. We are still trying to test this, unfortunately, we have experienced delays in getting reagents such as human IgM to South Africa. We hope that we will be able to add this in our subsequent versions of the article. We agree it is an interesting experiment to do even if not for this manuscript but for our general understanding of this interaction at least in an in vitro system.

      Reviewer #3 (Public Review): 

      Summary: 

      This paper by Sabelo et al. describes a new pathway by which lack of IgM in the mouse lowers bronchial hyperresponsiveness (BHR) in response to methacholine in several mouse models of allergic airway inflammation in Balb/c mice and C57/Bl6 mice. Strikingly, loss of IgM does not lead to less eosinophilic airway inflammation, Th2 cytokine production or mucus metaplasia, but to a selective loss of BHR. This occurs irrespective of the dose of allergen used. This was important to address since several prior models of HDM allergy have shown that the contribution of B cells to airway inflammation and BHR is dose dependent.

      After a description of the phenotype, the authors try to elucidate the mechanisms. There is no loss of B cells in these mice. However, there is a lack of class switching to IgE and IgG1, with a concomitant increase in IgD. Restoring immunoglobulins with transfer of naïve serum in IgM deficient mice leads to restoration of allergen-specific IgE and IgG1 responses, which is not really explained in the paper how this might work. There is also no restoration of IgM responses, and concomitantly, the phenotype of reduced BHR still holds when serum is given, leading authors to conclude that the mechanism is IgE and IgG1 independent. Wild type B cell transfer also does not restore IgM responses, due to lack of engraftment of the B cells. Next authors do whole lung RNA sequencing and pinpoint reduced BAIAP2L1 mRNA as the culprit of the phenotype of IgM-/- mice. However, this cannot be validated fully on protein levels and immunohistology since differences between WT and IgM KO are not statistically significant, and B cell and IgM restoration are impossible. The histology and flow cytometry seems to suggest that expression is mainly found in alpha smooth muscle positive cells, which could still be smooth muscle cells or myofibroblasts. Next therefore, the authors move to CRISPR knock down of BAIAP2L1 in a human smooth muscle cell line, and show that loss leads to less contraction of these cells in vitro in a microscopic FLECS assay, in which smooth muscle cells bind to elastomeric contractible surfaces.

      Strengths: 

      (1) There is a strong reduction in BHR in IgM-deficient mice, without alterations in B cell number, disconnected from effects on eosinophilia or Th2 cytokine production.

      (2) BAIAP2L1 has never been linked to asthma in mice or humans 

      Weaknesses: 

      (1) While the observations of reduced BHR in IgM deficient mice are strong, there is insufficient mechanistic underpinning on how loss of IgM could lead to reduced expression of BAIAP2L1. Since it is impossible to restore IgM levels by either serum or B cell transfer and since protein levels of BAIAP2L1 are not significantly reduced, there is a lack of a causal relationship that this is the explanation for the lack of BHR in IgM-deficient mice. The reader is unclear if there is a fundamental (maybe developmental) difference in non-hematopoietic cells in these IgM-deficient mice (which might have accumulated another genetic mutation over the years). In this regard, it would be important to know if littermates were newly generated, or historically bred along with the KO line.

      We thank the reviewer for asking this question and getting us to think of this in a different way. This prompted us to use a different method to try and restore IgM function, and since our animal facility no longer allows irradiation, we opted for busulfan. We present this as new data in Figure 3. We had to go back and breed this strain and then generate bone marrow chimeras. With these chimeras, we deplete bone marrow from IgM-deficient mice, replace it with congenic WT bone marrow, and allow the mice to rest for 2 months before challenge with HDM (Figure 3 -figure supplement 1A-C). We also show that AHR (resistance and elastance) is partially restored in this way (Figure 3A and B): mice that receive congenic WT bone marrow after chemical irradiation can mount AHR, and those that receive IgM-deficient bone marrow can’t mount AHR upon challenge with HDM. If the mice had accumulated an unknown genetic mutation in non-hematopoietic cells, the transfer of WT bone marrow would not make a difference. So, we don’t believe the colony could have gained a mutation that we are unaware of. We have also shipped these mice to other groups, and in their hands this strain still behaves only as an IgM-only knockout mouse. See their publication below.

      Mark Noviski, James L Mueller, Anne Satterthwaite, Lee Ann Garrett-Sinha, Frank Brombacher, Julie Zikherman 2018. IgM and IgD B cell receptors differentially respond to endogenous antigens and control B cell fate. eLife 2018;7:e35074. DOI: https://doi.org/10.7554/eLife.35074

      We have also added methods for bone marrow chimaeras and added results sections and new Figures related to these methods.

      Methods appear in line 521-532 of the untracked version of the article.

      Busulfan Bone marrow chimeras

      WT (CD45.2) and IgM<sup>-/-</sup> (CD45.2) congenic mice were treated with 25 mg/kg busulfan (Sigma-Aldrich, Aston Manor, South Africa) per day for 3 consecutive days (75 mg/kg in total) dissolved in 10% DMSO and Phosphate buffered saline (0.2mL, intraperitoneally) to ablate bone marrow cells. Twenty-four hours after last administration of busulfan, mice were injected intravenously with fresh bone marrow (10x10<sup>6</sup> cells, 100µL) isolated from hind leg femurs of either WT (CD45.1) or IgM<sup>-/-</sup> mice [33]. Animals were then allowed to complement their haematopoietic cells for 8 weeks. In some experiments the level of bone marrow ablation was assessed 4 days post-busulfan treatment in mice that did not receive donor cells. At the end of experiment level of complemented cells were also assessed in WT and IgM<sup>-/-</sup> mice that received WT (CD45.1) bone marrow.

      Results appear in line 198-228 of the untracked version of the article

      Replacement of IgM-deficient mice with functional hematopoietic cells in busulfan chimeric mice restores airway hyperresponsiveness.

      We then generated bone marrow chimeras by chemical radiation using busulfan (Montecino-Rodriguez and Dorshkind, 2020). We treated mice three times with busulfan for 3 consecutive days and after 24 hrs transferred naïve bone marrow from congenic CD45.1 WT mice or CD45.2 IgM KO mice (Figure 3A and Figure 3 -figure supplement 1A). We showed that recipient mice that did not receive donor bone marrow after 4 days post-treatment had significantly reduced lineage markers (CD45<sup>+</sup>Sca-1<sup>+</sup>) or lineage negative (Lin<sup>-</sup>) cells in the bone marrow when compared to untreated or vehicle (10% DMSO) treated mice (Figure 3 -figure supplements 1B-C). We allowed mice to reconstitute bone marrow for 8 weeks before sensitisation and challenge with low dose HDM (Figure 3A). We showed that WT (CD45.2) recipient mice that received WT (CD45.1) donor bone marrow had higher airway resistance and elastance and this was comparable to IgM KO (CD45.2) recipient mice that received donor WT (CD45.1) bone marrow (Figure 3B). As expected, IgM KO (CD45.2) recipient mice that received donor IgM KO (CD45.2) bone marrow had significantly lower AHR compared to WT (CD45.2) or IgM KO (CD45.2) recipient mice that received WT (CD45.1) bone marrow (Figure 3B). We confirmed that the differences observed were not due to differences in bone marrow reconstitution as we saw similar frequencies of CD45.1 cells within the lymphocyte populations in the lungs and other tissues (Figure 3 -figure supplement 1D). We observed no significant changes in the lung neutrophils, eosinophils, inflammatory macrophages, CD4 T cells or B cells in WT or IgM KO (CD45.2) recipient mice that received donor WT (CD45.1/CD45.2) or IgM KO (CD45.2) bone marrow when sensitised and challenged with low dose HDM (Figure 3C).

      Restoring IgM function through adoptive reconstitution with congenic CD45.1 bone marrow in non-chemically irradiated recipient mice or sorted B cells into IgM KO mice (Figure 2 -figure supplement 1A) did not replenish IgM B cells to levels observed in WT mice and as a result did not restore AHR, total IgE and IgM in these mice (Figure 2 -figure supplements 1B-C). 

      The 2 new figures are Figure 3, which moved the rest of the Figures down, and Figure 3 -figure supplement 1A-D, which also moved the rest of the supplementary figures down.

      Discussion appears in line 410-419 of the untracked version of the article.

      To resolve other endogenous factors that could have potentially influenced reduced AHR in IgM-deficient mice, we resorted to busulfan chemical irradiation to deplete bone marrow cells in IgM-deficient mice and replace bone marrow with WT bone marrow. While it is well accepted that busulfan chemical irradiation partially depletes bone marrow cells, in our case it was not possible to pursue other irradiation methods due to changes in ethical regulations and the fact that mice are slow to recover after gamma rays irradiation. Busulfan chemical irradiation allowed us to show that we could mostly restore AHR in IgM-deficient recipient mice that received donor WT bone marrow when challenged with low dose HDM.

      (2) There is no mention of the potential role of complement in activation of AHR, which might be altered in IgM-deficient mice   

      We thank the reviewer for this comment. We have not directly looked at complement in this instance, however, from our previous work on C3 knockout mice, there have been comparable AHR to WT mice under the HDM challenge.

      (3) What is the contribution of elevated IgD in the phenotype of the IgM-deficient mice. It has been described by this group that IgD levels are clearly elevated 

      We thank the reviewer for this question. We believe that IgD is essentially what drives partial class switching to IgG; we certainly have shown that in the case of VSV virus and Trypanosoma congolense and Trypanosoma brucei brucei that elevated IgD drives delayed but effective IgG in the absence of IgM (Lutz et al, 2001, Nature). This is also confirmed by the Noviski et al., 2018 eLife study, where they show that both IgM and IgD do share some endogenous antigens, so it’s likely that external antigens can activate IgD in a similar manner to prompt class switching.

      (4) How can transfer of naïve serum in class switching deficient IgM KO mice lead to restoration of allergen specific IgE and IgG1? 

      We thank the Reviewer for these comments, we believe that naïve sera transferred to IgM deficient mice is able to bind to the surface of B cells via IgM receptors (FcμR / Fcα/μR), which are still present on B cells and this is sufficient to facilitate class switching. Our IgM KO mouse lacks both membrane-bound and secreted IgM, and transferred serum contains at least secreted IgM which can bind to surfaces via its Fc portion. We measured HDM-specific IgE and we found very low levels, but these were not different between WT and IgM KO adoptively transferred with WT serum. We also detected HDM-specific IgG1 in IgM KO transferred with WT sera to the same level as WT, confirming a possible class switching, of course, we can’t rule out that transferred sera also contains some IgG1. We also can’t rule out that elevated IgD levels can partially be responsible for class switched IgG1 as discussed above.

      In the discussion line 463-464, we also added the following

      “We speculate that IgM can directly activate smooth muscle cells by binding a number of its surface receptors including FcμR, Fcα/μR and pIgR (Liu et al., 2019; Nguyen et al., 2017b; Shibuya et al., 2000). IgM binds to FcμR strictly, but shares Fcα/μR and pIgR with IgA (Liu et al., 2019; Michaud et al., 2020; Nguyen et al., 2017b). Both Fcα/μR and pIgR can be expressed by non-structural cells at mucosal sites (Kim et al., 2014; Liu et al., 2019). We would not rule out that the mechanisms of muscle contraction might be through one of these IgM receptors, especially the ones expressed on smooth muscle cells (Kim et al., 2014; Liu et al., 2019). Certainly, our future studies will be directed towards characterizing the mechanism by which IgM potentially activates the smooth muscle.”

      We have discussed this section under Discussion section, line 731 to 757. In addition, since we have now performed bone marrow chimaeras we have further added the following in our discussion in line 410-419.

      To resolve other endogenous factors that could have potentially influenced reduced AHR in IgM-deficient mice, we resorted to busulfan chemical irradiation to deplete bone marrow cells in IgM-deficient mice and replace bone marrow with WT bone marrow. While it is well accepted that busulfan chemical irradiation partially depletes bone marrow cells, in our case it was not possible to pursue other irradiation methods due to changes in ethical regulations and the fact that mice are slow to recover after gamma rays irradiation. Busulfan chemical irradiation allowed us to show that we could mostly restore AHR in IgM-deficient recipient mice that received donor WT bone marrow when challenged with low dose HDM.

      We removed the following lines, after performing bone marrow chimaeras since this changed some aspects. 

      Our efforts to adoptively transfer wild-type bone marrow or sorted B cells into IgM-deficient mice were also largely unsuccessful partly due to poor engraftment of wild-type B cells into secondary lymphoid tissues. Natural secreted IgM is mainly produced by B1 cells in the peritoneal cavity, and it is likely that any transfer of B cells via bone marrow transfer would not be sufficient to restore soluble levels of IgM<sup>3,10</sup>.

      (5) Alpha smooth muscle antigen is also expressed by myofibroblasts. This is insufficiently worked out. The histology mentions "expression in cells in close contact with smooth muscle". This needs more detail since it is a very vague term. Is it in smooth muscle or in myofibroblasts?

      We appreciate that alpha-smooth muscle actin-positive cells are a small fraction in the lung and even within CD45 negative cells, but their contribution to airway hyperresponsiveness is major. We also concede that by immunofluorescence BAIAP2L1 seems to be expressed by cells adjacent to alpha-smooth muscle actin (Figure 5B), however, we know that cells close to smooth muscle (such as extracellular matrix and myofibroblasts) contribute to its hypertrophy in allergic asthma.

      James AL, Elliot JG, Jones RL, Carroll ML, Mauad T, Bai TR, et al. Airway Smooth Muscle Hypertrophy and Hyperplasia in Asthma. Am J Respir Crit Care Med [Internet]. 2012; 185:1058–64. Available from: https://doi.org/10.1164/rccm.201110-1849OC

      (6) Have polymorphisms in BAIAP2L1 ever been linked to human asthma? 

      No, we have looked in asthma GWAS studies, at least summary statistics and we have not seen any SNPs that could be associated with human asthma.

      (7) IgM deficient patients are at increased risk for asthma. This paper suggests the opposite. So the translational potential is unclear 

      We thank the reviewer for these comments. At the time of this publication, we have not made a concrete link with human disease. There is some anecdotal evidence of diseases such as autoimmune glomerulonephritis, Hashimoto’s thyroiditis, bronchial polyp, SLE, celiac disease and other diseases in people with low IgM. Allergic disorders are also common in people with IgM deficiency, as the reviewer correctly points out; other studies have reported rates as high as 33-47%. The mechanisms for the high incidence of allergic diseases are unclear as, generally, these patients have normal or higher IgG and IgE levels. IgM deficiency may represent a heterogeneous spectrum of genetic defects, which might explain the heterogeneous nature of disease presentations.

    1. eLife Assessment

      This study used deep neural networks (DNN) to reconstruct voice information (viz., speaker identity), from fMRI responses in the auditory cortex and temporal voice areas, and assessed the representational content in these areas with decoding. A DNN-derived feature space approximated the neural representation of speaker identity-related information. The findings are valuable and the approach solid, yielding insight into how a specific model architecture can be used to relate the latent spaces of neural data and auditory stimuli to each other.

    2. Reviewer #1 (Public review):

      Summary:

      In this study, the authors trained a variational autoencoder (VAE) to create a high-dimensional "voice latent space" (VLS) using extensive voice samples, and analyzed how this space corresponds to brain activity through fMRI studies focusing on the temporal voice areas (TVAs). Their analyses included encoding and decoding techniques, as well as representational similarity analysis (RSA), which showed that the VLS could effectively map onto and predict brain activity patterns, allowing for the reconstruction of voice stimuli that preserve key aspects of speaker identity.
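      As a rough illustration of the RSA step mentioned above, the following is a minimal sketch only, not the authors' pipeline. It assumes correlation distance for the dissimilarity matrices and a rank-based comparison of their upper triangles; the array sizes, the synthetic data and all function names (`rdm`, `upper_triangle`, `rank`, `rsa_score`) are hypothetical.

```python
import numpy as np

def rdm(patterns):
    """Representational dissimilarity matrix: 1 - Pearson correlation between rows."""
    return 1.0 - np.corrcoef(patterns)

def upper_triangle(matrix):
    """Off-diagonal upper-triangle entries as a flat vector."""
    rows, cols = np.triu_indices(matrix.shape[0], k=1)
    return matrix[rows, cols]

def rank(values):
    """Ordinal ranks; ties are broken by position, which is enough for a sketch."""
    ranks = np.empty(len(values), dtype=float)
    ranks[np.argsort(values)] = np.arange(len(values))
    return ranks

def rsa_score(latent, brain):
    """Rank correlation between the upper triangles of the two RDMs."""
    a = rank(upper_triangle(rdm(latent)))
    b = rank(upper_triangle(rdm(brain)))
    return float(np.corrcoef(a, b)[0, 1])

# Synthetic stand-ins (assumptions): 20 stimuli, 16 latent dims, 50 "voxels".
rng = np.random.default_rng(0)
latent = rng.normal(size=(20, 16))
mixing = rng.normal(size=(16, 50))
brain = latent @ mixing + 0.1 * rng.normal(size=(20, 50))

score = rsa_score(latent, brain)
print(f"RSA score between latent and synthetic brain patterns: {score:.2f}")
```

      In practice the choice of distance metric and of the statistic used to compare the two matrices changes the result, so this sketch should be read as one possible instance of the technique.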

      Strengths:

      This paper is well-written and easy to follow. Most of the methods and results were clearly described. The authors combined a variety of analytical methods in neuroimaging studies, including encoding, decoding, and RSA. In addition to commonly used DNN encoding analysis, the authors performed DNN decoding and resynthesized the stimuli using VAE decoders. Furthermore, in addition to machine learning classifiers, the authors also included human behavioral tests to evaluate the reconstruction performance.

      Weaknesses:

      This manuscript presents a variational autoencoder (VAE) model to study voice identity representations from brain activity. While the model's ability to preserve speaker identity is expected due to its reconstruction objective, its broader utility remains unclear. Specifically, the VAE is not benchmarked against state-of-the-art speech models such as Wav2Vec2, HuBERT, or Whisper, which have demonstrated strong performance on standard speech tasks and alignment with cortical responses. Without comparisons on downstream tasks like automatic speech recognition (ASR) or phoneme classification, it is difficult to assess the relevance or advantages of the VLS representation.

      Furthermore, the neural basis of the observed correlations between VLS and brain activity is not well characterized. It remains unclear whether the VLS aligns with high-level abstract identity representations or lower-level acoustic features like pitch. Prior studies (e.g., Tang et al., Science 2017; Feng et al., NeuroImage 2021) have shown both types of coding in STG. The experimental design also does not clarify whether speech content was controlled across speakers, raising concerns about confounding acoustic-phonetic features. For example, PC2 in Figure 1b appears to reflect absolute pitch height, suggesting that identity decoding may partly rely on simpler acoustic cues. A more detailed analysis of the representational content of VLS would strengthen the conclusions.

    3. Reviewer #2 (Public review):

      Summary:

      Lamothe et al. collected fMRI responses to many voice stimuli in 3 subjects. The authors trained two different autoencoders on voice audio samples and predicted latent space embeddings from the fMRI responses, allowing the voice spectrograms to be reconstructed. The degree to which reconstructions from different auditory ROIs correctly represented speaker identity, gender or age was assessed by machine classification and human listener evaluations. Complementing this, the representational content was also assessed using representational similarity analysis. The results broadly concur with the notion that temporal voice areas are sensitive to different types of categorical voice information.
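      The decoding step described above (predicting latent embeddings from fMRI responses) can be sketched as a regularised linear map. This is a hedged illustration on synthetic data, assuming closed-form ridge regression; the function `fit_ridge`, the `alpha` value and all array sizes are hypothetical and are not taken from the manuscript.

```python
import numpy as np

def fit_ridge(X, Y, alpha=1.0):
    """Closed-form ridge regression: W = (X^T X + alpha * I)^-1 X^T Y."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ Y)

# Synthetic stand-ins (assumptions): voxel responses X, latent embeddings Y.
rng = np.random.default_rng(0)
n_train, n_test, n_voxels, n_latent = 200, 50, 30, 8
true_map = rng.normal(size=(n_voxels, n_latent))
X_train = rng.normal(size=(n_train, n_voxels))
X_test = rng.normal(size=(n_test, n_voxels))
Y_train = X_train @ true_map + 0.1 * rng.normal(size=(n_train, n_latent))
Y_test = X_test @ true_map

# Fit on training stimuli, then predict embeddings for held-out stimuli.
W = fit_ridge(X_train, Y_train, alpha=1.0)
Y_pred = X_test @ W

# Per-dimension correlation between predicted and true held-out embeddings.
per_dim_r = [np.corrcoef(Y_pred[:, d], Y_test[:, d])[0, 1] for d in range(n_latent)]
print("mean held-out correlation:", float(np.mean(per_dim_r)))
```

      In a full reconstruction pipeline the predicted embeddings would then be passed through the autoencoder's decoder to obtain spectrograms; that stage is omitted here.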

      Strengths:

      The single-subject approach that allows thousands of responses to unique stimuli to be recorded and analyzed is powerful. The idea of using this approach to probe cortical voice representations is strong and the experiment is technically solid.

    4. Reviewer #3 (Public review):

      Summary:

      In this manuscript, Lamothe et al. sought to identify the neural substrates of voice identity in the human brain by correlating fMRI recordings with the latent space of a variational autoencoder (VAE) trained on voice spectrograms. They used encoding and decoding models, and showed that the "voice" latent space (VLS) of the VAE performs, in general, (slightly) better than a linear autoencoder's latent space. Additionally, they showed dissociations in the encoding of voice identity across the temporal voice areas.

      Strengths:

      The geometry of the neural representations of voice identity has not been studied so far. Previous studies on the content of speech and faces in vision suggest that such geometry could exist. This study demonstrates this point systematically, leveraging a specifically trained variational autoencoder.

      The size of the voice dataset and the length of the fMRI recordings ensure that the findings are robust.

      Comments on revisions:

      The authors addressed my previous recommendations.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      In this study, the authors trained a variational autoencoder (VAE) to create a high-dimensional "voice latent space" (VLS) using extensive voice samples, and analyzed how this space corresponds to brain activity through fMRI studies focusing on the temporal voice areas (TVAs). Their analyses included encoding and decoding techniques, as well as representational similarity analysis (RSA), which showed that the VLS could effectively map onto and predict brain activity patterns, allowing for the reconstruction of voice stimuli that preserve key aspects of speaker identity.

      Strengths:

      This paper is well-written and easy to follow. Most of the methods and results were clearly described. The authors combined a variety of analytical methods in neuroimaging studies, including encoding, decoding, and RSA. In addition to commonly used DNN encoding analysis, the authors performed DNN decoding and resynthesized the stimuli using VAE decoders. Furthermore, in addition to machine learning classifiers, the authors also included human behavioral tests to evaluate the reconstruction performance.

      Weaknesses:

      This manuscript presents a variational autoencoder (VAE) to evaluate voice identity representations from brain recordings. However, the study's scope is limited by testing only one model, leaving unclear how generalizable or impactful the findings are. The preservation of identity-related information in the voice latent space (VLS) is expected, given the VAE model's design to reconstruct original vocal stimuli. Nonetheless, the study lacks a deeper investigation into what specific aspects of auditory coding these latent dimensions represent. The results in Figure 1c-e merely tested a very limited set of speech features. Moreover, there is no analysis of how these features and the whole VAE model perform in standard speech tasks like speech recognition or phoneme recognition. It is not clear what kind of computations the VAE model presented in this work is capable of. Inclusion of comparisons with state-of-the-art unsupervised or self-supervised speech models known for their alignment with auditory cortical responses, such as Wav2Vec2, HuBERT, and Whisper, would strengthen the validation of the VAE model and provide insights into its relative capabilities and limitations.

      The claim that the VLS outperforms a linear model (LIN) in decoding tasks does not significantly advance our understanding of the underlying brain representations. Given the complexity of auditory processing, it is unsurprising that a nonlinear model would outperform a simpler linear counterpart. The study could be improved by incorporating a comparative analysis with alternative models that differ in architecture, computational strategies, or training methods. Such comparisons could elucidate specific features or capabilities of the VLS, offering a more nuanced understanding of its effectiveness and the computational principles it embodies. This approach would allow the authors to test specific hypotheses about how different aspects of the model contribute to its performance, providing a clearer picture of the shared coding in VLS and the brain.

      The manuscript overlooks some crucial alternative explanations for the discriminant representation of vocal identity. For instance, the discriminant representation of vocal identity can be either a higher-level abstract representation or a lower-level coding of pitch height. Prior studies using fMRI and ECoG have identified both types of representation within the superior temporal gyrus (STG) (e.g., Tang et al., Science 2017; Feng et al., NeuroImage 2021). Additionally, the methodology does not clarify whether the stimuli from different speakers contained identical speech content. If the speech content varied across speakers, the approach of averaging trials to obtain a mean vector for each speaker (the "identity-based analysis") may not adequately control for confounding acoustic-phonetic features. Notably, the principal component 2 (PC2) in Figure 1b appears to correlate with absolute pitch height, suggesting that some aspects of the model's effectiveness might be attributed to simpler acoustic properties rather than complex identity-specific information.

      Methodologically, there are issues that warrant attention. In characterizing the autoencoder latent space, the authors initialized logistic regression classifiers 100 times and calculated the t-statistics using degrees of freedom (df) of 99. Given that logistic regression is a convex optimization problem typically converging to a global optimum, these multiple initializations of the classifier were likely not entirely independent. Consequently, the reported degrees of freedom and the effect size estimates might not accurately reflect the true variability and independence of the classifier outcomes. A more careful evaluation of these aspects is necessary to ensure the statistical robustness of the results.

      We thank Reviewer #1 for their thoughtful and constructive comments. Below, we address the key points raised:

      New comparative models. We agree that many open questions remain about the structure of the VLS and the specific aspects of auditory coding that its latent dimensions represent. The features tested in Figure 1c-e are not speech features but aspects of speaker identity: age, gender and unique identity. Nevertheless, we agree that the VLS could be compared to recent speech models (not available when we started this project): we have now included comparisons with Wav2Vec and HuBERT in the encoding section (new Figure 2-S3). The comparison of encoding results based on LIN, the VLS, Wav2Vec and HuBERT (new Figure 2-S3) indicates no clear superiority of one model over the others; rather, different sets of voxels are better explained by the different models. Interestingly, all four models yielded their best encoding results for the mTVA and aTVA, indicating some consistency across models.

      On decoding directly from spectrograms. We have now added decoding results obtained directly from spectrograms, as requested in the private review. These are presented in the revised Figure 4, and allow for comparison with the LIN- and VLS-based reconstructions. As noted, spectrogram-based reconstructions sounded less vocal-like and faithful to the original, confirming that the latent spaces capture more abstract and cerebral-like voice representations.

      On the number and length of stimuli. The rationale for using a large number of brief, randomly spliced speech excerpts from different languages was to extract identity features independent of specific linguistic cues. Indeed, PC2 could very well correlate with pitch; however, we were not able to extract reliable f0 information from the thousands of brief stimuli, many of which are largely inharmonic (e.g., fricatives), so this assumption could not be tested empirically. A correlation between PC2 weights and pitch would nonetheless be relevant: although the average fundamental frequency of phonation is not a linguistic cue, it is a major acoustical feature differentiating speaker identities.

      Statistics correction. To address the issue of potential dependence between multiple runs of logistic regression, we replaced our previous analysis with a Wilcoxon signed-rank test comparing decoding accuracies to chance. The results remain significant across classifications, and the revised figure and text reflect this change.

      Reviewer #2 (Public Review):

      Summary:

      Lamothe et al. collected fMRI responses to many voice stimuli in 3 subjects. The authors trained two different autoencoders on voice audio samples and predicted latent space embeddings from the fMRI responses, allowing the voice spectrograms to be reconstructed. The degree to which reconstructions from different auditory ROIs correctly represented speaker identity, gender, or age was assessed by machine classification and human listener evaluations. Complementing this, the representational content was also assessed using representational similarity analysis. The results broadly concur with the notion that temporal voice areas are sensitive to different types of categorical voice information.

      Strengths:

      The single-subject approach that allows thousands of responses to unique stimuli to be recorded and analyzed is powerful. The idea of using this approach to probe cortical voice representations is strong and the experiment is technically solid.

      Weaknesses:

      The paper could benefit from more discussion of the assumptions behind the reconstruction analyses and the conclusions it allows. The authors write that reconstruction of a stimulus from brain responses represents 'a robust test of the adequacy of models of brain activity' (L138). I concur that stimulus reconstruction is useful for evaluating the nature of representations, but the notion that they can test the adequacy of the specific autoencoder presented here as a model of brain activity should be discussed at more length. Natural sounds are correlated in many feature dimensions and can therefore be summarized in several ways, and similar information can be read out from different model representations. Models trained to reconstruct natural stimuli can exploit many correlated features and it is quite possible that very different models based on different features can be used for similar reconstructions. Reconstructability does not by itself imply that the model is an accurate brain model. Non-linear networks trained on natural stimuli are arguably not tested in the same rigorous manner as models built to explicitly account for computations (they can generate predictions and experiments can be designed to test those predictions). While it is true that there is increasing evidence that neural network embeddings can predict brain data well, it is still a matter of debate whether good predictability by itself qualifies DNNs as 'plausible computational models for investigating brain processes' (L72). This concern is amplified in the context of decoding and naturalistic stimuli where many correlated features can be represented in many ways. It is unclear how much the results hinge on the specificities of the specific autoencoder architectures used. For instance, it would be useful to know the motivations for why the specific VAE used here should constitute a good model for probing neural voice representations.

      Relatedly, it is not clear how VAEs as generative models are motivated as computational models of voice representations in the brain. The task of voice areas in the brain is not to generate voice stimuli but to discriminate and extract information. The task of reconstructing an input spectrogram is perhaps useful for probing information content, but discriminative models, e.g., trained on the task of discriminating voices, would seem more obvious candidates. Why not include discriminatively trained models for comparison?

      The autoencoder learns a mapping from latent space to well-formed voice spectrograms. Regularized regression then learns a mapping between this latent space and activity space. All reconstructions might sound 'natural', which simply means that the autoencoder works. It would be good to have a stronger test of how close the reconstructions are to the original stimulus. For instance, is the reconstruction the closest stimulus to the original in latent space coordinates out of using the experimental stimuli, or where does it rank? How do small changes in beta amplitudes impact the reconstruction? The effective dimensionality of the activity space could be estimated, e.g. by PCA of the voice samples' contrast maps, and it could then be estimated how the main directions in the activity space map to differences in latent space. It would be good to get a better grasp of the granularity of information that can be decoded/ reconstructed.
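
      The ranking question raised above can be made concrete with a small sketch. It assumes Euclidean distance in latent space and uses made-up two-dimensional coordinates; the function name `identification_rank` and all values are hypothetical and are not taken from the manuscript.

```python
def identification_rank(decoded, candidates, target_index):
    """Rank (1 = closest) of the true stimulus among all candidates.

    `decoded` is the latent vector predicted from brain activity and
    `candidates` holds the latent vectors of the experimental stimuli.
    """
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

    d_target = dist(decoded, candidates[target_index])
    # Count the candidates that are strictly closer than the true stimulus.
    return 1 + sum(dist(decoded, c) < d_target for c in candidates)


# Toy latent coordinates (hypothetical).
candidates = [[0.0, 0.0], [1.0, 1.0], [3.0, 0.0]]
decoded = [0.9, 0.8]
print(identification_rank(decoded, candidates, target_index=1))  # → 1
print(identification_rank(decoded, candidates, target_index=0))  # → 2
```

      Averaging this rank over test stimuli, and comparing it with the chance rank of (N + 1) / 2 for N candidates, would give one possible summary of how specific the reconstructions are.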

      What can we make of the apparent trend that LIN is higher than VLS for identity classification (at least VLS does not outperform LIN)? A general argument of the paper seems to be that VLS is a better model of voice representations compared to LIN as a 'control' model. Then we would expect VLS to perform better on identity classification. The age and gender of a voice can likely be classified from many acoustic features that may not require dedicated voice processing.

      The RDM results reported are significant only for some subjects and in some ROIs. This presumably means that results are not significant in the other subjects. Yet, the authors assert general conclusions (e.g. the VLS better explains RDM in TVA than LIN). An assumption typically made in single-subject studies (with large amounts of data in individual subjects) is that the effects observed and reported in papers are robust in individual subjects. More than one subject is usually included to hint that this is the case. This is an intriguing approach. However, reports of effects that are statistically significant in some subjects and some ROIs are difficult to interpret. This, in my view, runs contrary to the logic and leverage of the single-subject approach. Reporting results that are only significant in 1 out of 3 subjects and inferring general conclusions from this seems less convincing.

      The first main finding is stated as being that '128 dimensions are sufficient to explain a sizeable portion of the brain activity' (L379). What qualifies this? From my understanding, only models of that dimensionality were tested. They explain a sizeable portion of brain activity, but it is difficult to follow what 'sizable' is without baseline models that estimate a prediction floor and ceiling. For instance, would autoencoders that reconstruct any spectrogram (not just voice) also predict a sizable portion of the measured activity? What happens to reconstruction results as the dimensionality is varied?

      A second main finding is stated as being that the 'VLS outperforms the LIN space' (L381). It seems correct that the VAE yields more natural-sounding reconstructions, but this is a technical feature of the chosen autoencoding approach. That the VLS yields a 'more brain-like representational space' I assume refers to the RDM results where the RDM correlations were mainly significant in one subject. For classification, the performance of features from the reconstructions (age/ gender/ identity) gives results that seem more mixed, and it seems difficult to draw a general conclusion about the VLS being better. It is not clear that this general claim is well supported.

      It is not clear why the RDM was not formed based on the 'stimulus GLM' betas. The 'identity GLM' is already biased towards identity and it would be stronger to show associations at the stimulus level.

      Multiple comparisons were performed across ROIs, models, subjects, and features in the classification analyses, but it is not clear how correction for these multiple comparisons was implemented in the statistical tests on classification accuracies.

      Risks of overfitting and bias are a recurrent challenge in stimulus reconstruction with fMRI. More control analyses would help to ensure that this was not the case. For instance, how were the repeated test stimuli presented? Were they intermingled with the other stimuli used for training or presented in separate runs? If intermingled, then the training and test data would have been preprocessed together, which could compromise the test set. The reconstructions could be performed on responses from independent runs, preprocessed separately, as a control. This should include all preprocessing, for instance, estimating stimulus/identity GLMs on separately processed run pairs rather than across all runs. Also, it would be good to avoid detrending before GLM denoising (or at least to test its effects) as these can interact.

      We appreciate Reviewer #2’s careful reading and numerous suggestions for improving clarity and presentation. We have implemented the suggested text edits, corrected ambiguities, and clarified methodological details throughout the manuscript. In particular, we have toned down several sentences that we agree were making strong claims (L72, L118, L378, L380-381).

      Clarifications, corrections and additional information:

      We streamlined the introduction by reducing overly specific details and better framing the VLS concept before presenting specifics.

      Clarified the motivation for the age classification split and corrected several inaccuracies and ambiguities in the methods, including the hearing thresholds, balancing of category levels, and stimulus energy selection procedure.

      Provided additional information on the temporal structure of runs and experimental stimuli selection.

      Corrected the description of technical issues affecting one participant and ensured all acronyms are properly defined in the text and figure legends.

      Confirmed that audiograms were performed repeatedly to monitor hearing thresholds and clarified our use of robust scaling and normalization procedures.

      Regarding the test of RDM correlations, we clarified in the text that multiple comparisons were corrected using a permutation-based framework.

      Reviewer #3 (Public Review):

      Summary:

      In this manuscript, Lamothe et al. sought to identify the neural substrates of voice identity in the human brain by correlating fMRI recordings with the latent space of a variational autoencoder (VAE) trained on voice spectrograms. They used encoding and decoding models, and showed that the "voice" latent space (VLS) of the VAE performs, in general, (slightly) better than a linear autoencoder's latent space. Additionally, they showed dissociations in the encoding of voice identity across the temporal voice areas.

      Strengths:

      The geometry of the neural representations of voice identity has not been studied so far. Previous studies on the content of speech and faces in vision suggest that such geometry could exist. This study demonstrates this point systematically, leveraging a specifically trained variational autoencoder. 

      The size of the voice dataset and the length of the fMRI recordings ensure that the findings are robust.

      Weaknesses:

      Overall, the VLS is often only marginally better than the linear model across analysis, raising the question of whether the observed performance improvements are due to the higher number of parameters trained in the VAE, rather than the non-linearity itself. A fair comparison would necessitate that the number of parameters be maintained consistently across both models, at least as an additional verification step.

      The encoding and RSM results are quite different. This is unexpected, as similar embedding geometries between the VLS and the brain activations should be reflected by higher correlation values of the encoding model.

      The consistency across participants is not particularly high, for instance, S1 seemed to have demonstrated excellent performances, while S2 showed poor performance.

      An important control analysis would be to compare the decoding results with those obtained by a decoder operating directly on the latent spaces, in order to further highlight the interest of the non-linear transformations of the decoder model. Currently, it is unclear whether the non-linearity of the decoder improves the decoding performance, considering the poor resemblance between the VLS and brain-reconstructed spectrograms.

      We thank Reviewer #3 for their comments. In response:

      Code and preprocessed data are now available as indicated in the revised manuscript.

      While we appreciate the suggestion to display supplementary analyses as boxplots split by hemisphere, we opted to retain the current format as we do not have hypotheses regarding hemispheric lateralization, and the small sample size per hemisphere would preclude robust conclusions.

      Confirmed that the identities in Figure 3a are indeed ordered by age and have clarified this in the legend.

      The higher variance observed in correlations for the aTVA in Figure 3b reflects the small number of data points (3 participants × 2 hemispheres), and this is now explained.

      Regarding the cerebral encoding of gender and age, we acknowledge this interesting pattern. Prior work (e.g., Charest et al., 2013) found overlapping processing regions for voice gender without clear subregional differences in the TVAs. Evidence on voice age encoding remains sparse, and we highlight this novel finding in our discussion.

      We again thank the reviewers for their insightful comments, which have greatly improved the quality and clarity of our work.

      Reviewer #1 (Recommendations For The Authors):

      (1) A set of recent advances have shown that embeddings of unsupervised/self-supervised speech models align with auditory responses to speech in the temporal cortex (e.g. Wav2Vec2: Millet et al., NeurIPS 2022; HuBERT: Li et al., Nat Neurosci 2023; Whisper: Goldstein et al., bioRxiv 2023). These models are known to preserve a variety of speech information (phonetics, linguistic information, emotions, speaker identity, etc.) and perform well in a variety of downstream tasks. These other models should be evaluated or at least discussed in the study.

      We fully agree - the pace of progress in this area of voice technology has been incredible. Many of these models were not yet available at the time this work started so we could not use them in our comparison with cerebral representations.

      We have now implemented Reviewer #1’s suggestion and evaluated Wav2Vec and HuBERT. The results are presented in supplementary Figure 2-S3. Correlations between activity predicted by the model and the real activity were globally comparable with those obtained with the LIN and VLS models. Interestingly, both HuBERT and Wav2Vec yielded their highest correlations in the mTVA and, to a lesser extent, the aTVA, as did the LIN and VLS models.

      (2) The test statistics of the results in Fig 1c-e need to be revised. Given that logistic regression is a convex optimization problem typically converging to a global optimum, these multiple initializations of the classifier were likely not entirely independent. Consequently, the reported degrees of freedom and the effect size estimates might not accurately reflect the true variability and independence of the classifier outcomes. A more careful evaluation of these aspects is necessary to ensure the statistical robustness of the results. 

      We thank Reviewer #1 for pointing out this important issue regarding the potential dependence between multiple runs of the logistic regression model. To address this concern, we have revised our analyses and used a Wilcoxon signed-rank test to compare the decoding accuracy to chance level. The results showed that the accuracy was significantly above chance for all classifications (Wilcoxon signed-rank test, all W=15, p=0.03125). We updated Figure 1c-e and the corresponding text (L154-L155) to reflect the revised analysis. Because the focus of this section is to probe the informational content of the autoencoder’s latent spaces, and since there are only 5 decoding accuracy values per model, we dropped the inter-model statistical test.
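
      The reported values can be checked with a small worked example. The sketch below is not the authors' code: it computes an exact one-sided Wilcoxon signed-rank test by enumerating all sign assignments, and the five accuracy values are hypothetical. The one-sided alternative is an assumption; it is the setting under which five values all above chance give W = 15 and p = 1/32 = 0.03125, the smallest p-value attainable with n = 5.

```python
from itertools import product


def signed_rank_exact(accuracies, chance):
    """Exact one-sided Wilcoxon signed-rank test of accuracies > chance.

    Returns (W_plus, p_value). Zero differences are dropped and tied
    absolute differences receive average ranks.
    """
    diffs = [a - chance for a in accuracies if a != chance]
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied absolute differences.
        while j + 1 < len(order) and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    # Under the null, each rank carries a positive sign with probability 1/2.
    null = [sum(r for r, s in zip(ranks, signs) if s)
            for signs in product([0, 1], repeat=len(ranks))]
    p_value = sum(w >= w_plus for w in null) / len(null)
    return w_plus, p_value


# Hypothetical accuracies from 5 evaluations of a two-class problem.
w, p = signed_rank_exact([0.91, 0.88, 0.93, 0.90, 0.89], chance=0.5)
print(w, p)  # → 15.0 0.03125
```

      Because only 2^5 = 32 sign patterns exist, any set of five accuracies that all exceed chance yields the same W and p, regardless of how far above chance they are.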

      (3) In Line 198, the authors discuss the number of dimensions used in their models. To provide a comprehensive comparison, it would be informative to include direct decoding results from the original spectrograms alongside those from the VLS and LIN models. Given the vast diversity in vocal speech characteristics, it is plausible that the speaker identities might correlate with specific speech-related features also represented in both the auditory cortex and the VLS. Therefore, a clearer understanding of the original distribution of voice identities in the untransformed auditory space would be beneficial. This addition would help ascertain the extent to which transformations applied by the VLS or LIN models might be capturing or obscuring relevant auditory information.

      We have now implemented Reviewer #1’s suggestion. The graphs on the right panel b of revised Figure 4 now show decoding results obtained from the regression performed directly on the spectrograms, rather than on representations of them, for our two example test stimuli. They can be listened to and compared to the LIN- and VLS-based reconstructions in Supplementary Audio 2. Compared to the LIN and VLS, the SPEC-based reconstructions sounded much less vocal or similar to the original, indicating that the latent spaces indeed capture more abstract voice representations, more similar to cerebral ones.

      Reviewer #2 (Recommendations For The Authors): 

      L31: 'in voice' > consider rewording (from a voice?).

      L33: consider splitting sentence (after interactions). 

      L39: 'brain' after parentheses. 

      L45-: certainly DNNs 'as a powerful tool' extend to audio (not just image and video) beyond their use in brain models. 

      L52: listened to / heard. 

      L63: use second/s consistently. 

      L64: the reference to Figure 5D is maybe a bit confusing here in the introduction. 

      We thank Reviewer #2 for these recommendations, which we have implemented.

      L79-88: this section is formulated in a way that is too detailed for the introduction text (confusing to read). Consider a more general introduction to the VLS concept here and the details of this study later. 

      L99-: again, I think the experimental details are best saved for later. It's good to provide a feel for the analysis pipeline here, but some of the details provided (number of averages, denoising, preprocessing), are anyway too unspecific to allow the reader to fully follow the analysis. 

      Again, thank you for these suggestions for improving readability: we have modified the text accordingly.

      L159: what was the motivation for classifying age as a 2-class classification problem? Rather than more classes or continuous prediction? How did you choose the age split? 

      The motivation for the 2 age classes was to align with the gender classification task for better comparison. The cutoff (30 years) was not driven by any scientific consideration, but by practical ones, based on the median age in our stimulus set. This is now clarified in the manuscript (L149).

      L263: Is the test of RDM correlation>0 corrected for multiple comparisons across ROIs, subjects, and models?

      The test of RDM correlation>0 was indeed corrected for multiple comparisons across models using the permutation-based ‘maximum statistics’ framework for multiple comparison correction (described in Giordano et al., 2023 and Maris & Oostenveld, 2007). This framework was applied for each ROI and subject. It was described in the Methods (L745) but not clearly enough in the main text; we thank Reviewer #2 and have clarified it in the text (L246, L260-L261).
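
      As an illustration of the general idea, a maximum-statistic permutation test across models can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: it assumes Pearson correlation between RDM upper triangles, a null distribution built by permuting condition labels of the brain RDM, and toy one-dimensional "positions" from which the hypothetical RDMs `A` and `B` are derived.

```python
import random


def pearson(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5


def upper_triangle(rdm, order):
    """Vectorize the upper triangle after reordering conditions by `order`."""
    n = len(order)
    return [rdm[order[i]][order[j]] for i in range(n) for j in range(i + 1, n)]


def max_stat_pvalues(brain_rdm, model_rdms, n_perm=2000, seed=0):
    """Corrected p-values: each observed r is compared with the null
    distribution of the maximum r across all models."""
    rng = random.Random(seed)
    identity = list(range(len(brain_rdm)))
    model_vecs = {k: upper_triangle(m, identity) for k, m in model_rdms.items()}
    brain_vec = upper_triangle(brain_rdm, identity)
    observed = {k: pearson(brain_vec, v) for k, v in model_vecs.items()}
    null_max = []
    for _ in range(n_perm):
        perm = identity[:]
        rng.shuffle(perm)  # permute condition labels (rows and columns together)
        permuted = upper_triangle(brain_rdm, perm)
        null_max.append(max(pearson(permuted, v) for v in model_vecs.values()))
    return {k: (1 + sum(m >= r for m in null_max)) / (n_perm + 1)
            for k, r in observed.items()}


def rdm_from_positions(pos):
    return [[abs(a - b) for b in pos] for a in pos]


# Toy data (hypothetical): model A resembles the brain RDM, model B does not.
brain = rdm_from_positions([0.0, 1.0, 2.5, 3.0, 4.7, 6.1, 7.2, 9.0])
models = {
    "A": rdm_from_positions([0.1, 0.9, 2.6, 3.2, 4.5, 6.0, 7.5, 8.8]),
    "B": rdm_from_positions([4.0, 9.0, 1.0, 7.0, 0.5, 6.0, 2.0, 5.0]),
}
pvals = max_stat_pvalues(brain, models)
print(pvals)
```

      Taking the maximum across models within each permutation controls the family-wise error rate over models; extending the correction to ROIs or subjects would require including them in the maximum as well.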

      L379: 'these stimuli' - weren't the experimental stimuli different from those used to train the V/AE? 

      We thank Reviewer #2 for spotting this issue. Indeed, the experimental stimuli are different from those used to train the models. We corrected the text to reflect this distinction (L84-L85).

      L443: what are 'technical issues' that prevented subject 3 from participating in 48 runs?? 

      We thank Reviewer #2 for pointing out the ambiguity in our previous statement. Participant 3 actually experienced personal health concerns that prevented them from completing the whole number of runs. We corrected this to provide a more accurate description (L442-L443).

      L444: participants were instructed to 'stay in the scanner'!? Do you mean 'stay still', or something? 

      We thank the Reviewer for spotting this forgotten word. We have corrected the passage (L444).

      L463: Hearing thresholds of 15 dB: do you mean that all had thresholds lower than 15 dB at all frequencies and at all repeated audiogram measurements? 

      We thank Reviewer #2 for spotting this error: we meant thresholds below 15 dB HL. This has been corrected (L463). Indeed, participants underwent several audiograms between fMRI sessions to verify that the scanner noise in these repeated sessions caused no hearing loss.

      L472: were the 4 category levels balanced across the dataset (in number of occurrences of each category combination)? 

      The dataset was fully balanced, with an equal number of samples for each combination of language, gender, age, and identity. Furthermore, to minimize potential adaptation effects, the stimuli were also balanced within each run according to these categories, and identity was balanced across sessions. We made this clearer in Main voice stimuli (L492-L496).

      L482: the test stimuli were selected as having high energy by the amplitude envelope. It is unclear what this means (how is the envelope extracted, what feature of it is used to measure 'high energy'?) 

      The selection of sounds with high energy was based on analyzing the amplitude envelope of each signal, which was extracted using the Hilbert transform and then filtered to refine the envelope. This envelope, which represents the signal's intensity over time, was used to measure the energy of each stimulus, and those that exceeded an arbitrary threshold were selected. From this pool of high-energy stimuli, likely including vowels, we selected six stimuli to be repeated during the scanning session, then reconstructed via decoding. This has been clarified in the text (L483-L484). 
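
      The selection procedure described above can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: it computes the Hilbert (analytic-signal) envelope with a direct DFT that is only practical for short signals, and it assumes a moving-average filter, mean squared envelope as the energy measure, and an arbitrary threshold of 0.5. The two synthetic "stimuli" are hypothetical.

```python
import cmath
import math


def analytic_envelope(x):
    """Amplitude envelope via the analytic signal, using a direct O(N^2) DFT."""
    n = len(x)
    spectrum = [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]
    for k in range(n):
        if k == 0 or (n % 2 == 0 and k == n // 2):
            gain = 1.0  # keep DC and, for even n, the Nyquist bin
        elif k < (n + 1) // 2:
            gain = 2.0  # double the positive frequencies
        else:
            gain = 0.0  # discard the negative frequencies
        spectrum[k] *= gain
    analytic = [
        sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
        for t in range(n)
    ]
    return [abs(z) for z in analytic]


def smooth(values, width=5):
    """Centered moving average; the window shrinks at the edges."""
    half = width // 2
    out = []
    for i in range(len(values)):
        window = values[max(0, i - half): i + half + 1]
        out.append(sum(window) / len(window))
    return out


def mean_energy(x):
    env = smooth(analytic_envelope(x))
    return sum(e * e for e in env) / len(env)


# Two synthetic stimuli (hypothetical): the same tone at two amplitudes.
n = 64
tone = [math.cos(2 * math.pi * 4 * t / n) for t in range(n)]
stimuli = {"loud": tone, "quiet": [0.2 * s for s in tone]}
threshold = 0.5  # arbitrary, as in the description above
selected = [name for name, s in stimuli.items() if mean_energy(s) > threshold]
print(selected)  # → ['loud']
```

      For real recordings, an FFT-based routine such as `scipy.signal.hilbert` would be the practical choice for obtaining the analytic signal.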

      L500 was the audio filtered to account for the transfer function of the Sensimetrics headphones? 

      We did not perform any filtering, as the transfer function of the Sensimetrics is already very satisfactory as is. This has been clarified in the text (L503).

      L500: what does 'comfortable level' correspond to and was it set per session (i.e. did it vary across sessions)? 

      By comfortable we mean around 85 dB SPL. The audio settings were kept similar across sessions. This has been added to the text (L504).

      L526- does the normalization imply that the reconstructed spectrograms are normalized? Were the reconstructions then scaled to undo the normalization before inversion? 

      The paragraph on spectrogram standardization was not well placed, which caused confusion. We have moved this paragraph to a more suitable location, in the Deep learning section (L545-L550).

      L606: does the identity GLM model the denoised betas from the first GLM or simply the BOLD data? The text indicates the latter, but I suspect the former. 

      Indeed: this has been clarified (L601-L602).

      L704: could you unpack this a bit more? It is not easy to see why you specify the summing in the objective. Shouldn't this just be the ridge objective for a given voxel/ROI? Then you could just state it in matrix notation. 

      Thanks for pointing this out: we kept the formula unchanged but clarified the text, in particular specified that the voxel id is the ith index (L695).
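      To make the reviewer's point concrete, here is a short worked identity. The symbols are assumptions for this sketch and are not taken from the manuscript: X is the design matrix, y_i the response of voxel i, w_i its weight vector, Y and W the matrices that stack y_i and w_i as columns, and λ a penalty assumed to be shared across the V voxels.

```latex
\sum_{i=1}^{V} \left( \lVert \mathbf{y}_i - X\mathbf{w}_i \rVert_2^2
  + \lambda \lVert \mathbf{w}_i \rVert_2^2 \right)
= \lVert Y - XW \rVert_F^2 + \lambda \lVert W \rVert_F^2,
\qquad
\hat{W} = \left( X^\top X + \lambda I \right)^{-1} X^\top Y
```

      Because the sum decouples across voxels, minimizing it is equivalent to solving an ordinary ridge problem for each voxel separately, which is why the summed and matrix forms describe the same estimator under these assumptions.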

      L716: you used robust scaling for the classifications in latent space but haven't mentioned scaling here. Are we to assume that the same applies?  

      Indeed we also used robust scaling here, this is now made clear (L710-L711).
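      As a reminder of what robust scaling usually means, a minimal sketch follows. It assumes the common convention of centering on the median and dividing by the interquartile range; the response does not specify the implementation, and the function name `robust_scale` is hypothetical.

```python
import statistics


def robust_scale(values):
    """Center on the median and divide by the interquartile range (IQR)."""
    # 'inclusive' quartiles use linear interpolation between order statistics.
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    if iqr == 0:
        raise ValueError("IQR is zero; robust scaling is undefined for this feature")
    return [(v - median) / iqr for v in values]
```

      For example, `robust_scale([1, 2, 3, 4, 100])` leaves the four typical values within one unit of zero, while the outlier stays large but does not distort the scale of the others.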

      L720: Pearson correlation as a performance metric and its variance will depend on the choice of test/train split sizes. Can you show that the results generalize beyond your specific choices? Maybe report the explained variance as well to get a better idea of performance. 

      We used a standard 80/20 split. We think it is beyond the scope of this study to examine the different possible choices of splits, and prefer not to spend additional time on this point which we think is relatively minor.
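      For readers weighing the reviewer's suggestion, the sketch below contrasts Pearson correlation with the coefficient of determination (R², one common way to report explained variance). It is an illustration under assumed toy data, not an analysis from the study; the function names are hypothetical.

```python
import math


def pearson_r(y_true, y_pred):
    """Pearson correlation between held-out targets and predictions."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((a - mean_t) * (b - mean_p) for a, b in zip(y_true, y_pred))
    var_t = sum((a - mean_t) ** 2 for a in y_true)
    var_p = sum((b - mean_p) ** 2 for b in y_pred)
    return cov / math.sqrt(var_t * var_p)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - residual sum of squares / total sum of squares."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((a - b) ** 2 for a, b in zip(y_true, y_pred))
    ss_tot = sum((a - mean_t) ** 2 for a in y_true)
    return 1.0 - ss_res / ss_tot
```

      Predictions that are perfectly correlated with the targets but wrongly scaled (for instance, half their size) give a correlation of 1 and a low or even negative R², which is why the two metrics can be informative together.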

      Could you specify (somewhere) the stimulus timing in a run? ISI and stimulus duration are mentioned in different places, but it would be nice to have a summary of the temporal structure of runs.

      This is now clarified at the beginning of the Methods section (L437-441).

      Reviewer #3 (Recommendations For The Authors):

      Code and data are not currently available. 

      Code and preprocessed data are now available (L826-827).

      In the supplementary material, it would be beneficial to present the different analyses as boxplots, as in the main text, but with the ROIs in the left and right hemispheres separated, to better show potential hemispheric effect. Although this information is available in the Supplementary Tables, it is currently quite tedious to access it. 

      Although we provide the complete data split by hemisphere in the Tables, we do not believe it is relevant to illustrate left/right differences, as we do not have any hypotheses regarding hemispheric lateralization, and we would in any case be underpowered to test them with only three data points per hemisphere.

      In Figure 3a, it might be beneficial to order the identities by age for each gender in order to more clearly illustrate the structure of the RDMs.

      The identities are indeed already ordered by increasing age: we now make this clear.

      In Figure 3b, the variance for the correlations for the aTVA is higher than in other regions, why? 

      Please note that the error bar indicates variance across only 6 data points (3 subjects x 2 hemispheres) such that some fluctuations are to be expected.

      Please make sure that all acronyms are defined, and that they are redefined in the figure legends. 

      This has been done.

      Gender and age are primarily encoded by different brain regions (Figure 5, pTVA vs aTVA). How does this finding compare with existing literature?

      This interesting finding was not expected. The cerebral processing of voice gender has been investigated by several groups including ours (Charest et al., 2013, Cerebral Cortex). Using an fMRI-adaptation design optimized using a continuous carry-over protocol and voice gender continua generated by morphing, we found that regions dealing with acoustical differences between voices of varying gender largely overlapped with the TVAs, without clear differentiation between the different subparts. Evidence for the role of the different TVAs in voice age processing remains scarce.

    1. eLife Assessment

      This study makes a valuable contribution by elucidating the genetic determinants of growth and fitness across multiple clinical strains of Mycobacterium intracellulare, an understudied non-tuberculous mycobacterium. Using transposon sequencing (Tn-seq), the authors identify a core set of 131 genes essential for bacterial adaptation to hypoxia, providing a convincing foundation for anti-mycobacterial drug discovery. Minor concerns remain regarding the presentation of Fig. 8C and the interpretation of data related to hypoxia.

    2. Reviewer #1 (Public review):

      Summary:

      In this descriptive study, Tateishi et al. report a Tn-seq based analysis of genetic requirements for growth and fitness in 8 clinical strains of Mycobacterium intracellulare (Mi), and compare the findings with a type strain ATCC13950. The study finds a core set of 131 genes that are essential in all nine strains, and therefore are reasonably argued as potential drug targets. Multiple other genes required for fitness in clinical isolates have been found to be important for hypoxic growth in the type strain.

      Strengths:

      The study has generated a large volume of Tn-seq datasets of multiple clinical strains of Mi from multiple growth conditions, including from mouse lungs. The dataset can serve as an important resource for future studies on Mi, which despite being clinically significant remains a relatively understudied species of mycobacteria.

      Weaknesses:

      The primary claim of the study that the clinical strains are better adapted for hypoxic growth is yet to be comprehensively investigated. However, this reviewer thinks such an investigation would require a complex experimental design and perhaps forms an independent study.

      Comments on revisions:

      The revised manuscript has responded to the previous concerns of the reviewers, albeit modestly. The overemphasis on hypoxic adaptation of the clinical isolates persists as a key concern in the paper. The authors have compared the growth curves of each of the clinical and ATCC strains under normal and hypoxic conditions (Fig. 8), but don't show how mutations in some of the genes identified in Tn-seq would impact the growth phenotype under hypoxia. They largely base their arguments on previously published results.

      As I mentioned previously, the paper will be better without over-interpreting the TnSeq data in the context of hypoxia.

      Other points:

      The y-axis legends of plots in Fig.8c are illegible.

      The statements in lines 376-389 are convoluted and need some explanation. If the clinical strains enter the log phase sooner than the ATCC strain under hypoxia, then how come their growth rates (Fig. 8c) are lower? Aren't they expected to grow faster?

    3. Reviewer #4 (Public review):

      Summary:

      In this study Tateishi et al. used TnSeq to identify 131 shared essential or growth defect-associated genes in eight clinical MAC-PD isolates and the type strain ATCC13950 of Mycobacterium intracellulare which are proposed as potential drug targets. Genes involved in gluconeogenesis and the type VII secretion system which are required for hypoxic pellicle-type biofilm formation in ATCC13950 also showed increased requirement in clinical strains under standard growth conditions. These findings were further confirmed in a mouse lung infection model.

      Strengths:

      This study has conducted TnSeq experiments in the reference strain and 8 different clinical isolates of M. intracellulare, thus producing a large number of datasets, which is itself a rare accomplishment and will greatly benefit the research community.

      Weaknesses:

      (1) Comparative growth study of pure and mixed cultures of clinical and reference strains under hypoxia will be helpful in supporting the claim that clinical strains adapt better to such conditions. This should be mentioned as future directions in the discussion section along with testing the phenotype of individual knockout strains.

      (2) Authors should provide the quantitative value of read counts for classifying a gene as "essential" or "non-essential" or "growth-defect" or "growth-advantage". Merely mentioning "no insertions in all or most of their TA sites" or "unusually low read counts" or "unusually high low read counts" is not clear.

      (3) One of the major limitations of this study is the lack of validation of TnSeq results with individual gene knockouts. Authors should mention this in the discussion section.

      Comments on revisions:

      The revised version has satisfactorily addressed my initial comments in the discussion section.

    4. Reviewer #5 (Public review):

      Summary:

      In the research article, "Functional genomics reveals strain-specific genetic requirements conferring hypoxic growth in Mycobacterium intracellulare", Tateishi et al. focussed their research on pulmonary disease caused by Mycobacterium avium-intracellulare complex, which has recently become a major health concern. The authors were interested in identifying the genetic requirements necessary for growth/survival within the host and used hypoxia and biofilm conditions that partly replicate some of the stress conditions experienced by bacteria in vivo. An important finding of this analysis was the observation that genes involved in gluconeogenesis, type VII secretion system and cysteine desulphurase were crucial for the clinical isolates during standard culture while the same were necessary during hypoxia in the ATCC type strain.

      Strength of the study:

      Transposon mutagenesis has been a powerful genetic tool to identify essential genes/pathways necessary for bacteria under various in vitro stress conditions and for in vivo survival. The authors extended the TnSeq methodology not only to the ATCC strain but also to recent clinical isolates to identify the differences between the two categories of bacterial strains. Using this approach they dissected the similarities and differences in the genetic requirement for bacterial survival between ATCC type strains and clinical isolates. They observed that the clinical strains performed much better in terms of growth during hypoxia than the type strain. These in vitro findings were further extended to mouse infection models and similar outcomes were observed in vivo, further emphasising the relevance of hypoxic adaptation crucial for the clinical strains, which could be explored as potential drug targets.

      Weakness:

      The authors have performed extensive TnSeq analysis but fail to present the data coherently. The data could have been better presented both in Figures and text. In my view this is one of the major weaknesses of the study.

      Comments on revisions:

      There is quite a lot of data and this could have been a really impactful study if the authors had channelized the Tn mutagenesis by focussing on one pathway or network. It looks scattered. However, from the previous version, the authors have made significant improvements to the manuscript and have provided comments that fairly address my questions.

    5. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewer #1 (Public review):

      Summary:

      In this descriptive study, Tateishi et al. report a Tn-seq based analysis of genetic requirements for growth and fitness in 8 clinical strains of Mycobacterium intracellulare (Mi), and compare the findings with a type strain ATCC13950. The study finds a core set of 131 genes that are essential in all nine strains, and therefore are reasonably argued as potential drug targets. Multiple other genes required for fitness in clinical isolates have been found to be important for hypoxic growth in the type strain.

      Strengths:

      The study has generated a large volume of Tn-seq datasets of multiple clinical strains of Mi from multiple growth conditions, including from mouse lungs. The dataset can serve as an important resource for future studies on Mi, which despite being clinically significant remains a relatively understudied species of mycobacteria.

      Thank you for the comment on the significance of our manuscript on the basic research of non-tuberculous mycobacteria.

      Weaknesses:

      The primary claim of the study that the clinical strains are better adapted for hypoxic growth is yet to be comprehensively investigated. However, this reviewer thinks such an investigation would require a complex experimental design and perhaps forms an independent study.

      Thank you for pointing out that the claim of better adaptation for hypoxic growth in the clinical strains has not been completely demonstrated. We agree with the reviewer that, given the complexity of the experimental design required, a comprehensive investigation of hypoxic adaptation in the clinical strains should be a future project.

      Reviewer #4 (Public review):

      Summary:

      In this study Tateishi et al. used TnSeq to identify 131 shared essential or growth defect-associated genes in eight clinical MAC-PD isolates and the type strain ATCC13950 of Mycobacterium intracellulare which are proposed as potential drug targets. Genes involved in gluconeogenesis and the type VII secretion system which are required for hypoxic pellicle-type biofilm formation in ATCC13950 also showed increased requirement in clinical strains under standard growth conditions. These findings were further confirmed in a mouse lung infection model.

      Strengths:

      This study has conducted TnSeq experiments in the reference strain and 8 different clinical isolates of M. intracellulare, thus producing a large number of datasets, which is itself a rare accomplishment and will greatly benefit the research community.

      Thank you for the comment on the significance of our manuscript on the basic research of non-tuberculous mycobacteria.

      Weaknesses:

      (1) A comparative growth study of pure and mixed cultures of clinical and reference strains under hypoxia will be helpful in supporting the claim that clinical strains adapt better to such conditions. This should be mentioned as future directions in the discussion section along with testing the phenotype of individual knockout strains.

      Thank you for suggesting a comparative growth assay of pure and mixed cultures of clinical and reference strains under hypoxia. We appreciate that demonstrating a growth advantage of the clinical strains under hypoxia in mixed culture with the ATCC strain would strengthen the claim of better adaptation for hypoxic growth in the clinical strains. However, co-culture conditions introduce additional variables, including inter-strain competition or synergy, which can obscure the specific contributions of hypoxic adaptation in each strain. Therefore, we consider that our current approach using monoculture growth curves under defined oxygen conditions offers a clearer interpretation of strain-specific hypoxic responses.

      Following the comment, we have added the mention of the mixed culture experiment and the growth assay using individual knockout strains as future directions (page 35 lines 614-632 in the revised manuscript).

      “We have provided the data suggesting the preferential hypoxic adaptation in clinical strains compared to the ATCC type strain by the growth assay of individual strains. To strengthen our claim, several experiments are suggested including mixed culture experiments of clinical and reference strains under hypoxia. However, co-culture conditions introduce additional variables, including inter-strain competition or synergy, which can obscure the specific contributions of hypoxic adaptation in each strain. Therefore, we took the current approach using monoculture growth curves under defined oxygen conditions, which offers a clearer interpretation of strain-specific hypoxic responses. Furthermore, one of the limitations of this study is the lack of validation of TnSeq results with individual gene knockouts. Contrary to the case of Mtb, the technique of constructing knockout mutants of slow-growing NTM including M. intracellulare has not been established long time. We have just recently succeeded in constructing the vector plasmids for making knockout mutants of M. intracellulare (Tateishi. Microbiol Immunol. 2024). Growth assay of individual knockout strains of genes showing increased genetic requirements such as pckA, glpX, csd, eccC5 and mycP5 in the clinical strains is suggested to provide the direct involvement of these genes on the preferential hypoxic adaptation in clinical strains. We have a future plan to construct knockout mutants of these genes to confirm the involvement of these genes on preferential hypoxic adaptation.”

      Reference

      Tateishi, Y., Nishiyama, A., Ozeki, Y. & Matsumoto, S. Construction of knockout mutants in Mycobacterium intracellulare ATCC13950 strain using a thermosensitive plasmid containing negative selection marker rpsL+. Microbiol Immunol 68, 339-347 (2024).

      (2) Authors should provide the quantitative value of read counts for classifying a gene as "essential" or "non-essential" or "growth-defect" or "growth-advantage". Merely mentioning "no insertions in all or most of their TA sites" or "unusually low read counts" or "unusually high low read counts" is not clear.

      Thank you for the comment on the issue of not providing the quantitative value of read counts for classifying gene essentiality. In this study, we used a Hidden Markov Model (HMM) to predict gene essentiality. The HMM does not classify gene essentiality solely by the number of read counts; instead, it uses a probabilistic model to estimate the state at each TA site based on the read counts and consistency with adjacent sites (Ioerger. Methods Mol Biol 2022).

      The HMM uses consecutive read-count data and calculates transition probabilities for predicting gene essentiality across the genome. The HMM allows for the clustering of insertion sites into distinct regions of essentiality across the entire genome in a statistically rigorous manner, while also allowing for the detection of growth-defect and growth-advantage regions. The HMM can smooth over individual outlier values (such as an isolated insertion in an otherwise empty region, or empty sites scattered among insertions in a non-essential region) and make a call for a region/gene that integrates information over multiple sites. The gene-level calls are made based on the majority call among the TA sites within each gene. The HMM automatically tunes its internal parameters (e.g. transition probabilities) to the characteristics of the input datasets (saturation and mean insertion counts) and can work over a broad range of saturation levels (as low as 20%) (DeJesus. BMC Bioinformatics 2013). Thus, the HMM can represent the more nuanced ways the growth of an organism might be affected by the disruption of its genes (https://orca1.tamu.edu/essentiality/Tn-HMM/index.html).

      Thus, the prediction of gene essentiality by the HMM does not rely on a quantitative threshold of Tn insertion reads applied independently at each TA site; rather, it is the most probable sequence of states for the whole sequence taken together (computed using the Viterbi algorithm). Among statistical methods, the HMM is a standard method for predicting gene essentiality in TnSeq (Ioerger TR. Methods Mol Biol. 2022), as a substantial number of TnSeq studies adopt it (Akusobi. mBio 2025, DeJesus. mBio 2017, Dragset. mSystems 2019, Mendum. BMC Genomics 2019). HMMs are also applied in many other bioinformatics fields, such as profiling functional protein families, identifying functional domains, sequence motif discovery and gene prediction.

      Taken together, we cannot provide a quantitative read-count threshold for classifying gene essentiality, because the HMM does not apply a fixed threshold to read counts but instead models the transitions of read counts across the genome.
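      To illustrate why no single read-count cutoff exists, the following toy sketch decodes a short run of TA-site read counts with the Viterbi algorithm. It is not TRANSIT's implementation: the three states, the geometric emission parameters in `GEOM_P`, the transition probability `STAY`, and the helper `gene_call` are hypothetical values and names chosen only for this example.

```python
import math
from collections import Counter

STATES = ["ES", "GD", "NE"]  # essential, growth-defect, non-essential (toy subset)
GEOM_P = {"ES": 0.9, "GD": 0.2, "NE": 0.02}  # hypothetical geometric parameters
STAY = 0.9  # hypothetical probability of remaining in the same state


def log_emit(state, count):
    """Log-probability of a read count under a geometric distribution."""
    p = GEOM_P[state]
    return math.log(p) + count * math.log(1.0 - p)


def log_trans(prev, nxt):
    """Log transition probability; the leftover mass is split over other states."""
    if prev == nxt:
        return math.log(STAY)
    return math.log((1.0 - STAY) / (len(STATES) - 1))


def viterbi(counts):
    """Most probable state sequence for read counts at consecutive TA sites."""
    score = {s: math.log(1.0 / len(STATES)) + log_emit(s, counts[0]) for s in STATES}
    back = []
    for c in counts[1:]:
        new_score, pointers = {}, {}
        for s in STATES:
            prev = max(STATES, key=lambda r: score[r] + log_trans(r, s))
            new_score[s] = score[prev] + log_trans(prev, s) + log_emit(s, c)
            pointers[s] = prev
        score = new_score
        back.append(pointers)
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]


def gene_call(path, start, end):
    """Gene-level call: majority state among the TA sites in [start, end)."""
    return Counter(path[start:end]).most_common(1)[0][0]
```

      With these toy parameters, an isolated empty site flanked by high counts is smoothed over as non-essential, whereas a run of five empty sites is called essential, so the same read count of zero receives different calls depending on its neighbours.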

      Reference

      Ioerger TR. Analysis of Gene Essentiality from TnSeq Data Using Transit. Methods Mol Biol. 2022 ; 2377: 391–421. doi:10.1007/978-1-0716-1720-5_22.

      DeJesus MA, Ioerger TR (2013) A Hidden Markov Model for identifying essential and growth-defect regions in bacterial genomes from transposon insertion sequencing data. BMC Bioinformatics 14:303 [PubMed: 24103077]

      Website by Ioerger: A Hidden Markov Model for identifying essential and growth-defect regions in bacterial genomes from transposon insertion sequencing data. https://orca1.tamu.edu/essentiality/Tn-HMM/index.html

      Akusobi, C. et al. Transposon-sequencing across multiple Mycobacterium abscessus isolates reveals significant functional genomic diversity among strains. mBio 6, e0337624 (2025).

      DeJesus, M.A. et al. Comprehensive essentiality analysis of the Mycobacterium tuberculosis genome via saturating transposon mutagenesis. mBio 8, e02133-16 (2017).

      Dragset, M.S., et al. Global assessment of Mycobacterium avium subsp. hominissuis genetic requirement for growth and virulence. mSystems 4, e00402-19 (2019).

      Mendum, T.A., et al. Transposon libraries identify novel Mycobacterium bovis BCG genes involved in the dynamic interactions required for BCG to persist during in vivo passage in cattle. BMC Genomics 20, 431 (2019).

      (3) One of the major limitations of this study is the lack of validation of TnSeq results with individual gene knockouts. Authors should mention this in the discussion section.

      Thank you for the comment on the issue of the lack of validation of TnSeq results by using individual knockout mutants. We agree that the lack of validation of TnSeq results is one of the limitations of this study. We have just recently succeeded in constructing the vector plasmids for making knockout mutants of M. intracellulare (Tateishi. Microbiol Immunol. 2024). We will proceed to the validation experiment of TnSeq-hit genes by constructing knockout mutants.

      Following the comment, we have added the description in the Discussion (page 35 lines 622-632 in the revised manuscript) as follows: “Furthermore, one of the limitations of this study is the lack of validation of TnSeq results with individual gene knockouts. Contrary to the case of Mtb, the technique of constructing knockout mutants of slow-growing NTM including M. intracellulare has not been established long time. We have just recently succeeded in constructing the vector plasmids for making knockout mutants of M. intracellulare (Tateishi. Microbiol Immunol 2024). Growth assay of individual knockout strains of genes showing increased genetic requirements such as pckA, glpX, csd, eccC5 and mycP5 in the clinical strains is suggested to provide the direct involvement of these genes on the preferential hypoxic adaptation in clinical strains. We have a future plan to construct knockout mutants of these genes to confirm the involvement of these genes on preferential hypoxic adaptation.”

      Reference

      Tateishi, Y., Nishiyama, A., Ozeki, Y. & Matsumoto, S. Construction of knockout mutants in Mycobacterium intracellulare ATCC13950 strain using a thermosensitive plasmid containing negative selection marker rpsL+. Microbiol Immunol 68, 339-347 (2024).

      Reviewer #5 (Public review):

      Summary:

      In the research article, "Functional genomics reveals strain-specific genetic requirements conferring hypoxic growth in Mycobacterium intracellulare", Tateishi et al. focussed their research on pulmonary disease caused by Mycobacterium avium-intracellulare complex, which has recently become a major health concern. The authors were interested in identifying the genetic requirements necessary for growth/survival within the host and used hypoxia and biofilm conditions that partly replicate some of the stress conditions experienced by bacteria in vivo. An important finding of this analysis was the observation that genes involved in gluconeogenesis, type VII secretion system and cysteine desulphurase were crucial for the clinical isolates during standard culture while the same were necessary during hypoxia in the ATCC type strain.

      Strength of the study:

      Transposon mutagenesis has been a powerful genetic tool to identify essential genes/pathways necessary for bacteria under various in vitro stress conditions and for in vivo survival. The authors extended the TnSeq methodology not only to the ATCC strain but also to recent clinical isolates to identify the differences between the two categories of bacterial strains. Using this approach they dissected the similarities and differences in the genetic requirement for bacterial survival between ATCC type strains and clinical isolates. They observed that the clinical strains performed much better in terms of growth during hypoxia than the type strain. These in vitro findings were further extended to mouse infection models and similar outcomes were observed in vivo, further emphasising the relevance of hypoxic adaptation crucial for the clinical strains, which could be explored as potential drug targets.

      Thank you for the comment on the significance of our manuscript on the basic research of non-tuberculous mycobacteria.

      Weakness:

      The authors have performed extensive TnSeq analysis but fail to present the data coherently. The data could have been better presented both in Figures and text. In my view this is one of the major weaknesses of the study.

      Thank you for the comment on the issue of data presentation. Our point-by-point response to the Reviewer’s comments is shown below.

      Reviewer #5 (Recommendations for the authors):

      Major comments:

      (1) The result section could have been better organized by splitting into multiple sections with each section focusing on a particular aspect.

      Thank you for the comment on the organization of the section. We have split into multiple sections with each section focusing on a particular aspect as follows:

      (1) Common essential and growth-defect-associated genes representing the genomic diversity of M. intracellulare strains (page 6 lines 102-103 in the revised manuscript)

      (2) The sharing of strain-dependent and accessory essential and growth-defect-associated genes with genes required for hypoxic pellicle formation in the type strain ATCC13950 (page 8 lines 129-131 in the revised manuscript)

      (3) Partial overlap of the genes showing increased genetic requirements in clinical MAC-PD strains with those required for hypoxic pellicle formation in the type strain ATCC13950 (page 9 lines 151-153 in the revised manuscript)

      (4) Minor role of gene duplication on reduced genetic requirements in clinical MAC-PD strains (page 11 lines 184-185 in the revised manuscript)

      (5) Identification of genes in the clinical MAC-PD strains required for mouse lung infection (page 12 lines 210-211 in the revised manuscript)

      (6) Effects of knockdown of universal essential or growth-defect-associated genes in clinical MAC-PD strains (page 17 lines 305-306 in the revised manuscript)

      (7) Differential effects of knockdown of accessory/strain-dependent essential or growth-defect-associated genes among clinical MAC-PD strains (page 19 lines 325-326 in the revised manuscript)

      (8) Preferential hypoxic adaptation of clinical MAC-PD strains evaluated with bacterial growth kinetics (page 21 lines 365-366 in the revised manuscript)

      (9) The pattern of hypoxic adaptation not simply determined by genotypes (page 22 line 386 in the revised manuscript)

      (2) The different strains that were used in the study, how they were isolated and some information on their genotypes could have been mentioned in brief in the main text and a table of different strains included as a supplementary table.

      Thank you for the comment on the information on the clinically isolated strains used in this study. All clinical strains were isolated from sputum of MAC-PD patients (Tateishi. BMC Microbiol. 2021, BMC Microbiol. 2023). Sputum samples were treated by the standard method for clinical isolation of mycobacteria with 0.5% (w/v) N-acetyl-L-cysteine and 2% (w/v) sodium hydroxide and plated on 7H10/OADC agar plates. Single colonies were picked for use in experiments as isolated strains.

      Following the comment, we have added the description on the information of the strains (page 37 lines 652-660 in the revised manuscript). “All eleven clinical strains from MAC-PD patients in Japan were isolated from sputum (Tateishi. BMC Microbiol 2021, BMC Microbiol 2023). Sputum samples were treated by the standard method for clinical isolation of mycobacteria with 0.5% (w/v) N-acetyl-L-cysteine and 2% (w/v) sodium hydroxide and plated on 7H10/OADC agar. Single colonies were picked up for use in experiments as isolated strains. Of these strains, ATCC13950, M.i.198, M.i.27, M018, M005 and M016 belong to the typical M. intracellulare (TMI) genotype and M001, M003, M019, M021 and MOTT64 belong to the M. paraintracellulare-M. indicus pranii (MP-MIP) genotype (Fig. 1, new Supplementary Table 1)”

      Moreover, we have added the Supplementary Table showing the information on genotypes of each strain and the purpose of the use of study strains as new Supplementary Table 1

      References

      Tateishi, Y. et al. Comparative genomic analysis of Mycobacterium intracellulare: implications for clinical taxonomic classification in pulmonary Mycobacterium avium-intracellulare complex disease. BMC Microbiol 21, 103 (2021).

      Tateishi, Y. et al. Virulence of Mycobacterium intracellulare clinical strains in a mouse model of lung infection - role of neutrophilic inflammation in disease severity. BMC Microbiol 23, 94 (2023).

      (3) As stated by the previous reviews, an explanation for the variation in the Tn insertion across different strains has not been provided and how they derive conclusions when the Tn frequency was not saturating.

      Thank you for the comment on how we predict gene essentiality from our TnSeq data given the variation in Tn insertion reads and the suboptimal (not fully saturated) levels of Tn insertion.

      To address the variation in Tn insertion, we normalized the data using Beta-Geometric correction (BGC), a non-linear normalization method. BGC normalizes the datasets to fit an “ideal” geometric distribution with a variable probability parameter ρ, and improves resampling by reducing the skew. In TRANSIT, we set the replicate option to Sum to combine read counts, normalized the datasets by BGC to reduce variability, and performed resampling analysis on the normalized datasets to compare the genetic requirements between strains.

      Following the comment, we have explained the variation in the Tn insertion across different strains in the manuscript (pages 39-40, lines 700-708 in the revised manuscript). “The number of Tn insertion in our datasets varied between 1.3 to 5.8 million among strains. To reduce the variation in the Tn insertion across strains, we adopt a non-linear normalization method, Beta-Geometric correction (BGC). BGC normalizes the datasets to fit an “ideal” geometric distribution with a variable probability parameter ρ, and BGC improves resampling by reducing the skew. On TRANSIT software, we set the replicate option as Sum to combine read counts. And we normalized the datasets by BGC and performed resampling analysis by using normalized datasets to compare the genetic requirements between strains.”

      As for the issue of saturation levels of Tn insertion in our Tn mutant libraries, we made a description in the Discussion in the 1st version of the revised manuscript (pages 33-35 lines 592-613 in the 2nd version of the revised manuscript). By combining replicates, the saturation of our Tn mutant libraries was 62-79% as follows: ATCC13950: 67.6%, M001: 72.9%, M003: 63.0%, M018: 62.4%, M019: 74.5%, M.i.27: 76.6%, M.i.198: 68.0%, MOTT64: 77.6%, M021: 79.9%. That is, we calculated gene essentiality from the Tn mutant libraries with 62-79% saturation in each strain. The levels of saturation of transposon libraries in our study are similar to the very recent TnSeq analysis by Akusobi where 52-80% saturation libraries (so-called “high-density” transposon libraries) are used for HMM and resampling analyses (Supplemental Methods Table 1 [merged saturation] in Akusobi. mBio. 2025). The saturation of Tn insertion in individual replicates of our libraries is also comparable to that reported by DeJesus (Table S1 in mBio 2017). Thus, we consider that our TnSeq method of identifying essential genes and detecting the difference of genetic requirements between clinical MAC-PD strains and ATCC13950 is acceptable.
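      As a small illustration of how combining replicates raises saturation, the sketch below assumes the usual definition of saturation as the fraction of TA sites carrying at least one insertion, and sums read counts across replicates per site. The function names and the toy counts are hypothetical and unrelated to the study's data.

```python
def combine_replicates(replicates):
    """Sum read counts across replicates at each TA site."""
    return [sum(site_counts) for site_counts in zip(*replicates)]


def saturation(counts):
    """Fraction of TA sites with at least one insertion read."""
    return sum(1 for c in counts if c > 0) / len(counts)


# Toy read counts at eight TA sites in two replicates (made-up numbers).
rep1 = [0, 5, 0, 0, 12, 0, 3, 0]
rep2 = [2, 0, 0, 0, 7, 0, 0, 1]
merged = combine_replicates([rep1, rep2])
```

      In this toy case each replicate alone covers 3 of 8 sites, while the merged library covers 5 of 8, because the replicates hit partly different sites.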

      As for the identification of essential or growth-defect-associated genes by an HMM analysis, we do not consider that we made a serious mistake in classifying essentiality by the HMM method for most of the structural genes that encode proteins. This is because, as DeJesus shows, the number of essential genes identified by TnSeq is comparable between 2 and 14 TnSeq datasets for large genes possessing more than 10 TA sites, most of which seem to be structural genes (Supplementary Fig 2 in mBio 2017). If the reviewer regards our libraries as far less saturated because we used fewer replicates (n = 2 or 3) than the previous reports by DeJesus and Rifat, which used 10-14 replicates to obtain so-called “high-density” transposon libraries (DeJesus. mBio 2017, Rifat. mBio 2021), there is a possibility that not all genes could be detected as essential owing to incomplete coverage of Tn insertion at nonpermissive TA sites, especially in small genes, including small regulatory RNAs. Even if this were the case, it would not detract from the findings of our current study.

      As for the identification of genetic requirements by a resampling analysis, we consider that our data is acceptable because we compared the normalized data between strains whose saturation levels are similar to the previous report by Akusobi with “high-density” transposon libraries as mentioned above.

      References

      DeJesus, M.A., Ambadipudi, C., Baker, R., Sassetti, C. & Ioerger, T.R. TRANSIT--A software tool for Himar1 TnSeq analysis. PLoS Comput Biol 11, e1004401 (2015).

      Akusobi, C. et al. Transposon-sequencing across multiple Mycobacterium abscessus isolates reveals significant functional genomic diversity among strains. mBio 6, e0337624 (2025).

      DeJesus, M.A. et al. Comprehensive essentiality analysis of the Mycobacterium tuberculosis genome via saturating transposon mutagenesis. mBio 8, e02133-16 (2017).

      Rifat, D., Chen, L., Kreiswirth, B.N. & Nuermberger, E.L. Genome-wide essentiality analysis of Mycobacterium abscessus by saturated transposon mutagenesis and deep sequencing. mBio 12, e0104921 (2021).

      (4) ATCC strain is missing in the mouse experiment.

      Thank you for the comment on the necessity of setting ATCC13950 as a control strain in the mouse TnSeq experiment. Setting ATCC13950 as a control strain in mouse infection experiments would be ideal. However, we have shown that ATCC13950 is eliminated within 4 weeks of infection in mice (Tateishi. BMC Microbiol 2023). To perform TnSeq, it is mathematically necessary to collect at least as many colonies as there are TA sites (realistically, more colonies than the number of TA sites are needed to produce biologically robust data). That means it is impossible to perform an in vivo TnSeq study using ATCC13950, because a sufficient number of colonies cannot be harvested.

      To make this clear, we have added a description of our inability to perform in vivo TnSeq with ATCC13950 to the Results section (page 13 lines 221-222 in the revised manuscript).

      “(It is impossible to perform TnSeq in lungs infected with ATCC13950 because ATCC13950 is eliminated within 4 weeks of infection) (Tateishi. BMC Microbiol 2023)”

      Reference

      Tateishi, Y. et al. Virulence of Mycobacterium intracellulare clinical strains in a mouse model of lung infection - role of neutrophilic inflammation in disease severity. BMC Microbiol 23, 94 (2023).

      (5) The viability assays done in 96 well plate may not be appropriate given that mycobacterial cultures often clump without vigorous shaking. How did they control evaporation for 10 days and above?

      Thank you for the comment on the issue of the viability assay in terms of bacterial clumping. As described in the Methods (page 44 lines 778-781 in the revised manuscript), we mixed the 250 μL culture by pipetting 40 times to loosen clumps every time before sampling 4 μL for inoculation on agar plates to count CFUs. With this method, we did not observe macroscopic clumping or pellicles like those of Mtb or M. bovis BCG seen in static culture.

      We used the inner wells for bacterial culture in the hypoxic growth assay. To control evaporation of the culture, we filled the outer wells with distilled water and covered the plates with plastic lids. We cultured the plates with humidification at 37°C in the incubator.

      (6) Fig. 7a many time points have only two data points and in few cases. The Y axis could have been kept same for better comparison for all strains and conditions.

      Thank you for the comments on the data presentation of the hypoxic growth assay in original Fig. 7a (new Fig 8a). Many time points appear to have only two data points because the values of individual replicates are very close. For example, the log10-transformed values of CFUs in ATCC13950 under aerobic culture are 4.716, 4.653, 4.698 at day 5; 4.949, 5.056, 4.954 at day 6; and 5.161, 5.190, 5.204 at day 8. The data therefore derive from three independent replicates. We have added the numerical data of CFUs used for drawing the growth curves as new Supplementary Table 19.
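      How close the replicates are can be checked directly from the values quoted above. The following is a minimal sketch using only those nine numbers; the summary statistics chosen (mean, sample standard deviation, range) are our own illustration and are not stated to be the ones used in the manuscript.

```python
from statistics import mean, stdev

# log10-transformed CFU values for ATCC13950 under aerobic culture,
# as quoted in the response (three replicates per day).
log10_cfu = {
    5: [4.716, 4.653, 4.698],
    6: [4.949, 5.056, 4.954],
    8: [5.161, 5.190, 5.204],
}


def summarize(values):
    """Mean, sample standard deviation and range of one time point."""
    return {
        "mean": mean(values),
        "sd": stdev(values),
        "range": max(values) - min(values),
    }


summary = {day: summarize(values) for day, values in log10_cfu.items()}
for day, stats in summary.items():
    print(f"day {day}: mean={stats['mean']:.3f} "
          f"sd={stats['sd']:.3f} range={stats['range']:.3f}")
```

      The replicates at each of these time points span at most about 0.11 log10 units, so on an axis covering several log10 units the markers can easily overlap.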

      Following the comment, we have revised the data presentation in new Fig 8a (original Fig. 7a) by keeping the same maximal value of the Y axis across all graphs. In addition, we have revised the legend to state clearly how we obtained the growth curve data, as follows (page 63 lines 1107-1108 in the revised manuscript): “Data on the growth curves are the means of three biological replicates from one experiment. Data from one experiment representative of three independent experiments (N = 3) are shown.”

      (7) The relevance of 7b is not well discussed and a suitable explanation for the difference in the profiles of M001 and MOTT64 between aerobic and hypoxia is not provided. Data representation should be improved for 7c with appropriate spacing.

      Thank you for the comments on the relevance of original Fig. 7b (new Fig. 8b). In order to compare the pattern of logarithmic growth curves between strains quantitatively, we focused on time and slope at midpoint. The time at midpoint is the timing of entry to logarithmic growth phase. The earlier the strain enters logarithmic phase, the smaller the value of the time at midpoint becomes.
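      The two quantities named above, time at midpoint and slope at midpoint, can be illustrated with a short sketch. It assumes a four-parameter logistic curve for log10 CFU versus time; the response does not state which model was fitted, so the functional form, the parameter names and the numbers below are all hypothetical.

```python
import math


def logistic(t, bottom, top, t_mid, rate):
    """Four-parameter logistic curve (assumed model) for log10 CFU versus time."""
    return bottom + (top - bottom) / (1.0 + math.exp(-rate * (t - t_mid)))


def slope_at_midpoint(bottom, top, rate):
    """Analytical derivative of the logistic curve evaluated at t = t_mid."""
    return (top - bottom) * rate / 4.0


# Hypothetical parameters for two strains (not taken from the study).
early_strain = dict(bottom=4.0, top=8.0, t_mid=6.0, rate=0.9)
late_strain = dict(bottom=4.0, top=8.0, t_mid=11.0, rate=0.5)

for name, p in (("early", early_strain), ("late", late_strain)):
    slope = slope_at_midpoint(p["bottom"], p["top"], p["rate"])
    print(f"{name}: time at midpoint={p['t_mid']} days, slope at midpoint={slope:.3f}")
```

      Under this assumed model, a smaller time at midpoint corresponds to earlier entry into logarithmic growth, and a smaller slope at midpoint corresponds to slower growth at that point.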

      The two strains belonging to the MP-MIP subgroup, MOTT64 and M001 showed similar time at midpoint under aerobic conditions. However, the time at midpoint was significantly different between MOTT64 and M001 under hypoxia, the latter showing great delay of timing of entry to logarithmic phase. In contrast to the majority of the clinical strains that showed reduced growth rate at midpoint under hypoxia, neither strain showed such phenomenon under hypoxia. Although the implication in clinical situations has not been proven, strains without slow growth under hypoxia may have different (possibly strain-specific) mechanisms of hypoxic adaptation corresponding to the growth phenotypes under hypoxia.

      Following the comment, we have added the explanation on the difference in the profiles of M001 and MOTT64 between aerobic and hypoxia in the Discussion (page 31 lines 552-557, page 32 lines 562-567 in the revised manuscript). “The two strains belonging to the MP-MIP subgroup, MOTT64 and M001 showed similar time at midpoint under aerobic conditions. However, the time at midpoint was significantly different between MOTT64 and M001 under hypoxia, the latter showing great delay of timing of entry to logarithmic phase. In contrast to the majority of the clinical strains that showed slow growth at midpoint under hypoxia, neither strain showed such phenomenon.”.

      “Our inability to construct knockdown strains in M001 and MOTT64 prevented us from clarifying the factors that discriminate against the pattern of hypoxic adaptation. Although the implication in clinical situations has not been proven, strains without slow growth under hypoxia may have different (possibly strain-specific) mechanisms of hypoxic adaptation corresponding to the growth phenotypes under hypoxia.”

      Following the comment, we have added space between new Fig. 8b and new Fig. 8c (original Fig. 7b and Fig. 7c).

      (8) Fig. 8a, the antibiotic sensitivity at early and later time points do not seem to correlate. Any explanation?

      Thank you for the comment on the lack of correlation between early and later time points in the growth inhibition data for knockdown strains of universal essential genes. The diminished growth inhibition observed at Day 7 in knockdown strains may be due to “escape” clones that arise under long-term culture with anhydrotetracycline (aTc), which induces sgRNA. As described in the Methods (pages 42-43 lines 754-758), we added aTc repeatedly every 48 h to maintain the induction of dCas9 and sgRNAs in experiments that extended beyond 48 h (Singh. Nucl Acid Res 2016). Such a phenomenon has been reported by McNeil (Antimicrob Agent Chem. 2019), who showed an increase in CFUs by day 9 with 100 ng/mL aTc, with bacterial growth being detected between 2 and 3 weeks. These “escape” mutant phenotypes are considered to be attributable to the promoter responsiveness to aTc.

      Nevertheless, except for gyrB in M.i.27, the comparative growth rates of knockdown strains of universal essential genes to vector control strains at Day 7 were 10<sup>-1</sup> or less (y-axis of original Fig. 8). In this study, we judged growth inhibition as positive when the comparative growth rate of a knockdown strain to its vector control strain was 10<sup>-1</sup> or less (y-axis of new Fig. 7). Thus, we consider that the CRISPR-i data overall validated the essentiality of these genes.

      References

      Singh, A.K. et al. Investigating essential gene function in Mycobacterium tuberculosis using an efficient CRISPR interference system. Nucl Acid Res 44, e143 (2016).

      McNeil, M.B. & Cook, G.M. Utilization of CRISPR interference to validate MmpL3 as a drug target in Mycobacterium tuberculosis. Antimicrob Agent Chem 63, e00629-19 (2019).

      (9) Fig. 8b and c very data representation could have been improved. Some strains used in 7 are missing. The authors refer to technical challenge with respect to M001. Is it the same for others as well (MOTT64). The interpretation of data in result and discussion section is difficult to follow. Is the data subjected to statistical analysis?

      Thank you for the comment on data presentation in original Fig. 8b (new Fig 7b). As mentioned in the Discussion (page 18 lines 316-31 in the revised manuscript), M001 and MOTT64 are missing from the CRISPR-i experiment in original Fig. 7 (new Fig. 8) because we were unable to construct knockdown strains in M001 and MOTT64. We consider that the same technical challenges apply to both M001 and MOTT64.

      Following the comment, we have added the explanation of the technical challenge with respect to M001 and MOTT64 in the Discussion (page 32 lines 561-566 in the revised manuscript). “Our inability to construct knockdown strains in M001 and MOTT64 prevented us from clarifying the factors that discriminate against the pattern of hypoxic adaptation. Although the implication in clinical situations has not been proven, strains without slow growth under hypoxia may have different (possibly strain-specific) mechanisms of hypoxic adaptation corresponding to the growth phenotypes under hypoxia.”

      As for the interpretation of growth suppression in the knockdown experiments described in original Fig. 8 (new Fig. 7), we judged growth inhibition as positive when the comparative growth rate of a knockdown strain to its vector control strain was 10<sup>-1</sup> or less (y-axis of new Fig. 7). We interpreted the results based on whether the level of growth inhibition was positive or not (i.e. whether the comparative growth rates of knockdown strains to vector control strains fell to 10<sup>-1</sup> or below). Since our aim was to investigate whether knockdown of the target genes in each strain leads to growth inhibition, we did not perform statistical analysis between strains or target genes.
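      The decision rule described above can be written as a simple threshold test. This is a minimal sketch that assumes the comparative growth rate is the ratio of knockdown growth to vector-control growth and that inhibition is called positive at 10<sup>-1</sup> (0.1) or below; the function name and the CFU values are hypothetical.

```python
INHIBITION_THRESHOLD = 10 ** -1  # ratio of knockdown growth to vector-control growth


def is_growth_inhibited(knockdown_growth, control_growth,
                        threshold=INHIBITION_THRESHOLD):
    """Return True when the knockdown/control ratio is at or below the threshold."""
    if control_growth <= 0:
        raise ValueError("control growth must be positive")
    return knockdown_growth / control_growth <= threshold


# Hypothetical CFU counts for two knockdown strains and their vector controls.
strong_knockdown = is_growth_inhibited(2e5, 5e6)  # ratio 0.04
weak_knockdown = is_growth_inhibited(3e6, 5e6)    # ratio 0.6

print(strong_knockdown, weak_knockdown)
```

      Because each call is a yes/no comparison against a fixed cut-off, the rule answers whether knockdown inhibits growth in a given strain and does not by itself compare strains or target genes.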

      The major weakness of the study is the organization and data representation. It became very difficult to connect the role of gluconeogenesis, secretion system and others identified by authors to hypoxia, pellicle formation. The authors may consider rephrasing the results and discussion sections.

      Thank you for the comments on the issue of organization and data presentation. Following the comment, we have revised the manuscript to indicate the relevance of the role of gluconeogenesis, secretion system and others defined by us more clearly (page 23 lines 404-408 in the revised manuscript).

      “Because the profiles of genetic requirements reflect the adaptation to the environment in which bacteria habits, it is reasonable to assume that the increase of genetic requirements in hypoxia-related genes such as gluconeogenesis (pckA, glpX), type VII secretion system (mycP5, eccC5) and cysteine desulfurase (csd) play an important role on the growth under hypoxia-relevant conditions in vivo.”

      Following the comments, we have exchanged the order of data presentation as follows: in vitro TnSeq (pages 6-12 lines 102-208 in the revised manuscript), Mouse TnSeq (pages 12-17 lines 210-303 in the revised manuscript), Knockdown experiment (pages 17-21 lines 305-363 in the revised manuscript), Hypoxic growth assay (pages 21-23 lines 365-408 in the revised manuscript).

      In association with the exchange of the order of data presentation, we have changed the order of the contents of the Discussion as follows: Preferential carbohydrate metabolism under hypoxia such as pckA and glpX (pages 24-26 lines 424-466 in the revised manuscript), Cysteine desulfurase gene (csd) (pages 26-27 lines 467-482 in the revised manuscript), Conditional essential genes in vivo such as type VII secretion system (pages 27-28 lines 483-497 in the revised manuscript), Knockdown experiment (pages 28-30 lines 498-536 in the revised manuscript), Hypoxic growth pattern (pages 30-32 lines 537-571 in the revised manuscript), Failure of assay using PckA inhibitors (pages 32-33 lines 572-578 in the revised manuscript), Transformation efficiencies (page 33 lines 579-591 in the revised manuscript), Saturation of Tn insertion (pages 33-35 lines 592-613 in the revised manuscript), Suggested future experiment plan (pages 35-36 lines 614-632 in the revised manuscript).

    1. eLife Assessment

      This work offers important insights into the protein CHD4's function in chromatin remodeling and gene regulation in embryonic stem cells, supported by extensive biochemical, genomic, and imaging data. The use of an inducible degron system allows precise functional analysis, and the datasets generated represent a key resource for the field. While some interpretations of complex data could be more strongly substantiated, the study overall provides compelling evidence and makes a significant contribution to understanding CHD4's role in epigenetic regulation. This work will be of interest to the epigenetics and stem biology fields.

    2. Reviewer #1 (Public review):

      Summary:

      The authors performed an elegant investigation to clarify the roles of CHD4 in chromatin accessibility and transcription regulation. In addition to the common mechanisms of action through nucleosome repositioning and opening of transcriptionally active regions, the authors considered here a new angle of CHD4 action through modulating the off-rate of transcription factor binding. Their suggested scenario is that the action of CHD4 is context-dependent and is different for highly-active regions vs low-accessibility regions.

      Strengths:

      This is a very well-written paper that will be of interest to researchers working in this field. The authors performed a large amount of work with different types of NGS experiments and the corresponding computational analyses. The combination of biophysical measurements of the off-rate of protein-DNA binding with NGS experiments is particularly commendable.

      Weaknesses:

      This is a very strong paper. I have only very minor suggestions to improve the presentation:

      (1) It might be good to further discuss potential molecular mechanisms for increasing the TF off rate (what happens at the mechanistic level).

      (2) To improve readability, it would be good to make consistent font sizes on all figures to make sure that the smallest font sizes are readable.

      (3) upDARs and downDARs - these abbreviations are defined in the figure legend but not in the main text.

      (4) Figure 3B - the on-figure legend is a bit unclear; the text legend does not mention the meaning of "DEG".

      (5) The values of apparent dissociation rates shown in Figure 5 are a bit different from values previously reported in literature (e.g., see Okamoto et al., 2023, PMC10505915). Perhaps the authors could comment on this. Also, it would be helpful to add the actual equation that was used for the curve fitting to determine these values to the Methods section.

      (6) Regarding the discussion about the functionality of low-affinity sites/low accessibility regions, the authors may wish to mention the recent debates on this (https://www.nature.com/articles/s41586-025-08916-0; https://www.biorxiv.org/content/10.1101/2025.10.12.681120v1).

      (7) It may be worth expanding figure legends a bit, because the definitions of some of the terms mentioned on the figures are not very easy to find in the text.

    3. Reviewer #2 (Public review):

      This study leverages acute protein degradation of CHD4 to define its role in chromatin and gene regulation. Previous studies have relied on KO and/or RNA interference of this essential protein and, as such, are hampered by adaptation, cell population heterogeneity, cell proliferation, and indirect effects. The authors have established an AID2-based method to rapidly deplete the dMi-2 remodeller to circumvent these problems. CHD4 is gone within an hour, well before any effects on cell cycle or cell viability can manifest. This represents an important technical advance that, for the first time, allows a comprehensive analysis of the immediate and direct effect of CHD4 loss of function on chromatin structure and gene regulation.

      Rapid CHD4 degradation is combined with ATAC-seq, CUT&RUN, (nascent) RNA-seq, and single-molecule microscopy to comprehensively characterise the impact on chromatin accessibility, histone modification, transcription, and transcription factor (NANOG, SOX2, KLF4) binding in mouse ES cells.

      The data support the previously developed model that high levels of CHD4/NuRD maintain a degree of nucleosome density to limit TF binding at open regulatory regions (e.g., enhancers). The authors propose that CHD4 activity at these sites is an important prerequisite for enhancers to respond to novel signals that require an expanded or new set of TFs to bind.

      What I find even more exciting and entirely novel is the finding that CHD4 removes TFs from regions of limited accessibility to repress cryptic enhancers and to suppress spurious transcription. These regions are characterised by low CHD4 binding and have so far never been thoroughly analysed. The authors correctly point out that the general assumption that chromatin regulators act on regions where they seem to be concentrated (i.e., have high ChIP-seq signals) runs the risk of overlooking important functions elsewhere. This insight is highly relevant beyond the CHD4 field and will prompt other chromatin researchers to look into low-level binding sites of chromatin regulators.

      The biochemical and genomic data presented in this study are of high quality (I cannot judge single microscopy experiments due to my lack of expertise). This is an important and timely study that is of great interest to the chromatin field.

      I have a number of comments that the authors might want to consider to improve the manuscript further:

      (1) Figure 2 shows heat maps of RNA-seq results following a time course of CHD4 depletion (0, 1, 2 hours...). Usually, the red/blue colour scale is used to visualise differential expression (fold-difference). Here, genes are coloured in red or blue even at the 0-hour time point. This confused me initially until I discovered that instead of fold-difference, a z-score is plotted. I do not quite understand what it means when a gene that is coloured blue at the 0-hour time point changes to red at a later time point. Does this always represent an upregulation? I think this figure requires a better explanation.

      (2) Figure 5D: NANOG, SOX2 binding at the KLF4 locus. The authors state that the enhancers 68, 57, and 55 show a gain in NANOG and SOX2 enrichment "from 30 minutes of CHD4 depletion". This is not obvious to me from looking at the figure. I can see an increase in signal from "WT" (I am assuming this corresponds to the 0 hours time point) to "30m", but then the signals seem to go down again towards the 4h time point. Can this be quantified? Can the authors discuss why TF binding seems to increase only temporarily (if this is the case)?

      (3) There is no real discussion of HOW CHD4/NuRD counteracts TF binding (i.e. by what molecular mechanism). I understand that the data does not really inform us on this. Still, I believe it would be worthwhile for the authors to discuss some ideas, e.g., local nucleosome sliding vs. a direct (ATP-dependent?) action on the TF itself.

    4. Reviewer #3 (Public review):

      Summary:

      In this manuscript, an inducible degron approach is taken to investigate the function of the CHD4 chromatin remodelling complex. The cell lines and approaches used are well thought out, and the data appear to be of high quality. They show that loss of CHD4 results in rapid changes to chromatin accessibility at thousands of sites. Of these, the locations at which chromatin accessibility is decreased are strongly bound by CHD4 prior to activation of the degron, and so likely represent primary sites of action. Somewhat surprisingly, while chromatin accessibility is reduced at these sites, transcription factor occupancy is little changed. Following CHD4 degradation, occupancy of the key pluripotency transcription factors NANOG and SOX2 increases at many locations genome-wide, and at many of these sites, chromatin accessibility increases. These represent important new insights into the function of CHD4 complexes.

      Strengths:

      The experimental approach is well-suited to providing insight into a complex regulator such as CHD4. The data generated to characterise how cells respond to the loss of CHD4 is of high quality. The study reveals major changes in transcription factor occupancy following CHD4 depletion.

      Weaknesses:

      The main weakness can be summarised as relating to the fact that authors interpret all rapid changes following CHD4 degradation as being a direct effect of the loss of CHD4 activity. The possibility that rapid indirect effects arise does not appear to have been given sufficient consideration. This is especially pertinent where effects are reported at sites where CHD4 occupancy is initially low.

    5. Author response:

      Reviewer #1 (Public review):

      (1) It might be good to further discuss potential molecular mechanisms for increasing the TF off rate (what happens at the mechanistic level). 

      This is now expanded in the Discussion

      (2) To improve readability, it would be good to make consistent font sizes on all figures to make sure that the smallest font sizes are readable. 

      We have normalised figure text as much as is feasible.

      (3) upDARs and downDARs - these abbreviations are defined in the figure legend but not in the main text. 

      We have removed references to these terms from the text and included a definition in the figure legend. 

      (4) Figure 3B - the on-figure legend is a bit unclear; the text legend does not mention the meaning of "DEG". 

      We have removed this panel as it was confusing and did not demonstrate any robust conclusion. 

      (5) The values of apparent dissociation rates shown in Figure 5 are a bit different from values previously reported in literature (e.g., see Okamoto et al., 2023, PMC10505915). Perhaps the authors could comment on this. Also, it would be helpful to add the actual equation that was used for the curve fitting to determine these values to the Methods section.

      We have included an explanation of the curve fitting equation in the Methods as suggested.

      The apparent dissociation rate observed is a sum of multiple rates of decay: the true dissociation rate (k<sub>off</sub>), signal loss caused by photobleaching (k<sub>pb</sub>), and signal loss caused by defocusing/tracking error (k<sub>tl</sub>).

      k<sub>off</sub><sup>app</sup> = k<sub>off</sub> + k<sub>pb</sub> + k<sub>tl</sub>

      We are making conclusions about relative changes in k<sub>off</sub><sup>app</sup> upon CHD4 depletion, not about the absolute magnitude of true k<sub>off</sub> or TF residence times. Our conclusions extend to true k<sub>off</sub> based on the assumption that k<sub>pb</sub> and k<sub>tl</sub> are equal across all samples imaged due to identical experimental conditions and analysis.

      k<sub>pb</sub> and k<sub>tl</sub> vary hugely across experimental set-ups, especially with different laser powers, so other k<sub>off</sub> or k<sub>off</sub><sup>app</sup> values reported in the literature would be expected to differ from ours. Time-lapse experiments or independent determination of k<sub>pb</sub> (and k<sub>tl</sub>) would be required to make any statements about absolute values of k<sub>off</sub>.
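      The additive relation above can be made concrete with a small numerical sketch. All rate values are hypothetical and chosen only for illustration; the sketch assumes, as the response does, that the photobleaching and tracking-loss terms are identical between the two conditions.

```python
def apparent_off_rate(k_off, k_pb, k_tl):
    """Apparent dissociation rate as the sum of true dissociation,
    photobleaching and tracking-loss rates."""
    return k_off + k_pb + k_tl


# Hypothetical rates in s^-1; k_pb and k_tl are assumed equal across samples.
k_pb, k_tl = 0.20, 0.05
k_off_control, k_off_depleted = 0.10, 0.06

app_control = apparent_off_rate(k_off_control, k_pb, k_tl)
app_depleted = apparent_off_rate(k_off_depleted, k_pb, k_tl)

# The shared terms cancel in the difference but not in the ratio.
delta_apparent = app_depleted - app_control
delta_true = k_off_depleted - k_off_control
ratio_apparent = app_depleted / app_control
ratio_true = k_off_depleted / k_off_control

print(delta_apparent, delta_true, ratio_apparent, ratio_true)
```

      In this toy case the direction and absolute size of the change in the apparent rate match those of the true rate, while the fold-change does not, which is consistent with restricting conclusions to relative changes and not to absolute values.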

      (6) Regarding the discussion about the functionality of low-affinity sites/low accessibility regions, the authors may wish to mention the recent debates on this (https://www.nature.com/articles/s41586-025-08916-0; https://www.biorxiv.org/content/10.1101/2025.10.12.681120v1). 

      We have now included a discussion of this point and referenced both papers.

      (7) It may be worth expanding figure legends a bit, because the definitions of some of the terms mentioned on the figures are not very easy to find in the text. 

      We have endeavoured to define all relevant terms in the figure legends. 

      Reviewer #2 (Public review): 

      (1) Figure 2 shows heat maps of RNA-seq results following a time course of CHD4 depletion (0, 1, 2 hours...). Usually, the red/blue colour scale is used to visualise differential expression (fold-difference). Here, genes are coloured in red or blue even at the 0-hour time point. This confused me initially until I discovered that instead of fold-difference, a z-score is plotted. I do not quite understand what it means when a gene that is coloured blue at the 0-hour time point changes to red at a later time point. Does this always represent an upregulation? I think this figure requires a better explanation.

      The heatmap displays z-scores, meaning expression for each gene has been centred and scaled across the entire time course. As a result, time zero is not a true baseline, it simply shows whether the gene’s expression at that moment is above or below its own mean. A transition from blue to red therefore indicates that the gene increases relative to its overall average, which typically corresponds to upregulation, but it doesn’t directly represent fold-change from the 0-hour time point. We have now included a brief explanation of this in the figure legend to make this point clear.  
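      The point about z-scores can be shown with a minimal per-gene example. The expression values are hypothetical, and the use of the population standard deviation is an assumption, since the response does not specify how scaling was done.

```python
from statistics import mean, pstdev


def row_zscores(values):
    """Centre and scale one gene's expression across the whole time course."""
    mu = mean(values)
    sigma = pstdev(values)  # population SD; an assumption for this sketch
    if sigma == 0:
        return [0.0 for _ in values]
    return [(value - mu) / sigma for value in values]


# Hypothetical expression of one gene at 0, 1, 2 and 4 h after depletion.
expression = [10.0, 12.0, 14.0, 20.0]
z = row_zscores(expression)

for hours, score in zip((0, 1, 2, 4), z):
    print(f"{hours} h: z = {score:+.2f}")
```

      A steadily increasing gene is negative (blue) at 0 h and positive (red) at the last time point simply because its mean lies in between, so the colour at 0 h is not a baseline.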

      (2) Figure 5D: NANOG, SOX2 binding at the KLF4 locus. The authors state that the enhancers 68, 57, and 55 show a gain in NANOG and SOX2 enrichment "from 30 minutes of CHD4 depletion". This is not obvious to me from looking at the figure. I can see an increase in signal from "WT" (I am assuming this corresponds to the 0 hours time point) to "30m", but then the signals seem to go down again towards the 4h time point. Can this be quantified? Can the authors discuss why TF binding seems to increase only temporarily (if this is the case)? 

      We have edited the text to more accurately reflect what is going on in the screen shot. We have also replaced “WT” with “0” as this more accurately reflects the status of these cells. 

      (3) There is no real discussion of HOW CHD4/NuRD counteracts TF binding (i.e. by what molecular mechanism). I understand that the data does not really inform us on this. Still, I believe it would be worthwhile for the authors to discuss some ideas, e.g., local nucleosome sliding vs. a direct (ATP-dependent?) action on the TF itself.

      We now include more speculation on this point in the Discussion.

      Reviewer #3 (Public review): 

      The main weakness can be summarised as relating to the fact that authors interpret all rapid changes following CHD4 degradation as being a direct effect of the loss of CHD4 activity. The possibility that rapid indirect effects arise does not appear to have been given sufficient consideration. This is especially pertinent where effects are reported at sites where CHD4 occupancy is initially low. 

      We acknowledge that we cannot definitively say any effect is a direct consequence of CHD4 depletion and have mitigated statements in the Results and Discussion. 

      Reviewing Editor Comments: 

      I am pleased to say all three experts had very complementary and complimentary comments on your paper - congratulations. Reviewer 3 does suggest toning down a few interpretations, which I suggest would help focus the manuscript on its greater strengths. I encourage a quick revision to this point, which will not go back to reviewers, before you request a version of record. I would also like to take this opportunity to thank all three reviewers for excellent feedback on this paper. 

      As advised, we have addressed the points raised by the reviewers.

    1. eLife Assessment

      Dong et al. present a valuable analysis of mutant phenotypes of the Rab GTPases Rab5, Rab7, and Rab11 in Drosophila second-order olfactory neuron development. This is a solid characterization and comparison of the different Rab mutants on projection neuron development, with clear differences for the three Rabs, and by inference for the early, late, and recycling endosomal functions executed by each.

    2. Reviewer #1 (Public review):

      Summary:

      Dong et al. present an in-depth analysis of mutant phenotypes of the Rab GTPases Rab5, Rab7, and Rab11 in Drosophila second-order olfactory neuron development. These three Rab GTPases are amongst the best-characterized Rab GTPases in eukaryotes and have been associated with major roles in early endosomes, late endosomes, and recycling endosomes, respectively. All three have been investigated in Drosophila neurons before; however, this study provides the most detailed characterization and comparison of mutant phenotypes for axonal and dendritic development of fly projection neurons to date. In addition, the authors provide excellent high-resolution data on the distribution of each of the three Rabs in developmental analyses.

      Strengths:

      The strength of the work lies in the detailed characterization and comparison of the different Rab mutants on projection neuron development, with clear differences for the three Rabs and by inference for the early, late, and recycling endosomal functions executed by each.

      Weaknesses:

      Some weakness derives from the fact that Rab5, Rab7, and Rab11 are, as acknowledged by the authors, somewhat pleiotropic, and their actual roles in projection neuron development are not addressed beyond the characterization of (mostly adult) mutant phenotypes and developmental expression.

    3. Reviewer #2 (Public review):

      Summary:

      This study by Dong et al. characterizes the roles of highly-expressed Rab GTPases Rab5, Rab7, and Rab11 in the development and wiring of olfactory projection neurons in Drosophila. This convincing descriptive study provides complementary approaches to Rab expression and localization profiling, conventional dominant-negative mutants, and clonal loss-of-function mutants to address the roles of different endosomal trafficking pathways across circuit development. They show distinct distributions and phenotypes for different Rabs. Overall, the study sets the stage for future mechanistic studies in this well-defined central neuron.

      Strengths:

      Beautiful imaging in central neurons demonstrates differential roles of 3 key Rab proteins in neuronal morphogenesis, as well as interesting patterns of subcellular endosome distribution. These descriptions will be critical for future mechanistic studies. The cell biology is well-written and explanatory, very accessible to a wide audience without sacrificing technical accuracy.

      Weaknesses:

      The Drosophila manipulations require more explanation in the main text to reach a wide audience.

    4. Reviewer #3 (Public review):

      Summary:

      The authors aimed at a comprehensive phenotypic characterization of the roles of all Rab proteins expressed in PN neurons in the developing Drosophila olfactory system. Important data are shown for a number of these Rabs with small/no phenotypes (in the Supplements) as well as the main endosomal Rabs, Rab5, 7, and 11 in the main figures.

      Strengths:

      The mosaic analysis is a great strength, allowing visualization of small clones or single neuron morphologies. This also allows some assessment of the cell autonomy of the observed phenotypes. The impact of the work lies in the comprehensiveness of the experiments. The rescue experiments are a strength.

      Weaknesses:

      The main weakness is that the experiments do not address the mechanisms that are affected by the loss of these Rab proteins, especially in terms of the most significant cargos. The insights thus do not extend far beyond what is already known from other work in many systems.

    5. Author response:

      Reviewer #1 (Public review):

      Summary:

      Dong et al. present an in-depth analysis of mutant phenotypes of the Rab GTPases Rab5, Rab7, and Rab11 in Drosophila second-order olfactory neuron development. These three Rab GTPases are amongst the best-characterized Rab GTPases in eukaryotes and have been associated with major roles in early endosomes, late endosomes, and recycling endosomes, respectively. All three have been investigated in Drosophila neurons before; however, this study provides the most detailed characterization and comparison of mutant phenotypes for axonal and dendritic development of fly projection neurons to date. In addition, the authors provide excellent high-resolution data on the distribution of each of the three Rabs in developmental analyses.

      Strengths:

      The strength of the work lies in the detailed characterization and comparison of the different Rab mutants on projection neuron development, with clear differences for the three Rabs and by inference for the early, late, and recycling endosomal functions executed by each.

      We would like to thank Reviewer #1 for their appreciation of our characterization of distinct Rab mutants.

      Weaknesses:

      Some weakness derives from the fact that Rab5, Rab7, and Rab11 are, as acknowledged by the authors, somewhat pleiotropic, and their actual roles in projection neuron development are not addressed beyond the characterization of (mostly adult) mutant phenotypes and developmental expression.

      Prior to the mid-pupal stage (around 48 hours after puparium formation), glomeruli in the antennal lobe have not yet assumed their stereotyped positions, which complicates analyses and interpretation; thus, many of our analyses are conducted at the adult stage. For Rab11 mutants we did perform many developmental analyses to evaluate the origins of the axonal development (Figure 6—figure supplement 1) and dendrite elaboration phenotypes (Figure 5 J–L) we observed at the adult stage. We realize that the developmental axonal analyses are in supplemental material where they could be missed. Given the reviewer’s comments, we will move these data to the main figures.

      Further, we will extend our Rab5 analyses to evaluate the function of this protein during development in experiments we will add to the revised manuscript.

      Reviewer #2 (Public review):

      Summary:

      This study by Dong et al. characterizes the roles of highly-expressed Rab GTPases Rab5, Rab7, and Rab11 in the development and wiring of olfactory projection neurons in Drosophila. This convincing descriptive study provides complementary approaches to Rab expression and localization profiling, conventional dominant-negative mutants, and clonal loss-of-function mutants to address the roles of different endosomal trafficking pathways across circuit development. They show distinct distributions and phenotypes for different Rabs. Overall, the study sets the stage for future mechanistic studies in this well-defined central neuron.

      We appreciate Reviewer #2’s analysis of our work and thank them for their suggestions to improve the clarity of our manuscript.

      Strengths:

      Beautiful imaging in central neurons demonstrates differential roles of 3 key Rab proteins in neuronal morphogenesis, as well as interesting patterns of subcellular endosome distribution. These descriptions will be critical for future mechanistic studies. The cell biology is well-written and explanatory, very accessible to a wide audience without sacrificing technical accuracy.

      Weaknesses:

      The Drosophila manipulations require more explanation in the main text to reach a wide audience.

      In our revised manuscript we will clarify the fly-specific manipulations and terminology to make our work more accessible to a broader audience.  

      Reviewer #3 (Public review):

      Summary:

      The authors aimed at a comprehensive phenotypic characterization of the roles of all Rab proteins expressed in PN neurons in the developing Drosophila olfactory system. Important data are shown for a number of these Rabs with small/no phenotypes (in the Supplements) as well as the main endosomal Rabs, Rab5, 7, and 11 in the main figures.

      We appreciate Reviewer #3’s assessment and appreciation of our work.

      Strengths:

      The mosaic analysis is a great strength, allowing visualization of small clones or single neuron morphologies. This also allows some assessment of the cell autonomy of the observed phenotypes. The impact of the work lies in the comprehensiveness of the experiments. The rescue experiments are a strength.

      Weaknesses:

      The main weakness is that the experiments do not address the mechanisms that are affected by the loss of these Rab proteins, especially in terms of the most significant cargos. The insights thus do not extend far beyond what is already known from other work in many systems.

      We understand this critique and are also interested in the specific cargos regulated by each Rab during development. We attempted to use antibodies to evaluate changes in cell-surface protein localization in response to disrupting individual Rabs but were unable to reliably distinguish shifts in association with specific endosomal compartments. Many available antibodies label cell-surface proteins expressed in antennal lobe cells beyond projection neurons (such as olfactory receptor neurons, glia, or local interneurons), which complicates analyses. Further, although we have produced multiple ‘flp-on’ tags for PN cell-surface proteins, they cannot be used with the MARCM system. This prevents us from simultaneously perturbing individual Rabs and tracking corresponding changes in surface-protein localization with single-cell resolution. Moreover, for proteins that are not highly endocytosed, it is difficult to separate plasma-membrane from endosomal localization, and we currently do not know which cell-surface proteins are most robustly endocytosed. Thus, while we share the reviewer’s interest in identifying candidate cargos, technological limitations make it difficult to achieve this goal within the scope of the current study.

    1. eLife Assessment

      This valuable study uses mathematical modeling and analysis to address the question of how neural circuits generate distinct low-dimensional, sequential neural dynamics that can change on fast, behaviorally relevant timescales. The authors propose a circuit model in which spatially heterogeneous inhibition constrains network dynamics to sequential activity on distinct neural subspaces and allows top-down sequence selection on fast timescales. The study convincingly demonstrates how this mechanism could operate and makes predictions about connectivity patterns and dynamics.

    2. Reviewer #1 (Public review):

      Summary:

      The authors show that targeted inhibition can turn on and off different sections of networks that produce sequential activity. These network sections may overlap under random assumptions, with the percent of gated neurons being the key parameter explored. The networks produce sequences of activity through drifting bump attractor dynamics embedded in 1D ring attractors or in 2D spaces. Derivations of eigenvalue spectra of the masked connectivity matrix are supported by simulations that include rate and spiking models. The paper is of interest to neuroscientists interested in sequences of activity and their relationship to neural manifolds and gating.

      Strengths:

      (1) The study convincingly shows preservation and switching of single sequences under inhibitory gating. It also explores overlap across stored subspaces.

      (2) The paper deals with fast switching of cortical dynamics, on the scale of 10ms, which is commonly observed in experimental data, but rarely addressed in theoretical work.

      (3) The introduction of winner-take-all dynamics is a good illustration of how such a mechanism could be leveraged for computations.

      (4) The progression from simple 1D rate to 2D spiking models carries the intuitions over well.

      (5) The derivations are clear, and the simulations support them. Code is publicly available.

      Weaknesses:

      (1) The inhibitory mechanism is mostly orthogonal to sequences: beyond showing that bump attractors survive partial silencing, the paper adds nothing on observed sequence properties or biological implications of these silenced sequences. The references clump together very different experimental sequences (from the mouse olfactory bulb to turtle spinal cord or rat hippocampus) with strongly varying spiking statistics and little evidence of targeted inhibitory gating. The study would benefit from focusing on fewer cases of sequences in more detail and what their mechanism would mean there.

      (2) The paper does not address the simultaneous expression of sequences either in the results or the discussion. This seems biologically relevant (e.g., Dechery & MacLean, 2017) and potentially critical to the proposed mechanism as it could lead to severe interference and decoding limitations.

      (3) The authors describe the mechanism as "rotating a neuronal space". In reality, it is not a rotation but a projection: a lossy transformation that skews the manifold. The two terms (rotation and projection) are used interchangeably in the text, which is misleading. It is also misrepresented in Figure 1de. Beyond being mathematically imprecise in the Results, this is a missed opportunity in the Discussion: could rotational dynamics in the data actually be projections introduced by inhibitory gating?

      (4) The authors also refer to their mechanism as a "blanket of inhibition with holes". That term typically refers to disinhibitory mechanisms (the holes; for instance, VIP-SOM interactions in Karnani et al., 2014). In reality, the inhibition in the paper targets the excitatory neurons (all schematics), which makes the terminology and links to SOM-VIP incorrect. Other terms like "clustered" and "selective" inhibition are also used extensively and interchangeably, but have many connotations in neuroscience (clustered synapses, feature selectivity). The paper would benefit from a single, consistent term for its targeted inhibition mechanism.

      (5) Discussion of this mechanism in relation to theoretical work on gating of propagating signals (e.g., Vogels & Abbott 2009, among others) seems highly relevant but is missing.

      (6) Schematics throughout give the wrong intuition about the network model: Colors and arrows suggest single E/I neurons that follow Dale's rule and have no autapses. None of this is true (Figure 2b W). Autapses are actually required for the eigenvalue derivation (Equation 11).

    3. Reviewer #2 (Public review):

      Summary:

      In "Spatially heterogeneous inhibition projects sequential activity onto unique neural subspaces", Lehr et al. address the question of how neural circuits generate distinct low-dimensional, sequential neural dynamics that can shift to different neural subspaces on fast, behaviorally relevant timescales.

      Lehr et al. propose a circuit architecture in which spatially heterogeneous inhibition constrains network dynamics to sequential activity on distinct neural subspaces and allows top-down sequence selection on fast timescales. Two types of inhibitory interneurons play separate roles. One class of interneuron balances excitation and contributes to sequence propagation. The second class of interneuron forms spatially heterogeneous, clustered inhibition that projects onto the sequence-generating portion of the circuit and suppresses all but a subset of the sequential activity, thus driving sequence selection. Due to the random nature of the inhibitory projections from each inhibitory cluster, the selected sequences exist on well-separated neural subspaces, provided the 'selection' inhibition is sufficiently dense. Lehr et al. use mathematical analysis and computational modeling to study this type of circuit mechanism in two contexts: a 1D ring network and a 2D, locally connected, spiking network. This work connects to previous literature, which considers the role of selective inhibition in shaping and restructuring sequential dynamics.

      Strengths:

      (1) This study makes testable predictions about the connectivity patterns for the two types of interneurons contributing to sequence generation and sequence selection.

      (2) This study proposes a relatively simple circuit motif that can generate many distinct, low-dimensional neural sequences that can vary dynamically on fast, behaviorally relevant timescales. The authors make a clear analytical argument for the stability and structure of the dynamics of the sub-sequences.

      (3) This study applies the inhibitory selection mechanisms in two different model network contexts: a 1D rate model and a 2D spiking model. Both settings have local connectivity patterns and two inhibitory pools but differ in several significant ways, which supports the generality of the proposed mechanism.

      Weaknesses:

      (1) Scaling synaptic weights to match the original sequence dynamics is a complex requirement for this mechanism. In the 2D network, the solution to this scaling issue is the saturation of single-unit firing rates. It is unclear if this is in a biologically relevant dynamical regime or to what degree the saturation dynamics of the sequences themselves are altered by the density of selective inhibition.

      (2) In the 2D model, although the sequence-generating circuit is quite general, the heterogeneous interneuron population requires a tuned connectivity structure paired with matched external inputs. In particular, the requirement that inhibitory pools project to shared but random excitatory neurons would benefit from a discussion about the biological feasibility of this architecture.

    4. Reviewer #3 (Public review):

      Summary:

      The study investigates the control of the subspaces in which sequences propagate, through static external and dynamic self-generated inhibition. For this, it first uses a 1D ring model with an asymmetry in the weights to evoke a drift of its bump. This model is studied in detail, showing and explaining that the trajectories take place in different subspaces due to the inhibition of different sets of contributing neurons. Sequence propagation is preserved, even if large numbers of neurons are silenced. In this regime, trajectories are restricted to near-orthogonal subspaces of neuronal activity space. The last part of the results shows that similar phenomena can be observed in a 2D spiking neural network model.

      Strengths:

      The results are important and convincing, and the analyses give a good further insight into the phenomena. The interpretation of inhibited networks as near-circulant is very elucidating. The sparsening by dynamically maintained winner-takes-all inhibition and the transfer to a 2D spiking model are particularly nice results.

      Weaknesses:

      I see no major weaknesses, except that some crucial literature has not yet been mentioned and discussed. Further, Figure 2c might raise doubts whether the sequences are indeed reliable for the largest amount of sparsening inhibition considered, and it is not yet clear whether the dynamical regime of the 2D model is biologically plausible.

    1. eLife Assessment

      This manuscript presents a valuable antiviral approach using an engineered ACE2-Fc fusion protein that demonstrates broad-spectrum neutralization capacity against SARS-CoV-2 variants and achieves significant prophylactic protection in animal models through a novel Fc-mediated phagocytosis mechanism. The study provides convincing evidence for protective efficacy through rigorous in vivo validation in mice, mechanistic characterization via biodistribution studies and macrophage depletion assays, and demonstration of antibody-dependent cellular phagocytosis as the primary clearance mechanism. However, there are some gaps that require attention, including the need for comparison with a previously reported ACE2 decamer, inclusion of control molecules, insufficient discussion of potential limitations such as off-target binding and immunogenicity risks, and lack of clarity regarding certain methodological aspects.

    2. Reviewer #1 (Public review):

      Summary:

      This manuscript by Wang et al. describes the development of an optimized soluble ACE2-Fc fusion protein, B5-D3, for intranasal prophylaxis against SARS-CoV-2. As shown, B5-D3 conferred protection not only by acting as a neutralizing decoy, but also by redirecting virus-decoy complexes to phagocytic cells for lysosomal degradation. The authors showed complete in vivo protection in K18-hACE2 mice and investigated the underlying mechanism by a combination of Fc-mutant controls, transcriptomics, biodistribution studies, and in vitro assays.

      Strengths:

      The major strength of this work is the identification of a novel antiviral approach with broad-spectrum activity that goes beyond simple neutralization. Mutant ACE2 enables broad and potent binding activity with the S proteins of SARS-CoV-2 variants, while the fused Fc part mediates phagocytosis to clear the viral particles. The conceptual advance of this ACE2-Fc combination is convincingly validated by in vivo protection data and by the completely abrogated protection of the Fc LALA mutant.

      Weaknesses:

      Some aspects could be further modified.

      (1) A previously reported ACE2 decamer (DOI: 10.1080/22221751.2023.2275598) needs to be mentioned and compared in the Discussion part.

      (2) Limitations of this study, such as off-target binding and potential immunogenicity, should also be discussed.

    3. Reviewer #2 (Public review):

      Summary:

      Wang et al. engineered an optimized ACE2 mutant by introducing two mutations (T92Q and H374N) and fused this ACE2 mutant to human IgG1-Fc (B5-D3). Experimental results suggest that B5-D3 exhibits broad-spectrum neutralization capacity and confers effective protection upon intranasal administration in SARS-CoV-2-infected K18-hACE2 mice. Transcriptomic analysis suggests that B5-D3 induces early immune activation in lung tissues of infected mice. Fluorescence-based bio-distribution assay further indicates rapid accumulation of B5-D3 in the respiratory tract, particularly in airway macrophages. Further investigation shows that B5-D3 promotes viral phagocytic clearance by macrophages via an Fc-mediated effector function, namely antibody-dependent cellular phagocytosis (ADCP), while simultaneously blocking ACE2-mediated viral infection in epithelial cells. These results provide insights into improving decoy treatments against SARS-CoV-2 and other potential respiratory viruses.

      Strengths:

      The protective effect of this ACE2-Fc fusion protein against SARS-CoV-2 infection has been evaluated in a quite comprehensive way.

      Weaknesses:

      (1) The paper lacks an explanation regarding the reason for the combination of mutations listed in Supplementary Figure 2b. For example, for the mutations that enhance spike protein binding, B2-B6 does not fully align with the mutations listed in Table S1 of Reference 4, yet no specific criteria are provided. Second, for the mutations that abolished enzymatic activity, while D1 and D2, D3, D4, and D5 are cited from References 12, 11, and 33, respectively, the reason for combining D3 and D4 into A2, and D1 and D2 into A3 remains unexplained. It is also unclear whether some of these other possible combinations have been tested. Furthermore, for the B5-derived mutations, only double-mutant combinations with D1-D5 are tested, with no attempt made to evaluate triple mutations involving A2 or A3.

      (2) Figures 1b, 1d, and 1e lack statistical analyses, making it difficult to determine whether B5 and D3 exhibit significant advantages. For Wuhan-Hu-1 strain, B2 and B5 are similar, and for D614G strain, B2, B3, B4, B5, and B6 display comparable results. However, only the glycosylation-related single mutant B5 is chosen for further combinatorial constructs. Moreover, for VOC/VOI strains, B5 is superior to B5-D3; for the Alpha strain, B5-D4 and B5-D5 are superior to B5-D3; and for the Delta and Lambda strains, B5-D5 is superior to B5-D3. These observations further highlight the need for a clearer explanation of the selection strategy.

      (3) Figure 1e does not specify the construct form of the control hIgG1, namely whether it is an hIgG1 Fc fragment or a full-length hIgG1 protein. If the full-length form is used, the design of its Fab region should be clarified to ensure the accuracy and comparability of the experimental control.

      (4) In Figure 2a, all three PBS control mice died, whereas in Figure 2f, three out of five PBS control mice died, with the remaining showing gradual weight recovery. This discrepancy may reflect individual immune variations within the control groups, and it is necessary to clarify whether potential autoimmune factors could have affected the comparability of the results. Also, the mouse experiments suffer from insufficient sample sizes, which affects the statistical power and reliability of the results. In Figure 2a, each group contains only 4 replicates, one of which was used for lung tissue sampling. As a result, body weight monitoring data is derived from only 3 mice per group (the figure legend indicating n=4 should be corrected to n=3). Such a small sample size limits the robustness of the conclusions. Similarly, in Figure 2f, although each group has 5 replicates, body weight data are presented for only 4 mice, with no explanation provided for the exclusion of the fifth mouse. Furthermore, the lung tissue experiments in Figure 3a include only 3 replicates, which is also inadequate.

      (5) Compared to 6 hours, intranasal administration of B5-D3 at 24 hours before viral infection results in reduced protective efficacy. However, only survival and body weight data are provided, with no supporting evidence from virological assays such as viral titer measurement. Therefore, the long-term effectiveness lacks sufficient experimental validation.

      (6) In Figures 3b and 3c, viral spike (S) and nucleocapsid (N) RNA relative expression levels are quantified by qPCR. The results show significant individual variation within the B5-D3-LALA treatment group: one mouse exhibits high S and N expression, while the other two show low expression. Viral load levels are also inconsistent: two mice have high viral loads, and one has a low viral load. Due to this variability, the available data are insufficient to robustly support the conclusion.

      (7) Figure 3e: "H&E staining indicated alveolar thickening in all groups," including the Mock group. Since the Mock group did not receive virus or active drug treatment, this observed change may result from local tissue reaction induced by the intranasal inoculation procedure itself, rather than specific immune activation. A control group (no manipulation) should be set to rule out potential confounding effects of the experimental procedure on tissue morphology, thereby allowing a more accurate assessment of the drug's effects.

      (8) In Supplementary Figure 11b, a considerable number of alveolar macrophages (AMs) are observed in both the PBS and B5-D3 groups. This makes it difficult to determine whether the observed accumulation is specifically induced by B5-D3.

      (9) In the flow cytometry experiment shown in Figure 5, the PBS control group is not labeled with AF750, which necessarily results in a value of zero for "B5-D3+ cells" on the y-axis. An appropriate control (e.g., hIgG1-Fc labeled with AF750) should be included.

      (10) The Methods section: a more detailed description of the experimental procedures involving HIV p24 and SARS-CoV-2 should be included.

    4. Reviewer #3 (Public review):

      Strengths:

      The core strength of this study lies in its innovative demonstration that an engineered sACE2-Fc fusion redirects virus-decoy complexes to Fc-mediated phagocytosis and lysosomal clearance in macrophages, revealing a distinct antiviral mechanism beyond traditional neutralization. Its complete prophylactic protection in animal models and precise targeting of airway phagocytes establish a novel therapeutic paradigm against SARS-CoV-2 variants and future respiratory viruses.

      Weaknesses:

      The study attributes the complete antiviral protection to Fc-mediated phagocytic clearance, a central claim that requires more rigorous experimental validation. The observation that abrogating Fc functions compromises protection could be confounded by potential alterations in the protein's stability, half-life, or overall structure. To firmly establish this mechanism, it is crucial to include a control molecule with a mutated Fc region that lacks FcγR binding while preserving the Fc structure itself. Without this critical control, the conclusion that phagocytic clearance is the primary mechanism remains inadequately supported.

      The strategy of deliberately targeting virus-decoy complexes to phagocytes via Fc receptors inherently raises the question of Antibody-Dependent Enhancement (ADE) of disease. While the authors demonstrate a lack of productive infection in macrophages, this only addresses one facet of ADE. The risk of Fc-mediated exacerbation of inflammation (ADE) remains a critical concern. The manuscript would be significantly strengthened by a direct discussion of this risk and by including data, such as cytokine profiling from treated macrophages, to more comprehensively address the safety profile of this approach.

      The exclusive use of the K18-hACE2 mouse model, which exhibits severe disease, limits the generalizability of the findings. The "complete protection" observed may not translate to models with more robust and naturalistic immune responses or to human physiology. Furthermore, the lack of data on circulating SARS-CoV-2 variants is a concern.

      The concept of sACE2-Fc fusion proteins as decoy receptors is not novel, and numerous similar constructs have been previously reported. The manuscript would benefit from a clearer demonstration of how the optimized B5-D3 mutant represents a significant advance over existing sACE2-Fc designs. A direct comparative analysis with previously published benchmarks, particularly in terms of neutralizing potency, Fc effector function strength, and in vivo efficacy, is necessary to establish the incremental value and novelty of this specific agent.

    1. eLife Assessment

      This report provides useful evidence that EABR mRNA is at least as effective as standard S mRNA vaccines for the SARS-CoV-2 booster vaccine. Although the methodology and the experimental approaches are solid, the inconsistent statistical significance throughout the study presents limitations in interpreting the results. Also, the absence of results showing possible mechanisms underlying the lack of benefit with EABR in the pre-immune setting makes the findings mostly observational.

    2. Reviewer #1 (Public review):

      Summary:

      This study investigated the immunogenicity of a novel bivalent EABR mRNA vaccine for SARS-CoV-2 that expresses enveloped virus-like particles in pre-immune mice as a model for boosting the population that is already pre-immune to SARS-CoV-2. The study builds on promising data showing a monovalent EABR mRNA vaccine induced substantially higher antibody responses than a standard S mRNA vaccine in naïve mice. In pre-immune mice, the EABR booster increased the breadth and magnitude of the antibody response, but the effects were modest and often not statistically significant.

      Strengths:

      The study evaluates, in pre-immune mice, a novel SARS-CoV-2 vaccine that was substantially superior in naïve mice, as a model for its potential in the pre-immune population.

      Weaknesses:

      (1) Overall, immune responses against Omicron variants were substantially lower than against the ancestral Wu-1 strain that the mice were primed with. The authors speculate this is evidence of immune imprinting, but don't have the appropriate controls (mice immunized 3 times with just the bivalent EABR vaccine) to discern this. Without this control, it's not clear if the lower immune responses to Omicron are due to immune imprinting (or original antigenic sin) or because the Omicron S immunogen is just inherently more poorly immunogenic than the S protein from the ancestral Wu-1 strain.

      (2) The authors reported a statistically significant increase in antibody responses with the bivalent EABR vaccine booster when compared to the monovalent S mRNA vaccine, but consistently failed to show significantly higher responses when compared to the bivalent S mRNA vaccine, suggesting that in pre-immune mice, the EABR vaccine has no apparent advantage over the bivalent S mRNA vaccine which is the current standard. There were, however, some trends indicating the group sizes were insufficiently powered to see a difference. This is mostly glossed over throughout the manuscript. The discussion section needs to better acknowledge these limitations of their studies and the limited benefits of the EABR strategy in pre-immune mice vs the standard bivalent mRNA vaccine.

      (3) The discussion would benefit from additional explanation about why they think the EABR S mRNA vaccine was substantially superior in naïve mice vs the standard S mRNA vaccine in their previously published work, but here, there is not much difference in pre-immune mice.

    3. Reviewer #2 (Public review):

      Summary:

      In this manuscript, Fan, Cohen, and Dam et al. conducted a follow-up study to their prior work on the ESCRT- and ALIX-binding region (EABR) mRNA vaccine platform that they developed. They tested in mice whether vaccines made in this format have improved binding and neutralizing antibody capacity over conventional antigens when used as a booster. The authors tested this in both monovalent (Wu1 only) and bivalent (Wu1 + BA.5) designs. The authors found that across both monovalent and bivalent designs, the EABR antigens elicited higher antibody titers than conventional antigens, although they observed dampened titers against Omicron variants, likely due to immune imprinting. Deep mutational scanning experiments suggested that the improvement of the EABR format may be due to a more diversified antibody response. Finally, the authors demonstrate that co-expression of multiple spike proteins within a single cell can result in the formation of heterotrimers, which may have further potential use as an antigen.

      Strengths:

      (1) The experiments are conducted well and are appropriate to address the questions at hand. Given the significant time that is needed for testing of pre-existing immunity, due to the requirement of pre-vaccinated animals, it is a strength that the authors have conducted a thorough experiment with appropriate groups.

      (2) The improvement in titers associated with EABR antigens bodes well for its potential use as a vaccine platform.

      Weaknesses:

      As noted above, this type of study requires quite a bit of initial time, so the authors cannot be blamed for this, but unfortunately, the vaccine designs that were tested are quite outdated. BA.5 has long been replaced by other variants, and importantly, bivalent vaccines are no longer used. Testing of contemporaneous strains as well as monovalent variant vaccines would be desirable to support the study.

    1. eLife Assessment

      This is an important study on how dissociable emotions of shame and guilt emerge from cognitive processes and guide behavioral responses. The task is well designed and yields compelling behavioral, computational, and neural evidence elucidating the cognitive link between emotions and compensatory decisions. The work has broad theoretical and practical implications across a range of disciplines concerned with human behavior, including psychology, neuroscience, economics, public policy, and psychiatry.

    2. Reviewer #1 (Public review):

      This work provides important new evidence of the cognitive and neural mechanisms that give rise to feelings of shame and guilt, as well as their transformation into compensatory behavior. The authors use a well-designed interpersonal task to manipulate responsibility and harm, eliciting varying levels of shame and guilt in participants. The study combines behavioral, computational, and neuroimaging approaches to offer a comprehensive account of how these emotions are experienced and acted upon. Notably, the findings reveal distinct patterns in how harm and responsibility contribute to guilt and shame and how these factors are integrated into compensatory decision-making.

      Strengths:

      • Investigating both guilt and shame in a single experimental framework allows for a direct comparison of their behavioral and neural effects while minimizing confounds

      • The study provides a novel contribution to the literature by exploring the neural bases underlying the conversion of shame into behavior

      • The task is creative and ecologically valid, simulating a realistic social situation while retaining experimental control

      • Computational modeling and fMRI analysis yield converging evidence for a quotient-based integration of harm and responsibility in guiding compensatory behavior

      Limitations:

      The authors address the study's limitations and offer well-reasoned explanations for their methodological choices.

      The conclusions of the paper are well supported by the data. It would be valuable for future studies to validate these findings using alternative tasks or paradigms, to ensure the robustness and generalizability of the observed behavioral and neural mechanisms. Overall, this is a well-executed and insightful study that makes a meaningful contribution to understanding the cognitive and neural mechanisms underlying guilt and shame.

    3. Reviewer #2 (Public review):

      Summary:

      The authors combined behavioral experiments, computational modeling, and functional magnetic resonance imaging (fMRI) to investigate the psychological and neural mechanisms underlying guilt, shame, and the altruistic behaviors driven by these emotions. The results revealed that guilt is more strongly associated with harm, whereas shame is more closely linked to responsibility. Compared to shame, guilt elicited a higher level of altruistic behavior. Computational modeling demonstrated how individuals integrate information about harm and responsibility. The fMRI findings identified a set of brain regions involved in representing harm and responsibility, transforming responsibility into feelings of shame, converting guilt and shame into altruistic actions, and mediating the effect of trait guilt on compensatory behavior.

      Strengths:

      This study offers a significant contribution to the literature on social emotions by moving beyond prior research that typically focused on isolated aspects of guilt and shame. The study presents a comprehensive examination of these emotions, encompassing their cognitive antecedents, affective experiences, behavioral consequences, trait-level characteristics, and neural correlates. The authors have introduced a novel experimental task that enables such a systematic investigation and holds strong potential for future research applications. The computational modeling procedures were implemented in accordance with current field standards. The findings are rich and offer meaningful theoretical insights. The manuscript is well written, and the results are clearly and logically presented.

      Weaknesses:

      In this study, participants' feelings of guilt and shame were assessed retrospectively, after they had completed all altruistic decision-making tasks. This reliance on memory-based self-reports may introduce recall bias, potentially compromising the accuracy of the emotion measurements.

      In many behavioral economic models, self-interest plays a central role in shaping individual decision-making, including moral decisions. However, the model comparison results in this study suggest that models without a self-interest component (such as Model 1.3) outperform those that incorporate it (such as Model 1.1 and Model 1.2). The authors have not provided a satisfactory explanation for this counterintuitive finding.

      The phrases "individuals integrate harm and responsibility in the form of a quotient" and "harm and responsibility are integrated in the form of a quotient" appear in the Abstract and Discussion sections. However, based on the results of the computational modeling, it is more accurate to state that "harm and the number of wrongdoers are integrated in the form of a quotient." The current phrasing misleadingly suggests that participants represent information as harm divided by responsibility, which does not align with the modeling results. This potentially confusing expression should be revised for clarity and accuracy.

      In the Discussion, the authors state: "Since no brain region associated social cognition showed significant responses to harm or responsibility, it appears that human brain encodes a unified measure integrating harm and responsibility (i.e., the quotient) rather than processing them as separate entities when both are relevant to subsequent emotional experience and decision-making." However, this interpretation overstates the implications of the null fMRI findings. The absence of significant activation in response to harm or responsibility does not necessarily imply that the brain does not represent these dimensions separately. Null results can arise from various factors, including limitations in the sensitivity of fMRI. It is possible that more fine-grained techniques, such as intracranial electrophysiological recordings, could reveal distinct neural representations of harm and responsibility. The interpretation of these null findings should be made with greater caution.

      For the revised manuscript, the authors have provided additional evidence and clarified expressions. All comments have been addressed. I have no further comments.

    4. Reviewer #3 (Public review):

      Summary:

      Zhu et al. set out to elucidate how the moral emotions of guilt and shame emerge from specific cognitive antecedents - harm and responsibility - and how these emotions subsequently drive compensatory behavior. Consistent with their prediction derived from functionalist theories of emotion, their behavioral findings indicate that guilt is more influenced by harm, whereas shame is more influenced by responsibility. In line with previous research, their results also demonstrate that guilt has a stronger facilitating effect on compensatory behavior than shame. Furthermore, computational modeling and neuroimaging results suggest that individuals integrate harm and responsibility information into a composite representation of the individual's share of the harm caused. Brain areas such as the striatum, insula, temporoparietal junction, lateral prefrontal cortex, and cingulate cortex were implicated in distinct stages of the processing of guilt and/or shame. In general, this work makes an important contribution to the field of moral emotions. Its impact could be further enhanced by clarifying methodological details, offering a more nuanced interpretation of the findings, and discussing their potential practical implications in greater depth.

      Strengths:

      First, this work conceptualizes guilt and shame as processes unfolding across distinct stages (cognitive appraisal, emotional experience, and behavioral response) and investigates the psychological and neural characteristics associated with their transitions from one stage to the next.

      Second, the well-designed experiment effectively manipulates harm and responsibility - two critical antecedents of guilt and shame.

      Third, the findings deepen our understanding of the mechanisms underlying guilt and shame beyond what has been established in previous research.

      Comments on revisions:

      The authors have addressed the issues I raised in the previous review. I have no more comments on the manuscript.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary

      This work provides important new evidence of the cognitive and neural mechanisms that give rise to feelings of shame and guilt, as well as their transformation into compensatory behavior. The authors use a well-designed interpersonal task to manipulate responsibility and harm, eliciting varying levels of shame and guilt in participants. The study combines behavioral, computational, and neuroimaging approaches to offer a comprehensive account of how these emotions are experienced and acted upon. Notably, the findings reveal distinct patterns in how harm and responsibility contribute to guilt and shame and how these factors are integrated into compensatory decision-making.

      Strengths

      (1) Investigating both guilt and shame in a single experimental framework allows for a direct comparison of their behavioral and neural effects while minimizing confounds.

      (2) The study provides a novel contribution to the literature by exploring the neural bases underlying the conversion of shame into behavior.

      (3) The task is creative and ecologically valid, simulating a realistic social situation while retaining experimental control.

      (4) Computational modeling and fMRI analysis yield converging evidence for a quotient-based integration of harm and responsibility in guiding compensatory behavior.

      We are grateful for your thoughtful summary of our work’s strengths and greatly appreciate these positive words.

      We would like to note that, in accordance with the journal’s requirements, we have uploaded both a clean version of the revised manuscript and a version with all modifications highlighted in blue.

      Weakness

      (1) Post-experimental self-reports rely both on memory and on the understanding of the conceptual difference between the two emotions. Additionally, it is unclear whether the 16 scenarios were presented in random order; sequential presentation could have introduced contrast effects or demand characteristics.

      Thank you for pointing out the two limitations of the experimental paradigm. We fully agree with your point. Participants recalled and reported their feelings of guilt and shame immediately after completing the task, which likely ensured reasonably accurate state reports. We acknowledge, however, that in-task assessments might provide greater precision. We opted against them to examine altruistic decision-making in a more natural context, as in-task assessments could have heightened participants’ awareness of guilt and shame and biased their altruistic decisions. Post-task assessments also reduced fMRI scanning time, minimizing discomfort from prolonged immobility and thereby preserving data quality.

      In the present study, assessing guilt and shame required participants to distinguish conceptually between the two emotions. Most research with adult participants has adopted this approach, relying on direct self-reports of emotional intensity under the assumption that adults can differentiate between guilt and shame (Michl et al., 2014; Wagner et al., 2011; Zhu et al., 2019). However, we acknowledge that this approach may be less suitable for studies involving children, who may not yet have a clear understanding of the distinction between guilt and shame.

      The limitations have been added into the Discussion section (Page 47): “This research has several limitations. First, post-task assessments of guilt and shame, unlike in-task assessments, rely on memory and may thus be less precise, although in-task assessments could have heightened participants’ awareness of these emotions and biased their decisions. Second, our measures of guilt and shame depend on participants’ conceptual understanding of the two emotions. While this is common practice in studies with adult participants (Michl et al., 2014; Wagner et al., 2011; Zhu et al., 2019), it may be less appropriate for research involving children.”

      We apologize for the confusion. The 16 scenarios were presented in a random order. We have clarified this in the revised manuscript (Page 13): “After the interpersonal game, the outcomes of the experimental trials were re-presented in a random order.”

      (2) In the neural analysis of emotion sensitivity, the authors identify brain regions correlated with responsibility-driven shame sensitivity and then use those brain regions as masks to test whether they were more involved in the responsibility-driven shame sensitivity than the other types of emotion sensitivity. I wonder if this is biasing the results. Would it be better to use a cross-validation approach? A similar issue might arise in "Activation analysis (neural basis of compensatory sensitivity)." 

      Thank you for this valuable comment. We replaced the original analyses with a leave-one-subject-out (LOSO) cross-validation approach, which minimizes bias in secondary tests due to non-independence (Esterman et al., 2010). The findings were largely consistent with the original results, except that two previously significant effects became marginally significant (one effect changed from P = 0.012 to P = 0.053; the other from P = 0.044 to P = 0.062). Although we believe the new results do not alter our main conclusions, marginally significant findings should be interpreted with caution. We have noted this point in the Discussion section (Page 48): “… marginally significant results should be viewed cautiously and warrant further examination in future studies with larger sample sizes.”

      In the revised manuscript, we have described the cross-validation procedure in detail and reported the corresponding results. Please see the Method section, Page 23: “The results showed that the neural responses in the temporoparietal junction/superior temporal sulcus (TPJ/STS) and precentral cortex/postcentral cortex/supplementary motor area (PRC/POC/SMA) were negatively correlated with the responsibility-driven shame sensitivity. To test whether these regions were more involved in responsibility-driven shame sensitivity than in other types of emotion sensitivity, we implemented a leave-one-subject-out (LOSO) cross-validation procedure (e.g., Esterman et al., 2010). In each fold, clusters in the TPJ/STS and PRC/POC/SMA showing significant correlations with responsibility-driven shame sensitivity were identified at the group level based on N-1 participants. These clusters, defined as regions of interest (ROI), were then applied to the left-out participant, from whom we extracted the mean parameter estimates (i.e., neural response values). If, in a given fold, no suprathreshold cluster was detected within the TPJ/STS or PRC/POC/SMA after correction, or if the two regions merged into a single cluster that could not be separated, the corresponding value was coded as missing. Repeating this procedure across all folds yielded an independent set of ROI-based estimates for each participant. In the LOSO cross-validation procedure, the TPJ/STS and PRC/POC/SMA merged into a single inseparable cluster in two folds, and no suprathreshold cluster was detected within the TPJ/STS in one fold. These instances were coded as missing, resulting in valid data from 39 participants for the TPJ/STS and 40 participants for the PRC/POC/SMA. 
We then correlated these estimates with all four types of emotion sensitivities and compared the correlation with responsibility-driven shame sensitivity against those with the other sensitivities using Z tests (Pearson and Filon's Z).” and Page 24: “To directly test whether these regions were more involved in one of the two types of compensatory sensitivity, we applied the same LOSO cross-validation procedure described above. In this procedure, no suprathreshold cluster was detected within the LPFC in one fold and within the TP in 27 folds. These cases were coded as missing, resulting in valid data from 42 participants for the bilateral IPL, 41 participants for the LPFC, and 15 participants for the TP. The limited sample size for the TP likely reflects that its effect was only marginally above the correction threshold, such that the reduced power in cross-validation often rendered it nonsignificant. Because the sample size for the TP was too small and the results may therefore be unreliable, we did not pursue further analyses for this region. The independent ROI-based estimates were then correlated with both guilt-driven and shame-driven compensatory sensitivities, and the strength of the correlations was compared using Z tests (Pearson and Filon's Z).”
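
      The leave-one-subject-out logic quoted above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: a plain voxel-wise Pearson correlation threshold stands in for the corrected group-level cluster inference, and the function name `loso_roi_estimates`, the `r_threshold` parameter, and the array layout are all assumptions introduced here.

```python
import numpy as np

def loso_roi_estimates(betas, sensitivity, r_threshold=-0.3):
    """Leave-one-subject-out ROI estimates (illustrative sketch).

    betas:       (n_subjects, n_voxels) parameter estimates per subject
    sensitivity: (n_subjects,) behavioral sensitivity scores
    r_threshold: hypothetical cutoff; voxels with r <= cutoff in the
                 N-1 group form the ROI (stand-in for corrected clusters)
    """
    betas = np.asarray(betas, dtype=float)
    sensitivity = np.asarray(sensitivity, dtype=float)
    n_subjects = betas.shape[0]
    estimates = np.full(n_subjects, np.nan)  # NaN marks a "missing" fold

    for left_out in range(n_subjects):
        train = np.arange(n_subjects) != left_out
        # Voxel-wise Pearson r computed on the N-1 training subjects only
        x = sensitivity[train] - sensitivity[train].mean()
        y = betas[train] - betas[train].mean(axis=0)
        with np.errstate(divide="ignore", invalid="ignore"):
            r = (x @ y) / np.sqrt((x ** 2).sum() * (y ** 2).sum(axis=0))
            roi = r <= r_threshold  # NaN correlations never enter the ROI
        # Apply the independently defined ROI to the left-out subject
        if roi.any():
            estimates[left_out] = betas[left_out, roi].mean()
    return estimates
```

      Because the ROI in each fold is defined without the left-out participant, the extracted value is independent of the selection step, which is the property the reviewer asked for. Folds without any surviving voxel stay missing, mirroring the reduced sample sizes reported above.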

      Please see the Results section, Pages 34 and 35: “To assess whether these brain regions were specifically involved in responsibility-driven shame sensitivity, we compared the Pearson correlations between their activity and all types of emotion sensitivities. The results demonstrated the domain specificity of these regions, by revealing that the TPJ/STS cluster had significantly stronger negative responses to responsibility-driven shame sensitivity than to responsibility-driven guilt sensitivity (Z = 2.44, P = 0.015) and harm-driven shame sensitivity (Z = 3.38, P < 0.001), and a marginally stronger negative response to harm-driven guilt sensitivity (Z = 1.87, P = 0.062) (Figure 4C; Supplementary Table 14). In addition, the sensorimotor areas (i.e., precentral cortex (PRC), postcentral cortex (POC), and supplementary motor area (SMA)) exhibited the similar activation pattern as the TPJ/STS (Figure 4B and 4C; Supplementary Tables 13 and 14).” and Page 35: “The results revealed that the left LPFC was more engaged in shame-driven compensatory sensitivity (Z = 1.93, P = 0.053), as its activity showed a marginally stronger positive correlation with shame-driven sensitivity than with guilt-driven sensitivity (Figure 5C). No significant difference was found in the Pearson correlations between the activity of the bilateral IPL and the two types of sensitivities (Supplementary Table 16). For the TP, the effective sample size was too small to yield reliable results (see Methods).”

      (1) Regarding the traits of guilt and shame, I appreciate using the scores from the subscales (evaluations and action tendencies) separately for the analyses (instead of a composite score). An issue with using the actions subscales when measuring guilt and shame proneness is that the behavioral tendencies for each emotion get conflated with their definitions, risking circularity. It is reassuring that the behavior evaluation subscale was significantly correlated with compensatory behavior (not only the action tendencies subscale). However, the absence of significant neural correlates for the behavior evaluation subscale raises questions: Do the authors have thoughts on why this might be the case, and any implications?

      We are grateful for this important comment. According to the Guilt and Shame Proneness Scale, trait guilt comprises two dimensions: negative behavior evaluations and repair action tendencies (Cohen et al., 2011). Behaviorally, both dimensions were significantly correlated with participants’ compensatory behavior (negative behavior evaluations: R = 0.39, P = 0.010; repair action tendencies: R = 0.33, P = 0.030). Neurally, while repair action tendencies were significantly associated with activity in the aMCC and other brain areas, negative behavior evaluations showed no significant neural correlates. The absence of significant neural correlates for negative behavior evaluations may be due to several factors. In addition to common explanations (e.g., limited sample size reducing the power to detect weak neural correlates or subtle effects obscured by fMRI noise), another possibility is that this dimension influences neural responses indirectly through intermediate processes not captured in our study (e.g., specific motivational states). We have added a discussion of the non-significant result to the revised manuscript (Page 47): “However, the neural correlates of negative behavior evaluations (another dimension of trait guilt) were absent. The reasons underlying the non-significant neural finding may be multifaceted. One possibility is that negative behavior evaluations influence neural responses indirectly through intermediate processes not captured in our study (e.g., specific motivational states).”

      In addition, to avoid misunderstanding, the revised manuscript specifies at the appropriate places that the neural findings pertain to repair action tendencies rather than to trait guilt in general. For instance, see Pages 46 and 47: “Furthermore, we found neural responses in the aMCC mediated the relationship between repair action tendencies (one dimension of trait guilt) and compensation… Accordingly, our fMRI findings suggest that individuals with stronger tendency to engage in compensation across various moral violation scenarios (indicated by their repair action tendencies) are more sensitive to the severity of the violation and therefore engage in greater compensatory behavior.”

      (2) Regarding the computational model finding that participants seem to disregard self-interest, do the authors believe it may reflect the relatively small endowment at stake? Do the authors believe this behavior would persist if the stakes were higher?

      Additionally, might the type of harm inflicted (e.g., electric shock vs. less stigmatized/less ethically charged harm like placing a hand in ice-cold water) influence the weight of self-interest in decision-making?

      Taken together, the conclusions of the paper are well supported by the data. It would be valuable for future studies to validate these findings using alternative tasks or paradigms to ensure the robustness and generalizability of the observed behavioral and neural mechanisms.

      Thank you for these important questions. As you suggested, we believe that the relatively small personal stakes in our task (a maximum loss of 5 Chinese yuan) likely explain why the computational model indicated that participants disregarded self-interest. We also agree that when the harm to others is less morally charged, people may be more inclined to consider self-interest in compensatory decision-making. Overall, the more stigmatized the harm and the smaller the personal stakes, the more likely individuals are to disregard self-interest and focus solely on making appropriate compensation.

      We have added the following passage to the Discussion section (Page 42): “Notably, in many computational models of social decision-making, self-interest plays a crucial role (e.g., Wu et al., 2024). However, our computational findings suggest that participants disregarded self-interest during compensatory decision-making. A possible explanation is that the personal stakes in our task were relatively small (a maximum loss of 5 Chinese yuan), whereas the harm inflicted on the receiver was highly stigmatized (i.e., an electric shock). Under conditions where the harm is highly salient and the cost of compensation is low, participants may be inclined to disregard self-interest and focus solely on making appropriate compensation.”

      Reviewer #2 (Public review):

      Summary

      The authors combined behavioral experiments, computational modeling, and functional magnetic resonance imaging (fMRI) to investigate the psychological and neural mechanisms underlying guilt, shame, and the altruistic behaviors driven by these emotions. The results revealed that guilt is more strongly associated with harm, whereas shame is more closely linked to responsibility. Compared to shame, guilt elicited a higher level of altruistic behavior. Computational modeling demonstrated how individuals integrate information about harm and responsibility. The fMRI findings identified a set of brain regions involved in representing harm and responsibility, transforming responsibility into feelings of shame, converting guilt and shame into altruistic actions, and mediating the effect of trait guilt on compensatory behavior.

      Strengths

      This study offers a significant contribution to the literature on social emotions by moving beyond prior research that typically focused on isolated aspects of guilt and shame. The study presents a comprehensive examination of these emotions, encompassing their cognitive antecedents, affective experiences, behavioral consequences, trait-level characteristics, and neural correlates. The authors have introduced a novel experimental task that enables such a systematic investigation and holds strong potential for future research applications. The computational modeling procedures were implemented in accordance with current field standards. The findings are rich and offer meaningful theoretical insights. The manuscript is well written, and the results are clearly and logically presented.

      We are thankful for your considerate acknowledgment of our work’s strengths and truly value your positive comments.

      We would like to note that, in accordance with the journal’s requirements, we have uploaded both a clean version of the revised manuscript and a version with all modifications highlighted in blue.

      Weakness

      In this study, participants' feelings of guilt and shame were assessed retrospectively, after they had completed all altruistic decision-making tasks. This reliance on memory-based self-reports may introduce recall bias, potentially compromising the accuracy of the emotion measurements.

      Thank you for this crucial comment. We fully agree that measuring guilt and shame after the task may affect accuracy to some extent. However, because participants reported their emotions immediately after completing the task, we believe their recollections were reasonably accurate. In designing the experiment, we considered in-task assessments, but this approach risked heightening participants’ awareness of guilt and shame and thereby interfering with compensatory decisions. After careful consideration, we ultimately chose post-task assessments of these emotions. A similar approach has been adopted in prior research on gratitude, where post-task assessments were also used (Yu et al., 2018).

      In the revised manuscript, we have specified the limitations of both post-task and in-task assessments of guilt and shame (Page 47): “… post-task assessments of guilt and shame, unlike in-task assessments, rely on memory and may thus be less precise, although in-task assessments could have heightened participants’ awareness of these emotions and biased their decisions.”

      In many behavioral economic models, self-interest plays a central role in shaping individual decision-making, including moral decisions. However, the model comparison results in this study suggest that models without a self-interest component (such as Model 1.3) outperform those that incorporate it (such as Model 1.1 and Model 1.2). The authors have not provided a satisfactory explanation for this counterintuitive finding. 

      Thank you for this important comment. In the revised manuscript, we have provided a possible explanation (Page 42): “Notably, in many computational models of social decision-making, self-interest plays a crucial role (e.g., Wu et al., 2024). However, our computational findings suggest that participants disregarded self-interest during compensatory decision-making. A possible explanation is that the personal stakes in our task were relatively small (a maximum loss of 5 Chinese yuan), whereas the harm inflicted on the receiver was highly stigmatized (i.e., an electric shock). Under conditions where the harm is highly salient and the cost of compensation is low, participants may be inclined to disregard self-interest and focus solely on making appropriate compensation.”

      The phrases "individuals integrate harm and responsibility in the form of a quotient" and "harm and responsibility are integrated in the form of a quotient" appear in the Abstract and Discussion sections. However, based on the results of the computational modeling, it is more accurate to state that "harm and the number of wrongdoers are integrated in the form of a quotient." The current phrasing misleadingly suggests that participants represent information as harm divided by responsibility, which does not align with the modeling results. This potentially confusing expression should be revised for clarity and accuracy.

      We sincerely thank you for this helpful suggestion and apologize for the confusion caused. We have removed expressions such as “harm and responsibility are integrated in the form of a quotient” from the manuscript. Instead, we now state more precisely that “harm and the number of wrongdoers are integrated in the form of a quotient.”

      However, in certain contexts we continue to discuss harm and responsibility. Introducing “the number of wrongdoers” in these places would appear abrupt, so we have opted for alternative phrasing. For example, on Page 3, we now write:

      “Computational modeling results indicated that the integration of harm and responsibility by individuals is consistent with the phenomenon of responsibility diffusion.” Similarly, on Page 49, we state: “Notably, harm and responsibility are integrated in a manner consistent with responsibility diffusion prior to influencing guilt-driven and shame-driven compensation.”
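
      The quotient described above (harm divided by the number of wrongdoers, i.e. the individual's share of the harm) can be written out as a small sketch. It is illustrative only: the fitted model's exact form and free parameters are not given in this exchange, so the function name `harm_share` and the example values are assumptions.

```python
def harm_share(harm, n_wrongdoers):
    """Individual share of harm under responsibility diffusion (sketch).

    harm:         magnitude of harm done to the receiver (arbitrary units)
    n_wrongdoers: number of people who jointly caused the harm (>= 1)
    """
    if n_wrongdoers < 1:
        raise ValueError("n_wrongdoers must be at least 1")
    return harm / n_wrongdoers

# Hypothetical values: the same harm weighs less per person as the
# number of wrongdoers grows, which is the diffusion pattern.
sole = harm_share(4, 1)    # one wrongdoer bears all of the harm
shared = harm_share(4, 4)  # four wrongdoers each bear a quarter
```

      The point of the reviewer's wording is visible here: the divisor is the number of wrongdoers, not a responsibility score, so "harm divided by responsibility" would describe a different quantity.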

      In the Discussion, the authors state: "Since no brain region associated with social cognition showed significant responses to harm or responsibility, it appears that the human brain encodes a unified measure integrating harm and responsibility (i.e., the quotient) rather than processing them as separate entities when both are relevant to subsequent emotional experience and decision-making." However, this interpretation overstates the implications of the null fMRI findings. The absence of significant activation in response to harm or responsibility does not necessarily imply that the brain does not represent these dimensions separately. Null results can arise from various factors, including limitations in the sensitivity of fMRI. It is possible that more fine-grained techniques, such as intracranial electrophysiological recordings, could reveal distinct neural representations of harm and responsibility. The interpretation of these null findings should be made with greater caution.

      Thank you for this reminder. In the revised manuscript, we have provided a more cautious interpretation of the results (Page 43): “Although the fMRI findings revealed that no brain region associated with social cognition showed significant responses to harm or responsibility, this does not suggest that the human brain encodes only a unified measure integrating harm and responsibility and does not process them as separate entities. Using more fine-grained techniques, such as intracranial electrophysiological recordings, it may still be possible to observe independent neural representations of harm and responsibility.”

      Reviewer #3 (Public review):

      Summary

      Zhu et al. set out to elucidate how the moral emotions of guilt and shame emerge from specific cognitive antecedents - harm and responsibility - and how these emotions subsequently drive compensatory behavior. Consistent with their prediction derived from functionalist theories of emotion, their behavioral findings indicate that guilt is more influenced by harm, whereas shame is more influenced by responsibility. In line with previous research, their results also demonstrate that guilt has a stronger facilitating effect on compensatory behavior than shame. Furthermore, computational modeling and neuroimaging results suggest that individuals integrate harm and responsibility information into a composite representation of the individual's share of the harm caused. Brain areas such as the striatum, insula, temporoparietal junction, lateral prefrontal cortex, and cingulate cortex were implicated in distinct stages of the processing of guilt and/or shame. In general, this work makes an important contribution to the field of moral emotions. Its impact could be further enhanced by clarifying methodological details, offering a more nuanced interpretation of the findings, and discussing their potential practical implications in greater depth.

      Strengths

      First, this work conceptualizes guilt and shame as processes unfolding across distinct stages (cognitive appraisal, emotional experience, and behavioral response) and investigates the psychological and neural characteristics associated with their transitions from one stage to the next.

      Second, the well-designed experiment effectively manipulates harm and responsibility - two critical antecedents of guilt and shame.

      Third, the findings deepen our understanding of the mechanisms underlying guilt and shame beyond what has been established in previous research.

      We truly appreciate your acknowledgment of our work’s strengths and your encouraging feedback.

      We would like to note that, in accordance with the journal’s requirements, we have uploaded both a clean version of the revised manuscript and a version with all modifications highlighted in blue.

      Weakness

      Over the course of the task, participants may gradually become aware of their high error rate in the dot estimation task. This could lead them to discount their own judgments and become inclined to rely on the choices of other deciders. It is unclear whether participants in the experiment had the opportunity to observe or inquire about others' choices. This point is important, as the compensatory decision-making process may differ depending on whether choices are made independently or influenced by external input.

      Thank you for pointing this out. We apologize for not making the experimental procedure sufficiently clear. Participants (as deciders) were informed that each decider performed the dot estimation independently and was unaware of the estimations made by the other deciders. We have now clarified this point in the revised manuscript (Pages 10 and 11): “Each decider indicated whether the number of dots was more than or less than 20 based on their own estimation by pressing a corresponding button (dots estimation period, < 2.5 s) and was unaware of the estimations made by other deciders”.

      Given the inherent complexity of human decision-making, it is crucial to acknowledge that, although the authors compared eight candidate models, other plausible alternatives may exist. As such, caution is warranted when interpreting the computational modeling results.

      Thank you for this comment. We fully agree with your opinion. Although we tried to build a conceptually comprehensive model space based on prior research and our own understanding, we did not include all plausible models, nor would it be feasible to do so. We acknowledge it as a limitation in the revised manuscript (Page 47): “... although we aimed to construct a conceptually comprehensive computational model space informed by prior research and our own understanding, it does not encompass all plausible models. Future research is encouraged to explore additional possibilities.”

      I do not agree with the authors' claim that "computational modeling results indicated that individuals integrate harm and responsibility in the form of a quotient" (i.e., harm/responsibility). Rather, the findings appear to suggest that individuals may form a composite representation of the harm attributable to each individual (i.e., harm/the number of people involved). The explanation of the modeling results ought to be precise.

      We appreciate your comment and apologize for the imprecise description. In the revised manuscript, we now use the expressions “… integrate harm and the number of wrongdoers in the form of a quotient.” and “… the integration of harm and responsibility by individuals is consistent with the phenomenon of responsibility diffusion.” For example, on Page 19, we state: “It assumes that individuals neglect their self-interest, have a compensatory baseline, and integrate harm and the number of wrongdoers in the form of a quotient.” On Page 3, we state: “Computational modeling results indicated that the integration of harm and responsibility by individuals is consistent with the phenomenon of responsibility diffusion.”
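      The quotient described above can be illustrated with a minimal sketch. It is not the authors' fitted model: the linear form, the clipping to the endowment, and the names `harm_share`, `predicted_compensation`, `baseline`, and `endowment` are assumptions made for illustration only, and `kappa` is used here as a hypothetical weight on the per-person harm share. The sketch shows only the three properties named in the response: no self-interest term, a compensatory baseline, and harm divided by the number of wrongdoers.

```python
def harm_share(harm: float, n_wrongdoers: int) -> float:
    """Harm attributable to one wrongdoer: total harm / number of wrongdoers."""
    if n_wrongdoers < 1:
        raise ValueError("n_wrongdoers must be at least 1")
    return harm / n_wrongdoers


def predicted_compensation(
    harm: float,
    n_wrongdoers: int,
    kappa: float,
    baseline: float,
    endowment: float,
) -> float:
    """Hypothetical rule: baseline + kappa * harm share, clipped to [0, endowment].

    No self-interest term appears, mirroring the claim that participants
    disregarded self-interest. The linear form is an assumption.
    """
    raw = baseline + kappa * harm_share(harm, n_wrongdoers)
    return min(max(raw, 0.0), endowment)
```

      Under this sketch, holding harm fixed while increasing the number of wrongdoers lowers the predicted compensation, which is the pattern the response describes as consistent with responsibility diffusion.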

      Many studies have reported positive associations between trait gratitude, social value orientation, and altruistic behavior. It would be helpful if the authors could provide an explanation about why this study failed to replicate these associations.

      Thanks a lot for this important comment. We have now added an explanation into the revised manuscript (Page 47): “Although previous research has found that trait gratitude and SVO are significantly associated with altruistic behavior in contexts such as donation (Van Lange et al., 2007; Yost-Dubrow & Dunham, 2018) and reciprocity (Ma et al., 2017; Yost-Dubrow & Dunham, 2018), their associations with compensatory decisions in the present study were not significant. This suggests that the effects of trait gratitude and SVO on altruistic behavior are context-dependent and may not predict all forms of altruistic behavior.”

      As the authors noted, guilt and shame are closely linked to various psychiatric disorders. It would be valuable to discuss whether this study has any implications for understanding or even informing the treatment of these disorders.

      We are grateful for this advice. Although our study did not directly examine patients with psychological disorders, the findings offer insights into the regulation of guilt and shame. As these emotions are closely linked to various disorders, improving their regulation may help alleviate related symptoms. Accordingly, we have added a paragraph highlighting the potential clinical relevance (Pages 48 and 49): “Our study has potential practical implications. The behavioral findings may help counselors understand how cognitive interventions targeting perceptions of harm and responsibility could influence experiences of guilt and shame. The neural findings highlight specific brain regions (e.g., TPJ) as potential intervention targets for regulating these emotions. Given the close links between guilt, shame, and various psychological disorders (e.g., Kim et al., 2011; Lee et al., 2001; Schuster et al., 2021), strategies to regulate these emotions may contribute to symptom alleviation. Nevertheless, because this study was conducted with healthy adults, caution is warranted when considering applications to other populations.”

      Reviewer #1 (Recommendations for the authors):

      (1) Would it be interesting to explore other categories of behavior apart from compensatory behavior?

      Thanks a lot for this insightful question. We focused on a classic form of altruistic behavior, compensation. Future studies are encouraged to adapt our paradigm to examine other behaviors associated with guilt and/or shame, such as donation (Xu, 2022), avoidance (Shen et al., 2023), or aggression (Velotti et al., 2014). Please see Page 48: “Future research could combine this paradigm with other cognitive neuroscience methods, such as electroencephalography (EEG) or magnetoencephalography (MEG), and adapt it to investigate additional behaviors linked to guilt and shame, including donation (Xu, 2022), avoidance (Shen et al., 2023), and aggression (Velotti et al., 2014).”

      (2) Did the computational model account for the position of the block (slider) at the start of each decision-making response (when participants had to decide how to divide the endowment)? Or are anchoring effects not relevant/ not a concern?

      Thank you for this interesting question. In our task, the initial position of the slider was randomized across trials, and participants were explicitly informed of this in the instructions. This design minimized stable anchoring effects across trials, as participants could not rely on a consistent starting point. Although anchoring might still have influenced individual trial responses, we believe it is unlikely that such effects systematically biased our results, since randomization would tend to cancel them out across trials. Additionally, prior research has shown that when multiple anchors are presented, anchoring effects are reduced if the anchors contradict each other (Switzer III & Sniezek, 1991). Therefore, we did not attempt to model potential anchoring effects. Nevertheless, future research could systematically manipulate slider starting positions to directly examine possible anchoring influences. In the revised manuscript, we have added a brief clarification (Page 11): “The initial position of the block was randomized across trials, which helped minimize stable anchoring effects across trials.”

      (3) Was there a real receiver who experienced the shocks and received compensation? I think it is not completely clear in the paper.

      We are sorry for not making this clear enough. The receiver was fictitious and did not actually exist. We have supplemented the Methods section with the following description (Page 12): “We told the participant a cover story that the receiver was played by another college student who was not present in the laboratory at the time. … In fact, the receiver did not actually exist.”

      (4) What was the rationale behind not having participants meet the receiver?

      Thank you for this question. Having participants meet the receiver (i.e., the victim), played by a confederate, might have intensified their guilt and shame and produced a ceiling effect. In addition, the current approach simplified the experimental procedure and removed the need to recruit an additional confederate. These reasons have been added to the Methods section (Page 12): “Not having participants meet the receiver helped prevent excessive guilt and shame that might produce a ceiling effect, while also eliminating the need to recruit an additional confederate.”

      Minor edits:

      (1) Line 49: "the cognitive assessment triggers them", I think a word is missing.

      (2) Line 227: says 'Slide' instead of 'Slider'.

      (3) Lines 867/868: "No brain response showed significant correlation with responsibility-driven guilt sensitivity, harm-driven shame sensitivity, or responsibility-driven shame sensitivity." I think it should be harm-driven guilt sensitivity, responsibility-driven guilt sensitivity, and harm-driven shame sensitivity.

      (4) Supplementary Information Line 12: I think there is a typo ( 'severs' instead of 'serves')

      We sincerely thank you for patiently pointing out these typos. We have corrected them accordingly. 

      (1) “the cognitive assessment triggers them” has been revised to “the cognitive antecedents that trigger them” (Page 2).

      (2) “SVO Slide Measure” has been revised to “SVO Slider Measure” (Page 8).

      (3) “No brain response showed significant correlation with responsibility-driven guilt sensitivity, harm-driven shame sensitivity, or responsibility-driven shame sensitivity.” has been revised to “No brain response showed significant correlation with harm-driven guilt sensitivity, responsibility-driven guilt sensitivity, and harm-driven shame sensitivity.” (Page 35).

      (4) “severs” has been revised to “serves” (see Supplementary Information). In addition, we have carefully checked the entire manuscript to correct any remaining typographical errors.

      Reviewer #2 (Recommendations for the authors):

      The statement that trait gratitude and SVO were measured "for exploratory purposes" would benefit from further clarification regarding the specific questions being explored.

      Thank you for this valuable suggestion. In the revised manuscript, we have illustrated the exploratory purposes (Page 9): “We measured trait gratitude and SVO for exploratory purposes. Previous research has shown that both are linked to altruistic behavior, particularly in donation contexts (Van Lange et al., 2007; Yost-Dubrow & Dunham, 2018) and reciprocity contexts (Ma et al., 2017; Yost-Dubrow & Dunham, 2018). Here, we explored whether they also exert significant effects in a compensatory context.”

      In the Methods section, the authors state: "To confirm the relationships between κ and guilt-driven and shame-driven compensatory sensitivities, we calculated the Pearson correlations between them." However, the Results section reports linear regression results rather than Pearson correlation coefficients, suggesting a possible inconsistency. The authors are advised to carefully check and clarify the analysis approach used.

      We thank you for the careful review and apologize for this mistake. We used a linear mixed-effects regression instead of Pearson correlations for the analysis. The description has been corrected (Page 25): “To confirm the relationships between κ and guilt-driven and shame-driven compensatory sensitivities, we conducted a linear mixed-effects regression. κ was regressed onto guilt-driven and shame-driven compensatory sensitivities, with participant-specific random intercepts and random slopes for each fixed effect included as random effects.”

      A more detailed discussion of how the current findings inform the regulation of guilt and shame would further strengthen the contribution of this study.

      Thank you for this suggestion. We have added a paragraph discussing the implications for the regulation of guilt and shame (Pages 48 and 49): “Our study has potential practical implications. The behavioral findings may help counselors understand how cognitive interventions targeting perceptions of harm and responsibility could influence experiences of guilt and shame. The neural findings highlight specific brain regions (e.g., TPJ) as potential intervention targets for regulating these emotions. Given the close links between guilt, shame, and various psychological disorders (e.g., Kim et al., 2011; Lee et al., 2001; Schuster et al., 2021), strategies to regulate these emotions may contribute to symptom alleviation. Nevertheless, because this study was conducted with healthy adults, caution is warranted when considering applications to other populations.”

      As fMRI provides only correlational evidence, establishing a causal link between neural activity and guilt- or shame-related cognition and behavior would require brain stimulation or other intervention-based methods. This may represent a promising direction for future research.

      Thank you for this advice. We also agree that it is important for future research to establish the causal relationships between the observed brain activity, psychological processes, and behavior. We have added a corresponding discussion in the revised manuscript (Pages 47 and 48): “… fMRI cannot establish causality. Future studies using brain stimulation techniques (e.g., transcranial magnetic stimulation) are needed to clarify the causal role of brain regions in guilt-driven and shame-driven altruistic behavior.”

      Reviewer #3 (Recommendations for the authors):

      It was mentioned that emotions beyond guilt and shame, such as indebtedness, may also drive compensation. Were any additional types of emotion measured in the study?

      Thank you for this question. We did not explicitly measure emotions other than guilt and shame. However, the parameter κ from our winning computational model captures the combined influence of various psychological processes on compensation, which may reflect the impact of emotions beyond guilt and shame (e.g., indebtedness). We acknowledge that measuring other emotions similar to guilt and shame may help to better understand their distinct contributions. This point has been added into the revised manuscript (Page 48): “… we did not explicitly measure emotions similar to guilt and shame (e.g., indebtedness), which would have been helpful for understanding their distinct contributions.”

      The experimental task is complicated, raising the question of whether participants fully understood the instructions. For instance, one participant's compensation amount was zero. Could this reflect a misunderstanding of the task instructions?

      Thanks a lot for this question. In our study, after reading the instructions, participants were required to complete a comprehension test on the experimental rules. If they made any mistakes, the experimenter provided additional explanations. Only after participants fully understood the rules and correctly answered all comprehension questions did they proceed to the main experimental task. We have clarified this procedure in the revised manuscript (Page 13): “Participants did not proceed to the interpersonal game until they had fully understood the experimental rules and passed a comprehension test.”

      Making identical choices across different trials does not necessarily indicate that participants misunderstood the rules. Similar patterns, where participants made the same choices across trials, have also been observed in previous studies (Zhong et al., 2016; Zhu et al., 2021).

      Reference

      Cohen, T. R., Wolf, S. T., Panter, A. T., & Insko, C. A. (2011). Introducing the GASP scale: a new measure of guilt and shame proneness. Journal of Personality and Social Psychology, 100(5), 947–966. https://doi.org/10.1037/a0022641

      Esterman, M., Tamber-Rosenau, B. J., Chiu, Y. C., & Yantis, S. (2010). Avoiding nonindependence in fMRI data analysis: Leave one subject out. NeuroImage, 50(2), 572–576. https://doi.org/10.1016/j.neuroimage.2009.10.092

      Kim, S., Thibodeau, R., & Jorgensen, R. S. (2011). Shame, guilt, and depressive symptoms: A meta-analytic review. Psychological Bulletin, 137(1), 68. https://doi.org/10.1037/a0021466

      Lee, D. A., Scragg, P., & Turner, S. (2001). The role of shame and guilt in traumatic events: A clinical model of shame-based and guilt-based PTSD. British Journal of Medical Psychology, 74(4), 451–466. https://doi.org/10.1348/000711201161109

      Ma, L. K., Tunney, R. J., & Ferguson, E. (2017). Does gratitude enhance prosociality?: A meta-analytic review. Psychological Bulletin, 143(6), 601–635. https://doi.org/10.1037/bul0000103

      Michl, P., Meindl, T., Meister, F., Born, C., Engel, R. R., Reiser, M., & Hennig-Fast, K. (2014). Neurobiological underpinnings of shame and guilt: A pilot fMRI study. Social Cognitive and Affective Neuroscience, 9(2), 150–157.

      Schuster, P., Beutel, M. E., Hoyer, J., Leibing, E., Nolting, B., Salzer, S., Strauss, B., Wiltink, J., Steinert, C., & Leichsenring, F. (2021). The role of shame and guilt in social anxiety disorder. Journal of Affective Disorders Reports, 6, 100208. https://doi.org/10.1016/j.jadr.2021.100208

      Shen, B., Chen, Y., He, Z., Li, W., Yu, H., & Zhou, X. (2023). The competition dynamics of approach and avoidance motivations following interpersonal transgression. Proceedings of the National Academy of Sciences, 120(40), e2302484120. https://doi.org/10.1073/pnas.2302484120

      Switzer III, F. S., & Sniezek, J. A. (1991). Judgment processes in motivation: Anchoring and adjustment effects on judgment and behavior. Organizational Behavior and Human Decision Processes, 49(2), 208–229. https://doi.org/10.1016/0749-5978(91)90049-Y

      Van Lange, P. A. M., Bekkers, R., Schuyt, T. N. M., & Van Vugt, M. (2007). From games to giving: Social value orientation predicts donations to noble causes. Basic and Applied Social Psychology, 29(4), 375–384. https://doi.org/10.1080/01973530701665223

      Velotti, P., Elison, J., & Garofalo, C. (2014). Shame and aggression: Different trajectories and implications. Aggression and Violent Behavior, 19(4), 454–461. https://doi.org/10.1016/j.avb.2014.04.011

      Wagner, U., N’Diaye, K., Ethofer, T., & Vuilleumier, P. (2011). Guilt-specific processing in the prefrontal cortex. Cerebral Cortex, 21(11), 2461–2470. https://doi.org/10.1093/cercor/bhr016

      Wu, X., Ren, X., Liu, C., & Zhang, H. (2024). The motive cocktail in altruistic behaviors. Nature Computational Science, 4, 659–676. https://doi.org/10.1038/s43588-024-00685-6

      Xu, J. (2022). The impact of guilt and shame in charity advertising: The role of self-construal. Journal of Philanthropy and Marketing, 27(1). https://doi.org/10.1002/nvsm.1709

      Yost-Dubrow, R., & Dunham, Y. (2018). Evidence for a relationship between trait gratitude and prosocial behaviour. Cognition and Emotion, 32(2), 397–403. https://doi.org/10.1080/02699931.2017.1289153

      Yu, H., Gao, X., Zhou, Y., & Zhou, X. (2018). Decomposing gratitude: Representation and integration of cognitive antecedents of gratitude in the brain. Journal of Neuroscience, 38(21), 4886–4898. https://doi.org/10.1523/JNEUROSCI.2944-17.2018

      Zhong, S., Chark, R., Hsu, M., & Chew, S. H. (2016). Computational substrates of social norm enforcement by unaffected third parties. NeuroImage, 129, 95–104. https://doi.org/10.1016/j.neuroimage.2016.01.040

      Zhu, R., Feng, C., Zhang, S., Mai, X., & Liu, C. (2019). Differentiating guilt and shame in an interpersonal context with univariate activation and multivariate pattern analyses. NeuroImage, 186, 476–486. https://doi.org/10.1016/j.neuroimage.2018.11.012

      Zhu, R., Xu, Z., Su, S., Feng, C., Luo, Y., Tang, H., Zhang, S., Wu, X., Mai, X., & Liu, C. (2021). From gratitude to injustice: Neurocomputational mechanisms of gratitude-induced injustice. NeuroImage, 245, 118730. https://doi.org/10.1016/j.neuroimage.2021.118730

    1. eLife Assessment

      This Review Article provides a timely review of how the extracellular matrix (ECM), particularly the vascular basement membrane, regulates leukocyte extravasation, migration, and downstream immune function, with a focus on monocytes/macrophages. It integrates molecular, mechanical, and spatial aspects of ECM biology in the context of inflammation, drawing from recent advances.

    2. Reviewer #1 (Public review):

      Summary:

      In this review, the author covered several aspects of the inflammation response, mainly focusing on the mechanisms controlling leukocyte extravasation and inflammation resolution.

      Strengths:

      This review is based on an impressive number of sources, trying to comprehensively present a very broad and complex topic. The revised version strengthens the connection with the ECM and all sections are now better integrated.

    3. Reviewer #2 (Public review):

      Summary:

      The manuscript is a timely and comprehensive review of how the extracellular matrix (ECM), particularly the vascular basement membrane, regulates leukocyte extravasation, migration, and downstream immune function. It integrates molecular, mechanical, and spatial aspects of ECM biology in the context of inflammation, drawing from recent advances. The framing of ECM as an active instructor of immune cell fate is a conceptual strength.

      Strengths:

      • Comprehensive synthesis of ECM functions across leukocyte extravasation and post-transmigration activity.
      • Incorporation of recent high-impact findings alongside classical literature.
      • Conceptually novel framing of ECM as an active regulator of immune function.
      • Effective integration of molecular, mechanical, and spatial perspectives.

      Weaknesses:

      • Some sections remain dense with signalling detail.
      • Figure readability could be improved through simplified labeling.

      Appraisal and Impact:

      The authors have achieved their aim of presenting an integrated view of ECM-immune interactions. The review provides conceptual and visual clarity on a complex topic.

    4. Reviewer #3 (Public review):

      Summary & Strengths:

      This review by Yu-Tung Li sheds new light on the processes involved in leukocyte extravasation, with a focus on the interaction between leukocytes and the extracellular matrix. In doing so, it presents a fresh perspective on the topic of leukocyte extravasation, which has been extensively covered in numerous excellent reviews. Notably, the role of the extracellular matrix in leukocyte extravasation has received relatively little attention until recently. This review synthesizes the substantial knowledge accumulated over the past two decades in a novel and compelling manner.

      The author discusses the relevant barriers leukocytes face during extravasation, addresses interactions with and transmigration through endothelial junctions, mechanisms supporting extravasation, and how minimal plasma leakage is achieved during this process. The question of whether extravasation affects leukocyte differentiation and properties is original and thought-provoking and has received limited consideration thus far. The consequences of leukocyte-extracellular matrix interaction, non-linear responses to substrate stiffness, and effects on macrophage polarization, efferocytosis, and the outcome of inflammation are relevant topics raised. Finally, a unifying descriptive framework, MIKA, is introduced, which provides a tool for classifying macrophages based on their expression patterns and could inform the development of targeted therapies aimed at modulating macrophage identity and improving outcomes in inflammatory scenarios.

      In summary, this review provides a stimulating perspective on leukocyte extravasation in the context of extracellular matrix biology.

      Weaknesses:

      One potential drawback of this review is that the attempt to integrate a vast amount of information has resulted in complex figures, which may lead to important details being overlooked by readers.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this review, the author covered several aspects of the inflammation response, mainly focusing on the mechanisms controlling leukocyte extravasation and inflammation resolution.

      Strengths:

      This review is based on an impressive number of sources, trying to comprehensively present a very broad and complex topic.

      Weaknesses:

      (1) This reviewer feels that, despite the title, this review is quite broad and not centred on the role of the extracellular matrix.

      Since this review focuses on the whole extravasation journey of leukocytes, the topic is necessarily broad and covers several related fields. The article highlights the involvement of extracellular matrices (ECM), which are important regulators in multiple phases of the process, as a common theme that threads these related topics together. In the revised manuscript, we have placed further emphasis on the role of specific ECM where appropriate (see point 2 below) and reorganized the last section to fit this theme (see point 3 below).

      (2) The review will benefit from a stronger focus on the specific roles of matrix components and dynamics, with more informative subheadings.

      ECM may exert their roles either as a collective structure or as individual components. In the latter case, though the ECM concerned are specifically named throughout the manuscript, they may not have been sufficiently obvious since they were often not mentioned in subheadings. For sections discussing functions of a specific ECM protein, or at least a specific class of ECM proteins, we have now included their names in the subheadings as well for clarity (sections 5 and 8). For other sections discussing functions that involve ECM as a macrostructure, either in the form of the vascular basement membrane to enable force generation or contributing to the overall tissue stiffness to provide biophysical cues (sections 7, 9-10), we have included the specific processes regulated in the subheadings, as in section 4.

      In the newly added discussion about the effects of matrikines on lymphocytes, we have also focused on the roles of specific ECM (PGP and versican; line 396-408). We hope these measures have made the subheadings more informative and provided better clarity of the roles of specific ECM components.

      (3) The macrophage phenotype section doesn't seem well integrated with the rest of the review (and is not linked to the ECM).

      Sections 10-11 concern how macrophage phenotypes affect the tissue fate following inflammation, that is, either to resolve inflammation and repair the damage incurred or to sustain inflammation. This fate decision is an important aspect of this review: by furthering our understanding of the processes and mechanisms involved, we hope to gain the capability to properly control tissue outcomes in inflammatory diseases.

      In section 10, emphasis is placed on macrophage efferocytosis, for its documented efficiency in resolving tissue inflammation. Specific ECM components (type-V collagens and α2-laminins) could directly promote macrophage efferocytosis (line 494-499). On the other hand, changes in tissue stiffness, as a result of ECM turnover regulated by activities of leukocytes or other cell types like fibroblasts as described in section 9, also affect efferocytosis (line 504-507).

      We acknowledge that section 11 did not integrate well with the rest of the review; this section has now been restructured. First, we describe how ECM-regulated efferocytosis may be leveraged in disease modulation (line 522-529) and the need for a unified system to describe macrophage states for disease modulation (line 527-533), such that the cell states responsible for producing ECM regulators / effectors can be clarified (line 533-535). Given means to control macrophage cell states, this clarification will be useful for modulating pathologies involving ECM malfunction, which might be hinted at by the emergence or expansion of the responsible macrophage states in pathology (line 577-579, 581-585). Next, we provide historical background on efforts to establish such a unified descriptive platform for macrophage states (line 538-548) and describe the recent solution offered by MIKA. MIKA is a pan-tissue archive of tissue macrophage cell states based on meta-analysis of published single-macrophage transcriptomes. We have described its establishment, its latest development (Supplementary Data 1-4), and how the complex tissue macrophage states are segmented into core and tissue-specific identities under this framework (line 548-560, Figure 5A). Under this identity framework, expression of the different ECM regulators discussed in this review (the ECM per se, fibroblastic growth factors, or proteases or protease inhibitors that regulate ECM turnover or matrikine production) is examined and linked to specific macrophage identities to offer insights into their potential relevance in pathologies (line 561-586, Figure 5B).

      (4) Table 1 is difficult to follow. It could be reformatted to facilitate reading and understanding

      We apologize for the complex setup. Table 1 is now reformatted to horizontal orientation to have enough space for the columns and reorganized for much easier comprehension.

      (5) Figure 2 appears very complex and broad.

      The original Figure 2 is now split into 2 separate figures (Figure 3-4). Since many processes of diverse natures influence the tissue decision between resolution and inflammation, Figure 3 serves to outline and summarise these processes. Figure 4 now focuses on the regulation and tissue-resolving roles of macrophage efferocytosis, and on how specific ECM components (type-V collagens and α2-laminins) or tissue stiffness contribute to acquisition of this cell state. We hope this split better focuses the messages and eases understanding.

      (6) Spelling and grammar should be thoroughly checked to improve the readability.

      The manuscript is now proofread again, with corrections made throughout the text.

      Reviewer #2 (Public review):

      Summary:

      The manuscript is a timely and comprehensive review of how the extracellular matrix (ECM), particularly the vascular basement membrane, regulates leukocyte extravasation, migration, and downstream immune function. It integrates molecular, mechanical, and spatial aspects of ECM biology in the context of inflammation, drawing from recent advances. The framing of ECM as an active instructor of immune cell fate is a conceptual strength.

      Strengths:

      (1) Comprehensive synthesis of ECM functions across leukocyte extravasation and post-transmigration activity.

      (2) Incorporation of recent high-impact findings alongside classical literature.

      (3) Conceptually novel framing of ECM as an active regulator of immune function.

      (4) Effective integration of molecular, mechanical, and spatial perspectives.

      Weaknesses:

      (1) Insufficient narrative linkage between the vascular phase (Sections 2-6) and the in-tissue phase (Sections 7-10).

      A transition paragraph between these two phases is now added between Section 6 and Section 7 to provide a narrative that ECM interaction events during extravasation affect downstream leukocyte functions (line 300-307).

      (2) Underrepresentation of lymphocyte biology despite mention in early sections.

      Although lymphocytes follow a similar extravasation principle as described in earlier sections, their in-tissue activities differ considerably from those of innate leukocytes. Discussion of crosstalk amongst T cells, innate leukocytes and matrikines is now incorporated into section 8 (line 396-408). Functional effects of tissue stiffness on different T cell subsets are now discussed in section 9 (line 456-469).

      (3) The MIKA macrophage identity framework is only loosely tied to ECM mechanisms.

      Section 11 is now restructured to integrate better with the ECM topics, and the associated Figure 3 is changed to Figure 5. Specifically, under the MIKA framework, we have now linked specific macrophage identities to the expression / production of ECM functional effectors or regulators discussed in this review, to highlight their regulatory roles and potential relevance in pathologies. Reviewers #1 and #3 have also raised this issue; please refer to the response to point (3) of reviewer #1 for a detailed description.

      (4) Limited discussion of translational implications and therapeutic strategies.

      Besides translational implications or therapeutic strategies included in the original manuscript (line 291-298, 375-377, 421-424, 427-429, 508-511, 512-516 of the current manuscript), we have now included additional discussion to enrich these aspects (line 356-358, line 396-398, 402-403, 428, 436-439, 467-469, 523-536, 579-586).

      (5) Overly dense figure insets and underdeveloped links between ECM carryover and downstream immune phenotypes.

      The original Figure 1 containing the insets is now split into Figure 1-2 to avoid fitting overly dense information into a single figure and to better focus the message of each figure. To resolve the issue of overly dense insets, the insets in Figure 1 are redrawn / reorganized. The original Figure 1C is moved to Figure 2A. The inset showing platelet plugging, together with the issue of diapedesis overloading described in the original Figure 1B, is reorganized into Figure 2B. In this way, Figure 1 focuses on the vascular barrier organization, the overview of extravasation, and the force-related events during endothelial junctional remodelling. Figure 2 focuses on the low expression regions and the junctional sealing processes after diapedesis.

      We have now expanded discussion on ECM carryovers and their reported or implicated effects on downstream leukocyte functions (line 329-335).

      (6) Acronyms and some mechanistic details may limit accessibility for a broader readership.

      A glossary explaining specialized terms that may be confusing to readers of different fields is now included as Appendix 1 to broaden accessibility (line 977).

      Reviewer #3 (Public review):

      Summary & Strengths:

      This review by Yu-Tung Li sheds new light on the processes involved in leukocyte extravasation, with a focus on the interaction between leukocytes and the extracellular matrix. In doing so, it presents a fresh perspective on the topic of leukocyte extravasation, which has been extensively covered in numerous excellent reviews. Notably, the role of the extracellular matrix in leukocyte extravasation has received relatively little attention until recently, with a few exceptions, such as a study focusing on the central nervous system (J Inflamm 21, 53 (2024) doi.org/10.1186/s12950-024-00426-6) and another on transmigration hotspots (J Cell Sci (2025) 138 (11): jcs263862 doi.org/10.1242/jcs.263862). This review synthesizes the substantial knowledge accumulated over the past two decades in a novel and compelling manner.

      The author dedicates two sections to discussing the relevant barriers, namely, endothelial cell-cell junctions and the basement membrane. The following three paragraphs address how leukocytes interact with and transmigrate through endothelial junctions, the mechanisms supporting extravasation, and how minimal plasma leakage is achieved during this process. The subsequent question of whether the extravasation process affects leukocyte differentiation and properties is original and thought-provoking, having received limited consideration thus far. The consequences of the interaction between leukocytes and the extracellular matrix, particularly regarding efferocytosis, macrophage polarization, and the outcome of inflammation, are explored in the subsequent three chapters. The review concludes by examining tissue-specific states of macrophage identity.

      Weaknesses:

      Firstly, the first ten sections provide a comprehensive overview of the topic, presenting logical and well-formulated arguments that are easily accessible to a general audience. In stark contrast, the final section (Chapter 11) fails to connect coherently with the preceding review and is nearly incomprehensible without prior knowledge of the author's recent publication in Cell. Mol. Life Sci. CMLS 82, 14 (2024). This chapter requires significantly more background information for the general reader, including an introduction to the Macrophage Identity Kinetics Archive (MIKA), which is not even introduced in this review, its basis (meta-analysis of published scRNA-seq data), its significance (identification of major populations), and the reasons behind the revision of the proposed macrophage states and their further development.

      The issue of section 11 not being well integrated with the rest of the review has also been pointed out by other reviewers. In response, this section and the associated Figure 3 are now restructured for better integration with the theme of ECM. In brief, we have now discussed the regulatory roles of specific macrophage identities under the MIKA framework on the ECM regulators described in this review. Please refer to the response to point (3) of reviewer #1 for further details.

      Regarding the difficulty of understanding the MIKA framework without prior knowledge of our previous work, we first thank the reviewer for pointing out this issue and for suggesting how to introduce the framework in a way that is easy to comprehend. Accordingly, in the current structure of section 11, we have described the rationale behind the need for a common descriptive platform for tissue macrophage states (line 523-536) and previous historical efforts (line 538-548), introduced MIKA with mention of its establishment and significance (line 548-555), and explained the rationale behind its further development (line 555-560).

      Secondly, while the attempt to integrate a vast amount of information into fewer figures is commendable, it results in figures that resemble a complex puzzle. The author may consider increasing the number of figures and providing additional, larger "zoom-in" panels, particularly for the topics of clot formation at transmigration hotspots and the interaction between ECM/ECM fragments and integrins. Specifically, the color coding (purple for leukocyte α6-integrins, blue for interacting laminins, also blue for EC α6 integrins, and red for interacting 5-1-1 laminins) is confusing, and the structures are small and difficult to recognize.

      We apologize for the figures being too dense. Other reviewers have also raised this issue (see response to point (5) of reviewer #2 and response to point (5) of reviewer #1). The original Figure 1 and 2 are now reorganized into Figure 1-2 and 3-4 respectively, with insets also redrawn / expanded. Figure 1 now focuses on the vascular barrier organization, the overview of extravasation, and the force-related events during endothelial junctional remodelling. Figure 2 focuses on the low expression regions and the junctional sealing processes after diapedesis. Figure 3 serves to outline and summarise the diverse processes influencing the tissue decision between resolution and inflammation. Figure 4 focuses on the regulation and tissue-resolving roles of macrophage efferocytosis. The original Figure 3, mainly concerning the methodological aspects of the MIKA update, is now integrated into Supplementary Data 1. It is replaced by Figure 5, which concerns the specific macrophage identities producing the ECM effectors / regulators discussed in this review.

      The colour-coding issue concerns what is now Figure 2A. All integrins are now in sky blue and all laminins in red. VE-Cad is also in red but has a different size and shape than laminins. We hope these modifications have improved the figures and avoid confusion.

      Recommendations for the authors:

      As you will see, the reviewers thought your manuscript was interesting and timely. However, as part 11 and its corresponding Figure 3 seem somewhat detached from the rest of the manuscript, one recommendation would be to remove this part for improved clarity. Other recommendations can be found in the comments below.

      Reviewer #2 (Recommendations for the authors):

      (1) Improve narrative linkage between vascular extravasation (Sections 2-6) and in-tissue leukocyte activities (Sections 7-10) by adding explicit transition text that connects ECM changes during transmigration to downstream immune cell phenotypes.

      A transition paragraph is now added between section 6 and 7 (line 300-307).

      (2) Expand discussion of lymphocyte-ECM interactions, either within existing sections or as a dedicated subsection.

      We have now added discussion of the effects of matrikines on in vivo T cell traffic (line 396-409) and of how T cell functions are regulated by tissue stiffness (line 457-466).

      (3) Strengthen integration of the MIKA macrophage identity framework with ECM-specific drivers (e.g., stiffness, matrikines) and reduce methodological detail in Fig. 3 to focus on biological relevance.

      We thank the reviewer for this recommendation and have adopted it accordingly. First, the methodological details in the original Fig. 3 are now integrated into Supplementary Data 1. This figure is now replaced by Fig. 5, which serves to examine the contribution of different macrophage identities to the ECM effectors / regulators (specifically, ECM per se, growth factors for ECM-producing fibroblasts, proteases and protease inhibitors) discussed in earlier sections. Relevant texts are on line 561-586.

      (4) Consider adding a glossary of key terms (e.g., matrikines, efferocytosis) to aid accessibility.

      A glossary explaining selected terms that may be confusing to the general readership is now added as Appendix 1 (line 977).

      Reviewer #3 (Recommendations for the authors):

      The discussion of fibrosis as a significant consequence of inflammatory activity is currently limited to skin keloids and bleomycin-induced lung fibrosis. Considering the substantial clinical relevance, it would be beneficial to include a mention of the various forms of liver fibrosis resulting from chronic inflammation.

      Liver cirrhosis is now mentioned as a further example of stiffening tissues on line 428, 436-439.

      While the manuscript is generally well-written, there are several minor language issues that could be easily addressed by a native speaker during revisions. Some examples are listed below:

      We thank the reviewer for these very helpful suggestions. They are adopted with the relevant line number in the revised manuscript indicated below. In addition, the manuscript is proofread again, with other grammatical mistakes corrected throughout the text.

      (1) Line 40: ... proliferative pathogen, can be timely eliminated.

      line 40

      (2) Line 79: It may be worthwhile pointing out that while Claudin 5 expression is highest in the BBB, it is also relevant in the BRB and expressed at lower levels in peripheral ECs. Similarly, ZO-1 is widely found to be expressed in peripheral endothelial cells.

      Thanks for pointing out this caveat; it is now mentioned on line 79-82.

      (3) Line 82: affects leukocyte traffic and...

      line 84

      (4) Line 125: ..., both neutrophil and lymphocyte extravasation were reduced by ~60%

      line 125-126

      (5) Line 128: The term "paracellular endothelial junction" is odd, as junctions are per se paracellular, i.e., between cells.

      line 129

      (6) Line 147: ... VE-Cadherin, in which the FRET signal vanishes.

      line 148

      (7) Line 186: "activation by direct leukocyte pressing" might be rephrased to be clearer, e.g. "it might as well be activated by mechanical force exerted by leukocytes like it is the case for Piezo-1."

      line 185-186

      (8) Line 216: The phrasing "knockout analogy" is somewhat unfortunate. I would suggest "...a4 ko mice consequently largely lack a5 low expression regions and the resulting reduction in leukocyte extravasation confirms the facilitating role of the low a5 expression regions."

      line 217-218

      (9) Line 219: ...how the low expression regions form / are formed in the first place... The term construction implies active planning.

      line 220

      (10) Line 278: ... thrombocytopenic mice ...

      line 279

      (11) Line 294: ... use platelets as a drug delivery vehicle ...

      line 295

      (12) Line 304: instead of "could have changed", use "might change"

      line 315

      (13) Line 320: at the level of the monocyte

      line 336-337

      (14) Line 324: ... consistent with ...

      line 340

      (15) Line 335: ... progenitors

      line 351

      (16) Line 432: ... a considerable number of apoptotic neutrophils has (been) accumulated

      line 480

      (17) Line 442: ..., which promote killing responses, cross activate other leukocytes ..., or reduce tissue availability...

      line 490-491

      (18) Line 453: ...This macrophage is responsive to BMP...

      This sentence is now rephrased on line 500-501.

      (19) Line 454: ...involved in forming S1 macrophages.

      line 502

      (20) Line 476: ...numerous pathologies...

      Points (20-22) concern Section 11, which is now restructured (line 523-586).

      (21) Line 492: ...macrophages acquiring phenotypes specific to their residence tissue.

      (22) Line 498: ...either - the tissue macrophage is of heterogeneous nature... or - tissue macrophages are of heterogeneous nature...

    1. eLife Assessment

      This important study explored a number of issues related to citations in the peer review process. An analysis of more than 37000 peer reviews at four journals found that: i) during the first round of review, reviewers were less likely to recommend acceptance if the article under review cited the reviewer's own articles; ii) during the second and subsequent rounds of review, reviewers were more likely to recommend acceptance if the article cited the reviewer's own articles; iii) during all rounds of review, reviewers who asked authors to cite the reviewer's own articles (a practice known as 'coercive citation') were less likely to recommend acceptance. However, when an author agreed to cite work by the reviewer, the reviewer was more likely to recommend acceptance of the revised article. The evidence is convincing, and while the revisions made by the author have addressed most of the concerns the reviewers had about the original version, a small number of concerns remain.

    2. Reviewer #1 (Public review):

      Summary:

      The work used open peer reviews and followed them through a succession of reviews and author revisions. It assessed whether a reviewer had requested that the author include additional citations and references to the reviewer's work. It then assessed whether the author had followed these suggestions and what the probability of acceptance was based on the author's decision. Reviewers who were cited were more likely to recommend the article for publication when compared with reviewers that were not cited. Reviewers who requested and received a citation were much more likely to accept than reviewers that requested and did not receive a citation.

      Strengths and weaknesses:

      The work's strengths are the in-depth and thorough statistical analysis it contains and the very large dataset it uses. The methods are robust and reported in detail.

      I am still concerned that there is a major confounding factor: if you ignore the reviewer's requests for citations, are you more likely to have ignored all their other suggestions too? This has now been mentioned briefly and slightly circuitously in the limitations section. I would still like this (I think) major limitation to be given more consideration and discussion, although I am happy that it cannot be addressed directly in the analysis.

    3. Reviewer #2 (Public review):

      Summary:

      This article examines reviewer coercion in the form of requesting citations to the reviewer's own work as a possible trade for acceptance and shows that, under certain conditions, this happens.

      Strengths:

      The methods are well done and the results support the conclusions that some reviewers "request" self-citations and may be making acceptance decisions based on whether an author fulfills that request.

      Weakness:

      I thank the author for addressing my comments about the original version.

    4. Reviewer #3 (Public review):

      Summary:

      In this article, Barnett examines a pressing question regarding citing behavior of authors during the peer review process. In particular, the author studies the interaction between reviewers and authors, focusing on the odds of acceptance, and how this may be affected by whether or not the authors cited the reviewers' prior work, whether the reviewer requested such citations be added, and whether the authors complied/how that affected the reviewer decision-making.

      Strengths:

      The author uses a clever analytical design, examining four journals that use the same open peer review system, in which the identities of the authors and reviewers are both available and linkable to structured data. Categorical information about the approval is also available as structured data. This design allows a large scale investigation of this question.

      Weaknesses:

      My original concerns have been largely addressed. Much more detail is provided about the number of documents under consideration for each analysis, which clarifies a great deal.

      Much of the observed reviewer behavior disappears or has much lower effect sizes depending on whether "Accept with Reservations" is considered an Accept or a Reject. This is acknowledged in the results text. Language has been toned down in the revised version.

      The conditional analysis on the 441 reviews (lines 224-228) does support the revised interpretation as presented.

      No additional concerns are noted.

    5. Reviewer #4 (Public review):

      Summary:

      This work investigates whether a citation to a referee made by a paper is associated with a more positive evaluation by that referee for that paper. It provides evidence supporting this hypothesis. The work also investigates the role of self-citations by referees where the referee would ask authors to cite the referee's paper.

      Strengths:

      This is an important problem: referees for scientific papers must provide their impartial opinions rooted in core scientific principles. Any undue influence due to the role of citations breaks this requirement. This work studies the possible presence and extent of this.

      The methods are solid and well done. The work uses a matched pair design which controls for article-level confounding and further investigates robustness to other potential confounds.

      Weaknesses:

      The authors have addressed most concerns in the initial review. The only remaining concern is the asymmetric reporting and highlighting of version 1 (null result) versus version 2 (rejecting null). For example the abstract says "We find that reviewers who were cited in the article under review were more likely to recommend approval, but only after the first version (odds ratio = 1.61; adjusted 99.4% CI: 1.16 to 2.23)" instead of a symmetric sentence "We find ... in version 1 and ... in version 2"

    6. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The work used open peer reviews and followed them through a succession of reviews and author revisions. It assessed whether a reviewer had requested that the author include additional citations and references to the reviewer's work. It then assessed whether the author had followed these suggestions and what the probability of acceptance was based on the author's decision.

      Strengths and weaknesses:

      The work's strengths are the in-depth and thorough statistical analysis it contains and the very large dataset it uses. The methods are robust and reported in detail. However, this is also a weakness of the work. Such thorough analysis makes it very hard to read! It's a very interesting paper with some excellent and thought provoking references but it needs to be careful not to overstate the results and improve the readability so it can be disseminated widely. It should also discuss more alternative explanations for the findings and, where possible, dismiss them.

      I have toned down the language including a more neutral title. To help focus on the main results, I have moved four paragraphs from the methods to the supplement. These are the sample size, the two sensitivity analyses on including co-reviewers and confounding by reviewers’ characteristics, and the analysis examining potential bias for the reviewers with no OpenAlex record.

      Reviewer #2 (Public review):

      Summary:

      This article examines reviewer coercion in the form of requesting citations to the reviewer's own work as a possible trade for acceptance and shows that, under certain conditions, this happens.

      Strengths:

      The methods are well done and the results support the conclusions that some reviewers "request" self-citations and may be making acceptance decisions based on whether an author fulfills that request.

      Weaknesses:

      The author needs to be clearer about the fact that, in some instances, requests for self-citations by reviewers are important and valuable.

      This is a key point. I have included a new text analysis to examine this issue and have addressed this in the updated discussion.

      Reviewer #3 (Public review):

      Summary:

      In this article, Barnett examines a pressing question regarding citing behavior of authors during the peer review process. In particular, the author studies the interaction between reviewers and authors, focusing on the odds of acceptance, and how this may be affected by whether or not the authors cited the reviewers' prior work, whether the reviewer requested such citations be added, and whether the authors complied/how that affected the reviewer decision-making.

      Strengths:

      The author uses a clever analytical design, examining four journals that use the same open peer review system, in which the identities of the authors and reviewers are both available and linkable to structured data. Categorical information about the approval is also available as structured data. This design allows a large scale investigation of this question.

      Weaknesses:

      My concerns pertain to the interpretability of the data as presented and the overly terse writing style.

      Regarding interpretability, it is often unclear what subset of the data are being used both in the prose and figures. For example, the descriptive statistics show many more Version 1 articles than Version 2+. How are the data subset among the different possible methods?

      I have now included the number of articles and reviews in the legends of each plot. There are more version 1 articles because some are “approved” at this stage and hence a second version is never submitted (I’ve now specifically mentioned this in the discussion).

      Likewise, the methods indicate that a matching procedure was used comparing two reviewers for the same manuscript in order to control for potential confounds. However, the number of reviews is less than double the number of Version 1 articles, making it unclear which data were used in the final analysis. The methods also state that data were stratified by version. This raises a question about which articles/reviews were included in each of the analyses. I suggest spending more space describing how the data are subset and stratified. This should include any conditional subsetting as in the analysis on the 441 reviews where the reviewer was not cited in Version 1 but requested a citation for Version 2. Each of the figures and tables, as well as statistics provided in the text should provide this information, which would make this paper much more accessible to the reader.

      [Note from editor: Please see "Editorial feedback" for more on this]

      The numbers are now given in every figure legend, and show the larger sample size for the first versions.

      The analysis of the 441 reviews was an unplanned analysis that is separate to the planned models. The sample size is much smaller than the main models due to the multiple conditions applied to the reviewers: i) reviewed both versions, ii) not cited in first version, iii) requested a self-citation in their first review.

      Finally, I would caution against imputing motivations to the reviewers, despite the important findings provided here. This is because the data as presented suggest a more nuanced interpretation is warranted. First, the author observes similar patterns of accept/reject decisions whether the suggested citation is a citation to the reviewer or not (Figs 3 and 4). Second, much of the observed reviewer behavior disappears or has much lower effect sizes depending on whether "Accept with Reservations" is considered an Accept or a Reject. This is acknowledged in the results text, but largely left out of the discussion. The conditional analysis on the 441 reviews mentioned above does support a more cautious version of the conclusion drawn here, especially when considered alongside the specific comments left by reviewers that were mentioned in the results and information in Table S.3. However, I recommend toning the language down to match the strength of the data.

      I have used more cautious language throughout, including a new title. The new text analysis presented in the updated version also supports a more cautious approach.

      Reviewer #4 (Public review):

      Summary:

      This work investigates whether a citation to a referee made by a paper is associated with a more positive evaluation by that referee for that paper. It provides evidence supporting this hypothesis. The work also investigates the role of self citations by referees where the referee would ask authors to cite the referee's paper.

      Strengths:

      This is an important problem: referees for scientific papers must provide their impartial opinions rooted in core scientific principles. Any undue influence due to the role of citations breaks this requirement. This work studies the possible presence and extent of this.

      Barring a few issues discussed below, the methods are solid and well done. The work uses a matched pair design which controls for article-level confounding and further investigates robustness to other potential confounds.

      It is surprising that even in these investigated journals where referee names are public, there is prevalence of such citation-related behaviors.

      Weaknesses:

      Some overall claims are questionable:

      "Reviewers who were cited were more likely to approve the article, but only after version 1" It also appears that referees who were cited were less likely to approve the article in version 1. This null or slightly negative effect undermines the broad claim of citations swaying referees. The paper highlights only the positive results while not including the absence (and even reversal) of the effect in version 1 in its narrative.

      The reversed effect for version 1 is interesting, but the adjusted 99.4% confidence interval includes 1 and hence it’s hard to be confident that this is genuinely in the reverse direction. However, it is certainly far from the strongly positive association for versions 2+.

      "To the best of our knowledge, this is the first analysis to use a matched design when examining reviewer citations" Does not appear to be a valid claim based on the literature reference [18]

      This previous paper used a matched design but then did not use a matched analysis. Hence, I’ve changed the text in my paper to “first analysis to use a matched design and analysis”. This may seem a minor claim of novelty, but not using a matched analysis for matched data could discard much of the benefit of the matching.

      It will be useful to have a control group in the analysis associated to Figure 5 where the control group comprises matched reviews that did not ask for a self citation. This will help demarcate words associated with approval under self citation (as compared to when there is no self citation). The current narrative appears to suggest an association of the use of these words with self citations but without any control.

      Thanks for this useful suggestion. I have added a control group of reviewers who requested citations to articles other than their own. The words used were very similar to those in the previous analysis, hence I’ve needed to reinterpret the results from the text analysis, as “please” and “need” are not exclusively used by those requesting self-citations. I also fixed a minor error in the text analysis concerning the exclusion of abstracts shorter than 100 characters.

      More discussion on the recommendations will help:

      For the suggestion that "the reviewers initially see a version of the article with all references blinded and no reference list" the paper says "this involves more administrative work and demands more from peer reviewers". I am afraid this can also degrade the quality of peer review, given that the research cannot be contextualized properly by referees. Referees may not revert back to all their thoughts and evaluations when references are released afterwards.

      This is an interesting point, but I don’t think it’s certain that this would happen. For example, revisiting the review may provide a fresh perspective and new ideas; this sometimes happens for me when I review the second version of an article. Ideally an experiment is needed to test this approach, as it is difficult to predict how authors and reviewers will react.

      Recommendations for the Authors:

      Editorial feedback:

      I wonder if the article would benefit from a shorter title, such as the one suggested below. However, please feel free to not change the title if you prefer.

      [i] Are peer reviewers influenced by their work being cited (or not)?

      I like the slightly simpler: “Are peer reviewers influenced by their work being cited?”

      [ii] To better reflect the findings in the article, please revise the abstract along the following lines:

      Peer reviewers for journals sometimes write that one or more of their own articles should have been cited in the article under review. In some cases such comments are justified, but in other cases they are not. Here, using a sample of more than 37000 peer reviews for four journals that use open peer review and make all article versions available, we use a matched study design to explore this and other phenomena related to citations in the peer review process. We find that reviewers who were cited in the article under review were less likely to approve the original version of an article compared with reviewers who were not cited (odds ratio = 0.84; adjusted 99.4% CI: 0.69-1.03), but were more likely to approve a revised article in which they were cited (odds ratio = 1.61; adjusted 99.4% CI: 1.16-2.23). Moreover, for all versions of an article, reviewers who asked for their own articles to be cited were much less likely to approve the article compared with reviewers who did not do this (odds ratio = 0.15; adjusted 99.4% CI: 0.08-0.30). However, reviewers who had asked for their own articles to be cited were much more likely to approve a revised article that cited their own articles compared to a revised article that did not (odds ratio = 3.5; 95% CI: 2.0-6.1).

      I have re-written the abstract along the lines suggested. I have not included the finding that cited reviewers were less likely to approve the article due to the adjusted 99.4% interval including 1.

      [iii] The use of the phrase "self-citation" to describe an author citing an article by one of the reviewers is potentially confusing, and I suggest you avoid this phrase if possible.

      I have removed “self-citation” everywhere and instead used “citations to their own articles”.

      [iv] I think the captions for figures 2, 3 and 4 would benefit from rewording to more clearly describe what is being shown in the figure. Please consider revising the caption for figure 2 as follows, and revising the captions for figures 3 and 4 along similar lines. Please also consider replotting some of the panels so that the values on the horizontal axes of the top panel align with the values on the bottom panel.

      I have aligned the odds and probability axes as suggested which better highlights the important differences. I have updated the figure captions as outlined.

      Figure 2: Odds ratios and probabilities for reviewers giving a more or less favourable recommendation depending on whether they were cited in the article.

      Top left: Odds ratios for reviewers giving a more favourable (Approved) or less favourable (Reservations or Not approved) recommendation depending on whether they were cited in the article. Reviewers who were cited in version 1 of the article (green) were less likely to make a favourable recommendation (odds ratio = 0.84; adjusted 99.4% CI: 0.69-1.03), but they were more likely to make a favourable recommendation (odds ratio = 1.61; adjusted 99.4% CI: 1.16-2.23) if they were cited in a subsequent version (blue). Top right: Same data as top left displayed in terms of probabilities. From the top, the lines show the probability of a reviewer approving: a version 1 article in which they are not cited (please give mean value and CI); a version 1 article in which they are cited (mean value and CI); a version 2 (or higher) article in which they are not cited (mean value and CI); and a version 2 (or higher) article in which they are cited (mean value and CI).

      Bottom left: Same data as top left except that more favourable is now defined as Approved or Reservations, and less favourable is defined as Not approved. Again, reviewers who were cited in version 1 were less likely to make a favourable recommendation (odds ratio = 0.84; adjusted 99.4% CI: 0.57-1.23), and reviewers who were cited in subsequent versions were more likely to make a favourable recommendation (odds ratio = 1.12; adjusted 99.4% CI: 0.59-2.13).

      Bottom right: Same data as bottom left displayed in terms of probabilities. From the top, the lines show the probability of a reviewer approving: a version 1 article in which they are not cited (please give mean value and CI); a version 1 article in which they are cited (mean value and CI); a version 2 (or higher) article in which they are not cited (mean value and CI); and a version 2 (or higher) article in which they are cited (mean value and CI).

      This figure is based on an analysis of [Please state how many articles, reviewers, reviews etc are included in this analysis].

      In all the panels a dot represents a mean, and a horizontal line represents an adjusted 99.4% confidence interval.
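
      The relationship between the odds-ratio panels and the probability panels described in this caption can be illustrated with a minimal sketch. It assumes a hypothetical baseline approval probability of 0.50, which is not a value reported in the article, and the function names are illustrative only. The odds ratios and adjusted 99.4% intervals are the ones quoted in the caption text above.

```python
def odds_ratio_to_probability(baseline_prob: float, odds_ratio: float) -> float:
    """Return the probability implied by applying an odds ratio to a baseline probability."""
    if not 0.0 < baseline_prob < 1.0:
        raise ValueError("baseline_prob must lie strictly between 0 and 1")
    if odds_ratio <= 0.0:
        raise ValueError("odds_ratio must be positive")
    baseline_odds = baseline_prob / (1.0 - baseline_prob)
    comparison_odds = baseline_odds * odds_ratio
    return comparison_odds / (1.0 + comparison_odds)


def interval_includes_null(lower: float, upper: float, null_value: float = 1.0) -> bool:
    """Return True when a confidence interval for an odds ratio contains the null value."""
    return lower <= null_value <= upper


# Hypothetical baseline: probability that a non-cited reviewer approves.
# This is an assumption for illustration, not a figure from the article.
ASSUMED_BASELINE = 0.50

# (label, odds ratio, lower bound, upper bound) as quoted in the caption.
estimates = [
    ("version 1, cited", 0.84, 0.69, 1.03),
    ("versions 2+, cited", 1.61, 1.16, 2.23),
]

for label, odds_ratio, lower, upper in estimates:
    prob = odds_ratio_to_probability(ASSUMED_BASELINE, odds_ratio)
    includes_one = interval_includes_null(lower, upper)
    print(f"{label}: implied probability {prob:.3f}, interval includes 1: {includes_one}")
```

      Under a different baseline the implied probabilities change while the odds ratios stay the same, which is one reason for plotting both scales on aligned axes.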

      Reviewer #1 (Recommendations for the Authors):

      A big recommendation to the author would be to consider putting a lot of the statistical analysis in an appendix and describing the methods and results in more accessible terms in the main text. This would help more readers see the baby through the bath water

      I have moved four paragraphs from the methods to the supplement. These are the sample size, the two sensitivity analyses on including co-reviewers and confounding by reviewers’ characteristics, and the analysis examining potential bias for the reviewers with no OpenAlex record.

      One possibility, that may have been accounted for, but it is hard to say given the density of the analysis, is the possibility that an author who follows the recommendations to cite the reviewer has also followed all the other reviewer requests. This could account for the much higher likelihood of acceptance. Conversely an author who has rejected the request to cite the reviewer may be more likely to have rejected many of the other suggestions leading to a rejection. I couldn't discern whether the analysis had accounted for this possibility. If it has it needs to be said more prominently, if it hasn't this possibility at least needs to be discussed. It would be good to see other alternative explanations for the results discussed (and if possible dismissed) in the discussion section too.

      This is an interesting idea. It’s also possible that authors more often accept and include any citation requests as it gives them more license to push back on other more involved changes that they would prefer not to make, e.g., running a new analysis. To examine this would require an analysis of the authors’ responses to the reviewers, and I have now added this as a limitation.

      I hope this paper will have an impact on scientific publishing but I fear that it won't. This is no reflection on the paper but more a reflection on the science publishing system.

      I do not have any additional references (written by myself or others!) I would like the author to include

      Thanks. I appreciate that extra thought is needed when peer reviewing papers on peer review. I do not know the reviewers’ names! I have added one additional reference suggested by the reviewers which had relevant results on previous surveys of coercive citations for the section on “Related research”.

      Reviewer #2 (Recommendations for the Authors):

      (1) Would it be possible for the author to control for academic discipline? Some disciplines cite at different rates and have different citation sub-cultures; for example, Wilhite and Fong (2012) show that editorial coercive citation differs among the social science and business disciplines. Is it possible that reviewers from different disciplines just take a totally different view of requesting self-citations?

      Wilhite, A.W., & Fong, E.A. 2012. Coercive citation in academic publishing. Science, 335: 542-543.

      This is an interesting idea, but the number of disciplines would need to be relatively broad to keep a sufficient sample size. The Catch-22 is then whether broad disciplines are different enough to show cultural differences. Overall, this is an idea for future work.

      (2) I would like the author to be much more clear about their results in the discussion section. In line 214, they state that "Reviewers who requested a self-citation were much less likely to approve the article for all versions." Maybe in the discussion some language along the lines of "Although reviewers who requested self-citation were actually much less likely to approve an article, my more detailed analyses show that this was not the case when reviewers requested a self-citation without reason or with the inclusion of coercive language such as 'need' or 'please'." Again, word it as you like, but I think it should be made clear that requests for self-citation alone are not a problem. In fact, I would argue that what the author says in lines 250 to 255 in the discussion reflects that reviewers who request self-citations (maybe for good reasons) are more likely to be the real experts in the area and why those who did not request a self-cite did not notice the omission. It is my understanding that editors are trying to get warm bodies to review and thus reviewers are not all equally qualified. Could it be that requesting self-citations for a good reason is a proxy for someone who actually knows the literature better? I'm not saying this is a fact, but it is a possibility. I get this is said in the abstract, but worth fleshing out in the discussion.

      I have updated the discussion after a new text analysis and have addressed this important question of whether self-citations are different from citations to other articles. The idea that some self-citers are more aware of the relevant literature is interesting, although this is very hard to test because they could also just be more aware of their own work. The question of whether self-citations are justified is a key question and one that I’ve tried to address in an updated discussion.

      Reviewer #3 (Recommendations for the Authors):

      Data and code availability are in good shape. At a high level, I recommend:

      Toning down the interpretation of reviewers' motivation, especially since some of this is mitigated by findings presented in the paper.

      I have reworded the discussion and included a warning on the observational study design.

      Devote more time detailing exactly what data are being presented in each figure/table and results section as described in more detail in the main review (n, selection criteria, conditional subsetting, etc.).

      I agree and have provided more details in each figure legend.

      Reviewer #4 (Recommendations for the Authors):

      A few aspects of the paper are not clear:

      I did not follow Figure 4. Are the "self citation" labels supposed to be "citation to other research"?

      Thanks for picking up this error which has now been fixed.

      I did not understand how to parse the left column of Figure 2

      As per the editor’s suggestion, the figure legend has been updated.

      Table 3: Please use different markers for the different curves so that it is clearly demarcated even in grayscale print

      I presume you meant Figure 3 not Table 3. I’ve varied the symbols in all three odds ratio plots.

      Supplementary S3: Typo "Approvep"

      Fixed, thanks.

      OTHER CHANGES: As well as the four reviews, my paper was reviewed by an AI-reviewer which provided some useful suggestions. I have mentioned this review in the acknowledgements. I have reversed the order of figure 5 to show the probability of “Approved” as this is simpler to interpret.

    1. eLife Assessment

      This study presents a valuable finding regarding the role of Arp2/3 and the actin nucleators N-WASP and WAVE complexes in myoblast fusion. The data presented is convincing, and the work will be of interest to biologists studying skeletal muscle stem cell biology in the context of skeletal muscle regeneration.

    2. Reviewer #1 (Public review):

      Overall, the manuscript reveals the role for actin polymerization to drive fusion of myoblasts during adult muscle regeneration. This pathway regulates fusion in many contexts, but whether it was conserved in adult muscle regeneration remained unknown. Robust genetic tools and histological analyses were used to convincingly support the claims.

    3. Reviewer #2 (Public review):

      To fuse, differentiated muscle cells must rearrange their cytoskeleton and assemble actin-enriched cytoskeletal structures. These actin foci are proposed to generate mechanical forces necessary to drive close membrane apposition and the fusion pore formation. While the study of these actin-rich structures has been conducted mainly in Drosophila and in vertebrate embryonic development, the present manuscript presents clear evidence that this mechanism is necessary for fusion of adult muscle stem cells in vivo, in mice. The data presented here clearly demonstrate that ARP2/3 and SCAR/WAVE complexes are required for the fusion of differentiating satellite cells into multinucleated myotubes during skeletal muscle regeneration.

    4. Reviewer #3 (Public review):

      This manuscript addresses an important biological question regarding the mechanisms of muscle cell fusion during regeneration. The primary strength of this work lies in the clean and convincing experiments, with the major conclusions being well-supported by the data provided.

      The authors have satisfactorily addressed my inquiries.

    5. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewer #3 (Public review):

      The authors have satisfactorily addressed my inquiries. However, I had to look quite hard to find where they responded to my final comment regarding the potential role of Arpc2 post-fusion during myofiber growth and/or maintenance, which I eventually located on page 7. I would appreciate it if the authors could state this point more explicitly, perhaps by adding a sentence such as "However, we cannot rule out the possibility that Arpc2 may also play a role in....." to improve clarity of communication. 

      While I understood from the original version that this issue falls beyond the immediate scope of the study, I believe it is important to adopt a more cautious and rigorous interpretative framework, especially given the widespread use of this experimental approach. In particular, when a gene could potentially have additional roles in myofibers, it may be helpful to explicitly acknowledge that possibility. Even if Arpc2 may not necessarily be one of them, such roles cannot be fully excluded without direct testing.  

      We appreciate the reviewer’s comments and have included several sentences at the end of the “Branched actin polymerization is required for SCM fusion” section to address this question:

      “The severe myoblast fusion defects observed in early stages of regeneration (e.g. dpi 4.5) provide a good explanation for the presence of thin muscle fibers in ArpC2 cKO mice at dpi 14 (Fig. 2B and 2C) and dpi 28 (Fig. S4A and S4B). These thin muscle fibers could be either elongated mononucleated muscle cells or multinucleated myofibers each containing a small number of nuclei due to occasional fusion events (comparable to those in Myomixer cKO muscles) (Fig. 2B and 2C; Fig. S4A and S4B). Whether Arp2/3 and branched actin polymerization play a role in the growth and/or maintenance of post-fusion multinucleated myofibers requires future loss-of-function studies in which ArpC2 cKO is generated using a myofiber-specific cre driver.”

    1. eLife Assessment

      This study presents significant and novel insights into the roles of zinc in mammalian meiosis/fertilization events. These findings are useful to our understanding of these processes. The evidence presented is solid, with experiments being well-designed, carefully described, and interpreted with appropriate rigor. The authors acknowledge the lack of mechanistic insight which represents the main limitation of the study.

    2. Reviewer #1 (Public review):

      The revised manuscript addresses several reviewer concerns, and the study continues to provide useful insights into how ZIP10 regulates zinc homeostasis and zinc sparks during fertilization in mice. The authors have improved the clarity of the figures, shifted emphasis in the abstract more clearly to ZIP10, and added brief discussion of ZIP6/ZIP10 interactions and ZIP10's role in zinc spark-calcium oscillation decoupling. However, some critical issues remain only partially addressed.

      (1) Oocyte health confound: The use of Gdf9-Cre deletes ZIP10 during oocyte growth, meaning observed defects could result from earlier disruptions in zinc signaling rather than solely from the absence of zinc sparks at fertilization. The authors acknowledge this and propose transcriptome profiling as a future direction. However, since mRNA levels often do not accurately reflect protein levels and activity in oocytes, transcriptomics may not be particularly informative in this context. Proteomic approaches that directly assess the molecular effects of ZIP10 loss seem more promising. Although current sensitivity limitations make proteomics from small oocyte samples challenging, ongoing improvements in this area may soon allow for more detailed mechanistic insights.

      (2) ZIP6 context and focus: The authors clarified the abstract to emphasize ZIP10, enhancing narrative clarity. This revision is appropriate and appreciated.

      (3) Follicular development effects: The biological consequences of ZIP6 and ZIP10 knockout during folliculogenesis are still unknown. The authors now say these effects will be studied in the future, but this still leaves a major mechanistic gap unaddressed in the current version.

      (4) Zinc spark imaging and probe limitations: The addition of calcium imaging enhances the clarity of Figure 3. However, zinc fluorescence remains inadequate, and the authors depend solely on FluoZin-3AM, a dye known for artifacts and limited ability to detect subcellular labile zinc. The suggestion that C57BL/6J mice may differ from CD1 in vesicle appearance is plausible but does not fully address concerns about probe specificity and resolution. As the authors acknowledge, future studies with more selective probes would increase confidence in both the spatial and quantitative analysis of zinc dynamics.

      (5) Mechanistic insight remains limited: The revised discussion now recognizes the lack of detailed mechanistic understanding but does not significantly expand on potential signaling pathways or downstream targets of ZIP10. The descriptive data are useful, but the inability to pinpoint how ZIP10 mediates zinc spark regulation remains a key limitation. Again, proteomic profiling would probably be more informative than transcriptomic analysis for identifying ZIP10-dependent pathways once technical barriers to low-input proteomics are overcome.

      Overall, the authors have reasonably revised and clarified key points raised by reviewers, and the manuscript now reads more clearly. However, the main limitation, lack of mechanistic insight and the inability to distinguish between developmental and fertilization-stage roles of ZIP10, remains unresolved. These should be explicitly acknowledged when framing the conclusions.

      Comments on revisions: I have no further comments to add to this review.

    3. Author response:

      The following is the authors’ response to the previous reviews

      Reviewer #1 (Public review):

      The revised manuscript addresses several reviewer concerns, and the study continues to provide useful insights into how ZIP10 regulates zinc homeostasis and zinc sparks during fertilization in mice. The authors have improved the clarity of the figures, shifted emphasis in the abstract more clearly to ZIP10, and added brief discussion of ZIP6/ZIP10 interactions and ZIP10's role in zinc spark-calcium oscillation decoupling. However, some critical issues remain only partially addressed. 

      Thank you for your valuable inputs. We plan to address the issues that could not be clarified in this report going forward.

      (1) Oocyte health confound: The use of Gdf9-Cre deletes ZIP10 during oocyte growth, meaning observed defects could result from earlier disruptions in zinc signaling rather than solely from the absence of zinc sparks at fertilization. The authors acknowledge this and propose transcriptome profiling as a future direction. However, since mRNA levels often do not accurately reflect protein levels and activity in oocytes, transcriptomics may not be particularly informative in this context. Proteomic approaches that directly assess the molecular effects of ZIP10 loss seem more promising. Although current sensitivity limitations make proteomics from small oocyte samples challenging, ongoing improvements in this area may soon allow for more detailed mechanistic insights.

      Thank you for your suggestions. We will keep that in mind for the future.

      (2) ZIP6 context and focus: The authors clarified the abstract to emphasize ZIP10, enhancing narrative clarity. This revision is appropriate and appreciated. 

      Thanks to your feedback, my paper has improved. Thank you for your evaluation.

      (3) Follicular development effects: The biological consequences of ZIP6 and ZIP10 knockout during folliculogenesis are still unknown. The authors now say these effects will be studied in the future, but this still leaves a major mechanistic gap unaddressed in the current version. 

      As you mentioned, we have not been able to clarify the effects of ZIP6 and ZIP10 knockout on follicle formation. The effects of ZIP6 and ZIP10 knockout on follicle formation will be discussed in the future.

      (4) Zinc spark imaging and probe limitations: The addition of calcium imaging enhances the clarity of Figure 3. However, zinc fluorescence remains inadequate, and the authors depend solely on FluoZin-3AM, a dye known for artifacts and limited ability to detect subcellular labile zinc. The suggestion that C57BL/6J mice may differ from CD1 in vesicle appearance is plausible but does not fully address concerns about probe specificity and resolution. As the authors acknowledge, future studies with more selective probes would increase confidence in both the spatial and quantitative analysis of zinc dynamics. 

      Thank you for your comment. Moving forward, we plan to conduct spatial and quantitative analyses of zinc dynamics using various other zinc probes.

      (5) Mechanistic insight remains limited: The revised discussion now recognizes the lack of detailed mechanistic understanding but does not significantly expand on potential signaling pathways or downstream targets of ZIP10. The descriptive data are useful, but the inability to pinpoint how ZIP10 mediates zinc spark regulation remains a key limitation. Again, proteomic profiling would probably be more informative than transcriptomic analysis for identifying ZIP10-dependent pathways once technical barriers to low-input proteomics are overcome. 

      Thank you for your helpful advice. I'll use it as a reference for future analysis.

      Future studies should assess the transcriptomic or proteomic profile of Zip10<sup>d/d</sup> mouse oocytes (P.11 Line 349-350).

      Overall, the authors have reasonably revised and clarified key points raised by reviewers, and the manuscript now reads more clearly. However, the main limitation, lack of mechanistic insight and the inability to distinguish between developmental and fertilization-stage roles of ZIP10, remains unresolved. These should be explicitly acknowledged when framing the conclusions.

      We have added the two limitations you pointed out to the conclusion section of the main text.

      However, the role of ZIP6 remained uncertain. Additionally, the absence of mechanistic insight for zinc spark and the inability to distinguish between the developmental and fertilization stage roles of ZIP10 remain unresolved. These challenges necessitate further investigation (P.11-12 Line 354-357).

    1. eLife Assessment

      This important study addresses a topic that is frequently discussed in the literature but is under-assessed, namely correlations among genome size, repeat content, and pathogenicity in fungi. Contrary to previous assertions, the authors found that repeat content is not associated with pathogenicity. Rather, pathogenic lifestyle was found to be better explained by the number of protein-coding genes, with other genomic features associated with insect association status. The results are considered solid, although there remain concerns about potential biases stemming from the underlying data quality of the analyzed genomes.

    2. Reviewer #1 (Public review):

      Summary:

      The manuscript "Lifestyles shape genome size and gene content in fungal pathogens" by Fijarczyk et al. presents a comprehensive analysis of a large dataset of fungal genomes to investigate what genomic features correlate with pathogenicity and insect associations. The authors focus on a single class of fungi, due to the diversity of lifestyles and availability of genomes. They analyze a set of 12 genomic features for correlations with either pathogenicity or insect association and find that, contrary to previous assertions, repeat content does not associate with pathogenicity. They discover that the number of protein coding genes, including total size of non-repetitive DNA, does correlate with pathogenicity. However, unique features are associated with insect associations. This work represents an important contribution to the attempts to understand what features of genomic architecture impact the evolution of pathogenicity in fungi.

      Strengths:

      The statistical methods appear to be properly employed and analyses thoroughly conducted. The size of the dataset is impressive and likely makes the conclusions robust. The manuscript is well written and the information, while dense, is generally presented in a clear manner.

    3. Reviewer #2 (Public review):

      Summary:

      In this paper, the authors report on the genomic correlates of the transition to the pathogenic lifestyle in Sordariomycetes. The pathogenic lifestyle was found to be better explained by the number of genes, and in particular effectors and tRNAs, but this was modulated by the type of interacting host (insect or not insect) and the ability to be vectored by insects.

      Strengths:

      The main strengths of this study lie in (i) the size of the dataset, and the potentially high number of lifestyle transitions in Sordariomycetes, (ii) the quality of the analyses and the quality of the presentation of the results, (iii) the importance of the authors' findings.

      Weaknesses:

      The weakness is a common issue in most comparative genomics studies in fungi, but it remains important and valid to highlight it. Defining lifestyles is complex because many fungi go through different lifestyles during their life cycles (for instance, symbiotic phases interspersed with saprotrophic phases). In many fungi, the lifestyle referenced in the literature is merely the sampling substrate (such as wood or dung), which does not necessarily mean that this substrate is a key part of the life cycle. The authors discuss this issue, but they do not eliminate the underlying uncertainties.

      [Editors' note: this version was assessed by the editors, without involving the reviewers again.]

    4. Author response:

      The following is the authors’ response to the previous reviews

      Reviewer #1 (Recommendations for the authors):

      I think the authors did a fantastic job investigating the annotation issues I brought up in the first round. I am somewhat assured that the size of the dataset has prevented any real systematic issues from impacting their results. However, there are many clear underlying biases in the data, as the authors show, which could have a number of unexpected impacts on the results. For example, the consistently lower gene numbers could be biased towards certain types of genes or in certain lineages, making the CAZyme analysis unreliable. I do not agree with the authors' choice to put these results in as a supplement with little or no other references to it in the main manuscript. Many of the conclusions that are drawn should be hedged by these findings. There should at least be a rationale given for why the authors took the approach they did, such as mentioning the points they brought up in the response.

      We thank the reviewer for the positive assessment of our revision. We added text in the Discussion acknowledging limitations of the gene annotation approach. 

      “Because of the uniform yet simplified gene annotation approach, the total number of genes may be underestimated in some assemblies in our dataset, as observed when comparing the same species in JGI Mycocosm. Although this pattern is not biased toward any particular group of species, access to high-quality, well-annotated genomes could provide a clearer picture of the relative contributions of specific gene families.”

      We also added more text in the Methods (section "Sordariomycetes genomes") mentioning in more detail the investigation of potential biases related to assembly quality and annotation (with reference to Supplementary Results).

      A couple minor corrections:

      Figure 1C, both axes say PC1?

      Fixed.

      Figure S12, scales don't match so it's hard to compare, axis labels are inconsistent.

      Fixed.

      Reviewer #2 (Recommendations for the authors):

      I congratulate the authors on the revision work. Their manuscript is very interesting and reads very well.

      I found several occurrences of « saprophyte ». Note that « saprotroph » is much better since fungi are not « phytes ».

      We thank the reviewer for positive feedback. The occurrences of “saprophytes” were corrected.

    1. eLife Assessment

      This potentially valuable work aimed at a better understanding of the mechanisms of response and resistance to androgen deprivation therapy in prostate cancer using genetically engineered mouse models. A key observation relates to the timing of TNF blockage therapy and the concept of a "TNF switch." The solid data were collected using conventional approaches and the conclusions are mostly justified, particularly with the inclusion of more detailed statistics in the revision. The work will be of interest to the prostate cancer research community.

    2. Joint Public Review:

      Summary:

      Sha K et al aimed at identifying the mechanism of response and resistance to castration in the Pten knockout GEM model. They found elevated levels of TNF in castrated tumors, associated with an expansion of basal-like stem cells during recurrence, which they show occurring in prostate cancer cells in culture upon enzalutamide treatment. Further, the authors carry out a time-dependent analysis of the role of TNF in regression and recurrence to show that TNF regulates both processes. Similarly, CCL2, which the authors had proposed as a chemokine secreted upon TNF induction following enzalutamide treatment, is also shown to be elevated during recurrence and associated with the remodeling of an immunosuppressive microenvironment through depletion of T cells and recruitment of TAMs.

      Strengths:

      The paper exploits a well-established GEM model to interrogate mechanisms of response to standard-of-care treatment. This is of utmost importance since prostate cancer recurrence after ADT or ARSi marks the onset of an incurable disease stage for which limited treatments exist. The work is relevant in the confirmation that recurrent prostate cancer is mostly an immunologically "cold" tumor with an immunosuppressive immune microenvironment.

      Comments on revised version:

      The Reviewing Editor has reviewed the response letter and revised manuscript and has the following recommendations (all text revisions) prior to the Version of Record.

      More information for Panel 4A:

      For the most part, the authors have addressed the statistical concerns raised in the initial review through inclusion of p values in the relevant figure legends. One important exception is Fig 4A, which includes some of the most impactful data in the paper. The response letter and the new Fig 4A legend refer to statistics in Supp Table 3. I could not find this in the package. Because this is such an important panel, I would urge the authors to include the statistics in the main figure. The display should include a fourth panel with castration alone, as requested by at least one reviewer.

      I would also urge the authors to place a schema of the experimental design at the top of the figure to clarify the timing of anti-TNF therapy and the fact that it is administered continuously rather than as a single dose (I was confused by this upon first reading). Last, it is hard to reconcile the curves in the day +3 panel with the conclusion that there is no effect (the red curve in particular).

      Include a model cartoon of the TNF switch:

      A key concept in the report is that of a "TNF switch". I recommend the authors include a model cartoon that lays this out visually in an easily understandable format. The cartoon in Supp Fig 8 touches on this but is more biochemically focused and does not easily convey the "switch" concept.

      Add a "study limitations" paragraph at the end of the discussion:

      The authors noted that several other concerns expressed by the reviewers were considered beyond the scope of this report. These include the inclusion of additional tumor response endpoints beyond US-guided assessment of tumor volume (e.g., histology, proliferation markers, etc.) and the purely correlative association of macrophage and T cell infiltration with recurrence, in the absence of immune cell depletion experiments. To this point, the subheading "Immune suppression is a key consequence of increased tumor cell stemness" in the Discussion is too strongly worded.

      Similarly, there is no experimental proof that CCL2 from stroma (vs from tumor cells) is required for late relapse. Prior to formal publication, I suggest the authors include a "limitations of the study" paragraph at the end of the Discussion that delineates several of these points.

      Other points:

      For concerns that several reviewers raised about basal versus luminal cells and stemness, the authors have modified the text to soften the conclusions and not assign specific lineage identities.

      The answer to the question regarding timing of castration (based on tumor size, not age) needs more detail. This is particularly relevant for the Hi-MYC model that is exquisitely castration sensitive and not known to relapse, except perhaps at very late time points (9-12 months). Surely the authors can include some information on the age range of the mice.

    3. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      Sha K et al aimed at identifying the mechanism of response and resistance to castration in the Pten knockout GEM model. They found elevated levels of TNF in castrated tumors, associated with an expansion of basal-like stem cells during recurrence, which they show occurring in prostate cancer cells in culture upon enzalutamide treatment. Further, the authors carry out a time-dependent analysis of the role of TNF in regression and recurrence to show that TNF regulates both processes. Similarly, CCL2, which the authors had proposed as a chemokine secreted upon TNF induction following enzalutamide treatment, is also shown to be elevated during recurrence and associated with the remodeling of an immunosuppressive microenvironment through depletion of T cells and recruitment of TAMs.

      Strengths:

      The paper exploits a well-established GEM model to interrogate mechanisms of response to standard-of-care treatment. This is of utmost importance since prostate cancer recurrence after ADT or ARSi marks the onset of an incurable disease stage for which limited treatments exist. The work is relevant in the confirmation that recurrent prostate cancer is mostly an immunologically "cold" tumor with an immunosuppressive immune microenvironment.

      Weaknesses:

      While the data is consistent and the conclusions are mostly supported and justified, the findings overall are incremental and of limited novelty. The role of TNF and NF-kB signaling in tumor progression and the role of the CCL2-CCR2 in shaping the immunosuppressive microenvironment are well established.

      We contend there is novelty in the experimental design, in our finding of a TNF signaling ‘switch’, and in the role of androgen-deprivation-induced immunosuppression.

      On the other hand, it is unclear why the authors decided to focus on the basal compartment when there is a wealth of literature suggesting that luminal cells are, if not the exclusive, surely one of the cells of origin of prostate cancer and responsible for recurrence upon antiandrogen treatment. As a result, most of the data shown later has to be taken with caution, as it is not known if the same phenomena occur in the luminal compartment.

      While we appreciate the reviewer’s interest in the cancer stem cell biology occurring in the tumor in response to androgen deprivation, our focus in this report is identifying mechanisms that account for a switch in TNF signaling.  Specifically, our previous studies showed a rapid increase in TNF mRNA following castration (in the normal murine prostate) but in the current report we also observe an increase in TNF at late times post-castration (in a murine prostate cancer model).  We propose that the increase in TNF at late times is due to plasticity (increased stemness) in the tumor cell population, rather than - for example - a change in signal-driven TNF mRNA transcription.  While a possible mechanism is expansion of a recurrent tumor stem-cell population, a careful investigation is beyond the scope of this report.  Therefore, in the revised manuscript, we have altered the text in multiple places to indicate a suggestive, rather than definitive, role for tumor stem cells.  Indeed, we did include caveats regarding the role of tumor stem cells in the original discussion (lines 425-429 in the revised manuscript), and this is now made more explicit in the revised manuscript.   

      Reviewer #2 (Public Review):

      Summary:

      In this study, Sha and Zhang et al. reported that androgen deprivation therapy (ADT) induces a switch to a basal-stemness status, driven by the TNF-CCL2-CCR2 axis. Their results also reveal that enhanced CCL2 coincides with increased macrophages and decreased CD8 T cells, suggesting that ADT resistance may be related to the TNF/CCL2/CCR2-dependent immunosuppressive tumor microenvironment (TME). Overall, this is a very interesting study with a significant amount of data.

      Strengths:

      The strengths of the study include various clinically relevant models, cutting-edge technology (such as single-cell RNA-seq), translational potential (TNF and CCR2 inhibitors), and novel insights connecting stemness lineage switch to an immunosuppressive TME. Thus, I believe this work would be of significant interest to the field of prostate cancer and journal readership.

      Weaknesses:

      (1) One of the key conclusions/findings of this study is the ADT-induced basal-stemness lineage switch driving ADT resistance. However, most of the presented evidence supporting this conclusion only selects a couple of marker genes. What exacerbates this issue is that different basal-stemness markers were often selected with different results. For example, Figure S1A uses CD166/EZH2 as markers, while Figure S1B uses ITGb1/EZH2. In contrast, Figure 1D uses Sca1/CD49, and Figure 2B-C uses CD49/CD166. Since many basal-stemness lineage gene signatures have been previously established, the study should examine various basal-stemness gene signatures rather than a couple of selected markers. Moreover, why were none of the stemness/basal-gene signatures significantly changed in the GO enrichment analysis in Figure 6A/B?

      Mouse and human cells express similar but also partially distinct prostate stem cell markers.  For example, Sca1 is predominantly used as a stem cell marker in mice but not in human prostate epithelial cells.  CD166 and CD49f are expressed in both human and murine prostate epithelium and therefore we used these in both sets of studies.  Also see the response to R1-2.

      (2) A related weakness is the lack of functional results supporting the stemness lineage switch. Although the authors present colony formation assay results, these could be influenced simply by promoted cell proliferation, which is not a convincing indicator of stemness. To support this key conclusion, widely accepted stemness assays, such as the prostasphere formation assay (in vitro) and Extreme Limiting Dilution Analysis (ELDA) xenograft assay (in vivo), should be carried out.

      See the response to R1-2 and R2-1, above.

      (3) Another significant concern is that this study uses concurrency to demonstrate a causal relationship in many key results, which is entirely different. For example, Figure S4A and S4B only show increased CCL2 and TNF secretion simultaneously, which cannot support that CCL2 is dependent on TNF. Similarly, Figure 5A only shows that CCL2 increased coincidently with a rise in TNF, which cannot support a causal relationship. To support the causal relationship of this conclusion, it is necessary to show that TNF-KO/KD would abolish the increased CCL2 secretion.

      Regarding Fig. S4A and S4B: We previously demonstrated (Sha et al, 2015; reference 10) that CCL2 secretion is dependent on TNF, in the same cell lines.  We have added additional data (new Fig. S4B) in this report to confirm this dependency.  

      Regarding Fig 5: In Fig 5B we demonstrated that the increase in CCL2-staining cells in recurrent tumors from castrated animals (the equivalent of human CRPC in our model) was significantly inhibited in animals receiving etanercept, demonstrating TNF dependency for CCL2 in this context.  

      While the use of TNF KO cell lines and animals could provide additional insights, the creation of such cell lines and tumor models is arduous.  Moreover, we previously demonstrated that administration of anti-TNF drugs such as etanercept is as effective as the KO phenotypes (Davis et al 2011; ref. 11).

      (4) Some of the selective data presentations are not explained and are difficult to understand. For example, why does CD49 staining in Figure S3A have data for all four time points, while CD166 in Figure S3D only has data for the last time point (day 21)? Similarly, although several TNF_UP gene signatures were highlighted in Figure 4B, several TNF_DN signatures were also enriched in the same table, such as RUAN_RESPONSE_TO_TNF_DN. What is the explanation for these contrasting results?

      Regarding Fig. S3A and S3D: The cell-staining studies in Fig. S3 are confirmatory of the FACS studies in Figs. 2 and 3.  We were not able to stain all of the CD166 time-points for technical reasons (difficulty optimizing the automated staining protocol) but we were able to successfully stain key late time-points, so we have included this data in the supplementary figure.  There was no attempt to selectively present data; this was just a practical limitation of the time and funds that we could devote to confirmatory studies.   

      Regarding Fig 4B: The highlighting identifies a common (i.e., identical) group of gene sets in the two GSEA analyses, demonstrating that these very same gene sets are all up-regulated in one instance, and down-regulated in the other.  The ‘TNF DN’ gene sets were not identical in the two GSEA analyses and so we cannot draw any conclusions about these.  Note that we are scoring the TNF-related gene sets with the 10 largest (positive or negative) normalized enrichment scores (NES), and are not relying on DN or UP designations in the gene set name (identifier).  In this analysis up- and down-regulation refers to the sign and magnitude of the NES, not the gene set names.
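
      The selection rule described above can be sketched in a few lines. This is only an illustration of one reading of the response (rank gene sets by the absolute value of the NES, keep the top k in each analysis, then compare the sign of the NES for the sets common to both); the gene-set names and NES values below are hypothetical placeholders, not data from the manuscript, and the function names are our own.

```python
# Illustrative sketch only: names and NES values are hypothetical.

def top_by_abs_nes(nes, k=10):
    """Return the k gene-set names with the largest |NES|."""
    ranked = sorted(nes, key=lambda name: abs(nes[name]), reverse=True)
    return set(ranked[:k])


def shared_sets_with_direction(early, late, k=10):
    """For gene sets in the top k of both analyses, report the direction
    implied by the sign of the NES (not by any _UP/_DN suffix in the name)."""
    def direction(score):
        return "up" if score > 0 else "down"

    shared = top_by_abs_nes(early, k) & top_by_abs_nes(late, k)
    return {name: (direction(early[name]), direction(late[name]))
            for name in sorted(shared)}


# Hypothetical NES tables for an early and a late post-castration comparison.
early = {"TNF_SET_A": -2.1, "TNF_SET_B": -1.8, "TNF_SET_C_DN": 0.4}
late = {"TNF_SET_A": 1.9, "TNF_SET_B": 2.2, "TNF_SET_C_DN": -0.3}

result = shared_sets_with_direction(early, late, k=2)
```

      In this toy input, the set whose name ends in `_DN` is excluded because its |NES| is small, which mirrors the point that direction is read from the NES rather than from the identifier.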

      Reviewer #3 (Public Review):

      Summary:

      The current manuscript evaluates the role of TNF in promoting AR-targeted therapy regression and subsequent resistance through CCL2 and TAMs. The current evidence supports a correlative role for TNF in promoting cancer cell progression following AR inhibition. Weaknesses include a lack of descriptive methodology for the pre-clinical GEM model experiments, and it is not well defined which cell types are impacted in this pre-clinical model, which will be quite heterogeneous with regard to cancer, normal, and microenvironment cells.

      Strengths:

      (1) Appropriate use of pre-clinical models and GEM models to address the scientific questions.

      (2) Novel finding of TNF and interplay of TAMs in promoting cancer cell progression following AR inhibition.

      (3) Potential for developing novel therapeutic strategies to overcome resistance to AR blockade.

      Weaknesses:

      (1) There is a lack of description regarding the GEM model experiments - the age at which mice experiments are started.

      Table S1 in the supplementary data summarizes the salient characteristics of the GEM models.  Note that as described in the M&M, we selected animals for experimental groups based on the tumor volume (determined by HFUS) and not based on the age of the mouse, since there is some variability in the kinetics of tumor growth in genetically identical mice, as shown by our HFUS observations of hundreds of mice harboring the genetic changes (PTEN loss, MYC gain) in the models we have studied most extensively.  Although admittedly an imperfect criterion, we reasoned that tumor volume would be the best surrogate criterion for tumor biology.

      (2) Tumor volume measurements are provided but in this context, there is no discussion on how the mixed cancer and normal epithelial and microenvironment is impacted by AR therapy which could lead to the subtle changes in tumor volume.

      The reviewer’s criticism is well-founded - most of our studies involved bulk analysis, which makes it difficult to probe the cellular interactions within the TME.  Future studies - beyond the scope of this report - using single cell technical approaches - are needed to investigate these subtle changes.  We have added a statement to this effect to the manuscript (lines 464-468).

      (3) There are no readouts for target inhibition across the therapeutic pre-clinical trials or dosing time courses.

      The reviewer’s criticism is well-founded, since we cannot be 100% certain of drug delivery in the TNF and CCL2 blockade experiments.  Two points in this regard.  First, with the assistance of institutional veterinarian staff, we have had good success in training multiple scientists (PhD student, technicians) to deliver both biological and small molecule drugs i.p.  Second, the observation that the drugs did ‘work’ in most animals in well-defined experimental protocols strongly suggests that the delivery methodology is reliable.  If sporadic delivery failures do occur, this would tend to underestimate the magnitude of the ‘positive’ (i.e., blocking) effects rather than leading to false negatives.   

      (4) The terminology of regression and resistance appears arbitrary. The data seems to demonstrate a persistence of significant disease that progresses, rather than a robust response with minimal residual disease that recurs within the primary tumor.

      We explain our rationale for the criteria defining regression and recurrence in the M&M and in the legend to Table S2.  In the revised version of the manuscript, we now explicitly reference these descriptions in the relevant RESULTS section (lines 222-223).  Note that we use the term ‘recurrence’ rather than ‘resistance’ as the former does not necessarily imply a particular biological mechanism.  

      (5) It is unclear if the increase in basal-like stem cells is from normal basal cells or cancer cells with a basal stem-like property.

      See the response to R1-2 and R2-1.

      (6) In the Hi-MYC model, MYC expression is regulated by AR inhibition and is profoundly ARi responsive at early time points.

      We agree that this is the likely mechanism of castration-induced regression (so-called ‘MYC addiction’) but it is unclear what the reviewer’s concern is vis-a-vis our manuscript.  

      Reviewer #4 (Public Review):

      In this manuscript by Sha et al., the authors test the role of TNFα in modulating tumor regression/recurrence under therapeutic pressure from castration (or enzalutamide) in both in vitro and in vivo models of prostate cancer. Using the PTEN-null genetic mouse model, they compare the effect of a TNFα ligand trap, etanercept, at various points pre- and post-castration. Their most interesting findings from this experiment were that etanercept given 3 days prior to castration prevented tumor regression, which is a common phenotype seen in these models after castration, but etanercept given 1 day prior to castration prevented prostate cancer recurrence after castration. They go on to perform RNA sequencing on tumors isolated from either sham or castrate mice from two time points post-castration to study acute and delayed transcriptional responses to androgen deprivation. They found enrichment of gene sets containing TNF targets which initially decrease post-castration but are elevated by 35 days, the time at which tumors recur. The authors conduct a similar set of experiments using human prostate cancer cell lines treated with the androgen receptor inhibitor enzalutamide and observe that drug treatment leads to cells with basal stem-like features that express high levels of TNF. They noticed that CCL2 levels correlate with changes in TNF levels, raising the possibility that CCL2 might be a critical downstream effector for disease recurrence. To this end, they treated PTEN-null and hi-MYC castrated mice with a CCR2 antagonist (CCR2a), because CCR2 is one receptor of CCL2, and monitored tumor growth dynamics. Interestingly, upon treatment with CCR2a, tumors did not recur according to their measurements. They go on to demonstrate that the tumors pre-treated with CCR2a had reduced levels of putative TAMs and increased CTLs in the context of TNF or CCR2 inhibition, providing a cellular context associated with disease regression. Lastly, they perform single-cell RNA sequencing to further characterize the tumor microenvironment post-castration and report that the ratio of CTLs to TAMs is lower in a recurrent tumor.

      While the concepts behind the study have merit, the data are incomplete and do not fully support the authors' conclusions. The authors' definition of recurrence is subjective given that the amount of disease regression after castration is both variable (Figure 8) and relatively limited

      See the response to R3-4, above.

      particularly in the PTEN loss model. Critical controls are missing. For example, both drug experiments were completed without treating non-castrate plus drug controls

      In these experiments, we are investigating the effect of anti-TNF or anti-CCL2 therapy on the response to the castration.  The appropriate controls are castrated mice which received vehicle or no treatment.  The response of intact animals (with tumors still increasing in size) is not only irrelevant to the question we are asking, but also impractical, as the tumor size would be too large for mouse viability. 

      which raises the question of how specific these findings are to castration resistance. No validation was performed to ensure that either the TNF ligand trap or the CCR2 antagonist was acting on target.

      See the response to R3-3, above.

      The single-cell sequencing experiments were done without replicates which raises concern about its interpretation. 

      The goal in these experiments is to address a relatively narrow question concerning changes in a few key TAM-associated transcripts versus changes in a few CTL-associated transcripts.  This is not meant to provide the rigorous single-cell transcriptomic analysis that is required - for example - to definitively assess the levels of various cell populations.   As noted in R3-2 (and in the DISCUSSION, lines 467-468), further single-cell analysis is ongoing but beyond the scope of this manuscript.

      At a conceptual level, the authors say that a major cause of disease recurrence is the immunosuppressive TME, but provide little functional data that macrophages and T cells are directly responsible for this phenotype.

      The requirement for CCL2-CCR2 signaling for recurrence suggests that TAMs drive recurrence, presumably due to immunosuppression in the TME.  However, CCR2 is expressed by other cell types.  Therefore, in future studies we will need to examine the response to additional inhibitors and also employ single cell ‘omics to more thoroughly characterize the changes in the cellular components of the tumor immune microenvironment.  Functional analysis of T-cell subsets is an even more formidable experimental challenge.  

      Statistical analyses were performed on only select experiments. 

      See the response to R1-3, below.

      In summary, further work is recommended to support the conclusions of this story.

      Reviewer #1 (Recommendations For The Authors):

      I suggest the authors address the following:

      (1) Throughout the figures, statistical analysis needs to be made clear, including n numbers, replicates, and whether or not differences shown are statistically significant. These include Figure 1C and D; Figure 2A and B; Figure 3A; Figure 4A; Figure 5A, C and D; Figure 7B.

      We thank the reviewer for identifying these issues and we have inserted statistical analyses into the text as follows: 

      Figure 1C-D: Statistical analysis added to the legend of Fig. 1.  

      Figure 2A: Statistical analysis added to the legend of Fig. 2.

      Figure 2B: These are representative FACS scatter plots –  the corresponding statistical analysis is shown in Fig. 2C (left panel).

      Figure 3A: Statistical comparisons are not relevant to this figure – the data is presented to document the cell sorting enrichment process.

      Figure 4A and Figure 5C-D:  For the small n, categorical data sets related to the studies using GEM prostate cancer models shown in Figures 4A, 5C and 5D, we employed the exact binomial test to determine the Clopper-Pearson confidence interval for the proportion and Fisher’s exact test to determine the p-values and now present these analyses in a new Supplementary Table 3.  We have included this information in the M&M section and edited the Figure legends to direct the reader to the new Supplementary Table.  

      We would like to emphasize that the reported p-values are exact probabilities from Fisher’s exact test. Given the small sample sizes and the discrete nature of the distribution, these values should not be interpreted as if they strictly conform to conventional thresholds such as p<0.05. Instead, they represent the exact probability of observing data as extreme as (or more extreme than) what we obtained under the null hypothesis.
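
      For readers unfamiliar with these two exact methods, the following is a minimal, standard-library-only sketch of how a two-sided Fisher’s exact p-value and a Clopper-Pearson interval can be computed for small categorical data. It is not the authors’ analysis code; the function names and the 2x2 counts used in the example are illustrative assumptions, and the two-sided p-value follows the common convention of summing the probabilities of all tables no more likely than the observed one.

```python
from math import comb

# Illustrative sketch only: counts below are hypothetical, not study data.

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x events in row 1 with fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum all tables that are no more probable than the observed one.
    total = sum(prob(x) for x in range(lo, hi + 1)
                if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, total)


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))


def clopper_pearson(k, n, alpha=0.05):
    """Exact (Clopper-Pearson) confidence interval for k events in n trials."""
    def solve(f, target):
        # Bisection for a function f that decreases monotonically in p.
        lo, hi = 0.0, 1.0
        for _ in range(100):
            mid = (lo + hi) / 2
            if f(mid) > target:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2

    lower = 0.0 if k == 0 else solve(lambda p: binom_cdf(k - 1, n, p),
                                     1 - alpha / 2)
    upper = 1.0 if k == n else solve(lambda p: binom_cdf(k, n, p), alpha / 2)
    return lower, upper


# Hypothetical example: 4 of 5 events in one group versus 0 of 6 in the other.
p_value = fisher_exact_two_sided(4, 1, 0, 6)
ci_low, ci_high = clopper_pearson(0, 6)
```

      Because the underlying distributions are discrete, only a handful of p-values are attainable for tables this small, which is the reason the exact probabilities should not be read against a conventional threshold as if they were continuous.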

      Figure 5A: The legend of Fig. 5A was edited to clarify the statistical analysis.  

      Figure 7B: The differences in CD8+ T cells and F4/80 macrophages due to CCR2a-35d treatment were not statistically significant (p>0.05) - we have now stated this explicitly in the figure legend.

      (2) Several experiments either lack appropriate controls or the choice of data presentation is confusing. In Figure 4A vehicle controls should 

      We have not observed any effect of IP administration of vehicle in any experiments across multiple published studies employing these GEMMs, and so we conclude that the injection of vehicle is very unlikely to modify the outcome of these experiments.

      be included in the graphs and, for ease of interpretation, perhaps average tumor growth should be shown, while individual tumor growth can be shown in the supplement. In Figure 5 the vehicle control is missing, and in Figure 5D 4 out of 5 CX+vehicle tumors are said to have recurred but the trend line in the graph shows otherwise.

      We thank the reviewer for noting this issue - the color designations were inadvertently reversed in the legend text.  This error has been corrected in the revised version of the manuscript.  

      In Figure 8B flow cytometry would actually be more convincing than scRNAseq. If scRNAseq is chosen, a higher quality UMAP or t-SNE plot is needed with a broader color palette.

      We did consider the FACS approach suggested by the reviewer, but decided against it as we could not readily identify and validate a TAM-specific antibody to allow such measurements. 

      Reviewer #3 (Recommendations For The Authors):

      (1)  A clear description of the GEM model experiments will be helpful in interpreting the data as it is unclear what age the PTEN or MYC mice were when therapy was started. PTEN are generally intrinsically resistant to ARi whereas MYC are robustly sensitive.

      (2) Prostate organoid technology of the GEM prostate cell, and normal prostate cells may allow for a better evaluation of which basal stem-like cells are expressing TNF - dissecting out normal basal from cancer with basal-like properties.

      (3) Experiments to demonstrate targeting inhibition should be performed for AR and TNF inhibition. Especially across the spectrum of TNF blockade timing given the differences in proposed responsiveness over an acute change in dosing schedule.

      (4) Detailed histology and pathologic evaluation should be provided to characterize the impact on cancer and TME as well as normal prostate mixed in these tumors.

      (5) Prostate organoid development with genetic manipulation (PTEN ko) and transplant back into immunocompetent mice may provide experiments to prove causality and address the impact on the immune microenvironment.

      (6) The descriptors of regression and recurrence need to be defined, since, based on the kinetics and the presented data, this seems to be associated with minimal responsiveness and progression from a substantial volume of persistent cells.

      (7) The authors should also explore the impact of TNF inhibition on the cancer cell directly and evaluate downstream PI3K signaling.

      Responding to this set of recommendations:  A number of these recommendations (R3-7, -9, -12) are similar or identical to those already noted in Reviewer 3’s public review and have been addressed above.  The remaining recommendations (R3-8, -10, -11; organoids, histological approaches to the TME, etc.) are potentially interesting experimental approaches but beyond the scope of the current manuscript.  

      Reviewer #4 (Recommendations For The Authors):

      Major comments:

      (1) Figure 1A-B: While the decrease in tumor growth post-castration is apparent, the increase in tumor growth that has been designated as the point of androgen-independence is a mild increase from the 28 measurements and would benefit from statistical support. Further time points demonstrating that the tumors continue to increase in size would better support the claim that these tumors appropriately model disease recurrence.

      This data meets our criteria for recurrence (outlined in the M&M and in the legend to Table S2).

      (2) Figure 2A: Statistical analysis should be performed and why is this figure shown twice (also in the S2A right panel)?

      We added statistical analysis to the legend of Fig. 2A.  The data from Fig 2 (C4-2 cell line) is replicated in Supplementary Fig S2 to allow the reader to directly compare the response of the C4-2 cell line with the response of the LNCaP cell line.   

      (3) Figure 4A: Non-castrate + etan control is needed here. Also, the data should be statistically assessed.

      Regarding non-castrate controls, see our response to R4-2.  Statistical analysis has been added - see Supplementary Table S3.   

      (4) It appears that at least two of the mice shown in Figure 5C have the same level of disease recurrence as was demonstrated in Figure 1B, yet the analysis defines recurrence in 0/6 mice.

      Again, similar to R4-7, none of the mice in Figure 5C meet our criteria for recurrence (outlined in the M&M and in the legend to Table S2).

      (5) The text for Figure 5D states that vehicle-treated tumors (red) regress then recur while mice pre-treated with a CCR2 antagonist (blue) don't recur, but in the figure, these groups appear to be reversed. In addition, it would be good to have noncastrate + CCR2a control for Figure 5C and 5D.

      We corrected the labeling error in the legend to Figure 5.

      (6) It would be good to validate major RNAseq findings using orthogonal approaches.

      We agree that it is valuable to validate our findings, but these experiments are beyond the scope of the manuscript.

      (7) Figure 7B is quite puzzling. It appears to show the opposite of what was written.

      We thank the reviewer for bringing this error to our attention.  Our internal review of previous versions of the manuscript showed that the corresponding author (JJK) inadvertently mis-edited this figure when preparing the bioRxiv submission.  Figure 7B has been corrected and now aligns with the Results text. We have also appended a PDF documenting the editing error.

      (8) Figure 8: This experiment appears to have been done without replicates making the current interpretation questionable.

      A more detailed scRNAseq analysis of the GEMM response to castration (with replicates) is already underway.  The analysis in Fig. 8 includes thousands of cells, capturing the variation in mRNA levels.  However, it does not capture animal-to-animal variation.  Given the supporting role of this data in this manuscript, we believe that the single-animal approach is adequate in this case.

      (9) The level of detail included in the mechanism described in Figure S8 is not supported by the work shown.

      Fig. S8 is not presented as a summary of our findings but as a model that is consistent with our data - since it is by definition somewhat speculative, we present it in the supplementary data.   

      Minor Comments:

      (1) Figure 6S title is written incorrectly.

      We thank the reviewer for noticing this - we have corrected this in the revised manuscript.

      (2) Images shown in Figure S7C need scale bars.

      These images are at 40X magnification - this has been added to the legend.

    1. eLife Assessment

      This useful study uses a combination of experimental and modeling approaches to investigate the role of actomyosin in epithelial invagination during Ciona siphon tube morphogenesis. Several types of solid quantitative analyses are presented, yet the evidence supporting the central claim of bidirectional translocation of actomyosin remains incomplete. Since epithelial invagination contributes to the morphogenesis of many developing organs, this work has the potential to appeal to both cell biologists and developmental biologists.

    2. Reviewer #1 (Public review):

      Summary:

      This paper investigates the physical basis of epithelial invagination in the morphogenesis of the ascidian siphon tube. The authors observe changes in actin and myosin distribution during siphon tube morphogenesis using fixed specimens and immunohistochemistry. They discover that there is a biphasic change in the actomyosin localization that correlates with changes in cell shapes. Initially, there is the well-known relocation of actomyosin from the lateral sides to the apical surface of cells that will invaginate, accompanied by a concomitant lengthening of the central cells within the invagination, but not a lot of invagination. Coincident with a second, more rapid, phase of invagination, the authors see a relocalization of actomyosin back to the lateral sides of the cells. This second "bidirectional" relocation of actin appears to be important because optogenetic inhibition of myosin in the lateral domain after the initial invagination phase resulted in a block of further invagination. Although not noted in the paper, the dependence of the second phase of siphon invagination on actomyosin is interesting and important, because it has been shown that during Drosophila mesoderm invagination a second "folding" phase of invagination is independent of actomyosin contraction (Guo et al. eLife 2022), so there appear to be important differences between the Drosophila mesoderm system and the ascidian siphon tube system.

      Using the experimental data, the authors create a vertex model of the invagination, and simulations reveal a coupled mechanism of apicobasal tension imbalance and lateral contraction that creates the invagination. The resultant model appears to recapitulate many aspects of the observed cell behaviors, although there are some caveats to consider (described below).

      Strengths:

      The studies and presented results are well done and provide important insights into the physical forces of epithelial invagination, which is important because invaginations are how a large fraction of organs in multicellular organisms are formed.

      Weaknesses:

      (1) This reviewer has concerns about two aspects of the computational model. First, the model in Figure 5D shows a simulation of a flat epithelial sheet creating an invagination. However, the actual invagination is occurring in a small embryo that has significant curvature, such that nine or so cells occupy a 90-degree arc of the 360-degree circle that defines the embryo's cross-section (e.g., see Figure 1A). This curvature could have important effects on cell behavior.

      (2) The second concern about the model is that Figure 5D shows the vertex model developing significant "puckering" (bulging) surrounding the invagination. Such "puckering" is not seen in the in vivo invagination (Figure 1A, 2A). This issue is not discussed in the text, so it is unclear how big an issue this is for the developed model, but the model does not recapitulate all aspects of the siphon invagination system.

      (3) In Figure 2A, Top View, and the schematic in Figure 2C, the developing invagination is surrounded by a ring of aligned cell edges characteristic of a "purse string" type actomyosin cable that would create pressure on the invaginating cells, which has been documented in multiple systems. Notably, the schematic in Figure 2C shows myosin II localizing to aligned "purse string" edges, suggesting the purse string is actively compressing the more central cells. If the purse string consistently appears during siphon invagination, a complete understanding of siphon invagination will require understanding the contributions of the purse string to the invagination process.

      (4) The introduction and discussion put the work in the context of work on physical forces in invagination, but there is not much discussion of how the modeling fits into the literature.

    3. Reviewer #2 (Public review):

      Summary:

      The authors propose that bidirectional translocation of actomyosin drives tissue invagination in Ciona siphon tube formation. They suggest a two-stage model where actomyosin first accumulates apically to drive a slow initial invagination, followed by translocation to lateral domains to accelerate the invagination process through cell shortening. They have shown that actomyosin activity is important for invagination - modulation of myosin activity through expression of myosin mutants altered the timing and speed of invagination; furthermore, optogenetic inhibition of myosin during the transition of the slow and fast stages disrupted invagination. The authors further developed a vertex model to validate the relationship between contractile force distribution and epithelial invagination.

      Strengths:

      (1) The authors employed various techniques to address the research question, including optogenetics, the use of MRLC mutants, and vertex modelling.

      (2) The authors provide quantitative analyses for a substantial portion of their imaging data, including cell and tissue geometry parameters as well as actin and myosin distributions. The sample sizes used in these analyses appear appropriate.

      (3) The authors combined experimental measurements with computer modeling to test the proposed mechanical models, which represents a strength of the study. It provides a framework to explore the mechanical principles underlying the observed morphogenesis.

      Weaknesses:

      (1) The concept of coordinated and sequential action of apical and lateral actomyosin in support of epithelial folding has been documented through a combination of experimental and modeling approaches in other contexts, such as ascidian endoderm invagination (PMID: 20691592) and gastrulation in Drosophila (PMIDs: 21127270, 22511944, 31273212). While the manuscript addresses an important question, related findings have been reported in these previous studies. This overlap reduces the degree of novelty, and it remains to be clarified how their work advances beyond these prior contributions.

      (2) One of the central statements made by the authors is that the translocation of actomyosin between the apical and lateral domains mediates invagination. The use of the term "translocation" implies that the same actomyosin structures physically move from one location to another location, which is not demonstrated by the data. Given the time scale of the process (several hours), it is also possible that the observed spatiotemporal patterns of actomyosin intensity result from sequential activation/assembly and inactivation/disassembly at specific locations on the cell cortex, rather than from the physical translocation of actomyosin structures over time.

      (3) Some aspects of the data on actomyosin localization require further clarification. (1) The authors state that actomyosin translocation is bidirectional, first moving from the lateral domain to the apical domain; however, the reduction of the lateral actomyosin at this step was not rigorously tested. (2) During the slow invagination stage, it is unclear whether myosin consistently localizes to the apical cell-cell borders or instead relocalizes to the medioapical domain, as suggested by the schematic illustration presented in Figure 2C. (3) It is unclear how many cells along the axis orthogonal to the furrow accumulate apical and lateral myosin.

      (4) The overexpression of MRLC mutants appears to be rather patchy in some cases (e.g., in Figure 3A, 17.0 hpf, only cells located at the right side of the furrow appeared to express MRLC T18ES19E). It is unclear how such patchy expression would impact the phenotype.

      (5) In the optogenetic experiment, it appears that after one hour of light stimulation, the apical side of the tissue underwent relaxation (comparing 17 hpf and 16 hpf in Figure 4B). It is therefore unclear whether the observed defect in invagination is due to apical relaxation or lack of lateral contractility, or both. Therefore, the phenotype is not sufficient to support the authors' statement that "redistribution of myosin contractility from the apical to lateral regions is essential for the development of invagination".

      (6) The vertex model is designed to explore how apical and lateral tensions contribute to distinct morphological outcomes. While the authors raise several interesting predictions, these are not further tested, making it unclear to what extent the model provides new insights that can be validated experimentally. In addition, modeling the epithelium as a flat sheet and not accounting for cell curvature is a simplification that may limit the model's accuracy. Finally, the model does not fully recapitulate the deeply invaginated furrow configuration as observed in a real embryo (comparing 18 hpf in Figure 5D and 18 hpf in Figure 1A) and does not fully capture certain mutant phenotypes (comparing 18 hpf in Figure 5F and 18 hpf in Figure 3B right panel).

    4. Reviewer #3 (Public review):

      Summary:

      In this manuscript by Qiao et al., the authors seek to uncover force and contractility dynamics that drive tissue morphogenesis, using the Ciona atrial siphon primordium as a model. Specifically, the authors perform a detailed examination of epithelial folding dynamics. Generally, the authors' claims were supported by their data, and the conceptual advances may have broader implications for other epithelial morphogenesis processes in other systems.

      Strengths:

      The strengths of this manuscript include the variety of experimental and theoretical methods, including generally rigorous imaging and quantitative analyses of actomyosin dynamics during this epithelial folding process, and the derivation of a mathematical model based on their empirical data, which they perturb in order to gain novel insights into the process of epithelial morphogenesis.

      Weaknesses:

      There are concerns related to wording and interpretations of results, as well as some missing descriptions and details regarding experimental methods.

    5. Author response:

      Reviewing Editor Comments:

      Based on the feedback from the reviewers, a focus on the following major points has the potential to improve the overall assessment of the significance of the findings and the strength of the evidence:

      (1) It would be helpful to clearly articulate how these findings advance the field beyond what has already been demonstrated or suggested in other systems.

      We will revise the Introduction and Discussion to better contextualize our findings. We will provide a careful comparison of the Ciona atrial siphon invagination with the other established systems to elucidate the unique aspects of our model. In doing so, we will highlight our discovery of a novel bidirectional "lateral-apical-lateral" contractility as a distinct mechanical paradigm for sequential morphogenesis.

      (2) It would be helpful to clarify the meaning of "translocation" and more explicitly describe the temporal and spatial patterns of active myosin localization during the two steps of invagination.

      We will replace “translocation” with the more accurate and conservative term “redistribution” throughout the manuscript, including in the title. We will also revise the text in the Results and Discussion sections to avoid overinterpretation. To provide a more explicit description of the spatiotemporal patterns, we will first add new quantitative analyses of active myosin intensity from earlier time points (13-14 hpf) to rigorously support the initial lateral-to-apical redistribution phase. Then, we will add high-resolution top-view images to unambiguously show the ring-like localization of myosin at the apical cell-cell junctions during the initial stage. Finally, we will correct the schematic in Figure 2C to accurately reflect the predominant localization of active myosin at the apical cell-cell borders.

      (3) It would be helpful to explain how the optogenetic data support the conclusion that "redistribution of myosin contractility from the apical to lateral regions is essential for the development of invagination".

      We acknowledge the limitation of the original global inhibition experiment. We will perform additional experiments that combine optogenetic inhibition with subsequent immunostaining of the active myosin. By quantitatively comparing the distribution of actomyosin in light-stimulated versus dark-control embryos, we will be able to demonstrate whether the inhibition prevents the establishment of the lateral contractility domain. This will allow us to refine our conclusion.

      (4) It would be helpful to describe how the modeling work fits within the existing literature on modeling epithelial folding and to address discrepancies between the model and the actual biological observations, such as tissue curvature, limited invagination depth in the model, and the "puckering" surrounding the invagination. In addition, certain descriptions of the modeling results should be clarified, as suggested by Reviewer #3.

      We fully agree that we should discuss the existing theoretical work on epithelial folding more clearly. Clarifying how physical forces contribute to invagination is central to interpreting the underlying mechanisms, and we appreciate the opportunity to better connect our framework to existing studies. In the revision, we will expand the Introduction and Discussion to place our model in the appropriate theoretical context and highlight how it relates to and differs from previous approaches. At the same time, we will extend the model to a curved geometric framework to more accurately reproduce the experimental observations, which will improve its predictive value. We will also revise the descriptions and schematic representations of the modeling results to enhance clarity and better align them with the biological data.

      (5) It would be helpful to elaborate on the methods for quantitative image analysis and statistical tests.

      We will thoroughly expand the Methods section to provide a detailed step-by-step description of image quantification procedures, including precise definitions of the apical, lateral, and basal domains used for intensity measurements and the measurement of cell surface areas and invagination depths.

      Reviewer #1 (Public review):

      Summary:

      This paper investigates the physical basis of epithelial invagination in the morphogenesis of the ascidian siphon tube. The authors observe changes in actin and myosin distribution during siphon tube morphogenesis using fixed specimens and immunohistochemistry. They discover that there is a biphasic change in the actomyosin localization that correlates with changes in cell shapes. Initially, there is the well-known relocation of actomyosin from the lateral sides to the apical surface of cells that will invaginate, accompanied by a concomitant lengthening of the central cells within the invagination, but not a lot of invagination. Coincident with a second, more rapid, phase of invagination, the authors see a relocalization of actomyosin back to the lateral sides of the cells. This second "bidirectional" relocation of actin appears to be important because optogenetic inhibition of myosin in the lateral domain after the initial invagination phase resulted in a block of further invagination. Although not noted in the paper, the dependence of the second phase of siphon invagination on actomyosin is interesting and important, because it has been shown that during Drosophila mesoderm invagination a second "folding" phase of invagination is independent of actomyosin contraction (Guo et al. eLife 2022), so there appear to be important differences between the Drosophila mesoderm system and the ascidian siphon tube system.

      Using the experimental data, the authors create a vertex model of the invagination, and simulations reveal a coupled mechanism of apicobasal tension imbalance and lateral contraction that creates the invagination. The resultant model appears to recapitulate many aspects of the observed cell behaviors, although there are some caveats to consider (described below).

      We sincerely thank you for this insightful comment and for bringing the important study by Guo et al. (2022) to our attention. We fully agree that a direct comparison between these two mechanisms is important for putting our findings in context. As you astutely point out, the fundamental difference lies in the autonomy and driving force of the second, rapid invagination phase. To highlight this important conceptual advance, we will add a dedicated paragraph to the Discussion section that addresses this point explicitly.

      Strengths:

      The studies and presented results are well done and provide important insights into the physical forces of epithelial invagination, which is important because invaginations are how a large fraction of organs in multicellular organisms are formed.

      Thank you for this positive assessment and for recognizing the significance of our work in elucidating the physical mechanisms underlying fundamental morphogenetic processes. We have striven to provide a comprehensive and rigorous analysis, and are grateful for this encouraging feedback.

      Weaknesses:

      (1) This reviewer has concerns about two aspects of the computational model. First, the model in Figure 5D shows a simulation of a flat epithelial sheet creating an invagination. However, the actual invagination is occurring in a small embryo that has significant curvature, such that nine or so cells occupy a 90-degree arc of the 360-degree circle that defines the embryo's cross-section (e.g., see Figure 1A). This curvature could have important effects on cell behavior.

      Thank you for bringing up the issue of tissue curvature. In this initial version of the model, we treated the tissue as flat because, although the anterior epidermis indeed has significant curvature, the region that actually undergoes invagination occupies only a small arc of the embryo's cross-section, roughly a 30-degree arc of the 360-degree circle. In addition, the embryo elongates anisotropically, and by 16.5 hpf the curvature has largely diminished (Fig. 1A), leaving this local region effectively flattened. We agree that this simplification may overlook contributions from early curvature, and we will examine curvature changes more carefully in the data and incorporate curved geometry into the model to evaluate their impact.

      (2) The second concern about the model is that Figure 5D shows the vertex model developing significant "puckering" (bulging) surrounding the invagination. Such "puckering" is not seen in the in vivo invagination (Figure 1A, 2A). This issue is not discussed in the text, so it is unclear how big an issue this is for the developed model, but the model does not recapitulate all aspects of the siphon invagination system.

      Thank you for pointing out the issue regarding the accuracy of the deformation pattern in our simulations. We do observe a mild puckering in vivo around 17 hpf (Fig. 1A), but it is clearly less pronounced than in the current model. The presence of such deformation suggests that bending stiffness of the epithelial sheet contributes to the mechanics of the invagination, which is included in our current model. The discrepancy reflects limitations in our mechanical assumptions and geometric simplifications, including oversimplified interactions between the apical cell layer and the underlying basal cells, as well as the omission of tissue curvature. We will refine these aspects in the revised model to better reproduce the deformation patterns observed in vivo.

      (3) In Figure 2A, Top View, and the schematic in Figure 2C, the developing invagination is surrounded by a ring of aligned cell edges characteristic of a "purse string" type actomyosin cable that would create pressure on the invaginating cells, which has been documented in multiple systems. Notably, the schematic in Figure 2C shows myosin II localizing to aligned "purse string" edges, suggesting the purse string is actively compressing the more central cells. If the purse string consistently appears during siphon invagination, a complete understanding of siphon invagination will require understanding the contributions of the purse string to the invagination process.

      Thank you for this excellent observation. We agree that the ring-like actomyosin structure is a prominent feature during the initial stages of invagination, and its potential role warrants discussion. We carefully re-examined our data. Our analysis confirms that this myosin ring is most pronounced during the early initial invagination stage (approximately 13-14 hpf). This inward compression from the periphery would work in concert with apical constriction to help shape the initial invagination. However, this ring-like myosin pattern significantly diminishes in the accelerated invagination stage. We feel that the purse string may play a collaborative role in the early phase; however, its dissolution at the accelerated invagination stage indicates that Ciona atrial siphon invagination does not entirely rely on sustained compression from a purse string of surrounding cells. These data will be included in the supplementary materials.

      (4) The introduction and discussion put the work in the context of work on physical forces in invagination, but there is not much discussion of how the modeling fits into the literature.

      We apologize for not providing sufficient context on how our theoretical framework relates to prior work on the mechanics of invagination. You are absolutely right that the Introduction and Discussion sections should more clearly situate our model within the existing literature, including the classical formulations it builds upon and the more recent models that address similar morphogenetic processes. In the revision, we will expand these sections to acknowledge relevant work, clarify how our approach connects to and differs from previous models, and explicitly discuss the strengths and limitations of our framework. We appreciate this helpful suggestion and will make these connections much clearer.

      Reviewer #2 (Public review):

      Summary:

      The authors propose that bidirectional translocation of actomyosin drives tissue invagination in Ciona siphon tube formation. They suggest a two-stage model where actomyosin first accumulates apically to drive a slow initial invagination, followed by translocation to lateral domains to accelerate the invagination process through cell shortening. They have shown that actomyosin activity is important for invagination - modulation of myosin activity through expression of myosin mutants altered the timing and speed of invagination; furthermore, optogenetic inhibition of myosin during the transition of the slow and fast stages disrupted invagination. The authors further developed a vertex model to validate the relationship between contractile force distribution and epithelial invagination.

      Thank you for your thoughtful and accurate summary of our work and for your constructive critique.

      Strengths:

      (1) The authors employed various techniques to address the research question, including optogenetics, the use of MRLC mutants, and vertex modelling.

      (2) The authors provide quantitative analyses for a substantial portion of their imaging data, including cell and tissue geometry parameters as well as actin and myosin distributions. The sample sizes used in these analyses appear appropriate.

      (3) The authors combined experimental measurements with computer modeling to test the proposed mechanical models, which represents a strength of the study. It provides a framework to explore the mechanical principles underlying the observed morphogenesis.

      We are grateful for your positive assessment of the multidisciplinary approaches, quantitative analyses, and the integration of modeling with experiments.

      Weaknesses:

      (1) The concept of coordinated and sequential action of apical and lateral actomyosin in support of epithelial folding has been documented through a combination of experimental and modeling approaches in other contexts, such as ascidian endoderm invagination (PMID: 20691592) and gastrulation in Drosophila (PMIDs: 21127270, 22511944, 31273212). While the manuscript addresses an important question, related findings have been reported in these previous studies. This overlap reduces the degree of novelty, and it remains to be clarified how their work advances beyond these prior contributions.

      We thank you for raising this important point regarding the novelty of our work and for directing us to the key literature on ascidian endoderm invagination (PMID: 20691592) and Drosophila gastrulation (PMIDs: 21127270, 22511944, 31273212). We agree that the sequential activation of contractility in different cellular domains is a fundamental mechanism driving epithelial morphogenesis, as elegantly demonstrated in these prior studies. Our work builds upon this foundational concept. However, we believe our work reveals a novel and distinct mechanical model. Both the ascidian endoderm and the atrial siphon involve a sequential shift of actomyosin contractility, but the spatial patterns and functional outcomes are fundamentally different. In the ascidian endoderm (PMID: 20691592), the transition is from apical constriction to basolateral contraction. Basolateral contraction works in concert with a persistent circumferential to overcome tissue resistance and drive invagination. In contrast, our study of the atrial siphon reveals a bidirectional actomyosin redistribution between the apical and lateral domains. The basal domain in our system appears to play a more passive, structural role. While Drosophila gastrulation also involves apical and lateral myosin, the mechanisms and dependencies differ. As supported by recent work (Guo et al. eLife 2022), ventral furrow invagination can proceed even when lateral contractility is compromised, indicating that it is not an absolute requirement. In our system, however, optogenetic inhibition and our vertex model strongly suggest that the acquisition of lateral contractility is essential for the accelerated invagination stage. We will revise the text to better articulate these points of distinction and novelty in the Introduction and Discussion sections.

      (2) One of the central statements made by the authors is that the translocation of actomyosin between the apical and lateral domains mediates invagination. The use of the term "translocation" implies that the same actomyosin structures physically move from one location to another location, which is not demonstrated by the data. Given the time scale of the process (several hours), it is also possible that the observed spatiotemporal patterns of actomyosin intensity result from sequential activation/assembly and inactivation/disassembly at specific locations on the cell cortex, rather than from the physical translocation of actomyosin structures over time.

      Your critique regarding the term "translocation" was well-founded. We will replace “translocation” with the more accurate and conservative term “redistribution” throughout the manuscript, including in the title. We will also revise the text in the Results and Discussion sections to avoid overinterpretation.

      (3) Some aspects of the data on actomyosin localization require further clarification. (1) The authors state that actomyosin translocation is bidirectional, first moving from the lateral domain to the apical domain; however, the reduction of the lateral actomyosin at this step was not rigorously tested. (2) During the slow invagination stage, it is unclear whether myosin consistently localizes to the apical cell-cell borders or instead relocalizes to the medioapical domain, as suggested by the schematic illustration presented in Figure 2C. (3) It is unclear how many cells along the axis orthogonal to the furrow accumulate apical and lateral myosin.

      Thank you for your insightful comments, which will help us significantly improve the clarity and rigor of our actomyosin localization analysis. To address the points raised, we will undertake several key revisions. First, we will add new quantitative analyses of active myosin intensity from earlier time points (13-14 hpf) to rigorously support the initial lateral-to-apical redistribution phase. Second, we will correct the schematic in Figure 2C to accurately reflect the predominant localization of active myosin at the apical cell-cell borders. Finally, we will clarify that the actomyosin redistribution occurs within a broader domain of approximately 15-20 cells in the invagination primordium, rather than being restricted to the single central cell on which our quantitative measurements were focused.

      (4) The overexpression of MRLC mutants appears to be rather patchy in some cases (e.g., in Figure 3A, 17.0 hpf, only cells located at the right side of the furrow appeared to express MRLC T18ES19E). It is unclear how such patchy expression would impact the phenotype.

      Thank you for your observation. We acknowledge that mosaic expression is common in Ciona electroporation. For all quantitative analyses, we only selected embryos in which the central cell, along with more than half of the surrounding cells in the primordium, showed clear expression of the plasmid.

      (5) In the optogenetic experiment, it appears that after one hour of light stimulation, the apical side of the tissue underwent relaxation (comparing 17 hpf and 16 hpf in Figure 4B). It is therefore unclear whether the observed defect in invagination is due to apical relaxation or lack of lateral contractility, or both. Therefore, the phenotype is not sufficient to support the authors' statement that "redistribution of myosin contractility from the apical to lateral regions is essential for the development of invagination".

      We agree that our optogenetic inhibition experiment does not distinguish between apical and lateral roles. To directly address this point, we will perform additional experiments in which we conduct the optogenetic inhibition and subsequently fix and stain the embryos for active myosin and F-actin. This will allow us to quantitatively compare the distribution of actomyosin in the light-stimulated experimental group versus the dark control group. We expect that light activation will have a more pronounced inhibitory effect on the lateral domains than on the apical domain, as the latter is naturally undergoing a reduction in contractility at this stage.

      (6) The vertex model is designed to explore how apical and lateral tensions contribute to distinct morphological outcomes. While the authors raise several interesting predictions, these are not further tested, making it unclear to what extent the model provides new insights that can be validated experimentally. In addition, modeling the epithelium as a flat sheet and not accounting for cell curvature is a simplification that may limit the model's accuracy. Finally, the model does not fully recapitulate the deeply invaginated furrow configuration as observed in a real embryo (comparing 18 hpf in Figure 5D and 18 hpf in Figure 1A) and does not fully capture certain mutant phenotypes (comparing 18 hpf in Figure 5F and 18 hpf in Figure 3B right panel).

      Thank you for raising these important points. We agree that several model predictions require stronger experimental grounding, and that the flat-sheet assumption is an oversimplification that likely contributes to the model not fully capturing certain morphological features. Our current simulations of myosin perturbation are largely consistent with the optogenetic experiments and the behavior of the myosin mutant. However, the predictions obtained by theoretically decoupling apical and lateral tension are difficult to validate experimentally, given the challenges of selectively manipulating these two components in vivo. Based on your helpful suggestions, we will extend the model to incorporate tissue curvature and examine how initial bending influences the mechanics of invagination, which we expect will improve the accuracy of the model’s morphological predictions.

      Reviewer #3 (Public review):

      Summary:

      In this manuscript by Qiao et al., the authors seek to uncover force and contractility dynamics that drive tissue morphogenesis, using the Ciona atrial siphon primordium as a model. Specifically, the authors perform a detailed examination of epithelial folding dynamics. Generally, the authors' claims were supported by their data, and the conceptual advances may have broader implications for other epithelial morphogenesis processes in other systems.

      Thank you for your positive summary and for recognizing the broader implications of our work.

      Strengths:

      The strengths of this manuscript include the variety of experimental and theoretical methods, including generally rigorous imaging and quantitative analyses of actomyosin dynamics during this epithelial folding process, and the derivation of a mathematical model based on their empirical data, which they perturb in order to gain novel insights into the process of epithelial morphogenesis.

      Thank you for highlighting the strengths of our multidisciplinary methodology.

      Weaknesses:

      There are concerns related to wording and interpretations of results, as well as some missing descriptions and details regarding experimental methods.

      We will revise the manuscript to address your concerns regarding wording and methodological details. Your feedback will help us improve clarity, precision, and the depth of methodological description throughout the text.

    1. eLife Assessment

      This important study presents a technically rigorous and carefully controlled analysis of the signalling potential of cancer-associated gain-of-function Notch alleles. The work is clearly presented, and the experiments are robust, comprehensive, and well-controlled. While some data primarily establish the system or report negative findings, the comparative approach in a well-characterized model provides convincing mechanistic evidence for how these Notch variants function. This study will be of interest to researchers in both developmental and cancer biology.

    2. Reviewer #1 (Public review):

      Summary:

      In their paper, Shimizu and Baron describe the signaling potential of cancer gain-of-function Notch alleles using Drosophila Notch transfected into S2 cells. These cells do not express Notch, the ligand Dl, or Dx, all of which are transfected. With this simple cellular system, the authors have previously shown that it is possible to measure Notch signaling levels by using a reporter for the 3 main types of signaling outputs: basal signaling, ligand-induced signaling, and ligand-independent signaling regulated by Deltex. The authors proceed to test 22 cancer mutations for the above-mentioned 3 outputs. The mutations considered cluster in the negative regulatory region (NRR), which is composed of 3 LNR repeats wrapping around the HD domain. This arrangement shields the S2 cleavage site that starts the activation reaction.

      The main findings are:

      (1) Figure 1: the cell system can recapture ectopic activation of 3 existing Drosophila alleles validated in vivo.

      (2) Figure 2: Some of the HD mutants do show ectopic activation that is not induced by Dl or Dx, arguing that these mutations fully expose the S2 site. Some of the HD mutants do not show ectopic activation in this system, a fact that is suggested to be related to retention in the secretory pathway.

      (3) Figure 3: Some of the LNR mutants do show ectopic activation that is induced by Dl or Dx, arguing that these might partially expose the S2 site.

      (4) Figure 4-6: 3 sites of the LNR3 on the surface that are involved in receptor heterodimerization, if mutated to A, are found to cause ectopic activation that is induced by Dl or Dx. This is not due to changes in their dimerization ability, and these mutants are found to be expressed at a higher level than WT, possibly due to decreased levels of protein degradation.

      Strengths and Weaknesses:

      The paper is very clearly written, and the experiments are robust, complete, and controlled. It is somewhat limited in scope, considering that Figures 1 and 5 could be supplementary data (setup of the system and negative data). However, the comparative approach and the controlled and well-known system allow the extraction of meaningful information in a field that has struggled to find specific anticancer approaches. In this sense, the authors contribute limited but highly valuable information.

    3. Reviewer #2 (Public review):

      Summary:

      This ambitious study introduced 22 mutations corresponding to amino acid substitution mutations known to induce cancer in human Notch1, located within the Negative Regulatory Region, into the Drosophila Notch gene. It comprehensively examined their effects on activity, intracellular transport, protein levels, and stability. The results revealed that the impact of amino acid substitutions within the Negative Regulatory Region can be grouped based on their location, differing between the Heterodimerization Domain and the Lin12/Notch Repeat. These findings provide important insights into elucidating the mechanisms by which amino acid substitution mutations in human Notch1 cause leukemia and cancer.

      Strengths:

      In this study, the authors successfully measured the activity of amino acid-substituted Notch with high precision by effectively leveraging the advantages of their previously established experimental system. Furthermore, they clearly demonstrated ligand-dependent and Deltex-dependent properties.

      Weaknesses:

      Amino acid substitution mutations exhibit interesting effects depending on their position, so interest naturally turns to the mechanisms generating these differences. Unfortunately, however, elucidating these mechanisms will require considerable time in the future. Therefore, it is reasonable to conclude that questions regarding the mechanism fall outside the scope of this paper.

    4. Reviewer #3 (Public review):

      Summary:

      Overall, the work is fine; however, I find it very preliminary. To the best of my understanding, whether this study supports any claim of physiologically relevant altered Notch signaling remains to be discerned.

      Strengths:

      This manuscript systematically analyzes cancer-associated mutations in the Negative Regulatory Region (NRR) of Drosophila Notch to reveal diverse regulatory mechanisms with implications for cancer modelling and therapy development. The study introduces cancer-associated mutations equivalent to human NOTCH1 mutations, covering a broad spectrum across the LNR and HD domains. The authors use rigorous phenotypic assays to classify their functional outcomes. By leveraging the S2 cell-based assay platform, the work identifies mechanistic differences between mutations that disrupt the LNR-HD interface, core HD, and LNR surface domains, enhancing understanding of Notch regulation. The discovery that certain HD and LNR-HD interface mutations (e.g., R1626Q and E1705P) in Drosophila mirror the constitutive activation and synergy with PEST deletion seen in mammalian T-ALL is nice and provides a platform for future cancer modelling. Surface-exposed LNR-C mutations were shown to increase Notch protein stability and decrease turnover, suggesting a previously unappreciated regulatory layer distinct from canonical cleavage-exposure mechanisms. By linking mutant-specific mechanistic diversity to differential signaling properties, the work directly informs targeted approaches for modulating Notch activity in cancer cells.

      Weaknesses:

      While this is indeed an exciting set of observations, the work is entirely cell-line-based, which is the primary reason enthusiasm for the study is dampened. The analysis is confined to Drosophila S2 cells, which may not fully recapitulate the tissue- or organism-level regulatory complexity observed in vivo. Some Drosophila HD domain mutants accumulate in the secretory pathway and do not phenocopy human T-ALL mutations, possibly due to physiological inputs that S2 cells cannot account for or to species-specific differences such as the absence of S1 cleavage.

      Thus, the findings may not translate directly to understanding NOTCH1 function in mammalian cancer models. While the manuscript highlights mechanistic variety, the functional significance of these mutations for hematopoietic malignancies or developmental contexts in live animals remains untested. Overall, the work does not yet provide evidence for altered Notch signaling that is physiologically relevant.

    1. eLife Assessment

      This study investigates the influence of genomic information and timing of vaccine strain selection on the accuracy of influenza A/H3N2 forecasting. The authors utilised appropriate statistical methods and have provided convincing evidence, which amounts to an important contribution to the evidence base. Substantial revisions have been made to the manuscript and issues of concern have been clarified, with the necessary study limitations appropriately discussed.

    2. Reviewer #1 (Public review):

      Summary:

      In the paper, the authors investigate how the availability of genomic information and the timing of vaccine strain selection influence the accuracy of influenza A/H3N2 forecasting. The manuscript presents three key findings:

      (1) Using real and simulated data, the authors demonstrate that shortening the forecasting horizon and reducing submission delays for sharing genomic data improve the accuracy of virus forecasting.

      (2) Reducing submission delays also enhances estimates of current clade frequencies.

      (3) Shorter forecasting horizons, for example allowed by the proposed use of "faster" vaccine platforms such as mRNA, result in the most significant improvements in forecasting accuracy.

      Strengths:

      The authors present a robust analysis, using statistical methods based on previously published genetic-based techniques to forecast influenza evolution. Optimizing prediction methods is crucial from both scientific and public health perspectives. The use of simulated as well as real genetic data (collected between April 1, 2005, and October 1, 2019) to assess the effects of shorter forecasting horizons and reduced submission delays is valuable and provides a comprehensive dataset. Moreover, the accompanying code is openly available on GitHub and is well-documented.

      Limitations of the authors' genomic-data-only approach are discussed in depth and within the context of existing literature. In particular, the impact of subsampling, necessary for computational reasons in this study, or restriction to Northern/Southern Hemisphere data is explored and discussed.

      Weaknesses:

      Although the authors acknowledge these limitations in their discussion, the impact of the analysis is somewhat constrained by its exclusive reliance on methods using genomic information, without incorporating or testing the impact of phenotypic data. The analysis with respect to more integrative models remains open, and the authors do not empirically validate how the inclusion of phenotypic information might alter the findings. Instead, we must rely on the authors' expectation that their findings will hold across different forecasting models, including those integrating both phenotypic and genetic data. This expectation, while reasonable, remains untested within the scope of the current study.

      Comments on latest version:

      Thanks to the authors for the revised version of the manuscript, which addresses and clarifies all of my previously raised points.

      In particular, the exploration of how subsampling of genomic information, hemisphere-specific forecasting, and the check for time dependence potentially influence the findings is now included and adds to the discussion. The manuscript also benefits from a look at these limitations when relying only on genomic data.

      The authors have carefully placed these limitations within the context of existing literature, especially regarding the concern raised about not including phenotypic data. As a minor comment, the conclusion that the findings potentially hold across different forecasting models, including those integrating both phenotypic and genetic data, relies on the authors' expectation. While this expectation might be plausible, it remains to be validated empirically in future work.

    3. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review)

      Summary: 

      In the paper, the authors investigate how the availability of genomic information and the timing of vaccine strain selection influence the accuracy of influenza A/H3N2 forecasting. The manuscript presents three key findings: 

      (1) Using real and simulated data, the authors demonstrate that shortening the forecasting horizon and reducing submission delays for sharing genomic data improve the accuracy of virus forecasting. 

      (2) Reducing submission delays also enhances estimates of current clade frequencies. 

      (3) Shorter forecasting horizons, for example allowed by the proposed use of "faster" vaccine platforms such as mRNA, result in the most significant improvements in forecasting accuracy.

      Strengths: 

      The authors present a robust analysis, using statistical methods based on previously published genetic-based techniques to forecast influenza evolution. Optimizing prediction methods is crucial from both scientific and public health perspectives. The use of simulated as well as real genetic data (collected between April 1, 2005, and October 1, 2019) to assess the effects of shorter forecasting horizons and reduced submission delays is valuable and provides a comprehensive dataset. Moreover, the accompanying code is openly available on GitHub and is well-documented. 

      Thank you for this summary! We worked hard to make this analysis robust, reproducible, and open source.

      Weaknesses: 

      While the study addresses a critical public health issue related to vaccine strain selection and explores potential improvements, its impact is somewhat constrained by its exclusive reliance on predictive methods using genomic information, without incorporating phenotypic data. The analysis remains at a high level, lacking a detailed exploration of factors such as the genetic distance of antigenic sites.

      We are glad to see this acknowledgment of the critical public health issue we've addressed in this project. The goal for this study was to test effects of counterfactual scenarios with realistic public health interventions and not to introduce methodological improvements to forecasting methods. The final forecasting model we analyzed in this study (lines 301-330 and Figure 6) was effectively an "oracle" model that produced the optimal forecast for each given current and future timepoint. We expect any methodological improvements to forecasting models to converge toward the patterns we observed in this final section of the results.

      We've addressed the reviewer's concerns in more detail in response to their numbered comments 4 and 5 below.

      Another limitation is the subsampling of the available dataset, which reduces several tens of thousands of sequences to just 90 sequences per month with even sampling across regions. This approach, possibly due to computational constraints, might overlook potential effects of regional biases in clade distribution that could be significant. The effect of dataset sampling on presented findings remains unexplored. Although the authors acknowledge limitations in their discussion section, the depth of the analysis could be improved to provide a more comprehensive understanding of the underlying dynamics and their effects.

      We have addressed this comment in the numbered comment 1 below.

      Suggestions to enhance the depth of the manuscript: 

      Thank you again for these thoughtful suggestions. They have encouraged us to revisit aspects of this project that we had overlooked by being too close to it and have helped us improve the paper's quality.

      (1) Subsampling and Sampling Strategies: It would be valuable to comment on the rationale behind the strong subsampling of the available GISAID data. A discussion of the potential effects of different sampling strategies is necessary. Additionally, assessing the stability of the results under alternative sequence sampling strategies would strengthen the robustness of the conclusions. 

      We agree with the reviewer's point that our subsampled sequences only represent a fraction of those available in the GISAID EpiFlu database and that a more complete representation would be ideal. We designed the subsampling approach we used in this study for two primary reasons.

      (1) First, we sought to minimize known regional and temporal biases in sequence availability. For example, North America and Europe are strongly overrepresented in the GISAID EpiFlu database, while Africa and Asia are underrepresented (Figure 1A). Additionally, the number of sequences in the database has increased every year since 2010, causing later years in this study period to be overrepresented compared to earlier years. A major limitation of our original forecasting model from Huddleston et al. 2020 is its inability to explicitly estimate geographic-specific clade fitnesses. Because of this limitation, we trained that original model on evenly subsampled sequences across space and time. We used the same approach in this study to allow us to reuse that previously trained forecasting model. Despite this strong subsampling approach, we still selected an average of 50% of all available sequences across all 10 regions and the entire study period (Figure 1B). Europe and North America were most strongly downsampled with only 7% and 8% of their total sequences selected for the study, respectively. In contrast, we selected 91% of all sequences from Southeast Asia.

      (2) Second, our forecasting model relies on the inference of time-scaled phylogenetic trees which are computationally intensive to infer. While new methods like CMAPLE (Ly-Trong et al. 2024) would allow us to rapidly infer divergence trees, methods to infer time trees still do not scale well to more than ~20,000 samples. The subsampling approach we used in this study allowed us to build the 35 six-year H3N2 HA trees we needed to test our forecasting model in a reasonable amount of time.
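      As an illustration of what "even sampling across regions and time" can mean in practice, the sketch below picks a fixed number of sequences per month by cycling through regions in turn. It is a minimal sketch under stated assumptions, not the pipeline used in the study: the function name `subsample_evenly`, the record keys `strain`, `region`, and `month`, and the round-robin rule are all hypothetical choices made for this example.

```python
import random
from collections import defaultdict


def subsample_evenly(records, per_month=90, seed=0):
    """Pick up to `per_month` records per month, spread evenly across regions.

    `records` is a list of dicts with the hypothetical keys
    "strain", "region", and "month" (e.g. "2015-01").
    """
    rng = random.Random(seed)

    # Group records by month, then by region within each month.
    by_month = defaultdict(lambda: defaultdict(list))
    for rec in records:
        by_month[rec["month"]][rec["region"]].append(rec)

    selected = []
    for month in sorted(by_month):
        # Shuffle each region's pool so that picks within a region are random.
        pools = {
            region: rng.sample(recs, len(recs))
            for region, recs in sorted(by_month[month].items())
        }
        picked = []
        # Round-robin over regions so that no single region dominates a month.
        while len(picked) < per_month and any(pools.values()):
            for region in list(pools):
                if pools[region] and len(picked) < per_month:
                    picked.append(pools[region].pop())
        selected.extend(picked)
    return selected
```

      Under this rule, a sparsely sampled region contributes nearly all of its sequences while a heavily sampled region is strongly downsampled, which is the qualitative pattern described above for Southeast Asia versus Europe and North America.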

      We have expanded our description of this rationale for our subsampling approach in the discussion and described the potential effects of geographic and temporal biases on forecasting model predictions (lines 360-376). Our original discussion read:

      "Another immediate improvement would be to develop models that can use all available data in a way that properly accounts for geographic and temporal biases. Current models based on phylogenetic trees need to evenly sample the diversity of currently circulating viruses to produce unbiased trees in a reasonable amount of time. Models that could estimate sample fitness and compare predicted and future populations without trees could use more available sequence data and reduce the uncertainty in current and future clade frequencies."

      The section now reads:

      "Another immediate improvement would be to develop models that can use all available data in a way that properly accounts for geographic and temporal biases. For example, virus samples from North America and Europe are overrepresented in the GISAID EpiFlu database, while samples from Africa and Asia are underrepresented (McCarron et al. 2022). As new H3N2 epidemics often originate from East and Southeast Asia and burn out in North America and Europe (Bedford et al. 2015), models that do not account for this geographic bias are more likely to incorrectly predict the success of lower fitness variants circulating in overrepresented regions and miss higher fitness variants emerging from underrepresented regions. Additionally, the number of H3N2 HA sequences per year in the GISAID EpiFlu database has increased consistently since 2010, creating a temporal bias where any given season a model forecasts to will have more sequences available than the season from which forecasts occur. The model we used in this study does not explicitly account for geographic variability of viral fitness and relies on time-scaled phylogenetic trees which can be computationally costly to infer for large sample sizes. As a result, we needed to evenly sample the diversity of currently circulating viruses to produce unbiased trees in a reasonable amount of time. Models that could estimate viral fitness per geographic region without inferring trees could use more available sequence data and reduce the uncertainty in current and future clade frequencies."

      We also added a brief explanation of our subsampling method to the corresponding section of the methods (lines 411-415). These lines read:

      "This sampling approach accounts for known regional biases in sequence availability through time (McCarron et al. 2022) and makes inference of divergence and time trees computationally tractable. This approach also exactly matches our previous study where we first trained the forecast models used in this study (Huddleston et al. 2020), allowing us to reuse those previously trained models."

      Although our forecast model is limited to a small proportion of sequences that we evenly sample across regions and time, we agree that we could improve the robustness of our conclusions by repeating our analysis for different subsets of the available data. To assess the stability of the results under alternative sequence sampling strategies, we ran a second replicate of our entire analysis of natural H3N2 populations with three times as many sequences per month (270) as our original replicate. With this approach, we selected between 17% (Europe) and 97% (Southeast Asia) of all sequences per region with an average of 72% and median of 83% (Figure 1C). We compared the effects of realistic interventions for this high-density subsampling analysis with the effects from the original subsampling analysis (Figure 6). We have added the results from this analysis to the main text (lines 313-321) which now reads:

      "For natural A/H3N2 populations, the average improvement of the vaccine intervention was 1.1 AAs and the improvement of the surveillance intervention was 0.27 AAs or approximately 25% of the vaccine intervention. The average improvement of both interventions was only slightly less than additive at 1.28 AAs. To verify the robustness of these results, we replicated our entire analysis of A/H3N2 populations using a subsampling scheme that tripled the number of viruses selected per month from 90 to 270 (Figure 1—figure supplement 4C). We found the same pattern with this replication analysis, with average improvements of 0.93 AAs for the vaccine intervention, 0.21 AAs for the surveillance intervention, and 1.14 AAs for both interventions (Figure 6—figure supplement 2)."

      We updated our revised manuscript to include the summary of sequences available and subsampled as Figure 1—figure supplement 4 and the effects of interventions with the high-density analysis as Figure 6—figure supplement 2. For reference, we have included Figure 2 showing both the original Figure 6 (original subsampling) and Figure 6—figure supplement 2 (high-density subsampling).

      (2) Time-Dependent Effects: Are there time-dependent patterns in the findings? For example, do the effects of submission lag or forecasting horizon differ across time periods, such as [2005-2010, 2010-2015, 2015-2018]? This analysis could be particularly interesting given the emergence of co-circulation of clades 3c.2 and 3c.3 around 2012, which marked a shift to less "linear" evolutionary patterns over many years in influenza A/H3N2.

      This is an interesting question that we overlooked by focusing on the broader trends in the predictability of A/H3N2 evolution. The effects of realistic interventions that we report in Figure 6 span future timepoints of 2012-04-01 to 2019-10-01. Since H1N1pdm emerged in 2009 and 3c3 started cocirculating with 3c2 in 2012, we can't inspect effects for the specific epochs mentioned above. However, there have been many periods during this time span where the number of cocirculating clades varied in ways that could affect forecast accuracy. The streamgraph, Author response image 1, shows the variation in clade frequencies from the "full tree" that we used to define clades for A/H3N2 populations.

      Author response image 1.

      Streamgraph of clade frequencies for A/H3N2 populations demonstrating variability of clade cocirculation through time.

      We might expect that forecasting models would struggle to accurately predict future timepoints with higher clade diversity, since much of that diversity would not have existed at the time of the forecast. We might also expect faster surveillance to improve our ability to detect that future variation by detecting those variants at low frequency instead of missing them completely.

      To test this hypothesis, we calculated the Shannon entropy of clade frequencies per future timepoint represented in Figure 6 (under no submission lag) and plotted the change in optimal distance to the predicted future by the entropy per timepoint. If there was an effect of future clade complexity on forecast accuracy, we expected greater improvements from interventions to be associated with higher future entropy.
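      For readers unfamiliar with the measure, the Shannon entropy of a set of clade frequencies can be computed as in the minimal sketch below. The function name `shannon_entropy` and the use of the natural logarithm are assumptions made for this illustration; the response does not state which logarithm base was used.

```python
import math


def shannon_entropy(frequencies):
    """Shannon entropy (in nats) of clade frequencies that sum to 1.

    Clades with zero frequency contribute nothing to the sum.
    """
    return -sum(f * math.log(f) for f in frequencies if f > 0)


# Hypothetical clade frequencies at two future timepoints.
one_dominant_clade = [0.97, 0.02, 0.01]
four_cocirculating_clades = [0.25, 0.25, 0.25, 0.25]

low = shannon_entropy(one_dominant_clade)
high = shannon_entropy(four_cocirculating_clades)
```

      Entropy is zero when a single clade is fixed and reaches its maximum, the logarithm of the number of clades, when all clades circulate at equal frequency, so higher values indicate more cocirculation at a timepoint.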

      There was a trend for some of the greatest improvements per intervention to occur at higher future clade entropy timepoints, but we didn’t find a strong relationship between clade entropy and improvement in forecast accuracy by any intervention (Figure 4). The highest correlation was for improved surveillance (Pearson r=0.24).
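      To put the reported correlation in context, a Pearson r of 0.24 corresponds to an r² of roughly 0.06, meaning that clade entropy would account for about 6% of the variance in improvement under a linear fit. The sketch below shows how such a coefficient is computed from paired values; the function name `pearson_r` is an assumption for illustration, and the inputs would be the per-timepoint entropies and improvements.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)
```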

      We have added this figure to the revised manuscript as Figure 6—figure supplement 3 and updated the results (lines 321-323) to reflect the patterns we described above. The updated results (which partially include our response to the next reviewer comment) read:

      "These effects of realistic interventions appeared consistent across the range of genetic diversity at future timepoints (Figure 6—figure supplement 3) and for future seasons occurring in both Northern and Southern Hemispheres (Figure 6—figure supplement 4)."

      (3) Hemisphere-Specific Forecasting: Do submission lags or forecasting horizons show different performance when predicting Northern versus Southern Hemisphere viral populations? Exploring this distinction could add significant value to the analysis, given the seasonal differences in influenza circulation.

      Similar to the question above, we can replot the improvements in optimal distances to the future for the realistic interventions, grouping values by the hemisphere that has an active season in each future timepoint. Much like we expected forecasts to be less accurate when predicting into a highly diverse season, we might also expect forecasts to be less accurate when predicting into a season for a more densely populated hemisphere. Specifically, we expected that realistic interventions would improve forecast accuracy more for Northern Hemisphere seasons than Southern Hemisphere seasons. For this analysis, we labeled future timepoints that occurred in October or January as "Northern" and those that occurred in April or July as "Southern". We plotted effects of interventions on optimal distances to the future by intervention and hemisphere.
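      The labeling rule described above can be sketched as follows. This is a minimal illustration, assuming each future timepoint is available as a date and each improvement as a number of amino acids (AAs); the names `hemisphere_for` and `median_improvement_by_hemisphere` are hypothetical and are not taken from the study's code.

```python
from datetime import date
from statistics import median

# Month-to-hemisphere rule stated in the response.
NORTHERN_MONTHS = {10, 1}  # October, January
SOUTHERN_MONTHS = {4, 7}   # April, July


def hemisphere_for(timepoint):
    """Label a future timepoint by the hemisphere with an active season."""
    if timepoint.month in NORTHERN_MONTHS:
        return "Northern"
    if timepoint.month in SOUTHERN_MONTHS:
        return "Southern"
    raise ValueError(f"unexpected timepoint month: {timepoint.month}")


def median_improvement_by_hemisphere(improvements):
    """Median improvement (in AAs) per hemisphere.

    `improvements` maps each future timepoint (a date) to the improvement
    in optimal distance to the future for one intervention.
    """
    grouped = {"Northern": [], "Southern": []}
    for timepoint, value in improvements.items():
        grouped[hemisphere_for(timepoint)].append(value)
    return {hemi: median(vals) for hemi, vals in grouped.items() if vals}
```

      Applying the second function once per intervention yields one median per hemisphere, which is the quantity compared in the next paragraph.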

      In contrast to our original expectation, we found a slightly higher median improvement for the Southern Hemisphere seasons under both of the interventions that improved the vaccine timeline (Figure 5). The median improvement for the combined intervention was 1.42 AAs in the Southern Hemisphere and 0.93 AAs in the Northern Hemisphere. Similarly, the improvement with the "improved vaccine" intervention was 1.03 AAs in the South and 0.74 AAs in the North. However, the range of improvements per intervention was greater for the Northern Hemisphere across all interventions. The median increase in forecast accuracy was similar for both hemispheres in the improved surveillance intervention, with a single Northern Hemisphere season showing an unusually greater improvement that was also associated with higher clade entropy (Figure 4). These results suggest that both an improved vaccine development timeline and more timely sequence submissions would improve forecast accuracy more for Southern Hemisphere seasons than for Northern Hemisphere seasons.

      We have added this figure to the revised manuscript as Figure 6—figure supplement 4 and updated the results (lines 321-326) to reflect the patterns we described above. The new lines in the results read:

      "These effects of realistic interventions appeared consistent across the range of genetic diversity at future timepoints (Figure 6—figure supplement 3) and for future seasons occurring in both Northern and Southern Hemispheres (Figure 6—figure supplement 4). We noted a slightly greater median improvement in forecast accuracy associated with both improved vaccine interventions for the Southern Hemisphere seasons (1.03 and 1.42 AAs) compared to the Northern Hemisphere seasons (0.74 and 0.93 AAs)."

      (4) Antigenic Sites and Submission Delays: It would be interesting to investigate whether incorporating antigenic site information in the distance metric amplifies or diminishes the observed effects of submission delays. Such an analysis could provide a first glance at how antigenic evolution interacts with forecasting timelines. 

      This would be an interesting area to explore. One hypothesis along these lines would be that if 1) viruses with more substitutions at antigenic sites are more likely to represent the future population and 2) viruses with more antigenic substitutions originate in specific geographic locations and 3) submissions of sequences for those viruses are more likely to be lagged due to their geographic origin, then 4) decreasing submission lags should improve our forecasting accuracy by detecting antigenically-important sequences earlier. If there is not a direct link between viruses that are more likely to represent the future and higher submission lags, we would not expect to see any additional effect of reducing submission lags for antigenic sites. Based on our work in Huddleston et al. 2020, it is also not clear that assumption 1 above is consistently true, since the specific antigenic sites associated with high fitness change over time. In that earlier work, we found that models based on these antigenic (or "epitope") sites could only accurately predict the future when the relevant sites for viral success were known in advance. This result was shown by our "oracle" model which accurately predicted the future during the model validation period when it knew which sites were associated with success and failed to predict the future in the test period when the relevant sites for success had changed (Figure 6).

      To test the hypothesis above, we would need sequences to have submission lags that reflect their geographic origin. For this current study, we intentionally decoupled submission lags from geographic origin to allow inclusion of historical A/H3N2 HA sequences that were originally submitted as part of scientific publications and not as part of modern routine surveillance. As a result, the original submission dates for many sequences are unrealistically lagged compared to surveillance sequences.

      (5) Incorporation of Phenotypic Data: The authors should provide a rationale for their choice of a genetic-information-only approach, rather than a model that integrates phenotypic data. Previous studies, such as Huddleston et al. (2020, eLife), demonstrate that models combining genetic and phenotypic data improve forecasts of seasonal influenza A/H3N2 evolution. It would be interesting to probe the here observed effects in a more recent model.

      The primary goal of this study was not to test methodological improvements to forecasting models but to test the effects of realistic public health policy changes that could alter forecast horizons and sequence availability. Most influenza collaborating centers use a "sequence-first" approach where they sequence viral isolates first and use those sequences to prioritize viruses for phenotypic characterization (Hampson et al. 2017). The additional lag in availability of phenotypic data means that a forecasting model based on genetic and phenotypic data will necessarily have a greater lag in data availability than a model based on genetic data only. Since the policy changes we're testing in this study only affect the availability of sequence data and not phenotypic data, we chose to test the relative effects of policy changes on sequence-based forecasting models.

      We have updated the abstract (lines 18-26 and 30-32), introduction (lines 87-88), and discussion (lines 332-334) to emphasize the focus of this study on effects of policy changes. The updated abstract lines read as follows with new content in bold:

      "Despite continued methodological improvements to long-term forecasting models, these constraints of a 12-month forecast horizon and 3-month average submission lags impose an upper bound on any model's accuracy. The global response to the SARS-CoV-2 pandemic revealed that the adoption of modern vaccine technology like mRNA vaccines can reduce how far we need to forecast into the future to 6 months or less and that expanded support for sequencing can reduce submission lags to GISAID to 1 month on average. To determine whether these public health policy changes could improve long-term forecasts for seasonal influenza, we quantified the effects of reducing forecast horizons and submission lags on the accuracy of forecasts for A/H3N2 populations. We found that reducing forecast horizons from 12 months to 6 or 3 months reduced average absolute forecasting errors to 25% and 50% of the 12-month average, respectively. Reducing submission lags provided little improvement to forecasting accuracy but decreased the uncertainty in current clade frequencies by 50%. These results show the potential to substantially improve the accuracy of existing influenza forecasting models through the public health policy changes of modernizing influenza vaccine development and increasing global sequencing capacity."

      The updated introduction now reads:

      "These technological and public health policy changes in response to SARS-CoV-2 suggest that we could realistically expect the same outcomes for seasonal influenza."

      The updated discussion now reads:

      "In this work, we showed that realistic public health policy changes that decrease the time to develop new vaccines for seasonal influenza A/H3N2 and decrease submission lags of HA sequences to public databases could improve our estimates of future and current populations, respectively."

      We have also updated the introduction (lines 57-65) and the discussion (lines 345-348) to specifically address the use of sequence-based models instead of sequence-and-phenotype models. The updated introduction now reads:

      "For this reason, the decision process is partially informed by computational models that attempt to predict the genetic composition of seasonal influenza populations 12 months in the future (Morris et al. 2018). The earliest of these models predicted future influenza populations from HA sequences alone (Luksza and Lassig 2014, Neher et al. 2014, Steinbruck et al. 2014). Recent models include phenotypic data from serological experiments (Morris et al. 2018, Huddleston et al. 2020, Meijers et al. 2023, Meijers et al. 2025). Since most serological experiments occur after genetic sequencing (Hampson et al. 2017) and all forecasting models depend on HA sequences to determine the viruses circulating at the time of a forecast, sequence availability is the initial limiting factor for any influenza forecasts."

      The updated discussion now reads:

      "Since all models to date rely on currently available HA sequences to determine the clades to be forecasted, we expect that decreasing forecast horizons and submission lags will have similar relative effect sizes across all forecasting models including those that integrate phenotypic and genetic data."

      Reviewer #2 (Public review): 

      Summary: 

      The authors have examined the effects of two parameters that could improve their clade forecasting predictions for A(H3N2) seasonal influenza viruses based solely on analysis of haemagglutinin gene sequences deposited on the GISAID EpiFlu database. Sequences were analysed from viruses collected between April 1, 2005 and October 1, 2019. The first parameter was the lag period (0, 1, or 3 months) between the time the viruses were sequenced and the time their sequences were deposited in GISAID. The second parameter was how far forward the forecast projected (3, 6, 9, or 12 months). Their conclusion (not surprisingly) was that "the single most valuable intervention we could make to improve forecast accuracy would be to reduce the forecast horizon to 6 months or less through more rapid vaccine development". This is not practical using conventional influenza vaccine production and regulatory procedures. Nevertheless, this study does identify some practical steps that could improve the accuracy and utility of forecasting, such as the authors' suggested modification of "... changing the start and end times of our long-term forecasts. We could change our forecasting target from the middle of the next season to the beginning of the season, reducing the forecast horizon from 12 to 9 months."

      Strengths: 

      The authors are very familiar with the type of forecasting tools used in this analysis (LBI and mutational load models) and the processes used currently for influenza vaccine virus selection by the WHO committees having participated in a number of WHO Influenza Vaccine Consultation meetings for both the Southern and Northern Hemispheres. 

      Weaknesses: 

      The conclusion of limiting the forecasting to 6 months would only be achievable from the current influenza vaccine production platforms with mRNA. However, there are no currently approved mRNA influenza vaccines, and mRNA influenza vaccines have also yet to demonstrate their real-world efficacy, longevity, and cost-effectiveness and therefore are only a potential platform for a future influenza vaccine. Hence other avenues to improve the forecasting should be investigated. 

      We recognize that there are no approved mRNA influenza vaccines right now. However, multiple mRNA vaccines have completed phase 3 trials indicating that these vaccines could realistically become available in the next few years. A primary goal of our study was to quantify the effects of switching to a vaccine platform with a shorter timeline than the status quo. Our results should further motivate the adoption of any modern vaccine platform that can produce safe and effective vaccines more quickly than the egg-passaged standard. We have updated the introduction (lines 88-91) to note the mRNA vaccines that have completed phase 3 trials. The new sentence in the introduction reads:

      "Work on mRNA vaccines for influenza viruses dates back over a decade (Petsch et al. 2012, Brazzoli et al. 2016, Pardi et al. 2018, Feldman et al. 2019), and multiple vaccines have completed phase 3 trials by early 2025 (Soens et al. 2025, Pfizer 2022)."

      While it is inevitable that more influenza HA sequences will become available over time, a better understanding of where new influenza variants emerge would enable a higher weighting to be used for those countries rather than giving an equal weighting to all HA sequences.

      This is definitely an important point to consider. The best estimates to date (Russell et al. 2008, Bedford et al. 2015) suggest that most successful variants emerge from East or Southeast Asia. In contrast, most available HA sequence data comes from Europe and North America (Figure 1A). Our subsampling method explicitly tries to address this regional bias in data availability by evenly sampling sequences from 10 different regions including four distinct East Asian regions (China, Japan/Korea, South Asia, and Southeast Asia). Instead of weighting all HA sequences equally, this sampling approach ensures that HA sequences from important distinct regions appear in our analysis.

      We have updated our methods (lines 411-423) to better describe the motivation of our subsampling approach and proportions of regions sampled with our original approach (90 viruses per month) and a second high-density sampling approach (270 viruses per month). These new lines read:

      "This sampling approach accounts for known regional biases in sequence availability through time (McCarron et al. 2022) and makes inference of divergence and time trees computationally tractable. This approach also exactly matches our previous study where we first trained the forecast models used in this study (Huddleston et al. 2020), allowing us to reuse those previously trained models. With this subsampling approach, we selected between 7% (Europe) and 91% (Southeast Asia) of all available sequences per region across the entire study period with an average of 50% and median of 52% across all 10 regions (Figure 1—figure Supplement 4). To verify the reproducibility and robustness of our results, we reran the full forecasting analysis with a high-density subsampling scheme that selected 270 sequences per month with the same even sampling across regions and time as the original scheme. With this approach, we selected between 17% (Europe) and 97% (Southeast Asia) of all available sequences per region with an average of 72% sampled and a median of 83% (Figure 1—figure Supplement 4C)."

      We added Figure 1—figure Supplement 4 to document the regional biases in sequence availability and the proportions of sequences we selected per region and year.
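      As a rough illustration of the even sampling across regions and time described above, the sketch below picks a fixed number of sequences per month in round-robin order over regions. It is not the authors' pipeline: the function name `subsample_evenly`, the `(strain, region, month)` record layout, and the round-robin rule are assumptions made only for this example.

```python
import random
from collections import defaultdict


def subsample_evenly(records, per_month=90, seed=0):
    """Pick up to `per_month` records per month, spread evenly over regions.

    `records` is an iterable of (strain, region, month) tuples. The field
    layout is hypothetical and serves only this illustration.
    """
    rng = random.Random(seed)

    # Group strain names by month, then by region.
    by_month = defaultdict(lambda: defaultdict(list))
    for strain, region, month in records:
        by_month[month][region].append(strain)

    selected = []
    for month in sorted(by_month):
        pools = by_month[month]
        # Shuffle within each region so the picks are random, not ordered.
        for strains in pools.values():
            rng.shuffle(strains)
        regions = sorted(pools)
        picked = 0
        # Round-robin over regions so a data-rich region cannot crowd out
        # a data-poor one; stop when the quota is met or all pools are empty.
        while picked < per_month and any(pools[r] for r in regions):
            for region in regions:
                if picked >= per_month:
                    break
                if pools[region]:
                    selected.append((pools[region].pop(), region, month))
                    picked += 1
    return selected
```

      With 100 sequences from Europe and 5 from Southeast Asia in one month and a quota of 10, this sketch returns 5 from each region. Data-poor regions end up sampled at a much higher proportion than data-rich ones, which is the pattern the quoted percentages describe.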

      Also, other groups are considering neuraminidase sequences and how these contribute to the emergence of new or potentially predominant clades.

      We agree that accounting for antigenic evolution of neuraminidase is a promising path to improving forecasting models. We chose to focus on hemagglutinin sequences for several reasons, though. First, hemagglutinin is the only protein whose content is standardized in the influenza vaccine (Yamayoshi and Kawaoka 2019), so vaccine strain selection does not account for a specific neuraminidase. Additionally, as we noted in response to Reviewer 1 above, the goal of this study was to test effects of counterfactual scenarios with realistic public health interventions and not to introduce methodological improvements to forecasting models like the inclusion of neuraminidase sequences.

      We have updated the introduction to provide the additional context about hemagglutinin's outsized role in the current vaccine development process (lines 40-44):

      "The dominant influenza vaccine platform is an inactivated whole virus vaccine grown in chicken eggs (Wong and Webby, 2013) which takes 6 to 8 months to develop, contains a single representative vaccine virus per seasonal influenza subtype including A/H1N1pdm, A/H3N2, and B/Victoria (Morris et al., 2018), and for which only the HA protein content is standardized (Yamayoshi and Kawaoka, 2019)."

      We have updated the abstract (lines 18-26 and 30-32), introduction (lines 87-88), and discussion (lines 332-334) to emphasize our goal of testing effects of public health policy changes on forecasting accuracy rather than methodological changes. The updated abstract lines read as follows with new content in bold:

      "Despite continued methodological improvements to long-term forecasting models, these constraints of a 12-month forecast horizon and 3-month average submission lags impose an upper bound on any model's accuracy. The global response to the SARS-CoV-2 pandemic revealed that the adoption of modern vaccine technology like mRNA vaccines can reduce how far we need to forecast into the future to 6 months or less and that expanded support for sequencing can reduce submission lags to GISAID to 1 month on average. To determine whether these public health policy changes could improve long-term forecasts for seasonal influenza, we quantified the effects of reducing forecast horizons and submission lags on the accuracy of forecasts for A/H3N2 populations. We found that reducing forecast horizons from 12 months to 6 or 3 months reduced average absolute forecasting errors to 25% and 50% of the 12-month average, respectively. Reducing submission lags provided little improvement to forecasting accuracy but decreased the uncertainty in current clade frequencies by 50%. These results show the potential to substantially improve the accuracy of existing influenza forecasting models through the public health policy changes of modernizing influenza vaccine development and increasing global sequencing capacity."

      The updated introduction now reads:

      "These technological and public health policy changes in response to SARS-CoV-2 suggest that we could realistically expect the same outcomes for seasonal influenza."

      The updated discussion now reads:

      "In this work, we showed that realistic public health policy changes that decrease the time to develop new vaccines for seasonal influenza A/H3N2 and decrease submission lags of HA sequences to public databases could improve our estimates of future and current populations, respectively."

      Figure 1a. I don't understand why the orange dot 1-month lag appears to be on the same scale as the 3-month/ideal timeline. 

      We apologize for the confusion with this figure. Our original goal was to show how the two factors in our study design (forecast horizons and sequence submission lags) interact with each other by showing an example of 3-month forecasts made with no lag (blue), ideal lag (orange), and realistic lag (green). To clarify these two factors, we have removed the two lines at the 3-month forecast horizon for the ideal and realistic lags and have updated the caption to reflect this simplification. The new figure looks like this:

      The authors should expand on the line "The finding of even a few sequences with a potentially important antigenic substitution could be enough to inform choices of vaccine candidate viruses." While people familiar with the VCM process will understand the implications of this statement, the average reader will not. Not only will it inform these choices, but it will also allow the early production of vaccine seeds and reassortants that can be used in conventional vaccine production platforms if these early predictions are consolidated by the time of the VCM. This is because of the time it takes to isolate viruses, make reassortants, and test them: usually a month or more is needed at a minimum.

      Thank you for pointing out this unclear section of the discussion. We have rewritten this section, dropping the mention of prospective measurements of antigenic escape which now feels off-topic and moving the point about early detection of important antigenic substitutions to immediately follow the description of the candidate vaccine development timeline. This new placement should clarify the direct causal relationship between early detection and better choices of vaccine candidates. The original discussion section read:

      "For example, virologists must choose potential vaccine candidates from the diversity of circulating clades well in advance of vaccine composition meetings to have time to grow virus in cells and eggs and measure antigenic drift with serological assays (Morris et al., 2018; Loes et al., 2024). Similarly, prospective measurements of antigenic escape from human sera allow researchers to predict substitutions that could escape global immunity (Lee et al., 2019; Greaney et al., 2022; Welsh et al., 2023). The finding of even a few sequences with a potentially important antigenic substitution could be enough to inform choices of vaccine candidate viruses."

      The new section (lines 386-391) now reads:

      "For example, virologists must choose potential vaccine candidates from the diversity of circulating clades months in advance of vaccine composition meetings to have time to grow virus in cells and eggs and measure antigenic drift with serological assays (Morris et al. 2018; Loes et al. 2024). Earlier detection of viral sequences with important antigenic substitutions could determine whether corresponding vaccine candidates are available at the time of the vaccine selection meeting or not."

      A few lines in the discussion on current approaches being used to add to just the HA sequence analysis of H3N2 viruses (ferret/human sera reactivity) would be welcome.

      We have added the following sentences to the last paragraph (lines 391-397) to note recent methodological advances in estimating influenza fitness and the relationship these advances have to timely genomic surveillance.

      "Newer methods to estimate influenza fitness use experimental measurements of viral escape from human sera (Lee et al., 2019; Welsh et al., 2024; Meijers et al., 2025; Kikawa et al., 2025), measurements of viral stability and cell entry (Yu et al., 2025), or sequences from neuraminidase, the other primary surface protein associated with antigenic drift (Meijers et al., 2025). These methodological improvements all depend fundamentally on timely genomic surveillance efforts and the GISAID EpiFlu database to identify relevant influenza variants to include in their experiments."

    1. eLife Assessment

      This manuscript reports on a FLIM-based calcium biosensor, G-Ca-FLITS. It represents an important contribution to the field of genetically encoded fluorescent biosensors and will serve as a practical tool for the FLIM imaging community. The paper provides convincing evidence of G-Ca-FLITS's photophysical properties and its advantages over previous biosensors such as Tq-Ca-FLITS. Although the benefits of G-Ca-FLITS over Tq-Ca-FLITS are limited by the relatively small wavelength shift, it presents some advantages in terms of compatibility with available instrumentation and brightness consistency.

    2. Reviewer #1 (Public review):

      Summary:

      van der Linden et al. report on the development of a new green-fluorescent sensor for calcium, following a novel rational design strategy based on the modification of the cyan-emissive sensor mTq2-CaFLITS. Through a mutational strategy similar to the one used to convert EGFP into EYFP, coupled with optimization of strategic amino acids located in proximity of the chromophore, they identify a novel sensor, G-CaFLITS. Through a careful characterization of the photophysical properties in vitro and the expression level in cell cultures, the authors demonstrate that G-CaFLITS combines a large lifetime response with a good brightness in both the bound and unbound states. This relative independence of the brightness on calcium binding, compared with existing sensors that often feature at least one very dim form, is an interesting feature of this new type of sensors, which allows for a more robust usage in fluorescence lifetime imaging. Furthermore, the authors evaluate the performance of G-CaFLITS in different subcellular compartments and under two-photon excitation in Drosophila. While the data appears robust and the characterization thorough, the interpretation of the results in some cases appears less solid, and alternative explanations cannot be excluded.

      Strengths:

      The approach is innovative and extends the excellent photophysical properties of the mTq2-based sensor to more red-shifted variants. While the spectral shift might appear relatively minor, as the authors correctly point out, it has interesting practical implications, such as the possibility to perform FLIM imaging of calcium using widely available laser wavelengths, or to reduce background autofluorescence, which can be a significant problem in FLIM.

      The screening was simple and rationally guided, demonstrating that, at least for this class of sensors, a careful choice of screening positions is an excellent strategy to obtain variants with large FLIM responses without the need of high-throughput screening.

      The description of the methodologies is very complete and accurate, greatly facilitating the reproduction of the results by others, or the adoption of similar methods. This is particularly true for the description of the experimental conditions for optimal screening of sensor variants in lysed bacterial cultures.

      The photophysical characterization is very thorough and complete, and the vast amount of data reported in the supporting information is a valuable reference for other researchers willing to attempt a similar sensor development strategy. Particularly well done is the characterization of the brightness in cells, and the comparison on multiple parameters with existing sensors.

      Overall, G-CaFLITS displays excellent properties for a FLIM sensor: very large lifetime change, bright emission in both forms and independence from pH in the physiological range.

      Comment on revised version:

      The authors have significantly improved the manuscript, and overall I fully agree in maintaining the assessment as it is now.

    3. Reviewer #2 (Public review):

      Summary:

      Van der Linden et al. describe the addition of the T203Y mutation to their previously described fluorescence lifetime calcium sensor Tq-Ca-FLITS to shift the fluorescence to green emission. This mutation was previously described to similarly red-shift the emission of green and cyan FPs. Tq-Ca-FLITS_T203Y behaves as a green calcium sensor with opposite polarity compared with the original (lifetime goes down upon calcium binding instead of up). They then screen a library of variants at two linker positions and identify a variant with slightly improved lifetime contrast (Tq-Ca-FLITS_T203Y_V27A_N271D, named G-Ca-FLITS). The authors then characterize the performance of G-Ca-FLITS relative to Tq-Ca-FLITS in purified protein samples, in cultured cells, and in the brains of fruit flies.

      Strengths:

      This work is interesting as it extends their prior work generating a calcium indicator scaffold for fluorescent protein-based lifetime sensors with large contrast at a single wavelength, which is already being adopted by the community for production of other FLIM biosensors. This work effectively extends that from cyan to green fluorescence. While the cyan and green sensors are not spectrally distinct enough (~20-30 nm shift) to easily multiplex together, it at least shifts the spectra to wavelengths that are more commonly available on commercial microscopes.

      The observations of organellar calcium concentrations were interesting and could potentially lead to new biological insight if followed up.

    4. Reviewer #3 (Public review):

      Summary:

      The authors present a variant of a previously described fluorescence lifetime sensor for calcium. Much of the manuscript describes the process of developing appropriate assays for screening sensor variants, and thorough characterization of those variants (inherent fluorescence characteristics, response to calcium and pH, comparisons to other calcium sensors). The final two figures show how the sensor performs in cultured cells and in vivo in Drosophila brains.

      Strengths:

      The work is presented clearly and the conclusion (this is a new calcium sensor that could be useful in some circumstances) is supported by the data.

      Weaknesses:

      There are probably few circumstances where this sensor would facilitate experiments (calcium measurements) for which other sensors would prove insufficient.

      Comment on revised version:

      I think the manuscript has been significantly improved and I concur with the eLife Assessment statement.

      [Editors' note: There are no further requests by the reviewers. All of them expressed their approval of the new version of the manuscript.]

    5. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      van der Linden et al. report on the development of a new green-fluorescent sensor for calcium, following a novel rational design strategy based on the modification of the cyan-emissive sensor mTq2-CaFLITS. Through a mutational strategy similar to the one used to convert EGFP into EYFP, coupled with optimization of strategic amino acids located in proximity of the chromophore, they identify a novel sensor, G-CaFLITS. Through a careful characterization of the photophysical properties in vitro and the expression level in cell cultures, the authors demonstrate that G-CaFLITS combines a large lifetime response with a good brightness in both the bound and unbound states. This relative independence of the brightness on calcium binding, compared with existing sensors that often feature at least one very dim form, is an interesting feature of this new type of sensors, which allows for a more robust usage in fluorescence lifetime imaging. Furthermore, the authors evaluate the performance of G-CaFLITS in different subcellular compartments and under two-photon excitation in Drosophila. While the data appears robust and the characterization thorough, the interpretation of the results in some cases appears less solid, and alternative explanations cannot be excluded.

      Strengths:

      The approach is innovative and extends the excellent photophysical properties of the mTq2-based sensor to more red-shifted variants. While the spectral shift might appear relatively minor, as the authors correctly point out, it has interesting practical implications, such as the possibility to perform FLIM imaging of calcium using widely available laser wavelengths, or to reduce background autofluorescence, which can be a significant problem in FLIM.

      The screening was simple and rationally guided, demonstrating that, at least for this class of sensors, a careful choice of screening positions is an excellent strategy to obtain variants with large FLIM responses without the need of high-throughput screening.

      The description of the methodologies is very complete and accurate, greatly facilitating the reproduction of the results by others, or the adoption of similar methods. This is particularly true for the description of the experimental conditions for optimal screening of sensor variants in lysed bacterial cultures.

      The photophysical characterization is very thorough and complete, and the vast amount of data reported in the supporting information is a valuable reference for other researchers willing to attempt a similar sensor development strategy. Particularly well done is the characterization of the brightness in cells, and the comparison on multiple parameters with existing sensors.

      Overall, G-CaFLITS displays excellent properties for a FLIM sensor: very large lifetime change, bright emission in both forms and independence from pH in the physiological range.

      Weaknesses:

      The paper demonstrates the application of G-CaFLITS in various cellular subcompartments without providing direct evidence that the sensor's response is not affected by the targeting. Showing at least that the lifetime values in the saturated state are similar in all compartments would improve the robustness of the claims.

      In some cases, the interpretation of the results is not fully convincing, leaving alternative hypotheses as a possibility. This is particularly the case for the claim of the origin of the strongly reduced brightness of G-CaFLITS in Drosophila. The explanation of the intensity changes of G-CaFLITS also shows some inconsistency with the basic photophysical characterization.

      While the claims generally appear robust, in some cases they are conveyed with a lack of precision. Several sentences in the introduction and discussion could be improved in this regard. Furthermore, the use of the signal-to-noise ratio as a means of comparison between sensors appears to be imprecise, since it is dependent on experimental conditions.

      We thank the reviewer for a thorough evaluation and for suggestions to improve our manuscript. We are happy with the recognition of the strengths of this work. The list of weaknesses contains several valid points, which we will address in a point-by-point reply and a revision.

      Reviewer #2 (Public review):

      Summary:

      Van der Linden et al. describe the addition of the T203Y mutation to their previously described fluorescence lifetime calcium sensor Tq-Ca-FLITS to shift the fluorescence to green emission. This mutation was previously described to similarly red-shift the emission of green and cyan FPs. Tq-Ca-FLITS_T203Y behaves as a green calcium sensor with opposite polarity compared with the original (lifetime goes down upon calcium binding instead of up). They then screen a library of variants at two linker positions and identify a variant with slightly improved lifetime contrast (Tq-Ca-FLITS_T203Y_V27A_N271D, named G-Ca-FLITS). The authors then characterize the performance of G-Ca-FLITS relative to Tq-Ca-FLITS in purified protein samples, in cultured cells, and in the brains of fruit flies.

      Strengths:

      This work is interesting as it extends their prior work generating a calcium indicator scaffold for fluorescent protein-based lifetime sensors with large contrast at a single wavelength, which is already being adopted by the community for production of other FLIM biosensors. This work effectively extends that from cyan to green fluorescence. While the cyan and green sensors are not spectrally distinct enough (~20-30 nm shift) to easily multiplex together, it at least shifts the spectra to wavelengths that are more commonly available on commercial microscopes.

      The observations of organellar calcium concentrations were interesting and could potentially lead to new biological insight if followed up.

      Weaknesses:

      (1) The new G-Ca-FLITS sensor doesn't appear to be significantly improved in performance over the original Tq-Ca-FLITS; no specific benefits are demonstrated.

      (2) Although it was admirable to attempt in vivo demonstration in Drosophila with these sensors, depolarizing the whole brain with high potassium is not a terribly interesting or physiological stimulus and doesn't really highlight any advantages of their sensors; G-Ca-FLITS appears to be quite dim in the flies.

      We thank the reviewer for a thorough evaluation and for suggestions to improve our manuscript. Although the spectral shift of the green variant is modest, we have added new data (figure 7) to the manuscript that demonstrates multiplex imaging of G-Ca-FLITS and Tq-Ca-FLITS.

      As for the listed weaknesses we respond here:

      (1) Although we agree that the performance in terms of dynamic range is not improved, the advantage of the green sensor over the cyan version is that the brightness is high in both states.

      (2) We agree that the performance of G-Ca-FLITS is disappointing in Drosophila. We feel that this is important data to report, and it makes it clear that Tq-Ca-FLITS is a better choice for this system. Depolarization of the entire brain was done to measure the maximal lifetime contrast.

      Reviewer #3 (Public review):

      Summary:

      The authors present a variant of a previously described fluorescence lifetime sensor for calcium. Much of the manuscript describes the process of developing appropriate assays for screening sensor variants, and thorough characterization of those variants (inherent fluorescence characteristics, response to calcium and pH, comparisons to other calcium sensors). The final two figures show how the sensor performs in cultured cells and in vivo in Drosophila brains.

      Strengths:

      The work is presented clearly and the conclusion (this is a new calcium sensor that could be useful in some circumstances) is supported by the data.

      Weaknesses:

      There are probably few circumstances where this sensor would facilitate experiments (calcium measurements) for which other sensors would prove insufficient.

      We thank the reviewer for the evaluation of our manuscript. As for the indicated weakness, we agree that the main application of genetically encoded calcium biosensors is to measure qualitative changes in calcium. However, it can be argued that due to a lack of tools the absolute quantification has been very challenging. Now, thanks to large contrast lifetime biosensors the quantitative measurements are simplified, there are new opportunities, and the probe reported here is an improvement over existing probes as it remains bright in both states, further improving quantitative calcium measurements.

      Reviewer #1 (Recommendations for the authors):

      While the science in the paper appears solid, the methods well grounded and excellently documented, the manuscript would benefit from a revision to improve the clarity of the exposition. In particular:

      Part of the introduction appears like a patchwork of information with poor logical consequentiality. The authors rapidly pass from the impact of brightness on FLIM accuracy, to mitochondrial calcium in pathology, to the importance of the sensor's affinity, to a sentence on sensor's kinetics, to fluorescent dyes and bioluminescence, to conclude that sensors should be stable at mitochondrial pH. I highly recommend rewriting this part.

      We thank the referee for the comment and we have adjusted the introduction to better connect its parts and improve the logical flow. The updated introduction addresses all the feedback by the reviewers on different aspects of the introductory text, and we have removed the section on dyes and bioluminescence. We feel that the introduction is better structured now.

      The reference to particular amino acid positions would greatly benefit from including images of the protein structure in which the positions are highlighted, similar to what the same authors do in their fluorescent protein development papers. While in the case of sensors a crystal structure might be lacking, highlighting the positions with respect to an AlphaFold-generated structure or the structure of mTq2 might still be helpful.

      We appreciate this remark and we have added a sequence alignment of the FLITS probes to supplemental Figure S4. This shows the numbered residues, and we have also highlighted the different domains, linkers and mutations. We think that this linear representation works better than a 3D structure (one issue is that AlphaFold fails to display the chromophore and usually has poor confidence for linker residues).

      The use of SNR, as defined by the authors (mean of the lifetime divided by standard deviation) appears a poorly suited parameter to compare sensors, as it depends on the total number of collected photons and on the strength of the algorithms used to retrieve the lifetime value. In an extreme example, if one would collect uniform images with millions of photons per pixel, most likely SNR would be extremely good for all sensors in all states, irrespective of the fact that some states are dimmer (within reasonable limits). On the other hand, if the same comparison would be performed at a level of thousands or hundreds of photons per pixel, the effect of different brightness on the SNR would be much more dramatic. While in general I fully agree with the core concept of the paper, i.e. that avoiding low-brightness forms leads more easily to experiments with higher SNR, I would suggest to stick to comparing the sensors in terms of brightness and refer to SNR (if needed) only when describing the consequences on measurements.

      The reviewer is right that in absolute terms the SNR is not meaningful. In addition to acquisition time, it depends on expression levels. Yet, it is possible to compare the change in SNR between the apo- and saturated states, and that is what is shown in figure 5. We have added text to better explain that the change in SNR is relevant here:

      “The absolute SNR is not relevant here, as it will depend on the expression level and acquisition time. But since we have measured the two extremes in the same cells, we can evaluate how the SNR changes between these states for each separate probe”
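The dependence of SNR on photon budget can be illustrated with a minimal sketch. It assumes an ideal mono-exponential decay with no background and no instrument response, so it gives only a best-case bound; the FLITS decays themselves are not mono-exponential. Under that assumption the maximum-likelihood lifetime estimate is the mean photon arrival time, and the photon-limited SNR (mean lifetime divided by its standard deviation) scales as the square root of the number of photons. The function names, the 3 ns lifetime and the photon counts are illustrative choices, not values from the manuscript.

```python
import math
import random
import statistics


def photon_limited_snr(n_photons: float) -> float:
    """Ideal SNR (tau / sigma_tau) of a mono-exponential lifetime estimate."""
    return math.sqrt(n_photons)


def simulated_lifetime_snr(tau_ns: float, n_photons: int,
                           n_repeats: int = 2000, seed: int = 0) -> float:
    """Monte Carlo estimate of mean(tau_hat) / std(tau_hat) over repeated pixels."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_repeats):
        # Photon arrival times of an ideal mono-exponential decay with mean tau_ns.
        arrivals = [rng.expovariate(1.0 / tau_ns) for _ in range(n_photons)]
        # In this idealized case the mean arrival time is the ML lifetime estimate.
        estimates.append(statistics.fmean(arrivals))
    return statistics.fmean(estimates) / statistics.stdev(estimates)


def snr_ratio(brightness_a: float, brightness_b: float) -> float:
    """Ratio of photon-limited SNRs for two states imaged for the same time."""
    return math.sqrt(brightness_a / brightness_b)
```

In this idealized picture a state that is ten times dimmer gives roughly a three-fold (√10) lower SNR at fixed acquisition time, whatever the absolute photon count. This is why the change in SNR between the apo and saturated states measured in the same cells is informative even though the absolute SNR is not.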

      Some statements from the authors or aspects of the paper appear problematic:

      (1) "Additionally, the fluorescence of most sensors is a non-linear function of calcium concentration, usually with Hill coefficients between 2 and 3. This is ideal when the probe is used as a binary detector for increases in Ca2+ concentrations, but it makes robust quantification of low, or even intermediate, calcium concentrations extremely challenging."

      To the best of my knowledge, for all sensors the fluorescence response is a nonlinear function of calcium concentrations. If the authors have specific examples in mind in which this is not true, they should cite them specifically. Furthermore, the Hill coefficient defines the range of concentrations in which the sensor operates, while the fact that "low concentrations" might be hard to detect depends only on the dim fluorescence of some sensors in the unbound form.

      We agree with the reviewer that this part is not clearly written and confusing, as the sentence “Additionally, the fluorescence of most sensors is a non-linear function of calcium concentration, usually with Hill coefficients between 2 and 3” was not relevant in this section and so we removed it. Now it reads:

      “Many GECIs harboring a single fluorescent protein (FP), like GCaMPs, are optimized for a large intensity change, and have a (very) dim state when calcium levels are below the KD of the probe (Akerboom et al., 2013; Dana et al., 2019; Shen et al., 2018; Zhang et al., 2023; Zhao et al., 2011). This is ideal when the probe is used as a binary detector for increases in Ca2+ concentrations, but it makes robust quantification of low, or even intermediate, calcium concentrations extremely challenging”

      (2) "The affinity of a sensor is of major importance: a low KD can underestimate high concentrations and vice versa."

      It is not clear to me why the concentrations would be underestimated, rather than just being less precise. Also, if a calibration curve is plotted in linear scale rather than logarithmic scale, it appears that the precision problem is much more severe near saturation (where low lifetime changes result in large concentration changes) than near zero (where low concentration changes produce large lifetime changes).

      We agree that this could be better explained. What we meant to say is that concentrations that are ~10x lower or higher than the KD cannot be precisely measured. See also our reply to the next comment.

      (3) "Differences can also arise due to the method of calibration, i.e. when the absolute minimum and maximum signal are not reached in the calibration procedure (Fernandez-Sanz et al., 2019)."

      Unless better explained, this appears obvious and not worth mentioning.

      What may be obvious to the reviewer (and to us) may not be obvious to the reader, and that’s why this is included. To make it clearer we rephrased this part as a list of four items:

      “Accurate determination of the affinity of a sensor is important and there are several issues that need to be considered during the calibration and the measurements: (i) the concentrations can only be measured with sufficient precision when it is in the range between 10x KD and 1/10x KD, (ii) the calibration is only valid when the two extremes are reached during the calibration procedure (Fernandez-Sanz et al., 2019), (iii) the sensor’s kinetics should be sufficiently fast enough to be able to track the calcium changes, and (iv) the biosensor should be compatible with the high mitochondrial pH of 8 (Cano Abad et al., 2004; Llopis et al., 1998).”
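Item (i) can be made concrete with a short sketch. It assumes simple Hill-type binding and that the measured quantity is the fraction of sensor in the calcium-bound state, known with a fixed absolute uncertainty. The Hill coefficient of 1 and the uncertainty of 0.01 are assumptions chosen for illustration; 339 nM is the KD quoted from the manuscript by Reviewer #2. Propagating the uncertainty through the inverted binding curve shows how the relative error in the concentration grows as the concentration moves away from the KD.

```python
def fraction_bound(conc_nm: float, kd_nm: float, hill: float = 1.0) -> float:
    """Fraction of sensor in the calcium-bound state (Hill equation)."""
    ratio = (conc_nm / kd_nm) ** hill
    return ratio / (1.0 + ratio)


def concentration_from_fraction(fraction: float, kd_nm: float,
                                hill: float = 1.0) -> float:
    """Invert the Hill equation: concentration that gives this bound fraction."""
    return kd_nm * (fraction / (1.0 - fraction)) ** (1.0 / hill)


def relative_conc_error(conc_nm: float, kd_nm: float,
                        fraction_error: float, hill: float = 1.0) -> float:
    """First-order relative error in concentration for a given error in the fraction.

    From ln(theta / (1 - theta)) = hill * ln(c / KD):
    dc / c = d_theta / (hill * theta * (1 - theta)).
    """
    theta = fraction_bound(conc_nm, kd_nm, hill)
    return fraction_error / (hill * theta * (1.0 - theta))


if __name__ == "__main__":
    kd = 339.0  # nM, value quoted from the manuscript
    for fold in (0.01, 0.1, 1.0, 10.0, 100.0):
        err = relative_conc_error(fold * kd, kd, fraction_error=0.01)
        print(f"{fold:>6} x KD: relative error {err:.3f}")
```

With these assumptions the relative error is about 4% at the KD, about 12% at 0.1x or 10x KD, and about 100% at 0.01x or 100x KD, which matches the rule of thumb stated in item (i).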

      (4) In the experiments depicted in Figure 6C the underlying assumption is that the sensor behaves in the same way independently of the compartment to which it is targeted. This is not necessarily the case. It would be valuable to see the plots of Figure 6C and D discussed in terms of lifetime. Is the saturating lifetime value the same in all compartments?

      This is a valid point and we have now included a plot with the actual lifetime data for each of the organelles (figure S15). 

      We have also added text to discuss this point: “We note that the underlying assumption of the quantification of organellar calcium concentrations is that the lifetime contrast is the same. This is broadly true for most of the measurements (Figure S15). Yet, there are also differences. It is currently unclear whether the discrepancies are due to differences in the physicochemical properties of the compartments, or whether there is a technical reason (the efficiency of ionomycin for saturating the biosensor in the different compartments is unknown, as far as we know). This is something that is worth revisiting. A related issue that deserves attention is the level of agreement between in vitro and in vivo calibrations.”

      (5) A similar problem arises for the observation of different calcium levels in peripheral mitochondria. In figure S11b, the values of the two lifetime components of a biexponential fit are displayed. Both the long and short components seem to be different. This is an interesting observation, as in an ideal sensor (in which the "long lifetime conformation" is the same whether the sensor is bound to the analyte or not, and similarly for the short lifetime one) those values should be identical. While it is entirely possible that this is not the case for G-CaFLITS, since the authors have conducted a calibration experiment using time-domain FLIM, could they show the behavior of the lifetimes and preamplitudes? Are the trends consistent with their interpretation of a different calcium level in the two mitochondrial populations?

      We have analyzed the calibration data from TCSPC experiments done with the Leica Stellaris. From these data (acquired at high photon counts as it is purified protein in solution), we infer that both the short and long lifetime do change as a function of calcium concentration. In particular the long lifetime shows a substantial change, which we cannot explain at this moment. We agree that this is interesting and may potentially give insight in the conformation changes that give rise to the lifetime change.

      The lifetime data of the mitochondria has been acquired with a different FLIM setup, but the trend is consistent, both the long and short lifetime decrease in the peripheral mitochondria that have a higher calcium concentration.

      Author response image 1

      (6) "The lifetime response of Tq-Ca-FLITS and the ΔF/F response of jGCaMP7f resembled each other, with both signals gradually increasing over the span of 3-4 minutes after we increased external [K+]; the two signals then hit a plateau for ~1 min, followed by a return to baseline and often additional plateaus (Figure 8B-C). By comparison, G-Ca-FLITS responses were more variable, typically exhibiting a smaller ramping phase and seconds-long spikes of activity rather than minutes-long plateaus (Figure 8C)."

      This statement does not appear fully consistent with the data in Figure 8. While in figure 8B it looks like GCaMP and Tq-Ca-FLITS have very similar profiles, these curves come from one single experiment out of a very variable dataset (see Figure 8C). If one would for example choose the second curve of GCaMP in Figure 8C, it would look very similar to the response of G-Ca-FLITS in figure 8B, and the argument would be reversed. What do the averages look like?

      Indeed, the dynamics of the responses are very variable and we do not want to draw attention to these differences in the dynamics, so we have removed the comparison. Instead, the differences in intensity change and lifetime contrast are of importance here. To answer the reviewer's question, we have added a new panel (D) which shows the average responses for each of the GECIs.

      (7) "Although the calibration is equipment independent under ideal conditions, and only needs to be performed once, we prefer to repeat the calibration for different setups to account for differences in temperature or pulse frequency."

      While I generally agree with the statement, it is imprecise. A change in temperature is generally expected to affect the Kd, so rather than "preferring to repeat", it is a requirement for accurate quantification at different concentrations. I am not sure I understand what the pulse frequency is in this context, and how it affects the Kd.

      We thank the referee for pointing out that our text is imprecise and confusing. What we meant to say is that we see differences between different set-ups and we have clarified this by changing the text. We have also added that it is “necessary” to repeat the calibration:

      “Although the calibration is equipment independent under ideal conditions, and only needs to be performed once, we do see differences between different set-ups. Therefore, it is necessary to repeat the calibration for different set-ups.”

      (8) "A recent effort to generate a green emitting lifetime biosensor used a GFP variant as a template (Koveal et al., 2022), and the resulting biosensor was pH sensitive in the physiological range. On the other hand, biosensors with a CFP-like chromophore are largely pH insensitive (van der Linden et al., 2021; Zhong et al., 2024)."

      The dismissal of the use of T-Sapphire as a pH independent template is inaccurate. The same group has previously reported other sensors (SweetieTS for glucose and Peredox for redox ratio) that are not pH sensitive. Furthermore, in Koveal et al. also many of the mTq2-based variants showed a pH response, suggesting that the pH dependence for the Lilac sensor might be more complex. Still, G-Ca-FLITS presents advantages in terms of the possibility to excite at longer wavelengths, which could be mentioned instead.

      We only want to make the point that adding the T203Y mutation to Turquoise-based lifetime biosensors may be a good approach for generating pH insensitive green biosensors. There is no point in dismissing other green biosensors and we have changed the text to: “Since biosensors with a CFP-like chromophore are largely pH insensitive (van der Linden et al., 2021; Zhong et al., 2024), and we show here that the pH independence is retained for the Green Ca-FLITS, we expect that adding the T203Y mutation to a cyan sensor is a good approach for generating pH-insensitive green lifetime-based sensors.”

      (9) "Usually, a higher QY results in a higher intensity; however, in G-Ca-FLITS the open state has a differential shaped excitation spectrum which leads to a decreased intensity. These effects combined have resulted in a sensor where the two different states have a similar intensity despite displaying a large QY and lifetime contrast."

      This statement does not seem to reflect the excitation spectra of Figure 1. If this explanation would be true, wouldn't there be an isoemissive point in the excitation spectrum (i.e. an excitation wavelength at which emission intensity would not change)?

      The excitation spectra in figure 1 are not ideal for the interpretation as these are not normalized. The normalized spectra are shown in figure S10, but for clarity we show the normalized spectra here below as well. For the FD-FLIM experiments we used a 446 nm LED that excites the calcium bound state more efficiently. Therefore, the lower brightness due to a lower QY of the calcium bound state is compensated by increased excitation. So the limited change in intensity is excitation wavelength dependent. We have added a sentence to the discussion to stress this:

      “The smallest intensity change is obtained when the calcium-bound state is preferably excited (i.e. near 450 nm) and the effect is less pronounced when the probe is excited near its peak at 474 nm”   

      (10) "We evaluated the use of Tq-Ca-FLITS and G-Ca-FLITS for 2P-FLIM and observed a surprisingly low brightness of the green variant in an intact fly brain. This result is consistent with a study finding that red-shifted fluorescent-protein variants that are much brighter under one-photon excitation are, surprisingly, dimmer than their blue cousins in multi-photon microscopy (Molina et al., 2017). The responses of both probes were in line with their properties in single photon FLIM, but given the low brightness of G-Ca-FLITS under 2-photon excitation, the Tq-Ca-FLITS may be a better choice for 2P-FLIM experiments."

      The differences appear strikingly high, and it seems improbable that a reduction in two-photon absorption coefficient might be the sole cause. How can the authors rule out a problem in expression (possibly organism-specific)?

      The reviewers are correct that the changes in brightness between G-Ca-FLITS and Tq-Ca-FLITS may arise from changes in expression levels. It is difficult to calibrate for these changes explicitly without a stable reference fluorophore. However, the G-Ca-FLITS and Tq-Ca-FLITS transgenic flies were both produced with the same plasmid backbone (the Janelia 20x-UAS-IVS plasmid), landed in the same insertion site (VK00005) of the same genetic background, and were crossed to the same Janelia driver line (R60D05-Gal4), so at the level of the transcriptional machinery or genetic regulatory landscape the two lines are probably identical except for the few base pair differences between the G-Ca-FLITS and Tq-Ca-FLITS sequences. But the same level of transcription may not correspond to the same amount of stable protein in the ellipsoid body. So, we cannot rule out organism-specific problems in expression. To examine the 2P excitation efficiency relative to 1P excitation efficiency, we have measured the fluorescence intensity of purified G-Ca-FLITS and Tq-Ca-FLITS on beads. See also the response to reviewer 3 and supplemental figure S14.

      Suggestions

      (1) The underlying assumption of any experiment using a biosensor is that the concentration of the biosensor should be roughly 2 orders of magnitude lower than the concentration of the analyte, otherwise the calibration equations do not hold. When measuring nM concentrations of calcium, this problem can be in principle very significant, as the concentration of the sensor in cells is likely in the low micromolar range. Calcium regulation by the cell should compensate for the problem, and the equations should hold. However, this might not hold true during experimental conditions that would disrupt this tight regulation. It might be a good thing to add a sentence to inform users about the limitations in interpreting calcium concentration data under such conditions.

      Good point. We have added this to the discussion: “All calcium indicators also act as buffers, and this limits the accuracy of the absolute measurements, especially for the lower calcium concentrations (Rose et al., 2014), as the expression of the biosensor is usually in the low micromolar range.”
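The size of the buffering effect can be estimated with a minimal sketch. It assumes a closed compartment with no cellular calcium regulation and a single-site (1:1) binding model, both of which are simplifications: cells regulate calcium, as the reviewer notes, and the sensor is unlikely to behave as an ideal single-site binder. Mass balance then gives a quadratic for the free concentration. The function name and the example numbers (100 nM total calcium, 1 µM sensor) are hypothetical; 339 nM is the KD quoted from the manuscript by Reviewer #2.

```python
import math


def free_calcium_nm(total_ca_nm: float, sensor_nm: float, kd_nm: float) -> float:
    """Free calcium in a closed system containing a 1:1 calcium-binding sensor.

    Mass balance: total = free + sensor * free / (KD + free), which rearranges to
    free**2 + (KD + sensor - total) * free - KD * total = 0.
    """
    b = kd_nm + sensor_nm - total_ca_nm
    # Take the positive root of the quadratic; the other root is negative.
    return (-b + math.sqrt(b * b + 4.0 * kd_nm * total_ca_nm)) / 2.0


if __name__ == "__main__":
    for sensor in (0.0, 10.0, 100.0, 1000.0):
        free = free_calcium_nm(total_ca_nm=100.0, sensor_nm=sensor, kd_nm=339.0)
        print(f"sensor {sensor:7.1f} nM -> free calcium {free:6.1f} nM")
```

In this unregulated toy case a micromolar sensor concentration binds most of a 100 nM calcium pool, whereas a sensor present at a few nanomolar barely perturbs it. In living cells homeostasis restores the free concentration, so the sketch mainly indicates when the calibration may fail, namely when that regulation is disrupted.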

      (2) Different methods of lifetime "averaging", such as intensity or amplitude-weighted lifetime in time domain FLIM or phase and modulation in frequency domain might lead to different Kd in the same calibration experiment. This is an underappreciated factor that might lead to errors by users. Since the authors conducted calibrations using both frequency and time-domain, it would be useful to mention this fact and maybe add a table in the Supporting Information with the minima, maxima and Kds calculated using different lifetime averaging methods.

      To avoid biases due to fitting we prefer to use the phasor plot, this can be used for both frequency and time-domain methods and we added a sentence to the discussion to highlight this: “We prefer to use the phasor analysis (which can be used for both frequency- and time-domain FLIM), as it makes no assumptions about the underlying decay kinetics.”
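The reviewer's point about averaging methods can be illustrated with a small sketch that computes four common lifetime readouts for the same decay: the amplitude-weighted and intensity-weighted averages used in time-domain FLIM, and the phase and modulation lifetimes derived from the phasor coordinates. The bi-exponential parameters and the 40 MHz modulation frequency are assumptions for illustration, not values from the manuscript. For a mono-exponential decay all four readouts coincide; for a multi-exponential decay they differ, so a calibration built on one readout does not transfer directly to another.

```python
import math


def lifetime_summaries(amplitudes, lifetimes_ns, mod_freq_mhz=40.0):
    """Return four 'average lifetime' readouts (ns) for a multi-exponential decay."""
    omega = 2.0 * math.pi * mod_freq_mhz * 1e-3  # angular frequency in rad/ns
    pairs = list(zip(amplitudes, lifetimes_ns))
    total_amp = sum(a for a, _ in pairs)
    total_int = sum(a * t for a, t in pairs)  # integrated intensity

    tau_amplitude = total_int / total_amp
    tau_intensity = sum(a * t * t for a, t in pairs) / total_int

    # Phasor coordinates: intensity-weighted sum of single-exponential phasors.
    g = sum((a * t / total_int) / (1.0 + (omega * t) ** 2) for a, t in pairs)
    s = sum((a * t / total_int) * omega * t / (1.0 + (omega * t) ** 2)
            for a, t in pairs)

    tau_phase = s / (g * omega)
    tau_modulation = math.sqrt(1.0 / (g * g + s * s) - 1.0) / omega
    return {
        "amplitude_weighted": tau_amplitude,
        "intensity_weighted": tau_intensity,
        "phase": tau_phase,
        "modulation": tau_modulation,
    }


if __name__ == "__main__":
    for name, value in lifetime_summaries([0.5, 0.5], [1.0, 4.0]).items():
        print(f"{name:>20}: {value:.2f} ns")
```

Working directly with the phasor coordinates (g, s) avoids choosing among these scalar averages, which is consistent with the preference stated above.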

      (3) The origin of the redshift observed in G-Ca-FLITS is likely pi-stacking, similar to the EGFP-to-EYFP case. Previous studies suggest that for mTq2-based sensors a change in rigidity would lead to a change in the non-radiative rate, which would result in similar changes in quantum yield and (amplitude-weighted average) lifetime. If pi-stacking plays a role, however, there could be an additional change in the radiative rate (as suggested also by the change in absorption spectra). Could this play a role in the relation between brightness and lifetime in G-Ca-FLITS? Given the extensive data collected by the authors, it should be possible to comment on these mechanistic aspects, which would be useful to guide future design.

      We do appreciate this suggestion, but we currently do not have the data to answer this question. The inverted response that we observe, solely due to the introduction of the tyrosine is puzzling. Perhaps introduction of the mutation that causes the redshift in other cyan probes will provide more insight.

      Reviewer #2 (Recommendations for the authors):

      Specific points:

      The first section of Results is basically a description of how they chose the lysis conditions for screening in bacteria. I didn't see anything particularly novel or interesting about this, anyone working with protein expression in bacteria likely needs to optimize growth, lysis, purification, etc. This section should be moved to the Methods.

      As reviewer 1 lists the thorough documentation of this approach as one of the strengths, we prefer to keep it like this. We see this section as method development, rather than purely a method. When this section would be moved to methods, it remains largely invisible and we think that’s a shame. Readers that are not interested can easily skip this section.

      In the Results section Characterization of G-Ca-FLITS, the authors state "Here, the calcium affinity was KD = 339 nM, higher compared to the calibration at 37 °C. This is in line with the notion that binding strength generally increases with decreasing temperature." However, the opposite appears to be true - at 37 °C they measured a KD of 209 nM which would represent higher binding strength at higher temperature.

      Thanks for catching this, we’ve made a mistake. We rephrase this to “higher compared to the calibration at 37 °C. This is unexpected as it is not in line with the notion that binding strength generally increases with decreasing temperature.”

      In Figure 8c, there should be a visual indicator showing the onset of application of high potassium, as there is in 8b.

      This is a good suggestion; a grey box has been added to indicate the time when high K+ saline was perfused.

      Reviewer #3 (Recommendations for the authors):

      I think the science of the manuscript is sound and the presentation is logical and clear. I have some stylistic recommendations.

      Supp Fig 1: The figure requires a bit of "eyeballing" to decide which conditions are best, and figuring out which spectra matched the final conditions took a little effort. Is there a way to quantify the fluorescence yield to better show why the one set of conditions was chosen? If it was subjective, then at least highlight the final conditions with a box around the spectra, making it a different colour, or something to make it stand out.

      Thanks for the comment; we added a green box.

      Supp Fig 3: Similar suggestion. Highlight the final variant that was carried forward (T203Y). The subtle differences in spectra are hard to discern when they are presented separately. How would it look if they were plotted all on one graph? Or if each mutant were presented as a point on a graph of Peak Em vs Peak Ex? Would T203Y be in the top right?

      We have added a light blue box for reference to make the differences clearer.

      Supp Fig 4 & Fig 1: Too much of the graph shows the uninteresting tails of the spectra and condenses the interesting part. Plotting from 400 nm to 600 nm would be more informative.

      We appreciate the suggestion but disagree. We prefer to show the spectra in their entirety, including the tails. The data will be available so other plots can be made by anyone.

      Fig 3a: People who are not experts in lifetime analysis are probably not very familiar with the phase/modulation polar plot. There should be an additional sentence or two in the main text that _briefly_ describes the basis for making the polar plot and the transformation to the fractional saturation plot in 3B. I can't think of a good way to transform Eq 3 from Supp Info into a sentence, but that's what I think is needed to make this transformation clearer.

      We appreciate the suggestion and feel that it is well explained here:

      "The two extreme values (zero calcium and 39 μM free calcium) are located on different coordinates in the polar plot and all intermediate concentrations are located on a straight line between these two extremes. Based on the position in the polar plot, we determined the fraction of sensor in the calcium-bound state, while considering the intensity contribution of both states"  

      Fig 4: The figure is great, and I love the comparison of different calcium sensors. But where is Tq-Ca-FLITS? I get that this is a figure of green calcium sensors, but it would be nice to see Tq-Ca-FLITS in there as well. The G-Ca-FLITS is compared to Tq-Ca-FLITS in Fig 5. Maybe I'm just missing why the bottom panel of Fig 5 cannot be replotted and included in Fig 4.

      The point is that we compare all the data with identical filter sets, i.e. for green FPs. Using these ex/em settings, the Tq probe would seriously underperform. Note that the data in fig. 5 is not normalized to a reference RFP and can therefore not be compared with data presented in figure 4.

      Fig 6: The BOEC data could easily be moved to Supp Figs. It doesn't contribute much relevant info.

      We are not keen on moving data to the supplement, as too often the supplemental data is ignored. Moreover, we think that the BOEC data is valuable (as BOEC are primary cells and therefore a good model of a healthy human cell) and deserves a place in the main manuscript.

      2P FLIM / Fig 8 / Fig S4: The lack of brightness of G-Ca-FLITS in the 2P FLIM of fruit fly brain could have been predicted with a 2P cross section of the purified protein. If the equipment to perform such measurements is available, it could be incorporated into Fig S4.

      Unfortunately, we do not have access to equipment that measures the 2P cross section. As an alternative, we compared the 2P excitation efficiency with 1P excitation efficiency. To this end, we have used beads that were loaded with purified G-Ca-FLITS or Tq-Ca-FLITS. We have evaluated the fluorescence intensity of the beads using 1P (460 nm) and 2P (920 nm) excitation. Although the absolute intensity cannot be compared (the G-Ca-FLITS beads have a lower protein concentration), we can compare the relative intensities when changing from 1P to 2P. The 2P excitation efficiency of G-Ca-FLITS is comparable (if not better) to that of Tq-Ca-FLITS. This excludes the option that the G-Ca-FLITS has poor 2P excitability. We will include this data as figure S12.

      We also have added text to the results: “We evaluated the relative brightness of purified Tq-Ca-FLITS and G-Ca-FLITS on beads by either 1-Photon Excitation (1PE) (at 460 nm) or 2-Photon Excitation (2PE) (at 920 nm) and observed a similar brightness between the two modes of excitation (figure S14). This shows that the two probes have similar efficiencies in 2PE and suggests that the low brightness of G-Ca-FLITS in Drosophila is due to lower expression or poor folding.” and discussion: “The responses of both probes were in line with their properties in single photon FLIM, but given the low brightness of G-Ca-FLITS under 2-photon excitation in Drosophila, the Tq-Ca-FLITS is a better choice in this system. Yet, the brightness of G-Ca-FLITS with 2PE at 920 nm is comparable to Tq-Ca-FLITS, so we expect that 2P-FLIM with G-Ca-FLITS is possible in tissues that express it well.”

    1. eLife Assessment

      The manuscript by Mancl et al. provides important mechanistic insights into the conformational dynamics of Insulin Degrading Enzyme (IDE), a zinc metalloprotease involved in the clearance of amyloid peptides. In the revised version, the authors have substantially expanded their analysis by incorporating time-resolved cryo-EM and coarse-grained molecular dynamics simulations, which reveal an insulin-induced allosteric transition and transient β-sheet interactions underlying IDE's unfoldase activity. Supported by a convincing combination of cryo-EM, SEC-SAXS, enzymatic assays, and both all-atom and coarse-grained simulations, this work refines our understanding of IDE's functional cycle and offers a structural framework for developing substrate-selective modulators of M16 metalloproteases.

    2. Reviewer #1 (Public review):

      Summary:

      Mancl et al. present a comprehensive integrative study combining cryo-EM, SAXS, enzymatic assays, and molecular dynamics (MD) simulations to characterize conformational dynamics of human insulin-degrading enzyme (IDE). In the revised manuscript, the study now also includes time-resolved cryo-EM and coarse-grained MD simulations, which strengthen the mechanistic model by revealing insulin-induced allostery and β-sheet interactions between IDE and insulin. Together, these results expand the original mechanistic insight and further validate R668 as a key residue governing the open-close transition and substrate-dependent activity modulation of IDE.

      Strengths:

      The authors have substantially expanded the experimental scope by adding time-resolved cryo-EM data and coarse-grained MD simulations, directly addressing requests for mechanistic depth and temporal insight. The integration of multiple resolution scales (cryo-EM heterogeneity analysis, all-atom and coarse-grained MD simulations, and biochemical validation) now provides a coherent description of the conformational transitions and allosteric regulation of IDE. The addition of Aβ degradation assays strengthens the claim that R668 modulates IDE function in a substrate-specific manner. Finally, the manuscript reads more clearly: figure organization, section headers, and inclusion of a new introductory figure make it accessible to a broader audience. Overall, the revision reinforces the conceptual advance that the dynamic interdomain motions of IDE underlie both its unfoldase and protease activities and identifies structural motifs that could be targeted pharmacologically.

      Weaknesses:

      While the authors acknowledge that future studies on additional IDE substrates (e.g., amylin and glucagon) are warranted, such experiments remain outside the present scope. Their absence modestly limits the generalization of the R668 mechanism across all IDE substrates. Despite improved discussion of kinetic timescales and enzyme-substrate interactions, experimental correlation between MD timescales and catalysis remains primarily inferential. The moderate local resolution of some cryo-EM states (notably O/pO) continues to limit atomic interpretation of the most flexible regions, though the authors address this carefully.

    3. Reviewer #2 (Public review):

      Summary:

      The manuscript describes various conformational states and structural dynamics of the Insulin degrading enzyme (IDE), a zinc metalloprotease by nature. Both open and closed state structures of IDE have been previously solved using crystallography and cryo-EM, which reveal a dimeric organization of IDE where each monomer is organized into N and C domains. C-domains form the interacting interface in the dimeric protein while the two N-domains are positioned on the outer sides of the core formed by C-domains. It remains elusive how the open state is converted into the closed state, but it is generally accepted that it involves large-scale movement of N-domains relative to the C-domains. The authors here have used various complementary experimental techniques such as cryo-EM, SAXS, size-exclusion chromatography and enzymatic assays to characterize the structure and dynamics of IDE protein in the presence of substrate protein insulin, whose density is captured in all the structures solved. The experimental structural data from cryo-EM suffered from a high degree of intrinsic motion amongst the different domains and, consequently, the resultant structures were moderately resolved at 3-4.1 Å resolution. A total of five structures were generated using cryo-EM in the originally submitted manuscript. A sixth cryo-EM reconstruction at 5.1 Å resolution, obtained using time-resolved cryo-EM experiments, was added after the first revision. The authors have extensively used molecular dynamics simulations to fish out important inter-subunit contacts, which involve residues such as R668, E381, and D309. In summary, the authors have explored the conformational dynamics of IDE protein using experimental approaches which are complemented and analyzed in atomic detail by MD simulation studies. The studies are meticulously conducted and lay the ground for future exploration of the protease structure-function relationship.

      Comments after first peer-review:

      The authors have addressed all my concerns and have added new data and explanations in the form of time-resolved cryo-EM (Fig. 7) and Upside simulations (Fig. 8), which in my opinion have strengthened the merit of the manuscript.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review): 

      Summary: 

      Mancl et al. present cryo-EM structures of the Insulin Degrading Enzyme (IDE) dimer and characterize its conformational dynamics by integrating structures with SEC-SAXS, enzymatic activity assays, and all-atom molecular dynamics (MD) simulations. They present five cryo-EM structures of the IDE dimer at 3.0-4.1 Å resolution, obtained with one of its substrates, insulin, added to IDE in a 1:2 ratio. The study identified R668 as a key residue mediating the open-close transition of IDE, a finding supported by simulations and experimental data. The work offers a refined model for how IDE recognizes and degrades amyloid peptides, incorporating the roles of IDE-N rotation and charge-swapping events at the IDE-N/C interface. 

      Strengths: 

      The study by Mancl et al. uses a combination of experimental (cryo-EM, SEC-SAXS, enzymatic assays) and computational (MD simulations, multibody analysis, 3DVA) techniques to provide a comprehensive characterization of IDE dynamics. The identification of R668 as a key residue mediating the open-to-close transition of IDE is a novel finding, supported by both simulations and experimental data presented in the manuscript. The work offers a refined model for how IDE recognizes and degrades amyloid peptides, incorporating the roles of IDE-N rotation and charge-swapping events at the IDE-N/C interface. The study identifies the structural basis and key residues for IDE dynamics that were not revealed by static structures. 

      Weaknesses: 

      Based on MD simulations and enzymatic assays of IDE, the authors claim that the R668A mutation in IDE affects the conformational dynamics governing the open-closed transition, which leads to altered substrate binding and catalysis. The functional importance of R668 would be substantiated by enzymatic assays that included some of the known substrates of IDE other than insulin, such as amylin and glucagon. 

      We have included amyloid beta in our enzymatic assays, as shown in Figure 5D, and have updated the manuscript text accordingly. The R668A mutation results in a loss of dose-dependent competition with amyloid beta, but not with insulin. To further substantiate this unexpected finding, we plan to undertake a comprehensive biochemical characterization of the R668A mutation across a variety of substrates, followed by structural analysis of this mutant. However, these investigations are beyond the scope of the current study and, if successful, warrant a separate publication.

      It is unclear to what extent the force field (FF) employed in the MD simulations favors secondary structures and if the lack of any observed structural changes within the IDE domains in the simulations - which is taken to suggest that the domains behave as rigid bodies - stems from bias by the FF. 

      We utilized the widely adopted CHARMM36 force field, whose parameters have been validated by thousands of previous studies. As shown in Figure 2A, our simulations reveal small but noticeable fluctuations in intradomain RMSD values. However, after careful examination, we found that these changes do not correspond to any biologically meaningful motions based on previously reported structural and biophysical characterizations of IDE (e.g., Shen et al., Nature 2006; Noinaj et al., PLOS One 2011; McCord et al., PNAS 2013; Zhang et al., eLife 2018, and references therein).

      Reviewer #2 (Public review): 

      Summary: 

      The manuscript describes various conformational states and structural dynamics of the Insulin degrading enzyme (IDE), a zinc metalloprotease by nature. Both open and closed-state structures of IDE have been previously solved using crystallography and cryo-EM, which reveal a dimeric organization of IDE where each monomer is organized into N and C domains. C-domains form the interacting interface in the dimeric protein while the two N-domains are positioned on the outer sides of the core formed by C-domains. It remains elusive how the open state is converted into the closed state, but it is generally accepted that it involves large-scale movement of N-domains relative to the C-domains. The authors here have used various complementary experimental techniques such as cryo-EM, SAXS, size-exclusion chromatography, and enzymatic assays to characterize the structure and dynamics of IDE protein in the presence of substrate protein insulin, whose density is captured in all the structures solved. The experimental structural data from cryo-EM suffered from a high degree of intrinsic motion among the different domains and, consequently, the resultant structures were moderately resolved at 3-4.1 Å resolution. A total of five structures were generated by cryo-EM. The authors have extensively used molecular dynamics simulations to fish out important inter-subunit contacts, which involve residues such as R668, E381, and D309. In summary, the authors have explored the conformational dynamics of IDE protein using experimental approaches which are complemented and analyzed in atomic detail by MD simulation studies. The studies are meticulously conducted and lay the ground for future exploration of the protease structure-function relationship. 

      Reviewer #1 (Recommendations for the authors): 

      The manuscript reads well; however, there are minor details throughout that would tighten it up and, in some cases, make it easier to approach for a broader readership: 

      Abstract 

      (1) R668 is referred to by its one-letter code throughout the main text but referred to as arginine-668 in the abstract. The abstract should be corrected to R668. 

      This has been corrected.

      (2) The authors should consider reordering the significance of their work as it is listed at the end of the abstract. As the work first and foremost "offers the molecular basis of unfoldase activity of IDE and provides a new path forward towards the development of substrate-specific modulators of IDE activity" these should come before "the power of integrating experimental and computational methodologies to understand protein dynamics". 

      We have revised the abstract substantially to incorporate the new findings. Consequently, the sentence on "the power of integrating experimental and computational methodologies to understand protein dynamics" has been removed.  

      Main text 

      (1) Cryo-EM is consistently referred to as cryoEM throughout the text. The commonly accepted format for referring to cryogenic electron microscopy is cryo-EM. The authors are asked to consider revising the text accordingly. 

      The text has been revised.

      (2) Introduction: The authors are asked to consider including a figure (panel) that provides the general reader with an overview of IDE architecture and topology as a point of reference in the introduction to understanding the pseudo symmetry in IDE, domains, and IDE-C relative to IDE-N, etc. This is relevant for reading most of the figures. 

      We have added a new figure 1 to provide the background and questions to be answered.

      (3) The authors should consider renaming some of the headers in the results section to include the main conclusion. For instance, "CryoEM structures of IDE in the presence of a sub-saturating concentration of insulin" is not really helpful for the reader to understand the work, while "R668A mediates IDE conformational dynamics in vitro" is. 

      The headings have been altered in an effort to be more informative.

      (4) It is unclear what the timescale for insulin cleavage is for IDE. Clearly, it is possible for the authors to capture an insulin-bound IDE from within the 7 million particles, but what is the chance of this? The authors emphasize the IDE:insulin ratio relative to previous experiments, but surely the kinetics would be the same in the two experiments that were presumably set up exactly the same way. In the context of this, the authors should disclose how concentrations were estimated experimentally. The authors are encouraged to touch upon the subject of time scales to tie up cryo-EM and enzyme experiments with MD simulations. 

      Both reviewers raised the question of the timescales relevant to IDE catalysis. In response, we have revised the manuscript to address the relevance of key kinetic timescales. Specifically, we now discuss the open/closed transition (~0.1 second) and insulin cleavage (~2/sec), both established experimentally in prior studies (McCord et al., PNAS 2013). 

      IDE concentrations were determined by spectrometry (Nanodrop and/or Bradford assay), and its purity was confirmed to be greater than 90% by SDS-PAGE. Insulin was purchased commercially, weighed, and dissolved in buffer, with its concentration subsequently verified using Nanodrop. Catalytically inactive IDE and insulin were mixed and incubated for at least 30 minutes. Given IDE’s low nanomolar affinity for insulin, and the sub-stoichiometric insulin concentrations used, sufficient time was allowed for insulin to bind IDE and remain bound.

      To distinguish between IDE’s unfoldase and protease activities, all structural analyses were performed in the presence of EDTA, which chelates catalytic zinc, thereby inactivating IDE. This approach inhibits the enzyme’s catalytic cycle and allows us to capture the fully unfolded state of insulin bound to IDE in its closed conformation, representing the endpoint of the reaction. Under these conditions, the only meaningful kinetic parameter available for investigation was the unfolding of insulin by IDE.

      To characterize the interaction between IDE and insulin in the catalytically relevant time regime, we investigated IDE–insulin interactions on the millisecond timescale by rapidly mixing IDE with a large molar excess of insulin for approximately 120 milliseconds before cryo-EM single particle analysis. Under these conditions, we observed that both IDE subunits in the dimer predominantly adopt open states, which are distinct from those previously reported. This observation suggests a potential mechanism of allostery in IDE function. 
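
      The relation between the timescales quoted in this response can be made explicit with a short arithmetic sketch. It uses only the three numbers stated above (~0.1 second, ~2/sec, ~120 milliseconds); the variable and function names are illustrative, and treating ~0.1 second as the characteristic duration of a single open/closed transition is an assumption made here for the comparison, not a statement from the manuscript.

```python
# Minimal sketch: compare the rapid-mixing delay with the kinetic timescales
# quoted in the response. All names are illustrative (not from the manuscript).

OPEN_CLOSED_TRANSITION_S = 0.1      # ~0.1 s, assumed here to be the time per transition
INSULIN_CLEAVAGE_RATE_PER_S = 2.0   # ~2 cleavage events per second
MIXING_DELAY_S = 0.120              # ~120 ms between mixing and freezing


def mean_turnover_time(rate_per_s: float) -> float:
    """Mean time per catalytic event for a given rate (simple reciprocal)."""
    return 1.0 / rate_per_s


def fraction_of_turnover(elapsed_s: float, rate_per_s: float) -> float:
    """Elapsed time expressed as a fraction of one mean turnover time."""
    return elapsed_s * rate_per_s


turnover_s = mean_turnover_time(INSULIN_CLEAVAGE_RATE_PER_S)
fraction = fraction_of_turnover(MIXING_DELAY_S, INSULIN_CLEAVAGE_RATE_PER_S)
transitions = MIXING_DELAY_S / OPEN_CLOSED_TRANSITION_S

print(f"mean turnover time: {turnover_s:.2f} s")
print(f"mixing delay as a fraction of one turnover: {fraction:.2f}")
print(f"open/closed transition times within the delay: {transitions:.1f}")
```

      Under these assumptions, the mixing delay is shorter than one mean turnover time and comparable to a single open/closed transition time, which is consistent with describing the 120-millisecond experiment as probing the catalytically relevant regime.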

      (5) It should be included in the main text that the data was processed with C1 symmetry and not just in Table 1. This is more useful information for understanding the study than the number of micrographs.  

      We have stated that the data was processed with C1 symmetry at the start of the results section.

      (6) The authors should consider adding speculation on what the approximately 6 million particles that did not yield a high-resolution structure represent. 

      In cryo-EM single particle analysis, particle selection is typically performed automatically using software such as Relion. Due to the low signal-to-noise ratio, many “junk particles”—originating from contaminants such as ice, impurities, aggregates, or incomplete particles—are inevitably included along with the particles of interest. It is standard practice to filter out these junk particles during data processing. In our case, we estimate that the majority of the 6 million particles are likely junk. However, we cannot fully exclude the possibility that some of these particles may originate from IDE and carry potentially useful information about its conformational heterogeneity. Nonetheless, current cryo-EM single particle analysis methods face significant challenges in objectively recovering and interpreting such particles.

      Reviewer #2 (Recommendations for the authors): 

      I have some minor comments regarding the manuscript which are given below. 

      (1) For O/O state, it will be great to see an explanation regarding why the values are dissimilar for 0.5 and 0.143 FSC. 

      All of our IDE structures (including previously published data) demonstrate a dip/plateau at moderate resolution in their FSCs. We interpret this as an indicator of structural heterogeneity, as the dip/plateau is smallest in the pC/pC state, becomes larger when one of the subunits is open, and is largest when both subunits are open. Because both subunits within the O/O state are highly heterogeneous, the FSC dipped below the 0.5 threshold. Other states, such as the O/pO, display the same FSC trend, but the dip remains slightly above the 0.5 threshold.
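
      The effect described in this response can be illustrated with a small sketch. The two FSC curves below are synthetic, invented purely for illustration, and do not reproduce any curve from the manuscript; the helper name and the convention of reporting the first crossing of each threshold (with linear interpolation) are likewise assumptions. The sketch shows why a mid-resolution dip that falls below 0.5 gives very different resolution values at the 0.5 and 0.143 thresholds, while a dip that stays above 0.5 does not.

```python
# Minimal sketch with synthetic (hypothetical) FSC curves: how a dip at
# moderate resolution changes the resolution read off at FSC = 0.5 vs 0.143.

def resolution_at_threshold(freqs, fsc, threshold):
    """Resolution (Å) at the first point where the FSC falls below `threshold`.

    Uses linear interpolation between the two bracketing samples.
    Returns None if the curve never falls below the threshold.
    """
    for i in range(1, len(fsc)):
        if fsc[i - 1] >= threshold > fsc[i]:
            t = (fsc[i - 1] - threshold) / (fsc[i - 1] - fsc[i])
            crossing_freq = freqs[i - 1] + t * (freqs[i] - freqs[i - 1])
            return 1.0 / crossing_freq
    return None


# Spatial frequencies in 1/Å (synthetic sampling).
freqs = [0.05, 0.10, 0.15, 0.20, 0.25, 0.30, 0.35]

# Curve A: the dip at 0.15 1/Å stays above 0.5.
fsc_dip_above = [1.00, 0.95, 0.60, 0.70, 0.55, 0.30, 0.05]
# Curve B: the dip at 0.15 1/Å falls below 0.5 before the curve recovers.
fsc_dip_below = [1.00, 0.90, 0.45, 0.60, 0.50, 0.28, 0.05]

res_a_05 = resolution_at_threshold(freqs, fsc_dip_above, 0.5)
res_a_0143 = resolution_at_threshold(freqs, fsc_dip_above, 0.143)
res_b_05 = resolution_at_threshold(freqs, fsc_dip_below, 0.5)
res_b_0143 = resolution_at_threshold(freqs, fsc_dip_below, 0.143)

print(f"dip above 0.5: {res_a_05:.2f} Å at 0.5, {res_a_0143:.2f} Å at 0.143")
print(f"dip below 0.5: {res_b_05:.2f} Å at 0.5, {res_b_0143:.2f} Å at 0.143")
```

      In this toy example the 0.143 values of the two curves are nearly identical, whereas the 0.5 value of the second curve is much coarser because the threshold is first crossed inside the dip.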

      (2) O/pO state is moderately resolved at 4.1 Å, but this state is populated with many particles (328,870). Can the resolution be improved by more extensive sorting of heterogeneous particles, which intrinsically cause misalignment amongst particles? 

      Unfortunately, no. As shown by the local resolution maps in Figure 1-figure supplement 1, the primary source of misalignment is the IDE-N region in the open subunit. We have found that IDE-N is nearly unconstrained in its conformational flexibility in the open state and does not appear to adopt discrete states; consequently, our attempts to better classify particles have failed. We speculate that this may be a failing of k-means clustering-based classification, and this is part of the driving force behind our exploration of advanced methods of heterogeneity analysis.

      (3) Given the observation that capturing a substrate-bound open state is difficult, it can be assumed that the substrate capture in the catalytic cleft is a fast event. Please comment on the possible time frame of unfolding of substrate and catalysis. Can authors comment on any cryo-EM experiments that can deal with such a short time frame? If there is a possibility to include data from such experiments, then it may be considered.

      This has been addressed in conjunction with the previous reviewer’s comment (see above). Specifically, we now discuss the open/closed transition (~0.1 second) and insulin cleavage (~2/sec), both established experimentally in prior studies. Additionally, we investigated IDE–insulin interactions by rapidly mixing IDE with a large molar excess of insulin for approximately 120 milliseconds for the cryo-EM single particle analysis. Under these conditions, we observed that both IDE subunits in the dimer predominantly adopt open states, which are distinct from those previously reported. This observation suggests a potential mechanism of allostery in IDE function. 

      (4) How long was incubation time after adding any substrates, such as insulin? Can different incubation times be tested to generate additional information regarding other conformational states that lie in between open and closed states?  

      The incubation time for IDE with insulin prior to cryo-EM grid freezing was approximately 30 minutes. We agree that it would be exciting to explore shorter time frames to identify new conformational states. As discussed above, we have rapidly mixed IDE with a large molar excess of insulin for approximately 120 milliseconds for the cryo-EM single particle analysis. Under these conditions, we observed that both IDE subunits in the dimer predominantly adopt open states, which are distinct from those previously reported. This observation suggests a potential mechanism of allostery in IDE function.

      (5) A complex network of hydrogen bonding interaction initiated by R668 latching onto N-domain is mentioned in MD simulation studies but it is not clear why cryo-EM experiments did not capture such stabilized structures. 

      We believe that two main factors have prevented us from observing the hydrogen bonding network in our cryo-EM structures. The first factor is the requirement to freeze the sample in liquid ethane. According to the second law of thermodynamics, lowering the temperature reduces the effect of entropy. Our findings suggest that residue R668 interacts with several neighboring residues through a network of polar and electrostatic interactions, rather than being limited to a single partner. These interactions facilitate both the open-closed transitions and rotational movements between IDE-N and IDE-C. From a thermodynamic perspective, these interactions have both enthalpic and entropic components, and cooling the sample diminishes the entropic contribution. In line with this, we observe that the closed-state domains in our cryo-EM studies are positioned closer together than in our MD simulations, though not as tightly as in crystal structures of IDE. This implies that cryogenic data collection may constrain the interface between IDE-N and IDE-C, which can further alter the equilibrium for the network of R668 mediated interactions.

      Secondly, our cryo-EM structures represent ensemble averages of tens to hundreds of thousands of particles. MD simulations indicate that IDE-N and IDE-C can rotate relative to one another, resulting in considerable variability in residue interactions. However, the level of particle density in our cryo-EM data does not permit sufficiently fine classification to resolve these differences. As a result, distinct hydrogen bonding networks are likely averaged out in the ensemble structure, particularly in the case of R668, which is indicated to interact with multiple neighboring residues in a conformation-dependent manner. This averaging effect may also contribute to our inability to achieve resolutions below 3 Å.

      (6) Despite the observation that IDE is an intrinsically flexible protein, it seems probable that differently-sized substrates might reveal additional interaction networks formed by other novel key players apart from just R668. Will it be helpful to first try this computationally using MD simulations and then try to replicate this in cryo-EM experiments? If needed, additional simulation time may be added to the MD analysis. Please comment!  

      We agree that this is an exciting avenue to explore. Doubly so when considered in light of our R668A enzymatic results with amyloid beta. However, several challenges must be overcome before we can explore this direction effectively:

      (1) We lack experimental knowledge of the initial interaction event between IDE and substrate. All substrate-bound IDE structures have been obtained after unfolding and positioning for cleavage has occurred. Without a solid foundational model for the initial interaction event between IDE and substrate, the interpretation of subsequent MD simulations is open to question.

      (2) We have previously observed minimal effect of substrate on IDE in all-atom MD simulations. We believe that observable effects would require a much longer time scale than is currently achievable with all-atom MD, so we have turned to Upside, a coarse-grained method, to overcome this limitation. However, Upside handles side chains with presumptive modeling, which prevents the identification of potential novel residue interactions.

      (3) Due to the conformational heterogeneity present within IDE cryo-EM datasets, we struggle to obtain sufficient resolution to clearly identify side chain interactions at the domain interface (see response to 5).

      Given these challenges, we plan to explore these directions in future manuscripts.

      (7) What is the possibility that water interaction networks, and dynamism in these networks, contribute to the overall dynamics of the protein in the presence and absence of substrates? How symmetric would these networks be in the four domains of dimeric IDE? 

      This is an interesting idea that we have begun to explore, but consider to be outside the scope of this work. Currently, we do not have any MD simulations containing substrate with explicit solvent (Upside uses implicit solvent), and solvent atoms were removed from our all-atom simulations prior to analysis to speed up processing. That being said, preliminary WAXS data suggests that there may be a difference in water interaction interfaces between WT and R668A IDE, and this is a lead we plan to pursue in future work.

      (8) Line 214: Please fix the typo which wrongly describes closed = pO. 

      This is not a typo, but it is confusing. The pO state has previously been defined as the closed state of IDE lacking bound substrate as determined by cryo-EM. This differentiates the pO state from the pC state, where the pC state contains density indicative of bound substrate. As the MD simulations were conducted with the apo-state, the closed state the simulations were initialized from was the pO state structure, which represents the substrate-free closed state as determined by cryo-EM. We realize that this difference is probably unnecessary to the majority of readers, and have removed the (pO) specificity to avoid confusion.

      (9) It is not clear why a cryo-EM structure was not attempted for the R668A mutant. If the authors have tried to generate such a structure, it should be mentioned in the manuscript. Such a structure should yield more information when compared to SAXS experiments.

      We have not attempted to obtain a cryo-EM structure for the R668A mutant. Our SAXS analysis suggests a transition from a dominant O/pO state to a dominant O/O state. The O/O state is known to exhibit the highest degree of conformational heterogeneity, which severely limits structural insights. We are working to better handle the sample preparation of IDE and perform such analysis without the need to use Fab. We plan to further characterize IDE R668A biochemically and potentially explore other mutations that would provide insights into how IDE works. Armed with that, we will perform the structural analysis of such IDE mutant(s).

    1. eLife Assessment

      This study represents a valuable addition to the catalog of mitochondrial proteins. With the use of methodology based on the bi-genomic split-GFP technology, the authors generate convincing data, including dually localized proteins and topological information, under various growth conditions in yeast. The study represents a key basis for further functional and/or mechanistic studies on mitochondrial protein biogenesis.

    2. Reviewer #1 (Public review):

      Summary:

      The study conducted by the Schuldiner group advances the understanding of mitochondrial biology through the utilization of their bi-genomic (BiG) split-GFP assay, which they had previously developed and reported. This research endeavors to consolidate the catalog of matrix and inner membrane mitochondrial proteins. In their approach, a genetic framework was employed wherein a GFP fragment (GFP1-10) is encoded within the mitochondrial genome. Subsequently, a collection of strains was created, with each strain expressing a distinct protein tagged with the GFP11 fragment. The reconstitution of GFP fluorescence occurs upon the import of the protein under examination into the mitochondria.

      Strengths:

      Notably, this assay was executed under six distinct conditions, facilitating the visualization of approximately 400 mitochondrial proteins. Remarkably, 50 proteins were conclusively assigned to mitochondria for the first time through this methodology. The strains developed and the extensive dataset generated in this study serve as a valuable resource for the comprehensive study of mitochondrial biology. Specifically, it provides a list of 50 "eclipsed" proteins whose role in mitochondria remains to be characterized.

      The work could include some functional studies of the dually localized Gpp1 protein, as an example.

    3. Reviewer #2 (Public review):

      The authors addressed the question of how mitochondrial proteins that are dually localized, or of which only a minor fraction is localized to mitochondria, can be visualized. For this they used an established and previously published method called BiG split-GFP, in which GFP strands 1-10 are encoded in the mitochondrial DNA, and fused the GFP11 strand C-terminally to the yeast ORFs using the C-SWAT library. The generated library was imaged under different growth and stress conditions and yielded positive mitochondrial localization for approximately 400 proteins. The strength of this method is the detection of proteins that are dually localized with only a minor fraction within mitochondria, which has so far been hampered by strong fluorescent signals from other cellular localizations. The weakness of this method is that, due to the localization of the GFP1-10 in the mitochondrial matrix, only matrix proteins and IM proteins with their C-termini facing the matrix can be detected. In addition, the C-terminal GFP11 might impact the assembly of proteins into multimeric complexes or interfere with biogenesis, trapping the tagged protein in an unproductive transport intermediate. Taking these limitations into consideration, the authors provide a new library that can help in the identification of eclipsed protein distribution within mitochondria, thus further increasing our knowledge of the complete mitochondrial proteome. The approach of global tagging of the yeast genome is the logical consequence after the successful establishment of the BiG split-GFP for mitochondria. The authors also propose that their approach can be applied to investigate the topology of inner membrane proteins; however, for this the inherent issue remains that even the small GFP11 tag can impact protein biogenesis and topology. Thus, the approach will not overcome the need to assess protein topology via biochemical approaches detecting endogenous untagged proteins.

      Comments on revisions:

      The first sentence of the abstract should be changed, as the statement that "The majority of the mitochondrial proteins (...) often lack clear targeting signals" is not correct, in particular for the IM and matrix proteins analysed here: several N-proteomics analyses have defined N-terminal cleavable targeting signals in great detail.

      Also the statement in the title that the assay illuminates protein targeting routes should be reconsidered as experimental evidence for this statement is still scarce.

    4. Reviewer #3 (Public review):

      Summary:

      Here, Bykov et al move the bi-genomic split-GFP system they previously established to the genome-wide level in order to obtain a more comprehensive list of mitochondrial matrix and inner membrane proteins. In this very elegant split-GFP system, the longer GFP fragment, GFP1-10, is encoded in the mitochondrial genome and the shorter one, GFP11, is C-terminally attached to every protein encoded in the genome of yeast Saccharomyces cerevisiae. GFP fluorescence can therefore only be reconstituted if the C-terminus of the protein is present in the mitochondrial matrix, either as part of a soluble protein, a peripheral membrane protein or an integral inner membrane protein. The system, combined with high-throughput fluorescence microscopy of yeast cells grown under six different conditions, enabled the authors to visualize ca. 400 mitochondrial proteins, 50 of which were not visualised before and 8 of which were not shown to be mitochondrial before. The system appears to be particularly well suited for analysis of dually localized proteins and could potentially be used to study sorting pathways of mitochondrial inner membrane proteins.

      Strengths:

      Many fluorescence-based genome-wide screens were previously performed in yeast and were central to revealing the subcellular location of a large fraction of the yeast proteome. Nonetheless, these screens also showed that tagging with full-length fluorescent proteins (FP) can affect both the function and targeting of proteins. The strength of the system used in the current manuscript is that the shorter tag is beneficial for detection of a number of proteins whose targeting and/or function is affected by tagging with full-length FPs.

      Furthermore, the system used here can nicely detect mitochondrial pools of dually localized proteins. It is especially useful when these pools are minor and their signals are therefore easily masked by the strong signals coming from the major, nonmitochondrial pools of the proteins.

      Weaknesses:

      My only concern is that the biological significance of the screen performed appears limited. The dataset obtained is largely in agreement with several previous proteomic screens but it is, unfortunately, not more comprehensive than them, rather the opposite. For proteins that were identified inside mitochondria for the first time here or were identified in an unexpected location within the organelle, it remains unclear whether these localizations represent some minor, missorted pools of proteins or are indeed functionally important fractions and/or productive translocation intermediates. The authors also allude to several potential applications of the system but do little to explore any of these directions.

      Comments on revisions:

      The revised version of the manuscript submitted by Bykov et al. addresses the comments and concerns raised by the Reviewers. It is a pity that the verification of the newly obtained data and its further biological exploration are apparently more challenging than perhaps anticipated.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      The study conducted by the Schuldiner group advances the understanding of mitochondrial biology through the utilization of their bi-genomic (BiG) split-GFP assay, which they had previously developed and reported. This research endeavors to consolidate the catalog of matrix and inner membrane mitochondrial proteins. In their approach, a genetic framework was employed wherein a GFP fragment (GFP1-10) is encoded within the mitochondrial genome. Subsequently, a collection of strains was created, with each strain expressing a distinct protein tagged with the GFP11 fragment. The reconstitution of GFP fluorescence occurs upon the import of the protein under examination into the mitochondria.

      We are grateful for the positive evaluation. We would like to clarify that the bi-genomic (BiG) split-GFP assay was developed by the labs of H. Becker and Roza Kucharzyk through the highly laborious construction of the strain with mtDNA-encoded GFP<sub>1-10</sub> (Bader et al, 2020).

      Strengths:

      Notably, this assay was executed under six distinct conditions, facilitating the visualization of approximately 400 mitochondrial proteins. Remarkably, 50 proteins were conclusively assigned to mitochondria for the first time through this methodology. The strains developed and the extensive dataset generated in this study serve as a valuable resource for the comprehensive study of mitochondrial biology. Specifically, it provides a list of 50 "eclipsed" proteins whose role in mitochondria remains to be characterized.

      Weaknesses:

      The work could include some functional studies of at least one of the newly identified 50 proteins.

      In response to this we have expanded the characterization of phenotypic effects resulting from changing the targeting signal and expression levels of the dually localized Gpp1 protein and expanded the data in Fig. 3, panels H and I.

      Reviewer #2 (Public Review):

      The authors addressed the question of how mitochondrial proteins that are dually localized or only to a minor fraction localized to mitochondria can be visualized on the whole genome scale. For this, they used an established and previously published method called BiG split-GFP, in which GFP strands 1-10 are encoded in the mitochondrial DNA and fused the GFP11 strand C-terminally to the yeast ORFs using the C-SWAT library. The generated library was imaged under different growth and stress conditions and yielded positive mitochondrial localization for approximately 400 proteins. The strength of this method is the detection of proteins that are dually localized with only a minor fraction within mitochondria, which so far has hampered their visualization due to strong fluorescent signals from other cellular localizations. The weakness of this method is that due to the localization of the GFP1-10 in the mitochondrial matrix, only matrix proteins and IM proteins with their C-termini facing the matrix can be detected. Also, proteins that are assembled into multimeric complexes (which will be the case for probably a high number of matrix and inner membrane-localized proteins) resulting in the C-terminal GFP11 being buried are likely not detected as positive hits in this approach. Taking these limitations into consideration, the authors provide a new library that can help in the identification of eclipsed protein distribution within mitochondria, thus further increasing our knowledge of the complete mitochondrial proteome. The approach of global tagging of the yeast genome is the logical consequence after the successful establishment of the BiG split-GFP for mitochondria. The authors also propose that their approach can be applied to investigate the topology of inner membrane proteins; however, the inherent issue remains that it cannot be excluded that even the small GFP11 tag can impact protein biogenesis and topology. Thus, the approach will not overcome the need to assess protein topology via biochemical approaches on endogenous untagged proteins.

      Reviewer #3 (Public Review):

      Summary:

      Here, Bykov et al move the bi-genomic split-GFP system they previously established to the genome-wide level in order to obtain a more comprehensive list of mitochondrial matrix and inner membrane proteins. In this very elegant split-GFP system, the longer GFP fragment, GFP1-10, is encoded in the mitochondrial genome and the shorter one, GFP11, is C-terminally attached to every protein encoded in the genome of yeast Saccharomyces cerevisiae. GFP fluorescence can therefore only be reconstituted if the C-terminus of the protein is present in the mitochondrial matrix, either as part of a soluble protein, a peripheral membrane protein, or an integral inner membrane protein. The system, combined with high-throughput fluorescence microscopy of yeast cells grown under six different conditions, enabled the authors to visualize ca. 400 mitochondrial proteins, 50 of which were not visualised before and 8 of which were not shown to be mitochondrial before. The system appears to be particularly well suited for analysis of dually localized proteins and could potentially be used to study sorting pathways of mitochondrial inner membrane proteins.

      Strengths:

      Many fluorescence-based genome-wide screens were previously performed in yeast and were central to revealing the subcellular location of a large fraction of the yeast proteome. Nonetheless, these screens also showed that tagging with full-length fluorescent proteins (FP) can affect both the function and targeting of proteins. The strength of the system used in the current manuscript is that the shorter tag is beneficial for the detection of a number of proteins whose targeting and/or function is affected by tagging with full-length FPs.

      Furthermore, the system used here can nicely detect mitochondrial pools of dually localized proteins. It is especially useful when these pools are minor and their signals are therefore easily masked by the strong signals coming from the major, nonmitochondrial pools of the proteins.

      Weaknesses:

      My only concern is that the biological significance of the screen performed appears limited. The dataset obtained is largely in agreement with several previous proteomic screens but it is, unfortunately, not more comprehensive than them, rather the opposite. For proteins that were identified inside mitochondria for the first time here or were identified in an unexpected location within the organelle, it remains unclear whether these localizations represent some minor, missorted pools of proteins or are indeed functionally important fractions and/or productive translocation intermediates. The authors also allude to several potential applications of the system but do little to explore any of these directions.

      We agree with the reviewer that a single method cannot be used to construct the complete protein inventory of an organelle or its sub-compartment. We suggest that the value of our assay is in providing a complementary view to the existing data and approaches. For example, we confirm the matrix localization of several proteins that were only found in two proteomic datasets and never verified before (Vögtle et al, 2017; Morgenstern et al, 2017). Given that proteomics is a very sensitive technique and false positives are hard to completely exclude, our complementary verification is valuable.

      Reviewer #1 (Recommendations for the authors):

      In my opinion, the manuscript can be published as it is, and I would expect that future work will advance the functional properties of the newly found mitochondrial proteins.

      We thank the reviewer for their positive evaluation.

      Reviewer #2 (Recommendations for the authors):

      (1) Due to the localization of the GFP1-10 in the matrix, only matrix and IM proteins with C-termini facing the matrix can be detected. This should be added, e.g., in the heading of the first results part and discussed earlier in the manuscript. In addition, the limitation that assembly into protein complexes will likely preclude detection of matrix and IM proteins needs to be discussed.

      To address the first point, we edited the title of the first section to only mention the visualization of the matrix-facing proteome and remove the words “inner membrane”. We also clarified early in the Results section that we only consider the matrix-facing C-termini by extending the following sentence: “To compare our findings with published data, we created a unified list of 395 proteins that are observed with high confidence using our assay indicating that their C-terminus is positioned in the matrix (Fig. 2 – figure supplement 1B-D, Table S1).” (P. 6 Lines 1-3). To conclude the comparison with the earlier proteomic studies, we also added the sentence “Many proteins are missing because their C-termini are facing the IMS” (P. 8 Line 2).

      To address the second point, concerning the possible interference of complex assembly with protein detection by our assay, we conducted an additional analysis. The analysis takes advantage of protein complexes with known structures, for which we could estimate whether the C-terminus with the GFP<sub>11</sub> tag would be available for GFP<sub>1-10</sub> binding. We added an additional figure (Figure 3 – figure supplement 2) and the following text in the Results section (P. 7 Lines 22-34):

      “To examine the influence of protein complex assembly on the performance of the BiG Mito-Split assay we analyzed the published structures of the mitoribosome and ATP synthase (Desai et al, 2017; Srivastava et al, 2018; Guo et al, 2017) and classified all proteins as either having C-termini in, or out of, the complex. There was no difference between the “in” and “out” groups in the percentage observed in the BiG Mito-Split collection (Fig. 3 – figure supplement 2A) suggesting that the majority of the GFP11-tagged proteins have a chance to interact with GFP1-10 before (or instead of) assembling into the complex. PCR and western blot verification of eight strains with the tagged complex subunits for which we observed no signal showed that mitoribosomal proteins were incorrectly tagged or not expressed, and the ATP synthase subunits Atp7, Atp19, and Atp20 were expressed (Fig. 3 – Supplement 2B). Atp19 and Atp20 have their C-termini most likely oriented towards the IMS (Guo et al, 2017) while Atp7 is completely in the matrix and may be the one example of a subunit whose assembly into a complex prevents its detection by the BiG Mito-Split assay.”

      We also consider related points on the interference of the tag and the influence of protein essentiality in the replies to points (3) and (12) of these reviews.

      (2) The imaging data is of high quality, but the manuscript would greatly benefit from additional analysis to support the claims or hypotheses brought forward by the authors. The idea that the non-mitochondrial proteins are imported due to their high sequence similarity to MTS could be easily addressed at least for some of these proteins via import studies, as also suggested by the authors.

      The idea that non-mitochondrial proteins may be imported into mitochondria due to occasional sequence similarity was recently demonstrated experimentally by Oborská-Oplová et al (2025). We incorporate this information in the Discussion section as follows (P. 14 Lines 10-16):

      “It was also recently shown that the r-protein uS5 (encoded by RPS2 in yeast) has a latent MTS that is masked by a special mitochondrial avoidance segment (MAS) preceding it (Oborská-Oplová et al, 2025). The removal of the MAS leads to import of uS5 into mitochondria killing the cells. The case of uS5 is an example of occasional similarity between an r-protein and an MTS caused by similar requirements of positive charges for rRNA binding and mitochondrial import. It remains unclear if other r-proteins have a MAS and if there are other mechanisms that protect mitochondria from translocation of cytosolic proteins.”

      We also conducted additional analysis to substantiate the claim that ribosomal (r)-proteins are similar in their physico-chemical properties to MTS-containing mitochondrial proteins. For this we chose not to use prediction algorithms like TargetP and MitoFates, which were already trained on the same dataset of yeast proteins to discriminate cytosolic and mitochondrial localization. Instead, we extended the analysis made earlier by Woellhaf et al (2014) and calculated several different properties such as charge, hydrophobicity, hydrophobic moment and amino acid content for mitochondrial MTS-containing proteins, cytosolic non-ribosomal proteins, and r-proteins. The analysis showed a striking similarity between r-proteins and mitochondrial proteins. We incorporate a new Figure 3 – figure supplement 3 and the following text in the Results section (P. 8 Lines 14-22):

      “Five out of eight proteins are components of the cytosolic ribosome (r-proteins). In agreement with previous reports (Woellhaf et al, 2014) we find that their unique properties, such as charge, hydrophobicity and amino acid content, are indeed more similar to mitochondrial proteins than to cytosolic ones (Fig. 3 – figure supplement 3). Additional experiments with heterologous protein expression and in vitro import will be required to confirm the mitochondrial import and targeting mechanisms of these eight non-mitochondrial proteins. The data highlight that out of hundreds of very abundant proteins with high prediction scores only a few are actually imported, and underscore the importance of the mechanisms that help to avoid translocation of wrong proteins (Oborská-Oplová et al, 2025).”
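
      For readers who want to reproduce this kind of comparison, the properties named above can be approximated from sequence alone. The following is a minimal sketch, not the authors' pipeline: it assumes the Kyte-Doolittle hydropathy scale, a crude integer charge (+1 for K/R, -1 for D/E, histidine ignored), an Eisenberg-style hydrophobic moment at 100° per residue (ideal α-helix), and a 20-residue N-terminal window. The function names and the example sequence are hypothetical and chosen only for illustration.

```python
import math

# Kyte-Doolittle hydropathy scale (higher = more hydrophobic).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def net_charge(seq):
    """Crude net charge at neutral pH: +1 per K/R, -1 per D/E (His ignored)."""
    return sum(seq.count(a) for a in "KR") - sum(seq.count(a) for a in "DE")


def mean_hydropathy(seq):
    """Average Kyte-Doolittle hydropathy over the sequence."""
    return sum(KD[a] for a in seq) / len(seq)


def hydrophobic_moment(seq, angle_deg=100.0):
    """Per-residue hydrophobic moment, assuming a fixed helical turn angle."""
    delta = math.radians(angle_deg)
    sin_sum = sum(KD[a] * math.sin(i * delta) for i, a in enumerate(seq))
    cos_sum = sum(KD[a] * math.cos(i * delta) for i, a in enumerate(seq))
    return math.hypot(sin_sum, cos_sum) / len(seq)


def describe(seq, window=20):
    """Summarize the N-terminal window (assumed length, not from the source)."""
    n_term = seq[:window].upper()
    return {
        "net_charge": net_charge(n_term),
        "mean_hydropathy": mean_hydropathy(n_term),
        "hydrophobic_moment": hydrophobic_moment(n_term),
    }


if __name__ == "__main__":
    # Arbitrary illustrative sequence, not a real protein.
    print(describe("MKRLLSAARKLAVRSFSTDE"))
```

      Applying `describe` to each protein in the three groups (MTS-containing mitochondrial proteins, cytosolic non-ribosomal proteins, and r-proteins) would yield per-group distributions that can then be compared; the choice of scale and window length will affect the absolute values.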

      To further test the possibility of r-protein import into mitochondria we aimed to clone the r-proteins identified in this work for cell-free expression and import into purified mitochondria. Despite a large effort, we succeeded in cloning and efficiently expressing only Rpl23a (Author response image 1A). Rpl23a indeed forms proteinase-protected fractions in a membrane potential-dependent manner when incubated with mitochondria. The inverse import dynamics of Rpl23a could be indicative either of quick degradation inside mitochondria or of background signal during the import experiments (Author response image 1A). To address the possibility of r-protein degradation, we measured how the GFP signal changes in the BiG Mito-Split diploid collection strains after blocking cytosolic translation with cycloheximide (CHX). For this we selected Rpl12a, which had one of the highest signals. We did not detect any drop in fluorescence signal for Rpl12a and the control protein Mrpl6 (Author response image 1B). This might indicate the lack of degradation, or the degradation of the whole protein except GFP<sub>11</sub>, which remains connected to GFP<sub>1-10</sub>. Due to time constraints we could not perform all experiments for the whole set of potentially imported r-proteins. Since more experiments are required to clearly show the mechanisms of mitochondrial r-protein import, degradation, and toxicity, or possible moonlighting functions (such as import into mitochondria derived from a pim1∆ strain, degradation assays, fractionations, and analyses with antibodies for native proteins), we decided not to include these new data in the manuscript itself.

      Author response image 1.

      The import of r-proteins into mitochondria and their stability. (A) Rpl23 was synthesized in vitro (Input), radiolabeled, and imported into mitochondria isolated from BY4741 strain as described before (Peleh et al, 2015); the import was performed for 5, 10, or 15 minutes and mitochondria were treated with proteinase K (PK) to degrade non-imported proteins; some reactions were treated with the mix of valinomycin, antimycin, and oligomycin (VAO) to dissipate mitochondrial membrane potential; the proteins were visualized by SDS-PAGE and autoradiography. (B) The strains from the diploid BiG Mito-Split collection were grown in YPD to mid-logarithmic growth phase, then CHX was added to block translation and cell aliquots were taken from the culture and analyzed by fluorescence microscopy at the indicated time points. Scale bar is 5 µm.

      (3) The claim that the approach can be used to assess the topology of inner membrane proteins is problematic as the C-terminal tag can alter the biogenesis pathway of the protein or impact on the translocation dynamics (in particular as the imaging method applied here does not allow for analysis of dynamics). The hypothesis that the biogenesis route can be monitored is therefore far-reaching. To strengthen the hypothesis the authors should assess if the C-terminal GFP11 influences protein solubility by assessing protein aggregation of e.g. Rip1.

      We agree with the reviewer that the tag and assembly of GFP<sub>1-10/11</sub> can further complicate the assessment of topology of the IM proteins that already have complex biogenesis routes (lateral transfer, conservative, and a Rip1-specific Bcs1 pathway). To emphasize that the assessment of the steady state topology needs to be backed up by additional biochemical approaches, we edited the beginning of the corresponding Results sections as follows (P. 11 Lines 2-6): 

      “Studying membrane protein biogenesis requires an accurate way to determine topology in vivo. The mitochondrial IM is one of the most protein-rich membranes in the cell supporting a wide variety of TMD topologies with complex biogenesis pathways. We aimed to find out if our BiG Mito-Split collection can accurately visualize the steady-state localization of membrane protein C-termini protruding into the matrix or trap protein transport intermediates” (inserted text is underlined).

      The collection that we studied by microscopy is diploid and contains one WT copy of each 3xGFP<sub>11</sub>-tagged gene. To assess the influence of the tag on protein function we performed growth assays with haploid strains which have one 3xGFP<sub>11</sub>-tagged gene copy and no GFP<sub>1-10</sub>. We find that Rip1-3xGFP<sub>11</sub> displays slower growth on glycerol at 30˚C and even slower at 37˚C while tagged Qcr8, Qcr9, and Qcr10 grow normally (Author response image 2A). Based on the growth assays and microscopy it is not possible to conclude whether the “Qcr” proteins’ biogenesis is affected by the tag. It may be that laterally sorted proteins are functional with the tag and constitute the majority while only a small portion is translocated into the matrix, trapped and visualized with GFP<sub>1-10</sub>. In the case of Rip1 it was shown that a C-terminal tag can affect its interaction with the chaperone Mzm1 and promote Rip1 aggregation (Cui et al, 2012). The extent to which Rip1 function is disrupted can vary and depends on the tag. We hypothesize that our split-assay may trap the pre-translocation intermediate of Rip1 and can be helpful to study its interactors. To test this, we performed anti-GFP immunoprecipitation (IP) using GFP-Trap beads (Author response image 2B).

      Author response image 2.

      The influence of 3xGFP<sub>11</sub> on the function and processing of the inner membrane proteins. (A) Drop dilution assays with haploid strains from the C-SWAT 3xGFP<sub>11</sub> library on fermentative (YPD) and respiratory (YPGlycerol) media at different temperatures. (B) Immunoprecipitation with GFP-Trap agarose was performed on the haploid strain that has only Rip1-3xGFP<sub>11</sub> and on the diploid strain derived from this haploid mated with the BiG Mito-Split strain containing mtGFP<sub>1-10</sub> and WT untagged Rip1, using the lysis (1% TX-100) and washing protocols provided by the manufacturer; the total (T) samples and the samples eluted with Laemmli buffer (IP) were analyzed by immunoblotting with polyclonal rabbit antibodies against GFP (only visualizes GFP<sub>11</sub> in these samples) and Rip1 (visualizes both tagged and WT Rip1). Polyclonal home-made rabbit antisera for GFP and Rip1 were kindly provided by Johannes Herrmann (Kaiserslautern) and Thomas Becker (Bonn); the antisera were diluted 1:500 for decorating the membranes.

      We find that the haploid strain with Rip1-3xGFP<sub>11</sub> contains not only mature (m) and intermediate (i) forms but also an additional higher Mw band that we interpret as a precursor that was not cleaved by MPP. WT Rip1 in the diploid added two more lower Mw bands: the (m) and (i) forms of the untagged Rip1. IP successfully enriched the GFP<sub>1-10</sub> fragment as visualized by anti-GFP staining. Interestingly, only the highest Mw Rip1-3xGFP<sub>11</sub> band was also enriched when anti-Rip1 antibodies were used to analyze the samples. This suggests that the Rip1 precursor gets completely imported, interacts with GFP<sub>1-10</sub>, and can be pulled down. It is, however, not processed. Processed Rip1 does not interact with GFP<sub>1-10</sub>. Based on the literature we expect all Rip1 in the matrix to be cleaved by MPP, including the fraction interacting with GFP. Due to this discrepancy, we did not include this data in the manuscript. It is, however, clear that the assay may be useful to analyze biogenesis intermediates of the IM and matrix proteins. To emphasize this, we added information on the C-terminal tagging of Rip1 in the Results section (P. 11 Lines 18-20):

      “It was shown that a C-terminal tag on Rip1 can prevent its interaction with the chaperone Mzm1 and promote aggregation in the matrix (Cui et al, 2012). It is also possible that our assay visualizes this trapped biogenesis intermediate.”

      We also added a note on biogenesis intermediates in the Discussion (P. 14 Line 36 onwards): 

      “It is possible that the proteins with C-termini that are translocated into the IMS from the matrix side can be trapped by the interaction with GFP<sub>1-10</sub>. In that case, our assay can be a useful tool to study these pre-translocation intermediates.”

      (4) The hypothesis that the method can reveal new substrates for Bcs1 is interesting, and it would strongly increase the relevance for the scientific community if this were directly tested, e.g. by deleting BCS1 and testing if more IM proteins are then detected by interaction with the matrix GFP1-10.

      We attempted to move the BiG Mito-Split assay into haploid strains where BCS1 and other factors can be deleted; however, this was not successful. Since this was a big effort (we cloned 10 potential substrate proteins but none of them were expressed), we decided not to pursue this further.

      (5) The screening of six different growth conditions reflects the strength of the high-throughput imaging readout. However, the interpretation of the data and additional follow-up on this is rather short and would be a nice addition to the present manuscript. In addition, one wonders, what was the rationale behind these six conditions (e.g. DTT treatment)? The direct metabolic shift from fermentation to respiration to boost mitochondrial biogenesis would be a highly interesting condition and the authors should consider adding this in the present manuscript.

      We agree with the reviewer that the analysis of different conditions is a strength of this work. However, we did not reveal any clear protein groups with strong conditional import and thus it was hard to select a follow-up candidate. The selection of conditions was partially driven by the technical possibilities: the media change is challenging on the robotic system; heat shock conditions make microscope autofocus unstable; library strain growth on synthetic respiratory media is very slow and the media cannot be substituted with rich media due to its autofluorescence. However, the use of the spinning disc confocal microscope allowed us to screen directly in synthetic oleate media, which has a lot of background on widefield systems due to oil micelles. We extended the explanation of condition choice as follows (P. 4 Line 34 onwards):

      “The diploid BiG Mito-Split collection was imaged in six conditions representing various carbon sources and a diversity of stressors the cells can adapt to: logarithmic growth on glucose as a control carbon source and oleic acid as a poorly studied carbon source; post-diauxic (stationary) phase after growth on glucose where mitochondria are more active, and inorganic phosphate (Pi) depletion that was recently described to enhance mitochondrial membrane potential (Ouyang et al, 2024); as stress conditions we chose growth on glucose in the presence of 1 mM dithiothreitol (DTT) that might interfere with the disulfide relay system in the IMS, and nitrogen starvation as a condition that may boost biosynthetic functions of mitochondria. DTT and nitrogen starvation were earlier used for a screen with the regular C’-GFP collection (Breker et al, 2013). Another important consideration for selecting the conditions was the technical feasibility to implement them on automated screening setups.”

      Reviewer #3 (Recommendations for the authors):

      (6) This is a very elegant and clearly written study. As mentioned above, my only concern is that the biological significance of the obtained data, at this stage, is rather limited. It would have been nice if the authors explored one of the potential applications of the system they propose. For example, it should be relatively easy to analyze whether Cox26, Qcr8, Qcr9, or Qcr10 are new substrates of Bcs1, as the authors speculate.

      We thank the reviewer for their positive feedback. We addressed the biological application of the screen by including new data on metabolite concentrations in the strains where the Gpp1 N-terminus was mutated, leading to loss of the mitochondrial form. We added panels H and I to Figure 4, the new Supplementary Table S2 and appended the description of these results at the end of the third Results subsection (P. 10 Lines 19-35). Our data now show a role for the mitochondrial fraction of Gpp1 which adds mechanistic insight into this dually localized protein.

      We also were interested in the applications of our system to the study of mitochondrial import. However, the study of Cox26, Qcr8, Qcr9, and Qcr10 was not successful (also related to point 4, Reviewer #2). We thus decided to investigate the import mechanisms of the poorly studied dually localized proteins Arc1, Fol3, and Hom6 (related to Figure 4 of the original manuscript). To this end, we expressed these proteins in vitro, radiolabeled, and performed import assays with purified mitochondria. Arc1 was not imported, Fol3 and Hom6 gave inconclusive results (Author response image 3). Since it is known that even some genuine fully or dually localized mitochondrial proteins such as Fum1 cannot be imported in vitro post-translationally (Knox et al, 1998), we cannot draw conclusions from these experiments and left them out of the revised manuscript. Additional investigation is required to clarify if there exist special cytosolic mechanisms for the import of these proteins that were not reconstituted in vitro such as co-translational import.

      Author response image 3.

      In vitro import of poorly studied dually localized proteins. Arc1, Fol3, and Hom6 were cloned into pGEM4 plasmid, synthesized in vitro (Input), radiolabeled, and imported into mitochondria isolated from BY4741 strain as described before (Peleh et al, 2015); the import was performed for 5, 10, or 15 minutes and mitochondria were treated with proteinase K (PK) to degrade non-imported proteins; some reactions were treated with the mix of valinomycin, antimycin, and oligomycin (VAO) to dissipate mitochondrial membrane potential. The proteins were separated by SDS-PAGE and visualized by autoradiography.

      Minor comments:

      (7) It is unclear why the authors used the six growth conditions they used, and why for example a nonfermentable medium was not included at all.

      We address this shortcoming in the reply to the previous point 5 (Reviewer #2).

      (8) Page 2, line 17 - "Its" should be corrected to "its".

      Changed

      (9) Page 2, line 25 to the end of the paragraph - the authors refer to the TIM complex when actually the TIM23 complex is probably meant. Also, it would be clearer if the TIM22 complex was introduced as well, especially in the context of the sentence stating that "the IM is a major protein delivery destination in mitochondria".

      This was corrected.

      (10) Page 5, line 35 - "who´s" should be corrected to "whose".

      This was corrected.

      (11) Page 9, line 5 - "," after Gpp1 should probably be "and".

      This was corrected.

      (12) Page 11 - the authors discuss in several places the possible effects of tags and how they may interfere with "expression, stability and targeting of proteins". Protein function may also be dramatically affected by tags - a quick look into the dataset shows that several mitochondrial matrix and inner membrane proteins that are essential for cell viability were not identified in the screen, likely because their function is impaired.

      We agree with the reviewer that the influence of tags needs to be carefully evaluated. This is not always possible in the context of genome-wide screens. Sometimes, yeast collections (and proteomic datasets) can miss well-known mitochondrial residents without a clear reason. To address this important point we conducted an additional analysis to look specifically at the essential proteins. We indeed found that several of the mitochondrial proteins that are essential for viability were absent from the collection at the start, but for those present, their essentiality did not impact the likelihood of being detected in our assay. To describe the analysis we added the following text and Fig. 3 – figure supplement 2. Results now read (P. 7 Lines 8-21):

      “Next, we checked the two categories of proteins likely to give biased results in high-throughput screens of tagged collections: proteins essential for viability, and molecular complex subunits. To look at the first category we split the proteomic dataset of soluble matrix proteins (Vögtle et al, 2017) into essential and non-essential ones according to the annotations in the Saccharomyces Genome Database (SGD) (Wong et al, 2023). We found that there was no significant difference in the proportion of detected proteins between the two groups (17% and 20%, respectively), despite essential proteins being less represented in the initial library (Fig. 3 – figure supplement 2A). Of the three essential proteins of the Vögtle et al (2017) dataset for which the strains were present in our library but showed no signal, two were nucleoporins Nup57 and Nup116, and one was a genuine mitochondrial protein Ssc1. Polymerase chain reaction (PCR) and western blot verification showed that the Ssc1 strain was incorrect (Fig. 3 – figure supplement 2B). We conclude that essential proteins are more likely to be absent or improperly tagged in the original C’-SWAT collection, but the essentiality does not affect the results of the BiG Mito-Split assay.”

      Discussion (P. 13 Lines 23-26): 

      “We did not find that protein complex components or essential proteins are more likely to be false negatives. However, some essential proteins were absent from the collection to start with (Fig. 3 – figure supplement 2A). Thus, a small tag allows visualization of even complex proteins.”

      From our data it is difficult to estimate the effect of tagging on protein function. We also addressed the effect of tagging Rip1 and performed growth assays on the tagged small “Qcr” proteins in the reply to point 3 (Reviewer #2). It is also difficult to estimate the effect of GFP<sub>1-10</sub> and GFP<sub>11</sub> complex assembly on protein function since the presence of a functional, unassembled GFP<sub>11</sub>-tagged pool cannot be ruled out in our assay.

      Other changes

      Figure and table numbers changed after new data additions.

      A sentence added in the abstract to highlight the additional experiments on Gpp1 function: “We use structure-function analysis to characterize the dually localized protein Gpp1, revealing an upstream start codon that generates a mitochondrial targeting signal and explore its unique function.”

      The reference to the PCR verification (Fig. 3 – Supplement 2B) of correct tagging of Ycr102c was added to the Results section (P. 8 Line 6), and western blot verification was also added.

      Added the Key Resources Table at the beginning of the Methods section.

      Small grammar edits, see tracked changes.

      References:

      Bader G, Enkler L, Araiso Y, Hemmerle M, Binko K, Baranowska E, De Craene J-O, Ruer-Laventie J, Pieters J, Tribouillard-Tanvier D, et al (2020) Assigning mitochondrial localization of dual localized proteins using a yeast Bi-Genomic Mitochondrial-Split-GFP. eLife 9: e56649

      Cui T-Z, Smith PM, Fox JL, Khalimonchuk O & Winge DR (2012) Late-Stage Maturation of the Rieske Fe/S Protein: Mzm1 Stabilizes Rip1 but Does Not Facilitate Its Translocation by the AAA ATPase Bcs1. Mol Cell Biol 32: 4400–4409

      Desai N, Brown A, Amunts A & Ramakrishnan V (2017) The structure of the yeast mitochondrial ribosome. Science 355: 528–531

      Guo H, Bueler SA & Rubinstein JL (2017) Atomic model for the dimeric FO region of mitochondrial ATP synthase. Science 358: 936–940

      Knox C, Sass E, Neupert W & Pines O (1998) Import into Mitochondria, Folding and Retrograde Movement of Fumarase in Yeast. J Biol Chem 273: 25587–25593

      Morgenstern M, Stiller SB, Lübbert P, Peikert CD, Dannenmaier S, Drepper F, Weill U, Höß P, Feuerstein R, Gebert M, et al (2017) Definition of a High-Confidence Mitochondrial Proteome at Quantitative Scale. Cell Rep 19: 2836–2852

      Oborská-Oplová M, Geiger AG, Michel E, Klingauf-Nerurkar P, Dennerlein S, Bykov YS, Amodeo S, Schneider A, Schuldiner M, Rehling P, et al (2025) An avoidance segment resolves a lethal nuclear–mitochondrial targeting conflict during ribosome assembly. Nat Cell Biol 27: 336–346

      Peleh V, Ramesh A & Herrmann JM (2015) Import of Proteins into Isolated Yeast Mitochondria. In Membrane Trafficking: Second Edition, Tang BL (ed) pp 37–50. New York, NY: Springer

      Srivastava AP, Luo M, Zhou W, Symersky J, Bai D, Chambers MG, Faraldo-Gómez JD, Liao M & Mueller DM (2018) High-resolution cryo-EM analysis of the yeast ATP synthase in a lipid membrane. Science 360: eaas9699

      Vögtle F-N, Burkhart JM, Gonczarowska-Jorge H, Kücükköse C, Taskin AA, Kopczynski D, Ahrends R, Mossmann D, Sickmann A, Zahedi RP, et al (2017) Landscape of submitochondrial protein distribution. Nat Commun 8: 290

      Woellhaf MW, Hansen KG, Garth C & Herrmann JM (2014) Import of ribosomal proteins into yeast mitochondria. Biochem Cell Biol 92: 489–498

    1. Author Response:

      Reviewer #1:

      This is a very interesting study that examines the neural processes underlying age-related changes in the ability to prioritize memory for value information. The behavioral results show that older subjects are better able to learn which information is valuable (i.e., more frequently presented) and are better at using value to prioritize memory. Importantly, prioritizing memory for high-value items is accompanied by stronger neural responses in the lateral PFC, and these responses mediate the effects of age on memory.

      Strengths of this paper are the large sample size and the clever learning tasks. The results provide interesting insights into potential neurodevelopmental changes underlying the prioritization of memory.

      There are also a few weaknesses:

      First, the effects of age on repetition suppression in the parahippocampal cortex are relatively modest. It is not clear why repetition suppression effects should only be estimated using the first and last but not all presentations. The consideration of linear and quadratic effects of repetition number could provide a more reliable estimate and provide insights into age-related differences in the dynamics of frequency learning across multiple repetitions.

      Thank you for this helpful suggestion. As recommended, we have now computed neural activation within our parahippocampal region of interest not just for the first and last appearance of each item during frequency learning, but for all appearances. Specifically we extended our repetition suppression analysis described in the manuscript to include all image repetitions (p. 36 - 37). Our new methods description reads:

      “For each stimulus in the high-frequency condition, we examined repetition suppression by measuring activation within a parahippocampal ROI during the presentation of each item during frequency-learning. We defined our ROI by taking the peak voxel (x = 30, y = -39, z = -15) from the group-level first > last item appearance contrast for high-frequency items during frequency-learning and drawing a 5 mm sphere around it. This voxel was located in the right parahippocampal cortex, though we observed widespread and largely symmetric activation in bilateral parahippocampal cortex. To encompass both left and right parahippocampal cortex within our ROI, we mirrored the peak voxel sphere. For each participant, we modeled the neural response to each appearance of each item using the Least Squares-Separate approach (Mumford et al., 2014). Each first-level model included a regressor for the trial of interest, as well as separate regressors for the onsets of all other items, grouped by repetition number (e.g., a regressor for item onsets on their first appearance, a regressor for item onsets on their second appearance, etc.). Values that fell outside five standard deviations from the mean level of neural activation across all subjects and repetitions were excluded from subsequent analyses (18 out of 10,320 values; .01% of observations). In addition to examining neural activation as a function of stimulus repetition, we also computed an index of repetition suppression for each high-frequency item by computing the difference in mean beta values within our ROI on its first and last appearance.”
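
      The outlier exclusion and the repetition suppression index described above can be sketched in a few lines. This is only an illustrative sketch, not the authors' pipeline: the data layout (a dictionary keyed by participant and item), the function names, the use of the population standard deviation, and the sign convention (first appearance minus last appearance, so positive values mean suppression) are all assumptions made for the example.

```python
from statistics import mean, pstdev


def exclude_outliers(betas, n_sd=5.0):
    """Replace values more than n_sd SDs from the pooled mean with None.

    betas maps (participant, item) -> list of ROI beta values, one per
    appearance, in presentation order (hypothetical layout).
    """
    pooled = [b for series in betas.values() for b in series]
    mu, sd = mean(pooled), pstdev(pooled)
    return {
        key: [b if abs(b - mu) <= n_sd * sd else None for b in series]
        for key, series in betas.items()
    }


def repetition_suppression_index(series):
    """First-appearance beta minus last-appearance beta.

    Returns None when either value was excluded as an outlier.
    """
    first, last = series[0], series[-1]
    if first is None or last is None:
        return None
    return first - last
```

      Pooling all subjects and repetitions before computing the mean and standard deviation mirrors the wording of the methods text; whether the original analysis pooled in exactly this way is not stated.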

      As suggested, we ran a mixed effects model examining the influence of linear and quadratic age and linear and quadratic repetition number on neural activation. In line with our whole-brain analysis, we observed a robust effect of linear and quadratic repetition number, suggesting that neural activation decreased non-linearly across stimulus repetitions. In addition, we observed significant interactions between our age and repetition number terms, suggesting that repetition suppression increased into early adulthood. Thus, although the relation we observed between age and repetition suppression is modest, the results from our new analyses suggest it is robust. Because these results largely aligned with the pattern of age-related change we observed in our analysis of repetition suppression indices, we continued to use that compressed metric in subsequent analyses looking at relations with behavior. However, we have updated our results section to include the full analysis taking into account all item repetitions, as suggested. Our updated manuscript now reads (p. 9):

      “We next examined whether repetition suppression in the parahippocampal cortex changed with age. We defined a parahippocampal region of interest (ROI) by drawing a 5 mm sphere around the peak voxel from the group-level first > last appearance contrast (x = 30, y = -39, z = -15), and mirrored it to encompass both right and left parahippocampal cortex (Figure 2C). For each participant, we modeled the neural response to each appearance of each high-frequency item. We then examined how neural activation changed as a function of repetition number and age. To account for non-linear effects of repetition number, we included linear and quadratic repetition number terms. In line with our whole-brain analysis, we observed a main effect of repetition number, F(1, 5016.0) = 30.64, p < .001, indicating that neural activation within the parahippocampal ROI decreased across repetitions. Further, we observed a main effect of quadratic repetition number, F(1, 9881.0) = 7.47, p = .006, indicating that the reduction in neural activity was greatest across earlier repetitions (Fig 3A). Importantly, the influence of repetition number on neural activation varied with both linear age, F(1, 7267.5) = 7.2, p = .007, and quadratic age, F(1, 7260.8) = 6.9, p = .009. Finally, we also observed interactions between quadratic repetition number and both linear and quadratic age (ps < .026). These age-related differences suggest that repetition suppression was greatest in adulthood, with the steepest increases occurring from late adolescence to early adulthood (Figure 3).”

      “For each participant for each item, we also computed a “repetition suppression index” by taking the difference in mean beta values within our ROI on each item’s first and last appearance (Ward et al., 2013). These indices demonstrated a similar pattern of age-related variance: we found that the reduction of neural activity from the first to last appearance of the items varied positively with linear age, F(1, 78.32) = 3.97, p = .05, and negatively with quadratic age, F(1, 77.55) = 4.8, p = .031 (Figure 3B). Taken together, our behavioral and neural results suggest that sensitivity to the repetition of items in the environment was prevalent from childhood to adulthood but increased with age.”

      In addition, in the main text on p. 10, we have now included the suggested scatter plot (see new Fig. 3B, below) as well as a modified version of our previous figure S2 to show neural activation across all repetitions in the parahippocampal cortex (see new Fig 3A). We thank the reviewer for this helpful suggestion, as we believe these new figures much more clearly illustrate the repetition suppression effects we observed during frequency learning.

      Fig 3. (A) Neural activation within a bilateral parahippocampal cortex ROI decreased across stimulus repetitions both linearly, F(1, 5015.9) = 30.64, p < .001, and quadratically, F(1, 9881.0) = 7.47, p = .006. Repetition suppression increased with linear age, F(1, 7267.5) = 7.2, p = .007, and quadratic age, F(1, 7260.8) = 6.9, p = .009. The horizontal black lines indicate median neural activation values. The lower and upper edges of the boxes indicate the first and third quartiles of the grouped data, and the vertical lines extend to the smallest value no further than 1.5 times the interquartile range. Grey dots indicate data points outside those values. (B) The decrease in neural activation in the bilateral PHC ROI from the first to fifth repetition of each item also increased with both linear age, F(1, 78.32) = 3.97, p = .05, and quadratic age, F(1, 77.55) = 4.8, p = .031.
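
      For readers less familiar with models that combine linear and quadratic terms, the fixed-effect predictors implied by the repetition suppression analysis above can be sketched as follows. This is a hedged illustration only: the z-scoring step, the function names, and the term labels are assumptions, and the random-effects structure and the model fitting itself are deliberately left out because the response does not specify them here.

```python
from statistics import mean, pstdev


def zscore(values):
    """Standardize a list of numbers (assumed preprocessing step)."""
    mu, sd = mean(values), pstdev(values)
    return [(v - mu) / sd for v in values]


def fixed_effect_terms(age_z, rep_z):
    """Build linear, quadratic, and interaction terms for one observation.

    age_z and rep_z are standardized age and repetition number.
    """
    age2, rep2 = age_z ** 2, rep_z ** 2
    return {
        "age": age_z,
        "age2": age2,
        "rep": rep_z,
        "rep2": rep2,
        "age:rep": age_z * rep_z,
        "age2:rep": age2 * rep_z,
        "age:rep2": age_z * rep2,
        "age2:rep2": age2 * rep2,
    }
```

      The four interaction terms correspond to the reported interactions of linear and quadratic repetition number with linear and quadratic age.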

      Second, the behavioral data show effects of age on both initial frequency learning and the effects of item frequency on memory. It is not clear whether the behavioral findings reflect the effects of age on the ability to use value information to prioritize memory or simply better initial learning of value-related information in older subjects.

      Thank you for raising this important point. Indeed, one of our main findings is that older participants are better both at learning the structure of their environments and also at using structured knowledge to strategically prioritize memory. In our original manuscript, we described results of a model that included participants’ explicit frequency reports as a predictor of memory. Model comparison revealed that participants’ frequency reports — which we interpret as reflecting their beliefs about the structure of the environment — predicted memory more strongly than the item’s true frequency. In other words, participants’ beliefs about the structure of the environment (even if incorrect) more strongly influenced their memory encoding than the true structure of the environment. Critically, however, frequency reports interacted with age to predict memory (Fig 8). Even when we accounted for age-related differences in knowledge of the structure of the environment, older participants demonstrated a stronger influence of frequency on memory, suggesting they were better able to use their beliefs to control subsequent associative encoding. We have now clarified our interpretation of this model in our discussion on p. 23:

      “Importantly, though we observed age-related differences in participants’ learning of the structure of their environment, the strengthening of the relation between frequency reports and associative memory with increasing age suggests that age differences in learning cannot fully account for age differences in value-guided memory. Even when accounting for individual differences in participants’ explicit knowledge of the structure of the environment, older participants demonstrated a stronger relation between their beliefs about item frequency and associative memory, suggesting that they used their beliefs to guide memory to a greater degree than younger participants.”

      As noted by the reviewer, however, our initial memory analysis did not account for age-related differences in participants’ initial, online learning of item frequency, and our neural analyses further did not account for age differences in explicit frequency reports. We have now run additional control analyses to account for the potential influence of individual differences in frequency learning on associative memory. Specifically, for each participant, we computed three metrics: 1.) their overall accuracy during frequency-learning, 2.) their overall accuracy for the last presentation of each item during frequency-learning (as suggested by Reviewer 2), and 3.) the mean magnitude of the error in their frequency reports. We then included these metrics as covariates in our memory analyses.
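
      The three participant-level covariates listed above could be computed along the following lines. This is a minimal sketch under stated assumptions: the trial and report record layouts, the function names, and the reading of "error magnitude" as the absolute difference between reported and true frequency are all hypothetical and are not taken from the authors' code.

```python
from statistics import mean


def overall_accuracy(trials):
    """Proportion of correct frequency-learning responses across all trials."""
    return mean(1.0 if t["correct"] else 0.0 for t in trials)


def last_presentation_accuracy(trials):
    """Accuracy restricted to the final presentation of each item."""
    last = {}
    for t in trials:
        seen = last.get(t["item"])
        if seen is None or t["repetition"] > seen["repetition"]:
            last[t["item"]] = t
    return mean(1.0 if t["correct"] else 0.0 for t in last.values())


def mean_report_error_magnitude(reports):
    """Mean absolute difference between reported and true item frequency.

    reports is a list of (true_frequency, reported_frequency) pairs.
    """
    return mean(abs(reported - true) for true, reported in reports)
```

      Each function returns a single number per participant, which can then enter the memory model as a covariate.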

      When we include these control variables in our model, we continue to observe a robust effect of frequency condition (p < .001) as well as robust interactions between frequency condition and linear and quadratic age (ps < .003) on associative memory accuracy. We also observed a main effect of frequency error magnitude on memory accuracy (p < .001). Here, however, we no longer observe main effects of age or quadratic age on overall memory accuracy. Given the relation we observed between frequency error magnitudes and age, the results from this model suggest that there may be age-related improvements in overall memory that influence both memory for associations as well as learning of and memory for item frequencies. The fact that age no longer relates to overall memory when controlling for frequency error magnitudes suggests that age-related variance in memory for item frequencies and memory for associations are strongly related within individuals. Importantly, however, age-related variance in memory for item frequencies did not explain age-related variance in the influence of frequency condition on associative memory, suggesting that there are developmental differences in the use of knowledge of environmental structure to prioritize valuable information in memory that persist even when controlling for age-related differences in initial learning of environmental regularities. Given the importance of this analysis in elucidating the relation between the learning of environmental structure and value-guided memory, we have now updated the results in the main text of our manuscript to include them. Specifically, on p. 13, we now write:

      “Because we observed age-related differences in participants’ online learning of item frequencies and in their explicit frequency reports, we further examined whether these age differences in initial learning could account for the age differences we observed in associative memory. To do so, we ran an additional model in which we included each participant’s mean frequency learning accuracy, mean frequency learning accuracy on the last repetition of each item, and explicit report error magnitude as covariates. Here, explicit report error magnitude predicted overall memory performance, χ²(1) = 13.05, p < .001, and we did not observe main effects of age or quadratic age on memory performance (ps > .20). However, we continued to observe a main effect of frequency condition, χ²(1) = 19.65, p < .001, as well as significant interactions between frequency condition and both linear age, χ²(1) = 10.59, p = .001, and quadratic age, χ²(1) = 9.15, p = .002. Thus, while age differences in initial learning related to overall memory performance, they did not account for age differences in the use of environmental regularities to strategically prioritize memory for valuable information.”

      In addition, as suggested by the reviewer, we also included the three covariates as control variables in our mediation analysis. When controlling for online frequency learning and explicit frequency report errors, PFC activity continued to mediate the relation between age and memory difference scores. We have now included these results on p. 16 - 17 of the main text:

      “Further, when we included quadratic age, WASI scores, online frequency learning accuracy, online frequency learning accuracy on the final repetition of each item, and mean explicit frequency report error magnitudes as control variables in the mediation analysis, PFC activation continued to mediate the relation between linear age and memory difference scores (standardized indirect effect: .56, 95% confidence interval: [.06, 1.35], p = .023; standardized direct effect: 1.75, 95% confidence interval: [.12, 3.38], p = .034).”

      We also refer to these analyses when we interpret our findings in our discussion. On p. 23, we write:

      “In addition, we continued to observe a robust interaction between age and frequency condition on associative memory, even when controlling for age-related change in the accuracy of both online frequency learning and explicit frequency reports. Thus, though we observed age differences in the learning of environmental regularities and in their influence on subsequent associative memory encoding, our developmental memory effects cannot be fully explained by differences in initial learning.”

      We thank the reviewer for this constructive suggestion, as we believe these control analyses strengthen our interpretation of age differences in both the learning and use of environmental regularities to prioritize memory.

      Reviewer #2:

      Nussenbaum and Hartley provide novel neurobehavioral evidence of how individuals differentially use incrementally acquired information to guide goal-relevant memory encoding, highlighting roles for the medial temporal lobe during frequency learning, and the lateral prefrontal cortex for value-guided encoding/retrieval. This provides a novel behavioral phenomenology that gives great insight into the processes guiding adaptive memory formation based on prior experience. However, there were a few weaknesses throughout the paper that undermined an overall mechanistic understanding of the processes.

      First, there was a lack of anatomical specificity in the discussion and interpretation of both prefrontal and striatal targets, as there is great heterogeneity across these regions that would imply very different behavioral processes.

      We agree with the reviewer that our introduction and discussion would benefit from more anatomical granularity, and we did indeed have a priori predictions about more specific neural regions that might be involved in our task.

      First, we expected that both the ventral and dorsal striatum might be responsive to stimulus value across our age range. Prior work has suggested that activity in the ventral striatum often correlates with the intrinsic value of a stimulus, whereas activity in the dorsal striatum may reflect goal-directed action values (Liljeholm & O’Doherty, 2012). In our task, we expected that high-frequency items may acquire intrinsic value during frequency-learning that is then reflected in the striatal response to these items during encoding. However, because participants were not rewarded when they encountered these images, but rather incentivized to encode associations involving them, we hypothesized that the dorsal striatum may represent the value of the ‘action’ of remembering each pair. In line with this prediction, the dorsal striatum, and the caudate in particular, have also been shown to be engaged during value-guided cognitive control (Hikosaka et al., 2014; Insel et al., 2017).

      We have now revised our introduction to include greater specificity in our anatomical predictions on p. 3:

      “When individuals need to remember information associated with previously encountered stimuli (e.g., the grocery store aisle where an ingredient is located), frequency knowledge may be instantiated as value signals, engaging regions along the mesolimbic dopamine pathway that have been implicated in reward anticipation and the encoding of stimulus and action values. These areas include the ventral tegmental area (VTA) and the ventral and dorsal striatum (Adcock et al., 2006; Liljeholm & O’Doherty, 2012; Shigemune et al., 2014).”

      Though we initially predicted that encoding of high-value information would be associated with increased activation in both the ventral and dorsal striatum, the activation we observed was largely within the dorsal striatum, and specifically, the caudate. We have now revised our discussion accordingly on p. 26:

      “Though we initially hypothesized that both the ventral and dorsal striatum may be involved in encoding of high-value information, the activation we observed was largely within the dorsal striatum, a region that may reflect the value of goal-directed actions (Liljeholm & O’Doherty, 2012). In our task, rather than each stimulus acquiring intrinsic value during frequency-learning, participants may have represented the value of the ‘action’ of remembering each pair during encoding.”

      Second, while the ventromedial PFC often reflects value, given the control demands of our task, we expected to see greater activity in the dorsolateral PFC, which is often engaged in tasks that require the implementation of cognitive control (Botvinick & Braver, 2015). Thus, we hypothesized that individuals would show increased activation in the dlPFC during encoding of high- vs. low-value information, and that this activation would vary as a function of age. We have now clarified this hypothesis on p. 3:

      “Value responses in the striatum may signal the need for increased engagement of the dorsolateral prefrontal cortex (dlPFC) (Botvinick & Braver, 2015), which supports the implementation of strategic control.”

      In our discussion, we review disparate findings in the developmental literature and discuss factors that may contribute to these differences across studies. For example, in our discussion of Davidow et al. (2016), we highlight differences between their task design and the present study, focusing on how their task involved immediate receipt of reward at the time of encoding, while our task incentivized memory accuracy. We further note that studies that involve reward delivery at the time of encoding may engage different neural pathways than those that promote goal-directed encoding. Beyond Davidow et al. (2016), there are no other neuroimaging studies that examine the influence of reward on memory across development. Thus, we cannot relate our present neural findings to prior work on the development of value-guided memory. As we note in our discussion (p. 28), “Further work is needed to characterize both the influence of different types of reward signals on memory across development, as well as the development of the neural pathways that underlie age-related change in behavior.”

      Second, age-related differences in neural activation emerged both during the initial frequency learning as well as during memory-guided adaptive encoding. While data from this initial phase was used to unpack the behavioral relationships on adaptive memory, a major weakness of the paper was not connecting these measures to neural activity during memory encoding/retrieval. This would be especially relevant given that both implicit and explicit measures of frequency predicted subsequent performance, but it is unclear which of these measures was guiding lateral PFC and caudate responses.

      Thank you for this valuable suggestion. We agree that it would be interesting to link frequency-learning behavior to neural activity at encoding. As such, we have now conducted additional analyses to explore these relations.

      In the original version of our manuscript, we examined behavior at the item level through mixed-effects models, and neural activation during encoding at the participant level. Thus, to examine the relation between frequency-learning metrics and neural activation at encoding, we created two additional participant-level metrics. For each participant we computed their average repetition suppression index, and a measure of frequency distance. The average repetition suppression index reflects the overall extent to which the participant demonstrated repetition suppression in response to the fifth presentation of the high-frequency items, and is computed by averaging each participant’s repetition suppression indices across items. We hypothesized that participants who demonstrated the greatest degree of repetition suppression might be the most sensitive to the difference between the 1- and 5-frequency items, and therefore, show the greatest differences in striatal and PFC activation during encoding of high- vs. low-value information. The frequency distance metric reflects the average distance between participants’ explicit frequency reports for items that appeared once and items that appeared five times, and is computed by averaging their explicit frequency reports for items in each frequency condition, and then subtracting the average reports in the low-frequency condition from those in the high-frequency condition. We hypothesized that participants with the largest frequency distances might similarly be the most sensitive to the difference between the 1- and 5-frequency items, and therefore, show the greatest differences in striatal and PFC activation during encoding of high- vs. low-value information.
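
      The two participant-level metrics defined above reduce to simple averages, which the following sketch makes concrete. The function names, the input layouts, and the handling of missing indices (skipped rather than imputed) are assumptions made for illustration, not details confirmed by the manuscript.

```python
from statistics import mean


def average_rs_index(rs_indices):
    """Mean of one participant's item-level repetition suppression indices.

    Entries that are None (for example, excluded outliers) are skipped.
    """
    return mean(x for x in rs_indices if x is not None)


def frequency_distance(reports, low=1, high=5):
    """Mean report for high-frequency items minus mean report for low-frequency items.

    reports is a list of (true_frequency, reported_frequency) pairs for one
    participant; low and high are the two frequency conditions.
    """
    high_reports = [reported for true, reported in reports if true == high]
    low_reports = [reported for true, reported in reports if true == low]
    return mean(high_reports) - mean(low_reports)
```

      A larger frequency distance means the participant reported the two conditions as further apart, regardless of whether either report was accurate.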

      We first wanted to confirm that the relations we observed between repetition suppression, frequency reports, and age, could also be observed at the participant level. In line with our prior, behavioral analyses, we found that age related to both mean repetition suppression indices (marginally; linear age: p = .067; quadratic age: p = .042); and frequency distances (linear and quadratic age: ps < .001).

      In addition, we further tested whether these two metrics related to memory performance. In contrast to our item-level findings, we did not observe a significant relation between repetition suppression indices and memory (p = .83). We did observe an effect of frequency distance on memory performance. Specifically, we observed significant interactions between frequency distance and age (p = .014) and frequency distance and quadratic age (p = .021) on memory difference scores, such that the influence of frequency distance on memory difference scores increased with increasing age from childhood to adolescence.

      We next examined how mean repetition suppression indices and frequency distances related to differential neural activation during encoding of high- and low-value pairs. In line with our memory findings, we did not observe any significant relations between mean repetition suppression indices and neural activation in the caudate or prefrontal cortex during encoding (ps > .15).

      Frequency distance did not relate to caudate activation during encoding, nor did we observe a frequency distance x age interaction effect (ps > .16). Frequency distance did, however, relate to differential PFC activation during encoding of high- vs. low-value pairs. Specifically, we observed a main effect of frequency distance on PFC activation (p = .0012), such that participants whose explicit reports of item frequency were, on average, more distinct across frequency conditions demonstrated increased PFC activation during encoding of pairs involving high- vs. low-frequency items. Interestingly, when we included frequency distance in our model, we no longer observed a significant effect of age on differential PFC activation, nor did we observe a significant frequency distance x age interaction (ps > .13). These findings suggest that PFC activation during encoding may have, in part, reflected participants’ beliefs about the structure of the environment, with participants demonstrating stronger differential engagement of control processes across conditions when their representations of the conditions themselves were more distinct.

      Finally, we examined how age, frequency distance, and PFC activation related to memory difference scores. Here, even when controlling for both frequency distance and PFC activation, we continued to observe main effects of age and quadratic age on memory difference scores (linear age: p = .006; quadratic age: p = .001). In line with our analysis of the relation between frequency reports and memory, these results suggest that age-related variance in value-guided memory may depend on both knowledge of the structure of the environment and use of that knowledge to effectively control encoding.

      We have now added these results to our manuscript on p. 13 - 14. We write:

      “Given the relations we observed between memory and both repetition suppression and frequency reports, we examined whether they related to neural activation in both our caudate and PFC ROIs during encoding. To do so, we computed each participant’s average repetition suppression index, and their “frequency distance”, or the average difference in their explicit reports for items in the high- and low-frequency conditions. We expected that participants with greater average repetition suppression indices and greater frequency distances represented the high- and low-frequency items as more distinct from one another and therefore would show greater differences in neural activation at encoding across frequency conditions. In line with our prior analyses, both metrics varied with age (though repetition suppression only marginally (linear age: p = .067; quadratic age: p = .042); Appendix 3 Tables 22 and 25), suggesting that older participants demonstrated better learning of the structure of the environment. We ran linear regressions examining the relations between each metric, age, and their interaction on neural activation in both the caudate and PFC. We observed no significant effects or interactions of average repetition suppression indices on neural activation (ps > .15; Appendix 3 Tables 23 and 24). We did, however, observe a significant effect of frequency distance on PFC activation (β = .42, SE = .12, p = .0012), such that participants who believed that average frequencies of the high- and low-frequency items were further apart also demonstrated greater PFC activation during encoding of pairs with high- vs. low-frequency items. Here, we did not observe a significant effect of age on PFC activation (β = -.03, SE = .13, p = .82), suggesting that age-related variance in PFC activation may be related to age differences in explicit frequency beliefs.
      Importantly, however, even when we accounted for both PFC activation and frequency distances, we continued to observe an effect of age on memory difference scores (β = .56, SE = .20, p = .006), which, together with our prior analyses, suggest that developmental differences in value-guided memory are not driven solely by age differences in beliefs about the structure of the environment but also depend on the use of those beliefs to guide encoding.”

      We have added the full model results to Appendix 3: Full Model Specification and Results.

      Given these results, we have now revised our interpretation of our neural data. Our memory analyses demonstrate that across our age range, we observed age-related differences in both the acquisition of knowledge of the structure of the environment and in its use. Originally, we interpreted the PFC activation as reflecting the use of learned value to guide memory. However, the strong relation we found between frequency distance and PFC activation suggests that the age differences in PFC activation that we observed may also be related to age differences in knowledge of the structure of the environment that governs when control processes should be engaged most strongly. However, these results must be interpreted cautiously. Participants provided explicit frequency reports after they completed the encoding and retrieval tasks, and so explicit frequency reports may have been influenced not only by participants’ memories of online frequency learning, but also by the strength with which they encoded the item and its paired associate, and the experience of successfully retrieving it.

      We have now revised our discussion to consider these results. On p. 23, we now write,

      “Our neural results further suggest that developmental differences in memory were driven by both knowledge of the structure of the environment and use of that knowledge to guide encoding.”

      On p. 24, we write,

      “The development of adaptive memory requires not only the implementation of encoding and retrieval strategies, but also the flexibility to up- or down-regulate the engagement of control in response to momentary fluctuations in information value (Castel et al., 2007, 2013; Hennessee et al., 2017). Importantly, value-based modulation of lateral PFC engagement during encoding mediated the relation between age and memory selectivity, suggesting that developmental change in both the representation of learned value and value-guided cognitive control may underpin the emergence of adaptive memory prioritization. Prior work examining other neurocognitive processes, including response inhibition (Insel et al., 2017) and selective attention (Störmer et al., 2014), has similarly found that increases in the flexible upregulation of control in response to value cues enhance goal-directed behavior across development (Davidow et al., 2018), and may depend on the engagement of both striatal and prefrontal circuitry (Hallquist et al., 2018; Insel et al., 2017). Here, we extend these past findings to the domain of memory, demonstrating that value signals derived from the structure of the environment increasingly elicit prefrontal cortex engagement and strengthen goal-directed encoding across childhood and into adolescence.”

      And on p. 25, we have added an additional paragraph:

      “Further, we also demonstrate that in the absence of explicit value cues, the engagement of prefrontal control processes may reflect beliefs about information value that are learned through experience. Here, we found that differential PFC activation during encoding of high- vs. low-value information reflected individual and age-related differences in beliefs about the structure of the environment; participants who represented the average frequencies of the low- and high-frequency items as further apart also demonstrated greater value-based modulation of lateral PFC activation. It is important to note, however, that we collected explicit frequency reports after associative encoding and retrieval. Thus the relation between PFC activation and explicit frequency reports may be bidirectional — while participants may have increased the recruitment of cognitive control processes to better encode information they believed was more valuable, the engagement of more elaborative or deeper encoding strategies that led to stronger memory traces may have also increased participants’ subjective sense of an item’s frequency (Jonides & Naveh-Benjamin, 1987).”

      Third, more discussion is warranted on the nature of age-related changes given that some findings followed quadratic functions and others showed linear. Further interpretation of the quadratic versus linear fits would provide greater insight into the relative rates of maturation across discrete neurobehavioral processes.

      We agree with the reviewer that more discussion is warranted here. While many cognitive processes tend to improve with increasing age, the significant interaction between quadratic age and frequency condition on memory accuracy could reflect a number of different patterns of developmental variance. Because quadratic curves are U-shaped, the significant interaction between quadratic age and frequency condition could reflect a peak in value-guided memory in adolescence. However, the combination of linear and quadratic effects can also capture “plateauing” effects, where the influence of age on a particular cognitive process decreases at a particular developmental timepoint. To determine how to interpret the quadratic effect of age on value-guided memory — and specifically, to test for the presence of an adolescent peak — we ran an additional analysis.

      To test for an adolescent peak in value-guided memory, we first fit our memory accuracy model without any age terms, and then extracted the random slope across frequency conditions for each subject. We then conducted a ‘two-lines test’ (Simonsohn, 2018) to examine the relation between age and these random slopes. In brief, the two-lines test fits the data with two linear models, one with a positive slope and one with a negative slope, algorithmically determining the breakpoint in the estimates where the signs of the slopes change. When we analyzed our memory data in this way, we found a robust, positive relation between age and value-guided memory (see newly added Appendix 2 Figure 3, also below) from childhood to mid-adolescence that peaked around age 16 (age 15.86). From age ~16 to early adulthood, however, we observed only a marginal negative relation between age and value-guided memory (p = .0567). Thus, our findings do not offer strong evidence in support of an adolescent peak in value-guided memory; instead, they suggest that improvements in value-guided memory are strongest from childhood to adolescence.
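
      The logic of fitting two lines on either side of a breakpoint can be illustrated with a minimal sketch. This is not the two-lines test itself: the actual procedure (Simonsohn, 2018) chooses the breakpoint algorithmically and tests each slope for significance, whereas here the breakpoint is supplied by hand and only the two ordinary least-squares slopes are returned. The function names, the `break_age` parameter, and the synthetic data are all hypothetical and are not taken from our analysis code.

```python
from statistics import mean


def ols_slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return sxy / sxx


def two_segment_slopes(ages, slopes, break_age):
    """Fit one line to points at or below break_age and one to points above it.

    Simplification (assumption): break_age is given, not estimated.
    """
    low = [(a, s) for a, s in zip(ages, slopes) if a <= break_age]
    high = [(a, s) for a, s in zip(ages, slopes) if a > break_age]
    return ols_slope(*zip(*low)), ols_slope(*zip(*high))


# Synthetic (hypothetical) data: random slopes rise until age 16, then decline.
ages = list(range(8, 26))
random_slopes = [0.03 * (a - 8) if a <= 16 else 0.24 - 0.02 * (a - 16) for a in ages]

rising, falling = two_segment_slopes(ages, random_slopes, break_age=16)
print(rising, falling)
```

      In this toy example the first slope is positive and the second is negative, which is the pattern an adolescent peak would produce. Whether the second slope differs reliably from zero is the question the full test addresses.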

      Appendix 2 - Figure 3. Results from the two-lines test (Simonsohn, 2018) revealed that the influence of frequency condition on memory accuracy increased throughout childhood and early adolescence, and did not significantly decrease from adolescence into early adulthood.

      To more clearly demonstrate the relation between age and value-guided memory, we have now included the results of the two-lines test in the results section of our main text. On p. 12 - 13, we write:

      “In line with our hypothesis, we observed a main effect of frequency condition on memory, χ2(1) = 21.51, p <.001, indicating that individuals used naturalistic value signals to prioritize memory for high-value information. Critically, this effect interacted with both linear age (χ2(1) = 11.03, p < .001) and quadratic age (χ2(1) = 9.51, p = .002), such that the influence of frequency condition on memory increased to the greatest extent throughout childhood and early adolescence. To determine whether the interaction between quadratic age and frequency condition on memory accuracy reflected an adolescent peak in value-guided memory prioritization, we re-ran our memory accuracy model without including any age terms, and extracted each participant’s random slope across frequency conditions. We then submitted these random slopes to the “two-lines” test (Simonsohn, 2018), which fits two regression lines with oppositely signed slopes to the data, algorithmically determining where the sign flip should occur. The results of this analysis revealed that the influence of frequency condition on memory significantly increased from age 8 to age 15.86 (b = .03, z = 2.71, p = .0068; Appendix 2 – Figure 3), but only marginally decreased from age 15.86 to age 25 (b = -.02, z = 1.91, p = .0576). Thus, the interaction between frequency condition and quadratic age on memory performance suggests that the biggest age differences in value-guided memory occurred through childhood and early adolescence, with older adolescents and adults performing similarly.”

      That said, this developmental trajectory is likely specific to the particular demands of our task. In our previous behavioral study that used a very similar paradigm (Nussenbaum, Prentis, & Hartley, 2018), we observed only a linear relation between age and value-guided memory.

      Although the task used in our behavioral study was largely similar to the task we employed here, there were subtle differences in the design that may have extended the age range through which we observed improvements in memory prioritization. In particular, in our previous behavioral study, the memory test required participants to select the correct associate from a grid of 20 options (i.e., 1 correct and 19 incorrect options), whereas here, participants had to select the correct associate from a grid of 4 options (1 correct and 3 incorrect options). In our prior work, the need to differentiate the ‘correct’ option from many more foils may have increased the demands on either (or both) memory encoding or memory retrieval, requiring participants to encode and retrieve more specific representations that would be less confusable with other memory representations. By decreasing the task demands in the present study, we may have shifted the developmental curve we observed toward earlier developmental timepoints.

      We originally did not emphasize our quadratic findings in the discussion of our manuscript because, given the marginal decrease in memory selectivity we observed from age 16 to age 25 and the different age-related findings across our two studies, we did not want to make strong claims about the specific shape of developmental change. However, we agree with the reviewer that these points are worthy of discussion within the manuscript. We have now amended our discussion on p. 25 accordingly:

      “We found that memory prioritization varied with quadratic age, and our follow-up tests probing the quadratic age effect did not reveal evidence for significant age-related change in memory prioritization between late adolescence and early adulthood. However, in our prior behavioral work using a very similar paradigm (Nussenbaum et al., 2020), we found that memory prioritization varied with linear age only. In line with theoretical proposals (Davidow et al., 2018), subtle differences in the control demands between the two tasks (e.g., reducing the number of ‘foils’ presented on each trial of the memory test here relative to our prior study), may have shifted the age range across which we observed differences in behavior, with the more demanding variant of our task showing more linear age-related improvements into early adulthood. In addition, the specific control demands of our task may have also influenced the age at which value-guided memory emerged. Future studies should test whether younger children can modulate encoding based on the value of information if the mnemonic demands of the task are simpler.”

      We thank the reviewer for this helpful suggestion, and believe our additions that expand on the quadratic age effects help clarify our developmental findings.

      Although hippocampal and PHC results did not show a main effect of value, it seems by the introduction that this region would be critical for the processes under study. I would suggest including these regions as ROIs of interest guiding age-related differences during the memory encoding and retrieval phases. Even reporting negative findings for these regions would be helpful to readers, especially given the speculation of the negative findings in the discussion.

      Thank you for this suggestion. We have now examined how differential neural activation within the hippocampus and parahippocampal cortex during encoding of high- vs. low-value information varies with age. To do so, we followed the same approach as with our PFC and caudate ROI analyses. Specifically, we first identified the voxel within both the hippocampus and parahippocampal cortex with the highest z-statistic from our group-level 5 > 1 encoding contrast. We then drew a 5-mm sphere around these voxels and examined how mean beta weights within these spheres varied with age.
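
      For readers unfamiliar with sphere-based ROIs, the following minimal sketch shows the idea of collecting the voxels around a peak and averaging their beta weights. It rests on several assumptions that are not taken from our pipeline: the 5 mm is treated as a radius, voxels are assumed to be 2 mm isotropic, the peak is given in voxel indices, and the function names and the toy `beta_map` are hypothetical.

```python
from itertools import product
from math import ceil


def sphere_voxels(peak, radius_mm=5.0, voxel_mm=2.0):
    """Voxel indices whose centers lie within radius_mm of the peak voxel.

    Assumptions: isotropic voxels of size voxel_mm; peak given as voxel indices.
    """
    reach = ceil(radius_mm / voxel_mm)  # largest index offset worth checking
    px, py, pz = peak
    voxels = []
    for dx, dy, dz in product(range(-reach, reach + 1), repeat=3):
        dist_mm = voxel_mm * (dx * dx + dy * dy + dz * dz) ** 0.5
        if dist_mm <= radius_mm:
            voxels.append((px + dx, py + dy, pz + dz))
    return voxels


def roi_mean(beta_map, voxels):
    """Average the beta weights of the sphere voxels that exist in beta_map."""
    values = [beta_map[v] for v in voxels if v in beta_map]
    return sum(values) / len(values)


# Toy example: a hypothetical peak voxel and a flat (hypothetical) beta map.
peak = (24, 34, 23)
sphere = sphere_voxels(peak)
beta_map = {v: 0.5 for v in sphere}
print(len(sphere), roi_mean(beta_map, sphere))
```

      Each participant would contribute one such mean per ROI, and those means are the values that enter the age regressions.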

      We did not observe any relation between differential hippocampal or parahippocampal cortex activation during encoding of high- vs. low-value information and age (ps > .50). We agree with the reviewer that these results are informative, and have now added them to Appendix 2: Supplementary Analyses, which we refer to in the main text (p. 15). In Appendix 2, we write:

      “Hippocampal and parahippocampal cortex activation during encoding

      A priori, we expected that regions in the medial temporal lobe that have been linked to successful memory formation, including the hippocampus and parahippocampal cortex (Davachi, 2006), may be differentially engaged during encoding of high- vs. low-value information. Further, we hypothesized that the differential engagement of these regions across age may contribute to age differences in value-guided memory. Though we did not see any significant clusters of activation in the hippocampus or parahippocampal cortex in our group level high value vs. low value encoding contrast, we conducted additional ROI analyses to test these hypotheses. As with our other ROI analyses, we first identified the peak voxel (based on its z-statistic; hippocampus: x = 24, y = 34, z = 23; parahippocampal cortex: x = 22, y = 41, z = 16) in each region from our group-level contrast, and then drew 5-mm spheres around them. We then examined how average parameter estimates within these spheres related to both age and memory difference scores.

      First, we ran a linear regression modeling the effects of age, WASI scores, and their interaction on hippocampal activation. We did not observe a main effect of age on hippocampal activation (β = .00, SE = .10, p > .99). We did, however, observe a significant age x WASI score interaction effect (β = .30, SE = .10, p = .003). Next, we conducted another linear regression to examine the effects of hippocampal activation, age, WASI scores, and their interaction on memory difference scores. In contrast to our prefrontal cortex activation results, activation in the hippocampus did not relate to memory difference scores (β = -.02, SE = .03, p = .50).

      We repeated these analyses with our parahippocampal cortex sphere. Here, we did not observe any significant effects of age on parahippocampal activation (β = -.07, SE = .11, p = .50), nor did we observe any effects of parahippocampal activation on memory difference scores (β = .01, SE = .03, p = .25).”

      Reviewer #3:

      This paper investigated age differences in the neurocognitive mechanisms of value-based memory encoding and retrieval across children, adolescents and young adults. It used a novel experimental paradigm in combination with fMRI to disentangle age differences in determining the value of information based on its frequency from the usage of these learned value signals to guide memory encoding. During value learning, younger participants demonstrated a stronger effect of item repetition on response accuracy, whereas repetition suppression effects in a parahippocampal ROI were strongest in adults. Item frequency modulated memory accuracy such that associative memory was better for previously high-frequency value items. Notably, this effect increased with age. Differences in memory accuracy between low- and high-frequency items were associated with left lateral PFC activation which also increased with age. Accordingly, a mediation analysis revealed that PFC activation mediated the relation between age and memory benefit for high- vs. low-frequency items. Finally, both participants' representations of item frequency (which were more likely to deviate in younger children) and repetition suppression in the parahippocampal ROI were associated with higher memory accuracy. Together, these results add to the still scarce literature examining how information value influences memory processes across development.

      Overall, the conclusions of the paper are well supported by the data, but some aspects of the data analysis need to be clarified and extended.

      Empirical findings directly comparing cross-sectional and longitudinal effects have demonstrated that cross-sectional analyses of age differences do not readily generalize to longitudinal research (e.g., Raz et al., 2005; Raz & Lindenberger, 2012). Formal analyses have demonstrated that the proportion of explained age-related variance in cross-sectional mediation models may stem from various factors, including similar mean age trends, within-time correlations between a mediator and an outcome, or both (Lindenberger et al., 2011; see also Hofer, Flaherty, & Hoffman, 2006; Maxwell & Cole, 2007). Thus, the results of the mediation analysis showing that PFC activation explains age-related variance in memory difference scores cannot be taken to imply that changes in PFC activation are correlated with changes in value-guided memory. While the general limitations of a cross-sectional study are noted in the Discussion of the manuscript, it would be important to discuss the critical limitations of the mediation analysis. While the main conclusions of the paper do not critically depend on this analysis, it would be important to alert the reader to the limited information value in performing cross-sectional mediation analyses of age variance.

      Thank you for raising this critical point. We have expanded our discussion to specifically note the limitations of our mediation analysis and to more strongly emphasize the need for future longitudinal studies to reveal how changes in neural circuitry may support the emergence of motivated memory across development. Specifically, on p. 26, we now write:

      “One important caveat is that our study was cross-sectional; it will be important to replicate our findings in a longitudinal sample to more directly measure how developmental changes in cognitive control within an individual contribute to changes in their ability to selectively encode useful information. Our mediation results, in particular, must be interpreted with caution as simulations have demonstrated that in cross-sectional samples, variables can emerge as significant mediators of age-related change due largely to statistical artifact (Hofer, Flaherty, & Hoffman, 2006; Lindenberger et al., 2011). Indeed, our finding that PFC activation mediates the relation between age and value-guided memory does not necessarily imply that within an individual, PFC development leads to improvements in memory selectivity. Longitudinal work in which individuals’ neural activity and memory performance are sampled densely within developmental windows of interest is needed to elucidate the complex relations between age, brain development, and behavior (Hofer, Flaherty, & Hoffman, 2006; Lindenberger et al., 2011).”

      It would be helpful to provide more information on how chance memory performance was handled during data analysis, especially as it is more likely to occur in younger participants. Related to this, please connect the points that belong to the same individual in Figure 3 to facilitate evaluation of individual differences in the memory difference scores.

      Thank you for raising this important point. On each memory test trial, participants viewed the item (either a postcard or picture) above images of four possible paired associates (see Figure 1 on p. 6) and had 6 seconds to select one of them. If participants did not make a response within 6 seconds, that trial was considered ‘missed.’ Missed trials were excluded from behavioral analyses and regressed out in neural analyses. If participants selected the correct associate, memory accuracy was coded as ‘1;’ if they selected an incorrect associate, accuracy was coded as ‘0.’ On each trial, there was 1 correct option and 3 incorrect options. As such, chance-level memory performance was 25%. We have now clarified this on p. 34 and included a dashed line indicating chance-level performance within Fig. 4 (formerly Figure 3) on p. 12. In addition, we have also updated Figure 4 (see below) to connect the points belonging to the same participants, as suggested by the reviewer.

      Figure 4. Participants demonstrated prioritization of memory for high-value information, as indicated by higher memory accuracy for associations involving items in the five- relative to the one-frequency condition (χ2(1) = 19.73, p <.001). The effects of item frequency on associative memory increased throughout childhood and into adolescence (linear age x frequency condition: χ2(1) = 10.74, p = .001; quadratic age x frequency condition: χ2(1) = 9.27, p = .002).

      Out of 90 participants, 2 children performed at or below chance (≤ 25% memory accuracy). Interpreting the behavior of the participants who responded to fewer than 12 out of 48 trials correctly is challenging. On the one hand, they might not have remembered anything and responded correctly on these trials by guessing randomly. On the other hand, they may have implemented an encoding strategy of focusing only on a small number of pairs. Thus, a priori, based on the analysis approach we implemented in our prior behavioral study (Nussenbaum et al., 2019), we decided to include all participants in our memory analyses, regardless of their overall accuracy. However, when we exclude these two participants from our memory analyses, our main findings still hold. Specifically, we continue to observe main effects of frequency condition and age, and interactions between frequency condition and both linear and quadratic age on associative memory accuracy (ps < .012).
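
      The scoring rule described above can be summarized in a short sketch. It is an illustration rather than our analysis code: the trial labels, function names, and example trial counts are hypothetical, and only the rule itself (missed trials dropped, four options per trial, chance at 25%) comes from the task.

```python
CHANCE = 1 / 4  # one correct associate among four options


def memory_accuracy(trials):
    """Proportion correct among answered trials; 'missed' trials are dropped."""
    answered = [t for t in trials if t != "missed"]
    return sum(t == "correct" for t in answered) / len(answered)


def at_or_below_chance(trials):
    """Flag a participant whose accuracy does not exceed chance."""
    return memory_accuracy(trials) <= CHANCE


# Hypothetical participants with 48 test trials each.
low_scorer = ["correct"] * 10 + ["incorrect"] * 30 + ["missed"] * 8
high_scorer = ["correct"] * 30 + ["incorrect"] * 18

print(memory_accuracy(low_scorer), at_or_below_chance(low_scorer))
print(memory_accuracy(high_scorer), at_or_below_chance(high_scorer))
```

      A flag of this kind only identifies participants for the sensitivity analysis; it does not distinguish guessing from a strategy of encoding a small number of pairs.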

      We have now clarified these details about chance-level performance in the methods section of our manuscript on p. 34.

      “For our memory analyses, trials were scored as ‘correct’ if the participant selected the correct association from the set of four possible options presented during the memory test, ‘incorrect’ if the participant selected an incorrect association, and ‘missed’ if the participant failed to respond within the 6-second response window. Missed trials were excluded from all analyses. Because participants had to select the correct association from four possible options, chance-level performance was 25%. Two child participants performed at or below chance-level on the memory test. They were included in all analyses reported in the manuscript; however, we report full details of the results of our memory analyses when we exclude these two participants in Appendix 3 (Table 15). Importantly, our main findings remain unchanged.”

      In Appendix 3, we include a table with the full results from our memory model without these two participants:

      Appendix 3 - Table 15: Associative memory accuracy by frequency condition (below-chance subjects excluded)

      I would like to see some consideration of how the different signatures of value learning, repetition suppression and reported item frequency, are related to the observed PFC and caudate effects during memory encoding. Such a discussion would help the reader connect the findings on learning and using information value across development.

      Thank you for this valuable suggestion. We agree that it would be interesting to link frequency-learning behavior to neural activity at encoding. As such, we have now conducted additional analyses to explore these relations.

      In the original version of our manuscript, we examined behavior at the item level through mixed-effects models, and neural activation during encoding at the participant level. Thus, to examine the relation between frequency-learning metrics and neural activation at encoding, we created two additional participant-level metrics. For each participant we computed their average repetition suppression index, and a measure of frequency distance. The average repetition suppression index reflects the overall extent to which the participant demonstrated repetition suppression in response to the fifth presentation of the high-frequency items, and is computed by averaging each participant’s repetition suppression indices across items. We hypothesized that participants who demonstrated the greatest degree of repetition suppression might be the most sensitive to the difference between the 1- and 5-frequency items, and therefore, show the greatest differences in striatal and PFC activation during encoding of high- vs. low-value information. The frequency distance metric reflects the average distance between participants’ explicit frequency reports for items that appeared once and items that appeared five times, and is computed by averaging their explicit frequency reports for items in each frequency condition, and then subtracting the average reports in the low-frequency condition from those in the high-frequency condition. We hypothesized that participants with the largest frequency distances might similarly be the most sensitive to the difference between the 1- and 5-frequency items, and therefore, show the greatest differences in striatal and PFC activation during encoding of high- vs. low-value information.
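
      As a concrete illustration of the two participant-level metrics, a minimal sketch follows. The data layout, condition labels, function names, and example values are hypothetical; only the arithmetic (averaging within condition, then subtracting the low-frequency mean from the high-frequency mean, and averaging repetition suppression indices across items) follows the description above.

```python
from statistics import mean


def frequency_distance(reports):
    """Mean high-frequency report minus mean low-frequency report.

    `reports` is a list of (condition, reported_frequency) pairs, where the
    condition labels "high" and "low" are hypothetical.
    """
    high = [r for cond, r in reports if cond == "high"]
    low = [r for cond, r in reports if cond == "low"]
    return mean(high) - mean(low)


def mean_repetition_suppression(item_indices):
    """Average of one participant's item-level repetition suppression indices."""
    return mean(item_indices)


# Hypothetical reports and indices for a single participant.
reports = [("high", 5), ("high", 4), ("high", 6), ("low", 1), ("low", 2), ("low", 3)]
rs_indices = [2, 4, 6]

print(frequency_distance(reports))              # high mean 5, low mean 2
print(mean_repetition_suppression(rs_indices))
```

      A larger frequency distance indicates that the participant reported the two conditions as further apart, which is the sense in which we treat it as an index of how distinctly the conditions were represented.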

      We first wanted to confirm that the relations we observed between repetition suppression, frequency reports, and age could also be observed at the participant level. In line with our prior behavioral analyses, we found that age related to both mean repetition suppression indices (marginally; linear age: p = .067; quadratic age: p = .042) and frequency distances (linear and quadratic age: ps < .001).

      In addition, we further tested whether these two metrics related to memory performance. In contrast to our item-level findings, we did not observe a significant relation between repetition suppression indices and memory (p = .83). We did observe an effect of frequency distance on memory performance. Specifically, we observed significant interactions between frequency distance and age (p = .014) and frequency distance and quadratic age (p = .021) on memory difference scores, such that the influence of frequency distance on memory difference scores increased with increasing age from childhood to adolescence.

      We next examined how mean repetition suppression indices and frequency distances related to differential neural activation during encoding of high- and low-value pairs. In line with our memory findings, we did not observe any significant relations between mean repetition suppression indices and neural activation in the caudate or prefrontal cortex during encoding (ps > .15).

      Frequency distance did not relate to caudate activation during encoding, nor did we observe a frequency distance x age interaction effect (ps > .16). Frequency distance did, however, relate to differential PFC activation during encoding of high- vs. low-value pairs. Specifically, we observed a main effect of frequency distance on PFC activation (p = .0012), such that participants whose explicit reports of item frequency were, on average, more distinct across frequency conditions demonstrated increased PFC activation during encoding of pairs involving high- vs. low-frequency items. Interestingly, when we included frequency distance in our model, we no longer observed a significant effect of age on differential PFC activation, nor did we observe a significant frequency distance x age interaction (ps > .13). These findings suggest that PFC activation during encoding may have, in part, reflected participants’ beliefs about the structure of the environment, with participants demonstrating stronger differential engagement of control processes across conditions when their representations of the conditions themselves were more distinct.

      Finally, we examined how age, frequency distance, and PFC activation related to memory difference scores. Here, even when controlling for both frequency distance and PFC activation, we continued to observe main effects of age and quadratic age on memory difference scores (linear age: p = .006; quadratic age: p = .001). In line with our analysis of the relation between frequency reports and memory, these results suggest that age-related variance in value-guided memory may depend on both knowledge of the structure of the environment and use of that knowledge to effectively control encoding.

      We have now added these results to our manuscript on p. 13 - 14. We write:

      “Given the relations we observed between memory and both repetition suppression and frequency reports, we examined whether they related to neural activation in both our caudate and PFC ROI during encoding. To do so, we computed each participant’s average repetition suppression index, and their “frequency distance” — or the average difference in their explicit reports for items in the high- and low-frequency conditions. We expected that participants with greater average repetition suppression indices and greater frequency distances represented the high- and low-frequency items as more distinct from one another and therefore would show greater differences in neural activation at encoding across frequency conditions. In line with our prior analyses, both metrics varied with age (though repetition suppression only marginally (linear age: p = .067; quadratic age: p = .042); Appendix 3 Tables 22 and 25), suggesting that older participants demonstrated better learning of the structure of the environment. We ran linear regressions examining the relations between each metric, age, and their interaction on neural activation in both the caudate and PFC. We observed no significant effects or interactions of average repetition suppression indices on neural activation (ps > .15; Appendix 3 Tables 23 and 24). We did, however, observe a significant effect of frequency distance on PFC activation (β = .42, SE = .12, p = .0012), such that participants who believed that average frequencies of the high- and low-frequency items were further apart also demonstrated greater PFC activation during encoding of pairs with high- vs. low-frequency items. Here, we did not observe a significant effect of age on PFC activation (β = -.03, SE = .13, p = .82), suggesting that age-related variance in PFC activation may be related to age differences in explicit frequency beliefs. 
Importantly, however, even when we accounted for both PFC activation and frequency distances, we continued to observe an effect of age on memory difference scores (β = .56, SE = .20, p = .006), which, together with our prior analyses, suggest that developmental differences in value-guided memory are not driven solely by age differences in beliefs about the structure of the environment but also depend on the use of those beliefs to guide encoding.”

      We have added the full model results to Appendix 3.

      Given these results, we have now revised our interpretation of our neural data. Our memory analyses demonstrate that across our age range, we observed age-related differences in both the acquisition of knowledge of the structure of the environment and in its use. Originally, we interpreted the PFC activation as reflecting the use of learned value to guide memory. However, the strong relation we found between frequency distance and PFC activation suggests that the age differences in PFC activation that we observed may also be related to age differences in knowledge of the structure of the environment that governs when control processes should be engaged most strongly. However, these results must be interpreted cautiously. Participants provided explicit frequency reports after they completed the encoding and retrieval tasks, and so explicit frequency reports may have been influenced not only by participants’ memories of online frequency learning, but also by the strength with which they encoded the item and its paired associate, and the experience of successfully retrieving it.

      We have now revised our discussion to consider these results. On p. 23, we now write,

      “Our neural results further suggest that developmental differences in memory were driven by both knowledge of the structure of the environment and use of that knowledge to guide encoding.”

      On p. 24, we write,

      “The development of adaptive memory requires not only the implementation of encoding and retrieval strategies, but also the flexibility to up- or down-regulate the engagement of control in response to momentary fluctuations in information value (Castel et al., 2007, 2013; Hennessee et al., 2017). Importantly, value-based modulation of lateral PFC engagement during encoding mediated the relation between age and memory selectivity, suggesting that developmental change in both the representation of learned value and value-guided cognitive control may underpin the emergence of adaptive memory prioritization. Prior work examining other neurocognitive processes, including response inhibition (Insel et al., 2017) and selective attention (Störmer et al., 2014), has similarly found that increases in the flexible upregulation of control in response to value cues enhance goal-directed behavior across development (Davidow et al., 2018), and may depend on the engagement of both striatal and prefrontal circuitry (Hallquist et al., 2018; Insel et al., 2017). Here, we extend these past findings to the domain of memory, demonstrating that value signals derived from the structure of the environment increasingly elicit prefrontal cortex engagement and strengthen goal-directed encoding across childhood and into adolescence.”

      And on p. 25, we have added an additional paragraph:

      “Further, we also demonstrate that in the absence of explicit value cues, the engagement of prefrontal control processes may reflect beliefs about information value that are learned through experience. Here, we found that differential PFC activation during encoding of high- vs. low-value information reflected individual and age-related differences in beliefs about the structure of the environment; participants who represented the average frequencies of the low- and high-frequency items as further apart also demonstrated greater value-based modulation of lateral PFC activation. It is important to note, however, that we collected explicit frequency reports after associative encoding and retrieval. Thus the relation between PFC activation and explicit frequency reports may be bidirectional — while participants may have increased the recruitment of cognitive control processes to better encode information they believed was more valuable, the engagement of more elaborative or deeper encoding strategies that led to stronger memory traces may have also increased participants’ subjective sense of an item’s frequency (Jonides & Naveh-Benjamin, 1987).”

      A point worthy of discussion is what the finding that younger participants demonstrated greater deviations in their frequency reports implies for the development of value learning, given that frequency reports were found to predict associative memory accuracy.

      Thank you for raising this important point. Indeed, one of our main findings is that older participants are better both at learning the structure of their environments and also at using structured knowledge to strategically prioritize memory. In our original manuscript, we described results of a model that included participants’ explicit frequency reports as a predictor of memory. Model comparison revealed that participants’ frequency reports — which we interpret as reflecting their beliefs about the structure of the environment — predicted memory more strongly than the item’s true frequency. In other words, participants’ beliefs about the structure of the environment (even if incorrect) more strongly influenced their memory encoding than the true structure of the environment. Critically, however, frequency reports interacted with age to predict memory (Fig 8). Even when we accounted for age-related differences in knowledge of the structure of the environment, older participants demonstrated a stronger influence of frequency on memory, suggesting they were better able to use their beliefs to control subsequent associative encoding. We have now clarified our interpretation of this model in our discussion on p. 23:

      “Importantly, though we observed age-related differences in participants’ learning of the structure of their environment, the strengthening of the relation between frequency reports and associative memory with increasing age suggests that age differences in learning cannot fully account for age differences in value-guided memory. Even when accounting for individual differences in participants’ explicit knowledge of the structure of the environment, older participants demonstrated a stronger relation between their beliefs about item frequency and associative memory, suggesting that they used their beliefs to guide memory to a greater degree than younger participants.”

      As noted by the reviewer, however, our initial memory analysis did not account for age-related differences in participants’ initial, online learning of item frequency, and our neural analyses further did not account for age differences in explicit frequency reports. We have now run additional control analyses to account for the potential influence of individual differences in frequency learning on associative memory. Specifically, for each participant, we computed three metrics: 1.) their overall accuracy during frequency-learning, 2.) their overall accuracy for the last presentation of each item during frequency-learning (as suggested by Reviewer 2), and 3.) the mean magnitude of the error in their frequency reports. We then included these metrics as covariates in our memory analyses.
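
      The three covariates listed above can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: the trial and report layouts, the field names (`item`, `repetition`, `correct`), and the function name `learning_covariates` are all hypothetical, and "error magnitude" is taken to mean the absolute difference between reported and true frequency.

```python
from statistics import mean


def learning_covariates(trials, reports):
    """Compute three per-participant covariates (hypothetical data layout).

    trials:  list of dicts with keys "item", "repetition" (int), "correct" (bool)
    reports: list of (true_frequency, reported_frequency) tuples
    """
    # 1) Overall accuracy across all frequency-learning trials.
    overall_accuracy = mean([1.0 if t["correct"] else 0.0 for t in trials])

    # 2) Accuracy on the last presentation of each item.
    last_trial = {}
    for t in trials:
        seen = last_trial.get(t["item"])
        if seen is None or t["repetition"] > seen["repetition"]:
            last_trial[t["item"]] = t
    last_accuracy = mean([1.0 if t["correct"] else 0.0 for t in last_trial.values()])

    # 3) Mean magnitude of the error in explicit frequency reports.
    report_error = mean([abs(reported - true) for true, reported in reports])

    return overall_accuracy, last_accuracy, report_error


# Made-up data for one participant, for illustration only.
example_trials = [
    {"item": "A", "repetition": 1, "correct": False},
    {"item": "A", "repetition": 2, "correct": True},
    {"item": "B", "repetition": 1, "correct": True},
    {"item": "B", "repetition": 2, "correct": True},
]
example_reports = [(5, 4), (1, 3)]
covariates = learning_covariates(example_trials, example_reports)
```

      Each participant's three values would then enter the memory model as covariates, as described above.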

      When we include these control variables in our model, we continue to observe a robust effect of frequency condition (p < .001) as well as robust interactions between frequency condition and linear and quadratic age (ps < .003) on associative memory accuracy. We also observed a main effect of frequency error magnitude on memory accuracy (p < .001). Here, however, we no longer observe main effects of age or quadratic age on overall memory accuracy. Given the relation we observed between frequency error magnitudes and age, the results from this model suggest that there may be age-related improvements in overall memory that influence both memory for associations and learning of and memory for item frequencies. The fact that age no longer relates to overall memory when controlling for frequency error magnitudes suggests that age-related variance in memory for item frequencies and memory for associations are strongly related within individuals. Importantly, however, age-related variance in memory for item frequencies did not explain age-related variance in the influence of frequency condition on associative memory, suggesting that there are developmental differences in the use of knowledge of environmental structure to prioritize valuable information in memory that persist even when controlling for age-related differences in initial learning of environmental regularities. Given the importance of this analysis in elucidating the relation between the learning of environmental structure and value-guided memory, we have now updated the results in the main text of our manuscript to include them. Specifically, on p. 13, we now write:

      “Because we observed age-related differences in participants’ online learning of item frequencies and in their explicit frequency reports, we further examined whether these age differences in initial learning could account for the age differences we observed in associative memory. To do so, we ran an additional model in which we included each participant’s mean frequency learning accuracy, mean frequency learning accuracy on the last repetition of each item, and explicit report error magnitude as covariates. Here, explicit report error magnitude predicted overall memory performance, χ2(1) = 13.05, p < .001, and we did not observe main effects of age or quadratic age on memory performance (ps > .20). However, we continued to observe a main effect of frequency condition, χ2(1) = 19.65, p < .001, as well as significant interactions between frequency condition and both linear age, χ2(1) = 10.59, p = .001, and quadratic age, χ2(1) = 9.15, p = .002. Thus, while age differences in initial learning related to overall memory performance, they did not account for age differences in the use of environmental regularities to strategically prioritize memory for valuable information.”

      In addition, as suggested by the reviewer, we also included the three covariates as control variables in our mediation analysis. When controlling for online frequency learning and explicit frequency report errors, PFC activity continued to mediate the relation between age and memory difference scores. We have now included these results on p. 16 - 17 of the main text:

      “Further, when we included quadratic age, WASI scores, online frequency learning accuracy, online frequency learning accuracy on the final repetition of each item, and mean explicit frequency report error magnitudes as control variables in the mediation analysis, PFC activation continued to mediate the relation between linear age and memory difference scores (standardized indirect effect: .56, 95% confidence interval: [.06, 1.35], p = .023; standardized direct effect: 1.75, 95% confidence interval: [.12, 3.38], p = .034).”

      We also refer to these analyses when we interpret our findings in our discussion. On p. 23, we write:

      “In addition, we continued to observe a robust interaction between age and frequency condition on associative memory, even when controlling for age-related change in the accuracy of both online frequency learning and explicit frequency reports. Thus, though we observed age differences in the learning of environmental regularities and in their influence on subsequent associative memory encoding, our developmental memory effects cannot be fully explained by differences in initial learning.”

      We thank the reviewer for this constructive suggestion, as we believe these control analyses strengthen our interpretation of age differences in both the learning and use of environmental regularities to prioritize memory.

    1. eLife Assessment

      During the development of the unicellular eukaryote Dictyostelium discoideum, cells aggregate into mounds, forming protrusions or tips, which then become the front of migrating slugs and the top of fruiting bodies. This valuable study identifies adenosine deaminase-related growth factor (ADGF) as a key regulator of tip formation and convincingly shows that ADGF catalyses the conversion of adenosine to ammonia, allowing ammonia to initiate tip formation, and then elucidates pathways upstream and downstream of ADGF. The authors discuss the intriguing possibility that mammalian ADGF may also similarly regulate development.

    2. Reviewer #1 (Public review):

      Summary:

      This work shows that a specific adenosine deaminase protein in Dictyostelium generates the ammonia that is required for tip formation during Dictyostelium development. Cells with an insertion in the adgf gene aggregate but do not form tips. A remarkable result, shown in several different ways, is that the adgf mutant can be rescued by exposing the mutant to ammonia gas. The authors also describe other phenotypes of the adgf mutant such as increased mound size, altered cAMP signaling, and abnormal cell type differentiation. It appears that the adgf mutant has defects in the expression of a large number of genes, resulting in not only the tip defect but also the mound size, cAMP signaling, and differentiation phenotypes.

      Strengths:

      The data and statistics are excellent.

      Comments on previous version:

      Looks better, but I think you answered my questions (listed as weaknesses in the public review) in the reply to the reviewer but not in the paper. I'd suggest carefully thinking about my questions and addressing them in the Discussion (The authors have now done this).

    3. Reviewer #2 (Public review):

      Summary:

      The paper describes new insights into the role of adenosine deaminase-related growth factor (adgf), an enzyme that catalyses the breakdown of adenosine into ammonia and inosine, in tip formation during Dictyostelium development. The adgf null mutant has a pre-tip mound arrest phenotype, which can be rescued by external addition of ammonia. Analysis suggests that the phenotype involves changes in cAMP signaling possibly involving a histidine kinase dhkD, but details remain to be resolved.

      Strengths:

      The generation of an adgf mutant showed a strong mound arrest phenotype and successful rescue by external ammonia. Characterisation of significant changes in cAMP signaling components, suggesting low cAMP signaling in the mutant and identification of the histidine kinase dhkD as a possible component of the transduction pathway. Identification of a change in cell-type differentiation towards prestalk fate.

      Comments on previous version:

      The revised version of the paper has improved significantly in terms of structure and clarity. The additional data on rescue of total cAMP production by ammonia (Fig. 7C) in the adgf- mutant and the 5-fold increased prespore expression of adgf RNA compared to prestalk cells (Fig 9) are useful data additions.

      The link between changes in cAMP signaling (lower aca expression) and wave geometry (concentric waves rather than spiral waves) remains speculative.

      I noted that Fig 6 contains different images than the previous version (Fig 7).

      The statement "Interestingly, Klebsiella pneumoniae physically separated from the Dictyostelium adgf mutants in a partitioned dish, also rescues the mound arrest phenotype suggesting a cross-kingdom interaction that drives development" in the summary is rather overdone. All experiments were performed with axenic strains (no bacteria).

      as is the sentence "Remarkably, in higher vertebrates, adgf expression is elevated during gastrulation and thus adenosine deamination may be a conserved process driving organizer development in different organisms"

      The data supporting this in the supplementary information is hardly legible and poorly presented. What is shown is ADA expression in different tissues, not at different stages. I would suggest taking these figures out and concentrating the summary on the key mechanistic findings of the paper. (The authors have now done this.)

    4. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewer #1 (Public review):

      Summary:

      This work shows that a specific adenosine deaminase protein in Dictyostelium generates the ammonia that is required for tip formation during Dictyostelium development. Cells with an insertion in the ADGF gene aggregate but do not form tips. A remarkable result, shown in several different ways, is that the ADGF mutant can be rescued by exposing the mutant to ammonia gas. The authors also describe other phenotypes of the ADGF mutant such as increased mound size, altered cAMP signalling, and abnormal cell type differentiation. It appears that the ADGF mutant has defects in the expression of a large number of genes, resulting in not only the tip defect but also the mound size, cAMP signalling, and differentiation phenotypes.

      Strengths:

      The data and statistics are excellent.

      (1) Weaknesses: The key weakness is understanding why the cells bother to use a diffusible gas like ammonia as a signal to form a tip and continue development.

      Ammonia can come from a variety of sources both within and outside the cells, including dead cells. Ammonia, by increasing cAMP levels, triggers collective cell movement, thereby establishing a tip in Dictyostelium. A gaseous signal can act over long distances in a short time; for instance, ammonia promotes synchronous development in a colony of yeast cells (Palkova et al., 1997; Palkova and Forstova, 2000). The slug tip is known to release ammonia, probably favouring synchronized development of the entire colony of Dictyostelium. However, after the tips are established, ammonia exerts negative chemotaxis, probably helping the slugs to move away from each other and ensuring equal spacing of the fruiting bodies (Feit and Sollitto, 1987).

      It is well known that ammonia serves as a signalling molecule influencing both multicellular organization and differentiation in Dictyostelium (Francis, 1964; Bonner et al., 1989; Bradbury and Gross, 1989). Ammonia, by raising the pH of the intracellular acidic vesicles of prestalk cells (Poole and Ohkuma, 1981; Gross et al, 1983) and the cytoplasm, is known to increase the speed of chemotaxing amoebae (Siegert and Weijer, 1989; Van Duijn and Inouye, 1991), inducing collective cell movement (Bonner et al., 1988, 1989) and favoring tipped mound development.

      Ammonia produced in millimolar concentrations during tip formation (Schindler and Sussman, 1977) could ward off other predators in soil. For instance, ammonia released by Streptomyces symbionts of leaf-cutting ants is known to inhibit fungal pathogens (Dhodary and Spiteller, 2021). Additionally, ammonia may be recycled back into amino acids, as observed during breast cancer proliferation (Spinelli et al., 2017). Such a process may also occur in starving Dictyostelium cells, supporting survival and differentiation. These findings suggest that ammonia acts as both a local and long-range regulatory signal, integrating environmental and cellular cues to coordinate multicellular development.

      (2) The rescue of the mutant by adding ammonia gas to the entire culture indicates that ammonia conveys no positional information within the mound.

      Ammonia reinforces or maintains the positional information by elevating cAMP levels, favoring prespore differentiation (Bradbury and Gross, 1989; Riley and Barclay, 1990; Hopper et al., 1993). Ammonia is known to influence rapid patterning of Dictyostelium cells confined in a restricted environment (Sawai et al., 2002). In adgf mutants that have low ammonia levels, both neutral red staining (a marker for prestalk and ALCs) (Figure. S3) and the prestalk marker ecmA/ecmB expression (Figure. 7D) are higher than in the WT, and the mound arrest phenotype can be reversed by exposing the adgf mutant mounds to ammonia.

      Prestalk cells are enriched in acidic vesicles, and ammonia, by raising the pH of these vesicles and the cytoplasm (Davies et al 1993; Van Duijn and Inouye 1991), plays an active role in collective cell movement during tip formation (Bonner et al., 1989).

      (3) By the time the cells have formed a mound, the cells have been starving for several hours, and desperately need to form a fruiting body to disperse some of themselves as spores, and thus need to form a tip no matter what.

      Exposure of adgf mounds to ammonia led to tip development within 4 h (Figure. 5). In contrast, adgf controls remained at the mound stage for at least 30 h. This demonstrates that starvation alone is not the trigger for tip development and that ammonia promotes the transition from mound to tipped mound.

      Many mound arrest mutants are blocked in development and do not proceed to form fruiting bodies (Carrin et al., 1994). Further, not all the mound arrest mutants tested in this study were rescued by ADA enzyme (Figure. S4A), and they continue to stay as mounds.

      (4) One can envision that the local ammonia concentration is possibly informing the mound that some minimal number of cells are present (assuming that the ammonia concentration is proportional to the number of cells), but probably even a minuscule fruiting body would be preferable to the cells compared to a mound. This latter idea could be easily explored by examining the fate of the ADGF cells in the mound - do they all form spores? Do some form spores?

      Or perhaps the ADGF is secreted by only one cell type, and the resulting ammonia tells the mound that for some reason that cell type is not present in the mound, allowing some of the cells to transdifferentiate into the needed cell type. Thus, elucidating if all or some cells produce ADGF would greatly strengthen this puzzling story.

      A fraction of adgf mounds form bulkier spore heads by the end of 36 h as shown in Figure. 2H. This late recovery may be due to the expression of other ADA isoforms. Mixing WT and adgf mutant cell lines results in a chimeric slug with mutants occupying the prestalk region (Figure. 8) and suggests that WT ADGF favours prespore differentiation. However, it is not clear if ADGF is secreted by a particular cell type, as adenosine can be produced by both cell types, and the activity of three other intracellular ADAs may vary between the cell types. To address whether adgf expression is cell type-specific, prestalk and prespore cells will be separated by fluorescence activated cell sorter (FACS), and thereafter, adgf expression will be examined in each population.

      Reviewer #2 (Public review):

      Summary:

      The paper describes new insights into the role of adenosine deaminase-related growth factor (ADGF), an enzyme that catalyses the breakdown of adenosine into ammonia and inosine, in tip formation during Dictyostelium development. The ADGF null mutant has a pre-tip mound arrest phenotype, which can be rescued by the external addition of ammonia. Analysis suggests that the phenotype involves changes in cAMP signalling possibly involving a histidine kinase dhkD, but details remain to be resolved.

      Strengths:

      The generation of an ADGF mutant showed a strong mound arrest phenotype and successful rescue by external ammonia. Characterization of significant changes in cAMP signalling components, suggesting low cAMP signalling in the mutant and identification of the histidine kinase dhkD as a possible component of the transduction pathway. Identification of a change in cell type differentiation towards prestalk fate.

      (1) Weaknesses: Lack of details on the developmental time course of ADGF activity and cell type-specific differences in ADGF expression.

      adgf expression was examined at 0, 8, 12, and 16 h (Figure. 1), and the total ADA activity was assayed at 12 and 16 h (Figure. 3). The 12 h data were not included previously and have now been added (Figure. 3A). The adgf expression was found to be highest at 16 h and hence, the ADA assay was carried out at that time point. Since the ADA assay will also report the activity of the other three isoforms, it will not exclusively reflect ADGF activity.

      Mixing WT and adgf mutant cell lines results in a chimeric slug with mutants occupying the prestalk region (Figure. 8) suggesting that WT adgf favours prespore differentiation. To address whether adgf expression is cell type-specific, prestalk and prespore cells will be separated by fluorescence activated cell sorter (FACS), and thereafter, adgf expression will be examined in each population.

      (2) The absence of measurements to show that ammonia addition to the null mutant can rescue the proposed defects in cAMP signalling.

      The adgf mutant in comparison to WT has diminished acaA expression (Fig. 6B) and reduced cAMP levels (Fig. 6A) both at 12 and 16 h of development. The cAMP levels were measured at 8 h and 12 h in the mutant.

      We would like to add that ammonia is known to increase cAMP levels (Riley and Barclay, 1990; Feit et al., 2001) in Dictyostelium. Exposure to ammonia increases acaA expression in WT (Figure. 7B) and is likely to increase acaA expression/ cAMP levels in the mutant also (Riley and Barclay, 1990; Feit et al., 2001) thereby rescuing the defects in cAMP signalling. Based on the comments, cAMP levels will also be measured in the mutant after the rescue with ammonia.

      (3) No direct measurements in the dhkD mutant to show that it acts upstream of adgf in the control of changes in cAMP signalling and tip formation.

      cAMP levels will be quantified in the dhkD mutant after treatment with ammonia. The histidine kinases dhkD and dhkC are reported to modulate phosphodiesterase RegA activity, thereby maintaining cAMP levels (Singleton et al., 1998; Singleton and Xiong, 2013). By activating RegA, dhkD ensures proper cAMP distribution within the mound, which is essential for the patterning of prestalk and prespore cells, as well as for tip formation (Singleton and Xiong, 2013). Therefore, ammonia exposure to dhkD mutants is likely to regulate cAMP signalling and thereby tip formation.

      Reviewer #1 (Recommendations for the authors):

      (1) Lines: 47,48 - "The gradient of these morphogens along the slug axis determines the cell fate, either as prestalk (pst) or as prespore (psp) cells." - many workers have shown that this is not true - intrinsic factors such as cell cycle phase drive cell fate.

      Thank you for pointing this out. We have removed the line and rephrased as “Based on cell cycle phases, there exists a dichotomy of cell types, that biases cell fate as prestalk or prespore (Weeks and Weijer, 1994; Jang and Gomer, 2011).”

      (2) Line 48 - PKA - please explain acronyms at first use.

      Corrected

      (3) Line 56 - The relationship between adenosine deaminase and ADGF is a bit unclear, please clarify this more.

      Adenosine deaminase (ADA) is intracellular, whereas adenosine deaminase related growth factor (ADGF) is an extracellular ADA and has a growth factor activity (Li and Aksoy, 2000; Iijima et al., 2008).

      (4) Figure 1 - where are these primers, and the bsr cassette, located with respect to the coding region start and stop sites?

      The primer sequences are mentioned in the supplementary table S2. The figure legend is updated to provide a detailed description.

      (5) Line 104 - 37.47% may be too many significant figures.

      Corrected

      (6) Line 123 - 1.003 Å may be too many significant figures.

      Corrected

      (7) Line 128 - Since the data are in the figure, you don't need to give the numbers, also too many significant figures.

      Corrected

      (8) Figure 3G - did the DCF also increase mound size? It sort of looks like it did.

      Yes, the addition of DCF increases the mound size (now Figure. 2G).

      (9) Figure 3I - the spore mass shown here for ADGF - looks like there are 3 stalks protruding from it; this can happen if a plate is handled roughly and the spore masses bang into each other and then merge

      Thank you for pointing this out. The figure 3I (now Figure. 2I) is replaced.

      (10) Lines 160-162 - since the data are in the figure, you don't need to give the numbers, also too many significant figures.

      Corrected.

      (11) Line 165 - ' ... that are involved in adenosine formation' needs a reference.

      Reference is included.

      (12) Line 205 - 'Addition of ADA to the CM of the mutant in one compartment.' - might clarify that the mutant is the ADGF mutant

      Yes, revised to 'Addition of ADA to the CM of the adgf mutant in one compartment.'

      (13) Lines 222-223 need a reference for caffeine acting as an adenosine antagonist.

      Reference is included.

      (14) Figure 8B - left - use a 0-4 or so scale so the bars are more visible.

      Thank you for the suggestion. The scale of the y-axis is adjusted to 0-4 in Figure. 7B to enhance the visibility of the bars.

      Reviewer #2 (Recommendations for the authors):

      The paper describes new insights into the role of ADGF, an enzyme that catalyses the breakdown of adenosine in ammonia and inosine, in tip formation in Dictyostelium development.

      A knockout of the gene results in a tipless mound stage arrest and the mounds formed are somewhat larger in size. Synergy experiments show that the effect of the mutation is non-cell autonomous and further experiments show that the mound arrest phenotype can be rescued by the provision of ammonia vapour. These observations are well documented. Furthermore, the paper contains a wide variety of experiments attempting to place the observed effects in known signalling pathways. It is suggested that ADGF may function downstream of DhkD, a histidine kinase previously implicated in ammonia signalling. Ammonia has long been described to affect different aspects, including differentiation of slug and culmination stages of Dictyostelium development, possibly through modulating cAMP signalling, but the exact mechanisms of action have not yet been resolved. The experiments reported here to resolve the mechanistic basis of the mutant phenotype need focusing and further work.

      (1) The paper needs streamlining and editing to concentrate on the main findings and implications.

      The manuscript will be revised extensively.

      Below is a list of some more specific comments and suggestions.

      (2) Introduction: Focus on what is relevant to understanding tip formation and the role of nucleotide metabolism and ammonia (see https://doi.org/10.1016/j.gde.2016.05.014). This could lead to the rationale for investigating ADGF.

      The manuscript will be revised extensively

      (3) Lines 36-38 are not relevant. Lines 55-63 need shortening and to focus on ADGF, cellular localization, and substrate specificity.

      The manuscript will be revised accordingly. Lines 36-38 will be removed, and the lines 55-63 will be shortened.

      In humans, two isoforms of ADA are known, ADA1 and ADA2, and the Dictyostelium homolog of ADA2 is adenosine deaminase-related growth factor (ADGF). Unlike ADA, which is intracellular, ADGF is extracellular and also has a growth factor activity (Li and Aksoy, 2000; Iijima et al., 2008). Loss-of-function mutations in ada2 are linked to lymphopenia, severe combined immunodeficiency (SCID) (Gaspar, 2010), and vascular inflammation due to accumulation of toxic metabolites like dATP (Notarangelo, 2016; Zhou et al., 2014).

      (4) Results: This section would benefit from better streamlining by a separation of results that provide more mechanistic insight from more peripheral observations.

      The manuscript will be revised and the peripheral observations (Figure. 2) will be shifted to the supplementary information.

      (5) Line 84 needs to start with a description of the goal, to produce a knockout.

      Details on the knockout will be elaborated in the revised manuscript. Line number 84 (now 75). Dictyostelium cell lines carrying mutations in the gene adgf were obtained from the genome wide Dictyostelium insertion (GWDI) bank and were subjected to further analysis to know the role of adgf during Dictyostelium development.

      (6) Knockout data (Figure 1) can be simplified and combined with a description of the expression profile and phenotype Figure 3 F, G, and Figure 5. Higher magnification and better resolution photographs of the mutants would be desirable.

      Thank you. As suggested, the data will be simplified (section E will be removed) and combined with a description of the expression profile, and the phenotype images of Figure 3 F, G, and Figure 5 (now Figure. 2 F, G, and Figure. 4) will be replaced with images of better resolution.

      (7) It would also be relevant to know which cells actually express ADGF during development, using in-situ hybridisation or promoter-reporter constructs.

      To address whether adgf expression is cell type-specific, prestalk and prespore cells will be separated by fluorescence activated cell sorter (FACS), and thereafter, adgf expression will be examined in each population.

      (8) Figure 2 - Information is less directly relevant to the topic of the paper and can be omitted (or possibly in Supplementary Materials).

      Figure. 2 will be moved to supplementary materials.

      (9) Figures 4A, B - It is shown that as could be expected ada activity is somewhat reduced and adenosine levels are slightly elevated. However, the fact that ada levels are low at 16hrs could just imply that differentiation of the ADGF- cells is blocked/delayed at an earlier time point. To interpret these data, it would be necessary to see an ada activity and adenosine time course comparison of wt and mutant, or to see that expression is regulated in a celltype specific manner that could explain this (see above). It would be good to combine this with the observation that ammonia levels are lower in the ADGF- mutant than wildtype and that the mutant phenotype, mound arrest can be rescued by an external supply of ammonia (Figure 6).

      In Dictyostelium, four isoforms of ADA, including ADGF, are present, and thus the time course of total ADA activity will also report the function of the other isoforms. Further, a number of pathways generate adenosine (Dunwiddie et al., 1997; Boison and Yegutkin, 2019). ADGF expression was examined at 0, 8, 12 and 16 h (Fig 1) and the ADA activity was assayed at 12 h, the time point where the expression gradually increases and reaches a peak at 16 h. The 12 h activity data, which were not shown earlier, will be included in the revised version. ADGF expression was found to be highly elevated at 16 h and adenosine/ammonia levels were measured at the two points indicated in the mutant.

      (10) Panel 4C could be combined with other measurements trying to arrive at more insight in the mechanisms by which ammonia controls tip formation.

      Panel 4C (now 3C) illustrates the genes involved in the conversion of cAMP to adenosine. Since Figure. 3 focuses on adenosine levels and ADA activity in both WT and adgf mutants, we have retained Panel 3C in Figure. 3, for its relevance to the experiment.

      (11) There is a large variety of experiments attempting to link the mutant phenotype and its rescue by ammonia to cAMP signalling, however, the data do not yet provide a clear answer.

      It is well known that ammonia increases cAMP levels (Riley and Barclay, 1990; Feit et al., 2001) and adenylate cyclase activity (Cotter et al., 1999) in D. discoideum, and exposure to ammonia increases acaA expression (Fig 7B) suggesting that ammonia regulates cAMP signaling. To address the concerns, cAMP levels will be quantified in the mutant after ammonia treatment.

      (12) The mutant is shown to have lower cAMP levels at the mound stage which ties in with low levels of acaA expression (Figures 7A and B), also various phosphodiesterases, the extracellular phosphodiesterase pdsa and the intracellular phosphodiesterase regA show increased expression. Suggesting a functional role for cAMP signalling is that the addition of di cGMP, a known activator of acaA, can also rescue the mound phenotype (Figure 7E). There appears to be a partial rescue of the mound arrest phenotype level by the addition of 8Br-cAMP (fig 7C), suggesting that intracellular cAMP levels rather than extracellular cAMP signalling can rescue some of the defects in the ADGF- mutant. Better images and a time course would be helpful.

      The relevant images will be replaced and a developmental time course after 8-Br-cAMP treatment will be included in the revised manuscript (Figure. 6D).

      (13) There is also the somewhat surprising observation that low levels of caffeine, an inhibitor of acaA activation also rescues the phenotype (Figure 7F).

      With respect to caffeine action on cAMP levels, the reports are contradictory. Caffeine has been reported to increase adenylate cyclase expression, thereby increasing cAMP levels (Hagmann, 1986), whereas Alvarez-Curto et al. (2007) found that caffeine reduced intracellular cAMP levels in Dictyostelium. Although caffeine is a known inhibitor of ACA, it is also known to inhibit PDEs (Nehlig et al., 1992; Rosenfeld et al., 2014). Therefore, if caffeine differentially affects ADA and PDE activity, it may potentially counterbalance the effects and rescue the phenotype.

      (14) The data attempting to assess cAMP wave propagation in mounds (Fig 7H) are of low quality and inconclusive in the absence of further analysis. It remains unresolved how this links to the rescue of the ADGF- phenotype by ammonia. There are no experiments that measure any of the effects in the mutant stimulated with ammonia or di-cGMP.

      The relevant images will be replaced (now Figure. 6H). Ammonia, by increasing acaA expression (Figure. 7B) and cAMP levels (Figure. 7C), may restore spiral wave propagation, thereby rescuing the mutant.

      (15) A possible way forward could also come from the observation that ammonia can rescue the wobbling mound arrest phenotype from the histidine kinase mutant dhkD null mutant, which has regA as its direct target, linking ammonia and cAMP signalling. This is in line with other work that had suggested that another histidine kinase, dhkC transduces an ammonia signal sensor to regA activation. A dhkC null mutant was reported to have a rapid development phenotype and skip slug migration (Dev. Biol. (1998) 203, 345). There is no direct evidence to show that dhkD acts upstream of ADGF and changes in cAMP signalling, for instance, measurements of changes in ADA activity in the mutant.

      cAMP levels will be quantified in the dhkD mutant after ammonia treatment and accordingly, the results will be revised.

      (16) The paper makes several further observations on the mutant. After 16 hrs of development the adgf- mutant shows increased expression of the prestalk cell markers ecmA and ecmB and reduced expression of the prespore marker pspA. In synergy experiments with a majority of wildtype, these cells will sort to the tip of the forming slug, showing that the differentiation defect is cell autonomous (Fig 9). This is interesting but needs further work to obtain more mechanistic insight into why a mutant with a strong tip/stalk differentiation tendency fails to make a tip. Here again, knowing which cells express ADGF would be helpful.

      The adgf mutant shows increased prestalk marker expression in the mound but does not form a tip. It is well known that several mound arrest mutants form differentiated cells but are blocked in development with no tips (Carrin et al., 1994). This is addressed in the discussions (539). To address whether adgf expression is cell type-specific, prestalk and prespore cells will be separated by fluorescence activated cell sorter (FACS), and thereafter, adgf expression will be examined in each population.

      (17) The observed large mound phenotype could as suggested possibly be explained by the low ctn, smlA, and high cadA and csA expression observed in the mutant (Figure 3). The expression of some of these genes (csA) is known to require extracellular cAMP signalling. The reported low level of acaA expression and high level of pdsA expression could suggest low levels of cAMP signalling, but there are no actual measurements of the dynamics of cAMP signalling in this mutant to confirm this.

      The acaA expression was examined at 8 and 12 h (Figure. 6B) and cAMP levels were measured at 12 and 16 h in the adgf mutants (Figure. 6A). Both acaA expression and cAMP levels were reduced, suggesting that cells expressing adgf regulate acaA expression and cAMP levels. This regulation, in turn, is likely to influence cAMP signaling and collective cell movement within mounds, ultimately driving tip development. Exposure to ammonia led to increased acaA expression (Figure. 7B) in WT. Based on the comments above, cAMP levels will be measured in the mutant before and after rescue with ammonia.

      (18) Furthermore, it would be useful to quantify whether ammonia addition to the mutant reverses mound size and restores any of the gene expression defects observed.

      Ammonia treatment, either soon after plating or six hours after plating, had no effect on the mound size (Figure. 5G).

      (19) There are many experimental data in the supplementary data that appear less relevant and could be omitted Figure S1, S3, S4, S7, S8, S9, S10.

      Figures S8, S9, and S10 are omitted. We would like to retain the other figures.

      Figure S1 (now Figure. S2): It is widely believed that ammonia comes from protein (White and Sussman, 1961; Hames and Ashworth, 1974; Schindler and Sussman, 1977) and RNA (Walsh and Wright, 1978) catabolism. Figure. S2 shows no significant difference in protein and RNA levels between WT and adgf mutant strains, suggesting that adenosine deaminase-related growth factor (ADGF) activity serves as a major source of ammonia and plays a crucial role in tip organizer development in Dictyostelium. Thus, it is important to retain this figure.

      Figure S3 (now Figure. S4): The figure shows the treatment of various mound arrest mutants and multiple tip mutants with ADA enzyme and DCF, respectively, to investigate the pathway through which adgf functions. Additionally, it includes the rescue of the histidine kinase mutant dhkD with ammonia, indicating that dhkD acts upstream of adgf via ammonia signalling. Therefore, it is important to retain this figure.

      Figure S4 (now Figure. S5): This figure represents the developmental phenotype of other deaminase mutants. Unlike adgf mutants, mutations in other deaminases do not result in complete mound arrest, despite some of these genes exhibiting strong expression during development. This underscores the critical role of adenosine deamination in tip formation. Therefore, let this figure be retained.

      Figure S7 (now Figure. S8): Figure S8 presents the transcriptomic profile of ADGF during gastrulation and pre-gastrulation stages across different organisms, indicating that ADA/ADGF is consistently expressed during gastrulation in several vertebrates (Pijuan-Sala et al., 2019; Tyser et al., 2021). Notably, the process of gastrulation in higher organisms shares remarkable similarities with collective cell movement within the Dictyostelium mound (Weijer, 2009), suggesting a previously overlooked role of ammonia in organizer development. This implies that ADA may play a fundamental role in regulating morphogenesis across species, including Dictyostelium and vertebrates. Therefore, we would like to retain this figure.

      (20) Given the current state of knowledge, speculation about the possible role of ADGF in organiser function in amniotes seems far-fetched. It is worth noting that the streak is not equivalent to the organiser. The discussion would benefit from limiting itself to the key results and implications.

      The discussion is revised accordingly by removing the speculative role of ADGF in organizer function in amniotes. The lines “It is likely that ADA plays a conserved, fundamental role in regulating morphogenesis in Dictyostelium and other organisms including vertebrates” have been removed.

    1. Author Response:

      Reviewer #1 (Public Review):

      The main finding - that the moment-to-moment relationship between excitability and perception is coupled to the body's slower respiratory oscillation - is novel, interesting, and important for advancing our understanding of how the brain-body system works as a whole. The experiment is simple and elegant, and the authors strike the right level of making the most of the data without doing too much and obscuring the main findings. The primary weakness, in my opinion, is the inability to distinguish between the possibility that respiration modulates excitability and the possibility that respiration modulates something boring like signal-to-noise ratio. In terms of conclusions, I thought the authors stuck pretty well to the data. The one place where the conclusions felt a little bold was in terms of the respiration <> alpha <> behavior relationship, where it felt the authors had already made up their minds re: causality. I agree that it probably makes more sense for respiration to influence something about the brain than vice versa, and the background presented in the Intro/Discussion supports this. However, the analysis only tells us that the behavioral performance was modulated by both alpha and respiration (and their interaction, but this is no way causal). Overall, it will be necessary to differentiate the current interpretation from the possibility that breathing and alpha are two unrelated time courses that influence behavior at the same time (and even interact in how they influence behavior, but just not interact with each other), and I do not believe the phase-amplitude coupling analysis is sufficient for this.

      We thank the reviewer for their positive and constructive evaluation of our work.

      Reviewer #2 (Public Review):

      Kluger and colleagues investigated the influence of respiration on visual sensory perception in a near-threshold task and argue that the detected correlation between respiration phase and detection precision is linked to alpha power, which in turn is modulated by the phase of respiration. The experiments involved detecting a low-contrast visual stimulus to the left or right of a fixation point with contrast settings adjusted via an adaptive staircase approach to reach a desired 60% hit rate, resulting in an observed hit rate of 54%. The main findings are that mutual information between the discrete outcome of hit-or-miss and the continuous contrast variable is significantly increased when respiration phase is considered as well. Furthermore, results show that neuronal alpha oscillation power is modulated in phase with respiration and that perception accuracy is correlated with alpha power. Time resolved correlation analysis aligned on respiration phase shows that this correlation peaks during inspiration around the same phase where the psychometric function for the visual detection task reaches a minimum. The experimental design and data analysis seem solid but there are several concerns regarding the novelty of the findings and the interpretation of the results.

      Major concerns: The finding that visual perception is modulated by the respiration cycle is not new (see e.g. Flexman et al. 1974 or Zelano et al. 2016).

      There are multiple studies going back decades that show alpha oscillation power to be modulated by breathing (e.g. Stancák et al., 1993, Bing-Canar et al. 2016). Also, as the authors acknowledge, it is well-established that alpha power correlates with neuronal excitability and perception threshold. What seems to be new in this study is the use of a linear mixed effect model to analyze the relationship between alpha power, respiration phase and perception accuracy. However, the results mostly seem to confirm previous findings.

      Thank you for giving us the opportunity to clarify our approach and the conceptual novelty it provides. First, not at all do we claim that our study is the first to demonstrate respiration-related alpha changes. Not only do we prominently cite the work by Zelano and colleagues (JNeuro, 2016) in the Introduction and Discussion sections, we also have previous work from our own lab demonstrating these effects (see Kluger & Gross, PLoS Biol 2021). Second, the reviewer’s comment that ‘the results mostly seem to confirm previous findings’ unfortunately appears to frame a critical proof-of-concept as a lack of novelty: In order for us to claim a triadic relationship between respiration, excitability, and behaviour, it is paramount to first demonstrate that assumptions about pairwise relations (such as respiration <> alpha power and alpha power <> behaviour) are supported, which of course means replicating known results in our data. Third, in order to evaluate the novelty of our present study, it is crucial to consider its core aim, which was to characterise how automatic respiration is related to lowest-level perception by means of respiration-induced modulation of neural oscillations. At this point, we respectfully disagree with the reviewer’s assessment of our results being mostly replicative, as the references they provide differ from our approach in various key aspects: The classic study by Flexman and colleagues (1974) merely differentiates between inspiration and expiration, critically without accounting for the asymmetry between the two respiratory phases. Zelano and colleagues (2016) did not investigate visual perception at all, but instead asked participants to categorise emotional face stimuli (termed ‘emotion recognition task’). 
Stancák and colleagues (1993) did not investigate automatic, but paced breathing, which involves continuous, conscious top-down control of one’s breathing rhythm - a demand that is not comparable to automatic, natural breathing we investigate here. The same is true for any kind of respiratory intervention or training like the ‘mindfulness-of-breathing exercise’ employed in the study by Bing-Canar and colleagues (2016). Once again, the oscillatory changes reported by the authors are not induced by automatic breathing, but instead reflect the outcome of a conscious manipulation of the breathing rhythm. In highlighting the key differences between previous studies and our approach, we do hope to have dispelled the reviewer’s initial concern regarding the novelty of our findings.

      Magnetoencephalography captures broad band neuronal activity including gamma frequencies. As the authors show (Fig. 4) and other studies have shown, the power of neuronal oscillations across multiple frequency bands is modulated by respiration phase. Gamma and beta oscillations have been implicated in sensory processing as well. Support for the author's hypothesis that the perception threshold modulation with respiration is due to alpha power modulation would be strengthened if they could show that the power of oscillations in other frequency bands are not or only weakly linked to perception accuracy.

      We thank the reviewer for their well-justified suggestion to extend the spectral scope of our analyses to include other frequency bands. In response to their comment, we have recomputed our analysis pipeline for the frequency range between 2 - 70Hz. While the whole analysis and results are described in a new Supplementary Text and Supplementary Figures (see below), we outline key findings here.

      In keeping with the structure of our main analyses, we first computed cluster-corrected whole-scalp topographies for delta, theta, alpha, beta, and gamma bands for hits vs misses over time intervals 1s prior to stimulus presentation:

      Fig. S4 | Band-specific topographies over time. Whole-scalp topographic distribution of normalised pre- and peristimulus power differences between hits and misses, separately for each frequency band. Channels with significant differences in the respective band are marked (cluster-corrected within the respective time frame). Related to Fig. 3.

      Compared to the clear parieto-occipital topography of prestimulus alpha modulations, delta and theta effects were prominently shifted to anterior sensors, which renders their involvement in low-level visual processing highly unlikely. No significant effects were observed in the gamma range. In contrast, beta-band modulations were closest to the alpha effects in their topography, covering parietal as well as occipital sites. Although the size of normalised effects was markedly smaller in the beta band (compared to alpha frequencies, cf. colour scaling), the topographic distribution of prestimulus modulations as well as the spectral proximity of the two bands prompted further investigation of beta involvement. To this end, we computed the instantaneous correlation between individual beta power (over the respiration cycle) and respiratory phase, analogous to our main analysis shown in Fig. 4c. Consistent with the TFR analysis shown above, no significant correlations between oscillatory power and respiration time courses were found for delta, theta, and gamma bands. For the beta band, however, we found a significant correlation during the inspiratory phase, similar to the alpha correlation described in the main text (and shown for comparison in the new Supplementary Fig. S5):

      Fig. S5 | Instantaneous correlation of beta power and perceptual sensitivity. Group-level correlation between individual beta and PsychF threshold courses (averaged between 14 - 30 Hz) with significant phase vector (length of seven time points) marked by dark grey dots (cluster-corrected). Correlation time course of the alpha band (see Fig. 4c) shown for reference in light grey. Related to Fig. 4.

      While both alpha and beta power were correlated to the breathing signal during the inspiratory phase, the correlation time courses suggested that there might be differential effects in both frequency bands, as indicated by the phase shift visible in Supplementary Fig S5. Therefore, we finally recomputed the LMEM visualised in Fig. 4 with an additional factor for beta power. In this extended model, significant effects were found for both alpha (t(1790) = 3.27, p < .001) and beta power (t(1790) = 4.83, p < .001). Beta showed significant interactions with the sine of the respiratory signal (t(1790) = -3.52, p < .001) as well as with alpha power (t(1790) = -4.63, p < .001). Comparing the LMEM to the previous model which only contained alpha power (along with respiratory sine and cosine) confirmed the significant contribution of beta power in explaining PsychF threshold variation by means of a theoretical likelihood ratio test (χ²(4) = 60.43, p < .001). Overall, we thus found beta power to be i) significantly modulated by respiration (see Fig 1), ii) significantly suppressed over parieto-occipital sensors for hits vs misses (see Fig. S4), and iii) significantly contribute to variations in PsychF threshold (see Fig S5). Collectively, these findings suggest differential roles of alpha and beta power, which we discuss in the main text as well as in the Supplementary Text:

      “Whole-scalp control analyses across all frequency bands demonstrated that this topographical pattern was unique to alpha and beta prestimulus power (see Supplementary Text 1 and Fig. S4).”

      “Control analyses across all frequency bands yielded a significant instantaneous correlation between PsychF threshold and beta power as well, albeit at a slightly later phase (see Fig. S5). No significant correlations were found for the remaining frequency bands.”

      “Accordingly, one recent study proposed that the alpha rhythm shapes the strength of neural stimulus representations by modulating excitability (Iemi et al., 2021). Previous work by Michalareas and colleagues (2016) as well as our own data (see Supplementary Material) point towards an interaction between alpha and beta bands, as beta oscillations have very recently been implicated in mediating top-down signals from the frontal eye field (FEF) that modulate excitability in the visual cortex during spatial attention (Veniero et al., 2021). Our findings suggest that this top-down signalling is modulated across the respiration cycle in a way that changes behavioural performance.”
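
      As a quick arithmetic check on the model comparison reported above (χ²(4) = 60.43), the p-value can be recomputed from the chi-square reference distribution. The sketch below is not the authors' analysis code: it assumes the test statistic is compared against a chi-square distribution with 4 degrees of freedom, and the helper name `chi2_sf_even_df` is hypothetical. It uses the closed-form survival function that holds for even degrees of freedom, so no statistics library is required.

```python
import math


def chi2_sf_even_df(x, df):
    """Survival function P(X > x) of a chi-square variable with even df.

    For df = 2k the closed form is exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!.
    (Hypothetical helper for illustration; not taken from the manuscript.)
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("closed form only holds for positive even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i  # builds (x/2)^i / i! incrementally
        total += term
    return math.exp(-half) * total


# Statistic and degrees of freedom as reported in the response.
p_value = chi2_sf_even_df(60.43, 4)
print(p_value < 0.001)
```

      Under these assumptions the p-value falls far below the .001 threshold, in line with the reported result.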

      In the discussion the authors speculate that respiration locked modulation of alpha power and associated neuronal excitability could be based on the modulation of blood CO2 levels. Most recent studies of respiratory modulation of brain activity have demonstrated significant differences between nasal and oral breathing, with nasal breathing (through activation of the olfactory bulb) typically resulting in a stronger influence of respiration on neuronal activity and behavioral performance than oral breathing. The authors only tested nasal breathing. If blood CO2 fluctuations are indeed responsible for the observed effect, there should be no difference in outcome between nasal and oral breathing. Comparing the two conditions would thus provide interesting additional information about the possible underlying mechanisms.

      We appreciate the reviewer’s well-justified remarks regarding the differential effects for nasal and oral breathing and their implications on underlying mechanisms such as CO2. In revising the present as well as other manuscripts, it has become evident that fluctuations of CO2 alone (and, as we previously discussed, related changes in pH) cannot possibly explain the effects we and others are observing. Therefore, the revised manuscript no longer discusses CO2 as a potential mechanism. We have removed the corresponding paragraph and instead refer to the distinction between nasal and oral breathing to strengthen the argument for OB-induced cross-frequency coupling:

      “As outlined in the introduction, there is broad consensus that cross-frequency coupling (Canolty and Knight, 2010; Jensen and Colgin, 2007) plays a central role in translating respiratory to neural rhythms: Respiration entrains neural activity within the olfactory tract via mechanoreceptors, after which the phase of this infraslow rhythm is coupled to the amplitude of faster oscillations (see Fontanini and Bower, 2006; Ito et al., 2014). While this mechanism is difficult to investigate directly in humans, converging evidence for the importance of bulbar rhythms comes from animal bulbectomy studies (Ito et al., 2014) and the fact that respiration-related changes in both oscillatory power and behaviour dissipate during oral breathing (Zelano et al., 2016; Perl et al., 2019). Thus, rhythmic nasal respiration conceivably aligns rhythmic brain activity across the brain, which in turn influences behaviour. In our present paradigm, transient phases of heightened excitability would then be explained by decreased inhibitory influence on neural signalling within the visual cortex, leading to increased postsynaptic gain and higher detection rates. Given that the breathing act is under voluntary control, the question then becomes to what extent respiration may be actively used to synchronise information sampling with phasic states of heightened excitability.”
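
      To make the notion of phase-amplitude coupling in the quoted paragraph concrete, the following minimal sketch computes a mean-vector-length measure, one common way of quantifying how strongly the amplitude of a fast rhythm depends on the phase of a slow one. It is an illustration on synthetic data only, not the pipeline used in the manuscript; the function name, the synthetic signals, and the coupling depth of 0.5 are all assumptions.

```python
import cmath
import math


def mean_vector_length(phases, amplitudes):
    """Magnitude of the amplitude-weighted mean phase vector.

    Values near 0 mean the amplitude does not depend on the slow phase;
    larger values mean amplitude is concentrated at a preferred phase.
    """
    n = len(phases)
    total = sum(a * cmath.exp(1j * p) for p, a in zip(phases, amplitudes))
    return abs(total) / n


# Synthetic slow phase covering one full cycle (hypothetical data).
n = 1000
phases = [2 * math.pi * k / n - math.pi for k in range(n)]

# Fast-rhythm amplitude that peaks at phase 0 (coupled) vs. a flat amplitude.
coupled = [1.0 + 0.5 * math.cos(p) for p in phases]
flat = [1.0 for _ in phases]

mvl_coupled = mean_vector_length(phases, coupled)
mvl_flat = mean_vector_length(phases, flat)
print(mvl_coupled > mvl_flat)
```

      In real recordings the phase and amplitude series would come from band-pass filtering and a Hilbert transform, and significance would typically be assessed against surrogate data; both steps are omitted here.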

      Reviewer #3 (Public Review):

      The topic is timely, the study is well-designed, and the work has been performed in a highly competent manner. The authors relate three variables: respiration, alpha power and perceptual performance, constituting a link between somatic and neuronal physiology and cognition. A particular strength is the temporal resolution of respiration effects on cognition (continuous analysis of the respiration cycle). Furthermore, results are well contextualized by very comprehensively written introduction and discussion sections (which, nevertheless, could be slightly shortened).

      We do appreciate the reviewer’s positive evaluation of our manuscript and are thankful for their constructive remarks. We respond to their comments in detail below and have shortened the Discussion section in response to one of the reviewer’s remarks (kindly see points 1.1 and 2 below).

      I have three points of criticism, all meant in a constructive way:

      1. I wonder whether the authors could have gone one step further in the analysis of causal mechanisms, rather than correlations. The analysis of timing (Fig. 4d) and the last sentence of the abstract suggest that they imagine a causal role of respiratory feedback on cognitive performance, mediated via coordination of brain activity (in the specific case, by increasing excitability in visual areas). This could be made more explicit by appropriate experiments and data analysis:

      1.1. Manipulating the input signal: former studies suggest that nasal respiration is crucial for effects on brain oscillations and/or performance (e.g. Yanovsky et al., 2014; Zelano et al., 2016). Thus, the causal inference could be easily checked by comparing nasal versus oral respiration, without changing gas- and pH-parameters of activity of brainstem centers. Admittedly, this experiment may add significant work to the present data which, by themselves, are already very strong.

      We thank the reviewer for their insightful comment regarding the question of causality. We acknowledge that our interpretation should have been phrased a little more cautiously. Therefore, we have rephrased corresponding paragraphs at various instances throughout the manuscript (kindly see below). Particularly under current circumstances, we further appreciate the reviewer’s concern regarding the acquisition of additional data for a direct comparison of nasal vs oral breathing. Their comment is of course entirely valid and we were eager to address it, especially since it relates to CO2- and/or pH-related mechanisms of RMBOs we previously discussed. In light of the reviewer’s comments (also see their related comment #2 below) and convincing evidence from both animal and human studies that already compared nasal and oral breathing, we no longer feel that changes in CO2 provide a reasonable explanation for respiration-related oscillatory and behavioural effects we observed here. Consequently, we have removed the corresponding paragraph from the Discussion section which now reads as follows:

      “As outlined in the introduction, there is broad consensus that cross-frequency coupling (Canolty and Knight, 2010; Jensen and Colgin, 2007) plays a central role in translating respiratory to neural rhythms: Respiration entrains neural activity within the olfactory tract via mechanoreceptors, after which the phase of this infraslow rhythm is coupled to the amplitude of faster oscillations (see Fontanini and Bower, 2006; Ito et al., 2014). While this mechanism is difficult to investigate directly in humans, converging evidence for the importance of bulbar rhythms comes from animal bulbectomy studies (Ito et al., 2014) and the fact that respiration-related changes in both oscillatory power and behaviour dissipate during oral breathing (Zelano et al., 2016; Perl et al., 2019). Thus, rhythmic nasal respiration conceivably aligns rhythmic brain activity across the brain, which in turn influences behaviour. In our present paradigm, transient phases of heightened excitability would then be explained by decreased inhibitory influence on neural signalling within the visual cortex, leading to increased postsynaptic gain and higher detection rates. Given that the breathing act is under voluntary control, the question then becomes to what extent respiration may be actively used to synchronise information sampling with phasic states of heightened excitability.”

      1.2. Temporal relations: The authors show that respiration-induced alpha modulation precedes behavioral modulation (Fig. 4d and related results text). Again, this finding suggests a causal influence of respiration on performance, mediated by alpha suppression (see results, lines 318-320). Could the data be directly tested for causality (e.g. by applying Granger causality, dynamic causal modelling or other methods)? If this is difficult, the question of causality should at least be discussed more explicitly.

      We appreciate the reviewer’s constructive criticism and their suggestion to employ causal analyses. While we agree that the overall pattern of results strongly suggests a causal cascade of respiration -> excitability -> perception, our interpretation with regard to a dynamic mechanism was probably overly strong. Unfortunately, it is indeed difficult to use directional analyses like Granger causality or DCM on the current data, since these methods quantify the relationship between two time series. They would not allow us to investigate the triad of respiration, alpha power, and behaviour, as we have discrete responses (i.e., single events) instead of a continuous behavioural measure. In fact, we are currently preparing a directional analysis of respiration-brain coupling (in resting-state data without a behavioural component) for an upcoming manuscript. In response to the reviewer’s remarks, we have toned down our interpretation throughout the manuscript and explicitly discuss the question of causality in the Discussion section of the revised manuscript:

      “The bootstrapping procedure yielded a confidence interval of [-33.17 -29.25] degrees for the peak effect of alpha power. While these results strongly suggest that respiration-alpha coupling temporally precedes behavioural consequences, they do not provide sufficient evidence for a strict causal interpretation (see Discussion)”

      “Rigorous future work is needed to investigate potentially causal effects of respiration-brain coupling on behaviour, e.g. by means of directed connectivity within task-related networks. A second promising line of research considers top-down respiratory modulation as a function of stimulus characteristics (such as predictability). This would grant fundamental insights into whether respiration is actively adapted to optimise sensory sampling in different contexts, as suggested by the animal literature.”
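      As an illustration of how a bootstrapped confidence interval for a peak phase, like the one quoted above, can be obtained, here is a minimal sketch of a percentile bootstrap over participants. It is not the authors' procedure: the function names, the toy data, the 10-degree phase bins and the linear (non-circular) percentile step are all assumptions made for this example, and the linear percentile is only reasonable when the peaks cluster well away from the ±180 degree wrap-around.

```python
import math
import random

def peak_phase_deg(curves, phases_deg):
    """Phase bin (degrees) at which the mean curve across participants peaks."""
    n = len(curves)
    means = [sum(c[i] for c in curves) / n for i in range(len(phases_deg))]
    best = max(range(len(phases_deg)), key=lambda i: means[i])
    return phases_deg[best]

def bootstrap_peak_ci(curves, phases_deg, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the peak phase, resampling participants."""
    rng = random.Random(seed)
    peaks = sorted(
        peak_phase_deg([rng.choice(curves) for _ in curves], phases_deg)
        for _ in range(n_boot)
    )
    lower = peaks[int(n_boot * alpha / 2)]
    upper = peaks[min(n_boot - 1, int(n_boot * (1 - alpha / 2)))]
    return lower, upper

# Toy data (hypothetical): each participant's modulation curve is a cosine
# whose peak phase is jittered around -30 degrees.
phases = list(range(-180, 180, 10))

def toy_curve(shift_deg):
    return [math.cos(math.radians(p - shift_deg)) for p in phases]

toy_rng = random.Random(1)
toy_curves = [toy_curve(toy_rng.gauss(-30, 10)) for _ in range(20)]
ci = bootstrap_peak_ci(toy_curves, phases)
```

      A confidence interval of this kind describes where the peak lies along the respiratory cycle. It says nothing about causal direction, which is the limitation the revised text acknowledges.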

      1. At various instances, the authors suggest that respiration-induced changes in pH may be responsible for the changes in cortical excitability which, in turn, affect behavioral performance. In the discussion, they quote respective literature (lines 406-418). I glanced through the quoted papers by Feldman, Chesler, Lee, Dulla and Gourine - as far as I could see none of them suggests that the cyclic process of respiration induces significant cyclic shifts of pH in the brain parenchyma (if at all, this may occur in specialized chemosensory neurons in the brainstem). Moreover, recent real-time measurements by Zhang et al. (Chem. Sci 12:7369-7376) do also not reveal such cyclic changes in the cortex. Finally, translating oscillatory extracellular pH changes (if existent) into changes in inhibitory efficacy would require some time, potentially inducing delays and variance onto the cyclic changes at the network level. I feel that the evidence for the proposed mechanism is not sufficient, notwithstanding that it is a valid hypothesis. Please check and correct the interpretation of the cited literature if necessary.

      We acknowledge the reviewer’s caution regarding our suggestion of pH involvement, which is closely related to their previous comment (kindly see 1.1 above). As the reviewer mentions themselves, there are several studies demonstrating an absence of both neural and behavioural modulations for oral (vs nasal) breathing. These reports provide direct evidence against a mechanism driven by changes in CO2 and/or pH, which would be identical for nasal and oral breathing. Moreover, a second valid criticism is the uncertain temporal delay introduced by the (hypothetical) translation of pH changes into neural signals, which would most likely be incompatible with the ‘online’ (i.e., within-cycle) effects we report here. Therefore, as outlined in our response above, we have removed the pH-related suggestions from the Discussion section.

      1. Finally, some illustrations should be presented in a clearer way for those not familiar with the specifics of MEG analysis.

      We appreciate the reviewer’s suggestions regarding the clarity of our manuscript.

    1. Author Response:

      Reviewer #1 (Public Review):

      In this manuscript, the authors challenge the long-standing conclusion that Orco and IR-dependent olfactory receptor neurons are segregated into subtypes such that Orco and IR expression do not overlap. First, the authors generate new knock-in lines to tag the endogenous loci with an expression reporter system, QF/QUAS. They then compare the observed expression of these knock-ins with the widely used system of enhancer transgenes of the same receptors, namely Orco, IR8a, IR25a, and IR76b. Surprisingly, they observe an expansion of the expression of the individual knock-in reporters as compared to the transgenic reporters in more chemosensory neurons targeting more glomeruli per receptor type than previously reported. They verify the expression of the knock-in reporters with antibody staining, in situ hybridization and by mining RNA sequencing data.

      Finally, they address the question of physiological relevance of such co-expression of receptor systems by combining optogenetic activation with single sensillum recordings and mutant analysis. Their data suggests that IR25a activation can modulate Orco-dependent signaling and activation of olfactory sensory neurons.

      The paper is well written and easy to follow. The data are well presented and very convincing due in part to the combination of complementary methods used to test the same point. Thus, the finding that co-receptors are more broadly and overlappingly expressed than previously thought is very convincing and invites speculation of how this might be relevant for the animal and chemosensory processing in general. In addition, the new method to make knock-ins and the generated knock-ins themselves will be of interest to the fly community.

      We thank the reviewer for their enthusiasm and support of our work!

      The last part of the manuscript, although perhaps the most interesting, is the least developed compared to the other parts. In particular, the following points could be addressed:

      • It would be good to see a few more traces and not just the quantifications. For instance, the trace of ethyl acetate in Fig. 6C, and pentyl acetate for 6G.

      Thank you for the suggestion. We have added a new figure supplement (Figure 6-Figure Supplement 3) with additional example traces for all odorants from Figure 6 for which we found a statistically significant difference between the two genotypes (Ir25a versus wildtype).

      • In Fig. 4D, the authors show the non-retinal fed control, which is great. An additional genetic control fed with retinal would have been nice.

      For these experiments, we followed a standard practice in Drosophila optogenetics to test the same experimental genotype in the presence or absence of the essential cofactor all-trans-retinal. This controls for potential effects from the genetic background. It is possible our description of these experiments was unclear (as also suggested by comments from Reviewer 2). As such, we have clarified our experimental design for the optogenetic experiments in the revised manuscript:

      Modified text: “No light-induced responses were found in control flies, which had the same genotype as experimental flies but were not fed all-trans retinal (-ATR), a necessary co-factor for channelrhodopsin function (see Methods).” and “Bottom trace is control animal, which has the same genotype as the experimental animal but was not fed the required all-trans retinal cofactor (-ATR).”

      Figure 4-Figure Supplement 1 legend: “In all optogenetic experiments, control animals have the same genotypes as the corresponding experimental animals but have not been fed all-trans retinal.”

      Methods: “For all optogenetic experiments, the control flies were of the same genotype as experimental flies but had not been fed all-trans retinal.”

      • It appears that mostly IR25a is strongly co-expressed with other co-receptors. The provided experiments suggest a possible modulation between IR25a and Orco-dependent neuronal activity. However, what does this mean? How could this be relevant? And moreover, is this a feature of Drosophila melanogaster after many generations in laboratories?

      We share this reviewer’s excitement regarding the numerous questions our work now raises. While testing additional functional ramifications of chemosensory co-receptor expression is beyond the scope of this work (but will undoubtedly be the focus of future studies), we did expand on what this might mean in the Discussion section of the revised manuscript. Previously, we had raised the hypothesis that chemoreceptor co-expression could be an evolutionary relic of Ir25a expression in all chemoreceptor neurons, or a biological mechanism to broaden the response profile of an olfactory neuron without sacrificing its ability to respond to specific odors. We now extend our discussion to raise additional possible ramifications. For example, we suggest that modulating Ir25a coexpression could alter the electrical properties of a neuron, making it more (or possibly less) sensitive to Orco-dependent responses. We also suggest that Ir25a coexpression might be an evolutionary mechanism to allow olfactory neurons to adjust their response activities. That is, most Orco-positive olfactory neurons are already primed to be able to express a functional Ir receptor if one were to be expressed. Such co-expression in some olfactory neurons might present an evolutionary advantage by ensuring olfactory responses to a complex but crucial biologically relevant odor, like human odors to some mosquitoes.

      Reviewer #2 (Public Review):

      In the present study, the authors: 1) generated knock-in lines for Orco, Ir8a, Ir25a, and Ir76b, and examined their expression, with a main focus on the adult olfactory organs. 2) confirmed the expression of these receptors using antibody staining. 3) examined the innervation patterns of these knock-in lines in the nervous system. 4) identified a glomerulus, VM6, that is divided into three subdivisions. 5) examined olfactory responses of neurons co-expressing Orco and Ir25a.

      The results of the first four sets of experiments are well presented and support the conclusions, but the results of the last set of experiments (the electrophysiology part) need some details. Please find my detailed comments below.

      We thank the reviewer for their support of our work and appreciating the importance of our findings. In the revised manuscript, we now provide the additional experimental details for the electrophysiology work as requested.

      Major points

      Line 167-171: I wonder if the authors also compared the Orco-T2A-QF2 knock-in with antibody staining of the antenna.

      We did perform whole-mount anti-Orco antibody staining on Orco-T2A-QF2 > GFP antennae (example image below). We saw broad overlap between Orco+ and GFP+ cells, similar to the palps. However, we did not include these results since quantification of these tissues is challenging for the following reasons:

      1. There are ~1,200 olfactory neurons in each antenna, many of which are Orco+.
      2. The thickness of the tissue makes determinations of co-localization difficult in whole-mount staining.
      3. Co-localization is further complicated by the sub-cellular localization of the signals: Orco antibodies preferentially label dendrites and weakly label cell bodies, while our GFP reporter is cytoplasmic and preferentially labels cell bodies.

      For these reasons, we focused on the numerically simpler palps for quantification. For the Ir8a-T2A-QF2 and Ir76b-T2A-QF2 lines, palp quantification was not an option as neither knock-in drove expression in the palps (and the available antibodies did not work with the whole-mount staining protocol). This is why we performed antennal cryosections to validate these lines. Below is an example image of the antennal whole-mount staining in the Orco-T2A-QF2 knock-in line, illustrating the quantification challenges enumerated above.

      *Co-staining of anti-Orco and GFP in Orco-T2A-QF2 > 10xQUAS-6xGFP antenna*

      Lines 316-319 (Figure 4D): It would be better if the authors compare the responses of Ir25a>CsChrimson to those of Orco>CsChrimson.

      The goal of the optogenetic experiments was to provide experimental support for Ir25a expression in Orco+ neurons in an approach independent to previous methods. Our main question was whether we could activate what was previously considered Orco-only olfactory neurons using the Ir25a knock-in. These experiments were not designed to determine if this optogenetic activation recapitulated the normal activity of these neurons. For these reasons, we did not attempt the optogenetic experiments with Orco>CsChrimson flies.

      Line 324-326: Why did the authors test control flies not fed all-trans retinal? They should test Ir25a-T2A-QF2>QUAS-CsChrimson not fed all-trans retinal as a control.

      We apologize for the confusion. The “control” flies we used were indeed Ir25a-T2A-QF2>QUAS-CsChrimson flies not fed all-trans retinal as suggested by the reviewer. This detail was in the methods, yet likely was not clear. We have amended the main text in multiple locations to state the full genotype of the control fly more clearly:

      Modified text: “No light-induced responses were found in control flies, which had the same genotype as experimental flies but were not fed all-trans retinal (-ATR), a necessary co-factor for channelrhodopsin function (see Methods).” and “Bottom trace is control animal, which has the same genotype as the experimental animal but was not fed the required all-trans retinal cofactor (-ATR).”

      Figure 4-Figure Supplement 1 legend: “In all optogenetic experiments, control animals have the same genotypes as the corresponding experimental animals but have not been fed all-trans retinal.”

      Methods: “For all optogenetic experiments, the control flies were of the same genotype as experimental flies but had not been fed all-trans retinal.”

      Line 478-500: I wonder if the observed differences between the wildtype and Ir25a2 mutant lines are due to differences in the genetic background between both lines. Did the authors backcross Ir25a2 mutant line with the used wildtype for at least five generations?

      Yes, the mutants are outcrossed into the same genetic background as the wildtypes for at least five generations. Please see Methods, revised manuscript: “Ir25a2 and Orco2 mutant fly lines were outcrossed into the w1118 wildtype genetic background for at least 5 generations.”

      Line 1602-1603: Does the identification of ab3 sensilla using fluorescence-guided SSR apply to ab3 sensilla in Orco mutant flies? How does this ab3 fluorescence-guided SSR work?

      In fluorescence-guided SSR (fgSSR; Lin and Potter, PLoS One, 2015), the ab3 sensillum is GFP-labelled (genotype: Or22a-Gal4>UAS-mCD8:GFP), which allows this sensillum to be specifically identified under a microscope and targeted for SSR recordings. We generated fly stocks for fgSSR identification of ab3 in all three genetic backgrounds (wildtype, Orco mutant, Ir25a mutant).

      These three genotypes are described in the methods:

      “Full genotypes for ab3 fgSSR were:

      Pin/CyO; Or22a-Gal4,15XUAS-IVS-mcd8GFP/TM6B (wildtype),

      Ir25a2; Or22a-Gal4,15XUAS-IVS-mcd8GFP/TM6B (Ir25a2 mutant),

      Or22a-Gal4/10XUAS-IVS-mcd8GFP (attp40); Orco2 (Orco2 mutant).”

      Line 1602-1604: There is no mention of how the authors identified ab9 sensilla.

      Information on the identification of ab9 sensilla is under the optogenetics section of the methods: “Identification of ab9 sensilla was assisted by fluorescence-guided Single Sensillum Recording (fgSSR) (Lin and Potter, 2015) using Or67b-Gal4 (BDSC #9995) recombined with 15XUAS-IVS-mCD8::GFP (BDSC #32193).”

      Line 1648: what are the set of odorants that were used to identify the different coeloconic sensilla?

      We have added the specific odorants used for sensillar identification for coeloconic SSR in the Methods. The protocol and odorants used were:

      • 2,3-butanedione (BUT), 1,4-diaminobutane (DIA), ammonia (AM), hexanol (HEX), phenethylamine (PHEN), and propanal (PROP) to distinguish coeloconic sensilla:

      o Wildtype flies: Strong DIA and BUT responses identify ac2 and rule out ac4. Absence of strong AM response rules out ac1, absence of HEX response rules out ac3, absence of PHEN response further rules out ac4.

      o Ir25a mutant flies (amine responses lost, so cannot use PHEN and DIA as diagnostics): Strong BUT response and moderate PROP response identify ac2 and rule out ac4. Absence of strong AM response rules out ac1, absence of HEX response rules out ac3. Ac4 is further ruled out anatomically based on sensillar location compared to ac2.

      Revised text: “Different classes of coeloconic sensilla were identified by their known location on the antenna and confirmed with their responses to a small panel of diagnostic odorants: in wildtype flies, ac2 sensilla were identified by their strong responses to 1,4-diaminobutane and 2,3-butanedione. The absence of a strong response to ammonia was used to rule out ac1 sensilla, the absence of a hexanol response was used to rule out ac3 sensilla, and the absence of a phenethylamine response was used to rule out ac4 sensilla. In Ir25a mutant flies in which amine responses were largely abolished, ac2 and ac4 sensilla were distinguished based on anatomical location, as well as the strong response of ac2 to 2,3-butanedione and the moderate response to propanal (both absent in ac4). Ac1 and ac3 sensilla were excluded similarly in the mutant and wildtype flies. No more than 4 sensilla per fly were recorded. Each sensillum was tested with multiple odorants, with a rest time of at least 10s between applications.”
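      The diagnostic rules described above can be written as a small decision function. This is only a sketch of the stated logic: the three response categories, the dictionary layout and the function name `is_ac2` are assumptions made for illustration, the anatomical-location criterion is left out, and accepting a strong propanal response as well as a moderate one is a simplification not stated in the text.

```python
STRONG, MODERATE, NONE = "strong", "moderate", "none"

def is_ac2(resp, ir25a_mutant=False):
    """Return True if the odorant responses match the ac2 profile.

    resp maps odorant codes (BUT, DIA, AM, HEX, PHEN, PROP) to a
    hypothetical response category: "strong", "moderate" or "none".
    """
    not_ac1 = resp["AM"] != STRONG   # no strong ammonia response
    not_ac3 = resp["HEX"] == NONE    # no hexanol response
    if ir25a_mutant:
        # Amine responses are lost, so DIA and PHEN are not diagnostic.
        # Strong BUT plus at least a moderate PROP response separates
        # ac2 from ac4 (location is also used, but not modelled here).
        positive = resp["BUT"] == STRONG and resp["PROP"] in (MODERATE, STRONG)
        return positive and not_ac1 and not_ac3
    positive = resp["DIA"] == STRONG and resp["BUT"] == STRONG
    not_ac4 = resp["PHEN"] == NONE   # no phenethylamine response
    return positive and not_ac1 and not_ac3 and not_ac4

# Hypothetical example recordings.
wildtype_ac2 = {"BUT": STRONG, "DIA": STRONG, "AM": NONE,
                "HEX": NONE, "PHEN": NONE, "PROP": MODERATE}
mutant_ac2 = {"BUT": STRONG, "DIA": NONE, "AM": NONE,
              "HEX": NONE, "PHEN": NONE, "PROP": MODERATE}
```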

    1. Author Response:

      Reviewer #1 (Public Review):

      1. There was little comment on the strategy/mechanism that enabled subjects to readily attain Target I (MU 1 active alone), and then Target II (MU1 and MU2 active to the same relative degree). To accomplish this, it would seem that the peak firing rate of MU1 during pursuit of Target II could not exceed that during Target I despite an increased neural drive needed to recruit MU2. The most plausible explanation for this absence of additional rate coding in MU1 would be that associated with firing rate saturation (e.g., Fuglevand et al. (2015) Distinguishing intrinsic from extrinsic factors underlying firing rate saturation in human motor units. Journal of Neurophysiology 113, 1310-1322). It would be helpful if the authors might comment on whether firing rate saturation, or other mechanism, seemed to be at play that allowed subjects to attain both targets I and II.

      To place the cursor inside TII, both MU1 and MU2 must discharge action potentials at their corresponding average discharge rate during 10% MVC (± 10% due to the target radius and neglecting the additional gain set manually in each direction). Therefore, subjects could simply exert a force of 10% MVC to reach TII and would successfully place the cursor inside TII. However, to get to TI, MU1 must discharge action potentials at the same rate as during TII hits (i.e. average discharge rate at 10% MVC) while keeping MU2 silent. Based on the performance analysis in Fig 3D, subjects had difficulties moving the cursor towards TI when the difference in recruitment threshold between MU1 and MU2 was small (≤ 1% MVC). In this case, the average discharge rate of MU1 during 10% MVC could not be reached without activating MU2. As could be expected, reaching towards TI became more successful when the difference in recruitment threshold between MU1 and MU2 was relatively large (≥3% MVC). In this case, subjects were able to let MU1 discharge action potentials at its average discharge rate at 10% MVC without triggering activation of MU2 (it seems the discharge rate of MU1 saturated before the onset of MU2). Such behaviour can be observed in Fig. 2A. MUs with a lower recruitment threshold saturate their discharge rate before the force reaches 10% MVC. We adapted the Discussion accordingly to describe this behaviour in more detail.
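      The relationship between discharge rates and targets described above can be sketched as follows. The sketch assumes that each MU's rate is normalised by its average rate at 10% MVC, that MU1 and MU2 drive the horizontal and vertical axes respectively, and that the targets are circles of radius 0.1 in these normalised units. The axis assignment, the names and the omission of the manually set gain are assumptions for this illustration, not details confirmed by the response.

```python
import math

# Target centres in normalised units (rate / average rate at 10% MVC).
TARGETS = {"TI": (1.0, 0.0), "TII": (1.0, 1.0), "TIII": (0.0, 1.0)}
RADIUS = 0.1  # the +/- 10% tolerance mentioned in the response

def cursor(rate_mu1, rate_mu2, ref_mu1, ref_mu2):
    """Map the two discharge rates (spikes/s) to a 2-D cursor position."""
    return (rate_mu1 / ref_mu1, rate_mu2 / ref_mu2)

def hit_target(pos):
    """Return the name of the target containing pos, or None."""
    for name, centre in TARGETS.items():
        if math.dist(pos, centre) <= RADIUS:
            return name
    return None

# Hypothetical reference rates at 10% MVC: MU1 = 10 spikes/s, MU2 = 12 spikes/s.
both_active = hit_target(cursor(10.0, 12.0, 10.0, 12.0))
mu1_alone = hit_target(cursor(10.0, 0.0, 10.0, 12.0))
mu2_alone = hit_target(cursor(0.0, 12.0, 10.0, 12.0))
```

      Under these assumptions, TI requires MU1 to reach its full reference rate while MU2 stays silent, which is only possible if MU1's rate saturates before MU2 is recruited, as the response argues.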

      1. Figure 4 (and associated Figure 6) is nice, and the discovery of the strategy used by subjects to attain Target III is very interesting. One mechanism that might partially account for this behavior that was not directly addressed is the role inhibition may have played. The size principle also operates for inhibitory inputs. As such, small, low threshold motor neurons will tend to respond to a given amount of inhibitory synaptic current with a greater hyperpolarization than high threshold units. Consequently, once both units were recruited, subsequent gradual augmentation of synaptic inhibition (concurrent with excitation and broadly distributed) could have led to the situation where the low threshold unit was deactivated (because of the higher magnitude hyperpolarization), leaving MU2 discharging in isolation. This possibility might be discussed.

      We agree with the reviewer’s comment that inhibition might have played a critical role in succeeding to reach TIII. Hence, we have added this concept to our discussion.

      1. In a similar vein as for point 2 (above), the argument that PICs may have been the key mechanism enabling the attainment of target III, while reasonable, also seems a little hand wavy. The problem with the argument is that it depends on differential influences of PICs on motor neurons that are 1) low threshold, and 2) have similar recruitment thresholds. This seems somewhat unlikely given the broad influence of neuromodulatory inputs across populations of motor neurons.

      We agree with the reviewer’s point and reasoning that a mixture of neuromodulation and inhibition likely introduced the variability in MU activity we observed in this study. This comment is addressed in the answer to comment 3.

      Reviewer #2 (Public Review):

      [...]

      1. Some subjects seemed to hit TIII by repeatedly "pumping" the force up and down to increase the excitability of MU2 (this appears to happen in TIII trials 2-6 in Fig. 4 - c.f. p18 l30ff). It would be useful to see single-trial time series plots of MU1, MU2, and force for more example trials and sessions, to get a sense for the diversity of strategies subjects used. The authors might also consider providing additional analyses to test whether multiple "pumps" increased MU2 excitability, and if so, whether this increase was usually larger for MU2 than MU1. For example, they might plot the ratio of MU2 (and MU1) activation to force (or, better, the residual discharge rate after subtracting predicted discharge based on a nonlinear fit to the ramp data) over the course of the trial. Is there a reason to think, based on the data or previous work, that units with comparatively higher thresholds (out of a sample selected in the low range of <10% MVC) would have larger increases in excitability?


      We added a supplementary figure (Supplement 4) that visualizes additional trials from different conditions and subjects for TIII-instructed trials and noted this in the text.

      MU excitability might indeed be pronounced during repeated activations within a couple of seconds (see, for example, M. Gorassini, J. F. Yang, M. Siu, and D. J. Bennett, “Intrinsic Activation of Human Motoneurons: Reduction of Motor Unit Recruitment Thresholds by Repeated Contractions,” J. Neurophysiol., vol. 87, no. 4, pp. 1859–1866, 2002.). Such an effect, however, seems to be equally distributed to all active MUs. Moreover, we are not aware of any recent studies suggesting that MUs, within the narrow range of 0-10% MVC, may be excited differently by such a mechanism. Supplement 4C and D illustrate trials in which subjects performed multiple “pumps”. Visually, we could not find changes in the excitability specific to any of the two MUs nor that subjects explored repeated activation of MUs as a strategy to reach TIII. It seems subjects instead tried to find the precise force level which would allow them to keep MU2 active after the offset of MU1. We further discussed that PICs act very broadly on all MUs. The observed discharge patterns when successfully reaching TIII may likely be due to an interplay of broadly distributed neuromodulation and locally acting synaptic inhibition.

      1. I am somewhat surprised that subjects were able to reach TIII at all when the de-recruitment threshold for MU1 was lower than the de-recruitment threshold for MU2. It would be useful to see (A) performance data, as in Fig. 3D or 5A, conditioned on the difference in de-recruitment thresholds, rather than recruitment thresholds, and (B) a scatterplot of the difference in de-recruitment vs the difference in recruitment thresholds for all pairs.


      We agree that comparing the difference in de-recruitment threshold with the performance of reaching each target might provide valuable insights into the strategies used to perform the tasks. Hence, we added this comparison to Figure 4E at p. 16, l. 1. A scatterplot of the difference in de-recruitment threshold and the difference in recruitment threshold has been added to Supplement 3A. The Results section was modified in line with the above changes.

      1. Using MU1 / MU2 rates to directly control cursor position makes sense for testing for independent control over the two MUs. However, one might imagine that there could exist a different decoding scheme (using more than two units, nonlinearities, delay coordinates, or control of velocity instead of position) that would allow subjects to generate smooth trajectories towards all three targets. Because the authors set their study in a BCI context, they may wish to comment on whether more complicated decoding schemes might be able to exploit single-unit EMG for BCI control or, alternatively, to argue that a single degree of freedom in input fundamentally limits the utility of such schemes.


      This study aimed to assess whether humans can learn to decorrelate the activity between two MUs coming from the same functional MU pool during constrained isometric conditions. The biofeedback was chosen to encourage subjects to perform this non-intuitive and unnatural task. Transferring biofeedback on single MUs into an application, for example, BCI control, could include more advanced pre-processing steps. Not all subjects were able to navigate the cursor along both axes consistently (always hitting TI and TIII). However, the performance metric (Figure 4C) indicated that subjects became better over time in diverging from the diagonal and thus increased their moving range inside the 2D space for various combinations of MU pairs. Hence, a weighted linear combination of the activity of both MUs (for example, along the two principal components based on the cursor distribution) may enable subjects to navigate a cursor from one axis to another. Similarly, coadaptation methods or different types of biofeedback (auditory or haptic) may help subjects. Furthermore, using only two MUs to drive a cursor inside a 2-D space is prone to interference. Including multiple MUs in the control scheme may improve the performance even in the presence of noise. We have shown that the activation of a single MU pool exposed to a common drive does not necessarily obey rigid control. State-dependent flexible control due to variable intrinsic properties of single MUs may be exploited for specific applications, such as BCI. However, further research is necessary to understand the potentials and limits of such a control scheme.

      1. The conclusions of the present work contrast somewhat with those of Marshall et al. (ref. 24), who claim (for shoulder and proximal arm muscles in the macaque) that (A) violations of the "common drive" hypothesis were relatively common when force profiles of different frequencies were compared, and that (B) microstimulation of different M1 sites could independently activate either MU in a pair at rest. Here, the authors provide a useful discussion of (A) on p19 l11ff, emphasizing that independent inputs and changes in intrinsic excitability cannot be conclusively distinguished once the MU has been recruited. They may wish to provide additional context for synthesizing their results with Marshall et al., including possible differences between upper / lower limb and proximal / distal muscles, task structure, and species.

      The work by Marshall, Churchland and colleagues shows that focal stimulation at specific sites in M1 can activate single MUs, which may suggest a direct pathway from cortical neurons to single motor neurons within a pool. However, it remains to be shown if humans can learn to leverage such potential pathways or if the observations are limited to the artificially induced stimulus. The tibialis anterior receives a strong and direct cortical projection. Thus, we think that this muscle may be well suited to study whether subjects can explore such specific pathways to activate single MUs independently. It may very well be that the control of upper limbs shows more flexibility than that of lower ones, but we are not aware of any study that provides evidence for a critical mismatch in the control of upper and lower limb MU pools. We have added this discussion to the manuscript.

      Reviewer #3 (Public Review):

      [...]

      Even if the online decomposition of motor units were performed perfectly, the visual display provided to subjects smooths the extracted motor unit discharge rates over a very wide time window: 1625 msec. This window is significantly larger than the differences in recruitment times in many of the motor unit pairs being used to control the interface. So while it's clear that the subjects are learning to perform the task successfully, it's not clear to me that subjects could have used the provided visual information to receive feedback about or learn to control motor unit recruitment, even if individuated control of motor unit recruitment by the nervous system is possible. I am therefore not convinced that these experiments were a fair test of subjects' ability to control the recruitment of individual motor units.

      Regarding the validation of motor unit isolation under the conditions analysed in this study, we have added a full new set of measurements with concomitant surface and intramuscular recordings during recruitment/derecruitment of motor units at variable recruitment speed. This provides a strong validation of the approach and of the accuracy of the online decomposition used in this study. Subjects received visual feedback on the activity of the selected MU pair, i.e. discharge behaviour of both MUs and the resulting cursor movement. This information was not clear in the initial submission; hence, we annotated the current version to clarify the biofeedback modalities. To further clarify the decoding of incoming MU1/MU2 discharge rates into cursor movement, we included Supplement 2. We also included a video that shows that the smoothing window on the cursor position does not affect the immediate cursor movement due to incoming spiking activity. For example, as shown in Supplement 2, for the initial offset of 0ms, the cursor starts moving along the axis corresponding to a sole activation of MU1 and immediately diverges from this axis when MU2 starts to discharge action potentials. We, therefore, think that the biofeedback provided to the subjects does allow exploration of single MU control.
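      The point about the smoothing window can be illustrated with a toy calculation. The sketch below assumes a rectangular, causal window of 1625 ms over which spikes are counted; the actual filter used in the study is not specified here, and the spike trains are invented for the example. With such a window the smoothed rate of a newly recruited unit becomes non-zero at its first spike, although it takes a full window length to reach the steady-state rate.

```python
WINDOW_S = 1.625  # smoothing window length in seconds (1625 ms)

def smoothed_rate(spike_times, t, window=WINDOW_S):
    """Mean discharge rate (spikes/s) in the causal window (t - window, t]."""
    count = sum(1 for s in spike_times if t - window < s <= t)
    return count / window

# Hypothetical spike train: MU2 is recruited at t = 1.0 s and then
# discharges at roughly 10 spikes/s.
mu2 = [1.0 + 0.1 * k for k in range(40)]

before_onset = smoothed_rate(mu2, 0.95)  # no MU2 spikes yet
at_onset = smoothed_rate(mu2, 1.0)       # first MU2 spike just arrived
settled = smoothed_rate(mu2, 4.0)        # window fully inside the train
```

      In this toy case the cursor component driven by MU2 would leave zero at the first spike, which matches the authors' description, while its magnitude would still lag the true rate, which is the reviewer's concern.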

      Along similar lines, it seems likely to me that subjects are using some other strategy to learn the task, quite possibly one based on control of overall force at the ankle and/or voluntary recruitment of other leg/foot muscles. Each of these variables will presumably be correlated with the activity of the recorded motor units and the movement of the cursor on the screen. Moreover, because these variables likely change on a similar (or slower) timescale than differences in motor unit recruitment or derecruitment, it seems to me that using such strategies, which do not reflect or require individuated motor unit recruitment, is a highly effective way to successfully complete the task given the particular experimental setup.

      In addition to being seated and restricted by an ankle dynamometer, subjects were instructed to only perform dorsiflexion of the ankle. Further, none of the subjects reported compensatory movements as a strategy to reach any of the targets. In addition, to be successfully utilised, such compensatory movements would need to influence various combinations of MUs tested in this study equally, even when they differ in size. Nevertheless, we acknowledge, as pointed out by the reviewer, that our setup has limitations. We only measured force in a single direction (i.e. ankle dorsiflexion) and did not track toe, hip or knee movements. Even though an instructor supervised leg movement throughout the experiment, it may be that very subtle and unknowingly compensatory movements have influenced the activity of the selected MUs. Hence, we updated the limitations section in the Discussion.

      To summarize my above two points, it seems like the authors' argument is that absence of evidence (subjects do not perform individuated MU recruitment in this particular task) constitutes evidence of absence (i.e. is evidence that individuated recruitment is not possible for the nervous system or for the control of brain-machine interfaces). Therefore, given the above-described issues regarding the real-time feedback provided to subjects in the paper, it is not clear to me that any strong conclusions can be drawn about the nervous system's ability or inability to achieve individuated motor unit recruitment.

      We hope that the above changes clarify the biofeedback modalities and their potential to provide subjects with the necessary information for exploring independent MU control. Our experiments aimed to investigate whether subjects can learn under constrained isometric conditions to decorrelate the activity between two MUs coming from the same functional pool. While it seemed that MU activity could be decorrelated, this almost exclusively happened (TIII-instructed trials) within a state-dependent framework, i.e. both MUs must be activated first before the lower threshold one is switched off. We did not observe flexible MU control based exclusively on a selective input to individual MUs (MU2 activated before MU1 during initial recruitment). That does not mean that such control is impossible. However, all successful control strategies that were voluntarily explored by the subjects to achieve flexible control were based on a common input and history-dependent activation of MUs. We have added these concepts to the discussion section.

      Second, to support the claims based on their data the authors must explain their online spike-sorting method and provide evidence that it can successfully discriminate distinct motor unit onset/offset times at the low latency that would be required to test their claims. In the current manuscript, authors do not address this at all beyond referring to their recent IEEE paper (ref [25]). However, although that earlier paper is exciting and has many strengths (including simultaneous recordings from intramuscular and surface EMGs), the IEEE paper does not attempt to evaluate the performance metrics that are essential to the current project. For example, the key metric in ref 25 is "rate-of-agreement" (RoA), which measures differences in the total number of motor unit action potentials sorted from, for example, surface and intramuscular EMG. However, there is no evaluation of whether there is agreement in recruitment or de-recruitment times (the key variable in the present study) for motor units measured both from the surface and intramuscularly. This important technical point must be addressed if any conclusions are to be drawn from the present data.

      We have taken this comment into careful consideration and have performed a validation based on concomitant intramuscular and surface EMG decomposition in the exact experimental conditions of this study, including variations in the speed of recruitment and de-recruitment. This new validation fully supports the accuracy of the methods used when detecting recruitment and de-recruitment of motor units.
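
To make the reviewer's requested metric concrete, the following is a minimal sketch of how agreement in recruitment and de-recruitment times could be quantified for one motor unit decomposed from both recording types. It is not the authors' validation procedure: the function names, the definition of recruitment as the first discharge and de-recruitment as the last discharge, and the example spike times are all assumptions made for illustration.

```python
def recruitment_times(spike_times):
    """Return (onset, offset) in seconds.

    Assumption: recruitment is taken as the first discharge and
    de-recruitment as the last discharge of the motor unit.
    """
    if not spike_times:
        raise ValueError("motor unit has no discharges")
    return min(spike_times), max(spike_times)


def timing_agreement(surface_spikes, intramuscular_spikes):
    """Signed timing errors (surface minus intramuscular), in seconds."""
    onset_s, offset_s = recruitment_times(surface_spikes)
    onset_i, offset_i = recruitment_times(intramuscular_spikes)
    return {
        "onset_error": onset_s - onset_i,
        "offset_error": offset_s - offset_i,
    }


# Hypothetical spike times (seconds) for the same unit from both recordings.
errors = timing_agreement([1.00, 1.10, 2.00], [1.02, 1.10, 1.95])
```

Small onset and offset errors relative to the time between the recruitment of MU1 and MU2 would indicate that the decomposition resolves recruitment order, which a rate-of-agreement metric alone does not show.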

      My final concern is that the authors' key conclusion - that the nervous system cannot or does not control motor units in an individuated fashion - is based on the assumption that the robust differences in de-recruitment time that subjects display cannot be due to differences in descending control, and instead must be due to changes in intrinsic motor unit excitability within the spinal cord. The authors simply assert/assume that "[derecruitment] results from the relative intrinsic excitability of the motor neurons which override the sole impact of the receive synaptic input". This may well be true, but the authors do not provide any evidence for this in the present paper, and to me it seems equally plausible that the reverse is true - that de-recruitment might be influenced by descending control. This line of argumentation therefore seems somewhat circular.

      When subjects were asked to reach TIII, which required the sole activation of a higher threshold MU, subjects almost exclusively chose to activate both MUs first before switching off the lower threshold MU. It may be that the lower de-recruitment threshold of MU2 was determined by descending inputs changing the excitability of either MU1 or MU2 (for example, see J. Nielsen, C. Crone, T. Sinkjær, E. Toft, and H. Hultborn, “Central control of reciprocal inhibition during fictive dorsiflexion in man,” Exp. brain Res., vol. 104, no. 1, pp. 99–106, Apr. 1995 or E. Jankowska, “Interneuronal relay in spinal pathways from proprioceptors,” Prog. Neurobiol., vol. 38, no. 4, pp. 335–378, Apr. 1992). Even if that is the case, it remains unknown why such a command channel that potentially changes the excitability of a single MU was not voluntarily utilized at the initial recruitment to allow for direct movement towards TIII (as direct movement was preferred for TI and TII). We cannot rule out that de-recruitment was affected by selective descending commands. However, our results match observations made in previous studies on intrinsic changes of MU excitability after MU recruitment. Therefore, even if descending pathways were utilized throughout the experiment to change, for example, MU excitability, subjects were not able to explore such pathways to change initial recruitment and achieve general flexible control over MUs. The updated discussion explains this line of reasoning.

      Reviewer #4 (Public Review):

      [...]

      1. Figure 6a nicely demonstrates the strategy used by subjects to hit target TIII. In this example, MU2 was both recruited and de-recruited after MU1 (which is the opposite of what one would expect based on the standard textbook description). The authors state (page 17, line 15-17) that even in the reverse case (when MU2 is de-recruited before MU1) the strategy still leads to successful performance. I am not sure how this would be done. For clarity, the authors could add a panel similar to panel A to this figure but for the case where the MU pairs have the opposite order of de-recruitment.

      We have added more examples of successful TIII-instructed trials in Supplement 4. Supplement 4C and D illustrate examples of subjects navigating the cursor inside TIII even when MU2 was de-recruited before MU1. As these examples show, subjects also used the three-stage approach discussed in the manuscript. In contrast to successful trials in which MU2 was de-recruited after MU1 (for example, Supplement 4B), subjects required multiple attempts before finding a precise force level that allowed a continuous firing of MU2 while MU1 remained silent. We have added a possible explanation for such behaviour in the Discussion.

      1. The authors discuss a possible type of flexible control which is not evident in the recruitment order of MUs (page 19, line 27-28). This reasoning was not entirely clear to me. Specifically, I was not sure which of the results presented here needs to be explained by such mechanism.

      We have shown that subjects can decorrelate the discharge activity of MU1 and MU2 once both MUs are active (e.g. reaching TIII). Thus, flexible control of the MU pair was possible after the initial recruitment. Therefore, this kind of control seems strongly linked to a specific activation state of both MUs. We further elaborated on which potential mechanisms may contribute to this state-dependent control.

      1. The authors argue that using a well-controlled task is necessary for understanding the ability to control the descending input to MUs. They thus applied a dorsi-flexion paradigm and MU recordings from TA muscles. However, it is not clear to what extent the results obtained in this study can be extrapolated to the upper limb. Controlling the MUs of the upper limb could be more flexible and more accessible to voluntary control than the control of lower limb muscles. This point is crucial since the authors compare their results to other studies (Formento et al., bioRxiv 2021 and Marshall et al., bioRxiv 2021) which concluded in favor of the flexible control of MU recruitment. Since both studies used the MUs of upper limb muscles, a fair comparison would involve using a constrained task design but for upper limb muscles.

      We agree with the reviewer that our work differs from previous approaches, which also studied flexible MU control. We, therefore, added a paragraph to the limitations section of the Discussion.

      1. The authors devote a long paragraph in the discussion to account for the variability in the de-recruitment order. They mostly rely on PIC, but there is no clear evidence that this is indeed the case. Is it at all possible that the flexibility in control over MUs was over their recruitment threshold? Was there any change in de-recruitment of the MUs during learning (in a given recording session)?

      The de-recruitment threshold did not critically change when compared before and after the experiment on each day (difference in de-recruitment threshold before and after the experiment: -0.16 ± 2.28% MVC, we have now added this result to the Results section). Deviations from the classical recruitment order may be achieved by temporal (short-lived) changes in the intrinsic excitability of single MUs. We, therefore, extended our discussion on potential mechanisms that may explain the observed variability given all MUs receive the same common input.

      1. The need for a complicated performance measure (define on page 5, line 3-6) is not entirely clear to me. What is the correlation between this parameter and other, more conventional measures such as total-movement time or maximal deviation from the straight trajectory? In addition, the normalization process is difficult to follow. The best performance was measured across subjects. Does this mean that single subject data could be either down or up-regulated based on the relative performance of the specific subject? Why not normalize the single-subject data and then compare these data across subjects?

      We employed this performance metric to overcome shortcomings of traditional measures such as target hit count, time-to-target or deviation from the straight trajectory. Such problems are described in the illustration below for TIII-instructed trials (blue target). A: the duration of the trial is the same in both examples (left and right); however, on the left, the subject manages to keep the cursor close to the target-of-interest while on the right, the cursor is far away from the target centre of TIII. B: In both images the cursor has the same distance d to the target centre of TIII. However, on the left, the subject manages to switch off MU1 while keeping MU2 active, while on the right, both MUs are active. C: On the left, the subject manages to move the cursor inside TIII before the maximum trial time was reached, while on the right, the subject moved the cursor up and down, not diverging from the ideal trajectory to the target centre but failing to place the cursor inside TIII within the duration of the trial. In all examples, using only one conventional measure fails to account for a higher performance value in the left scenario than in the right. Our performance metric combines several measures such as time-to-target, distance from the target centre, and the discharge rate ratio between MU1 and MU2 via the angle 𝜑 and thus allows a more detailed analysis of the performance than conventional measures. The normalisation of the performance value was done to allow for a comparison across subjects. The best and worst performance was estimated using synthetic data mimicking ideal movement towards each target (i.e. immediate start from the target origin to the centre of the target, while the normalised discharge rate of the corresponding MU is set to 1). Since the target space is normalised for all subjects in the same manner (mean discharge rate of the corresponding MUs at 10 %MVC) this allows us to compare the performance between subjects, conditions and targets.

      1. Figure 3C appears to indicate that there was only moderate learning across days for target TI and TII. Even for target TIII there was some improvement but the peak performance in later days was quite poor. The fact that the MUs were different each day may have affected the subjects' ability to learn the task efficiently. It would be interesting to measure the learning obtained on single days.

      We have added an analysis that estimated the learning within a session per subject and target (Supplement 3C). In order to evaluate the strength of learning within-session, the Spearman correlation coefficient between target-specific performance and consecutive trials was calculated and averaged across conditions and days. The results suggest that there was little learning within sessions and no significant difference between targets. These results have now been added to the manuscript.
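
The within-session analysis described above rests on a Spearman rank correlation between trial order and performance. The following is a minimal sketch of that calculation for a single session, assuming a hypothetical list of per-trial performance values; it is not the authors' analysis code, and the averaging across conditions and days is omitted.

```python
def average_ranks(values):
    """Rank values from 1..n, assigning tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for constant input")
    return sxy / (sxx * syy) ** 0.5


def within_session_learning(performance):
    """Spearman correlation between trial number and performance."""
    trials = list(range(1, len(performance) + 1))
    return pearson(average_ranks(trials), average_ranks(performance))


# Hypothetical per-trial performance values for one target in one session.
rho = within_session_learning([0.1, 0.3, 0.2])
```

A coefficient near zero, as the response reports, would mean that performance did not change monotonically over consecutive trials within a session.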

      1. On page 16 line 12-13, the authors describe the rare cases where subjects moved directly towards TIII. These cases apparently occurred when the recruitment threshold of MU2 was lower. What is the probable source of this lower recruitment level in these specific trials? Was this incidental (i.e., the trial was only successful when the MU threshold randomly decreased) or was there volitional control over the recruitment threshold? Did the authors test how the MU threshold changed (in percentages) over the course of the training day?

      We did not track the recruitment threshold throughout the session but only at the beginning and end. We could not identify any critical changes in the recruitment order (see Results section). However, our analysis indicated that during direct movements towards TIII, MU2 (higher threshold MU) was recruited at a lower force level during the initial ramp and thus had a temporary effective recruitment threshold below MU1. It is important to note that these direct movements towards TIII only occurred for pairs of MUs with a similar recruitment threshold (see Figure 6). One possible explanation for this temporal change in recruitment threshold could be altered excitability due to neuromodulatory effects such as PICs (see Discussion). We have added an analysis that shows that direct movements towards TIII occurred in most cases (>90%) after a preceding TII- or TIII-instructed trial. Both of these targets-of-interest require activation of MU2. Thus, direct movement towards TIII was likely not the result of specific descending control. Instead, this analysis suggests that the PIC effect triggered at the preceding trial was not entirely extinguished when a trial ending in direct movement towards TIII started. Alternatively, the rare scenarios in which direct movements happened could be entirely random. Similar observations were made in previous biofeedback studies [31]. To clarify these points, we altered the manuscript.

    1. eLife Assessment

      This valuable study describes an interesting infection phenotype that differs between adult male and female zebrafish. The authors present data indicating that male-biased expression of Cyp17a2 mediates viral infection through STING and USP8 activity regulation. The authors present solid evidence linking this factor to direct and indirect antiviral outcomes through ubiquitination pathways. These findings raise interesting questions about immune mechanisms that underlie sex dimorphism and the selective pressures that might shape it.

    2. Reviewer #1 (Public review):

      Summary:

      In this manuscript Lu & Cui et al. observe that adult male zebrafish are more resistant to infection and disease following exposure to Spring Viremia of Carp Virus (SVCV) than female fish. The authors then attempt to identify some of the molecular underpinnings of this apparent sexual dimorphism and focus their investigations on a gene called cytochrome P450, family 17, subfamily A, polypeptide 2 (cyp17a2) because it was among genes that they found to be more highly expressed in kidney tissue from males than in females. Their investigations lead them to propose a direct connection between cyp17a2 and modulation of interferon signaling as the key underlying driver of difference between male and female susceptibility to SVCV.

      Strengths:

      Strengths of this study include the interesting observation of a substantial difference between adult male and female zebrafish in their susceptibility to SVCV, and also the breadth of experiments that were performed linking cyp17a2 to infection phenotypes and molecularly to the stability of host and virus proteins in cell lines. The authors place the infection phenotype in an interesting and complex context of many other sexual dimorphisms in infection phenotypes in vertebrates. This study succeeds in highlighting an unexpected factor involved in antiviral immunity that will be an important subject for future investigations of infection, metabolism, and other contexts.

      Weaknesses:

      Weaknesses of this study include a proposed mechanism underlying the sexual dimorphism phenotype based on experimentation in only males, and widespread reliance on over-expression when investigating protein-protein interaction and localization. Additionally, a minor weakness is that the text describing the identification of cyp17a2 as a candidate contains errors that are confusing. For example:

      - Lines 139-140 describe the data for Figure 2 as deriving from "healthy hermaphroditic adult zebrafish". This appears to be a language error and should be corrected to something that specifies that the comparison made is between healthy adult male and female kidneys.

      - In Figure 2A and associated text cyp17a2 is highlighted but the volcano plot does not indicate why this was an obvious choice. For example, many other genes are also highly induced in male vs female kidneys. Figure 2B and line 143 describe a subset of "eight sex-related genes" but it is not clear how these relate to Figure 2A. The narrative could be improved to clarify how cyp17a2 was selected from Figure 2A and it seems that the authors made an attempt to do this with Figure 2B but it is not clear how these are related. This is important because the available data do not rule out the possibility that other factors also mediate the sexual dimorphism they observed either in combination, in a redundant fashion, or in a more complex genetic fashion. The narrative of the text and title suggests that they consider this to be a monogenic trait but more evidence is needed.

    3. Reviewer #2 (Public review):

      This study conducted by Lu et al. explores the molecular underpinnings of sexual dimorphism in antiviral immunity in zebrafish, with a particular emphasis on the male-biased gene cyp17a2. The authors demonstrate that male zebrafish exhibit stronger antiviral responses than females, and they identify a teleost-specific gene cyp17a2 as a key regulator of this dimorphism. Utilizing a combination of in vivo and in vitro methodologies, they demonstrate that Cyp17a2 potentiates IFN responses by stabilizing STING via K33-linked polyubiquitination and directly degrades the viral P protein via USP8-mediated deubiquitination. The work challenges conventional views of sex-based immunity and proposes a novel, hormone- and sex chromosome-independent mechanism.

      Strengths:

      (1) The central concept is novel: the finding that sexual dimorphism in immunity can be driven by an autosomal gene rather than by sex chromosomes or hormones represents a significant advance in the field, offering a more comprehensive understanding of immune evolution.

      (2) The present study provides a comprehensive molecular pathway, from gene expression to protein-protein interactions and post-translational modifications, thereby establishing a link between Cyp17a2 and both host immune enhancement (via STING) and direct antiviral activity (via viral protein degradation).

      (3) In order to substantiate their claims, the authors utilize a wide range of techniques, including transcriptomics, Co-IP, ubiquitination assays, confocal microscopy, and knockout models.

      (4) The choice of model is a particular strength. Zebrafish, which lack sex chromosomes, offer a clear genetic background for the dissection of autosomal contributions to sexual dimorphism.

      Weaknesses:

      (1) Limited discussion on whether this mechanism extends beyond Cyprinidae and its implications for teleost adaptation.

      Comments on revisions:

      The authors successfully achieved their primary aim, which was to identify and characterize a male-biased gene governing antiviral sexual dimorphism in fish. The data provide robust support for the conclusion that Cyp17a2 enhances antiviral immunity through dual mechanisms, STING stabilization and viral protein degradation, independent of classical sex-determining pathways. The findings are consistent across a range of experimental setups and are statistically robust. The revisions have significantly enhanced the clarity, depth, and overall quality of the manuscript. The authors have addressed each concern meticulously, resulting in a much-improved and robust article. No further suggestions are offered.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Weaknesses:

      (1) Figure 10 outlines a mechanistic link between cyp17a2 and the sexual dimorphism the authors report for SVCV infection outcomes. The data presented on increased susceptibility of cyp17a2-/- mutant male zebrafish support this diagram, but this conclusion is fairly weak without additional experimentation in both males and females. The authors justify their decision to focus on males by stating that they wanted to avoid potential androgen-mediated phenotypes in the cyp17a2 mutant background (lines 152-156), but this appears to be speculation. It also doesn't preclude the possibility of testing the effects of increased cyp17a2 expression on viral infection in both males and females. This is of critical importance if the authors intend to focus the study on sexual dimorphism, which is how the introduction and discussion are currently structured.

      Thank you for your suggestion. We have revised the relevant statements in the introduction and discussion sections accordingly. The decision not to conduct cyp17a2 overexpression experiments in both male and female individuals was based primarily on two reasons. First, our laboratory currently lacks the technical capability to achieve cyp17a2 overexpression at the organismal level; existing methodologies are limited to gene knockout via CRISPR-Cas9. Second, even if overexpression were feasible, subsequent comparisons would need to be restricted within sexes (i.e., female vs. female controls or male vs. male controls) to eliminate potential confounding effects of sex hormones. Such experimental outcomes would only demonstrate the antiviral function of Cyp17a2 itself rather than directly elucidate mechanisms underlying sexual dimorphism, which diverges from the central objective of this study.

      We fully agree with your perspective and have accordingly refined relevant discussions in the revised manuscript. Our conclusions now emphasize that "cyp17a2 is one of the factors contributing to sex-based differences in antiviral immunity" rather than implying that it "solely mediates the entire phenotypic divergence." These modifications have been incorporated into the resubmitted version (Lines 112-115).

      (2) The authors present data indicating an unexpected link between cyp17a2 and ubiquitination pathways. It is unclear how a CYP450 family member would carry out such activities, and this warrants much more attention. One brief paragraph in the discussion (starting at line 448) mentions previous implications of CYP450 proteins in antiviral immunity, but given that most of the data presented in the paper attempt to characterize cyp17a2 as a direct interactor of ubiquitination factors, more discussion in the text should be devoted to this topic. For example, are there any known domains in this protein that make sense in this context? Discussion of this interface is more relevant to the study than the general overview of sexual dimorphism that is currently highlighted in the discussion and throughout the text.

      We are grateful to the reviewer for their suggestion to elaborate on this novel finding. The discussion on this point has been expanded significantly (Lines 448-460). It is acknowledged that Cyp17a2 is devoid of the canonical domains that are typically associated with the ubiquitination machinery (e.g., RING, U-box). The present study proposes that the endoplasmic reticulum (ER) localization of Cyp17a2, in conjunction with its capacity to function as a scaffold protein, is of paramount significance. By residing in the ER, Cyp17a2 is strategically positioned to interact with key immune regulators such as STING, which also localizes to the ER. It is hypothesized that Cyp17a2 facilitates the recruitment of E3 ligases (btr32) and deubiquitinases (USP8) to their substrates (STING and SVCV P protein, respectively) by providing a platform for protein-protein interactions, rather than directly catalyzing ubiquitination. This noncanonical, scaffolding role for a cytochrome P450 (CYP450) enzyme represents an exciting evolutionary adaptation in teleost immunity.

      (3) Figures 2-9 contain information that could be streamlined to highlight the main points the authors hope to make through a combination of editing, removal, and movement to supplemental materials. There is a consistent lack of clarity in these figures that could be improved by supplementing them with more text to accompany the supplemental figures. Using Figure 2 as an example, panel (A) could be removed as unnecessary, panel (B) could be exchanged for a volcano plot with examples highlighting why cyp17a2 was selected for further study and also the full dataset could be shared in a supplemental table, panel (C) could be modified to indicate why that particular subset was chosen for plotting along with an explanation of the scaling, panel (D) could be moved to supplemental because the point is redundant with panels (A) and (C), panel (E) could be presented as a heatmap, in panels (G) and (H) data from EPC cells could be moved to supplemental because it is not central to the phenotype under investigation, panels (J) to (L) and (N) to (P) could be moved to supplemental because they are redundant with the main points made in panels (M) and (Q). Similar considerations could be made with Figures 3-9.

      We thank the reviewer for these excellent suggestions to improve the clarity and focus of our figures. A comprehensive review of all figures has been conducted in accordance with the recommendations made. Figure 2A has been removed. Figure 2B (revised Figure 2A) has been replaced with a volcano plot highlighting cyp17a2 and the full dataset has been provided as supplementary Table S2. Figure 2C (revised Figure 2B) is now a heatmap with eight sex-related genes and an explanation of the scaling has been added to the revised figure legends. Several panels (D, G, H, J-L, N-P) have been moved to the supplementary information (now Figure S1). Figure 2E has been presented as a heatmap. The same approach to streamlining has been applied to Figures 3-9, with confirmatory or secondary data being moved to supplements in order to better emphasize the main conclusions. The figure legends and main text have been updated accordingly.

      (4) The data in Figure 3 (A)-(C) do not seem to match the description in the text. That is, the authors state that cyp17a2 overexpression increases interferon signaling activity in cells, but the figure shows higher increases in vector controls. Additionally, the data in panel (H) are not described. What genes were selected and why, and where are the data on the rest of the genes from this analysis? This should be shared in a supplemental table.

      We apologize for the lack of clarity. In Figures 3A-C, the vector control shows baseline activation due to the stimulants (poly I:C/SVCV), but the fold-increase is significantly greater in the Cyp17a2-overexpressing groups. We have re-plotted the data to more clearly represent the stimulant-induced activation over baseline and added statistical comparisons between the Vector and Cyp17a2 groups under each condition to highlight the enhancing effect of Cyp17a2. For Figure 3H (revised Figure 3F), the heatmap shows a curated set of IFN-stimulated genes (ISGs) most significantly regulated by Cyp17a2 based on our RNA-seq analysis. We have added a description in the revised figure legend and in the results section (Lines 837-840). The full list of differentially expressed genes from this analysis is now provided in Supplementary Table S3.

      (5) Some of the reagents described in the methods do not have cited support for the applications used in the study. For example, the antibody for TRIM11 (line 624, data in Figures 6 & 7) was generated for targeting the human protein. Validation for use of this reagent in zebrafish should be presented or cited. Furthermore, the accepted zebrafish nomenclature for this gene would be preferred throughout the text, which is bloodthirsty-related gene family, member 32.

      We thank the reviewer for raising this important point regarding reagent specificity. To address the concern about antibody validation in zebrafish, we performed the following verification steps. First, we aligned the antigenic sequence targeted by the ABclonal btr32 antibody (ABclonal, A13887) with orthologous sequences from zebrafish, which showed 45% protein sequence similarity (Author response image 1). More importantly, we conducted experimental validation by expressing Myc-tagged btr32 in EPC cells. Both the anti-Myc and the anti-btr32 antibodies detected a protein band at the same molecular weight. Furthermore, when a btr32-specific knockdown plasmid was introduced, the band recognized by the anti-btr32 antibody was significantly reduced (Author response image 2). These results support the specificity of the antibody in recognizing fish btr32. In accordance with the reviewer’s suggestion, we have also updated the gene nomenclature to “bloodthirsty-related gene family, member 32 (btr32)” throughout the manuscript.

      Author response image 1.

      Author response image 2.

      Reviewer #2 (Public review):

      Weaknesses:

      (1) Colocalization analyses (Figures 4G, 6I, 9D) require quantitative metrics (e.g., Pearson's coefficients) rather than representative images alone.

      We concur with the reviewer's assessment. We have now performed quantitative colocalization analysis (Pearson's coefficients) for all indicated figures (4G, 6I, 9D). The quantitative results are now presented within the figures themselves and described in the revised figure legends.
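
For readers unfamiliar with the metric, Pearson's colocalization coefficient is the pixel-wise correlation between the intensities of two fluorescence channels. The sketch below shows the calculation on two small hypothetical intensity grids; it is a generic illustration, not the authors' image-analysis pipeline, and it omits steps such as background subtraction and region-of-interest selection that real analyses usually include.

```python
import math


def pearson_colocalization(channel_a, channel_b):
    """Pixel-wise Pearson correlation between two same-sized 2D images."""
    a = [pixel for row in channel_a for pixel in row]
    b = [pixel for row in channel_b for pixel in row]
    if not a or len(a) != len(b):
        raise ValueError("channels must be non-empty and the same size")
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        raise ValueError("coefficient undefined for a constant channel")
    return cov / math.sqrt(var_a * var_b)


# Hypothetical 2x2 intensity grids for two channels.
red = [[0, 10], [20, 30]]
green = [[1, 21], [41, 61]]  # linearly related to red, so r is close to 1
r = pearson_colocalization(red, green)
```

Values near 1 indicate that the two signals rise and fall together across pixels, values near 0 indicate no linear relationship, and negative values indicate mutual exclusion.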

      (2) Figure 1 survival curves need annotated statistical tests (e.g., "Log-rank test, p=X.XX")

      The survival curves have now been annotated with the specific p-values from the Log-rank (Mantel-Cox) test (see revised Figures 1A, 2E).
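
As background on the annotated statistic, the sketch below implements a two-group log-rank test in plain Python, using the standard hypergeometric variance and a chi-square distribution with one degree of freedom. The survival times are hypothetical and the function is an illustration only; the authors presumably used a statistics package, which is not specified in the response.

```python
import math


def log_rank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi_square, p_value).

    times_*: follow-up times; events_*: True for death, False for censored.
    """
    event_times = sorted(
        {t for t, e in zip(times_a, events_a) if e}
        | {t for t, e in zip(times_b, events_b) if e}
    )
    observed_a = expected_a = variance = 0.0
    for t in event_times:
        at_risk_a = sum(1 for x in times_a if x >= t)
        at_risk_b = sum(1 for x in times_b if x >= t)
        deaths_a = sum(1 for x, e in zip(times_a, events_a) if e and x == t)
        deaths_b = sum(1 for x, e in zip(times_b, events_b) if e and x == t)
        n, d = at_risk_a + at_risk_b, deaths_a + deaths_b
        observed_a += deaths_a
        expected_a += d * at_risk_a / n
        if n > 1:  # variance term is zero when a single subject remains
            variance += at_risk_a * at_risk_b * d * (n - d) / (n * n * (n - 1))
    if variance == 0:
        raise ValueError("test undefined: no variance in event allocation")
    chi_square = (observed_a - expected_a) ** 2 / variance
    # Survival function of chi-square with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi_square / 2))
    return chi_square, p_value


# Hypothetical days-to-death for two groups, with no censoring.
chi_square, p_value = log_rank_test(
    [1, 2, 3, 4, 5], [True] * 5, [6, 7, 8, 9, 10], [True] * 5
)
```

Reporting the exact p-value from such a test on each panel, as the reviewer requested, lets readers judge the strength of the survival difference directly.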

      (3) Figure 2P GSEA should report exact FDR-adjusted *p*-values (not just "*p*<0.05").

      Figure 2P (revised Figure S1J) has been updated to include the exact FDR p-values for the presented GSEA plots.
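
To illustrate what an FDR-adjusted p-value is, the sketch below applies the Benjamini-Hochberg procedure to a hypothetical list of nominal p-values. This is a generic adjustment shown for orientation only: GSEA software typically derives its FDR q-values from permutation-based null distributions, so the values in the revised figure need not match a Benjamini-Hochberg calculation.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


# Hypothetical nominal p-values for four gene sets.
adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
```

Each adjusted value is the smallest false discovery rate at which the corresponding gene set would be called significant, which is why exact values are more informative than a threshold statement.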

      (4) Section 2 overextends on teleost sex-determination diversity, condensing to emphasize relevance to immune dimorphism would strengthen narrative cohesion.

      The section on teleost sex-determination diversity in the Discussion (lines 357-365) has been condensed, with a more direct focus on how this diversity provides a unique context for studying immune dimorphism independent of canonical sex chromosomes, as exemplified by the zebrafish model.

      (5) Limited discussion on whether this mechanism extends beyond Cyprinidae and its implications for teleost adaptation.

      The discussion has been expanded (lines 375-386) to address the potential conservation of this mechanism. It is acknowledged that cyp17a2 is a teleost-specific gene, and it is hypothesized that its function in antiviral immunity may signify an adaptive innovation within this extensively diverse vertebrate group. It is suggested that further research in other teleost families will be essential to ascertain the broader evolutionary significance of the present findings.

      Reviewer #2 (Recommendations for the authors):

      (1) Expand the Discussion to address why teleosts may have evolved male-biased immunity. Consider: pathogen pressure differentials in aquatic vs. terrestrial environments; trade-offs between immune investment and reproductive strategies (e.g., male-male competition); comparative advantages in external fertilization systems.

      We have expanded the discussion on lines 412-430 to address the potential conservation of this mechanism. We note that Cyp17a2 is a teleost-specific gene and speculate that its role in antiviral immunity represents an adaptive innovation within this highly diverse group of vertebrates. We propose that future studies of other teleost families are crucial for determining the broader evolutionary significance of our findings.

    1. eLife Assessment

      This important study reports the development of the first tankyrase degrader and demonstrates its enhanced ability to inhibit β-catenin signaling compared to conventional tankyrase inhibitors. The evidence supporting the conclusions is comprehensive and convincing, based on rigorous biochemical and cellular analyses. The findings will be of broad interest to researchers studying Wnt signaling, protein degradation, and cancer biology.

    2. Reviewer #1 (Public review):

      Summary:

      This manuscript reports the discovery and characterization of the first bifunctional degrader of tankyrase. Notably, the tankyrase degrader exhibits stronger β-catenin inhibition and tumor growth suppression compared to conventional tankyrase inhibitors. Mechanistically, while tankyrase inhibitors stabilize tankyrase and promote Axin puncta formation - thereby impairing β-catenin degradation - the degrader avoids this effect, resulting in deeper suppression of β-catenin signaling. These findings suggest that targeted degradation of tankyrase offers a novel therapeutic strategy for β-catenin-driven cancers. Overall, this is a compelling study with significant translational potential.

      Strengths:

      (1) The manuscript presents a rigorous and well-executed study on a timely and impactful topic.

      (2) The biochemical and cellular characterization of the tankyrase degrader is thorough, and the comparative analysis with tankyrase inhibitors is insightful.

      (3) The finding that tankyrase stabilization by inhibitors may interfere with Axin function is novel and significant. It aligns with earlier observations (e.g., Huang 2009) that transient tankyrase overexpression can stabilize β-catenin independently of PAR domain activity.

      (4) The use of TNKS1/2 knockout cells expressing catalytically inactive tankyrase to demonstrate β-catenin inhibitory activity of the tankyrase degrader is elegant.

      (5) The finding that the tankyrase degrader has superior anti-proliferative effects in colorectal cancer models has important therapeutic implications.

      Weaknesses:

      (1) A key caveat is that the identified tankyrase degrader also targets GSPT1 for degradation. This raises the possibility that GSPT1 degradation may contribute to the observed β-catenin and tumor growth inhibition.

      (2) The authors address this concern reasonably by showing that DLD1 cells resistant to GSPT1 degradation remain sensitive to the tankyrase degrader.

      (3) To further strengthen this point, the authors might consider generating TNKS1/2 double knockout cells (e.g., in DLD1 or SW480 backgrounds) and demonstrating that the degrader loses its growth-inhibitory effect in these models. However, given the technical challenges of creating double knockouts in cancer cell lines, such experiments could be considered optional.

    3. Reviewer #2 (Public review):

      Summary:

      The ADP-ribosyltransferase tankyrase controls many biological processes, many of which are relevant to human disease. This includes Wnt/beta-catenin signalling, which is dysregulated in many cancers, most notably colorectal cancer. Tankyrase is a positive regulator of Wnt/beta-catenin signalling in that it counters the activity of the beta-catenin destruction complex (DC). Catalytic inhibition of tankyrase not only blocks PAR-dependent ubiquitylation and degradation of AXIN1/2, the central scaffolding protein in the DC, but also tankyrase itself. As a result, blocking tankyrase gives rise to tankyrase accumulation, which may accentuate its non-catalytic functions, which have been proposed to drive Wnt/beta-catenin signalling. Most tankyrase catalytic inhibitors have shown limited efficacy and substantial toxicity in vivo. By developing tankyrase-directed PROTACs, the authors aim to block both catalytic and non-catalytic functions of tankyrase, aspiring to achieve a more complete inhibition of Wnt/beta-catenin signalling. The successfully developed PROTAC, based on the existing catalytic inhibitor IWR1, IWR1-POMA, induces the degradation of both TNKS and TNKS2, blocks beta-catenin-dependent transcription without stabilising the DC in puncta/degradasomes, and inhibits cancer cell growth in vitro. Mechanistically, this points to a scaffolding role of tankyrase in the DC, at least under conditions of tankyrase catalytic inhibition, in line with previous proposals.

      Strengths:

      The study clearly illustrates the incentive for developing a tankyrase degrader, namely, to abolish both catalytic and non-catalytic functions of tankyrase. By and large, the study achieves these ambitions, and the findings support the main conclusions, although the statement that a more complete inhibition of the pathway is achieved requires corroboration. The proteomics studies are powerful. IWR1-POMA constitutes a very useful tool to re-evaluate targeting of tankyrase in oncogenic Wnt/beta-catenin signalling. The paired compounds will benefit investigations of tankyrase scaffolding functions across many different biological systems controlled by tankyrase. The findings are exciting.

      Weaknesses:

      Although the results are promising and mostly compelling, the claim that the PROTACs provide "a deeper suppression of the WNT/β-catenin pathway activity" requires further corroboration, particularly at endogenous tankyrase levels.

      There are also some other points that, if considered, would further improve the manuscript, as detailed below.

      (1) Abstract and line 62: Many catalytic tankyrase inhibitors tend to display toxicity, which is likely on-target (e.g., 10.1177/0192623315621192; 10.1158/0008-5472). This constitutes the main limiting factor for these compounds. An incomplete inhibition of Wnt/beta-catenin signalling may contribute to the challenges, but this does not appear to be the dominant problem. A more prominent introduction to this important challenge is probably expected by the field.

      (2) The authors do a good job in setting the scene for the need for tankyrase degraders. Their observations relating to the formation of puncta (degradasomes) being tankyrase-dependent are compatible with a previous study by Martino-Echarri et al. 2016 (10.1371/journal.pone.0150484): simultaneous silencing of TNKS and TNKS2 by RNAi abolishes degradasome formation. The paper is cited as reference 17, but only in passing, and deserves more prominence. (It includes an entire paragraph titled "Expression of tankyrases 1 and 2 is required for TNKSi-induced formation of axin puncta").

      (3) Moreover, the scaffolding concept has been discussed comprehensively in other studies: 10.1111/bph.14038 and more recently 10.1042/BCJ20230230. There are also a few studies that focus on targeting the ankyrin repeat clusters of tankyrase to disengage substrates (10.1038/s41598-020-69229-y; 10.1038/s41598-019-55240-5) that illustrate the concept of blocking the scaffolding function. In that sense, the hypotheses are mature, and it is interesting to see some of them supported in this study. The authors could improve how they set their work into the context of these other efforts and proposals.

      (4) In several places in the manuscript, the DC is referred to as "biomolecular condensate", at times even as a "classic example", implying that it operates through phase separation. This has not been demonstrated. In fact, super-resolution microscopy indicates that the puncta are not droplet-like (10.7554/eLife.08022), which would argue against the condensate hypothesis.

      (5) It is beautiful to be able to use IWR1 and IWR1-POMA at identical concentrations for direct comparisons. However, this requires the two compounds to bind to tankyrase similarly well and reach the target to a comparable extent. How sure are authors that target engagement is comparable? Has this been evaluated?

      (6) Figure 1F: It is not immediately apparent how IWR1-POMA shows more complete containment of Wnt/beta-catenin signalling. Most Wnt/beta-catenin targets lie close to the perfect diagonal, so I do not see how the statement "that IWR1-POMA controlled WNT/β-catenin signaling more effectively than IWR1" (in the legend of Figure 1F) is supported. Minimally, an expanded explanation would benefit the reader. Providing the colour-coding legend directly in the figure would help improve clarity. Also, the panel is very small and may benefit from a different presentation in the figure.

      (7) Figure 2: The conclusion of a "deeper suppression" of signalling relies on overexpression of tankyrase in an otherwise tankyrase-null background. Have the authors attempted to measure reporter activity or endogenous gene expression without tankyrase overexpression, in Wnt3a-stimulated cells (in the context of a normal Wnt/beta-catenin pathway) or CRC cells at the basal level? Non-catalytic activity in a similar assay has previously been observed upon tankyrase overexpression (10.1016/j.molcel.2016.06.019). Whether or not there is a substantial scaffolding effect at endogenous tankyrase levels after tankyrase inhibition remains unconfirmed, and the PROTAC is a valuable tool to address this important question. The findings presented in Figure S7C and D go some way towards answering this question - these data could be presented more prominently, and similar assays could be performed in other cell systems.

      (8) Line 237/238: "TNKS accumulation negatively impacts the catalytic activity of the DC (Figure 5D)" - the data do not show this. Beta-catenin levels are a surrogate readout for DC function (phosphorylation and ubiquitylation). Minimally, this requires rewording, with reference to beta-catenin levels.

      (9) Line 303-304: Beta-catenin is thought to exchange at beta-catenin degradasomes; this is clear from previous FRAP assays and the observation that phospho-beta-catenin accumulates in degradasomes upon proteasome inhibition (10.1158/1541-7786.MCR-15-0125). However, degradasome size hasn't, to my knowledge, been related to activity. Can this be clarified, please?

      (10) There are previous hypotheses/proposals that the sensitivity of CRC cells to tankyrase inhibition correlates with APC truncation or PIK3CA status (10.1158/1535-7163.MCT-16-0578; 10.1038/s41416-023-02484-8). Have the authors considered expanding their cell line panel (Figure S7) to sample a wider range of cell lines, including some that are wild-type with regard to APC or Wnt/beta-catenin signalling in general? This would be a valuable addition to the work. Quantitated colony formation data could be moved to the main body of the manuscript.

      (11) The manuscript only mentions toxicity (i.e., therapeutic window) in the last sentence of the Discussion section. As this is THE main challenge with tankyrase inhibitors (as mentioned above), can the authors expand their discussion of this aspect? Is there an expectation that PROTACs may be less toxic?

      (12) Figures 3, 4, 5A: For fluorescence microscopy experiments, can these be quantified, and can repeat data be included?

      (13) Figure 4, S6: An additional channel illustrating the distribution of cells (e.g., nuclei, cytoskeleton, or membrane) would be helpful for orientation and context for the AXIN1 signal.

      (14) How were cytosolic fractions of cells prepared to assess cytosolic beta-catenin levels? This detail is missing from the methods.

    4. Reviewer #3 (Public review):

      In this manuscript, Wang et al. employ a chemical biology approach to investigate the differences between the enzymatic and scaffolding roles of tankyrase during Wnt/β-catenin signalling. It was previously established that, in addition to its enzymatic activity, tankyrase 1/2 also plays a scaffolding function within the destruction complex, a property conferred by SAM-domain-dependent polymerization (PMID: 27494558). It is also known that TNKS1/2 is an autoregulated protein and that its enzymatic inhibition leads to accumulation of total TNKS proteins and stabilization of Axin puncta (through the scaffolding function of TNKS1/2), leading to rigidification of the DC and decreased β-catenin turnover. The authors surmised that this could, in part, explain the limited efficacy of TNKS1/2 catalytic inhibition for the treatment of colorectal cancers. To test this hypothesis, they evaluated a series of PROTAC molecules promoting the degradation of TNKS1/2 to block both the catalytic and scaffolding activities. They show that IWR1-POMA (their most active molecule) promotes more efficient suppression of beta-catenin-mediated transcription and is more active in inhibiting the growth of colorectal cancer cells and CRC patient-derived organoids. Mechanistically, the authors used FRAP to demonstrate that catalytic inhibitors of TNKS led to a reduced dynamic assembly of the DC (rigidification), whereas IWR1-POMA did not affect the dynamics.

      Overall, this is an interesting study describing the design and development of a PROTAC for TNKS1/2 that could have increased efficacy where catalytic inhibitors have displayed limited activity. Knowing the importance of the scaffolding role of TNKS1/2 within the destruction complex, targeting both the catalytic and scaffolding roles certainly makes sense. The manuscript contains convincing evidence of the different mechanisms of the PROTAC vs catalytic inhibitors. Some additional efforts to quantify several of the experiments and to indicate the reproducibility and statistical analysis would strengthen the manuscript. Ultimately, it would have been great to evaluate the in vivo efficacy of IWR1-POMA in an in vivo CRC assay (APCmin mice or using PDX models); however, I realize that this is likely beyond the scope of this manuscript.

      I have some recommendations listed below for consideration by the authors to strengthen their study:

      (1) The title is slightly misleading, as it is already known that the scaffolding function of TNKS is important within the DC. The authors should consider incorporating the PROTAC targeting aspect in the title (e.g., PROTAC-mediated targeting of tankyrase leads to increased inhibition of betacat signaling and CRC growth inhibition).

      (2) The authors should comment in the manuscript on the bell-shaped curve obtained with treatment of cells with the PROTACs (Figure S2C). This likely indicates titration of the targets within a bifunctional molecule with increasing concentration (and likely reveals the auto-inhibition conferred by the catalytic inhibition alone).

      (3) The authors comment that using G007-LK as a warhead was unsuccessful, but they do not show data. Do the authors know why this was the case?

      (4) Throughout the manuscript, the authors need to do a better job at quantifying their results (i.e., the western blots and the IF). For example, the degradation of TNKS1/2 in Figure 1D is not overly convincing. Similarly, the IF data in Figure 3 needs to be quantified in some ways. Along the same lines, the effect of IWR1-POMA treatments on the proliferation of cells and organoids should be quantified using viability assays... There is also no indication of how many times these experiments were performed and whether the blots shown are representative experiments. The quantification should include all experiments.

    5. Author response:

      Reviewer #1 (Public Review):

      We thank the Reviewer for the favorable feedback. The major concern is the collateral degradation of GSPT1. As the Reviewer noted, IWR1-POMA was able to suppress colony formation in DLD-1 cells resistant to GSPT1/2 degrader, suggesting that TNKS but not GSPT degradation is responsible for growth inhibition.

      We also appreciate that the Reviewer brought to our attention an important early observation of the TNKS scaffolding effects. Cong reported in 2009 that overexpression of TNKS induced AXIN puncta formation in a SAM but not PARP domain-dependent manner (PMID 19759537). We will include this information in the revised manuscript.

      Reviewer #2 (Public Review):

      We thank the Reviewer for the encouraging and insightful comments. The major critique concerns whether TNKS degraders can suppress WNT/β-catenin signaling more effectively than TNKS inhibitors at endogenous TNKS levels. Fig. 1D shows that IWR1-POMA reduced the level of cytosolic β-catenin more effectively than IWR1 in Wnt3A-stimulated HEK293 cells without protein overexpression, and Fig. S7B shows that IWR1-POMA reduced STF signals more effectively than IWR1 in DLD-1 and SW480 cells with endogenous TNKS expression. We will corroborate these findings with additional cell lines during the revision.

      (1) We agree with the Reviewer that on-target toxicities pose challenges to the development of WNT inhibitors. For example, LGK974, which inhibits PORCN to prevent the secretion of all WNT proteins, showed significant on-target toxicity in humans (PMC10020809), and G007-LK, which inhibits TNKS to block canonical WNT signaling selectively, exhibited weak efficacy and dose-limiting toxicity at 5‒30 mg/kg BID or 10‒60 mg/kg QD in various mouse xenograft models (PMID: 23539443). Similarly, G-631, another TNKS inhibitor, also showed dose-limiting toxicity without significant efficacy at 25‒100 mg/kg QD in mice (PMID: 26692561). However, G007-LK was well tolerated at 200 mg/kg QD over 3 weeks in mice in another study (PMC5759193). Treating mice with G007-LK at 10 mg/kg QD over 6 months also improved glucose tolerance without notable toxicity (PMID 26631215). Importantly, constitutive silencing of both TNKS for 150 days in APC-null mice prevented tumorigenesis without damaging the intestines (PMC6774804). Furthermore, basroparib, a selective TNKS inhibitor, was well tolerated in a recent clinical trial (PMC12498271). We are therefore cautiously optimistic that TNKS degraders will have an improved therapeutic index compared with TNKS inhibitors.

      (2) We agree with the Reviewer that Henderson's 2016 paper (PMC4773256) shed important light on the role of TNKS scaffolding in the DC. However, whereas this study demonstrated that knocking down both TNKS by siRNA prevented G007-LK from inducing AXIN puncta, the functional role of TNKS scaffolding in the DC remained unaddressed. We will include a more detailed description of Henderson's discovery.

      (3) Indeed, Guettler demonstrated that TNKS scaffolding could promote WNT/β-catenin signaling in 2016, which forms the basis of the current work. Meanwhile, whereas there have been efforts to target the SAM or ARC domain to address TNKS scaffolding, our approach of targeting TNKS for degradation is complementary. We will provide a more detailed discussion of these studies.

      (4) Biomolecular condensates are membraneless cellular compartments formed by phase separation of biomolecules, regardless of the physical/material properties (PMID: 28935776 and PMC7434221). Super-resolution microscopy studies by Peifer and Stenmark (PMC4568445 and PMID 26124443) showed that AXIN, APC, TNKS, and β-catenin interacted with each other to assemble into membraneless complexes, wherein AXIN and APC formed filaments throughout the DC. Peifer has also summarized evidence that supports the condensate nature of the DC (PMC6386181). However, we acknowledge that testing the physical properties of reconstituted DC (PMC8403986) will provide a better understanding of the nature, for example liquid vs. gel, of these condensates.

      (5) We will evaluate the ability of IWR1 and IWR1-POMA to engage TNKS.

      (6) We will modify Fig. 1F to improve clarity and readability.

      (7) Fig. S7B shows that IWR1-POMA suppressed WNT/β-catenin signaling more effectively than IWR1 in APC-mut DLD-1 and SW480 CRC cells without TNKS overexpression. Similarly, Fig. S6B shows that IWR1-POMA provided a deeper suppression of STF signals in HeLa cells transfected with AXIN1 and β-catenin while expressing endogenous TNKS. These results provide evidence that inhibitor-induced TNKS scaffolding plays a significant role at endogenous TNKS expression levels. Separately, we will reorganize the figures to better present Fig. S7C and D as suggested by the Reviewer.

      (8) We will rephrase "TNKS accumulation negatively impacts the catalytic activity of the DC".

      (9) We apologize for confusing β-catenin phosphorylation with β-catenin abundance. Here, we refer to the catalytic activity of the DC as the ability of the DC to promote β-catenin degradation rather than the kinetics of β-catenin phosphorylation and ubiquitination. It is commonly observed that AXIN stabilization by TNKS inhibitors increases the DC size and reduces the β-catenin levels. Peifer has also noted that APC can increase the size and the "effective activity" of the DC (PMC5912785 and PMC4568445). As such, the induction of AXIN puncta by TNKS inhibitors is frequently used as an indicator of WNT/β-catenin pathway inhibition. However, because the DC only primes β-catenin but does not catalyze its degradation, we will revise our manuscript to improve accuracy and clarity.

      (10) We will examine the effects of IWR1 and IWR1-POMA in additional cell lines, quantify the colony formation data, and reorganize the figures.

      (11) As discussed above, evidence for on-target toxicity of WNT/β-catenin inhibition is mixed. Yet, the observation of no dose-limiting toxicity for basroparib at doses up to 360 mg QD in humans (PMC12498271) is encouraging. PROTACs work by catalyzing target degradation, which is different from traditional catalytic inhibitors that require continuous target occupancy at a high level. Because IWR1-POMA has a durable effect on TNKS, we expect that a fully optimized TNKS degrader may allow less frequent dosing than basroparib and consequently an even more favorable therapeutic window.

      (12/13) We will include quantification data, replicate information, and nuclei staining or cell outlines for the fluorescence microscopy experiments.

      (14) Cytosolic fractions of cells were prepared using a commercial cytoplasmic extraction kit following the manufacturer's instructions. We will include detailed information in the revised manuscript.

      Reviewer #3 (Public Review):

      We thank the Reviewer for the helpful suggestions.

      (1) We will modify the title to include the PROTAC aspect.

      (2) As the Reviewer suggested, the bell-shaped dose response of the PROTAC originated from the formation of saturated binary complexes. At high PROTAC concentrations, binding of TNKS and CRBN/VHL by separate PROTAC molecules impedes the formation of productive ternary complexes, which results in reduced degradation efficacy and consequently the hook effect.

      (3) The structure-activity relationship of PROTACs is often unpredictable, as both the kinetics and thermodynamics of the target and E3 ligase binding play crucial roles. The lack of translation in degradation efficacy from IWR1 to G007-LK derived PROTACs may originate from differences in the binding kinetics or subtle changes in the orientation of the linker exit vector. We will include data on G007-LK in the revised manuscript.

      (4) We will quantify the Western blots, immunofluorescence images, and colony formation data, and provide the replicate information.
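
      The hook effect described in response (2) can be sketched with a deliberately simplified equilibrium model. The assumptions here are ours, not the authors': the two ends of the bifunctional molecule bind independently (no cooperativity), and both proteins are scarce relative to the dissociation constants, so the relative ternary-complex level is L / ((Kd_target + L)(Kd_ligase + L)). The function name and the Kd values are hypothetical.

```python
import math

def ternary_complex_signal(ligand, kd_target, kd_ligase):
    """Relative ternary-complex level for a non-cooperative bifunctional ligand.

    Assumes both proteins are present at concentrations well below their Kd,
    so the free ligand concentration is approximately the total concentration.
    """
    return ligand / ((kd_target + ligand) * (kd_ligase + ligand))

KD_TARGET = 0.1  # µM, hypothetical affinity for the target protein
KD_LIGASE = 2.0  # µM, hypothetical affinity for the E3 ligase

# Log-spaced concentrations from 1e-4 to 1e3 µM, four points per decade.
concentrations = [10 ** (exp / 4) for exp in range(-16, 13)]
signals = [ternary_complex_signal(c, KD_TARGET, KD_LIGASE) for c in concentrations]

# Concentration on the grid that gives the highest ternary-complex level.
peak_conc = concentrations[signals.index(max(signals))]
# In this simplified model the optimum sits at the geometric mean of the two Kd values.
predicted_peak = math.sqrt(KD_TARGET * KD_LIGASE)
```

      In this model the signal rises with concentration, peaks near the geometric mean of the two Kd values, and then falls as separate molecules saturate the target and the ligase, which reproduces the bell shape qualitatively.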

    1. eLife Assessment

      This important study provides a theoretical framework for quantifying privacy risk from publicly shared genome-wide association summary statistics. The findings reveal the conditions under which genotype reconstruction may become feasible, challenging long-held assumptions about personal data safety. While the evidence is solid, supported by clear mathematical derivations and simulations, validation on large empirical datasets would further strengthen the claims.

    2. Reviewer #1 (Public review):

      Summary:

      The authors aim to demonstrate that GWAS summary statistics, previously considered safe for open sharing, can, under certain conditions, be used to recover individual-level genotypes when combined with large numbers of high-dimensional phenotypes. By reformulating the GWAS linear model as a system of linear programming constraints, they identify a critical phenotype-to-sample size ratio (R/N) above which genotype reconstruction becomes theoretically feasible.
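
      The intuition behind a critical R/N ratio can be shown with a toy example. This is a minimal sketch under strong assumptions of our own, and it is not the authors' linear-programming formulation: the attacker is assumed to know every phenotype value and, for each phenotype, the cross-product between that phenotype and one SNP's genotype vector; phenotypes are independent and noiseless; and R equals N, so a plain linear solve suffices. All names and values are hypothetical.

```python
import random

def solve_linear_system(matrix, rhs):
    """Solve a square linear system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular system: phenotypes are not independent")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back substitution on the upper-triangular system.
    solution = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * solution[c] for c in range(r + 1, n))
        solution[r] = (aug[r][n] - tail) / aug[r][r]
    return solution

random.seed(0)
N = 6  # individuals
R = N  # phenotypes; R/N = 1 makes the system square
true_genotypes = [random.choice([0, 1, 2]) for _ in range(N)]
# phenotypes[r][i] is the value of phenotype r for individual i.
phenotypes = [[random.gauss(0, 1) for _ in range(N)] for _ in range(R)]
# Assumed summary statistic: one genotype-phenotype cross-product per phenotype.
cross_products = [sum(y * g for y, g in zip(row, true_genotypes)) for row in phenotypes]
# Each phenotype contributes one linear equation in the N unknown genotypes.
recovered = [round(x) for x in solve_linear_system(phenotypes, cross_products)]
```

      Each phenotype adds one equation in N unknowns, so once the number of independent phenotypes reaches the number of individuals the genotypes are pinned down. Correlated or noisy phenotypes reduce the number of effectively independent equations.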

      Strengths:

      There is conceptual originality and mathematical clarity. The authors establish a fundamental quantitative relationship between data dimensionality and privacy leakage and validate their theory through well-designed simulations and application to the GTEx dataset. The derivation is rigorous, the implementation reproducible, and the work provides a formal framework for assessing privacy risks in genomic research.

      Weaknesses:

      The study makes the simplifying assumptions that phenotypes are independent, which is not realistic, and are measured without noise. Real-world data are highly correlated across different levels, not only genotype but also multi-omics, so the idealized setting may overstate recovery potential. The empirical evidence, while illustrative, is limited to small-scale data and idealized conditions; thus, the full practical impact remains to be demonstrated. The GTEx analysis used only whole blood eQTL data from 369 individuals, which cannot capture the complexity, sample heterogeneity, or cross-tissue dependencies typical of biobank-scale studies.

    3. Reviewer #2 (Public review):

      Summary:

      This study focuses on the genomic privacy risks associated with Genome-Wide Association Study (GWAS) summary statistics, employing a three-tiered demonstration framework of "theoretical derivation - simulation experiments - real-data validation". The research finds that when GWAS summary statistics are combined with high-dimensional phenotypic data, genotype recovery and individual re-identification can be achieved using linear programming methods. It further identifies key influencing factors such as the effective phenotype-to-sample size ratio (R/N) and minor allele frequency (MAF). These findings provide practical reference for improving data governance policies in genomic research, holding certain real-world significance.

      Strengths:

      This study integrates theoretical analysis, simulation validation, and the application of real-world datasets to construct a comprehensive research framework, which is conducive to understanding and mitigating the risk of private information leakage in genomic research.

      Weaknesses:

      (1) Limited scope of variant types covered:

      The analysis is conducted solely on Single Nucleotide Polymorphisms (SNPs), omitting other crucial genomic variant types such as Copy Number Variations (CNVs), Insertions/Deletions (InDels), and chromosomal translocations/inversions. From a genomic structure perspective, variants like CNVs and InDels are also core components of individual genetic characteristics, and in some disease-related studies, association signals for these variants can be even more significant than those for SNPs. From the perspective of privacy risk logic, the genotypes of these variants (e.g., copy number for CNVs, base insertion/deletion status for InDels) can also be quantified and could theoretically be inferred backwards using the combination of "summary statistics + high-dimensional phenotypes". Their privacy leakage risks might differ from those of SNPs (for instance, rare CNVs might be more easily re-identified due to higher genetic specificity).

      (2) Bias in data applicability scope:

      Both the simulation experiments and real-data validation in the study primarily rely on European population samples (e.g., 489 European samples from the 1000 Genomes Project; the genetic background of whole blood tissue samples from the GTEx project is not explicitly mentioned regarding non-European proportions). It only briefly notes a higher risk for African populations in the individual re-identification risk assessment, without conducting systematic analyses for other populations, such as East Asian, South Asian, or admixed American populations. Significant differences in genetic structure (e.g., MAF distribution, linkage disequilibrium patterns) exist across different populations. This may result in the R/N threshold and the relationship between MAF and recovery accuracy identified in the study not being fully applicable to other populations.

      Hence, addressing the aforementioned issues through supplementary work would enhance the study's scientific rigor and application value, potentially providing more comprehensive theoretical and technical support for "privacy protection" in genomic data sharing.

    4. Author response:

      Reviewer #1 (Public Review):

      Summary:

      The authors aim to demonstrate that GWAS summary statistics, previously considered safe for open sharing, can, under certain conditions, be used to recover individual-level genotypes when combined with large numbers of high-dimensional phenotypes. By reformulating the GWAS linear model as a system of linear programming constraints, they identify a critical phenotype-to-sample size ratio (R/N) above which genotype reconstruction becomes theoretically feasible.

      Strengths:

      There is conceptual originality and mathematical clarity. The authors establish a fundamental quantitative relationship between data dimensionality and privacy leakage and validate their theory through well-designed simulations and application to the GTEx dataset. The derivation is rigorous, the implementation reproducible, and the work provides a formal framework for assessing privacy risks in genomic research.

      We thank the reviewer for the positive assessment of our work’s conceptual originality, mathematical rigor, and reproducible implementation.

      Weaknesses:

      The study makes the simplifying assumptions that phenotypes are independent, which is not realistic, and are measured without noise. Real-world data are highly correlated across different levels, not only genotype but also multi-omics, so the idealized setting may overstate recovery potential. The empirical evidence, while illustrative, is limited to small-scale data and idealized conditions; thus, the full practical impact remains to be demonstrated. The GTEx analysis used only whole blood eQTL data from 369 individuals, which cannot capture the complexity, sample heterogeneity, or cross-tissue dependencies typical of biobank-scale studies.

      We recognize the concern regarding the independence and noiselessness assumptions in our framework. While assuming independent, noiseless phenotypes represents an idealized scenario, it allows us to clearly demonstrate the conceptual potential of our framework. The GTEx whole blood analysis is intended as a proof-of-concept, illustrating feasibility rather than capturing full biological complexity. In the revised manuscript, we will clarify these assumptions, emphasize that practical reconstruction accuracy may be lower in correlated and noisy real-world data, and expand empirical validation to multiple GTEx tissues and independent cohorts to demonstrate robustness under more realistic conditions.

      Reviewer #2 (Public review):

      Summary:

      This study focuses on the genomic privacy risks associated with Genome-Wide Association Study (GWAS) summary statistics, employing a three-tiered demonstration framework of “theoretical derivation, simulation experiments, real-data validation”. The research finds that when GWAS summary statistics are combined with high-dimensional phenotypic data, genotype recovery and individual re-identification can be achieved using linear programming methods. It further identifies key influencing factors such as the effective phenotype-to-sample size ratio (R/N) and minor allele frequency (MAF). These findings provide practical reference for improving data governance policies in genomic research, holding certain real-world significance.

      Strengths:

      This study integrates theoretical analysis, simulation validation, and the application of real-world datasets to construct a comprehensive research framework, which is conducive to understanding and mitigating the risk of private information leakage in genomic research.

      We are glad the reviewer values our integration of theory, simulation, and real data.

      Weaknesses:

      (1) Limited scope of variant types covered:

      The analysis is conducted solely on Single Nucleotide Polymorphisms (SNPs), omitting other crucial genomic variant types such as Copy Number Variations (CNVs), Insertions/Deletions (InDels), and chromosomal translocations/inversions. From a genomic structure perspective, variants like CNVs and InDels are also core components of individual genetic characteristics, and in some disease-related studies, association signals for these variants can be even more significant than those for SNPs. From the perspective of privacy risk logic, the genotypes of these variants (e.g., copy number for CNVs, base insertion/deletion status for InDels) can also be quantified and could theoretically be inferred backwards using the combination of “summary statistics + high-dimensional phenotypes”. Their privacy leakage risks might differ from those of SNPs (for instance, rare CNVs might be more easily re-identified due to higher genetic specificity).

      This point raises an important clarification regarding variant types beyond SNPs. We would like to clarify that our mathematical framework is not inherently restricted to SNPs. In fact, it is broadly applicable to any genetic variant that can be represented numerically, e.g., allelic dosage (0/1/2), copy number counts for CNVs, or presence/absence indicators for InDels. Conceptually, CNVs, InDels, and other structural variants can be incorporated in the same way as SNPs.

      The main limitation arises from the current availability of GWAS summary statistics for these non-SNP variant types (e.g., CNV dosages ≥ 3), which are still relatively scarce. As a result, empirically evaluating our framework on these variant classes would be challenging. In the revision, we will explicitly emphasize the general applicability of our framework to diverse genetic variants while clearly noting this practical limitation. We also plan to include simulations to investigate the recovery accuracy associated with CNVs and InDels, which will further demonstrate the extensibility of our approach. It should be noted, however, that leaking genotypic data of ordinary SNPs already raises concerns, regardless of other types of genetic variants.

      (2) Bias in data applicability scope:

      Both the simulation experiments and real-data validation in the study primarily rely on European population samples (e.g., 489 European samples from the 1000 Genomes Project; the genetic background of whole blood tissue samples from the GTEx project is not explicitly mentioned regarding non-European proportions). It only briefly notes a higher risk for African populations in the individual re-identification risk assessment, without conducting systematic analyses for other populations, such as East Asian, South Asian, or admixed American populations. Significant differences in genetic structure (e.g., MAF distribution, linkage disequilibrium patterns) exist across different populations. This may result in the R/N threshold and the relationship between MAF and recovery accuracy identified in the study not being fully applicable to other populations.

      Hence, addressing the aforementioned issues through supplementary work would enhance the study’s scientific rigor and application value, potentially providing more comprehensive theoretical and technical support for “privacy protection” in genomic data sharing.

      We acknowledge this valid concern regarding the generalizability of our findings. Our analysis already identifies MAF as a key factor influencing recovery accuracy, which begins to address population-specific genetic differences. Importantly, because our reconstruction method treats each variant independently, its success does not rely on population-specific LD patterns. The core determinant of feasibility is the ratio of phenotypic dimensions to sample size (R/N), a relationship we expect to hold across populations.

      Nevertheless, we agree that further validation across diverse ancestries can be helpful. In the revised manuscript, we will try to include additional cohorts as extended validation analyses.

    1. eLife Assessment

      The manuscript by Shukla et al. provides important mechanistic insights into kinesin-1 autoinhibition and cargo-mediated activation. Using a convincing combination of protein engineering, computational modeling, biophysical assays, HDX-MS, and electron microscopy, the authors reveal how cargo binding induces an allosteric transition that propagates to the motor domains and enhances MAP7 binding. Despite limitations arising from conformational heterogeneity and structural resolution, the study presents a unified mechanism for kinesin-1 activation that will be of broad interest to the motor protein, structural biology, and cell biology communities.

    2. Reviewer #1 (Public review):

      Summary:

      The authors aim to interrogate the sets of intramolecular interactions that cause kinesin-1 hetero-tetramer autoinhibition and the mechanism by which cargo interactions via the light chain tetratricopeptide repeat domains can initiate motor activation. The molecular mechanisms of kinesin regulation remain an important question with respect to intracellular transport. It has implications for the accuracy and efficiency of motor transport by different motor families, for example, the direction of cargos towards one or other microtubules.

      Strengths:

      The authors focus on the response of inactivated kinesin-1 to peptides found in cargos and the cascade of conformational changes that occur. They also test the effects of the known activator of kinesin-1 - MAP7 - in the context of their model. The study benefits from multiple complementary methods - structural prediction using AlphaFold3, 2D and 3D analysis of (mainly negative stain) TEM images of several engineered kinesin constructs, biophysical characterisation of the complexes, peptide design, hydrogen/deuterium-exchange mass spectrometry, and simple cell-based imaging. Each set of experiments is thoughtfully designed, and the intrinsic limitations of each method are offset by other approaches such that the assembled data convincingly support the authors' conclusions. This study benefits from prior work by the authors on this system and the tools and constructs they previously accrued, as well as from other recent contributions to the field.

      Weaknesses:

      It is not always straightforward to follow the design logic of a particular set of experiments, with the result that the internal consistency of the data appears unconvincing in places. For example, i) the Figure 1 AlphaFold3 models do not include motor domains whereas nearly all of the rest of the data involve constructs with the motor domains; ii) the kinesin constructs are chemically cross-linked prior to TEM sample preparation - this is clear in the Methods but should be included in the Results text, together with some discussion of how this might influence consistency with other methods where crosslinking was not used. Can those cross-links themselves be used to probe the intramolecular interactions in the molecular populations by mass spec? In general, the information content of some of the figure panels can also be improved with more annotations (e.g. angular relationship between views in Figure 1B, approximate interpretations of the various blobs in Fig 3F, and more thought given to what the reader should extract from the representative micrographs in several figures - inclusion of the raw data is welcome but extraction and magnification of exemplar particles (as is done more effectively in Fig S5) could convey more useful information elsewhere).

    3. Reviewer #2 (Public review):

      Summary:

      In this paper, Shukla, Cross, Kish, and colleagues investigate how binding of a cargo-adaptor mimic (KinTag) to the TPR domains of the kinesin-1 light chain, or disruption of the TPR docking site (TDS) on the kinesin-1 heavy chain, triggers release of the TPR domains from the holoenzyme. This dislocation provides a plausible mechanism for transition out of the autoinhibited lambda-particle toward the open and active conformation of kinesin-1. Using a combination of negative-stain electron microscopy, AlphaFold modeling, biochemical assays, hydrogen-deuterium exchange mass spectrometry (HDX-MS), and other methods, the authors show how TPR undocking propagates conformational changes through the coiled-coil stalk to the motor domains, increasing their mobility and enhancing interactions with the microtubule-bound cofactor MAP7. Together, they propose a model in which the TDS on CC1 of the heavy chain forms a "shoulder" in the compact, autoinhibited state. Cargo-adaptor binding, mimicked here by KinTag, dislodges this shoulder, liberating the motor domains and promoting MAP7 association, driving kinesin-1 activation.

      Strengths:

      Throughout the study, the authors use a clever construct design - e.g., delta-Elbow, ElbowLock, CC-Di, and the high-affinity KinTag - to test specific mechanisms by directly perturbing structural contacts or affecting interactions. The proposed mechanism of releasing autoinhibition via adaptor-induced TPR undocking is also interrogated with a number of complementary techniques that converge on a convincing model for activation that can be further tested in future studies. The paper is well-written and easy to follow, though some more attention to figure labels and legends would improve the manuscript (detailed in recommendations for the authors).

      Weaknesses:

      These reflect limits of what the current data can establish rather than flaws in execution. It remains to be tested if the open state of kinesin-1 initiated by TPR undocking is indeed an active state of kinesin-1 capable of processive movement and/or cargo transport. It also remains to be determined what the mechanism of motor domain undocking from the autoinhibited conformation is, and perhaps this could have been explored more here. The authors have shown by HDX-MS that the motor domains become more mobile on KinTag binding, but perhaps molecular dynamics would also be useful for modelling how that might occur.

    4. Reviewer #3 (Public review):

      Summary:

      The manuscript by Shukla and colleagues presents a comprehensive study that addresses a central question in kinesin-1 regulation - how cargo binding to the kinesin light chain (KLC) tetratricopeptide repeat (TPR) domains triggers activation of full-length kinesin-1 (KHC). The authors combine AlphaFold3 modeling, biophysical analysis (fluorescence polarization, hydrogen-deuterium exchange), and electron microscopy to derive a mechanistic model in which the KLC-TPR domains dock onto coiled-coil 1 (CC1) of the KHC to form the "TPR shoulder," stabilizing the autoinhibited (λ-particle) conformation. Binding of a W/Y-acidic cargo motif (KinTag) or deletion of the CC1 docking site (TDS) dislocates this shoulder, liberating the motor domains and enhancing accessibility to cofactors such as MAP7. The results link cargo recognition to allosteric structural transitions and present a unified model of kinesin-1 activation.

      Strengths:

      (1) The study addresses a fundamental and long-standing question in kinesin-1 regulation using a multidisciplinary approach that combines structural modeling, quantitative biophysics, and electron microscopy.

      (2) The mechanistic model linking cargo-induced dislocation of the TPR shoulder to activation of the motor complex is well supported by both structural and biochemical evidence.

      (3) The authors employ elegant protein-engineering strategies (e.g., ElbowLock and ΔTDS constructs) that enable direct testing of model predictions, providing clear mechanistic insight rather than purely correlative data.

      (4) The data are internally consistent and align well with previous studies on kinesin-1 regulation and MAP7-mediated activation, strengthening the overall conclusion.

      Weaknesses:

      (1) While the EM and HDX-MS analyses are informative, the conformational heterogeneity of the complex limits structural resolution, making some aspects of the model (e.g., stoichiometry or symmetry of TPR docking) indirect rather than directly visualized.

      (2) The dynamics of KLC-TPR docking and undocking remain incompletely defined; it is unclear whether both TPR domains engage CC1 simultaneously or in an alternating fashion.

      (3) The interplay between cargo adaptors and MAP7 is discussed but not experimentally explored, leaving open questions about the sequence and exclusivity of their interactions with CC1.

    1. eLife Assessment

      This important study describes a new link between nutrient signaling and chromosome regulation, providing compelling evidence that reduced activity in the central nutrient-sensing pathway governed by TORC1 improves chromosome stability and alters gene expression in S. pombe through effects on cohesin. While the biological importance of this newly described circuit is not yet fully known, and some data would benefit from further clarification, the overall body of evidence supports the main conclusions.

    2. Reviewer #1 (Public review):

      Summary:

      In this study, Besson et al. investigate how environmental nutrient signals regulate chromosome biology through the TORC1 signaling pathway in Schizosaccharomyces pombe. Specifically, the authors explore the impact of TORC1 on cohesin function - a protein complex essential for chromosome segregation and transcriptional regulation. Through a combination of genetic screens, biochemical analysis, phospho-proteomics, and transcriptional profiling, they uncover a functional and physical interaction between TORC1 and cohesin. The data suggest that reduced TORC1 activity enhances cohesin binding to chromosomes and improves chromosome segregation, with implications for stress-responsive gene expression, especially in subtelomeric regions.

      Strengths:

      This work presents a compelling link between nutrient sensing and chromosome regulation. The major strength of the study lies in its comprehensive and multi-disciplinary approach. The authors integrate genetic suppression screens, live-cell imaging, chromatin immunoprecipitation, co-immunoprecipitation, and mass spectrometry to uncover the functional connection between TORC1 signaling and cohesin. The use of phospho-mutant alleles of cohesin subunits and their loader provides mechanistic insight into the regulatory role of phosphorylation. The addition of transcriptomic analysis further strengthens the biological relevance of the findings and places them in a broader physiological context. Altogether, the dataset convincingly supports the authors' main conclusions and opens up new avenues of investigation.

      Weaknesses:

      While the study is strong overall, a few limitations are worth noting. The consistency of cohesin phosphorylation changes under different TORC1-inhibiting conditions (e.g., genetic mutants vs. rapamycin treatment) is unclear and could benefit from further clarification. The phosphorylation sites identified on cohesin subunits do not match known AGC kinase consensus motifs, raising the possibility that the modifications are indirect. The study relies heavily on one TORC1 mutant allele (mip1-R401G), and additional alleles could strengthen the generality of the findings. Furthermore, while the results suggest that nutrient availability influences cohesin function, this is not directly tested by comparing growth or cohesin dynamics under defined nutrient conditions.

    3. Reviewer #2 (Public review):

      Summary:

      In this study, the authors follow up on a previous suppressor screen of a temperature-sensitive allele of mis4 (mis4-G1487D), the cohesin loading factor in S. pombe, and identify additional suppressor alleles tied to the S. pombe TORC1 complex. Their analysis suggests that these suppressor mutations attenuate TORC1 activity, while enhanced TORC1 activity is deleterious in this context. Suppression of TORC1 activity also ameliorates chromosome segregation and spindle defects observed in the mis4-G1487D strain, although some more subtle effects are not reconstituted. The authors provide evidence that this genetic suppression is also tied to the reconstitution of cohesin loading. Moreover, disrupting TORC1 also enhances Mis4/cohesin association with chromatin (likely reflecting enhanced loading) in WT cells, while rapamycin treatment can enhance the robustness of chromosome transmission. These effects likely arise directly through TORC1 or its downstream effector kinases, as TORC1 co-purifies with Mis4 and Rad21; these factors are also phosphorylated in a TORC1-dependent fashion. Disrupting Sck2, a kinase downstream of TORC1, also suppresses the mis4-G1487D allele while simultaneous disruption of Sck1 and Sck2 enhances cohesin association with chromatin, albeit with differing effects on phosphorylation of Mis4 and Psm1/Scm1. Phosphomutants of Mis4 and Psm1 that mimic observed phosphorylation states identified by mass spectrometry that are TORC1-dependent also suppressed phenotypes observed in the mis4-G1487D background. Last, the authors provide evidence that the mis4-G1487D background and TORC1 mutant backgrounds display an overlap in the dysregulation of genes that respond to environmental conditions, particularly in genes tied to meiosis or other "stress".

      Overall, the authors provide compelling evidence from genetics, biochemistry, and cell biology to support a previously unknown mechanism by which nutrient sensing regulates cohesin loading with implications for the stress response. The technical approaches are generally sound, well-controlled, and comprehensive.

      Specific Points:

      (1) While the authors favor the model that the enhanced cohesin loading upon diminished TORC1 activity helps cells to survive harsh environmental conditions, as starvation of S. pombe also drives commitment to meiosis, it seems as plausible that enhanced cohesin loading is related to preparing the chromosomes to mate.

      (2) Related to Point 1, the lab of Sophie Martin previously published that phosphorylation of Mis4 characterizes a cluster of phosphotargets during starvation/meiotic induction (PMID: 39705284). This work should be cited, and the authors should interrogate how their observations do or do not relate to these prior observations (are these the same phosphosites?).

      (3) It would be useful for the authors to combine their experimental data sets to interrogate whether there is a relationship between the regions where gene expression is altered in the mis4-G1487D strain and changes in the loading of cohesin in their ChIP experiments.

      (4) Given that the genes that are affected are predominantly sub-telomeric while most genes are not affected in the mis4-G1487D strain, one possibility that the authors may wish to consider is that the regions that become dysregulated are tied to heterochromatic regions where Swi6/HP1 has been implicated in cohesin loading.

      (5) It would be helpful to show individual data points from replicates in the bar graphs - it is not always clear what comprises the data sets, and superplots would be of great help.

    1. eLife Assessment

      Mitochondrial DNA (mtDNA) exhibits a degree of resistance to mutagenesis under genotoxic stress, and this study on the mitochondrial Transcription Factor A (TFAM) presents valuable data concerning the possible mechanisms involved. The presented data are solid, technically rigorous, and consistent with established literature findings. The experiments are well-executed, providing reliable evidence on the change of TFAM-DNA interactions following UVC irradiation. However, the evidence is inadequate to support the primary claims.

    2. Reviewer #1 (Public review):

      Summary:

      The authors investigate how UVC-induced DNA damage alters the interaction between the mitochondrial transcription factor TFAM and mtDNA. Using live-cell imaging, qPCR, atomic force microscopy (AFM), fluorescence anisotropy, and high-throughput DNA-chip assays, they show that UVC irradiation reduces TFAM sequence specificity and increases mtDNA compaction without protecting mtDNA from lesion formation. From these findings, the authors suggest that TFAM acts as a "sensor" of damage rather than a protective or repair-promoting factor.

      Strengths:

      (1) The focus on UVC damage offers a clean system to study mtDNA damage sensing independently of more commonly studied repair pathways, such as oxidative DNA damage. The impact of UVC damage is not well understood in the mitochondria, and this study fills that gap in knowledge.

      (2) In particular, the custom mitochondrial genome DNA chip provides high-resolution mapping of TFAM binding and reveals a global loss of sequence specificity following UVC exposure.

      (3) The combination of in vitro TFAM DNA biophysical approaches, combined with cellular responses (gene expression, mtDNA turnover), provides a coherent multi-scale view.

      (4) The authors demonstrate that TFAM-induced compaction does not protect mtDNA from UVC lesions, an important contribution given assumptions about TFAM providing protection.

      Weaknesses:

      (1) The authors show a decrease in mtDNA levels and increased lysosomal colocalization but do not define the pathway responsible for degradation. Distinguishing between replication dilution, mitophagy, or targeted degradation would strengthen the interpretation.

      (2) The sudden induction of mtDNA replication genes and transcription at 24 h suggests that intermediate timepoints (e.g., 12 hours) could clarify the kinetics of the response and avoid the impression that the sampling coincidentally captured the peak.

      (3) The authors report no loss of mitochondrial membrane potential, but this single measure is limited. Complementary assays such as Seahorse analysis, ATP quantification, or reactive oxygen species measurement could more fully assess functional integrity.

      (4) The manuscript briefly notes enrichment of TFAM at certain regions of the mitochondrial genome but provides little interpretation of why these regions are favored. Discussion of whether high-occupancy sites correspond to regulatory or structural elements would add valuable context.

      (5) It remains unclear whether the altered DNA topology promotes TFAM compaction or vice versa. Addressing this directionality, perhaps by including UVC-only controls for plasmid conformation, would help disentangle these effects if UVC is causing compaction alone.

      (6) The authors provide a discrepancy between the anisotropy and binding array results. The reason for this is not clear, and one wonders if an orthogonal approach for the binding experiments would elucidate this difference (minor point).

      Assessment of conclusions:

      The manuscript successfully meets its primary goal of testing whether TFAM protects mtDNA from UVC damage and the impact this has on the mtDNA. While their data points to an intriguing model that TFAM acts as a sensor of damaged mtDNA, the validation of this model requires further investigation to make the model more convincing. This is likely warranted for a follow-up study. Also, the biological impact of this compaction, such as altering transcription levels, is not clear in this study.

      Impact and utility of the methods:

      This work advances our understanding of how mitochondria manage UVC genome damage and proposes a structural mechanism for damage "sensing" independent of canonical repair. The methodology, including the custom TFAM DNA chip, will be broadly useful to the scientific community.

      Context:

      The study supports a model in which mitochondrial genome integrity is maintained not only by repair factors, but also by selective sequestration or removal of damaged genomes. The demonstration that TFAM compaction correlates with damage rather than protection reframes an interesting role in mtDNA quality control.

    3. Reviewer #2 (Public review):

      Summary:

      King et al. present several sets of experiments aimed to address the potential impact of UV irradiation on human mitochondrial DNA as well as the possible role of mitochondrial TFAM protein in handling UV-irradiated mitochondrial genomes. The carefully worded conclusion derived from the results of experiments performed with human HeLa cells, in vitro small plasmid DNA, with PCR-generated human mitochondrial DNA, and with UV-irradiated small oligonucleotides is presented in the title of the manuscript: "UV irradiation alters TFAM binding to mitochondrial DNA". The authors also interpret results of somewhat unconnected experimental approaches to speculate that "TFAM is a potential DNA damage sensing protein in that it promotes UVC-dependent conformational changes in the [mitochondrial] nucleoids, making them more compact." They further propose that such a proposed compaction triggers the removal of UV-damaged mitochondrial genomes as well as facilitates replication of undamaged mitochondrial genomes.

      Strengths:

      (1) The authors presented convincing evidence that a very high dose (1500 J/m2) of UVC applied to oligonucleotides covering the entire mitochondrial DNA genome alleviates sequence specificity of TFAM binding (Figure 3). This high dose was sufficient to cause UV lesions in a large fraction of individual oligonucleotides. The method was developed in the lab of one of the corresponding authors (reference 74) and is technically well-refined. This result can be published as is or in combination with other data.

      (2) The manuscript also presents AFM evidence (Figure 4) that TFAM, which was long known to facilitate compaction of the mitochondrial genome (Alam et al., 2003; PMID 12626705 and follow-up citations), causes in vitro compaction of a small pUC19 plasmid and that approximately 3 UVC lesions per plasmid molecule result in a slight, albeit detectable, increase in TFAM compaction of the plasmid. Both results can be discussed in line with a possible extrapolation to in vivo phenomena, but such a discussion should include a clear statement that no in vivo support was provided within the set of experiments presented in the manuscript.

      Weaknesses:

      Besides the experiments presented in Figures 3 and 4, other results neither support nor contradict the speculation that TFAM can play a protective role, eliminating mitochondrial genomes with bulky lesions by way of excessive compaction and removing damaged genomes from the in vivo pool.

      To specify these weaknesses:

      (1) Figure 1 - presents evidence that UVC causes a reduction in the number of mitochondrial spots in cells. The role of TFAM is not assessed.

      (2) Figure 2 - presents evidence that UVC causes lesions in mitochondrial genomes in vivo, detectable by qPCR. No direct assessment of TFAM roles in damage repair or mitochondrial DNA turnover is made, despite the statements in the title of Figure 2 or in associated text. An approximately 2-fold change in gene expression of TFAM and of the three other genes does not provide any reasonable support for the suggestion of increased mitochondrial DNA turnover over multiple alternative explanations related to mitochondrial DNA maintenance.

      (3) Figure 5 - shows that TFAM does not protect either mitochondrial nucleoids formed in vitro or mitochondrial DNA in vivo from UVC lesions, and has no effect on in vivo repair of UV lesions.

      (4) Figure 6: Based on the above analysis, the model of the role of TFAM in sensing mtDNA damage and elimination of damaged genomes in vivo appears unsupported.

      (5) Additional concern about Figure 3 and relevant discussion: It is not clear if more uniform TFAM binding to UV irradiated oligonucleotides with varying sequence as compared to non-irradiated oligonucleotides can be explained by just overall reduced binding eliminating sequence specific peaks.

    4. Reviewer #3 (Public review):

      Summary:

      The study is grounded in the observations that mitochondrial DNA (mtDNA) exhibits a degree of resistance to mutagenesis under genotoxic stress. The manuscript focuses on the effects of UVC-induced DNA damage on TFAM-DNA binding in vitro and in cells. The authors demonstrate increased TFAM-DNA compaction following UVC irradiation in vitro based on high-throughput protein-DNA binding and atomic force microscopy (AFM) experiments. They did not observe a similar trend in fluorescence polarization assays. In cells, the authors found that UVC exposure upregulated TFAM, POLG, and POLRMT mRNA levels without affecting the mitochondrial membrane potential. Overexpressing TFAM in cells or varying TFAM concentration in reconstituted nucleoids did not alter the accumulation or disappearance of mtDNA damage. Based on their data, the authors proposed a plausible model that, following UVC-induced DNA damage, TFAM facilitates nucleoid compaction, which may serve to signal damage in the mitochondrial genome.

      Strengths:

      The presented data are solid, technically rigorous, and consistent with established literature findings. The experiments are well-executed, providing reliable evidence on the change of TFAM-DNA interactions following UVC irradiation. The proposed model may inspire future follow-up studies to further study the role of TFAM in sensing UVC-induced damage.

      Weaknesses:

      The manuscript could be further improved by refining specific interpretations and ensuring terminology aligns precisely with the data presented.

      (1) In line 322, the claim of increased "nucleoid compaction" in cells should be removed, as there is a lack of direct cellular evidence. Given that non-DNA-bound TFAM is subject to protease digestion, it is uncertain to what extent the overexpressed TFAM actually integrates into and compacts mitochondrial nucleoids in the absence of supporting immunofluorescence data.

      (2) In lines 405 and 406, the authors should avoid equating TFAM overexpression with compaction in the cellular context unless the compaction is directly visualized or measured.

      (3) In lines 304 and 305 (and several other places throughout the manuscript), the authors use the term "removal rates". A "removal rate" requires a direct comparison of accumulated lesion levels over a time course under different conditions. Given the complexity of UV-induced DNA damage, which involves both damage formation and potential removal via multiple pathways, a more accurate term that reflects the net result of these opposing processes is "accumulated DNA damage levels." This terminology better reflects the final state measured and avoids implying a single, active 'removal' pathway without sufficient kinetic data.

      (4) In line 357, the authors refer to the decrease in the total DNA damage level as "The removal of damaged mtDNA". The decrease may be simply due to the turnover and resynthesis of non-damaged mtDNA molecules. The term "removal" may mislead the casual reader into interpreting the effect as an active repair/removal process.

    1. eLife Assessment

      This study investigates the folding and unfolding behavior of the doubly knotted protein TrmD-Tm1570, providing insight into the molecular mechanisms underlying protein knotting. The findings reveal multiple unfolding pathways and suggest that the formation of double knots may require chaperone assistance, offering valuable insights into topologically complex proteins. The evidence is solid, supported by consistent agreement between simulation and experiment, though some aspects of the presentation and experimental scope could be clarified or expanded.

    2. Reviewer #1 (Public review):

      Summary:

      This paper investigates the thermal and mechanical unfolding pathways of the doubly knotted protein TrmD-Tm1570 using molecular simulations, optical tweezers experiments, and other methods. In particular, the detailed analysis of the four major unfolding pathways using a well-established simulation method is an interesting and valuable result.

      Strengths:

      A key finding that lends credibility to the simulation results is that the molecular simulations at least qualitatively reproduce the characteristic force-extension distance profiles obtained from optical tweezers experiments during mechanical unfolding. Furthermore, a major strength is that the authors have consistently studied the folding and unfolding processes of knotted proteins, and this paper represents a careful advancement building upon that foundation.

      Weaknesses:

      While optical tweezers experiments offer valuable insights, the knowledge gained from them is limited, as the experiments are restricted to this single technique.

      The paper mentions that the high aggregation propensity of the TrmD-Tm1570 protein appears to hinder other types of experiments. This is likely the reason why a key aspect, such as whether a ribosome or molecular chaperones are essential for the folding of TrmD-Tm1570, has not been experimentally clarified, even though it should be possible in principle.

    3. Reviewer #2 (Public review):

      Summary:

      In this manuscript, the authors combined coarse-grained structure-based model simulation, optical tweezer experiments, and AI-based analysis to assess the knotting behavior of the TrmD-Tm1570 protein. Interestingly, they found that while the structure-based model can fold the single knot from TrmD and Tm1570, the double-knot protein TrmD-Tm1570 cannot form a knot itself, suggesting the need for chaperone proteins to facilitate this knotting process. This study has strong potential to advance understanding of the molecular mechanism of knotted proteins and is supported by substantial experimental and simulation evidence. However, several passages lack sufficient detail, and the presentation needs further clarification.

      Strengths:

      A combination of both experimental and computational studies.

      Weaknesses:

      There is a lack of detail to support some statements.

      (1) The use of the AI-based method, SOM, can be emphasized further, especially in its analysis of the simulated unfolding trajectories and discovery of the four unfolding/folding pathways. This will strengthen the statistical robustness of the discovery.

      (2) The manuscript would benefit from a clearer description of the correlation between the simulation and experimental results. The current correlation, presented in the paragraph starting from Line 250, focuses on measured distances. The authors could consider providing additional evidence on the order of events observed experimentally and computationally. More statistical analyses on the experimental curves presented in Figure 4 supplement would be helpful.

      (3) How did the authors calibrate the timescale between simulation and experiment? Specifically, what is the value \tau used in Line 270, and how was it calculated? Relevant information would strengthen the connection between simulation and experiment.

      (4) In Line 342, the authors comment that whether using native contacts or not, they cannot fold double-knotted TrmD-Tm1570. Could the authors provide more details on how non-native interactions were analyzed?

      (5) It appears that the manuscript lacks simulation or experimental evidence to support the statement at Line 343: While each domain can self-tie into its native knot, this process inhibits the knotting of the other domain. Specifically, more clarification on this inhibition is needed.

    1. eLife Assessment

      This study used a conditional knockout mouse line to remove Ptbp1 in retinal progenitors and demonstrated that its deletion has no effect on retinal neurogenesis or cell fate specification, thereby challenging the prevailing view of Ptbp1 as a master regulator of neuronal fate. The data are convincing, supported by transcriptomic analysis, histology, and proliferation assays. This study is important, and the broader implications for other CNS regions warrant further investigation.

    2. Reviewer #1 (Public review):

      Summary:

      The researchers sought to determine whether Ptbp1, an RNA-binding protein previously thought to be a master regulator of neuronal differentiation, is required for retinal neurogenesis and cell fate specification. They used a conditional knockout mouse line to remove Ptbp1 in retinal progenitors and analyzed the results using bulk RNA-seq, single-cell RNA-seq, immunohistochemistry, and EdU labeling. Their findings show that Ptbp1 deletion has no effect on retinal development, since no defects were found in retinal lamination, progenitor proliferation, or cell type composition. Although bulk RNA-seq indicated changes in RNA splicing and increased expression of late-stage progenitor and photoreceptor genes in the mutants, and single-cell RNA-seq detected relatively minor transcriptional shifts in Müller glia, the overall phenotypic impact was low. As a result, the authors conclude that Ptbp1 is not required for retinal neurogenesis and development, thus contradicting prior claims about its important role as a master regulator of neurogenesis. They argue for a reassessment of this stated role. While the findings are strong in the setting of the retina, the larger implications for other areas of the CNS require more investigation. Furthermore, questions about potential compensation by Ptbp2 warrant further research.

      Strengths:

      This study calls into question the commonly held belief that Ptbp1 is a critical regulator of neurogenesis in the CNS, particularly in retinal development. The adoption of a conditional knockout mouse model provides a reliable way of eliminating Ptbp1 in retinal progenitors while avoiding the off-target effects often reported in RNAi experiments. The combination of bulk RNA-seq, scRNA-seq, and immunohistochemistry enables a thorough examination of molecular and cellular alterations at both embryonic and postnatal stages, which strengthens the study's findings. Furthermore, using publicly available RNA-Seq datasets for comparison improves the investigation of splicing and expression across tissues and cell types. The work is well-organized, with informative figure legends and supplemental data that clearly show no substantial phenotypic changes in retinal lamination, proliferation, or cell fate, despite identified transcriptional and splicing modifications.

      Weaknesses:

      The retina-specific method raises questions regarding whether Ptbp1 is required in other CNS locations where its neurogenic roles were first proposed. Although the study performs well in transcriptome and histological analyses, it lacks functional assessments (such as electrophysiological or behavioral testing) to determine if small changes in splicing or gene expression affect retinal function.

    3. Reviewer #2 (Public review):

      Summary:

      Ptbp1 has been proposed as a key regulator of neuronal fate through its role in repressing neurogenesis. In this study, the authors conditionally inactivated Ptbp1 in mouse retinal progenitor cells using the Chx10-Cre line. While RNA-seq analysis at E16 revealed some changes in gene expression, there were no significant alterations in retinal cell type composition, and only modest transcriptional changes in the mature retina, as assessed by immunofluorescence and scRNAseq. Based on these findings, the authors conclude that Ptbp1 is not essential for cell fate determination during retinal development.

      Strengths:

      Despite some effects of Ptbp1 inactivation (initiated around E11.5 with the onset of Chx10-Cre activity) on gene expression and splicing, the data convincingly demonstrate that retinal cell type composition remains largely unaffected. This study is highly significant since it challenges the prevailing view of Ptbp1 as a central repressor of neurogenesis and highlights the need to further investigate, or re-evaluate, its role in other model systems and regions of the CNS.

      Weaknesses:

      A limitation of the study is the use of the Chx10-Cre driver, which initiates recombination around E11. This timing does not permit assessment of Ptbp1 function during the earliest phases of retinal development, if expressed at that time.

      Comments on revisions:

      The authors have thoroughly and satisfactorily addressed all my previous comments.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review): 

      Summary: 

      The researchers sought to determine whether Ptbp1, an RNA-binding protein previously thought to be a master regulator of neuronal differentiation, is required for retinal neurogenesis and cell fate specification. They used a conditional knockout mouse line to remove Ptbp1 in retinal progenitors and analyzed the results using bulk RNA-seq, single-cell RNA-seq, immunohistochemistry, and EdU labeling. Their findings show that Ptbp1 deletion has no effect on retinal development, since no defects were found in retinal lamination, progenitor proliferation, or cell type composition. Although bulk RNA-seq indicated changes in RNA splicing and increased expression of late-stage progenitor and photoreceptor genes in the mutants, and single-cell RNA-seq detected relatively minor transcriptional shifts in Müller glia, the overall phenotypic impact was low. As a result, the authors conclude that Ptbp1 is not required for retinal neurogenesis and development, thus contradicting prior claims about its important role as a master regulator of neurogenesis. They argue for a reassessment of this stated role. While the findings are strong in the setting of the retina, the larger implications for other areas of the CNS require more investigation. Furthermore, questions about potential compensation by Ptbp2 warrant further research.

      Strengths: 

      This study calls into question the commonly held belief that Ptbp1 is a critical regulator of neurogenesis in the CNS, particularly in retinal development. The adoption of a conditional knockout mouse model provides a reliable way of eliminating Ptbp1 in retinal progenitors while avoiding the off-target effects often reported in RNAi experiments. The combination of bulk RNA-seq, scRNA-seq, and immunohistochemistry enables a thorough examination of molecular and cellular alterations at both embryonic and postnatal stages, which strengthens the study's findings. Furthermore, using publicly available RNA-Seq datasets for comparison improves the investigation of splicing and expression across tissues and cell types. The work is well-organized, with informative figure legends and supplemental data that clearly show no substantial phenotypic changes in retinal lamination, proliferation, or cell fate, despite identified transcriptional and splicing modifications.

      We thank the Reviewer for their evaluation of the strengths of the study.

      Weaknesses: 

      The retina-specific approach raises questions regarding whether Ptbp1 is required in other CNS regions where its neurogenic roles were first proposed. The claim that Ptbp1 is "fully dispensable" for retinal development may be toned down, given the transcriptional and splicing modifications identified. The possibility of subtle or transitory impacts, such as ectopic neuron development followed by cell death, is postulated but not completely investigated. Furthermore, as the authors point out, the compensating potential of increased Ptbp2 warrants additional exploration. Although the study performs well in transcriptome and histological analyses, it lacks functional assessments (such as electrophysiological or behavioral testing) to determine if small changes in splicing or gene expression affect retinal function. While 864 splicing events have been found, the functional significance of these alterations, notably the 7% that are neuronal-enriched and the 35% that are rod-specific, has not been thoroughly investigated. The manuscript might be improved by describing how these splicing changes affect retinal development or function.

      We have revised the text to address these points as requested.

      Reviewer #2 (Public review): 

      Summary: 

      Ptbp1 has been proposed as a key regulator of neuronal fate through its role in repressing neurogenesis. In this study, the authors conditionally inactivated Ptbp1 in mouse retinal progenitor cells using the Chx10-Cre line. While RNA-seq analysis at E16 revealed some changes in gene expression, there were no significant alterations in retinal cell type composition, and only modest transcriptional changes in the mature retina, as assessed by immunofluorescence and scRNAseq. Based on these findings, the authors conclude that Ptbp1 is not essential for cell fate determination during retinal development. 

      Strengths: 

      Despite some effects of Ptbp1 inactivation (initiated around E11.5 with the onset of Chx10-Cre activity) on gene expression and splicing, the data convincingly demonstrate that retinal cell type composition remains largely unaffected. This study is highly significant since it challenges the prevailing view of Ptbp1 as a central repressor of neurogenesis and highlights the need to further investigate, or re-evaluate, its role in other model systems and regions of the CNS. 

      We thank the Reviewer for their evaluation of the strengths of the study.

      Weaknesses: 

      A limitation of the study is the use of the Chx10-Cre driver, which initiates recombination around E11. This timing does not permit assessment of Ptbp1 function during the earliest phases of retinal development, if expressed at that time.  

      We have revised the text to address the potential limitations of the use of the Chx10-Cre driver in this study.

      Reviewer #1 (Recommendations for the authors):

      (1) The author only selected scRNA-Seq datasets to examine the expression patterns of Ptbp1 in the retina; incorporating immunostaining analysis in the mouse retina is necessary.

      Ptbp1 expression patterns in the mouse retina are shown in Fig. 1b-1d, where Ptbp1 protein was analyzed by immunostaining in Chx10-Cre control and Ptbp1-KO retinas at E14, P1, and P30; the results are quantified in Fig. 1e.

      (2) In Figure 1, Ptbp1 signals were still detected in the KO mice, with the author suggesting that this may indicate cross-reactivity with an unknown epitope. Why is this unknown epitope only detected in the ganglion cell layer? Additional antibodies are needed to confirm the staining results. Furthermore, it is essential to verify the KO at the mRNA level using PCR. 

      We are unsure of the identity of this cross-reacting epitope, although it might be Ptbp2, which is enriched in immature retinal ganglion cells (Fig. S1). In any case, we do not believe that the identity of this epitope is relevant to assessing the efficiency of Ptbp1 deletion, as Ptbp1 is not detectably expressed in retinal ganglion cells (Fig. S1).

      Although the heatmap in Figure 2B indicates a decrease in Ptbp1 levels in the KO mice, the absence of statistical data makes it difficult to evaluate the KO efficiency. 

      Respectfully, we believe that Ptbp1 knockout efficiency is adequately addressed using immunohistochemistry, and that further statistical analysis is not essential here. 

      Cre staining of the Chx10-Cre;Ptbp1lox/lox mice, or the use of reporter lines, is also suggested to indicate which cells are expected to be knocked out. Providing high-power images of the Ptbp1 staining would help readers clearly recognize the staining signals.

      To clarify the identity of the knockout cells, we have updated Figure 1 to include the Chx10-CreEGFP staining which more clearly delineates the cells in which Ptbp1 is deleted. Regarding verification of the knockout, we believe additional PCR assays are not necessary, as we have already demonstrated efficient loss of Ptbp1 in Chx10-Cre-expressing cells at the RNA level by both single-cell RNA-sequencing and bulk RNA-sequencing, and also at the protein level by immunohistochemistry. Sun1-GFP Cre reporter lines are also used in Figures 1 and S2 to visualize patterns of Cre activity, a point which is now highlighted in the text. Together, these approaches provide sufficient evidence for effective Ptbp1 knockout. 

      (3) The possibility of ectopic neuron formation followed by cell death is intriguing but underexplored. Consider adding apoptosis assays (e.g., TUNEL staining) at early developmental stages to test this hypothesis.

      While apoptosis assays such as TUNEL staining would be helpful to address this hypothesis, we feel incorporating these additional experiments is currently beyond the scope of this study. We agree the possibility of cell death is intriguing and plan to explore this in future work.

      (4) On page 4, the statement "We did not observe any significant differences ... Chx10Cre;Ptbp1lox/lox mice (Fig. 2b,c)" should refer to Fig. 3b,c instead.

      We have changed the text to refer to Fig. 3b,c.

      (5) The labeling in Figure 3 as "Cre-Ptbp1" is inconsistent with the figure legend "Ptbp1-Ctrl."

      This language was used because the samples for EdU staining in Figure 3 were Chx10-Cre-negative Ptbp1lox/lox mice. We have updated the language in the manuscript and figure to reflect the genotypes more clearly.

      (6) P30 mice are still sexually immature; the term "adolescent" or "juvenile" should be used instead of "adult."

      We have updated the language in the text from “adult” to “adolescent” to describe P30 mice, although the retina itself is mature by this age.

      Reviewer #2 (Recommendations for the authors):

      (1) As mentioned in the public review, a limitation of the study is that Ptbp1 KO is not induced prior to E11. The authors should acknowledge this limitation and include in the Discussion that the use of the Chx10-Cre line does not permit evaluation of a potential role for Ptbp1 during very early stages of retinal development, should it be expressed at that time (an aspect that would be important to determine).

      We have added this limitation to the Discussion in the sentence highlighted below.

      Furthermore, the use of the Chx10-Cre transgene in this study does not exclude a potential role for Ptbp1 during very early stages of retinal development prior to E11 (pg. 6).

      (2) While the data convincingly show no significant changes in retinal cell type distribution in Ptbp1 mutants, the claims in the abstract and introduction that Ptbp1 is "dispensable for retinal development" or "dispensable for the process of neurogenesis" may be overstated. Indeed, the results indicate that loss of Ptbp1 function influences retinal development by promoting neurogenesis through induction of a neuronal-like splicing program in neural progenitors. Concluding solely that Ptbp1 is dispensable for retinal cell fate specification, rather than for retinal development as a whole, would thus seem more accurate.

      We have updated the language in the text to reflect Ptbp1’s role in regulating retinal cell fate specification more clearly.

      (3) The authors conclude from Figure 5 that "No changes in the identity or composition of any retinal cell type were observed." Which statistical test was applied to support this conclusion? The figure indicates that Müller cells comprise 10.5% of the total cell population in controls versus 8.2% in Ptbp1-KO retinas. It may be important to consider the overall distribution of glia versus all neurons (rather than each neuron subtype individually). While the observed difference (~2% more glia at the expense of neurons) appears modest, it would be important to determine whether this trend is consistent and statistically significant.

      To evaluate cell type composition, we performed differential expression analysis across all major retinal cell types and compared proportional cell type representation between control and Ptbp1 KO retinas. While these analyses did not reveal marked differences in any specific cell type, we acknowledge that the scRNA-Seq dataset includes a single experimental replicate per condition, each containing two retinas. Therefore, we cannot draw firm statistical conclusions regarding the relative distribution of glia versus neurons, and the modest difference observed in glial cell proportion should be interpreted with caution. We agree that assessing glia-to-neuron ratios across additional replicates will be important in future studies.

      (4) Referring to Figure S1 (scRNA-seq data), the authors state that Ptbp1 mRNA is robustly expressed in retinal progenitors and Müller glia in both mouse and human retina. While the immunostaining in Figure 4 indeed clearly shows strong expression in Müller cells, the scRNAseq data presented in Figure S1 do not support the claim of "robust" expression in Müller glia in the mouse retina. This is even more striking in the human data, where panels F and H show that Ptbp1 is expressed at extremely low, certainly not "robust", levels in Müller cells. The corresponding sentence in the Results section should therefore be revised to more accurately reflect the data presented in Figure S1, or be supported by complementary immunofluorescence evidence.

      We thank the reviewer for this comment. We have revised this section of the Results to better reflect Fig S1, as follows:

      We observe high expression levels of Ptbp1 mRNA in primary retinal progenitors in both species and Müller glia in mouse retina, with weaker expression in neurogenic progenitors, and little expression detectable in neurons at any developmental age.

      (5) When mentioning potential compensation by Ptbp2, the authors may also consider discussing the possibility that compensatory mechanisms can differ between knockdown and knockout approaches. In this context, it is noteworthy that a recent study by Konar et al., Exp Eye Res, 2025 (published after the submission of the present manuscript) reports that Ptbp1 knockdown promotes Müller glia proliferation in zebrafish.

      We thank the reviewer for this suggestion. To address this, we have included a section considering this possibility in the discussion section highlighted below.

      It is also possible that compensatory mechanisms differ between knockdown and knockout approaches. Notably, a recent study (Konar et al. 2025) reported that Ptbp1 knockdown promotes Müller glia proliferation in zebrafish, suggesting that effects of acute reduction of Ptbp1 may not fully mirror those of complete loss-of-function. 

      (6) The statistical analyses were performed using a t-test. However, this parametric test is not appropriate for experiments with low sample sizes. A non-parametric test, such as the Mann-Whitney test, would be more suitable in this context. Furthermore, performing statistical analysis on n = 2 (Figure 3C) is not statistically valid.

      We thank the reviewer for this comment. We agree that with a small n, non-parametric tests are more appropriate. We have added additional retinas (now n = 5) for the Ptbp1-KO condition in Figure 3C and reanalyzed with the appropriate non-parametric Mann-Whitney test. For all other datasets with sufficient replicates (n ≥ 4 per genotype), parametric tests such as unpaired t-tests remain valid, and the results are consistent with non-parametric testing.
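      The point about small samples can be illustrated with a minimal sketch of an exact two-sided Mann-Whitney test. This is not the authors' analysis code: the function names are illustrative, the input values are hypothetical placeholders rather than data from the study, and the minimum-p formula assumes no tied values. The sketch enumerates every way of splitting the pooled values into two groups, which also shows why very small groups cannot reach conventional significance: with group sizes of 2 and 5 there are only 21 possible splits, so the smallest attainable two-sided p-value is 2/21, about 0.095.

```python
from itertools import combinations
from math import comb


def u_statistic(a, b):
    """Count pairs where a value in `a` exceeds a value in `b` (ties count 0.5)."""
    return sum(1.0 if x > y else 0.5 if x == y else 0.0 for x in a for y in b)


def mann_whitney_exact(group1, group2):
    """Return (U, two-sided exact p) by enumerating all relabelings of the pooled data."""
    pooled = list(group1) + list(group2)
    n1, n2 = len(group1), len(group2)
    mean_u = n1 * n2 / 2  # expected U under the null hypothesis
    u_obs = u_statistic(group1, group2)
    dev_obs = abs(u_obs - mean_u)

    extreme = 0
    total = 0
    for idx in combinations(range(n1 + n2), n1):
        chosen = set(idx)
        a = [pooled[i] for i in idx]
        b = [pooled[i] for i in range(n1 + n2) if i not in chosen]
        # Count relabelings at least as far from the null mean as the observed U.
        if abs(u_statistic(a, b) - mean_u) >= dev_obs - 1e-12:
            extreme += 1
        total += 1
    return u_obs, extreme / total


def min_two_sided_p(n1, n2):
    """Smallest two-sided exact p attainable for these group sizes (no ties assumed)."""
    return min(1.0, 2 / comb(n1 + n2, n1))


if __name__ == "__main__":
    # Hypothetical placeholder values, NOT measurements from the manuscript.
    control = [31.0, 33.5]
    knockout = [34.1, 35.2, 36.0, 36.8, 37.4]
    u, p = mann_whitney_exact(control, knockout)
    print(f"U = {u}, exact two-sided p = {p:.4f}")
    print(f"minimum attainable p for n = 2 vs 5: {min_two_sided_p(2, 5):.4f}")
```

      Full enumeration is only practical for small groups, which is the situation under discussion; for larger samples, a standard statistics package would normally be used instead.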

      (7) Figure S3 is accompanied by only a brief explanation in the Results section (a single sentence despite the figure containing six panels), which makes it difficult for readers unfamiliar with this type of data to interpret.

      We thank the reviewer for the suggestion. To address this, we have included a more detailed explanation of Supplementary Figure S3 to better clarify our analysis of mature neuronal and glial cell types in both Ptbp1-deficient and wild-type animals. The relevant text now reads:

      Notably, splicing patterns in Ptbp1-deficient retinas showed stronger correlation with Thy1-positive neurons, which exhibit low Ptbp1 expression, and minimal overlap with microglia and auditory hair cells, the adult cell types with the highest Ptbp1 levels (Fig. S3).

      Gene expression and splicing changes were compared across several reference tissues: heart tissue and Thy1-positive neurons, mature hair cells, microglia, and astrocytes (Fig. S3a,b). A heatmap of differentially expressed genes showed that while Ptbp1-deficient retinas diverged from WT retinas, their expression profiles did not resemble those of fully differentiated cell types like rods, astrocytes, or adult WT retina (Fig. S3c). Consistently, Pearson correlation analysis revealed that Ptbp1-deficient and WT retinas were more similar to each other than to fully differentiated neuronal or glial populations (Fig. S3d). Splicing profile analysis further revealed that while there was high correlation of PSI between Ptbp1-deficient and WT retinas, Ptbp1-deficient retinas more closely resembled Thy1-positive neurons, whereas WT retinas aligned more strongly with mature cells such as astrocytes, microglia, and auditory hair cells (Fig. S3e,f). Together, these results suggest that although Ptbp1 loss induces hundreds of alternative splicing events, the magnitude of PSI changes in the KO retinas remains considerably lower than that seen in fully differentiated cell types (Extended Data 3). Thus, while a subset of splicing events overlaps with those characteristic of mature neurons or rods, the overall splicing and expression profiles of KO retinas are more similar to those of developing retinal tissue than to those of terminally differentiated neuronal or glial populations.

      (8) To assess progenitor proliferation, the authors performed EdU labeling experiments in P0 retinas. Is there a rationale for not examining earlier developmental time points to evaluate potential effects on early RPCs?

      We thank the reviewer for this comment. We chose to perform EdU labeling experiments at P0 for several reasons. P0 is a developmental stage at which RPCs are actively proliferating and represent ~35% of all retinal cells, and the retina is transitioning to intermediate-late-stage development, providing sufficient time to ensure efficient and widespread disruption of Ptbp1. Earlier embryonic timepoints were not examined here, as addressing all stages of development was beyond the scope of this current study. However, we agree that investigating whether Ptbp1 plays stage-specific roles in early RPCs during development is an important question and potential future direction.

      (9) In Figure S2, panel D shows staining in GCL under the Ptbp1 condition that does not make sense and is inconsistent with panel C. If possible, the authors should provide an alternative image to prevent any confusion.

      Thank you for bringing this to our attention. The image shown for Ptbp1-KO in Figure S2d shows Sun1-eGFP labeling, which marks every cell affected by the Cre condition. The genotype for this mouse was Chx10-Cre;Ptbp1lox/lox;Sun1-GFP. We apologize for any confusion and have updated the genotype in the figure legend.

      (10) The authors should revise the following sentence at the end of the Discussion section, as its meaning is unclear: "...and conditions for in vitro analysis may have accurately replicated conditions in the native CNS."

      We thank the reviewer for this comment and have revised this sentence in the Discussion to read as follows.

      Previous studies using knockdown may have been complicated by off-target effects (Jackson et al. 2003), and conditions for in vitro analysis may not have accurately replicated conditions in the native CNS.

    1. eLife Assessment

      This study demonstrates the cartilage-protective effects of osteoactivin in inflammatory experimental models. The work offers valuable insights advancing current knowledge regarding regulation of joint inflammation and tissue degeneration. The evidence provided is compelling and suggests that osteoactivin may serve as a promising therapeutic target for inflammatory joint diseases.

    2. Reviewer #1 (Public review):

      Summary:

      While previous studies by this group and others have demonstrated the anti-inflammatory properties of osteoactivin, its specific role in cartilage homeostasis and disease pathogenesis remains unknown.

      Strengths:

      Strengths of the study include its clinical relevance, given the lack of curative treatments for osteoarthritis, as well as the clarity of the narrative and the quality of most results.

      Weaknesses:

      A limitation of the study is the reliance on standard techniques; however, this is a minor concern that does not diminish the overall impact or significance of the work.

      Comments on revisions:

      The authors have satisfactorily addressed my concerns.

    3. Reviewer #2 (Public review):

      Summary:

      This manuscript presents compelling evidence for a novel anti-inflammatory function of glycoprotein non-metastatic melanoma protein B (GPNMB) in chondrocyte biology and osteoarthritis (OA) pathology. Through a combination of in vitro, ex vivo, and in vivo models, including the destabilization of the medial meniscus (DMM) surgery in mice, the authors demonstrate that GPNMB expression is upregulated in OA-affected cartilage and that recombinant GPNMB treatment reduces the expression of key catabolic markers (MMPs, Adamts-4, and IL-6) without impairing anabolic gene expression. Notably, DBA/2J mice lacking functional GPNMB exhibit exacerbated cartilage degradation post-injury. Mechanistically, GPNMB appears to mitigate inflammation via the MAPK/ERK pathway. Overall, the work is thorough, methodologically sound, and significantly advances our understanding of GPNMB as a protective modulator in osteoarthritic joint disease. The findings could open pathways for therapeutic development.

      Strengths:

      (1) Clear hypothesis addressing a well-defined knowledge gap.

      (2) Robust and multi-modal experimental design: includes human, mouse, cell-line, explant, and surgical OA models.

      (3) Elegant use of DBA/2J GPNMB-deficient mice to mimic endogenous loss-of-function.

      (4) Mechanistic insight provided through MAPK signaling analysis.

      (5) Statistical analysis appears rigorous and the figures are informative.

      Weaknesses:

      (1) Clarify the strain background of the DBA/2J GPNMB+ mice: While DBA/2J GPNMB+ is described as a control, it would help to explicitly state whether these are transgenically rescued mice or another background strain. Are they littermates, congenic, or a separate colony?

      (2) Provide exact sample sizes and variance in all figure legends: Some figures (e.g., Figure 2 panels) do not consistently mention how many replicates were used (biological vs. technical) for each experimental group. Standardizing this across all panels would improve reproducibility.

      (3) Expand on potential sex differences: The DMM model is applied only in male mice, which is noted in the methods. It would be helpful if the authors added 1-2 lines in the discussion acknowledging potential sex-based differences in OA progression and GPNMB function.

      (4) Visual clarity in schematic (Figure 7): The proposed mechanism is helpful but the text within the schematic is somewhat dense and could be made more readable with spacing or enlarged font. Also, label the MAPK/ERK pathway explicitly in panel B.

      Comments on revisions:

      The authors have addressed all the concerns raised in the initial review.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Reviews):

      Weaknesses:

      A limitation of the study is the reliance on standard techniques; however, this is a minor concern that does not diminish the overall impact or significance of the work.

      We agree that standard techniques were utilized. We believe this approach enhances the reliability and reproducibility of our findings. These methods are well-validated in the field and allow for robust interpretation of the results presented.

      Reviewer #2 (Public Reviews):

      Weaknesses: 

      (1) Clarify the strain background of the DBA/2J GPNMB+ mice: While DBA/2J GPNMB+ is described as a control, it would help to explicitly state whether these are transgenically rescued mice or another background strain. Are they littermates, congenic, or a separate colony?

      The following language was added to the manuscript, “The DBA/2J GPNMB+ mice are a coisogenic strain purchased from Jackson Laboratories. Jackson Laboratories generated these mice by knocking in the wild-type allele of Gpnmb into the DBA/2J background. By doing so, they rescued the phenotype of the DBA/2J mice. This description has been highlighted in our previous publications (Abdelmagid et al., 2014; Abdelmagid et al., 2015).”

      (2) Provide exact sample sizes and variance in all figure legends: Some figures (e.g., Figure 2 panels) do not consistently mention how many replicates were used (biological vs. technical) for each experimental group. Standardizing this across all panels would improve reproducibility.

      The manuscript has been updated to include replicates in each figure legend.

      (3) Expand on potential sex differences: The DMM model is applied only in male mice, which is noted in the methods. It would be helpful if the authors added 1-2 lines in the discussion acknowledging potential sex-based differences in OA progression and GPNMB function. 

      To our knowledge, no sex-based differences in OA progression and GPNMB function have been reported in the literature. It was initially reported that only male C57BL/6J mice (Jackson Laboratories) develop OA following DMM; however, recent literature has shown that both male and female mice develop the disease (Hwang et al., 2021; Ma et al., 2007). For the purpose of this manuscript, only male mice were used to provide preliminary results; however, we plan to repeat the included studies in female mice in the near future.

      (4) Visual clarity in schematic (Figure 7): The proposed mechanism is helpful, but the text within the schematic is somewhat dense and could be made more readable with spacing or enlarged font. Also, label the MAPK/ERK pathway explicitly in panel B.

      We updated the schematic diagram in figure 7 and the figure legend.

      Reviewer #1 (Recommendations for the Authors):

      Several concerns must be addressed to improve the clarity and scientific rigor of the manuscript: 

      (1) Abstract: Specify which MMPs and MAPKs are modulated by osteoactivin.

      We specified the MMPs and clarified that GPNMB plays a role in pERK inhibition following inflammation induced by IL-1β stimulation. 

      (2) Human explant validation: The regulation of MMP-9, MMP-13, and IL-6 should be validated in the human cartilage explant model to support the claim that "GPNMB has an anti-inflammatory role in human primary chondrocytes" (line 123). Additionally, the anatomical origin of the explants must be stated.

      Thank you very much for the recommendation. We agree that validating the explant culture for MMP-9, MMP-13, and IL-6 would strengthen our data. Unfortunately, this experiment has been terminated and we no longer have access to the tissue. Human explants were obtained from discarded knee articular cartilage following arthroplasty. The manuscript has been updated to include this information.

      (3) DBA/2J GPNMB expression: GPNMB is known to be produced as a truncated protein in DBA/2J cells. The manuscript should address why its expression is reduced. Does this involve mRNA instability? Also, the nomenclature "DBA/2J GPNMB+" versus "DBA/2J" is confusing, especially since both mRNA and protein are still detectable, albeit at reduced levels. Figure 2C is not convincing; therefore, Figures 2C and 2D can be omitted.

      The following language was added to the manuscript, “Our results are consistent with the literature, which shows that the GPNMB gene in DBA/2J mice carries a nonsense mutation that leads to reduced RNA stability (Anderson et al., 2008).” We can appreciate that the nomenclature "DBA/2J GPNMB+" versus "DBA/2J" could be confusing. However, this is the standard language used in multiple publications, and we want to remain consistent with the literature. Based on your recommendation, we have removed Figure 2 C and D and updated the methods and results sections accordingly.

      (4) Figures 2J-L: The claim that gene expression changes are "significantly higher in DBA/2J animals compared to fold changes seen in chondrocytes from DBA/2J GPNMB+ controls" is not supported by the current presentation. The data should be plotted on the same graphs, and appropriate statistical analysis (e.g., two-way ANOVA) must be performed.

      Graphs for figure 2 have been updated and the appropriate analyses have been performed. 

      (5) Figure 6: The GPNMB expression data in the presence and absence of IL-1β at 0 and 10 minutes are missing.

      We apologize for the confusion. We corrected the mistake and removed the mention of the timepoints 0 and 10 minutes.  

      Reviewer #2 (Recommendations for the Authors):

      Consider unifying terminology around "GPNMB" and "osteoactivin": The term "osteoactivin" is used in some contexts and "GPNMB" in others. Since the focus is GPNMB's role in cartilage, suggest using a single term throughout to prevent confusion.

      Thank you for your comment. We include osteoactivin for clarification purposes once in the abstract, introduction and discussion. 

      In summary, we believe we have addressed all comments/concerns raised by the reviewers. We appreciate the opportunity to improve the quality of our manuscript.

      References

      Abdelmagid, S. M., Belcher, J. Y., Moussa, F. M., Lababidi, S. L., Sondag, G. R., Novak, K. M., Sanyurah, A. S., Frara, N. A., Razmpour, R., & Del Carpio-Cano, F. E. (2014). Mutation in osteoactivin decreases bone formation in vivo and osteoblast differentiation in vitro. The American journal of pathology, 184(3), 697-713. 

      Abdelmagid, S. M., Sondag, G. R., Moussa, F. M., Belcher, J. Y., Yu, B., Stinnett, H., Novak, K., Mbimba, T., Khol, M., Hankenson, K. D., Malcuit, C., & Safadi, F. F. (2015). Mutation in Osteoactivin Promotes Receptor Activator of NFκB Ligand (RANKL)-mediated Osteoclast Differentiation and Survival but Inhibits Osteoclast Function. J Biol Chem, 290(33), 20128-20146. https://doi.org/10.1074/jbc.M114.624270

      Anderson, M. G., Nair, K. S., Amonoo, L. A., Mehalow, A., Trantow, C. M., Masli, S., & John, S. W. (2008). GpnmbR150X allele must be present in bone marrow derived cells to mediate DBA/2J glaucoma. BMC genetics, 9(1), 1-14.

      Hwang, H., Park, I., Hong, J., Kim, J., & Kim, H. (2021). Comparison of joint degeneration and pain in male and female mice in DMM model of osteoarthritis. Osteoarthritis and Cartilage, 29(5), 728-738.

      Ma, H.-L., Blanchet, T., Peluso, D., Hopkins, B., Morris, E., & Glasson, S. (2007). Osteoarthritis severity is sex dependent in a surgical mouse model. Osteoarthritis and Cartilage, 15(6), 695-700.

    1. Author Response:

      Reviewer #1 (Public Review):

      This ms targets an interesting question, whether changes of feedforward inhibition at the DG-CA3 synapses regulate the representational capabilities of contextual fear memory at CA1 and the anterior cingulate cortex (ACC). The paper exploits a recent tool developed by the group (viral-mediated shRNA interference of Ablim3 in DG), to enhance PV+ mediated inhibition of CA3 pyramidal cells by increasing both their recruitment by DG cells and their number of contacts over postsynaptic cells. Using micro-endoscopic imaging of mice experiencing contextual fear conditioning, the authors nicely evaluate the effect of feedforward inhibitory control of CA3 outputs in the formation, stabilization and specificity of contextual fear memory representations in the CA1 and ACC. Data is relevant to understand how specific microcircuit motifs can influence representational dynamics in downstream regions. I have some methodological comments and recommendations for authors to improve their presentation and to exclude potential confounding factors.

      1- Since imaging is performed in CA1 and ACC separately, the study design entails 4 groups: shNT vs shRNA, which is the main experimental manipulation, plus CA1 vs ACC. While data is in general carefully presented, some analysis may require additional validation to discard whether some regional effects caused by manipulation may actually reflect group differences. This is important because there may be some differences between ACC and CA1 groups in some behavioral readout (e.g. Fig.2c; Fig.S2b), which may actually explain the different effects of the manipulation. Formal comparisons of behavior in ACC and CA1 shNT groups may be required to discard this effect.

      We compared behavior data in the control groups across brain region to test if our calcium imaging findings are driven by differences in groups rather than virus manipulation. We did not find a significant difference for any of the data sets (see figure legend Rebuttal Figure 1 a-d for details). In general, we tried to avoid presenting the same (or part of the same) dataset in multiple figures. An alternative would be to plot all 4 groups in 1 graph and test as such but that would decrease readability in our opinion. Therefore, we are happy to provide the additional graphs and analysis but prefer not to include them in the main manuscript. (Rebuttal Figure 1a-d).

      2- Differences of activity level (calcium rate) are examined using bins of 5 seconds for a total of 360 sec of exploratory activity. To discard motility effects an analysis is implemented using 1 sec bins. Thus, the two data samples are not commensurate. Also, an ANOVA on calcium rate is applied over uneven multiple comparisons to account for statistical effects of region x time or context x time. This is relevant for fig.1g vs 1i and Fig.S2j,l and may require correction.

      We assume you mean “1 minute” and not “1 sec” here. We indeed presented the two datasets (calcium event rate and moving index) using different time bins (5 sec and 1 minute, respectively). It is true that a difference in binning, and therefore a different sample size in one factor (time), could affect the result of the ANOVA. Rebuttal Figure 1 e-f shows the behavior comparison made in Suppl. Figure 2b of the original manuscript with a 5-second bin. A two-way ANOVA with repeated measures reveals no main virus effect [Two-way repeated measures ANOVA, ACC (e): virus x time effect p = 0.0113; virus main effect N.S., time main effect N.S., n=5 per group; CA1 (f): virus x time effect N.S.; virus main effect N.S., time main effect N.S., n=5 shNT, n=6 shRNA]. In ACC, we find a significant interaction effect, but a post hoc Sidak test did not reveal a difference between virus groups at any time point. This confirms our previous findings that differences in movement do not seem to drive the differences between virus groups.
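
      The binning point in this exchange can be made concrete with a short sketch. It is an illustration only, not the authors' pipeline: the function name `rebin` and the example values are hypothetical. It shows how a series sampled in 5-second bins over 360 seconds (72 values) can be averaged onto a 1-minute grid (6 values), so that two measures enter a repeated-measures test with the same number of time levels.

```python
from statistics import mean


def rebin(values, factor):
    """Average consecutive groups of `factor` fine bins into one coarse bin."""
    if factor <= 0 or len(values) % factor != 0:
        raise ValueError("number of bins must be a positive multiple of factor")
    return [mean(values[i:i + factor]) for i in range(0, len(values), factor)]


# Hypothetical movement index: 360 s sampled in 5 s bins gives 72 values.
movement_5s = [float(i % 12) for i in range(72)]

# Twelve 5 s bins make up one 1-minute bin, so 72 values become 6.
movement_1min = rebin(movement_5s, 12)

print(len(movement_5s), len(movement_1min))
```

      Whether to coarsen the finer series or to re-extract the coarser one at higher resolution (as the authors did for the movement data) is a design choice; either way, both measures end up on the same time grid.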

      3- Fig.3 nicely shows accurate context classification based on calcium activity from A&C contexts neurons using a support-vector machine. The authors report very interesting representational effects for shNT vs shRNA manipulations. Is prediction accuracy of the SVM classifier correlated with behavioral discrimination? That would reinforce conclusions.

      Thank you for raising this very interesting point. Indeed, we found a positive correlation between the discrimination ratio and the accuracy of the SVM classifier (Pearson’s r, shNT: R2 = 0.5794, p = 0.0282, n=4; shRNA: R2 = 0.5771, p = 0.0288, n=4). We added these data in Figure 4 (Figure 4c) and in Rebuttal Figure 1g.
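
      For readers unfamiliar with the statistic reported here, the following is a minimal sketch of how a Pearson correlation and its R2 are computed from paired values. The function name `pearson_r` and the numbers are hypothetical placeholders, not the study's data, and the sketch does not reproduce the reported p-values.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical per-animal values (placeholders, not data from the manuscript).
discrimination_ratio = [0.10, 0.25, 0.40, 0.55]
svm_accuracy = [0.60, 0.66, 0.71, 0.80]

r = pearson_r(discrimination_ratio, svm_accuracy)
r_squared = r * r  # share of variance in one variable explained by the other
print(round(r, 3), round(r_squared, 3))
```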

      Regarding conclusions and physiological relevance, the authors may need to discuss why enhanced feedforward inhibition at DG-CA3 synapses is not naturally established given the beneficial effect in context discrimination.

      We apologize that we did not make that aspect of our manipulation clearer in our discussion. We edited the introduction and discussion (LL 65, LL 365) to clearly convey that FFI in DG-CA3 is naturally temporarily increased following learning (Ruediger 2011, Ruediger 2012, Guo et al 2018).

      Reviewer #3 (Public Review):

      In this study, Twarkowski et al. aim to understand the role of a specific circuit motif, dentate gyrus (DG) to CA3 feed-forward inhibition (FFI), for memory encoding and consolidation. FFI is a ubiquitous circuit motif in the brain. As a result, providing insights on its function is an interesting and a potentially very impactful contribution to neuroscience.

      To tackle this issue, the authors describe how increasing DG-CA3 FFI impacts the ensemble activity in hippocampal area CA1 and the anterior cingulate cortex (ACC) in mice undergoing a contextual fear conditioning paradigm. To selectively increase FFI onto CA3 neurons, the study uses a molecular tool (downregulation of Ablim3 using virally mediated expression of shRNA), which has been developed by the same group (Guo et al, 2018, Nature Medicine). The impact of this manipulation is assessed via chronic in vivo one-photon Ca2+ imaging of dorsal CA1 and ACC neurons on the day of fear conditioning, one day after (recent recall), and 16 days after (remote recall) the fear conditioning. During and after fear conditioning, the results show in both experimental groups (shRNA and control) various population activity changes in both CA1 and ACC. Furthermore, the study finds improved context discrimination in the shRNA group only at the remote recall timepoint. The authors' conclusion is that increasing FFI enhances the formation of learning-specific ensembles, first in CA1 and later in ACC, which is associated with an improved memory recall. The experiments presented here were very technically challenging and produced a comprehensive and valuable dataset describing the parallel ensemble activity changes in CA1 and ACC after fear conditioning, with or without increasing DG-CA3 FFI. However, a causal relationship between the manipulation of DG-CA3 FFI, the network activity changes in CA1 and ACC, and the behavioral improvement is, in my opinion, not fully demonstrated. This is for a couple of reasons:

      1) The magnitude of the effect of the shRNA manipulation on the immediate downstream area CA3 remains unclear. Therefore, the findings in the downstream areas CA1 or even ACC (which is at least three synapses removed from CA3) are, in my opinion, difficult to interpret. This uncertainty includes (1) the extent of the virus injection in the dentate gyrus and the extent of subsequent changes in CA3, and (2) the effect of the manipulation on CA3 pyramidal cell activity in vivo. The original paper (Guo et al, 2018) uses in vitro voltage-clamp recordings to record EPSCs/IPSCs in CA3, but does not exclude possible compensatory changes in vivo, e.g., in the excitability of CA3 neurons, which could result from increasing FFI chronically over a few weeks. The data in Figures 1f and g seems to suggest that there are baseline activity changes in CA1, which might be caused by changes in the upstream CA3 network activity. Along the same lines, I am unsure how to interpret the comparisons between CA1 and ACC in Figure 1; within brain region comparisons are more relevant and should be shown instead.

      This is a great point and was raised by all reviewers. We acknowledge the weakness of this comparison, apologize for this misstep in our analysis, and have accordingly removed this dataset from our manuscript. Instead, we performed new experiments using in vivo electrophysiology to allow for cross-region comparison of LFPs in CA1 and ACC within the same animal. We removed data from Figure 1 e-i and added new, simultaneous electrophysiological LFP recordings (Figure 5 and supplementary Figure 4 in revised manuscript).

      We found an increased number of CA1 ripples that are coupled with ACC spindles (“coupled ripples”) in shRNA mice compared to control mice prior to a learning event (Figure 5c, two-tailed unpaired Student’s t-test with Welch’s correction, p=0.0499, n=5), with no difference in time spent in slow-wave sleep (SWS) (supplementary Figure 4a) or in the total numbers of spindles or ripples (supplementary Figure 4b-c). Control mice show a learning-dependent increase in coupled ripples (Figure 5f, two-tailed paired Student’s t-test, p=0.019, n=5) to a similar level as seen in shRNA mice prior to learning. No further increase is seen in shRNA mice, indicating a saturation of circuit changes that cannot be further amplified following learning.
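
      As an illustration of what counting “coupled ripples” can involve, here is a minimal sketch. It assumes, purely for the example, that a ripple counts as coupled when its timestamp falls inside a detected spindle interval; the response does not state the authors' actual coupling criterion, and the function name and values are hypothetical.

```python
from bisect import bisect_right


def count_coupled_ripples(ripple_times, spindle_intervals):
    """Count ripples whose timestamp lies inside any spindle interval.

    `spindle_intervals` must be sorted, non-overlapping (start, end) pairs.
    """
    starts = [start for start, _ in spindle_intervals]
    coupled = 0
    for t in ripple_times:
        # Index of the last spindle that starts at or before time t.
        k = bisect_right(starts, t) - 1
        if k >= 0 and t <= spindle_intervals[k][1]:
            coupled += 1
    return coupled


# Hypothetical event times in seconds (placeholders, not recorded data).
ripples = [1.0, 2.5, 7.2, 9.9]
spindles = [(2.0, 3.0), (7.0, 8.0)]

n_coupled = count_coupled_ripples(ripples, spindles)
print(n_coupled)
```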

      2) Several parameters are used in this study to describe the network activity in CA1 and ACC. These include the number of correlated neuron pairs, the number of neurons active in both the training context and a neutral context (so-called A-C neurons), or the event rate observed in these A-C neurons. Most of the activity changes observed do not appear specific to the shRNA group and also occur under control conditions, suggesting that they are not caused by an increase in DG-CA3 FFI. It would be helpful to clarify the sequence by which increasing FFI onto CA3 is hypothesized to cause the changes in CA1 or even ACC.

      We apologize for failing to make this clearer. Prior work has shown that learning increases FFI in DG-CA3 and downregulates Ablim3 in DG (Ruediger 2011, 2012, Guo et al 2018). Therefore, it is not surprising that we observe similar changes in the control (shNT) group as in the shRNA group.

      From previous work we know that shNT mice show increased DG-CA3 FFI following learning (training day) for approximately 24 hours (Guo et al, 2018). Thus, our manipulation allows us to mimic and boost a naturally occurring learning-induced synaptic modification in an inhibitory microcircuit in DG-CA3 and examine the impact on network mechanisms underlying systems consolidation. Importantly, enhanced feedforward inhibition at the DG-CA3 synapses is naturally established for several hours following a spatial learning event (see Ruediger et al, 2011, Guo et al, 2018). Leveraging a molecular tool to enhance FFI prior to learning, we were able to reveal that DG-CA3 FFI plays a role in tuning the circuit towards cross-regional long-term storage of precise neuronal representations (see also edits in text, LL 365).

    1. Author Response:

      Reviewer #1 (Public Review):

      [...]

      1. A notable shortcoming of the authors' interpretation is the generalization of their findings to preterm premature rupture of membranes (PPROM). As noted by the authors, term labor is considered a "sterile" process, which is particularly important in terms of the authors' findings since TLR4 in the fetal membranes may be responding to endogenous signals such as danger signals. However, a large proportion of PPROM cases are associated with microbial invasion of the amniotic cavity, and thus in this context TLR4 would be responding to bacterial products.

      To bring in some new elements and address this reviewer’s concern, along with the potential extrapolation between physiological rupture and pathological rupture in the case of PPROM, we first decided to remove Figure 3C (expression of TLR4 in the presence of LPS of bacterial origin) from the revised version of the manuscript. Regarding this comment, it is well known that the percentage of PPROM cases associated with microbial invasion varies with the weeks of gestation. In fact, early gestational ages are clearly linked to a high prevalence of microbial-associated intra-amniotic inflammation (64.3% when <25 WGA), whereas this percentage subsequently decreases throughout gestation (Romero et al., 2015), reaching one-third at term, which better matches the gestational stage of the current study. Such observations support the idea that the TLR4 model in physiological rupture could be transposed, at least in part, to sterile PPROM and initiated by the presence of alarmins (i.e., HMGB1) and their binding to this type of receptor. Indeed, TLR4 is now well described as being stimulated by ligands other than LPS, such as HMGB1, a member of the DAMPs (Robertson et al., 2020). Furthermore, TLR4 mRNA and protein expression in PPROM without chorioamnionitis has already been quantified and compared with term no labor without chorioamnionitis (Kim et al., 2004), indicating the absence of a clear link between chorioamnionitis and TLR4 expression. Finally, in an animal model of PPROM, an article underlined the importance of TLR4 in preterm labor by using TLR4 mutant mice in a sterile context (Wahid et al., 2015).

      1. It is a well-known concept that TLR4 is expressed by the fetal membranes and is responsive to LPS stimulation, and thus the confirmatory set of experiments performed by the authors do not seem to be as novel. Indeed, given that this study was focused on the "sterile" process of term labor, perhaps the utilization of danger signals that can interact with TLR4 would be more appropriate.

      The choice to use LPS (Figure 3C) was only to confirm that TLR4 leads to proinflammatory activation in the amnion and choriodecidua, demonstrating the functional pathway after TLR4 activation in the fetal membrane environment. We completely agree that these are not novel data; this is why we decided to remove this part of the results in the revised version of the manuscript. Furthermore, we decided not to repeat the use of DAMPs (such as HMGB1) to stimulate the TLR4 pathway in this work because it was already published in the fetal membrane context (Bredeson et al., 2014). In accordance with your comments, we have modified the end of the results paragraph entitled ‘Combination of transcriptomic and methylomic results in the ZAM zone demonstrate that genes more expressed in the choriodecidua are linked to pregnancy pathologies’ to better justify the choice to focus on TLR4 global transcriptional regulation.

      1. The distinction between the ZAM and ZIM seems to have been lost among the TLR4-focused experiments, and thus it is unclear how these fetal membrane zones fit into the conceptual model proposed by the authors in the final figure.

      The reviewer is correct here, so to avoid confusion between the ZIM and ZAM, we decided to do the following:
      - Read carefully all the successive paragraphs of the results to check for the presence of ‘ZAM specification’.
      - Add ‘ZAM’ in the legend of Figure 4. This information was present in the related text of the article.
      - Update Figure 7 and its legend (model of regulation). We had ‘ZAM zone’ in the discussion part regarding Figure 7.

      1. The study is largely descriptive and would benefit from the addition of fetal membrane tissues from pregnancy complications such as PPROM and/or animal models in which premature rupture of the membranes has been induced.

      We agree that animal models are available. Nevertheless, we considered that such models are far from the human reality. In fact, animal models are often used for fetal membrane studies, but they differ in pregnancy physiology, structure and uterine environment, which hampers their use. We used ‘term’ fetal membranes to decipher the physiological rupture of membranes and demonstrate the importance of TLR4 as an actor. To bring some elements regarding this comment and the possible extrapolation between physiological rupture and pathological rupture in the case of PPROM, we decided to remove Figure 3C (expression of TLR4 in the presence of LPS of bacterial origin) to focus more on the physiological rupture of fetal membranes without the involvement of bacterial presence. Previous bibliographic data answer the reviewer’s question: Kim et al. (2004) clearly demonstrated that TLR4 mRNA levels are higher in PPROM (31.2 weeks of gestation) fetal membranes without chorioamnionitis than in term (39.1 weeks of gestation) ones without chorioamnionitis.

      1. The study focuses on the mechanisms of rupture of membranes, but does not provide an explanation as to how the regulation of TLR4 mediates the process of membrane rupture.

      We agree with your comment; however, ‘how the regulation of TLR4 mediates the process of membrane rupture’ is not the topic of the manuscript. In addition, this has already been well established in previous publications. Nevertheless, we added a sentence in the introduction between lines 97-100: ‘The mechanisms implying TLR4 in the physiological or pathological rupture of membrane in case of PPROM are well known. Triggering TLR4 will lead to NFκB activation, leading to an increase of the release of proinflammatory cytokines, concentration of matrix metalloprotease and prostaglandin, which are well established actors of fetal membrane rupture (Robertson et al., 2020).’

      Reviewer #2 (Public Review):

      This is a well-conceived and executed paper that adds novel data to improve our understanding of rupture of the human fetal membranes. The new information presented not only addresses gaps in our understanding of normal parturition mechanisms but also the significant issue of preterm birth. The authors highlight the need to understand the understudied human fetal membranes to be able to understand its role in normal parturition but also to lower the rates of preterm birth. They not only establish the need to study this tissue but also to improve our appreciation for regional differences within it, using a comprehensive genetic approach. The authors provide data from a genome wide methylation study and cross reference this with transcriptome data. Using this new knowledge, they then zero in on a specific gene of interest TLR4. This receptor is already established as an extremely important receptor for preterm birth but little is known about its role in normal parturition. Strengths of this paper stem from the comprehensive data set provided, answering both the questions pertaining to the specific aims of this paper but also potentially future questions and providing potential focused targets of study. One example of this may be the common methylated genes that are found in both the ZIM and ZAM, illustrating not regional changes but gestational programming of this tissue.

      We thank the reviewer for the positive and constructive comments regarding the article. Following all the reviewers’ comments, we now have an improved version.

      Reviewer #3 (Public Review):

      Manuscript by Belville et al describes the significance of epigenetic and transcription associated changes to TLR4 as a mechanistic event for sterile inflammation associated with fetal membrane weakening, specifically in the zone of altered morphology. This manuscript is timely in an understudied area of research.

      The authors have performed an extensive set of experiments to derive their conclusions.

      However, it is unclear why the focus is on TLR4. Although LPS is a ligand for TLR4, Gram-negative infections are rare in PPROM, where genital mycoplasmas predominate. The methylome and transcriptome analysis does not necessarily warrant examination of a single marker. A clear rationale would need to be included.

      We would like to thank the reviewer for their comments regarding the article. For the last part of the public review, we would like to underline the following:

      -The choice of focusing on TLR4 is explained in the article text between lines 161 and 165 by the following sentences: ‘Of all the genes classified in these processes, TLR4 was the only one represented in all these biological processes and, therefore, seems to play a central role in parturition at term. To validate this in-silico observation and pave the way for describing TLR4’s importance, immunofluorescence experiments were first conducted to confirm the protein’s presence in the amnion and choriodecidua of the ZAM (Figure 3B)’. Furthermore, this choice arises from the analysis described in Figure 3A, which underlines that the four most represented GO terms have only one gene in common: ‘TLR4’. The combination of two high-scale studies does not permit us to individually characterize how each gene is regulated. Nevertheless, the focus on TLR4 provides an original and interesting hypothesis on how layer-specific regulation between the amnion and choriodecidua could be realised at the cellular level in the weaker ZAM zone. Finally, because the high-scale study results are public, this type of analysis could be conducted on other candidate genes.

      -Throughout the text, we changed all the ‘E. Coli’ to ‘Gram-negative bacteria’. Furthermore, as found in the literature, genital mycoplasmas are considered ‘Gram-negative bacteria’. We focused on the ‘sterile inflammation phenomenon’, and to support the hypothesis concerning the importance of TLR4, we produced a supplementary transcriptome ‘ZAM heatmap’, which confirmed overexpression of DAMPs in the choriodecidua (S100A7, A8 and A9, for example), which are well-known ligands of TLR4 (given below as an image).

      Heatmap of genes differentially expressed in the ZAM zone in relation to the sterile inflammation phenomenon.

    1. Author Response

      Reviewer #3 (Public Review):

      The authors analyzed several models for predicting the early onset of T2D, where they trained and tested on a UKB based cohort, aged 40 - 69 and suggest two simple logistic regression models: the anthropometric and the five blood tests models in reference to FINDRISC and GDRS models. Their models achieved better auROC, APS, and decile prevalence OR, and better-calibrated predictions.

      Strengths:

      1. The authors have neatly explained their objectives and performed well-justified analyses.

      2. The authors highlight how using both features, the HbA1C% measure and the reticulocyte count, may provide a better indication of the average blood sugar level during the last two to three months than using just the standard HbA1C% measure.

      3. Further verification showed that the proposed anthropometric-based and five-blood-test-based models can discriminate within a group of normoglycemic participants and within a group of pre-diabetic participants, outperforming the FINDRISC- and GDRS-based models.

      Weaknesses:

      1. As the authors point out in the manuscript, these models are suited for the UKB cohort or populations with similar characteristics. This limits the extrapolation of these findings to cohorts from a different background until the models are analyzed on a cohort from another country or continent.

      We agree with this comment as we indeed pointed in the paper. We recommend to adjust these models when applying it to populations with distinct characteristics.

      1. In the methods section, an additional explanation of how the T2D prevalence bins were formed would be useful to a reader.

      We thank the reviewer for this note, we added the following explanation in section 4.11: “We considered several potential risk score limits that separate T2D onset probability in each of the scores groups, and we chose boundaries that showed a separation between the risk groups on the validation datasets. Once we decided on the boundaries of the score, we report the prevalence in each risk group on the test set and we report these results.”

      1. The authors have mentioned that the prevalence of diabetes has been rising more rapidly in low and middle-income countries (LMICs) than in high-income countries and the objective of the present research was to develop clinically usable models which are easy to use and highly predictive of T2D onset. As lifestyle is also one of the contributory factors for T2D, additional analysis that includes a comparison of groups between low-income and high-income subjects within UKB-based cohort provided such metadata available would help understand if the prevalence for T2D differs or not between such groups.

      We thank the reviewer for this comment. Below we add an analysis that we ran on our data, showing the difference in deprivation index between the sick and healthy populations. As expected, the sick population has a higher deprivation index. A Mann-Whitney U test on the full data returns a p-value of zero; repeating the test on a sample of just 1000 participants from each group gives a p-value of 2.37e-137. This indicates a significant association between deprivation index and the tendency to develop T2D. We also added this finding, with a reference to it, to the supplementary material.

      You can also find below a SHAP diagram showing that a higher Townsend deprivation index pushes the prediction for T2D upwards.
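
      As an illustration of the Mann-Whitney comparison described above, the following is a minimal, standard-library-only sketch of a two-sided Mann-Whitney U test using the normal approximation with tie correction. It is not the code used for the analysis; the function name `mann_whitney_u` and the synthetic `sick` and `healthy` values are assumptions made for the example. It also suggests one plausible reading of a reported p-value of zero: with very large groups the statistic can be so extreme that the p-value underflows double precision, whereas a subsample still yields a representable, very small value.

```python
import math
import random


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test (normal approximation, tie-corrected,
    no continuity correction). Returns (U for x, z score, p-value)."""
    n1, n2 = len(x), len(y)
    n = n1 + n2
    # Pool both groups, remembering which group each value came from.
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        midrank = (i + 1 + j) / 2.0  # average of 1-based ranks i+1 .. j
        for k in range(i, j):
            ranks[k] = midrank
        t = j - i
        tie_term += t ** 3 - t
        i = j
    rank_sum_x = sum(r for r, (_, g) in zip(ranks, pooled) if g == 0)
    u_x = rank_sum_x - n1 * (n1 + 1) / 2.0
    mean_u = n1 * n2 / 2.0
    var_u = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    z = (u_x - mean_u) / math.sqrt(var_u)
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal tail
    return u_x, z, p


# Synthetic stand-ins for deprivation indices (NOT the UKB data).
random.seed(0)
sick = [random.gauss(1.0, 3.0) for _ in range(20000)]
healthy = [random.gauss(-1.0, 3.0) for _ in range(20000)]

_, z_full, p_full = mann_whitney_u(sick, healthy)
_, z_sub, p_sub = mann_whitney_u(random.sample(sick, 1000),
                                 random.sample(healthy, 1000))
print(f"full data: z={z_full:.1f}, p={p_full}")
print(f"subsample: z={z_sub:.1f}, p={p_sub:.3g}")
```

      In practice a library routine such as `scipy.stats.mannwhitneyu` would normally be used; the hand-written version is shown only to keep the sketch dependency-free.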

    1. Author Response

      Reviewer #2 (Public Review):

      Summary: This substantial collaborative effort utilized virus-based retrograde tracing from cervical, thoracic and lumbar spinal cord injection sites, tissue clearing and cutting-edge imaging to develop a supraspinal connectome or map of neurons in the brain that project to the spinal cord. The need for such a connectome-atlas resource is nicely described, and the combination of the actual data with the means to probe that data is truly outstanding.

      They then compared the connectome from intact mice to those of mice with mild, moderate and severe spinal cord injuries to reveal the neuronal populations that retain axons and synapses below the level of injury. Finally, they look for correlations between the remaining neuronal populations and functional recovery to reveal which are likely contributing to recovery and its variability after injury. Overall, they successfully achieve their primary goals with the following caveats: The injury model chosen is not the most widely employed in the field, and the anatomical assessment of the injuries is incomplete/not ideal.

      Concerns/issues:

      1) I would like to see additional discussion/rationale for the chosen injury model and how it compares to other more commonly employed animal models and clinical injuries. Please relate how what is being observed with the supraspinal connectome might be different for these other models and for clinical injuries.

      We have added text to the Results and Discussion to explain our rationale for selecting the crush injury model, and to acknowledge differences between this model and more clinically relevant contusion models (Results: lines 360-364; Discussion: lines 608-615). We agree wholeheartedly that a critical future direction will be to deploy brain-wide quantification in contusion models, and we are currently seeking funding to obtain the needed equipment.

      2) The assessment of the thoracic injuries employed is not ideal because it provides no anatomical description of spared white matter (or numbers of spared axons) at the injury epicenter.

      We address this more fully in the related point below. Briefly, we agree with a need to improve the assessment of the lesion but are hampered by tissue availability. We are unable to assess white matter sparing but can offer quantification of the width of residual astrocyte tissue bridges in four spinal sections from each animal (new Figure 5 – figure supplement 3). As discussed below, however, we recognize the limitations of the lesion assessment and agree with the larger point that the current quantification methods do not position us to make claims about the relative efficacy of spinal injury analyses versus whole-brain sparing analyses to stratify severity or predict outcomes. Our approach should be seen as a complement, not a substitute, for existing lesion-based analyses. We have edited language throughout the manuscript to make this position clearer.

      3) Related to this, but an issue that requires separate attention is the highly variable appearance of the injury and tracer/virus injection sites, the variability in the spatial relationship with labeled neurons (lumbar) and how these differences could influence labeling, sprouting of axons of passage and interpretation of the data. In particular this is referring to the data shown in Figure 6 (and related data).

      It is true that there is some variability in the relative position of the injury and injection, a surgical reality. The degree of variability was perhaps exaggerated in the original Figure 6 (now Figure 5), in which one image came from one of two animals in the cohort with a notably larger gap between the injury and injection. Nevertheless, this comment raises the important question of how variability in injection-to-injury distance might affect supraspinal labeling. First, we would emphasize the data in Figure 1 – Figure Supplement 6, in which we showed that the number of retrogradely labeled supraspinal neurons is relatively stable as injection sites are deliberately varied across the lower thoracic and lumbar cord. Indeed, the question raised here is precisely the reason we performed this early test to determine how sensitive the results might be to shifts in segmental targeting. The results indicate that retrograde labeling is fairly insensitive to L1 versus L4 targeting. As an additional check for this specific experiment, we also measured the distance between the rostral spread of viral label and the caudal edge of the lesion and plotted it against the total number of retrogradely labeled neurons in the brain. If a smaller injury/injection gap favored more labeling, we might expect a negative correlation, but none is apparent. We conclude that although the injury/injection distance did vary in the experiment, it likely did not exert a strong influence on retrograde labeling.

      Reviewer #3 (Public Review):

      In this manuscript, Wang et al describe a series of experiments aimed at optimizing the experimental and computational approach to the detection of projection-specific neurons across the entire mouse brain. This work builds on a large body of work that has developed nuclear-fused viral labelling, next-generation fluorophores, tissue clearing, image registration, and automated cell segmentation. They apply their techniques to understand projection-specific patterns of supraspinal neurons to the cervical and lumbar spinal cord, and to reveal brain and brainstem connections that are preferentially spared or lost after spinal cord injury.

      Strengths:

      Although this work does not put forward any fundamentally new methodologies, their careful optimization of the experimental and quantification process will be appreciated by other laboratories attempting to use these types of methods. Moreover, the observations of topological arrangement of various supraspinal centres are important and I believe will be interesting to others in the field.

      The web app provided by the authors provides a nice interface for users to explore these data. I think this will be appreciated by people in the field interested in what happens to their brain or brainstem region of interest.

      Weaknesses:

      Overall the work is well done; however, some of the novelty claims should be better aligned with the experimental findings. Moreover, the statistical approaches put forward to understand the relationship between spinal cord injury severity and cell counts across the mouse brain need to be more carefully considered.

      The authors state that they provide an experimental platform for these types of analysis to be done. My apologies if I missed it but I could not find anywhere the information on viral construct availability or code availability to reproduce the results. Certainly both of these aspects would be required for people to replicate the pipeline. Moreover, the described methodology for imaging and processing is quite sparse. While I appreciate that this information is widely provided in papers that have developed these methods, I do not think it is appropriate to claim to have provided a platform for people to enable these types of analyses without a more in-depth description of the methods. Alternatively, the authors could instead focus on how they optimized current methodologies and avoid the overstatement that this work provides a tool for users. The exception to this is of course the viral constructs, the plasmids of which should be deposited.

      We agree that we have not provided a tool per se, more of an example that could be followed. We have revised language in the abstract, introduction, and discussion to make it clear that we optimized existing methods and provide an example of how this can be done, but are not offering a “plug and play” solution to the problem of registration that would, for example, allow upload of external data. For example, in the abstract we replaced “We now provide an experimental platform” with “Here we assemble an experimental workflow.” (Line 28). The term “platform” no longer appears in the manuscript and has been replaced throughout by “example.” We hope this matches the intention of the comment and are happy to revise further as needed. Note that the plasmids have been deposited to Addgene.

      It was not completely clear to me why or when the authors switch back and forth between different resolutions throughout the manuscript. In the abstract it states that 60 regions were examined, but elsewhere the number is as many as 500. My understanding is that current versions of the Allen Brain Annotation include more than 2000 regions. I think it would make things clearer for readers if a single resolution were used throughout, or if the choice were at least justified narratively throughout the text to avoid confusion.

      Thank you for pointing this out. The Cellfinder application recognizes 645 discrete regions in the brain, and across all experiments we detected supraspinal nuclei in 69 of these. This number, however, includes some very fine distinctions, for example three separate subregions of vestibular nuclei, three subregions of the superior olivary complex, etc. True experts may desire this level of information, but with the goal of accessibility we find it useful to collapse closely related / adjacent regions to an umbrella term. Doing so generates a list of 25 grouped or summary regions. In the revised version we move the 69-region data completely to the supplemental data (there for the experts who wish to parse), and use the consistent 25-region system (plus cervical spinal cord in later sections) to present data in the main figures. We have added text to the Results section (lines 157-162) to clarify this grouping system.
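
      The grouping step described above can be pictured as a lookup from fine regions to umbrella terms followed by a sum of cell counts. The sketch below is purely illustrative: the region names in `REGION_GROUPS`, the function `collapse_counts`, and the example counts are hypothetical and are not taken from the manuscript or from Cellfinder.

```python
from collections import defaultdict

# Hypothetical mapping from fine-grained regions to summary regions.
# Regions absent from the mapping are kept under their own name.
REGION_GROUPS = {
    "medial vestibular nucleus": "vestibular nuclei",
    "lateral vestibular nucleus": "vestibular nuclei",
    "superior vestibular nucleus": "vestibular nuclei",
}


def collapse_counts(fine_counts, groups=REGION_GROUPS):
    """Sum per-region cell counts into their summary regions."""
    grouped = defaultdict(int)
    for region, count in fine_counts.items():
        grouped[groups.get(region, region)] += count
    return dict(grouped)


# Made-up counts for one animal, for illustration only.
example = {
    "medial vestibular nucleus": 10,
    "lateral vestibular nucleus": 5,
    "red nucleus": 7,
}
print(collapse_counts(example))
```

      Keeping the fine-grained table alongside the collapsed one, as the response describes for the supplemental data, means the grouping can be changed later without re-running detection.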

      The authors provide an interesting analysis of the difference between cervical and lumbar projections. I think this might be one of the more interesting aspects of the paper, yet I found myself a bit confused by the analysis and by whether any of the differences observed were robust. Just prior to this experiment the authors provide a comparison of mScarlet vs. mGL and demonstrate that mGL may label more cells. Yet in the cervical vs. lumbar analysis the two appear to be treated one-to-one. Moreover, I could not find any actual statistical analysis of these data. My impression is that, given the potential difference in labelling efficiency between mScarlet and mGL, this should be done using some kind of count analysis that takes into account the overall number of neurons labelled, such as a chi-squared test or perhaps something more sophisticated. With this kind of statistical analysis in place, do any of the discussed differences hold up? If not, I do not think this would detract from the interesting topological observations, but I would call on the authors to be a bit more conservative in their statements and discussion regarding differences in the proportions of neurons projecting to certain supraspinal centers.
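
      One way to picture the count-based comparison suggested here is a 2x2 chi-squared test per region: rows are "this region" versus "all other regions", columns are the two labels, so the totals labelled by each fluorophore enter the expected counts. This is a minimal sketch under those assumptions; the function name `chi2_2x2` and the example counts are invented for illustration and do not come from the manuscript.

```python
import math


def chi2_2x2(a, b, c, d):
    """Pearson chi-squared test (no continuity correction) for the table
    [[a, b], [c, d]]. Returns (chi2, p) with 1 degree of freedom."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # For 1 d.f. the chi-squared tail equals a two-sided normal tail.
    p = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p


# Invented counts: cells in one region vs. all other regions,
# labelled from the cervical vs. the lumbar injection.
region_cervical, region_lumbar = 120, 60
other_cervical, other_lumbar = 4000, 3900
chi2, p = chi2_2x2(region_cervical, region_lumbar, other_cervical, other_lumbar)
print(f"chi2={chi2:.2f}, p={p:.3g}")
```

      Running one such test per region would itself call for a multiple-comparison adjustment.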

      This is an important point. In response to this input and related comments from other reviewers we performed new experiments to assess co-localization. The new data address the point above by including quantification of the degree of colocalization that results from titer-matched co-injection of the two fluorophores, providing baseline data. The results of this can be found in Figure 3 – figure supplement 3 and form the basis for statistical comparisons to experimental animals shown in Figure 3.

      Finally, I do have some concerns about the authors' use of linear regression in their analysis of brain regions after varying severities of SCI. First of all, the BMS score is notoriously non-linear. Despite wide use of linear regressions in the field to attempt to associate various outcomes with these kinds of ordinal measures, this is not appropriate. Some have suggested a rank conversion of the BMS prior to linear analyses, but even this comes with its own problems. Ultimately, the authors have here 2-3 clear cohorts of behavioral scores, and drawing a linear regression between these is unlikely to be robustly informative. Moreover, it is unclear whether the authors properly adjusted their p-values from running these regressions on 60 (600?) regions. Finally, the statement in the abstract and discussion that the authors "explain more variability" compared to typical lesion severity analysis is also unsupported. My suggestion would be the following:

      Remove the linear regression analyses associated with BMS. I do not think these add value to the paper, and if anything provide a large window of false interpretation due to a violation of the assumptions of this test.

      Consider adding a more appropriate statistical analysis of the brain regions, such as a non-parametric group analysis. Knowing which brain regions are severity dependent, and which ones are not, would already be an interesting finding. This finding would not be confounded by any attempt to link it to crude measures of behavior.

      We agree that the linear regression approach was flawed and appreciate the opportunity to correct it. After consultation with two groups of statisticians we were forced to conclude that the data are simply underpowered for mixed model and ranking approaches. We therefore adopted a much simpler strategy. As you point out (and as noted by the statisticians), the behavioral data are bimodal; one group of animals regained plantar stepping ability, albeit with varying degrees of coordination (BMS 6-8), while the others showed at most rare plantar steps (BMS 0-3.5). We therefore asked whether the number of spared neurons in each brain region differed between the two groups and also examined the degree of “overlap” in the sparing values between the two groups. The data are now presented in Figure 6.
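
      Because a per-region group comparison runs one test per brain region, the p-value adjustment raised by the reviewer still applies to this simpler design. The response does not state which correction, if any, was used; the sketch below shows the Benjamini-Hochberg false discovery rate adjustment as one common option. The function name and the example p-values are assumptions for illustration.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Invented per-region p-values, for illustration only.
raw = [0.01, 0.04, 0.03, 0.005]
print(benjamini_hochberg(raw))
```

      A region would then be reported as differing between the low-performing and high-performing groups only if its adjusted value falls below the chosen threshold.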

      If the authors would like to state anything about 'explaining more variability' then the proper statistical analysis should be used, which in this case would be to compare the models using a LRT or equivalent. However, as I mentioned it does not seem to be appropriate to be doing this with linear models so the authors should consider a non-linear equivalent if they choose to proceed with this.

      We thank the reviewer for the excellent suggestion. However, as we explained above, after consultation with two groups of statisticians we were forced to conclude that the data are underpowered, and we could not apply some of the methods suggested. Especially in light of our simplified analysis, we think it is better to remove any claims about the relative success of sparing in different regions in explaining more or less variability. Instead we can simply report that sparing in some regions, but not others, is significantly different between “low-performing” and “high-performing” groups.

    1. Author Response:

      Reviewer #1 (Public Review):

      This paper focuses on the role of historical evolutionary patterns that lead to genetic adaptation in cytokine production and immune mediated diseases including infectious, inflammatory, and autoimmune diseases. The overall goal of this research was to track the evolutionary trajectories of cytokine production capacity over time in a number of patients with different exposure to infectious organisms, infectious disease, autoimmune and inflammatory diseases using the 500 Functional Genomics cohort of the Human Functional Genomics Project. The identified cohort is made up of 534 individuals of Western European ancestry. Much of this focus is on the impact and limitations of certain datasets that they have chosen to use such as the "average genotyped dosage" to be substituted for missing variants and data interpretation.

      We fully agree with the reviewer. We replace missing variants in a sample with the average dosage of that variant in the entire dataset, so that missing variants in a sample do not bias the trends we observe over time. If we were to correct using only samples from within the same era, we would inflate the differences between eras, whereas using only shared variants would increase the noise for older samples because of the higher error rates associated with DNA degradation.
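
      The imputation step described above can be sketched as follows: each missing dosage is replaced by that variant's mean dosage over all samples in which it was observed, and the polygenic score is then a weighted sum of dosages. This is a minimal illustration, not the pipeline used in the study; the function names, the use of `None` for missing values, the fallback of 0.0 for a variant observed in no sample, and the toy numbers are all assumptions.

```python
def impute_mean_dosage(dosages):
    """dosages: list of per-sample lists; None marks a missing variant.
    Missing entries are replaced by the dataset-wide mean for that variant."""
    n_variants = len(dosages[0])
    means = []
    for j in range(n_variants):
        observed = [row[j] for row in dosages if row[j] is not None]
        # Assumption: a variant observed in no sample contributes 0.0.
        means.append(sum(observed) / len(observed) if observed else 0.0)
    return [
        [means[j] if row[j] is None else row[j] for j in range(n_variants)]
        for row in dosages
    ]


def polygenic_score(dosage_row, effect_sizes):
    """Weighted sum of dosages using per-variant effect sizes."""
    return sum(d * b for d, b in zip(dosage_row, effect_sizes))


# Toy data: 3 samples x 2 variants, with invented effect sizes.
toy = [[0, None], [2, 1], [None, 2]]
betas = [0.5, -1.0]
imputed = impute_mean_dosage(toy)
scores = [polygenic_score(row, betas) for row in imputed]
print(imputed, scores)
```

      Because an imputed entry adds the same amount to every sample that lacks the variant, differences between samples come only from variants that were actually genotyped in them.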

      Moreover, some data pairings in the data set are not complete or had varying time points.

      The stimulation periods were chosen based on extensive studies that showed that the timepoints used were best suited for assessing monocyte-derived and lymphocyte-derived cytokines per stimulus. Not all the stimuli induce the production of all cytokines, so the selection of the cytokine-stimulus pairs was performed for those pairs in which cytokine production could be measured (PMID: 1385767; PMID: 19380112; PMID: 27814509; PMID: 27814508; PMID: 27814507). The differences in cytokine availability and time points reflect the optimal time of production per stimulus. Monocyte-derived cytokines (IL-1b, IL-6 and TNFa) are early response cytokines, produced by innate immune cells shortly after stimulation. IFNg, IL-17 and IL-22 are lymphocyte-derived cytokines, produced by adaptive immune cells, in this case T helper cells. These cells need to differentiate for several days before they start to produce these cytokines, which is why these cytokines were measured at 7 days. IFNg can also be produced by NK cells, so it was additionally measured 48 h after stimulation in whole blood samples. We have included these considerations in the new version of the text (lines 82 to 87).

      Similarly, a split was done to look at before and after the Neolithic era, and the linear regressions correspond to those two eras. However, the authors do not comment on or show the data to demonstrate why they chose that specific breakpoint as opposed to looking at every historical era transition, i.e., from early upper paleolithic to late upper paleolithic to Mesolithic to Neolithic to post-Neolithic to modern.

      We thank the reviewer for this remark and acknowledge that we did not sufficiently address the rationale behind our choice to look at this split specifically. We hypothesized that the start of the Neolithic, with its increase in population density and contact with animals, would also be a turning point for many immune responses and immune-related traits. We added various analyses to better highlight this and also show differences between adjacent time periods.

      -The original figures showed only models using two separate linear regression lines, and the different thresholds for missing genotype rates showed consistent results. In the new figures we depict LOESS regression models to better show the difference in mean PRS at every point in time, and we additionally show boxplots for the major age periods, pooling the Paleolithic and Mesolithic samples together as pre-Neolithic samples to account for the lower sample number in the earlier historical periods. To highlight this we have added a new section in lines 123 to 129 and new versions of Figures 1, 2, 3 and 4.

      -In the new Figure 2 we add LOESS regression models, which do not bias our analysis towards defining a break at a certain time period. We furthermore show boxplots with pairwise comparisons (Student’s t-test) for broader time periods, highlighting the changes in PRS that would correspond with major changes in human lifestyle, such as the shift from a hunter-gatherer to a Neolithic lifestyle or the rapid urbanization of human society.

      -In the new Figure 3 we confirm, using both the LOESS regression and pairwise comparisons (Student’s t-test), that the various traits showing a clear change in PRS start to do so at the advent of the Neolithic or post-Neolithic era.

      -Similarly, the heatmap in our original Figure 4 has also been revised to show only the large sample set.

      Lastly, the authors should highlight additional limitations of this current study in terms of its generalizability to other populations, or clearly state that it is limited to the European population at the specified latitudes and longitudes used.

      We thank the reviewer for this feedback and agree we should put more emphasis on this. In our study we focus on summary statistics obtained from European populations and only employ European aDNA samples, so our results should not be extrapolated to populations from other geographical areas. We have included this in the Discussion of the new version of the manuscript (lines 289 to 292). However, our findings are mostly in agreement with previous studies in other populations, which adds robustness to the results of our study.

      Reviewer #2 (Public Review):

      In "Evolution of cytokine production capacity in ancient and modern European populations", Dominguez-Andrés et al. collect a large amount of trait association data from various studies on immune-mediated disorders and cytokine production, and use this data to create polygenic scores in ancient genomes. They then use the scores to attempt to test whether the Neolithic transition was characterized by strong changes in the adaptive response to pathogens. The impact of pathogens in human prehistory and the evolutionary response to them is an intriguing line of inquiry that is now beginning to be approachable with the rapidly increasing availability of ancient genomes.

      While the study shows a commendable collection of association data, great expertise in immune biology and an interesting study question, the manuscript suffers from severe statistical issues, which makes me doubt the validity and robustness of their conclusions. I list my concerns below, in rough order of how important I believe they are to the claims of the paper:

      —In addition to the magnitude of an effect away from the null, P-values are a function of the amount of data one has to fit a model or test a hypothesis. In this case, the authors have vastly more data after the Neolithic Revolution than before, and so have much higher power to reject the null hypothesis of "no relationship to time" after the revolution than before. One can see this in the plots the authors provided, which show vastly more data after the Neolithic, and consequently a greater ability to fit a significant linear model (in any direction) afterwards as well.

      We thank the reviewer for raising this very important point. To account for this difference in sample size between the historical periods, we pooled all samples prior to the Neolithic era together to test for differences in mean PRS between neighbouring historical periods. This way we lose some resolution in terms of the carbon-dated age of each sample, but we gain the ability to compare more pairings than just pre- and post-Neolithic samples. We added various analyses to better highlight this and also show differences between adjacent time periods:

      -The original figures showed only models using two separate linear regression lines, and the different thresholds for missing genotype rates showed consistent results. In the new figures we depict LOESS regression models to better show the difference in mean PRS at every point in time, and we additionally show boxplots for the major age periods, pooling the Paleolithic and Mesolithic samples together as pre-Neolithic samples to account for the lower sample number in the earlier historical periods. To highlight this we have added a new section in lines 123 to 129 and new versions of Figures 1, 2, 3 and 4.

      -In the new Figure 2 we add LOESS regression models, which do not bias our analysis towards defining a break at a certain time period. We furthermore show boxplots with pairwise comparisons (Student’s t-test) for broader time periods, highlighting the changes in PRS that would correspond with major changes in human lifestyle, such as the shift from a hunter-gatherer to a Neolithic lifestyle or the rapid urbanization of human society.

      -In the new Figure 3 we confirm, using both the LOESS regression and pairwise comparisons (Student’s t-test), that the various traits showing a clear change in PRS start to do so at the advent of the Neolithic or post-Neolithic era.

      -Similarly, the heatmap in our original Figure 4 has also been revised to show only the large sample set.

      —The authors argue that Figure S2 makes their results robust to sample size differences, but showing a consistency in direction before and after downsampling in the post-neolithic samples is not enough, because:

      1) you still lack power to detect changes in direction before the Neolithic.

      2) even for the post-Neolithic, the relationship may be in the same direction but no longer significant after downsampling. How much the significance of the linear model fit is affected by the downsampling is not shown.

      We thank the reviewer for pointing this out. The low sample count dating back to before the Neolithic era makes it indeed hard to accurately detect changes in PRS significantly correlated with time. Instead, we now aim to pool these samples together and compare the distribution of their PRS with those of Neolithic samples to better be able to detect significant differences in PRS between these historical time periods.

      To also show the significance of each linear model, we now show the -log10 of the p-value multiplied by the sign of the correlation coefficient. This way we can better highlight the consistency in direction as well as significance and show that downsampling affects the order of significance. Please see the new Figure 4-figure supplement 1. We have also discussed this in more depth on lines 267-272 of the new version of the text.
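
      The signed significance measure described above is simple enough to write down directly. In this minimal sketch the function name is an assumption, and a correlation of exactly zero is mapped to a score of zero, which is a choice made for the example rather than something stated in the response.

```python
import math


def signed_neg_log10_p(p_value, correlation):
    """Return -log10(p) carrying the sign of the correlation coefficient.

    p_value must be greater than 0; a p-value that has underflowed to 0.0
    would need to be floored before calling this function.
    """
    if correlation == 0:
        return 0.0
    sign = math.copysign(1.0, correlation)
    return sign * -math.log10(p_value)


# A negative trend with p = 0.001 and a positive trend with p = 0.01.
print(signed_neg_log10_p(0.001, -0.4), signed_neg_log10_p(0.01, 0.2))
```

      Plotting this quantity before and after downsampling shows in a single value whether the direction is preserved (same sign) and how much significance is lost (smaller magnitude).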

      —The authors chose to test "relationship between PRS with time" before and after the Neolithic as a way to demonstrate that "the advent of the Neolithic was a turning point for immune-mediated traits in Europeans". A more appropriate way to test this would be to create a model that incorporates both sets of scores together, accounts for both sample size and genetic drift in the change of polygenic scores, and shows that a significant shift occurs particularly in the Neolithic, rather than in any other time period, instead of choosing the Neolithic as an "a priori" partition of the data. My guess is that one could have partitioned the data into pre- and post-Mesolithic and gotten similar results, largely due to imbalances in data availability.

      We agree with the reviewer that the exact pairing of the groups might influence the conclusions, which shows the importance of remaining unbiased in our a priori partitioning of the data, as the reviewer accurately pointed out. We aim to account for sample imbalances by pooling the Paleolithic and Mesolithic samples together and, instead of just testing pre- versus post-Neolithic samples, we perform pairwise comparisons between neighbouring historical periods using a t-test, thereby taking into account the sample size of each group.
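
      The adjacent-period comparison described above can be sketched as a loop over neighbouring periods, computing a pooled-variance (Student's) t statistic for each pair. This is an illustration only: the function names, period labels, and scores are invented, and the sketch stops at the t statistic and degrees of freedom because the standard library has no t-distribution; a p-value would come from a statistics package such as `scipy.stats.ttest_ind`.

```python
import math
from statistics import mean, variance


def student_t(a, b):
    """Pooled-variance two-sample t statistic and degrees of freedom."""
    n1, n2 = len(a), len(b)
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / df
    t = (mean(a) - mean(b)) / math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return t, df


def adjacent_period_tests(scores_by_period, period_order):
    """Compare each period with the next one in chronological order."""
    results = []
    for earlier, later in zip(period_order, period_order[1:]):
        t, df = student_t(scores_by_period[earlier], scores_by_period[later])
        results.append((earlier, later, t, df))
    return results


# Invented PRS values; pre-Neolithic pools Paleolithic and Mesolithic samples.
prs = {
    "pre-Neolithic": [0.10, 0.30, 0.20],
    "Neolithic": [0.20, 0.40, 0.30, 0.50],
    "post-Neolithic": [0.60, 0.50, 0.70, 0.80, 0.60],
}
for row in adjacent_period_tests(prs, ["pre-Neolithic", "Neolithic", "post-Neolithic"]):
    print(row)
```

      The group sizes enter through both the pooled variance and the degrees of freedom, which is how the smaller pre-Neolithic group widens the uncertainty of its comparison.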

      —The authors only talk about partitions before and after the Neolithic, but plots are colored by multiple other periods. Why is the pre- and post-Neolithic the only transition that is mentioned?

      Our initial hypothesis was that the pre- versus post-Neolithic shift was a turning point for immune responses. However, based on the suggestions of the reviewers, we have decided to perform the analysis in a more unbiased way, so we show the comparison of the individual eras. The new analyses and the new figures address these issues.

      —Extrapolating polygenic scores to the distant past is especially problematic given recent findings about the poor portability of scores across populations (Martin et al. 2017, 2019) and the sensitivity of tests of polygenic adaptation to the choice of GWAS reference used to derive effect size estimates (Berg et al. 2019, Sohail et al. 2019). In addition to being more heavily under-represented, paleolithic hunter-gatherers are the most differentiated populations in the time series relative to the GWAS reference data, and so presumably they are also the genomes for which PGS estimates built using such a reference would have higher error (see, e.g. Rosenberg et al. 2019). Some analyses showing how believable these scores are is warranted (perhaps by comparing to phenotypes in distant present-day populations with equivalent amounts of differentiation to the GWAS panel).

      A similar study regarding standing height in ancient populations (PMID: 31594846) validated this approach when comparing polygenic scores based on modern populations with skeletal remains from ancient individuals. We do acknowledge that the absolute polygenic scores are less accurate for aDNA samples than for a modern European cohort. The effect size estimates obtained from a modern cohort are less accurate for aDNA samples than for unrelated modern samples, and this is certainly an unavoidable limitation of the study. This is the reason why we focus on the direction of change of the trends and not on the absolute polygenic scores, since such subtle differences do not affect the conclusions of our study.

      —In multiple parts of the paper, the authors mention "adaptation" as equivalent to the patterns they claim to have found, but alternative hypotheses like genetic drift are not tested (see e.g. Guo et al. 2018 for a review of methods that could be used for this).

      We thank the reviewer for this feedback. Based on this, we have added an Fst-based test for selection to determine whether the changes we see in PRS over time are due to selection or to genetic drift. This test shows that changes between the pre-Neolithic and the Neolithic are not significantly different from drift, whereas after the onset of the Neolithic we do see a significant amount of selection. We have explained this further in the manuscript on lines 130-135 and included the new Table S2.

      New Table S2: Tests for selection as opposed to genetic drift were performed between populations from adjacent time periods. A two-tailed test was used to determine whether mean trait Fst between pre-Neolithic - Neolithic, Neolithic - post-Neolithic, and post-Neolithic - Modern samples was significantly different from 10,000 random LD- and MAF-matched mean Fst values calculated using the same number of SNPs.
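
      A minimal sketch of how such an empirical two-tailed comparison could be set up. It is an illustration, not the authors' code: the function names are hypothetical, the background list is assumed to be already restricted to LD- and MAF-matched SNPs (the matching step itself is not shown), and the add-one correction is a common convention rather than something stated in the response.

```python
import random
from statistics import mean


def matched_null_means(background_fst, n_snps, n_draws=10000, seed=0):
    """Mean Fst of `n_draws` random SNP sets, each of size `n_snps`.

    `background_fst` is assumed to contain per-SNP Fst values for SNPs
    that are already matched to the trait SNPs on LD and MAF.
    """
    rng = random.Random(seed)
    return [mean(rng.sample(background_fst, n_snps)) for _ in range(n_draws)]


def empirical_two_tailed_p(observed, null_values):
    """Two-tailed empirical p-value of `observed` against a null sample."""
    center = mean(null_values)
    observed_dev = abs(observed - center)
    # Count null draws at least as far from the null mean as the observation.
    extreme = sum(1 for v in null_values if abs(v - center) >= observed_dev)
    # Add-one correction keeps the estimate strictly above zero.
    return (extreme + 1) / (len(null_values) + 1)


if __name__ == "__main__":
    rng = random.Random(1)
    background = [rng.uniform(0.0, 0.1) for _ in range(5000)]  # toy Fst values
    null = matched_null_means(background, n_snps=50, n_draws=1000)
    print(empirical_two_tailed_p(0.09, null))
```

      With 10,000 draws the smallest attainable p-value under this convention is 1/10,001, which bounds the resolution of the test.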

      —250 kb window is too short a physical distance for ensuring associated loci that are included in the score are not in LD, and much shorter than standard approaches for building polygenic scores in a population genomic context (e.g. see Berg et al. 2019, Berisa et al. 2016). Is this a robust correction for LD?

      We thank the reviewer for this remark. We tested multiple thresholds for window sizes, increasing the window size from 250 kb to 500 kb and 1000 kb (please see new Figure 1-figure supplement 2 below). Although the level of significance changes for a few traits, the direction of the change remains stable across the three thresholds, demonstrating the robustness of our results. We have chosen this approach because the aDNA samples have too high an error rate and too much missing data to accurately determine LD, and determining LD using a modern reference cohort would bias our analysis by assuming the aDNA samples have a similar LD structure to modern samples.

      New Figure 1-figure supplement 2: PRS correlation pre- and post-Neolithic revolution using polygenic scores calculated at varying window sizes.

      We have edited the manuscript accordingly to show the consistency between these varying window sizes on lines 111-113.
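
      To make the window-size comparison concrete, here is a hedged sketch of distance-based pruning. It assumes a greedy scheme in which SNPs are visited in order of increasing GWAS p-value and a SNP is kept only if no already-kept SNP on the same chromosome lies within the window; the response does not spell out the exact procedure, so the function name, tuple layout, and SNP identifiers are all illustrative.

```python
def prune_by_distance(snps, window_bp):
    """Greedy physical-distance pruning.

    `snps` is a list of (snp_id, chromosome, position_bp, p_value) tuples.
    The most significant SNP is kept first; any later SNP within
    `window_bp` of a kept SNP on the same chromosome is dropped.
    """
    kept = []
    for snp in sorted(snps, key=lambda s: s[3]):
        _, chrom, pos, _ = snp
        clashes = any(c == chrom and abs(pos - p) <= window_bp
                      for _, c, p, _ in kept)
        if not clashes:
            kept.append(snp)
    return kept


if __name__ == "__main__":
    toy = [
        ("snpA", "1", 100_000, 1e-8),
        ("snpB", "1", 300_000, 1e-6),
        ("snpC", "1", 700_000, 1e-5),
        ("snpD", "2", 120_000, 1e-4),
    ]
    for window in (250_000, 500_000, 1_000_000):
        print(window, [s[0] for s in prune_by_distance(toy, window)])
```

      Larger windows retain fewer SNPs per region, which is one reason significance levels can shift between thresholds even when the direction of change does not.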

      —If one substitutes dosage with the average genotyped dosage for a variant from the entire dataset, then one is biasing towards the partitions of the dataset that are over-represented, in this case, post-Neolithic samples.

      We fully agree with the reviewer; however, substituting missing dosages with average dosages prevents the bias that varying amounts of missing SNPs in the older samples would otherwise introduce into our models. Although our average scores on an absolute level are largely influenced by the more abundant post-Neolithic samples, this reduces the odds of wrongly observing significant trends caused by the sparsity of the data. While the absolute scores might be biased towards a certain value, the differences, and thus the direction of the change in PRS, are driven by the non-missing variants in each sample.
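
      The substitution described above can be sketched as follows. This is an illustrative reading, not the authors' pipeline: missing dosages are encoded as `None`, every variant is assumed to be genotyped in at least one sample, and the variant names and effect sizes are made up.

```python
def mean_dosages(samples, variants):
    """Per-variant mean dosage across the samples in which it was genotyped."""
    means = {}
    for v in variants:
        observed = [s[v] for s in samples if s.get(v) is not None]
        means[v] = sum(observed) / len(observed)
    return means


def polygenic_score(sample, effects, fallback):
    """Sum of effect size times dosage, using `fallback[v]` where dosage is missing."""
    score = 0.0
    for v, beta in effects.items():
        dosage = sample.get(v)
        if dosage is None:
            dosage = fallback[v]  # dataset-wide mean stands in for missing data
        score += beta * dosage
    return score


if __name__ == "__main__":
    effects = {"v1": 0.5, "v2": -0.2}  # hypothetical GWAS effect sizes
    samples = [
        {"v1": 2, "v2": 1},
        {"v1": 0, "v2": None},
        {"v1": None, "v2": 1},
    ]
    fallback = mean_dosages(samples, effects)
    print([polygenic_score(s, effects, fallback) for s in samples])
```

      Because every sample receives the same fallback value for a given missing variant, the imputed terms cancel when two samples that both lack that variant are compared, while the dataset-wide mean itself is weighted towards the better-represented samples, as the reviewer notes.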

      —It seems from Figure 2, that some scores are indeed very sensitive to the choice of P-value cutoff (e.g., Malaria, Tuberculosis) and to the amount of missing data (e.g. HIV). This should be highlighted in the main text.

      The reviewer is right, and this is largely due to the fewer number of SNPs that are included in the model at stricter p-value cutoffs, which is in part a limitation of the available GWAS summary statistics. Using fewer SNPs in our PRS calculations reduces the variability between different samples which weakens our ability to accurately model changes in these specific complex traits and detect statistical significance. We have highlighted this in the main text on lines 193-196.

      —Some of the score distributions look a bit strange, like the Tuberculosis ones in Figure 2, which appear concentrated into particular values. Could this be because some of the scores are made with very few component SNPs?

      We thank the reviewer for pointing this out and this is indeed correct. At stricter thresholds fewer significant QTLs will be included in the polygenic score model. We chose to still show these plots to point out those results might more easily differ if more variants could be included. At more lenient thresholds more variants can be included increasing the power of the model but the score might be less informative for the trait that way.

    1. Author Response

      Reviewer #3 (Public Review):

      Myelodysplastic syndrome (MDS) is a heterogenous, clonal hematopoietic stem cell disorder characterized by morphological dysplasia in one or more hematopoietic lineages, cytopenias (most frequently anemia), and ineffective hematopoiesis. In patients with MDS, transfusion therapy treatment causes clinical iron overload; however it has been unclear if treatment with iron chelation yields clinical benefits. In the present study, the authors use a transgenic mouse model of MDS, NUP98-HOXD13 (referred to here as "MDS mice") to investigate this area. Starting at 5 months of age (before MDS mice progress to acute leukemia), the authors administered DFP in the drinking water for 4 weeks, and compared parameters to untreated MDS mice and WT controls.

      The authors first show that MDS mice exhibit systemic iron overload and macrocytic anemia that is improved by treatment with the iron chelator deferiprone (DFP). They then perform a detailed characterization of the effects of DFP treatment on erythroid differentiation and various parameters related to iron transport and trafficking in MDS erythroblasts. Strengths of the work are the use of a well-characterized mouse model of MDS with appropriate animal group sizes and detailed analyses of systemic iron parameters and erythroid subpopulations. A remediable weakness is that in certain areas of the Results and Discussion, the authors overinterpret their findings by inferring causation when they have only shown a correlation. Additionally, when drawing conclusions based on changes in erythroblast mRNA expression levels between groups, the authors should consider that translation efficiency may be altered in MDS and that the NUP98 fusion protein itself, by acting as a chimeric transcription factor, may also impact gene expression profiles. Given that the application of chelators for treatment of MDS remains controversial, this work will be of interest to scientists focused on erythroid maturation and iron dysregulation in MDS, as well as clinicians caring for patients with this disorder.

      Major Comments

      1) The authors define the stages of erythroblast differentiation using the CD44-FSC method, which assumes that CD44 expression levels during the stages of erythroid differentiation are not altered by MDS itself. Are morphologically abnormal erythroblasts, such as bi-nucleate forms, captured in this analysis, and if so, are they classified in the appropriate subset? The percentage of erythroblasts in the bone marrow of MDS mice in this current study is lower than that reported by Suragani et al (Nat Med 2014), who employed a different strategy to define erythroid precursors. While representative erythroblast gating is presented as Supplemental Figure 17, it would be important to present representative gating from all 3 animal groups: WT, MDS, and MDS+DFP mice.

      We appreciate this comment and have added representative gating for all 3 groups to Supplemental Figure 17 (new Figure 3 – figure supplement 6 in the revised manuscript).

      2) Methods, "Statistical analysis." The authors state that all comparisons were done with a 2-tailed paired Student's t test, which would not be appropriate for comparisons made between independent animal groups (i.e. when groups are not "paired").

      We appreciate this comment and have reanalyzed all revised mouse data using one-way ANOVA with multiple comparisons and Tukey post-test analyses when more than 2 groups were compared. This has been edited in the Methods section in the revised manuscript.
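
      For readers less familiar with the reanalysis, a small sketch of the one-way ANOVA F statistic for independent groups follows. It only computes F and its degrees of freedom; the p-value and the Tukey post-test mentioned above require the F and studentized range distributions and are not implemented here. The group values are toy numbers, not data from the study.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of independent groups."""
    all_values = [x for g in groups for x in g]
    grand_mean = sum(all_values) / len(all_values)
    group_means = [sum(g) / len(g) for g in groups]

    # Between-group sum of squares: spread of group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    # Within-group sum of squares: spread of values around their own group mean.
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, group_means) for x in g)

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


if __name__ == "__main__":
    # Three toy groups standing in for WT, MDS, and MDS+DFP animals.
    print(one_way_anova_f([[1, 2, 3], [2, 3, 4], [5, 6, 7]]))
```

      Unlike a paired t test, this treats each animal as an independent observation, which matches the design the reviewer describes.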

      3) The Results (p.7) indicates that both sexes showed similar responses to DFP; however, the figure legends do not indicate sex. Given that systemic iron metabolism in mice shows sex-related differences, sex should be specified.

      We appreciate this comment and present here the gender-specific data for the reviewers’ evaluation (Author response image 1). Similarly elevated transferrin saturation (a) (n = 3-4 male mice/group and n = 4-6 female mice/group) and hemoglobin (b) (n = 4-6 male mice/group and n = 4-9 female mice/group) are observed in male and female DFP-treated MDS mice. (c) Bone marrow erythroblasts are decreased to a greater degree in male relative to female DFP-treated MDS mice (n = 4-7 male mice/group and n = 8-9 female mice/group). We have added the data on gender-specific measures to new Figure 1 - figure supplement 3, Figure 2 – figure supplement 1, and Figure 3 – figure supplement 1 in the revised manuscript.

      Author response image 1.

    1. Author Response

      Reviewer #1 (Public Review):

      The manuscript by Xu et al. does a very thorough characterization and molecular dissection of the role of SSH2 in spermatogenesis. Loss of Ssh2 in germ cells results in germ cell arrest in step 2-3 spermatids and eventually leads to germ cell loss by apoptosis. Molecular characterization of the mutant mice shows that the loss of SSH2 prevents the fusion of proacrosomal vesicles, leading to the formation of a fragmented acrosome. The fragmentation of the acrosome is due to the impaired actin bundling and dephosphorylation of COFILIN. In short, this is a comprehensive body of work.

      We thank the referee for these insightful comments.

      Reviewer #2 (Public Review):

      The acrosome is a unique sperm-specific subcellular organelle required for the fertilization process, and it is also an organelle undergoing extensive morphological and structural transformation during sperm development. The mechanism underlying the extensive acrosome morphogenesis and biogenesis remains incompletely understood. Xu et al., in their manuscript entitled "The Slingshot phosphatase 2 is required for acrosome biogenesis during spermatogenesis in mice", reported that the Slingshot Phosphatase 2 is essential for acrosome biogenesis and male fertility through their characterization of spermatogenic and acrosomal defects in the Ssh2 knockout mice they generated. Specifically, the authors provided molecular, genetic, and subcellular evidence supporting that Ssh2 mutation impaired the phosphorylation of an actin-binding protein, COFILIN, during spermiogenesis and accordingly actin cytoskeleton remodeling, crucial for proacrosomal vesicle trafficking and acrosome biogenesis. The manuscript by Xu et al. does a very thorough characterization and molecular dissection of the role of SSH2 in spermatogenesis. Loss of Ssh2 in germ cells results in germ cell arrest in step 2-3 spermatids and eventually leads to germ cell loss by apoptosis. Molecular characterization of the mutant mice shows that the loss of SSH2 prevents the fusion of proacrosomal vesicles, leading to the formation of a fragmented acrosome. The fragmentation of the acrosome is due to the impaired actin bundling and dephosphorylation of COFILIN. In short, this is a comprehensive body of work.

      We appreciate and thank Referee #2 for the positive feedback and insightful comments.

      Strengths:

      Nicely written manuscript, addresses an important mechanistic question of the roles of cytoskeleton remodeling in acrosome biogenesis and provided genetic, subcellular, and molecular evidence to build up their support for their hypothesis that Ssh2 regulates actin cytoskeleton remodeling, a process essential for proacrosomal vesicle trafficking and acrosome biogenesis, through dephosphorylation actin-binding protein during spermiogenesis.

      We again thank Referee #2 for appreciating and encouraging our current research work.

      Weaknesses:

      For body weight and testis weight of the mutants, the authors concluded that there is no significant difference between the mutant and wildtype (Fig 1E-1G), but they appear to use mice between 6 and 8 wks old. Both the testis and body weight of males at 6-8 wks are still increasing; with only six mice analyzed, one could easily miss a significant difference in testis size and/or body weight given such a varied age and small sample size.

      We thank the referee for prompting this important discussion point, which we now cover in our revised manuscript. In our originally submitted manuscript, we presented the data for body weight, testis weight, and T/B ratio only for mice between the ages of 6 and 8 weeks. In the revised manuscript, we have added data from mice older than 8 weeks in a new Figure 1E-1G, with a sample size of 12 for each genotype. We have also updated the relevant content in the figure caption. The revised figure caption for Figure 1 panels E–G reads as follows: “(E-G) Body weights (26.3609 ± 0.4914 for WT; 25.1741 ± 0.5189 for Ssh2 KO), weights of the testes (0.0862 ± 0.0036 for WT; 0.0788 ± 0.0023 for Ssh2 KO), and the testis-to-body weight ratio (0.3281 ± 0.0153 for WT; 0.3154 ± 0.0135 for Ssh2 KO) of adult WT and Ssh2 KO males (n = 12). Data are presented as the mean ± SEM; p > 0.05 calculated by Student’s t-test. Bars indicate the range of the data.”
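
      As a quick consistency check on the quoted caption, the sketch below recomputes the t statistic from the reported means and SEMs. It assumes 12 animals per genotype and that the ± values are SEMs, as the caption states; with equal group sizes the pooled standard error reduces to the root sum of squared SEMs. The critical value 2.074 is the two-sided 5% cutoff for 22 degrees of freedom, and the function name is illustrative.

```python
import math


def t_from_summary(mean_a, sem_a, mean_b, sem_b):
    """t statistic for two independent, equal-sized groups from means and SEMs."""
    return (mean_a - mean_b) / math.sqrt(sem_a ** 2 + sem_b ** 2)


T_CRIT_DF22 = 2.074  # two-sided alpha = 0.05, df = 12 + 12 - 2

# (WT mean, WT SEM, KO mean, KO SEM) as quoted in the revised caption.
reported = {
    "body weight": (26.3609, 0.4914, 25.1741, 0.5189),
    "testis weight": (0.0862, 0.0036, 0.0788, 0.0023),
    "testis-to-body ratio": (0.3281, 0.0153, 0.3154, 0.0135),
}

for name, values in reported.items():
    t = t_from_summary(*values)
    print(f"{name}: t = {t:.2f}, not significant at 5%: {abs(t) < T_CRIT_DF22}")
```

      All three statistics fall below the critical value, which agrees with the caption's p > 0.05.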

      Other points:

      Comments: 1) Could the uniform cytoplasmic distribution of diminutive actin filaments in the wild type and disrupted actin filament remodeling be examined at the EM level on the round spermatids?

      We apologize for the confusion. Previously, we conducted a transmission electron microscopy (TEM) analysis on the testis samples to examine the distribution and ultrastructural organization of F-actin in WT and Ssh2 KO round spermatids. Unfortunately, even at high magnification (30,000x, right panel of Figure R1-Response Figure 1), TEM of testicular sections revealed no diminutive actin filaments in the cytoplasm of round spermatids, except for the acroplaxome (an actin-rich specialized structure that anchors the acrosome) in WT spermatids and some thick bundle-like structures located at the acrosomal region of Ssh2 KO spermatids (Fig. R1). Based on their characteristic appearance, we interpreted these electron-dense bundles as aberrantly aggregated actin filaments whose lengths are in accordance with the lengths of COFILIN-saturated F-actin fragments (Bamburg et al., 2021), suggesting that disrupted actin filament remodeling during acrosome biogenesis resulted from Ssh2 KO. However, due to the technological limitations of TEM and the complexity of the intracellular environment of round spermatids, we recognized only a few aggregated actin bundles that had lost their filamentous appearance in Ssh2 KO spermatids, and we detected no typical diminutive actin filaments of the kind imaged by high-resolution cryo-TEM (Haviv et al., 2008) or live-cell total internal reflection fluorescence microscopy (Johnson et al., 2015) in purified actin bundles and cultured cells. Given the lack of effective approaches to culture murine round spermatids in vitro, confocal microscopy of fluorescence-labelled F-actin (e.g., IF staining by FITC-phalloidin) is a more accessible method than EM for visualizing the disruption of actin remodeling in murine spermatids, as several other studies of actin have demonstrated (Djuzenova et al., 2015; Meenderink et al., 2019).

      Comments: 2) Any other defects are seen besides acrosome in the mutant testis given the important roles of actin cytoskeleton network and high expression of Ssh2 in spermatocytes, were chromatoid bodies or mitochondria affected in any way? Any other defects in the mice overall including female fertility and other organs, given the previously reported roles in the nervous system. It could be helpful information for others interested in Ssh 2 protein and actin cytoskeleton's roles in general.

      The referee has here raised an interesting point. Firstly, besides the acrosome-related defects in Ssh2 KO spermatids, we identified increased germ cell apoptosis and aberrant activation of the apoptotic Bcl-2/Caspase-3 pathway in the testes of Ssh2 KO mice. We speculate that these are triggered by the disordered COFILIN-mediated F-actin remodeling, and we intend to elucidate the underlying mechanisms in the future. Secondly, given the high expression of SSH2 in spermatocytes demonstrated by IF staining in Figure 4B and 4C, we performed surface chromosome spreading on spermatocytes to observe whether the morphology of chromatid bodies and the meiotic progression were affected by Ssh2 KO; no obvious defects were observed, as shown in supplementary Figure S3 of the originally submitted manuscript. Thirdly, no obvious morphological abnormality in chromatin or mitochondrial structure was detected under TEM in Ssh2 KO germ cells such as spermatocytes and round spermatids, so we did not pursue this further. Fourthly, we examined the potential effect(s) of Ssh2 KO on female fertility using Ssh2 KO female mice and did not find any obvious infertility defect in Ssh2 KO females compared to their WT littermates, as demonstrated by the body weight, ovary weight, ovary-to-body weight ratio, ovary size, and fertility test data as well as the images of ovarian HE staining (Fig. R1). Moreover, during our investigation period, Ssh2 KO males and females did not manifest any defective physical development, aberrant physiological status, or mental disorder, notwithstanding the reported roles of SSH2 in neurite extension (Endo, Ohashi, & Mizuno, 2007); we therefore did not conduct experiments to observe the effect(s) of SSH2 in other organs, apart from the female fertility analysis.

      Fig. R1 No reproductive defects were found in Ssh2 KO females. (A-C) Body weights, weights of the ovaries, and the ovary-to-body weight ratio of adult WT and Ssh2 KO females aged 8-10 weeks (n = 5); p > 0.05 calculated by Student’s t-test. Bars indicate the range of data. (D) The size of ovaries from Ssh2 KO mice was indistinguishable from that of ovaries from WT mice aged 8 weeks, n = 4. (E) Histology of the ovaries from WT and Ssh2 KO mice. Sections were stained with hematoxylin and eosin. Scale bars: 200 μm. Images are representative of ovaries extracted from 8-week-old adult female mice per genotype. (F) Number of pups per litter from WT and Ssh2 KO female mice (8 weeks old) after crossing with WT adult male mice (n = 3); p > 0.05 calculated by Student’s t-test. Bars indicate the range of the data.

      Comments: 3) Providing detailed information on the number of animals used and cells analyzed in the legend is nice, but it might be even better for the readers to include sample size and the number of cells examined in the figure/graph if possible.

      We appreciate the suggestions from the reviewer. We have integrated sample size information into the figures where appropriate. Firstly, we integrated the sample size into Figure 1C, 1E, 1F, 1G and 1I. Secondly, we included the sample size and the number of seminiferous tubules/epididymal ducts evaluated for TUNEL (+) cell counting in Figure 2C and Figure 2D. Thirdly, we included the sample size and the number of spermatids evaluated for co-localization in Figure 6B and Figure 6D.

      Comments: 4) Nice discussion and comparison with GOPC and GM130, how about comparison and discussion with other acrosome defective mutants like PICK1, and ATG to provide some insights into acrosome biogenesis and proacrosomal vesicle trafficking?

      We greatly appreciate the referee's positive appraisal of our work and constructive suggestions. Unfortunately, we are unable to address these defective mutants with certainty due to limited sample availability (only three 16-month-old Ssh2 KO mice are accessible now). As described in the originally submitted manuscript, we compared the cytological staining of GM130 and GOPC in WT and Ssh2 KO spermatids using tubule squash sections prepared from fresh testes of 8-week-old mice; we now have only several aged Ssh2 KO mice, which prevents us from staining for PICK1 and ATG. PICK1 was previously reported to facilitate vesicle trafficking from the Golgi apparatus to the acrosome and to co-localize with GOPC in the proacrosomal granules (Xiao et al., 2009), and the phenotypes of Pick1 KO mice share many characteristics with those of Ssh2 KO mice, such as fragmentation of the acrosome and increased germ cell apoptosis. Both autophagy-related ATG5 (Huang et al., 2021) and ATG7 (Wang et al., 2014) were reported to participate in acrosome biogenesis, and ATG7 is required for proacrosomal vesicle transportation/fusion by conjugating LC3 to the membrane of proacrosomal vesicles. Although the spermatids in these KO mouse models can still develop into spermatozoa with defective acrosomes, unlike the situation in Ssh2 KO mice, it would be meaningful to examine the effects of Ssh2 KO on the localization of these regulators of acrosome biogenesis in spermatids and their potential interactions with SSH2.
      Indeed, in future work, we plan to pursue these issues, and content related to PICK1 has been added to the discussion in the revised manuscript as follows: “Moreover, it is intriguing to note that the phenotypes of Ssh2 KO mice share a lot of similarities with that of Pick1 KO model (Xiao et al., 2009) such as acrosome fragmentation and enhanced germ cell apoptosis, suggesting the possibility that SSH2 and PICK1 work together in a same trafficking machinery functioning in acrosome biogenesis which needs to be clarified further.”

      Comments: 5) Given the literature on Cofilin's requirement for male fertility and the increased p-Cofilin in Ssh2 mutant testis by Western and IF, the authors have a strong case for their hypothesis. But given the general role of phosphatase, it might be prudent to discuss alternative possibilities.

      We thank the reviewer for these valuable suggestions. Given that p-COFILIN is the only known substrate of SSH2 based on previous reports, we focused principally on this cascade in our investigation. As a phosphatase, SSH2 is very likely to interact with many other proteins beyond the actin-binding proteins; these interactions, which may function in various cellular processes, remain elusive. As directed, we have now added content addressing this concern to the discussion section of the revised manuscript as follows: “Given the diverse physiological roles reported for Slingshot family proteins, the possibility of the alternative mechanism underlying involvement of SSH2 in cellular events beyond the COFILIN-mediated actin remodeling should be noted. According to some publicly accessible databases as the indicators of potential protein–protein interactions such as BioGRID (Oughtred et al., 2019) and IntAct (Del Toro et al., 2022), SSH2 might interact with a set of actin-based molecular motors covering MYH9, MYO19 and MYO18A, which have been implicated in the maintenance of Golgi morphology and Golgi anterograde vesicular trafficking via the PI4P/GOLPH3/MYO18A/F-actin pathway (Rahajeng et al., 2019).”

    1. Author Response

      Reviewer #2 (Public Review):

      Zylbertal and Bianco propose a new model of trial-to-trial neuronal variability that incorporates the spatial distance between neurons. The 7-parameter model is attractive because of its simplicity: A neuron's activity is a function of stimulus drive, neighboring neurons, and global inhibition. A neuroscientist studying almost any brain area in any model organism could make use of this model, provided that they have access to 1) simultaneously-recorded neurons and 2) the spatial locations of those neurons. I could foresee this model being the de-facto model to compare to all future models, as it is easy to code up and interpret. The paper explores the effectiveness of this distance model by modeling neural activity in the zebrafish optic tectum. They find that this distance-based model can capture 1) bursting found in spontaneous activity, 2) ongoing co-fluctuations during stimulus-evoked activity, and 3) adaptation effects during prey-catching behavior.

      Strengths:

      The main strength of the paper is the interpretability of the distance-based model. This model is agnostic to the brain area from which the population of neurons is recorded, making the model broadly applicable to many neuroscientists. I would certainly use this model for any baseline comparisons of trial-to-trial variability.

      The model is assessed in three different contexts, including spontaneous activity and behavior. That the model provides some prediction in all three contexts is a strong indicator that this model will be useful in other contexts, including other model organisms. The model could reasonably be extended to other cognitive states (e.g., spatial attention) or accounting for other neuron properties (such as feature tuning, as mentioned in the manuscript).

      The analyses and intuition to show how the distance-based model explains adaptation were insightful and concise.

      We thank the reviewer for these supportive comments.

      Weaknesses:

      Model evaluation and comparison: The paper does not fully evaluate the model or its assumptions; here, I note details in which evaluation is needed. A key assumption of the model - that correlations fall off in a gaussian manner (Fig. 1C-E) - is not supported by Fig. 1C, which appears to have an exponential fall-off. Functions other than gaussian may provide better fits.

      A key feature of our model is that connection strengths smoothly decrease with distance. However, we did not intend to make strong claims about the exact function parametrizing this distance relationship. In light of the reviewer’s comment, we have additionally tested an exponential function and find that it too can describe activity correlations in OT with a negligible decrease in r^2 (Figure 1 – figure supplement 1A-C). The main purpose of the analysis was to show that the correlation is maximal around the seed and decays uniformly with distance from it (i.e. no sub-networks or cliques are detected). We have emphasized this in a revised conclusion paragraph and note that while multiple functions can be used to parameterize the relationship, they are nonetheless certainly simplifications. Secondly, we also ran a version of the network simulation where the connections decay in space according to an exponential rather than Gaussian function and show that, as expected, tectal bursting is robust to this change.
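
      The comparison of fall-off functions can be illustrated with a simplified one-dimensional sketch. It is not the authors' 2-d fit: it assumes strictly positive correlations, fits each function by least squares on log-transformed values, and reports r^2 on the original scale. All names and the synthetic data are illustrative.

```python
import math


def linear_fit(xs, ys):
    """Ordinary least-squares slope and intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return slope, my - slope * mx


def r_squared(ys, preds):
    """Coefficient of determination of `preds` against `ys`."""
    my = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


def falloff_r2(dists, corrs, kind):
    """r^2 of a Gaussian (log c linear in d^2) or exponential (log c linear in d) fit."""
    xs = [d * d for d in dists] if kind == "gaussian" else list(dists)
    slope, intercept = linear_fit(xs, [math.log(c) for c in corrs])
    preds = [math.exp(intercept + slope * x) for x in xs]
    return r_squared(corrs, preds)


if __name__ == "__main__":
    dists = [5.0 * i for i in range(1, 21)]  # toy distances from the seed cell
    corrs = [0.8 * math.exp(-d * d / (2 * 40.0 ** 2)) for d in dists]  # Gaussian toy data
    print("gaussian:", falloff_r2(dists, corrs, "gaussian"))
    print("exponential:", falloff_r2(dists, corrs, "exponential"))
```

      On real data the two values can be close, which is consistent with the point that the choice of function matters less than the smooth decay with distance.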

      Furthermore, it is not clear whether the r^2s in Fig. 1E are computed in a held-out manner (more details about what goes into computing r^2 are needed).

      These values are computed by fitting the 2-d Gaussian (or exponential function) to all neurons excluding the seed itself (added a short clarification in the Methods).

      Assessing the model based on peak location alone (Fig. 1E) is not sufficient, as other smooth monotonically-decreasing functions may perform similarly.

      As discussed above, an exponential function indeed performs similarly to a Gaussian. However, goodness of fit is secondary to the main aim of Fig 1E, which is to show that the correlation peak tends to fall near the seed cell.

      Simulating from the model greatly improves the reader's understanding (Fig. 2D), but no explanation is given for why the simulations (Fig. 2D) have almost no background spikes and much fewer, non-co-occurring bursts than those of real data (Fig. 2E).

      In part this is because the simulation results depicted in Fig 2D were derived from the ‘baseline model’, prior to optimizing to match biological bursting statistics. It is thus expected that activity will differ from experimental observation and was our main motive to tune the model parameters (now emphasized in the text). However, the model will certainly not account for all aspects of tectal activity; rather, it was designed to reproduce bursting as a prominent feature of ongoing activity and in the second part of the paper we explore the extent to which it can account for other phenomena. As noted above, in the revised abstract, introduction and discussion we have tried to clarify the motivation for developing the model and how it was used to gain insight into activity-dependent changes in network excitability.

      A key assumption of the distance model (Fig. 2A) is that each neuron has the same gaussian fall-off (i.e., sigma_excitation and sigma_inhibition), but it is unclear if the data support this assumption.

      We intentionally opted for a simple model (i.e. described by few parameters), in part due to the lack of connectivity data and additionally to set a lower bound on the extent to which multiple features of tectal activity could be accounted for. More complex models with additional degrees of freedom (such as cell-specific connectivity) may well describe the data better, but likely at the cost of interpretability. We consider such extensions are beyond the scope of the present study but might be fruitful avenues for future research.
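
      To make the shared-profile assumption concrete, the sketch below builds pairwise weights from one excitatory and one inhibitory Gaussian that every neuron shares. This is a hypothetical difference-of-Gaussians form chosen for illustration; the exact equation of the model is not given in this exchange, and the parameter names and values are assumptions.

```python
import math


def connection_weight(distance, gain_exc, sigma_exc, gain_inh, sigma_inh):
    """Net weight between two neurons separated by `distance` (assumed form)."""
    excitation = gain_exc * math.exp(-distance ** 2 / (2 * sigma_exc ** 2))
    inhibition = gain_inh * math.exp(-distance ** 2 / (2 * sigma_inh ** 2))
    return excitation - inhibition


def weight_matrix(positions, **params):
    """Pairwise weights for neurons at 2-d `positions`, with no self-connections."""
    n = len(positions)
    weights = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                d = math.dist(positions[i], positions[j])
                weights[i][j] = connection_weight(d, **params)
    return weights


if __name__ == "__main__":
    params = dict(gain_exc=1.0, sigma_exc=20.0, gain_inh=0.3, sigma_inh=100.0)
    positions = [(0.0, 0.0), (10.0, 0.0), (100.0, 0.0)]  # toy coordinates
    for row in weight_matrix(positions, **params):
        print([round(w, 3) for w in row])
```

      A cell-specific variant would replace the four shared parameters with per-neuron values, which is where the additional degrees of freedom mentioned above would come from.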

      Although an excitatory and inhibitory gain is assumed (Fig. 2A), it is not clear from the data (Fig. 1C) that an inhibitory gain is needed (no negative correlations are observed in Fig. 1C-D).

      This is now explored in the revised Figure 3A which includes the condition of zero inhibition gain. See also response to reviewer 1.

      After optimization (Fig. 3), the model is evaluated on predicting burst properties but not evaluated on predicting held-out responses (R^2s or likelihoods), and no other model (e.g., fitting a GLM or a model with only an excitatory gain) is considered. In particular, one may consider a model in which "assemblies" do exist - does such an assembly model lead to better held-out prediction performance?

      The model we developed is a mechanistic, generative model. In contrast to Pillow et al 2008, we did not fit the model to data but rather we used it to simulate network activity and tuned the seven parameters (using EMOO) to best match biological observations. Thus, rather than assessing goodness-of-fit using cross-validation, our approach involved comparison of summary statistics related to the target emergent phenomenon (tectal bursting). This was necessary as bursting appears highly stochastic. Further to the comments above, we have expanded the parameter space to include instances with only an excitatory gain (where bursting failed) and no distance-dependence (again, bursting failed). Introducing assemblies into the model will inevitably support bursting (and introduce many more free parameters), but one of our key observations is that such assemblies are not required for this aspect of spontaneous activity. Again, our aim was not to produce a detailed picture of tectal connectivity, but rather to develop a minimal model and estimate the extent to which it can account for observed features of activity. Note that the second half of the paper (Figure 4 onwards) shows the model can explain phenomena that were not considered during parameter tuning.

      It is unclear why a genetic algorithm (Fig. 1A-C) is necessary versus a grid search; it appears that solutions in Generation 2 (Fig. 3C, leftmost plot, points close to the origin) are as good as solutions in Generation 30 and that the spreads of points across generations do not shrink (as one would expect from better mutations). Given the small number of parameters (7), a grid search is reasonable, computationally tractable, and easier to understand for all readers (Fig. 3A).

      Perhaps in hindsight a grid search would have worked, but at increased computational cost (each instantiation of the model is computationally expensive). At the time we chose EMOO, and since it produced satisfactory results, we kept it. As often happens with multi-objective optimization, an improvement in one objective usually happens at the expense of other objectives, so the spread of the points does not shrink much but they move closer to the axes (i.e. reduced error). The final parameter combination is closer to the origin than any point in generation 2, though admittedly not by much. Importantly, however, optimizing the model using the training features generalized to other burst-related statistics.
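
      The trade-off between objectives can be made concrete with a small sketch of Pareto dominance, the comparison that multi-objective optimizers rely on. It is a generic illustration with made-up error values, not the EMOO implementation used in the study.

```python
def dominates(a, b):
    """True if error vector `a` is no worse than `b` everywhere and better somewhere."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))


def pareto_front(points):
    """Points not dominated by any other point (all objectives are minimized)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


if __name__ == "__main__":
    # Each tuple holds two error objectives for one candidate parameter set.
    errors = [(1, 5), (2, 2), (5, 1), (3, 3), (4, 4)]
    print(pareto_front(errors))
```

      Candidates on the front cannot be ranked against each other without weighting the objectives, which is why the spread of solutions need not shrink across generations even as the front moves towards the axes.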

      It is unclear why the excitatory and inhibitory gains of the temporal profiles (Fig. 3I) appear to be gaussian but are formulated as exponential (formula for I_ij^X in Methods).

      The interactions indeed have exponential decay in time. These might appear Gaussian because the axis scale is logarithmic.

      Overall, comparing this model to other possible (similar) models and reporting held-out prediction performance will support the claim that the distance model is a good explanation for trial-to-trial variability.

      See comments above. A key point we want to stress is that we intentionally explored a minimal network model and found that, despite obvious simplifications of the biology, it was nonetheless able to explain multiple aspects of tectal physiology and behaviour. We hope that it inspires future studies and can be extended, in parallel to experimental findings, to more accurately represent the cell-type diversity and cell-specific connectivity of the tectal network.

      Data results: Data results were clear and straightforward. However, the explanation was not given for certain results. For example, the relationship between pre-stimulus linear drive and delta R was weak; the examples in Fig. 4C do not appear to be representative of the other sessions. The example sessions in Fig. 4C have R^2=0.17 and 0.19, the two outliers in the R^2 histogram (Fig. 4D).

      The revised figure 4 is based on new data and new analysis (see below), and the presented examples no longer represent the extreme tail of the distribution (they still, however, represent strong examples, as is now explicitly indicated in the figure legend).

      The black trace in Fig. 4D has large variations (e.g., a linear drive of 25 and 30 have a change in delta R of ~0.1 - greater than the overall change of the dashed line at both ends, ~0.08) but the SEMs are very tight. This suggests that either this last fluctuation is real and a major effect of the data (although not present in Fig. 4C) or the SEM is not conservative enough. No null distribution or statistics were computed on the R^2 distribution (Fig. 4C, blue distribution) to confirm the R^2s are statistically significant and not due to random fluctuations.

      We agree that this was not sufficiently robust and in response to this comment we undertook a significant revision to figure 4 and the associated text:

      i) The revised figure is based on an entirely new dataset, allowing us to verify the results on independent data. We used 5 min ISI for all stimulus presentations, regardless of stimulus type (high or low elevation), thus ensuring that we are only examining differences in state brought about by previous ongoing activity, without risk of ‘contamination’ by evoked activity.

      ii) As per the reviewer’s suggestion, we compared the model-estimated pre-stimulus state to a null estimate obtained from randomly sampled time-points. We additionally compared the optimised model with the baseline model. Whereas the null (random times) estimates had no predictive power, both models using pre-stimulus activity were able to explain a fraction of the response residuals, with the optimised model performing better.
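
The comparison against a random-times null can be sketched as follows. This is a toy simulation under stated assumptions, not the analysis code of the study: the state trace, stimulus times, noise levels and the number of null draws are all hypothetical.

```python
import random


def r_squared(x, y):
    """Squared Pearson correlation, i.e. R^2 of a one-predictor linear fit."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)


rng = random.Random(0)
n_frames, n_stimuli = 2000, 40

# Hypothetical model-estimated state (e.g. linear drive) at every time point.
state = [rng.gauss(0.0, 1.0) for _ in range(n_frames)]
stim_times = sorted(rng.sample(range(1, n_frames), n_stimuli))

# Simulated response residuals that depend on the state just before each stimulus.
residuals = [state[t - 1] + rng.gauss(0.0, 0.3) for t in stim_times]

# R^2 when the predictor is the state immediately preceding each stimulus.
r2_pre_stimulus = r_squared([state[t - 1] for t in stim_times], residuals)

# Null distribution: the same predictor read out at randomly sampled time points.
null_r2 = []
for _ in range(200):
    random_times = rng.sample(range(n_frames), n_stimuli)
    null_r2.append(r_squared([state[t] for t in random_times], residuals))
null_r2.sort()
null_95th = null_r2[int(0.95 * len(null_r2))]
```

An observed R^2 is then judged against the upper tail of `null_r2` rather than against zero.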

      iii) We refined the binning process by first computing, for each response, the mean of response residuals across neurons for each bin of estimated linear drive, and then averaging across responses. This prevents the relationship being skewed by rare instances involving unusually large numbers of neurons for a particular linear drive bin, and thereby eliminates the fluctuations the reviewer was referring to.
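
A minimal sketch of the two-stage binning described in point iii, contrasted with pooling all neurons directly. The data layout (one list of `(linear_drive, residual)` pairs per response) and the bin width are assumptions made for illustration.

```python
from collections import defaultdict


def bin_index(drive, bin_width):
    """Index of the linear-drive bin that a value falls into."""
    return int(drive // bin_width)


def two_stage_binned_mean(responses, bin_width):
    """Average across neurons within each response first, then across responses."""
    per_bin_response_means = defaultdict(list)
    for neurons in responses:
        by_bin = defaultdict(list)
        for drive, residual in neurons:
            by_bin[bin_index(drive, bin_width)].append(residual)
        for b, vals in by_bin.items():
            per_bin_response_means[b].append(sum(vals) / len(vals))
    return {b: sum(m) / len(m) for b, m in per_bin_response_means.items()}


def pooled_binned_mean(responses, bin_width):
    """Pool every neuron from every response into one average per bin."""
    pooled = defaultdict(list)
    for neurons in responses:
        for drive, residual in neurons:
            pooled[bin_index(drive, bin_width)].append(residual)
    return {b: sum(v) / len(v) for b, v in pooled.items()}


# Two hypothetical responses: one contributes a single neuron to bin 0,
# the other contributes three neurons to the same bin.
responses = [
    [(1.0, 0.0)],
    [(2.0, 0.5), (3.0, 0.5), (4.0, 0.5)],
]
two_stage = two_stage_binned_mean(responses, bin_width=5.0)
pooled = pooled_binned_mean(responses, bin_width=5.0)
```

With pooling, the response with three neurons dominates the bin; with the two-stage average, each response carries equal weight.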

      The absence of any background activity in Fig. 6B (e.g., during the rest blocks) is confusing, given that in spontaneous activity many bursts and background activity are present (Fig. 2E).

      The raster only presents evoked responses and no background activity is shown. This has been clarified in the revised figure and legend.

      Finally, it appears that the anterior optic tectum contributes to convergent saccades (CS) (Fig. 7E) but no post-saccadic activity is shown to assess how activity changes after the saccade (e.g., plotting activity from 0 to 60).

      Activity before and after the saccade is shown in Fig 7A. Fig 7E shows the ‘linear drive’ (or ‘excitability’), and how it changes leading up to the saccade. Since we were interested in the association between pre-saccade state and saccade-associated activity, we did not plot post-saccadic linear drive. However, as can be seen in the below figure for the reviewer, linear drive is strongly suppressed by the saccade, as expected due to CS-associated activity.

      No explanation is given why activity drops ~30 seconds before a convergent saccade (Fig. 7E).

      This is no longer shown after we trimmed the history data in Fig 7E in accordance with a comment from reviewer 1. We speculate, however, that the mean linear drive of a compact population of neurons would be somewhat periodic, since a high linear drive leads to a burst, which results in prolonged inhibition (low linear drive) with a slow recovery, and so on.

      No statistical test is performed on the R^2 distribution (Fig. 7H) to confirm the R^2s (with a mean close to R^2=0.01) are meaningful and not due to random fluctuations.

      We revised the analysis in Fig 7 along the same lines as the revision of Fig 4. Model-estimated linear drive predicts CS-associated activity whereas a null estimate (random times) shows no such relationship.

      Presentation: A disjointed part of the paper is that for the first part (Figs. 1-3), the focus is on capturing burst activity, but for the second part (Figs. 4-7), the focus is on trial-to-trial variability with no mention of bursts. It is unclear how the reader should relate the two and if bursts serve a purpose for stimulus-evoked activity.

      In the first part of the paper (Figs. 1-3), we use ongoing activity to develop an understanding (formulated as a network model) of how activity modulates the network state. In the second part, we test this understanding in the context of evoked responses and show that model-estimated network state explains a fraction of visual response variability and experience-dependent changes in activity and behaviour. In the revised MS we further emphasize this idea and have edited the results text to strengthen the connections between these parts of the study. See also comments above.

      Citations: The manuscript may cite other relevant studies in electrophysiology that have investigated noise correlations, such as:

      • Luczak et al., Neuron 2009 (comparing spontaneous and evoked activity).

      • Cohen and Kohn, Nat Neuro 2011 (review on noise correlations).

      • Smith and Kohn, JNeurosci 2008 (looking at correlations over distance).

      • Lin et al., Neuron 2015 (modeling shared variability).

      • Goris et al., Nat Neuro 2014 (check out Fig. 4).

      • Umakantha et al., Neuron 2021 (links noise correlation and dim reduction; includes other recent references to noise correlations).

      We agree that the manuscript could benefit from citing some of these suggested studies and have added citations accordingly.

    1. Author Response

      Reviewer #1 (Public Review):

      In this manuscript, the authors find CpGs within 500Kb of a gene that associate with transcript abundance (cis-eQTMs) in children from the HELIX study. There is much to admire about this work. With two notable exceptions, their work is solid and builds/improves on the work that came before it. Their catalogue of eQTMs could be useful to many other researchers that utilize methylation data from whole blood samples in children. Their annotation of eQTMs is well thought out and exhaustive. As this portion of the work is descriptive, most of their methods are appropriate.

      Unfortunately, their use of results from a model that does not account for cell-type proportions across samples diminishes the utility and impact of their findings. I believe that their catalog of eQTMs contains a great deal of spurious results that primarily represent the differences in cell-type proportions across samples.

      Lastly, the authors postulate that the eQTM gene associations found uniquely in their unadjusted model (in comparison to results from a model that does account for cell type proportion) represent cell-specific associations that are lost when a fully-adjusted model is assumed. To test this hypothesis, the authors appear to repurpose methods that were not intended for the purposes used in this manuscript. The manuscript lacks adequate statistical validation to support their repurposing of the method, as well as the methodological detail needed to peer review it. This section is a distraction from an otherwise worthy manuscript. But provide evidences that enriched for cell sp CpGs.

      Major points

      1. Line 414-475: In this section, the authors are suggesting that CpGs that are significant without adjusting for cell type are due to methylation-expression associations that are found only in one cell type, while association found in the fully adjusted model are associations that are shared across the cell types. I do not agree with this hypothesis, as I do not agree that the confounding that occurs when cell-type proportions are not accounted for would behave in this way. Although restricting their search for eQTMs to only those CpGs proximal to a gene will reduce the number of spurious associations, a great deal of the findings in the authors' unadjusted model likely reflect differences in cell-type proportions across samples alone. The Reinius manuscript, cited in this paper, indicates that gene-proximal CpGs can have methylation patterns that vary across cell types.

      Following reviewers’ recommendations, we have reconsidered our initial hypothesis about the role of cellular composition in the association between methylation and gene expression. Although we still think that some of the eQTMs only found in the model unadjusted for cellular composition could represent cell specific effects, we acknowledge that the majority might be confounded by the extensive gene expression and DNA methylation differences between cell types. Also, we recognize that more sophisticated statistical tests should be applied to prove our hypothesis. Because of this, we have decided to report the eQTMs of the model adjusted for cellular composition in the main manuscript and keep the results of the model unadjusted for cellular composition only in the online catalogue.

      2. Line 476-488: Their evidence due to F-statistics is tenuous. The authors do not give enough methodological detail to explain how they're assessing their hypothesis in the results or methods (lines 932-946) sections. The methods they give are difficult to follow. The results in figure S19A are not compelling. The citation in the methods (by Reinius) does not make sense, because Reinius et al did not use F-statistics as a proxy for cell type specificity. The citation that the authors give for this method in the results does not appear to be appropriate for this analysis, either. Jaffe and Irizarry state that a CpG with a high F-statistic indicates that the methylation at that CpG varies across cell type. They suggest removing these CpGs from significant results, or estimating and correcting for cell type proportions, as their presence would be evidence of statistical confounding. The authors of this manuscript indicate that they find higher F-statistics among the eQTMs uniquely found in the unadjusted model, which seems to only strengthen the idea that the unadjusted model is suffering from statistical confounding.

      We recognize our misinterpretation of the F-statistic in relation to cellular composition. We have deleted this entire section from the updated version of the manuscript.

      3. The methods used to generate adjusted p-values in this manuscript are not appropriate as they are written. Further, they are nothing like the methods used in the paper cited by the authors. The Bonder paper used permutations to estimate an empirical FDR and cites a publication by Westra et al for their method (below). The Westra paper is a better one to cite, because the methods are more clear. Neither the Bonder nor the Westra paper uses the BH procedure for FDR.

      Westra, H.-J. et al. Systematic identification of trans eQTLs as putative drivers of known disease associations. Nat. Genet. 45, 1238-1243 (2013).

      We apologize for this misleading citation. Although Bonder et al applied a permutation approach to adjust for multiple testing, our approach was inspired by the method applied in the GTEx project (GTEx consortium, 2020), using CpGs instead of SNPs. The citation has been corrected in the manuscript. Moreover, we have explained the whole multiple-testing procedure in more detail in the Material and Methods section (page 14, line 316):

      “To ensure that CpGs paired to a higher number of Genes do not have higher chances of being part of an eQTM, multiple-testing was controlled at the CpG level, following a procedure previously applied in the Genotype-Tissue Expression (GTEx) project (Gamazon et al., 2018). Briefly, our statistic used to test the hypothesis that a pair CpG-Gene is significantly associated is based on considering the lowest p-value observed for a given CpG and all its pairs Gene (e.g. those in the 1 Mb window centered at the TSS). As we do not know the distribution of this statistic under the null, we used a permutation test. We generated 100 permuted gene expression datasets and ran our previous linear regression models obtaining 100 permuted p-values for each CpG-Gene pair. Then, for each CpG, we selected among all CpG-Gene pairs the minimum p-value in each permutation and fitted a beta distribution that is the distribution we obtain when dealing with extreme values (e.g. minimum) (Dudbridge and Gusnanto, 2008). Next, for each CpG, we took the minimum p-value observed in the real data and used the beta distribution to compute the probability of observing a lower p-value. We defined this probability as the empirical p-value of the CpG. Then, we considered as significant those CpGs with empirical p-values to be significant at 5% false discovery rate using Benjamini-Hochberg method. Finally, we applied a last step to identify all significant CpG-Gene pairs for all eCpGs. To do so, we defined a genome-wide empirical p-value threshold as the empirical p-value of the eCpG closest to the 5% false discovery rate threshold. We used this empirical p-value to calculate a nominal p-value threshold for each eCpG, based on the beta distribution obtained from the minimum permuted p-values. This nominal p-value threshold was defined as the value for which the inverse cumulative distribution of the beta distribution was equal to the empirical p-value. Then, for each eCpG, we considered as significant all eCpG-Gene variants with a p-value smaller than nominal p-value.”
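
The CpG-level part of this procedure can be sketched as below. This is a simplified illustration, not the authors' implementation: instead of fitting a beta distribution to the permuted minimum p-values, it counts how many permuted minima are at least as small as the observed minimum (a plain permutation p-value), and all CpG names and p-values are hypothetical.

```python
def empirical_p_value(observed_min_p, permuted_min_ps):
    """Permutation p-value for the minimum-p statistic of one CpG.
    Swapped-in simplification: a direct count replaces the beta approximation."""
    n_as_extreme = sum(1 for p in permuted_min_ps if p <= observed_min_p)
    return (1 + n_as_extreme) / (1 + len(permuted_min_ps))


def benjamini_hochberg(p_values, alpha=0.05):
    """Return one boolean per p-value: True if rejected at FDR level alpha."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= alpha * rank / m:
            k_max = rank  # largest rank that passes the step-up criterion
    rejected = [False] * m
    for i in order[:k_max]:
        rejected[i] = True
    return rejected


# Hypothetical nominal p-values of each CpG against its paired Genes.
observed = {"cpg_a": [0.001, 0.2], "cpg_b": [0.3, 0.04, 0.5]}
# Hypothetical minimum p-value per permutation (four permutations for brevity).
permuted_minima = {
    "cpg_a": [0.02, 0.0005, 0.03, 0.1],
    "cpg_b": [0.01, 0.2, 0.03, 0.6],
}
empirical = {
    cpg: empirical_p_value(min(ps), permuted_minima[cpg])
    for cpg, ps in observed.items()
}

# Benjamini-Hochberg applied to a hypothetical list of empirical p-values.
bh_flags = benjamini_hochberg([0.001, 0.008, 0.039, 0.041, 0.6], alpha=0.05)
```

With a direct count, the smallest attainable empirical p-value from 100 permutations is 1/101. A fitted distribution, as in the quoted procedure, is not limited to that resolution.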

      References:

      GTEx consortium, The GTEx Consortium atlas of genetic regulatory effects across human tissues, Science (2020) Sep 11;369(6509):1318-1330. doi: 10.1126/science.aaz1776.

      Reviewer #2 (Public Review):

      Strength:

      Comprehensive analysis. Considering genetic factors such as meQTL and comparing results with adult data is interesting.

      We thank the reviewer for his/her positive feedback on the manuscript. We agree that the analysis of genetic data and the comparison with eQTMs described in adults are two important points of the study.

      Weakness:

      • Manuscript is not summarized well. Please send less important findings to supplementary materials. The manuscript is not well written, which includes every little detail in the text, resulting in 86 pages of the manuscript.

      Following reviewers’ comments, we have simplified the manuscript. Now only the eQTMs identified in the model adjusted for cellular composition are reported. In addition, functional enrichment analyses have been simplified without reporting all odds ratios (OR) and p-values, which can be seen in the Figures.

      • Any possible reason that the eQTM methylation probes are enriched in weak transcription regions? This is surprising.

      Bonder et al also found that blood eQTMs were slightly enriched for weak transcription regions (TxWk). Weak transcription regions are highly constitutive and found across many different cell types (Roadmap Epigenetics Consortium, 2015). However, hematopoietic stem cells and immune cells have lower representation of TxWk and other active states, which may be related to their capacity to generate sub-lineages and enter quiescence.

      Given that we analyzed whole blood and that ROADMAP chromatin states are only available for blood-specific cell types, each CpG in the array was annotated to one or several chromatin states by taking a state as present in that locus if it was described in at least 1 of the 27 blood-related cell types. By applying this strategy we may be “over-representing” TxWk chromatin states if TxWk states are cell-type specific. As a result, even if each blood cell type has few TxWk regions, many positions can be TxWk in at least one cell type, inflating the number of CpGs considered as TxWk. This might have affected some of the enrichments.
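
The inflation described above can be illustrated with a deliberately extreme toy case in which a state is rare in every cell type but never falls on the same locus twice. The numbers of loci and the per-cell-type annotations are hypothetical; only the count of 27 cell types comes from the text.

```python
n_loci, n_cell_types = 100, 27

# Hypothetical annotation: each cell type carries the state at two loci,
# and no locus is shared between cell types.
state_by_cell_type = [{2 * i, 2 * i + 1} for i in range(n_cell_types)]

# Fraction of loci carrying the state within each single cell type.
per_type_fraction = [len(loci) / n_loci for loci in state_by_cell_type]

# "Present if described in at least 1 cell type" corresponds to a set union.
union = set().union(*state_by_cell_type)
union_fraction = len(union) / n_loci
```

In this toy case every cell type has the state at 2% of loci, yet the union rule labels 54% of loci with it. Real annotations overlap between cell types, so the actual inflation would be smaller.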

      On the other hand, CpG probe reliability depends on methylation levels and variance. TxWk regions show high methylation levels, which tend to be measured with more error. This also might have impacted the results; however, the analysis considering only reliable probes (ICC >0.4) showed similar enrichment for TxWk.

      Besides these, we do not have a clear answer for the question raised by the reviewer.

      References:

      Bonder MJ, Luijk R, Zhernakova D V, Moed M, Deelen P, Vermaat M, et al. Disease variants alter transcription factor levels and methylation of their binding sites. Nat Genet [Internet]. 2017 [cited 2017 Nov 2];49:131–8. Available from: http://www.ncbi.nlm.nih.gov/pubmed/27918535

      Roadmap Epigenomics Consortium, Kundaje A, Meuleman W, Ernst J, Bilenky M, Yen A, Heravi-Moussavi A, Kheradpour P, Zhang Z, Wang J, Ziller MJ, Amin V, Whitaker JW, Schultz MD, Ward LD, Sarkar A, Quon G, Sandstrom RS, Eaton ML, Wu YC, Pfenning AR, Wang X, Claussnitzer M, Liu Y, Coarfa C, Harris RA, Shoresh N, Epstein CB, Gjoneska E, Leung D, Xie W, Hawkins RD, Lister R, Hong C, Gascard P, Mungall AJ, Moore R, Chuah E, Tam A, Canfield TK, Hansen RS, Kaul R, Sabo PJ, Bansal MS, Carles A, Dixon JR, Farh KH, Feizi S, Karlic R, Kim AR, Kulkarni A, Li D, Lowdon R, Elliott G, Mercer TR, Neph SJ, Onuchic V, Polak P, Rajagopal N, Ray P, Sallari RC, Siebenthall KT, Sinnott-Armstrong NA, Stevens M, Thurman RE, Wu J, Zhang B, Zhou X, Beaudet AE, Boyer LA, De Jager PL, Farnham PJ, Fisher SJ, Haussler D, Jones SJ, Li W, Marra MA, McManus MT, Sunyaev S, Thomson JA, Tlsty TD, Tsai LH, Wang W, Waterland RA, Zhang MQ, Chadwick LH, Bernstein BE, Costello JF, Ecker JR, Hirst M, Meissner A, Milosavljevic A, Ren B, Stamatoyannopoulos JA, Wang T, Kellis M. Integrative analysis of 111 reference human epigenomes. Nature. 2015 Feb 19;518(7539):317-30. doi: 10.1038/nature14248. PMID: 25693563; PMCID: PMC4530010.

      • The result that the magnitude of the effect was independent of the distance between the CpG and the TC TSS is surprising. Could you draw a figure where x-axis is the distance between the CpG site and TC TSS and y-axis is p-value?

      As suggested by the reviewer, we have taken a more detailed look at the relationship between the effect size and the distance between the CpG and the TC’s TSS. First, we confirmed that the relative orientation (upstream or downstream) did not affect the strength of the association (p-value=0.68). Second, we applied a linear regression between the absolute log2 fold change and the log10 of the distance (in absolute value), finding that they were inversely related. We have updated the manuscript with this information (page 22, line 504):

      “We observed an inverse linear association between the eCpG-eGene’s TSS distance and the effect size (p-value = 7.75e-9, Figure 2B); while we did not observe significant differences in effect size due to the relative orientation of the eCpG (upstream or downstream) with respect to the eGene’s TSS (p-value = 0.68).”
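
The regression described above, absolute log2 fold change against log10 of the absolute distance, can be sketched with an ordinary least-squares fit. The distances and effect sizes below are made up so that the fit is easy to verify by hand; they are not data from the study.

```python
import math


def ols_slope_intercept(x, y):
    """Ordinary least-squares slope and intercept for one predictor."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return slope, my - slope * mx


# Hypothetical CpG-to-TSS distances (negative = upstream) and log2 fold changes.
distances_bp = [-100000, -1000, 10, 100, 10000]
log2_fold_changes = [0.5, -1.1, 1.7, -1.4, 0.8]

# Orientation and sign are discarded: only magnitudes enter the fit.
x = [math.log10(abs(d)) for d in distances_bp]
y = [abs(fc) for fc in log2_fold_changes]
slope, intercept = ols_slope_intercept(x, y)
```

A negative slope corresponds to the inverse relationship reported in the quoted text: effect sizes shrink as the CpG lies farther from the TSS.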

      Results are shown in Figure 2B. Of note, we winsorized effect size values in order to improve the visualization. The winsorizing process is also explained in Figure 2 legend. Moreover, we have done the plot suggested by the reviewer (see below). It shows that associations with smallest p-values are found close to the TC’s TSS. Nonetheless, as this pattern is also observed for the effect sizes, we have decided to not include it in the manuscript.
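
For readers unfamiliar with winsorizing, a minimal sketch follows. The 5th and 95th percentile limits and the nearest-rank percentile rule are assumptions for illustration; the limits actually used are those given in the Figure 2 legend.

```python
def winsorize(values, lower_pct=5, upper_pct=95):
    """Clamp values to the given lower and upper percentiles.
    Percentiles use a simple nearest-rank rule on the sorted values."""
    ordered = sorted(values)
    n = len(ordered)
    lo = ordered[(lower_pct * (n - 1)) // 100]
    hi = ordered[(upper_pct * (n - 1)) // 100]
    return [min(max(v, lo), hi) for v in values]


# Hypothetical effect sizes with one extreme value at each end.
effect_sizes = [-100] + list(range(1, 20)) + [100]
winsorized = winsorize(effect_sizes)
```

The extreme values are pulled in to the percentile limits while the rest are unchanged, which keeps the axis range of a plot readable without dropping any points.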

      • Concerned about too many significant eQTMs. Almost half of genes are associated with methylation. I wonder if false positives are well controlled using the empirical p-values. Using empirical p-values from permutation may mislead, especially since you only use 100 permutations. I wonder whether the result would be similar if they compared their result with the traditional way, either adjusting p-values using p-values from entire TCs or adjusting p-values using a gene-based method as commonly used in GWAS. Compare your previous result with my suggestion for the first analysis.

      Although the number of genes (TCs) whose expression is associated with DNA methylation is quite high, we do not think this is due to poor control of false positives. Our approach is based on the method used by GTEx (GTEx consortium) and implemented in the FastQTL package (Ongen et al. 2016) to control for false positives in eQTL discovery. As in GTEx, we ran 100 permutations to estimate the parameters of a beta distribution, which we used to model the distribution of p-values for each CpG. Then, to correct for the number of TCs among significant CpGs, we applied False Discovery Rate (FDR) at a threshold < 0.05. Finally, we defined the final set of significant eQTMs using the beta distribution defined in a previous step.

      For illustration, we compared the number of eQTMs with our approach to what we would obtain by uniquely applying the FDR method (adjusted p-value <0.05), getting fewer associations with our approach: eQTMs (45,203 with FDR vs 39,749 with our approach), eCpGs (24,611 vs 21,966) and eGenes (9,937 vs 8,886). Among the 8,886 significant eGenes, 6,288 of them are annotated to coding genes, thus representing 27% of the 23,054 eGenes coding for a gene included in the array.

      References:

      GTEx consortium, The GTEx Consortium atlas of genetic regulatory effects across human tissues, Science (2020) Sep 11;369(6509):1318-1330. doi: 10.1126/science.aaz1776.

      Ongen et al. Fast and efficient QTL mapper for thousands of molecular phenotypes, Bioinformatics (2016) May 15;32(10):1479-85. doi: 10.1093/bioinformatics/btv722. Epub 2015 Dec 26.

      • I recommend starting with cell type specific results. Without adjusting cell type, the result doesn't make sense.

      As suggested by other reviewers, we have withdrawn the model unadjusted for cellular composition.

      Reviewer #3 (Public Review):

      Although several DNA methylation-gene expression studies have been carried out in adults, this is the first in children. The importance of this is underlined by the finding that surprisingly few associations are observed in both adults and children. This is a timely study and certain to be important for the interpretation of future omic studies in blood samples obtained from children.

      We agree with the reviewer that eQTMs in children are important for interpreting EWAS findings conducted in child cohorts such as those of the Pregnancy And Childhood Epigenetics (PACE) consortium.

      It is unfortunate that the authors chose to base their reporting on associations unadjusted for cell count heterogeneity. They incorrectly claim that associations linked to cell count variation are likely to be cell-type-specific. While possible, it is probably more likely that the association exists entirely due to cell type differences (which tend to be large) with little or no association within any of the cell types (which tend to be much smaller). In the interests of interpretability, it would be better to report only associations obtained after adjusting for cell count variation.

      Following reviewers’ recommendations, we have reconsidered our initial hypothesis about the role of cellular composition in the association between methylation and gene expression. Although we still think that some of the eQTMs only found in the model unadjusted for cellular composition could represent cell specific effects, we acknowledge that the majority might be confounded by the extensive gene expression and DNA methylation differences between cell types. Also, we recognize that more sophisticated statistical tests should be applied to prove our hypothesis. Because of this we have decided to report the eQTMs of the model adjusted for cellular composition in the main manuscript and keep the results of the model unadjusted for cellular composition only in the online catalogue.

      Several enrichments could be related to variation in probe quality across the DNA methylation arrays.

      For example, enrichment for eQTM CpG sites among those that change with age could simply be due to the fact age and eQTM effects are more likely to be observed for CpG sites with high quality probes than low quality probes. It is more informative to instead ask if eQTM CpG sites are more likely to have increasing rather than decreasing methylation with age. This avoids the probe quality bias since probes with positive associations with age would be expected to have roughly the same quality as those with negative associations with age. There are several other analyses prone to the probe quality bias.

      See answer to question 2, below.

    1. Author Response:

      Reviewer #1:

      This work provides insight into the effects of tetraplegia on the cortical representation of the body in S1. By using fMRI and an attempted finger movement task, the researchers were able to show preserved fine-grained digit maps - even in patients without sensory and motor hand function as well as no spared spinal tissue bridges. The authors also explored whether certain clinical and behavioral determinants may contribute to preserving S1 somatotopy after spinal cord injury.

      Overall I found the manuscript to be well-written, the study to be interesting, and the analysis reasonable. I do, however, think the manuscript would benefit by considering and addressing two main suggestions.

      1) Provide additional context / rationale for some of the methods. Specific examples below:

      a) The rationale behind using the RSA analysis seemed to be predicated on the notion that the signals elicited via a phase-encoded design can only yield information about each voxel's preferred digit and little-to-no information about the degree of digit overlap (see lines 163-166 and 571-575). While this is the case for conventional analyses of these signals, there are more recently developed approaches that are now capable of estimating the degree of somatotopic overlap from phase-encoded data (see: Da Rocha Amaral et al., 2020; Puckett et al., 2020). Although I personally would be interested in seeing one of these types of analyses run on this data, I do not think it is necessary given the RSA data / analysis. Rather, I merely think it is important to add some context so that the reader is not misled into believing that there is no way to estimate this type of information from phase-encoded signals. - Da Rocha Amaral S, Sanchez Panchuelo RM, Francis S (2020) A Data-Driven Multi-scale Technique for fMRI Mapping of the Human Somatosensory Cortex. Brain Topogr 33 (1):22-36. doi:10.1007/s10548-019-00728-6 - Puckett AM, Bollmann S, Junday K, Barth M, Cunnington R (2020) Bayesian population receptive field modeling in human somatosensory cortex. Neuroimage 208:116465. doi:10.1016/j.neuroimage.2019.116465

      We did not intend to give the impression that inter-finger overlap can only be estimated using RSA. To clarify this, we included a sentence in our methods section stating that inter-finger overlap cannot be estimated using the traditional travelling wave approach, but that new methods have estimated somatotopic overlap from travelling wave data. Since our RSA approach lends itself to estimating inter-finger overlap and is currently the gold standard in characterizing these representational patterns, we opt –in accordance with the reviewer’s comment– not to include this additional analysis.

      Revised text Methods:

      “While the traditional traveling wave approach is powerful to uncover the somatotopic finger arrangement, a fuller description of hand representation can be obtained by taking into account the entire fine-grained activity pattern of all fingers. RSA-based inter-finger overlap patterns have been shown to depict the invariant representational structure of fingers better than the size, shape, and exact location of the areas activated by finger movements (Ejaz et al., 2015). RSA-based measures are furthermore not prone to some of the problems of measurements of finger selectivity (e.g., dependence on map thresholds). The most common approach for investigating inter-finger overlap is RSA, as used here, though note that somatotopic overlap has recently been estimated from travelling wave data using an iterated Multigrid Priors (iMGP) method and population receptive field modelling (Da Rocha Amaral et al., 2020; Puckett et al., 2020).”

      b) The rationale for using minimally thresholded (Z>2) data for the Dice overlap analysis as opposed to the threshold used in data visualization (q<0.05) was unclear. Providing the minimally thresholded maps (in Supplementary) would also aid interpretation of the Dice overlap results.

      We followed previously published procedures for calculating the Dice overlap between the two split-halves of the data (Kikkert et al., 2016; J. Kolasinski et al., 2016; Sanders et al., 2019). We used minimally thresholded data to calculate the Dice overlap to ensure that our analysis was sensitive to overlaps that would be missed when using high thresholds. We clarified this in the revised manuscript. We thank the reviewer for their suggestion to add a Figure displaying the minimally thresholded split-half hard-edged finger maps - we have added this to the revised manuscript as Figure 2-Figure supplement 1.

      To ensure that our thresholding procedure did not change the results of the Dice overlap analysis, we repeated this analysis using split-half maps that were thresholded using a q < 0.05 FDR criterion (as was used to create the travelling wave maps in Figures 2A-B). We found the same results as when using the Z > 2 thresholding criterion: Overall, split-half consistency was not significantly different between patients and controls, as tested using a robust mixed ANOVA (F(1,17.69) = 0.08, p = 0.79). There was a significant difference in split-half consistency between pairs of same, neighbouring, and non-neighbouring fingers (F(2,14.77) = 38.80, p < 0.001). This neighbourhood relationship was not significantly different between the control and patient groups (i.e., there was no significant interaction; F(2,14.77) = 0.12, p = 0.89). We have included this analysis and the related figure as Figure 2-Figure supplement 2 in the revised manuscript.

      Revised text Methods:

      “We followed previously described procedures for calculating the DOC between two halves of the travelling wave data (Kikkert et al., 2016; Kolasinski et al., 2016; Sanders et al., 2019). The averaged finger-specific maps of the first forward and backward runs formed the first data half. The averaged finger-specific maps of the second forward and backward runs formed the second data half. The finger-specific clusters were minimally thresholded (Z>2) on the cortical surface and masked using an S1 ROI, created based on Brodmann area parcellation using Freesurfer (see Figure 2– figure supplement 1 for a visualisation of the minimally thresholded split-half hard-edged finger maps used to calculate the DOC). We used minimally thresholded finger-specific clusters for the DOC analysis to ensure we were sensitive to overlaps that would be missed when using high thresholds. Note that results were unchanged when thresholding the finger-specific clusters using an FDR q < 0.05 criterion (see Figure 2 – figure supplement 2).”
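
The Dice overlap coefficient (DOC) between two thresholded maps can be sketched as twice the number of shared vertices divided by the summed sizes of both clusters. The vertex ids and Z values below are hypothetical, and returning 0 when both clusters are empty is a convention chosen for this sketch, not necessarily the one used in the cited procedures.

```python
def threshold_cluster(z_map, z_min=2.0):
    """Vertices whose Z value exceeds the (minimal) threshold."""
    return {vertex for vertex, z in z_map.items() if z > z_min}


def dice_overlap(cluster_a, cluster_b):
    """Dice coefficient: 2 * |A and B| / (|A| + |B|)."""
    a, b = set(cluster_a), set(cluster_b)
    if not a and not b:
        return 0.0  # assumed convention for two empty clusters
    return 2 * len(a & b) / (len(a) + len(b))


# Hypothetical Z maps for one finger in each half of the data (vertex id -> Z).
first_half = {1: 3.1, 2: 2.5, 3: 2.2, 4: 4.0, 5: 1.0}
second_half = {3: 2.8, 4: 3.3, 5: 2.6, 6: 2.1, 7: 0.4}

doc = dice_overlap(threshold_cluster(first_half), threshold_cluster(second_half))
```

Raising `z_min` shrinks both clusters, which is why a strict threshold can miss overlap that a minimal threshold would detect.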

      2) Provide a more thorough discussion - particularly with respect to the possible role of top-down processes (e.g., attention).

      a) The authors discuss a few potential signal sources that may contribute to the maintenance of (and ability to measure) the somatotopic maps; however, the overall interpretation seems a bit "motor efferent heavy". That is, it seems the authors favor an explanation that the activity patterns measured in S1 were elicited by efference copies from the motor system and that occasional corollary discharges or attempted motor movements play a role in their maintenance over time. The authors consider other explanations, noting - for example - the potential role of attention in preserving the somatotopic representations given that attention has been shown to be able to activate S1 hand representations. The mention of this was, however, rather brief - and I believe the issue deserves a bit more of a balanced consideration.

      When the authors consider the possible role of attention in maintaining the somatotopic representations (lines 329-333), they mention that observing others' fingers being touched or attending to others' finger movements may contribute. But there is no mention of attending to one's own fingers (which has been shown to elicit activity as cited). I realize that the patients lack sensorimotor function (and hence may find it difficult to "attend" to their fingers); however, they have all had prior experience with their fingers and therefore might still be able to attend to them (or at least the idea of their digits) such that activity is elicited. For example, it is not clear to me that it would be any more difficult for the patients to be asked to attend to their digits compared to being asked to attempt to move their digits. I would even suggest that attempting to move a digit (regardless of whether you can or not) requires that one attends to the digit before attempting to initiate the movement as well as throughout the attempted motor movement. Because of this, it seems possible that attention-related processes could be playing a role in or even driving the signals measured during the attempted movement task - as well as those involved in the ongoing maintenance of the maps after injury. I don't think this possibility can be dismissed given the data in hand, but perhaps the issue could be addressed by a bit more thorough of a discussion on the process of "attempting to move" a digit (even one that does not move) - and the various top-down processes that might be involved.

      We thank the reviewer for their consideration and insights into the potential mechanisms underlying our results. We have now elaborated further on the possibility that attention-related processes might have contributed to the reported effects, also in consideration of comment 3.4.

      Revised text Discussion:

      “Spared spinal cord tissue bridges can be found in most patients with a clinically incomplete injury, their width being predictive of electrophysiological information flow, recovery of sensorimotor function, and neuropathic pain (Huber et al., 2017; Pfyffer et al., 2021, 2019; Vallotton et al., 2019). However, in this study, spared midsagittal spinal tissue bridges at the lesion level, motor function, and sensory function did not seem necessary to maintain and activate a somatotopic hand representation in S1. We found a highly typical hand representation in two patients (S01 and S03) who did not have any spared spinal tissue bridges at the lesion level, a complete (S01) or near complete (S03) hand paralysis, and a complete (S01) or near complete loss (S03) of hand sensory function. Our predictive modelling results were in line with this notion and showed that these behavioural and structural spinal cord determinants were not predictive of hand representation typicality. Note however that our sample size was limited, and it is challenging to draw definite conclusions from non-significant predictive modelling results.”

      “How may these representations be preserved over time and activated through attempted movements in the absence of peripheral information? S1 is reciprocally connected with various brain areas, e.g., M1, lateral parietal cortex, posterior parietal area 5, secondary somatosensory cortex, and supplementary motor cortex (Delhaye et al., 2019). After loss of sensory inputs and paralysis through SCI, S1 representations may be activated and preserved through its interconnections with these areas. First, it is possible that cortico-cortical efference copies may keep a representation ‘alive’ through occasional corollary discharge (London and Miller, 2013). While motor and sensory signals no longer pass through the spinal cord in the absence of spinal tissue bridges, S1 and M1 remain intact. When a motor command is initiated (e.g., in the form of an attempted hand movement) an efference copy is thought to be sent to S1 in the form of corollary discharge. This corollary discharge resembles the expected somatosensory feedback activity pattern and may drive somatotopic S1 activity even in the absence of ascending afferent signals from the hand (Adams et al., 2013; London and Miller, 2013). It is possible that our patients occasionally performed attempted movements which would result in corollary discharge in S1. Second, it is likely that attempting individual finger movements poses high attentional demands on tetraplegic patients. Accordingly, attentional processes might have contributed to eliciting somatotopic S1 activity. Evidence for this account comes from studies showing that it is possible to activate somatotopic S1 hand representations through attending to individual fingers (Puckett et al., 2017) or through touch observation (Kuehn et al., 2018). Attending to fingers during our attempted finger movement task may have been sufficient to elicit somatotopic S1 activity through top-down processes in the tetraplegic patients who lacked hand motor and sensory function. 
Furthermore, one might speculate that observing others’ or one’s own fingers being touched or directing attention to others’ hand movements or one’s own fingers may help preserve somatotopic representations. Third, it is possible that these somatotopic maps are relatively hardwired and while they deteriorate over time, they never fully disappear. Indeed, somatotopic mapping of a sensory deprived body part has been shown to be resilient after dystonia (Ejaz et al., 2016; though see Burman et al., (2009) and Taub et al., (1998)) and arm amputation (Bruurmijn et al., 2017; Kikkert et al., 2016; Wesselink et al., 2019). Fourth, it is possible that even though a patient is clinically assessed to be complete and is unable to perceive sensory stimuli on the deprived body part, there is still some ascending information flow that contributes to preserving somatotopy (Wrigley et al., 2018). A recent study found that although complete paraplegic SCI patients were unable to perceive a brushing stimulus on their toe, 48% of patients activated the location appropriate S1 area (Wrigley et al., 2018). However, the authors of this study defined the completeness of patients’ injuries via behavioural testing, while we additionally assessed the retained connections passing through the SCI directly via quantification of spared spinal tissue bridges through structural MRI. It is unlikely that spinal tissue carrying somatotopically organised information would be missed by our assessment (Huber et al., 2017; Pfyffer et al., 2019). Our experiment did not allow us to tease apart these potential processes and it is likely that various processes simultaneously influence the preservation of S1 somatotopy and elicited the observed somatotopic S1 activity.”

      Reviewer #2:

      The authors investigate SCI patients and characterize the topographic representation of the hand in sensorimotor cortex when asked to move their hand (which controls could do but patients could not). The authors compare some parameters of topographic map organization and conclude that they do not differ between patients and controls, whereas they find changes in the typicality of the maps that decrease with years since disease onset in patients. Whereas these initial analyses are interesting, they are not clearly related to a mechanistic model of the disorder and the underlying pathophysiology that is expected in the patients. Furthermore, additional analyses on more fine-grained map changes are needed to support the authors' claims. Finally, the major result of changed typicality in the patients is in my view not valid.

      • Concept 1. At present, there are no clear hypotheses about the (expected or hypothesized) mechanistic changes of the sensorimotor maps in the patients. The authors refer to "altered" maps and repeatedly say that "results are mixed" (3 times in the introduction).

      We thank the reviewer for highlighting to us that our introduction and hypotheses were unclear and/or incomplete to them. We have restructured our Introduction to better highlight competing hypotheses on how SCI may change S1 hand representations, the reasons for our analytical approach, and elaborate on our hypotheses.

      Revised text Introduction:

      “Research in non-human primate models of chronic and complete cervical SCI has shown that the S1 hand area becomes largely unresponsive to tactile hand stimulation after the injury (Jain et al., 2008; Kambi et al., 2014; Liao et al., 2021). The surviving finger-related activity became disorganised such that a few somatotopically appropriate sites but also other somatotopically nonmatched sites were activated (Liao et al., 2021). Seminal nonhuman primate research has further demonstrated that SCI leads to extensive cortical reorganisation in S1, such that tactile stimulation of cortically adjacent body parts (e.g., of the face) activated the deprived brain territory (e.g., of the hand; Halder et al., 2018; Jain et al., 2008; Kambi et al., 2014). Although the physiological hand representation appears to largely be altered following a chronic cervical SCI in non-human primates, the anatomical isomorphs of individual fingers are unchanged (Jain et al., 1998). This suggests that while a hand representation can no longer be activated through tactile stimulation after the loss of afferent spinal pathways, a latent and somatotopic hand representation could be preserved regardless of large-scale physiological reorganisation.

      A similar pattern of results has been reported for human SCI patients. Transcranial magnetic stimulation (TMS) studies induced current in localised areas of SCI patients’ M1 to induce a peripheral muscle response. They found that representations of more impaired muscles retract or are absent while representations of less impaired muscles shift and expand (Fassett et al., 2018; Freund et al., 2011a; Levy et al., 1990; Streletz et al., 1995; Topka et al., 1991; Urbin et al., 2019). Similarly, human fMRI studies have shown that cortically neighbouring body part representations can shift towards, though do not invade, the deprived M1 and S1 cortex (Freund et al., 2011b; Henderson et al., 2011; Jutzeler et al., 2015; Wrigley et al., 2018, 2009). Other human fMRI studies hint at the possibility of latent somatotopic hand representations following SCI by showing that attempted movements with the paralysed and sensory deprived body part can still evoke signals in the sensorimotor system (Cramer et al., 2005; Freund et al., 2011b; Kokotilo et al., 2009; Solstrand Dahlberg et al., 2018). This attempted ‘net’ movement activity was, however, shown to substantially differ from healthy controls: Activity levels have been shown to be increased (Freund et al., 2011b; Kokotilo et al., 2009; Solstrand Dahlberg et al., 2018) or decreased (Hotz-Boendermaker et al., 2008), volumes of activation have been shown to be reduced (Cramer et al., 2005; Hotz-Boendermaker et al., 2008), activation was found in somatotopically nonmatched cortical sites (Freund et al., 2011b), and activation was poorly modulated when patients switched from attempted to imagined movements (Cramer et al., 2005). These observations have therefore mostly been attributed to abnormal and/or disorganised processing induced by the SCI. 
It remains possible though that, despite certain aspects of sensorimotor activity being altered after SCI, somatotopically typical representations of the paralysed and sensory deprived body parts can be preserved (e.g., finger somatotopy of affected hand). Such preserved representations have the potential to be exploited in a functionally meaningful manner (e.g., via neuroprosthetics).

      Case studies using intracortical stimulation in the S1 hand area to elicit finger sensations in SCI patients hint at such preserved somatotopic representations (Fifer et al., 2020; Flesher et al., 2016), with one exception (Armenta Salas et al., 2018). Negative results were suggested to be due to a loss of hand somatotopy and/or reorganisation in S1 of the implanted SCI patient or due to potential misplacement of the implant (Armenta Salas et al., 2018). Whether fine-grained somatotopy is generally preserved in the tetraplegic patient population remains unknown. It is also unclear what clinical, behavioural, and structural spinal cord determinants may influence such representations to be maintained. Here we used functional MRI (fMRI) and a visually cued (attempted) finger movement task in tetraplegic patients to examine whether hand somatotopy is preserved following a disconnection between the brain and the periphery. We instructed patients to perform the fMRI tasks with their most impaired upper limb and matched controls’ tested hands to patients’ tested hands. If a patient was unable to make overt finger movements due to their injury, then we carefully instructed them to make attempted (i.e., not imagined) finger movements. To see whether patients’ maps exhibited characteristics of somatotopy, we visualised finger selectivity in S1 using a travelling wave approach. To investigate whether fine-grained hand somatotopy was preserved and could be activated in S1 following SCI, we assessed inter-finger representational distance patterns using representational similarity analysis (RSA). These inter-finger distance patterns are thought to be shaped by daily life experience such that fingers used more frequently together in daily life have lower representational distances (Ejaz et al., 2015). 
RSA-based inter-finger distance patterns have been shown to depict the invariant representational structure of fingers in S1 and M1 better than the size, shape, and exact location of the areas activated by finger movements (Ejaz et al., 2015). Over the past years RSA has therefore regularly been used to investigate somatotopy of finger representations both in healthy (e.g., Akselrod et al., 2017; Ariani et al., 2020; Ejaz et al., 2015; Gooijers et al., 2021; Kieliba et al., 2021; Kolasinski et al., 2016; Liu et al., 2021; Sanders et al., 2019) and patient populations (e.g., Dempsey-Jones et al., 2019; Ejaz et al., 2016; Kikkert et al., 2016; Wesselink et al., 2019). We closely followed procedures that have previously been used to map preserved and typical somatotopic finger selectivity and inter-finger representational distance patterns of amputees’ missing hands in S1 using volitional phantom finger movements (Kikkert et al., 2016; Wesselink et al., 2019). However, in amputees, these movements generally recruit the residual arm muscles that used to control the missing limb via intact connections between the brain and spinal cord. Whether similar preserved somatotopic mapping can be observed in SCI patients with diminished or no connections between the brain and the periphery is unclear. If finger somatotopy is preserved in tetraplegic patients, then we should find typical inter-finger representational distance patterns in the S1 hand area of these patients. By measuring a group of fourteen chronic tetraplegic patients with varying amounts of spared spinal cord tissue at the lesion level (quantified by means of midsagittal tissue bridges based on sagittal T2w scans), we uniquely assessed whether preserved connections between the brain and periphery are necessary to preserve fine somatotopic mapping in S1 (Huber et al., 2017; Pfyffer et al., 2019). 
If spared connections between the periphery and the brain are not necessary for preserving hand somatotopy, then we would find typical inter-finger representational distance patterns even in patients without spared spinal tissue bridges. We also investigated what clinical and behavioural determinants may contribute to preserving S1 hand somatotopy after chronic SCI. If spared sensorimotor hand function is not necessary for preserving hand somatotopy, then we would find typical inter-finger representational distance patterns even in patients who suffer from full sensory loss and paralysis of the hand(s).”

      They do not report in detail which results have actually been reported before, which is a major problem, because those prior results should have motivated the analyses the authors conducted. For instance, two of the cited studies found that in SCI patients, only ONE FINGER shifted towards the malfunctioning area (i.e., the small finger) whereas all other fingers were the same. However, the authors do NOT perform single finger analyses but always average their results ACROSS fingers. This is even true in spite of some patients indeed showing MISSING FINGERS as is clearly evident in the figure, and in spite of the clearly reduced distance of the thumb in the patients as is also visible in another figure. None of this is seen in the results, because the ANOVA and analyses never have the factor of "finger". Instead, the authors always average the analyses across fingers. The conclusion that the maps do not differ is therefore not justified at present. This severely limits the conclusions that can be drawn from the data.

      We apologise for the lack of clarity. We have now added additional detail regarding studies showing altered sensorimotor processing following SCI. We also clarified that we based our analysis steps on previous studies investigating hand somatotopy following deafferentation (i.e., following arm amputation; Kikkert et al., 2016; Wesselink et al., 2019) and somatotopic reorganisation. RSA-based inter-finger distance patterns have been shown to depict the invariant representational structure of fingers in S1 and M1 better than the size, shape, and exact location of the areas activated by finger movements (Ejaz et al., 2015). Over the past years RSA has therefore regularly been used to investigate somatotopy of finger representations both in healthy (e.g., Akselrod et al., 2017; Ariani et al., 2020; Ejaz et al., 2015; Gooijers et al., 2021; Kieliba et al., 2021; Kolasinski et al., 2016; Liu et al., 2021; Sanders et al., 2019) and patient populations (e.g., Dempsey-Jones et al., 2019; Ejaz et al., 2016; Kikkert et al., 2016; Wesselink et al., 2019). It is believed to be the most appropriate measure to reliably detect subtle changes in somatotopy. We adjusted the text in our revised Introduction section to better highlight this.

      Please note that we do not average across fingers in our RSA typicality procedure. Instead, RSA considers how the (attempted) movement with one finger changes the activity pattern across the whole hand representation. Note that somatotopic reorganisation will change the inter-finger distance measured with this method as previously shown (Kieliba et al., 2021; Kolasinski et al., 2016; Wesselink et al., 2019).

      Still, as per the reviewer’s suggestion, we conducted a robust mixed ANOVA on the RSA distance measures with a within-subjects factor for finger pair (10 levels) and a between-subjects factor for group (2 levels: controls and SCI patients). We did not find a significant group effect (F(1,21.66) = 1.50, p = 0.23). There was a significant difference in distance between finger pairs (F(9,15.38) = 27.22, p < 0.001), but this was not significantly different between groups (i.e., no significant finger pair by group interaction; F(9,15.38) = 1.05, p = 0.45). When testing for group differences per finger pair, the BF only revealed inconclusive evidence (BF > 0.37 and < 1.11; note that we could not run a Bayesian ANOVA due to normality violations). We have added this analysis to the revised manuscript.

      Lastly, we would like to highlight that our argument is that the finger maps can be preserved in the absence of sensory and motor function, but over time they deteriorate and become less somatotopic. As such, we do not aim to state that they are unchanged overall – but rather that they can be unchanged even despite loss of sensory and motor function. We have clarified this in our abstract and manuscript to avoid confusion.

      Revised abstract:

      “Previous studies showed reorganised and/or altered activity in the primary sensorimotor cortex after a spinal cord injury (SCI), suggested to reflect abnormal processing. However, little is known about whether somatotopically-specific representations can be preserved despite alterations in net activity. In this observational study we used functional MRI and an (attempted) finger movement task in tetraplegic patients to characterise the somatotopic hand layout in primary somatosensory cortex. We further used structural MRI to assess spared spinal tissue bridges. We found that somatotopic hand representations can be preserved in absence of sensory and motor hand functioning, and no spared spinal tissue bridges. Such preserved hand somatotopy could be exploited by rehabilitation approaches that aim to establish new hand-brain functional connections after SCI (e.g., neuroprosthetics). However, over years since SCI the hand representation somatotopy deteriorated, suggesting that somatotopic hand representations are more easily targeted within the first years after SCI.”

      Revised text Methods:

      “Second, we tested whether the inter-finger distances were different between controls and patients using a robust mixed ANOVA with a within-participants factor for finger pair (10 levels) and a between-participants factor for group (2 levels: controls and patients).”
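
      The "10 levels" of the finger pair factor follow from the number of unordered pairs among five fingers. The sketch below enumerates these pairs and computes one distance per pair from toy activity patterns. The distance metric is not specified in this passage, so plain Euclidean distance is used purely as a stand-in, and the patterns and function name are hypothetical.

```python
from itertools import combinations
import math

FINGERS = ["thumb", "index", "middle", "ring", "little"]


def inter_finger_distances(patterns):
    """Return one distance per unordered finger pair.

    patterns maps each finger name to a list of activity values
    across the same ROI voxels (toy data in this sketch).
    """
    return {
        (f1, f2): math.dist(patterns[f1], patterns[f2])
        for f1, f2 in combinations(FINGERS, 2)
    }


# Toy patterns in which neighbouring fingers activate overlapping voxels.
toy_patterns = {
    "thumb":  [3, 2, 1, 0, 0],
    "index":  [2, 3, 2, 1, 0],
    "middle": [1, 2, 3, 2, 1],
    "ring":   [0, 1, 2, 3, 2],
    "little": [0, 0, 1, 2, 3],
}

distances = inter_finger_distances(toy_patterns)
print(len(distances))  # number of finger pairs entering the ANOVA
```

      With somatotopically ordered toy patterns like these, neighbouring fingers (e.g., thumb and index) end up closer together than distant fingers (e.g., thumb and little finger), which is the kind of pattern the finger pair factor captures.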

      Revised text Results:

      “We then tested whether the inter-finger distances were different across finger pairs between controls and SCI patients using a robust mixed ANOVA with a within-participants factor for finger pair (10 levels) and a between-participants factor for group (2 levels: controls and patients). We did not find a significant difference in inter-finger distances between patients and controls (F(1,21.66) = 1.50, p = 0.23). The inter-finger distances were significantly different across finger pairs, as would be expected based on somatotopic mapping (F(9,15.38) = 27.22, p < 0.001). This pattern of inter-finger distances was not significantly different between groups (i.e., no significant finger pair by group interaction; F(9,15.38) = 1.05, p = 0.45). When testing for group differences per finger pair, the BF only revealed inconclusive evidence (BF > 0.37 and < 1.11; note that we could not run a Bayesian ANOVA due to normality violations).”

      Revised text Discussion:

      “In this study we investigated whether hand somatotopy is preserved and can be activated through attempted movements following tetraplegia. We tested a heterogenous group of SCI patients to examine what clinical, behavioural, and structural spinal cord determinants contribute to preserving S1 somatotopy. Our results revealed that detailed hand somatotopy can be preserved following tetraplegia, even in the absence of sensory and motor function and a lack of spared spinal tissue bridges. However, over time since SCI these finger maps deteriorated such that the hand somatotopy became less typical.”

      • Concept 2: This also relates to the fact that the most prominent and consistent finding of prior studies was to show changes in map AMPLITUDE in the maps of patients. It is not clear to me how amplitude was measured here, because the text says "average BOLD activity". What should be reported are standard measures of signal amplitude both across the map area and for individual fingers.

      We apologise for the lack of clarity. “Average BOLD activity” represented the average z-standardised activity within the S1 hand ROI. To comply with the reviewer’s comment, we adjusted this to the percent signal change underneath the S1 hand ROI and report this instead in our revised manuscript and in revised Figure 3A and revised Figure 3- Figure supplement 1. Note that results were unchanged.

      As per the reviewer’s suggestion, we further extracted the activity levels for individual fingers under finger-specific ROIs. To create finger-specific ROIs, probability finger maps were created based on the travelling wave data of the control group, thresholded at 25% (i.e., meaning that at least 5 out of 18 control participants needed to significantly activate a vertex for this vertex to be included in the ROI), and binarised. We then used the separately acquired blocked design data to extract the corresponding finger movement activity levels underlying these finger-specific ROIs per participant. Per ROI, we then compared the activity level between groups. After correction for multiple comparisons, there was no significant difference between groups for the thumb (U = 93, p = 0.37), index (t(30) = -0.003, p = 0.99), middle (t(30) = 1.11, p = 0.35), ring (t(30) = 2.02, p = 0.13), or little finger (t(30) = 2.14, p = 0.20). We have added this analysis to Appendix 1.
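
      The 25% probability threshold described above can be sketched as follows: with 18 control participants, 25% corresponds to 4.5 participants, which rounds up to the "at least 5 out of 18" criterion. The function name and the toy binary maps are hypothetical, and the real analysis operates on surface maps rather than short lists.

```python
import math


def probability_roi(binary_maps, proportion=0.25):
    """Build a binarised group ROI from per-participant binary maps.

    binary_maps: one list of 0/1 values per participant, where 1 means
    the participant significantly activated that vertex.
    Returns the minimum participant count and the ROI mask.
    """
    n_participants = len(binary_maps)
    min_count = math.ceil(proportion * n_participants)
    n_vertices = len(binary_maps[0])
    counts = [sum(m[v] for m in binary_maps) for v in range(n_vertices)]
    return min_count, [c >= min_count for c in counts]


# Toy data: 18 participants, 4 vertices activated by 10, 5, 4 and 0 participants.
toy_maps = [[int(p < 10), int(p < 5), int(p < 4), 0] for p in range(18)]
min_count, roi = probability_roi(toy_maps)
print(min_count, roi)
```

      In this toy example the vertex activated by exactly 5 participants is included in the ROI, while the vertex activated by 4 participants is not.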

      Note that lower or higher BOLD amplitude levels do not influence our typicality scores per se. Indeed, typical inter-finger representational patterns have been shown to persist even in ipsilateral M1 that exhibited a negative BOLD response during finger movements (Berlot et al., 2019). As long as the typical inter-finger relationships are preserved, brain areas that have low amplitudes of activity can have a typical somatotopic representation.

      Revised text in Methods:

      "The percent signal change for overall task-related activity was then extracted for voxels underlying this S1 hand ROI per participant. A similar analysis was used to investigate overall task-related activity in an M1 hand ROI (see Figure 3- Figure supplement 1). We further compared activity levels in finger-specific ROIs in S1 between groups and conducted a geodesic distance analysis to assess whether the finger representations of the SCI patients were aligned differently and/or shifted compared to the control participants (see Appendix 1)."

      Revised text in Results:

      “Task-related activity was quantified by extracting the percent signal change for finger movement (across all fingers) versus baseline within the contralateral S1 hand ROI (see Figure 3A). Overall, all patients were able to engage their S1 hand area by moving individual fingers (t(13)=7.46, p < 0.001; BF10=4.28e+3), as did controls (t(17)=9.92, p < 0.001; BF10=7.40e+5). Furthermore, patients’ task-related activity was not significantly different from controls (t(30)=-0.82, p=0.42; BF10=0.44), with the BF showing anecdotal evidence in favour of the null hypothesis.”

      Revised Appendix 1:

      “Percent signal change in finger-specific clusters

      To assess whether finger movement activity levels were different between patients and controls, we created finger-specific ROIs and extracted the activity level of the corresponding finger movement for each participant. To create the finger-specific ROIs, the probability finger surface maps that were created from the travelling wave data of the control group (see main manuscript) were thresholded at 25% (i.e., meaning that at least 5 out of 18 control participants needed to significantly activate a vertex for this vertex to be included in the ROI), and binarised. We then used the separately acquired blocked design data to extract the finger movement activity levels underlying these finger-specific ROIs. We first flipped the contrast images resulting from each participant’s fixed effects analysis (i.e., that was run to average across the 4 blocked design runs) along the x-axis for the left-hand tested participants. Each participant’s contrast maps were then resampled to the Freesurfer 2D average atlas and the averaged z-standardised activity level was extracted for each finger movement vs rest contrast underlying the finger-specific ROIs. We compared the activity levels for each finger movement in the corresponding finger ROI (i.e., thumb movement activity in the thumb ROI, index finger movement activity in the index finger ROI, etc.) between groups. After correction for multiple comparisons, there was no significant difference between groups for the thumb (U = 93, p = 0.37), index (t(30) = -0.003, p = 0.99), middle (t(30) = 1.11, p = 0.35), ring (t(30) = 2.02, p = 0.13), or little finger (t(30) = 2.14, p = 0.20).”

      Appendix 1- Figure 1: Finger-specific activity levels in finger-specific regions of interest. A) Finger-specific ROIs were based on the control group’s binarised 25% probability travelling wave finger selectivity maps. B) Finger movement activity levels in the corresponding finger-specific ROIs. There were no significant differences in activity levels between the SCI patient and control groups. Controls are projected in grey; SCI patients are projected in orange. Error bars show the standard error of the mean. White arrows indicate the central sulcus. A = anterior; P = posterior.

      • Concept 3: The authors present a hypothesis on the underlying mechanisms of SCI that does not seem to reflect prior data. The argument is that changes in map alignment relate to maladaptive changes and pain. However, the literature that the authors cite does not support this claim. In fact, Freund 2011 promotes the importance of map amplitude but not alignment, whereas other studies either show no relation of activation to pain, or they even show that map shift relates to LESS pain, i.e., the reverse argument than what the authors say. My impression is that the model that the authors present is mainly a model that is used for phantom pain but not for SCI. Taking this into consideration, the findings the authors present are not surprising anymore, because in fact none of these studies claimed that the affected area should be absent in SCI patients; these papers only say that the other body parts change in location or amplitude, which is something the authors did not measure. It is important to make this clear in the text.

      As the reviewer states, the literature is debated regarding the relationship between reorganisation and pain in SCI patients. We did not highlight this clearly enough. To improve clarity and focus our message we have therefore removed the sentence regarding reorganisation and pain from the Introduction of our revised manuscript. Also taking comment 2.1 and 2.2 into consideration, we have restructured our Introduction.

      We respectfully disagree with the reviewer that our results are not novel or surprising. Whether the full fine-grained hand somatotopy is preserved following a complete motor and sensory loss through tetraplegia has not been considered before. Furthermore, to our knowledge, there is no paper that has inspected the full somatotopic layout in a heterogenous sample of SCI patients and shown that over time since injury, hand somatotopy deteriorates. We indeed cannot make claims regarding the reorganisation in S1 with regards to neighbouring cortical areas activating the hand area, as we have now clarified further in the revised Discussion. We now also clarify in our discussion that our result does not exclude the possibility of reorganisation occurring simultaneously and that this is a topic for further investigation. As described in the Discussion, it is very possible that reorganisation and preserved somatotopy could co-occur.

      Revised text Discussion:

      “We did not probe body parts other than the hand and could therefore not investigate whether any remapping of other (neighbouring and/or intact) body part representations towards or into the deprived S1 hand cortex may have taken place. Whether reorganisation and preservation of the original function can simultaneously take place within the same cortical area therefore remains a topic for further investigation. It is possible that reorganisation and preservation of the original function could co-occur within cortical areas. Indeed, non-human primate studies demonstrated that remapping observed in S1 actually reflects reorganisation in subcortical areas of the somatosensory pathway, principally the brainstem (Chand and Jain, 2015; Kambi et al., 2014). As such, the deprived S1 area receives reorganised somatosensory inputs upon tactile stimulation of neighbouring intact body parts. This would simultaneously allow the original S1 representation of the deprived body part to be preserved, as observed in our results when we directly probed the deprived S1 hand area through attempted finger movements.”

      • Concept 4: There is yet another more general point on the concept and related hypotheses: Why do the authors assume that immediately after SCI the finger map should disappear? This seems to me the less likely hypothesis compared to what the data seem to suggest: preservation and deterioration over time. In my view, there is no biological model that would suggest that a finger map suddenly disappears after input loss. How should this deterioration be mediated? By cellular loss? As already stated above, the finding is therefore much less surprising than the authors argue.

      We did not expect that finger maps would disappear, especially given the case studies using S1 intracortical stimulation studies in SCI patients and the result of preserved somatotopy of the missing hand in amputees. We are not sure which part of the manuscript might have caused this misunderstanding.

      With regards to the reviewer’s comment that there are no models to suggest that finger maps would disappear: there is competing research on this as we now explain in our revised Introduction. Non-human primate research has shown that the S1 hand area becomes largely unresponsive to tactile hand stimulation after an SCI (Jain et al., 2008; Kambi et al., 2014; Liao et al., 2021). The surviving finger-related activity was shown to be disorganised such that a few somatotopically appropriate sites but also other somatotopically nonmatched sites were activated (Liao et al., 2021). These finger areas in S1 became responsive to touch on the face. Furthermore, TMS studies that induce current in localised areas of M1 to induce a peripheral muscle response in SCI patients have shown that representations of more impaired muscles retract or are absent (Fassett et al., 2018; Freund et al., 2011a; Levy et al., 1990; Streletz et al., 1995; Topka et al., 1991; Urbin et al., 2019). We do not believe that this indicates that the S1 hand somatotopy is lost, but rather that tactile inputs and motor outputs no longer pass the level of injury. Indeed, non-human primate work showing immutable myelin borders between finger representations in S1 post SCI suggests that a latent hand representation may be preserved. Further hints for such preserved somatotopy come from fMRI studies showing net sensorimotor activity during attempted movements with the paralysed body part, intracortical stimulation studies in SCI patients, and preserved somatotopic maps of the missing hand in amputees. We have restructured our Introduction accordingly, also taking into consideration comments 2.1, 2.2, and 2.4.

      • Methods & Results. The authors refer to an analysis that they call "typicality" where they say that they assess how "typical" a finger map is. Given this is not a standard measure, I was wondering how the authors decided what a "typical" finger map is. In fact, there are a few papers published on this issue where the average location of each finger in a large number of subjects is detailed. Rather than referring to this literature, the authors use another dataset from another study of their own that was conducted on n=8 individuals and using 7T MRI (note that in the present study, 3T MRI was used) to define what "typical" is. This approach is not valid. First, this "typical" dataset is not validated for being typical (i.e., it is not compared with standard atlases on hand and finger location), second, it was assessed using a different MRI field strength, third, it included too few subjects to say that this should be a typical dataset, fourth, the group differed from the patients in terms of age and gender (i.e., non-matched group), and fifth, the authors even say that the design was different ("was defined similarly", i.e., not the same). This approach is therefore in my view not valid, particularly given the authors measured age- and gender-matched controls that should be used to compare the maps with the patients. This is a critical point because changes in typicality are the main result of the paper.

      We respectfully disagree with the reviewer that the typicality measure is not standard, invalid, and inaccurate. RSA-based inter-finger overlap patterns have been shown to depict the invariant representational structure of fingers better than the size, shape, and exact location of the areas activated by finger movements (Ejaz et al., 2015). RSA-based inter-finger representation measures have been shown to have more within-subject stability (both within the same session and between sessions that were 6 months apart) and less inter-subject variability (Ejaz et al., 2015) than these other measures of somatotopy. RSA-based measures are furthermore not prone to some of the problems of measurements of finger selectivity (e.g., dependence on map thresholds). Indeed, over the past years RSA has become the gold standard to investigate somatotopy of finger representations both in healthy (e.g., Akselrod et al., 2017; Ariani et al., 2020; Ejaz et al., 2015; Gooijers et al., 2021; Kieliba et al., 2021; Kolasinski et al., 2016; Liu et al., 2021; Sanders et al., 2019) and patient populations (e.g., Dempsey-Jones et al., 2019; Ejaz et al., 2016; Kikkert et al., 2016; Wesselink et al., 2019). Moreover, various papers have been published in eLife and elsewhere that used the same RSA-based typicality criteria to assess plasticity in finger representations (Dempsey-Jones et al., 2019; Ejaz et al., 2015; Kieliba et al., 2021; Wesselink et al., 2019). We now highlight this in the revised Introduction.

      The canonical RDM used in our study has previously been used as a canonical RDM in a 3T study exploring finger somatotopy in amputees (Wesselink et al., 2019) and was made available to us (note that we did not collect this data ourselves). We aimed to use similar measures as in Wesselink et al (2019) and therefore felt it was most appropriate to use the same canonical RDM. One of the strengths of RSA is it can be used to quantitatively relate brain activity measures obtained using different modalities, across different species, brain areas, brain and behavioural measures etc. (Kriegeskorte et al., 2008). As such, the fact that this canonical RDM was constructed based on data collected using 7T fMRI using a digit tapping task should not influence our results. We however agree with the reviewer it is good to demonstrate that our results would not change when using a canonical RDM based on the average RDM of our age-, sex- and handedness matched control group. We therefore recalculated the typicality of all participants using the controls’ average RDM as the canonical RDM. We found a strong and highly significant correlation in typicality scores calculated using the canonical RDM from the independent dataset and the controls’ average RDM (see figure below). This was true for both the patient (rs = 0.92, p < 0.001; red dots) and control groups (rs = 0.78, p < 0.001; grey dots).

      We then repeated all analyses using these newly calculated typicality scores. As expected, we found the same results as when using a canonical RDM based on the independent dataset (see below for details). This analysis has been added to the revised Appendix 1 and is referred to in the main manuscript.

      Revised text Introduction:

      “To investigate whether fine-grained hand somatotopy was preserved and could be activated in S1 following SCI, we assessed inter-finger representational distance patterns using representational similarity analysis (RSA). These inter-finger distance patterns are thought to be shaped by daily life experience such that fingers used more frequently together in daily life have lower representational distances (Ejaz et al., 2015). RSA-based inter-finger distance patterns have been shown to depict the invariant representational structure of fingers in S1 and M1 better than the size, shape, and exact location of the areas activated by finger movements (Ejaz et al., 2015). Over the past years RSA has therefore regularly been used to investigate somatotopy of finger representations both in healthy (e.g., Akselrod et al., 2017; Ariani et al., 2020; Ejaz et al., 2015; Gooijers et al., 2021; Kieliba et al., 2021; Kolasinski et al., 2016; Liu et al., 2021; Sanders et al., 2019) and patient populations (e.g., Dempsey-Jones et al., 2019; Ejaz et al., 2016; Kikkert et al., 2016; Wesselink et al., 2019). We closely followed procedures that have previously been used to map preserved and typical somatotopic finger selectivity and inter-finger representational distance patterns of amputees’ missing hands in S1 using volitional phantom finger movements (Kikkert et al., 2016; Wesselink et al., 2019).”

      Revised text Results:

      “This canonical RDM was based on 7T finger movement fMRI data in an independently acquired cohort of healthy controls (n = 8). The S1 hand ROI used to calculate this canonical RDM was defined similarly as in the current study (see Wesselink and Maimon-Mor, (2017b) for details). Note that results were unchanged when calculating typicality scores using a canonical RDM based on the averaged RDM of the age-, sex-, and handedness-matched control group tested in this study (see Appendix 1).”

      Revised text Methods:

      “While the traditional traveling wave approach is powerful to uncover the somatotopic finger arrangement, a fuller description of hand representation can be obtained by taking into account the entire fine-grained activity pattern of all fingers. RSA-based inter-finger overlap patterns have been shown to depict the invariant representational structure of fingers better than the size, shape, and exact location of the areas activated by finger movements (Ejaz et al., 2015). RSA-based measures are furthermore not prone to some of the problems of measurements of finger selectivity (e.g., dependence on map thresholds).”

      “Third, we estimated the somatotopic typicality (or normality) of each participant’s RDM by calculating a Spearman correlation with a canonical RDM. We followed previously described procedures for calculating the typicality score (Dempsey-Jones et al., 2019; Ejaz et al., 2015; Kieliba et al., 2021; Wesselink et al., 2019). The canonical RDM was based on 7T finger movement fMRI data in an independently acquired cohort of healthy controls (n = 8). The S1 hand ROI used to calculate this canonical RDM was defined similarly as in the current study (see Wesselink and Maimon-Mor, (2017b) for details). Note that results were unchanged when calculating typicality scores using a canonical RDM based on the averaged RDM of the sex-, handedness-, and age-matched control group tested in this study (see Appendix 1).”
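
      As a rough illustration of the typicality score described above, the sketch below correlates the off-diagonal entries of a participant's inter-finger RDM with those of a canonical RDM using a Spearman (rank) correlation. The function names and the toy 5 x 5 matrices are hypothetical and chosen for illustration only; they are not the study's data or code, and the exact preprocessing used in the cited procedures may differ.

```python
from itertools import combinations


def average_ranks(values):
    """Return 1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def upper_triangle(rdm):
    """Unique finger-pair distances (10 values for 5 fingers)."""
    n = len(rdm)
    return [rdm[i][j] for i, j in combinations(range(n), 2)]


def typicality(rdm, canonical):
    """Rank correlation between a participant RDM and a canonical RDM."""
    return spearman(upper_triangle(rdm), upper_triangle(canonical))


# Toy canonical RDM (assumption): distance grows with finger separation.
canonical = [[abs(i - j) for j in range(5)] for i in range(5)]
# Same pattern at half the amplitude: the rank order is unchanged.
scaled = [[0.5 * abs(i - j) for j in range(5)] for i in range(5)]
# Inverted pattern: neighbouring fingers are the most distant.
inverted = [[0 if i == j else 5 - abs(i - j) for j in range(5)] for i in range(5)]

typ_scaled = typicality(scaled, canonical)
typ_inverted = typicality(inverted, canonical)
```

      Because the score is rank based, it reflects the pattern of inter-finger distances rather than their overall amplitude, which is why the scaled toy RDM remains fully typical in this sketch.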

      Revised text Appendix 1:

      “Typicality analysis using a canonical RDM based on the controls’ average RDM

      To ensure that our typicality results did not change when using a canonical inter-finger RDM based on the age-, sex-, and handedness matched subjects tested in this study, we recalculated the typicality scores of all participants using the averaged inter-finger RDM of our control sample as the canonical RDM. We found a strong and highly significant correlation between the typicality scores calculated using the canonical inter-finger RDM from the independent dataset (reported in the main manuscript) and the typicality scores calculated using our controls’ average RDM. This was true for both the SCI patient (rs = 0.92, p < 0.001) and control groups (rs = 0.78, p < 0.001).

      We then repeated all typicality analyses reported in the main manuscript. As expected, using the typicality scores calculated using our controls’ average RDM we found the same results as when using the canonical inter-finger RDM from the independent dataset: There was a significant difference in typicality between SCI patients, healthy controls, and congenital one-handers (H(2)=27.61, p < 0.001). We further found significantly higher typicality in controls compared to congenital one-handers (U=0, p < 0.001; BF10=76.11). Importantly, the typicality scores of the SCI patients were significantly higher than the congenital one-handers (U=2, p < 0.001; BF10=50.98), but not significantly different from the controls (U=94, p=0.24; BF10=0.55). Number of years since SCI significantly correlated with hand representation typicality (rs=-0.54, p=0.05) and patients with more retained GRASSP motor function of the tested upper limb had more typical hand representations in S1 (rs=0.58, p=0.03). There was no significant correlation between S1 hand representation typicality and GRASSP sensory function of the tested upper limb, spared midsagittal spinal tissue bridges at the lesion level, or cross-sectional spinal cord area (rs=0.40, p=0.15, rs=0.50, p=0.10, and rs=0.48, p=0.08, respectively). An exploratory stepwise linear regression analysis revealed that years since SCI significantly predicted hand representation typicality in S1 with R²=0.33 (F(1,10)=4.98, p=0.05). Motor function, sensory function, spared midsagittal spinal tissue bridges at the lesion level, and spinal cord area did not significantly add to the prediction (t=1.31, p=0.22, t=1.62, p=0.14, t=1.70, p=0.12, and t=1.09, p=0.30, respectively).”

      • Methods & Results: The authors make a few unproven claims, such as saying "generally, the position, order of finger preference, and extent of the hand maps were qualitatively similar between patients and control". There are no data to support these claims.

      As indicated in this sentence, this claim was based on a qualitative inspection of the finger maps in Figure 2 and we indeed do not support this claim with quantitative analysis. We have therefore removed this sentence from the revised manuscript and instead say, as per the suggestion of reviewer 1, that overall, there were aspects of somatotopic finger selectivity in the SCI patients’ hand maps.

      Revised text Results:

      “Overall, we found aspects of somatotopic finger selectivity in the maps of SCI patients’ hands, in which neighbouring clusters showed selectivity for neighbouring fingers in contralateral S1, similar to those observed in eighteen age-, sex-, and handedness-matched healthy controls (see Figure 2A&B). A characteristic hand map shows a gradient of finger preference, progressing from the thumb (red, laterally) to the little finger (pink, medially). Notably, a characteristic hand map was even found in a patient who suffered complete paralysis and sensory deprivation of the hands (Figure 2, patient map 1; patient S01). Despite most maps (Figure 2, except patient map 3) displaying aspects of characteristic finger selectivity, some finger representations were not visible in the thresholded patient and control maps.”

      • Methods & Results: The authors argue that the map architecture is topographic as soon as the dissimilarity between two different fingers is above 0. First, what I am really wondering about is why the authors do not provide the exact dissimilarity values in the text but only give the stats for the difference to 0 (t-value, p-value, Bayes factor). Were the dissimilarity values perhaps very low? The values should be reported. Also, when this argument that maps are topographic as long as the value of two different fingers is above 0 should hold, then the authors have to show that the value for mapping the SAME finger is indeed 0. Otherwise, this argument is not convincing.

      We would like to clarify that a representation is not per se topographic when the RSA dissimilarity is > 0. The dissimilarity value provided by RSA indicates the extent to which a pair of conditions is distinguished – it can be viewed as encapsulating the information content carried by the region (Kriegeskorte et al., 2008). Due to cross-validation across runs, the expected distance value would be zero (but can go below 0) if two conditions’ activity patterns are not statistically different from each other, and larger than zero if there is differentiation between the conditions (fingers’ activity patterns in the S1 hand area in our case; Kriegeskorte et al., 2008; Nili et al., 2014). The diagonal of the RDM reflects comparisons between the same fingers and therefore reflects distances between the exact same activity pattern in the same run, which are thus 0 by definition (Kriegeskorte et al., 2008; Nili et al., 2014). This was also the case in our individual participant RDMs. Since this is not a meaningful value (a distance between 2 identical activity patterns will always be 0) we chose not to report this. We have clarified the meaning of the separability measure in the revised Methods section.

      To investigate whether a representation is somatotopic, we have to take into account the full fine-grained inter-finger distance pattern. The full fine-grained inter-finger distance pattern is related to everyday use of our hand and has been shown to depict the invariant representational structure of fingers better than the size, shape, and exact location of the areas activated by finger movements (Ejaz et al., 2015). To determine whether a participant’s inter-finger distance pattern is somatotopic one should associate it to a canonical RDM – which is done in the typicality analysis (see also our response to comment 2.6).

      What can be done to demonstrate the validity of an ROI is to run RSA on a control ROI where one would not expect to find activity that is distinguishable between finger conditions. Rather than comparing your separability measure against 0, one can then compare the separability of your ROI that is expected to contain this information to that of your control ROI. We created a cerebral spinal fluid (CSF) ROI, repeated our RSA analysis in this ROI, and then compared the separability of the CSF and S1 hand area ROIs. As expected, there was a significant difference between separability (or representation strength) in the S1 hand area and CSF ROIs for both controls (W=171, p < 0.001; BF10=4059) and patients (W=105, p < 0.001; BF10=279). This analysis has been added to the revised manuscript.

      Individual participant separability values (i.e., distances averaged across fingers) are visualised in Figure 3D. Following the reviewer’s suggestion, we have included individual participant inter-finger distance plots for both the controls and SCI patients as Figure 3-figure supplement 2 and Figure 3-figure supplement 3, respectively. The inter-finger distances for each finger pair and subject can be extracted from this. We feel this is more readily readable and interpretable than a table containing the 10 inter-finger distance scores for all 32 participants. These values have instead been made available online, together with our other data, on https://osf.io/e8u95/.

      Revised text Methods:

      “If there is no information in the ROI that can statistically distinguish between the finger conditions, then due to cross-validation the expected distance measure would be 0. If there is differentiation between the finger conditions, the separability would be larger than 0 (Nili et al., 2014). Note that this does not directly indicate that this region contains topographic information, but rather that this ROI contains information that can distinguish between the finger conditions. To further ensure that our S1 hand ROI was activated distinctly for different fingers, we created a cerebral spinal fluid (CSF) ROI that would not contain finger specific information. We then repeated our RSA analysis in this ROI and statistically compared the separability of the CSF and S1 hand area ROIs.”
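
      The cross-validation logic described above can be illustrated with a minimal sketch. It assumes a simplified cross-validated squared Euclidean distance, omitting the noise normalisation that RSA pipelines typically apply; the function name `crossval_distance` and the toy activity patterns are hypothetical and are not taken from the study's analysis code.

```python
from itertools import combinations


def crossval_distance(cond_a, cond_b):
    """Cross-validated squared Euclidean distance between two conditions.

    cond_a, cond_b: one activity pattern (list of voxel values) per run.
    The pattern difference from one run is multiplied with the difference
    from another run, so noise that is independent across runs averages
    out to zero instead of inflating the distance.
    """
    diffs = [
        [a - b for a, b in zip(run_a, run_b)]
        for run_a, run_b in zip(cond_a, cond_b)
    ]
    n_voxels = len(diffs[0])
    products = [
        sum(x * y for x, y in zip(diffs[m], diffs[n])) / n_voxels
        for m, n in combinations(range(len(diffs)), 2)
    ]
    return sum(products) / len(products)


# Toy data (assumption): two runs, a few voxels, second condition all zeros.
baseline = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]

# A difference that replicates across runs gives a positive distance.
consistent = [[1.0, 0.0, 2.0], [1.0, 0.0, 2.0]]
d_consistent = crossval_distance(consistent, baseline)

# Identical patterns within each run give a distance of exactly zero.
d_identical = crossval_distance(consistent, consistent)

# A difference that flips sign between runs (pure noise) can go below zero.
flipping = [[1.0, -1.0], [-1.0, 1.0]]
d_noise = crossval_distance(flipping, [[0.0, 0.0], [0.0, 0.0]])
```

      In this sketch only a pattern difference that replicates across independent runs yields a positive value, which is why a cross-validated distance can be compared against 0 and why individual estimates may be negative.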

      Revised text Results:

      “We found that inter-finger separability in the S1 hand area was greater than 0 for patients (t(13) = 9.83, p < 0.001; BF10 = 6.77e+4) and controls (t(17) = 11.70, p < 0.001; BF10 = 6.92e+6), indicating that the S1 hand area in both groups contained information about individuated finger representations. Furthermore, for both controls (W = 171, p < 0.001; BF10 = 4059) and patients (W = 105, p < 0.001; BF10 = 279) there was significantly greater separability (or representation strength) in the S1 hand area than in a control cerebral spinal fluid ROI that would not be expected to contain finger specific information. We did not find a significant group difference in inter-finger separability of the S1 hand area (t(30) = 1.52, p = 0.14; BF10 = 0.81), with the BF showing anecdotal evidence in favour of the null hypothesis.”

      • Discussion. The authors argue that spared midsagittal spinal tissue bridges are not necessary because they were not predictive of hand representation typicality. First, the measure of typicality is questionable and should not be used to make general claims about the importance of structural differences. Second, given there were only n=14 patients included, one may question generally whether predictive modelling can be done with these data. This statement should therefore be removed.

      We would like to clarify that, like the reviewer, we do not believe that spared midsagittal spinal tissue bridges are unimportant. Indeed, a large body of our own research focuses on the importance of spared spinal tissue bridges in recovery of sensorimotor function and pain (Huber et al., 2017; Pfyffer et al., 2021, 2019; Vallotton et al., 2019). We have added a clarification sentence regarding the importance of tissue bridges with regards to recovery of function. We agree with the reviewer that given our limited sample size, it is difficult to make conclusive claims based on non-significant predictive modelling and correlational results. In the revised manuscript we therefore focus this statement (i.e., that sensory and motor hand function and tissue bridges are not necessary to preserve hand somatotopy) on our finding that two patients without spared tissue bridges at the lesion level and with complete or near complete loss of sensory and motor hand function had a highly typical hand representation. We present our predictive modelling results as being in line with this notion and added a word of caution that it is challenging to draw definite conclusions from non-significant predictive modelling and correlation results in such a limited sample size.

      With regards to the reviewer’s concern about the validity of the typicality measure – please see our detailed response to comment 2.6.

      Revised text Discussion:

      “Spared spinal cord tissue bridges can be found in most patients with a clinically incomplete injury, their width being predictive of electrophysiological information flow, recovery of sensorimotor function, and neuropathic pain (Huber et al., 2017; Pfyffer et al., 2021, 2019; Vallotton et al., 2019). However, in this study, spared midsagittal spinal tissue bridges at the lesion level and sensorimotor hand function did not seem necessary to maintain and activate a somatotopic hand representation in S1. We found a highly typical hand representation in two patients (S01 and S03) who did not have any spared spinal tissue bridges at the lesion level, a complete (S01) or near complete (S03) hand paralysis, and a complete (S01) or near complete loss (S03) of hand sensory function. Our predictive modelling results were in line with this notion and showed that these behavioural and structural spinal cord determinants were not predictive of hand representation typicality. Note however that our sample size was limited, and it is challenging to draw definite conclusions from non-significant predictive modelling results.”

      • Discussion. The authors say that hand representation is "preserved" in SCI patients. Perhaps it is better to be precise and to say that these representations are active during movement planning.

      We thank the reviewer for their suggestion and revised the Discussion accordingly.

      Revised text Discussion:

      "In this study we investigated whether hand somatotopy is preserved and can be activated through attempted movements following tetraplegia."

      "How may these representations be preserved over time and activated through attempted movements in the absence of peripheral information?"

      "Together, our findings indicate that in the first years after a tetraplegia, the somatotopic S1 hand representation is preserved and can be activated through attempted movements even in the absence of retained sensory function, motor function, and spared spinal tissue bridges."

      Reviewer #3:

      The demonstration that cortex associated with an amputated limb can be activated by other body parts after amputation has been interpreted as evidence that the deafferented cortex "reorganizes" and assumes a new function. However, other studies suggest that the somatotopic organization of somatosensory cortex in amputees is relatively spared, even when probed long after amputation. One possibility is that the stability is due to residual peripheral input. In this study, Kikkert et al. examine the somatotopic organization of somatosensory cortex in patients whose spinal cord injury has led to tetraplegia. They find that the somatotopic organization of the hand representation of somatosensory cortex is relatively spared in these patients. Surprisingly, the amount of spared sensorimotor function is a poor predictor of the stability of the patients' hand somatotopy. Nonetheless, the hand representation deteriorates over decades after the injury. These findings contribute to a developing story on how sensory representations are formed and maintained and provide a counterpoint to extreme interpretations of the "reorganization" hypothesis mentioned above. Furthermore, the stability of body maps in somatosensory cortex after spinal cord injury has implications for the development of brain-machine interfaces.

      I have only minor comments:

      1) Given the controversy in the field, the use of the phrase "take over the deprived territory" (line 45) is muddled. Perhaps a more nuanced exposition of this phenomenon is in order?

      We agree a more nuanced expression would be more appropriate. We have changed this sentence accordingly in the revised manuscript.

      Revised text Introduction:

      “Seminal research in nonhuman primate models of SCI has shown that this leads to extensive cortical reorganisation, such that tactile stimulation of cortically adjacent body parts (e.g. of the face) activated the deprived brain territory (e.g. of the hand; Halder et al., 2018; Jain et al., 2008; Kambi et al., 2014).”

      2) The statement that "results are mixed" regarding intracortical microstimulation of S1 is dubious. In only one case has the hand representation been mislocalized, out of many cases (several at CalTech, 3 at the University of Pittsburgh, one at Case Western, one at Hopkins/APL, and one at UChicago). Perhaps rephrase to "with one exception?"

      We agree that this sentence may give a wrong outlook on the literature and have changed the text per the reviewer’s suggestion.

      Revised text Introduction:

      “Case studies using intracortical stimulation in the S1 hand area to elicit finger sensations in SCI patients hint at such preserved somatotopic representations (Fifer et al., 2020; Flesher et al., 2016), with one exception (Armenta Salas et al., 2018).”

      3) The phrase "tetraplegic spinal cord injury" seems awkward.

      Thank you for highlighting this to us. We have corrected these instances in our revised manuscript to “tetraplegia”.

      4) The stability of the representation is attributed to efference copy from M1. While this is a fine speculation, somatosensory cortex is part of a circuit and is interconnected with many other brain areas, M1 being one. Perhaps the stability is maintained due to the position of somatosensory cortex within this circuit, and not solely by its relationship with M1? There seems to be an overemphasis of this hypothesis at the exclusion of others.

      Thank you for this comment. We agree we overemphasized the efference copy theory. We have adjusted this and now provide a more balanced description of potential circuits and interconnections that could maintain somatotopic representations after tetraplegia.

    1. Author Response:

      Reviewer #1 (Public Review):

      In this report, Shekhar et al. have profiled developing retinal ganglion cells from embryonic and postnatal mouse retina to explore the diversification of this class of neurons into specific subtypes. In mature retina, scRNAseq and other methods have defined approximately 45 different subtypes of RGCs, and the authors ask whether these arise from a common postmitotic precursor, or many distinct subtypes of precursors. The overall message is that subtype diversification arises as a "gradual, asynchronous fate restriction of postmitotic multipotential precursors". The authors find that over time, clusters of cells become "decoupled" as they split into subclusters. This process of fate decoupling is associated with changes in the expression of specific transcription factors. This allows them to both predict lineage relationships among RGC subtypes and the time during development when these specification events occur. Although this conclusion is based almost entirely on a computational analysis of the relationships among cells sampled at discrete times, the evidence presented supports the overall conclusion. Future experimental validation of the proposed lineage relationships of RGC subtypes will be needed, but this report clearly outlines the overall pattern of diversification in this cell class.

      We thank the reviewer for their thoughtful assessment of our study.

      Reviewer #2 (Public Review):

      The manuscript "Diversification of multipotential postmitotic mouse retinal ganglion cell precursors into discrete types" by Shekhar and colleagues represents an in-depth analysis of an additional transcriptomic dataset of retinal single cells. It explores the progression of retinal ganglion cell diversity during development and describes some aspects of fate acquisition in these postmitotic neurons. Altogether the findings provide another resource on which the neural development community will be able to generate new hypotheses in the field of retinal ganglion cell differentiation. A key point that is made by the authors regards the progression of the number of ganglion cell types in the mouse retina, i.e., how, and when neuronal "classes diversify into subclasses and types" (also p. 125). In particular, the authors would like to address whether postmitotic neurons follow either a predetermination or a stepwise progression (Fig. 2a). This is indeed a fascinating question, and the analysis, including the one based on the Waddington-OT method is conceptually interesting.

      Comments and questions:

      Is the transcriptomic diversity, based on highly variable genes (the number of which is not detailed in the study), a robust proxy to assess cell types? One could argue that early on predetermined cell types are specified by a small set of determinants, both at the proteomic and transcriptomic level, and that it takes several days or weeks to generate the cascade that allows the detection of transcriptional diversity at the level of >100 gene expression levels.

      We had tested the dependence of our results on the number of highly variable genes (HVGs) used. This analysis, shown in Figure 2h, demonstrates that results are robust over the range tested – 1244-3003 total HVGs. Since the analysis in the paper employs 2800 HVGs (~800-1500 at each stage), we are confident that we are in comfortable excess of the number at which we would need to worry. We have expanded the discussion to avoid confusion on this point. We also address the possibility that a small set of determinants are sufficient to define cell state in a transcriptomic study. This is a common argument, but we believe it is a tenuous one. We believe that the only way a small number of genes can truly define cell state is if they are expressed at very high levels. If these are expressed at high levels, they should be detected in our data and should drive the clustering. If they are expressed at extremely low levels, then given the nature of molecular fluctuations in cells, they cannot be expected to serve as a stable scaffold for differentiation. Indeed, a small set of determinants (usually transcription factors) may be necessary to specify a cell type. However, sufficiency of specification requires the expression of a usually much larger number of downstream regulators.

      Since there are many RGC subsets (45) that share a great number of their gene expression, is it possible that a given RGC could transition from one subset to another between P5 and P56? Or even responding to a state linked to sustained activity? Was this possibility tested in the model?

      We cannot address the possibility that cells swap types postnatally so that the cells comprising type X at P5 are not the same ones that comprise type X at P56. It does seem pretty unlikely, as the cell types are well-separated in transcriptional space (~250 DE genes on average). Regarding activity, we have made some initial tests by preventing visually evoked activity from birth to P56 in three different ways (dark-rearing and two mutant lines). We find no statistically significant effect on diversification. These results are currently being prepared for publication.

      The authors state that early during development there is less diversity than later. This statement seems obvious, but how much less? Can this be due to differences in differentiation stage? At E16, RGCs are a mix of cells born from E11 to E16, with the latter barely located in the GCL. Does this tend to show a continuum that is probably lost when the analysis is performed on cells isolated a long time after they were born (postnatal stages)? Alternatively, would it be possible to compare RGCs that have been labeled with birth-dating methods?

      Regarding the amount of diversification, we quantified this using the Rao diversity index (Figure 2h), which suggests an overall 2-fold increase in transcriptional diversity at P56 compared to the early stages. The continuum is likely because cells at early stages are close to the precursor stage and not very differentiated. Regarding combining RNA-seq with birthdating, although elegant methods now make this combination possible, it falls beyond the scope of this study.
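
      As a rough illustration of the kind of index mentioned above, the sketch below computes Rao's quadratic entropy, Q = sum over i, j of d_ij * p_i * p_j, from cluster frequencies and a pairwise distance matrix. The function name, the inputs and the choice of distance are assumptions made for illustration; the exact formulation used for Figure 2h is not given in this response.

```python
from itertools import product


def rao_quadratic_entropy(freqs, dist):
    """Rao's quadratic entropy: Q = sum_i sum_j d_ij * p_i * p_j.

    freqs: cluster sizes or frequencies (normalised internally).
    dist:  square matrix of pairwise distances between clusters
           (hypothetical here, e.g. distances between cluster centroids).
    """
    total = sum(freqs)
    p = [f / total for f in freqs]
    n = len(p)
    return sum(dist[i][j] * p[i] * p[j] for i, j in product(range(n), repeat=2))
```

      With this definition a single cluster has zero diversity, and the index grows both when clusters become more evenly populated and when they move further apart in expression space.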

      Comparing data produced by different methods can be challenging. Here the authors compared transcriptomic diversity between embryonic datasets produced with 10X Genomics (E13 to P0) and, on the other hand, postnatal P5 data that were produced using a different procedure (Drop-seq). Is it possible to control that the differences observed are not due to the different methods?

      It is correct that most of the P5 data was produced using Drop-seq, but that dataset also includes transcriptomes obtained by the 10X method. The relative frequency of RGC clusters and the average gene expression values obtained using either method was highly correlated (Reviewer Fig. 1). This is now pointed out in the “Methods.”

      Reviewer Fig. 1. Comparison between the relative frequency of types (left) and the average gene expression levels (right) at P5 between 10X data (y-axis) and Drop-seq data (x-axis). R corresponds to the Pearson correlation coefficient. The axes are plotted in the logarithmic scale.
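
      A comparison of this kind can be sketched as follows: a Pearson correlation between per-type values obtained with the two platforms. The function name and the `log_scale` flag are illustrative assumptions; whether R was computed on raw or log-transformed values is not stated in the legend, so the sketch supports both.

```python
import math


def pearson_r(xs, ys, log_scale=False):
    """Pearson correlation between paired measurements.

    xs, ys: per-type values from two platforms (e.g. relative frequencies).
    log_scale: if True, correlate log10-transformed values (all must be > 0).
    """
    if log_scale:
        xs = [math.log10(x) for x in xs]
        ys = [math.log10(y) for y in ys]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)
```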

      It might be important to control the conclusion that diversity is lower at E13 vs P5, given that three times fewer cells (5900 vs 18000) were analyzed at the early stage (BrdU, EdU, CFSE...). A simple downsampling prior to the analysis may help.

      Although we collected different numbers of cells at different ages, we noted in the text that they do not influence the number of clusters. Regarding P5 specifically, Rheaume et al. (who we now discuss) obtained very similar results to ours with only 6000 cells (3x lower).
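
      For readers who want to set up the downsampling control suggested by the reviewer, a minimal sketch is given below. It subsamples every stage to the size of the smallest one before any clustering is run. The function name and the dictionary layout are hypothetical, and this is not presented as an analysis performed in the study.

```python
import random


def downsample_to_smallest(cells_by_stage, seed=0):
    """Randomly subsample each stage to the size of the smallest stage.

    cells_by_stage: dict mapping a stage label to a list of cell identifiers.
    seed: fixed seed so that the subsample is reproducible.
    """
    rng = random.Random(seed)
    target = min(len(cells) for cells in cells_by_stage.values())
    return {
        stage: rng.sample(list(cells), target)
        for stage, cells in cells_by_stage.items()
    }
```

      Clustering each downsampled stage and comparing the number of clusters with the full-data result would then show whether cell number alone drives the apparent diversity.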

      Ipsilateral RGC: It is striking that the DEGs between C-RGC and I-RGC reflect a strong bias, with cells scored as "ipsi" being immature RGCs while the others ("contra") are much more mature. This bias comes from the way ipsilateral RGCs were "inferred" using non-specific markers. Can the authors try the analysis again by identifying RGCs using more robust markers (e.g. EphB1)? Would it be possible to select I-RGCs and C-RGCs that share the same level of differentiation? Previous studies already identified an I-RGC signature using more specific set-ups (Wang et al., 2016 from retrogradely labelled RGCs; Lo Giudice et al., 2019 with an I-RGC specific transgenic mouse).

      We are not sure how the reviewer concludes that the putative I-RGCs are more immature than the putative C-RGCs. As discussed earlier, insofar as expression levels of pan-RGC markers are indicative of maturational stage, we found no evidence that clustering is driven by maturation gradients. Thus, we expect our putative I-RGCs and C-RGCs to not differ in differentiation state. Following the reviewer’s suggestion, we now include EphB1 (Ephb1) in our I-RGC signature. The impact of replacing Igfbp5 with Ephb1 on the inferred proportion of I-RGCs within each terminal type was minimal (Reviewer Fig. 2). We would like to note that to assemble our I-RGC/C-RGC signatures we relied on data presented in Wang et al. (2016). Outside of well-established markers (e.g. Zic2 and Isl2), we chose the RNA-seq hits in Wang et al. that had been validated histologically in the same paper or that are correlated with Zic2 expression in our data. This nominated Igfbp5, Zic1, Fgf12, and Igf1.

      Reviewer Fig. 2. Comparison of inferred I-RGC frequency within each terminal type (points) using two I-RGC signatures reported in the paper. For the y-axis we used Zic2 and EphB1.

      It would be important to discuss how their findings differ from the others (including Rheaume et al., 2018). To make a strong point, I-RGCs should be isolated at a stage of final maturation (P5?) and using retrograde labelling, which is a robust method to ensure the ipsilateral identity of postnatal RGCs.

      We cite Rheaume et al. in several places. In fact, there is good transcriptional correspondence between our dataset and theirs (Figure S1i), despite the differences in the number of cells profiled (~6000 vs ~18000) and technologies (10X vs. Drop-seq/10X). We now mention this in the text. Note also that we had compared our P56 data with Rheaume et al.’s P5 data in an earlier publication (Tran et al., 2019) and observed a similar tight correspondence between clusters. Zic1 is expressed in I-RGCs (Wang et al., 2016) at early stages, and in our dataset its expression at E13 and E14 is similar to that of Zic2 (Supplementary Fig. 8). Postnatally, however, it marks W3B RGCs (Tran et al., 2019), many of which project contralaterally (Kim et al., J. Neurosci. 2010). Regarding retrograde labeling, as noted above, additional experiments would take a prohibitively long time (up to a year) to complete.

      It is unclear how well Zic1 and Igf1 can be used as I-RGC markers. Can the authors specify how specific to I-RGCs they are? Have they been confirmed as markers using retrograde labelling experiments?

      We have relied on previous work, primarily from the Mason lab, to choose I-RGC and C-RGC markers. Igf1 is a C-RGC marker that is expressed in a complementary fashion with Igfbp5, an I-RGC marker, as noted in Wang et al., 2016. They also performed ISH to show that Igf1 is not expressed in the VT crescent, while Igfbp5 is (see Fig. 5 in Wang et al., 2016). Similarly, Zic1 is also cited in Wang et al. as an RNA-seq hit for I-RGCs. Although Zic1 was not validated using ISH, we found its expression pattern to be highly correlated with Zic2 at E13 (Supplementary Fig. 8c).

      The enrichment procedure may deplete the RGC subpopulation that express low levels of Thy1 or L1CAM. A comparison on that point could be done with the other datasets analysed in the study.

      We presume the reviewer is referring to the data of Lo Giudice and Clark/Blackshaw, which we show in comparison to ours in Figure S1. In both of those studies, all retinal cells were analyzed, whereas we enriched RGCs. As noted in the text, RGCs comprise a very small fraction of all retinal cells, so Lo Giudice and Clark/Blackshaw lacked the resolution to resolve RGC diversity at later time points. Indeed, there is no whole retina dataset available in which RGCs are numerous enough for comprehensive subtyping. Our approach to this issue was to collect RGCs with both Thy1 and L1 at E13, E14, E16 and P0, with the idea that the markers might have complementary strengths and weaknesses. In fact, at each age, all clusters are present in both collection types, although frequencies vary. This concordance supports the idea that neither marker excludes particular types. We now stress this point in results and in the Supplementary Fig. 2 legend.

      In supplemental Fig. S1e: why do cells embedded from the "Clark" dataset cluster only on the right side of the UMAP while the others are more evenly distributed?

      Actually, both the Clark et al. and Lo Giudice et al. datasets are predominantly clustered on the right side of the UMAP. This reflects the methodological difference noted above: they profiled the whole retina, whereas we isolated RGCs. Thus, their datasets contain a much higher abundance of RPCs and non-neurogenic precursors compared to ours. The right clusters represent RPCs due to their expression of Fgf15 and other markers, while the left clusters represent RGCs based on their expression of Nefl. Indeed, a main reason for including these plots was to illustrate the relative abundance of RGCs in our data (also see Supplementary Fig. S1h).

      What could explain that CD90 and L1CAM population are intermingled at E14, distinct at E16, and then more mixed at P0?

      We believe the reviewer is referring to Supplementary Figs. S2a-c. Given the temporal expression level changes in Thy1 and L1cam (Supplementary Fig. S1c) in RGCs, a likely possibility is that they enrich RGC precursor subsets at different relative frequencies. We now note this in the Supplementary Fig. 2 legend.

      On Fig. 6: the E13 RGCs seem to be segregated into early born RGCs expressing Eomes and later born RGCs expressing Neurod2. Thus, fate coupling with P5 seems to suggest that the Eomes population at P5 may have been generated first, and the Neurod2 population generated later. Is that possible?

      That the Eomes RGCs are specified before Neurod2 RGCs is one of our conclusions from the fate decoupling analysis (Figures 6f-h). Whether this is because the former arise from early born cells and the latter arise from later born cells is not clear. There is disagreement in the literature on whether ipRGCs are born at a different time than other RGCs, so we prefer not to make a comment.

      Methods: The Methods section is extensive, and yet it is presented in a rather complex manner so that it is difficult to understand for a broad audience. It would be valuable if the authors could simplify or better explain some parts (the WOT section in particular).

      We believe that the sections on animals, molecular biology and histology are quite straightforward, but agree that the sections describing the computational analysis are hard going. We have modified them in several places as requested. As regards better explanation of the WOT, we now precede that section with an “overview” as a way of making it easier to follow. (We had already included an overview of the clustering procedures.) We have also provided further detail on some of the reviewer’s subsequent questions on this section, including the use of HVGs, the Classifier, and the strategy for inferring I-RGCs (see below). Perhaps most important, we have worked to make the “Results” and “Discussion” sections accessible to a broad audience.

      *Highly variable genes (HVG) used for clustering and dimensionality reduction: how many of them and what are they? Are they the same used for each stage?

      Since clustering was performed at each stage independently, we determined HVGs at each stage separately using a statistical method introduced in one of our previous studies (Pandey et al., Current Biology, 2018). The number of HVGs at each stage was as follows: E13, N=1094; E14, N=834; E16, N=822; P0, N=881; P5, N=1105; P56, N=1510.

      We note that these are not necessarily the same at each stage due to the temporal variation in gene expression. Together these correspond to 2854 unique genes (union of all HVGs). The WOT analysis was done using this full set.
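
      The relationship between the per-stage HVG lists and the combined feature set can be sketched as a simple set union. The function name and the dictionary layout below are hypothetical; only the idea of taking the union across stages comes from the response.

```python
def union_of_hvgs(hvgs_by_stage):
    """Return the set of genes that are highly variable at one or more stages.

    hvgs_by_stage: dict mapping a stage label to an iterable of gene names.
    """
    union = set()
    for genes in hvgs_by_stage.values():
        union.update(genes)
    return union
```

      Because a gene that is variable at several stages is counted once, the size of the union is smaller than the sum of the per-stage counts.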

      *In the methods p9: "The common features G = GR ∩ GT are used to train a third classifier ClassR on the reference atlas AR. This ensures that inferred transcriptomic correspondences are based on "core" gene expression programs that underlie cell type identity rather than maturation-associated genes." Could the authors explain the relevance of using a third model and, more importantly, are there any genes eliminated through the procedure that could be important to drive the diversification process? If so, would it be possible to estimate their number and the relative impact?

      The rationale for this was as follows. Our goal is to map cells from one time point to a type at another time point. The naïve way to do this would be to use a classifier trained entirely at either time point. However, the features of such a classifier are likely to contain genes that are not expressed at the earlier time point, and are likely to generate spurious mappings (since the sets of cluster specific genes are not identical). Therefore, we sought to train a classifier using genes that are part of conserved transcriptional signatures at both time points, which corresponds to the third model.

      When this filtering was not performed, the temporal correspondences in the supervised classification model were less specific than those reported. In particular, ARI values dropped by about 15% on average. The simple reason for this is that a cluster specific gene at E13 (for example) may no longer be expressed at E14, and vice versa. Thus, by restricting the features to a common set of cluster specific genes, we obtained the “best possible” transcriptomic correspondences between clusters at consecutive time points. We note that the correspondences obtained in this way (Figure 3) were recovered through WOT when the results of the latter were collapsed at the cluster level (Supplementary Fig. 5).
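
      The two ingredients of this argument, restricting features to the genes shared by both time points and scoring the resulting correspondence with the adjusted Rand index (ARI), can be sketched as follows. This is a minimal stdlib illustration: the function names are hypothetical, the classifier itself is omitted, and it is an assumption here that the ARI compares classifier-assigned labels with cluster labels.

```python
from collections import Counter
from math import comb


def common_features(genes_reference, genes_test):
    """G = GR ∩ GT: cluster-specific genes shared by both time points."""
    return sorted(set(genes_reference) & set(genes_test))


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index between two labelings of the same cells."""
    n = len(labels_a)
    # Pair counts within each cell of the contingency table and its margins.
    sum_ij = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    sum_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = sum_a * sum_b / comb(n, 2)
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. a single cluster in both labelings
    return (sum_ij - expected) / (max_index - expected)
```

      An ARI of 1 means the two labelings agree up to a renaming of clusters, and values near 0 indicate chance-level agreement, so a drop in ARI reflects less specific correspondences.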

      *Methods page 15: Inference of ipsilaterally-projecting RGC types. Wouldn't it be more valuable to consider more markers to distinguish RGC precursors?

      As indicated before, we used I-RGC genes and C-RGC genes reported in Wang et al., 2016 (Table 2), in addition to the well-known markers Zic2 and Isl2. Here, we prioritized genes that had been histologically validated (Figs. 4 and 5) and that were expressed in our data (Sema3e and Tbx20 were not considered as these were undetectable at E13 in our data). Following the reviewer’s earlier suggestion, we also noted that including Ephb1 in our signature minimally impacts the results.

      Discussion: *Is there some plasticity that allows the RGC subgroups to switch over time? (If we were to record the transcriptome of the same cell over time, would one observe that the cell belongs to another cluster / subgroup?)

      One can only speculate. Other than long-term in vivo imaging combined with vital type-specific markers we know of no way to experimentally address the possibility that cells swap types postnatally so that the cells comprising type x at P5 are not the same ones that comprise type x at P56. It does seem pretty unlikely though.

      *While the data appears technically rigorous, and the number of cells sequenced very high, the results seem redundant with several prior studies and the discrepancies are not sufficiently discussed.

      We are confused by this point, since the reviewer does not cite the papers to which s/he refers. To our knowledge there is no study at present that has described RGC diversification, so it is not clear what would be discrepant.

    1. Author Response

      Reviewer #1 (Public Review):

      It is well established that valuation and value-based decision-making is context-dependent. This manuscript presents the results of six behavioral experiments specifically designed to disentangle two prominent functional forms of value normalization during reward learning: divisive normalization and range normalization. The behavioral and modeling results are clear and convincing, showing that key features of choice behavior in the current setting are incompatible with divisive normalization but are well predicted by a non-linear transformation of range-normalized values.

      Overall, this is an excellent study with important implications for reinforcement learning and decision-making research. The manuscript could be strengthened by examining individual variability in value normalization, as outlined below.

      We thank the Reviewer for the positive appreciation of our work and for the very relevant suggestions. Please find our point-by-point answer below.

      There is a lot of individual variation in the choice data that may potentially be explained by individual differences in normalization strategies. It would be important to examine whether there are any subgroups of subjects whose behavior is better explained by a divisive vs. range normalization process. Alternatively, it may be possible to compute an index that captures how much a given subject displays behavior compatible with divisive vs. range normalization. Seeing the distribution of such an index could provide insights into individual differences in normalization strategies.

      Thank you for pointing this out, it is indeed true that there is some variability. To address this, and in line with the Reviewer’s suggestion, we extracted model attributions per participant on the individual out-of-sample log-likelihood, using the VBA_toolbox in Matlab (Daunizeau et al., 2014). In experiment 1 (presented in the main text), we found that the RANGE model accounted for 79% of the participants, while the DIVISIVE model accounted for 12%. The relative difference was even higher when including the RANGEω model in the model space: the RANGE and RANGEω models account for a total of 85% of the participants, while the DIVISIVE model accounted only for 5%.

      In experiment 2 (presented in the supplementary materials), the results were comparable (see Figure 3-figure supplement 3: 73% vs 10%, 83% vs 2%).

      To provide further insights into the behavioral signatures behind inter-individual differences, we plotted the transfer choice rates for each group of participants (best explained by the RANGE, DIVISIVE, or UNBIASED models), and the results are similar to our model predictions from Figure 1C:

      Author Response Image 1. Behavioral data in the transfer phase, split over participants best explained by the RANGE (left), DIVISIVE (middle) or UNBIASED (right) model in experiment 1 (A) and experiment 2 (B) (versions a, b and c were pooled together).

      To keep things concise, we did not include this last figure in the revised manuscript, but it will be available for the interested readers in the Rebuttal letter.

      One possibility currently not considered by the authors is that both forms of value normalization are at work at the same time. It would be interesting to see the results from a hybrid model.

      R1.2 Thank you for the suggestion, we fitted and simulated a hybrid model as a weighted sum between both forms of normalization:

      First, the HYBRID model quantitatively wins over the DIVISIVE model (oosLLHYB vs oosLLDIV : t(149)=10.19, p<.0001, d=0.41) but not over the RANGE model, which produced a marginally higher log-likelihood (oosLLHYB vs oosLLRAN : t(149)=-1.82, p=.07, d=-0.008). Second, model simulations also suggest that the model would predict a very similar (if not worse) behavior compared to the RANGE model (see figure below). This is supported by the distribution of the weight parameter over our participants: it appears that, consistently with the model attributions presented above, most participants are best explained by a range-normalization rule (weight > 0.5, 87% of the participants, see figure below). Together, these results favor the RANGE model over the DIVISIVE model in our task.
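
      To make the structure of such a hybrid concrete, the sketch below combines a range-normalized and a divisively normalized value with a single weight. It is a simplified illustration, not the fitted model: the function names are hypothetical, the outcomes of a context are treated as known constants rather than learned quantities, and dividing by the sum of the outcomes is an assumption. The convention that weights above 0.5 favor range normalization follows the description above.

```python
def range_norm(r, outcomes):
    """Range normalization: position of r between the context min and max."""
    lo, hi = min(outcomes), max(outcomes)
    return (r - lo) / (hi - lo) if hi > lo else 0.0


def divisive_norm(r, outcomes):
    """Divisive normalization: r divided by the summed context outcomes."""
    return r / sum(outcomes)


def hybrid_norm(r, outcomes, w):
    """Weighted sum of the two rules; w = 1 is pure range, w = 0 pure divisive."""
    return w * range_norm(r, outcomes) + (1 - w) * divisive_norm(r, outcomes)
```

      For a three-option context with outcomes 50, 68 and 86, the middle option gets 0.5 under range normalization and one third under divisive normalization, and the hybrid interpolates between the two as the weight changes.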

      Out of curiosity, we also implemented a hybrid model as a weighted sum between absolute (UNBIASED model) and relative (RANGE model) valuations:

      Model fitting, simulations and comparisons slightly favored this hybrid model over the UNBIASED model (oosLLHYB vs oosLLUNB: t(149)=2.63, p=.0094, d=0.15), but also drastically favored the range normalization account (oosLLHYB vs oosLLRAN : t(149)=-3.80, p=.00021, d=-0.40, see Author Response Image 2).

      Author Response Image 2. Model simulations in the transfer phase for the RANGE model (left) and the HYBRID model (middle) defined as a weighted sum between divisive and range forms of normalization (top) and between unbiased (no normalization) and range normalization (bottom). The HYBRID model features an additional weight parameter, whose distribution favors the range normalization rule (right).

      To keep things concise, we did not include this last figure in the revised manuscript, but it will be available for the interested readers in the Rebuttal letter.

      Reviewer #2 (Public Review):

      This paper studies how relative values are encoded in a learning task, and how they are subsequently used to make a decision. This is a topic that integrates multiple disciplines (psych, neuro, economics) and has generated significant interest. The experimental setting is based on previous work from this research team that has advanced the field's understanding of value coding in learning tasks. These experiments are well-designed to distinguish some predictions of different accounts for value encoding. However there is an additional treatment that would provide an additional (strong) test of these theories: RN would make an equivalent set of predictions if the range were equivalently adjusted downward instead (for example by adding a "68" option to "50" and "86", and then comparing to WB and WT). The predictions of DN would differ however because adding a low-value alternative to the normalization would not change it much. Would the behaviour of subjects be symmetric for equivalent ranges, as RN predicts? If so this would be a compelling result, because symmetry is a very strong theoretical assumption in this setting.

      We thank the Reviewer for the overall positive appraisal concerning our work, but also for the stimulating and constructive remarks that we have addressed below. At this stage, we just wanted to mention that we also agree with the Reviewer concerning the fact that a design where we add a "68" option to "50" and "86" would also represent an important test of our hypotheses. This is why we had, in fact, run this experiment. Unfortunately, its results were somewhat buried in the Supplementary Materials of our original submission and not correctly highlighted in the main text. We modified the manuscript in order to make them more visible:

      Behavioral results in three experiments (N=50 each) featuring a slightly different design, where we added a mid value option (NT68) between NT50 and NT87 converge to the same broad conclusion: the behavioral pattern in the transfer phase is largely incompatible with that predicted by outcome divisive normalization during the learning phase (Figure 2-figure supplement 2).

      Reviewer #3 (Public Review):

      Bavard & Palminteri extend their research program by devising a task that enables them to disassociate two types of normalisation: range normalisation (by which outcomes are normalised by the min and max of the options) and divisive normalisation (in which outcomes are normalised by the average of the options in one's context). By providing 4 different training contexts in which the range of outcomes and number of options vary, they successfully show using 'ex ante' simulations that different learning approaches during training (unbiased, divisive, range) should lead to different patterns of choice in a subsequent probe phase during which all options from the training are paired with one another generating novel choice pairings. These patterns are somewhat subtle but are elegantly unpacked. They then fit participants' training choices to different learning models and test how well these models predict probe phase choices. They find evidence - both in terms of quantitative (i.e. comparing out-of-sample log-likelihood scores) and qualitative (comparing the pattern of choices observed to the pattern that would be observed under each model) fit - for the range model. This fit is further improved by adding a power parameter which suggests that alongside being relativised via range normalisation, outcomes were also transformed non-linearly.

      I thought this approach to address their research question was really successful and the methods and results were strong, credible, and robust (owing to the number of experiments conducted, the design used and combination of approaches used). I do not think the paper has any major weaknesses. The paper is very clear and well-written which aids interpretability.

      This is an important topic for understanding, predicting, and improving behaviour in a range of domains potentially. The findings will be of interest to researchers in interdisciplinary fields such as neuroeconomics and behavioural economics as well as reinforcement learning and cognitive psychology.

      We thank Prof. Garrett for his positive evaluation and supportive attitude.

    1. Author Response

      Reviewer #1 (Public Review):

      In this paper, Fernandes et al. take advantage of synthetic constructs to test how Bicoid (Bcd) activates its downstream target Hunchback (Hb). They explore synthetic constructs containing only Bcd, Bcd and Hb, and Bcd and Zelda binding sites. They use these to develop theoretical models for how Bcd drives Hb in the early embryo. They show that Hb sites alone are insufficient to drive further Hb expression.

      The paper's first half focuses on how well the synthetic constructs replicate the in vivo expression of hb. This approach is generally convincing, and the results are interesting. Consistent with previous work, they show that Bcd alone is sufficient to drive an expression profile that is similar to wild‐type, but the addition of Hb and Zelda are needed to generate precise and rapid formation of the boundaries. The experimental results are supported by modelling. The model does a nice job of encapsulating the key conclusions and clearly adds value to the analysis.

      In the second part of the paper, the authors use their synthetic approach to look at how the Hb boundary alters depending on Bcd dosage. This part asks whether the observed Bcd gradient is the same as the activity gradient of Bcd (i.e. the "active" part of Bcd is not a priori the same as the protein gradient). This is a very interesting problem and it is good that the authors have tried to tackle it. However, the strength of their conclusions needs to be substantially tempered as they rely on an overestimation of the Bcd gradient decay length.

      Comments:

      ‐ My major concern regards the conclusions for the final section on the activity gradient. In the Introduction it is stated: "[the Bcd gradient has] an exponential AP gradient with a decay length of L ~ 20% egg‐length (EL)". While this was the initial estimate (Houchmandzadeh et al., Nature 2002), later measurements by the Gregor lab (see Supplementary Material of Liu et al., PNAS 2013) found that "The mean length constant was reduced to 16.5 ± 0.7%EL after corrections for EGFP maturation". The original measurements by Houchmandzadeh et al. had issues with background control, that also led to the longer measured decay length. In later work, Durrieu et al., Mol Sys Biol 2018, found a similar scale for the decay length to Liu et al. Looking at Figure 5, a value of 16.5%EL for the decay length is fully consistent with the activity and protein gradients for Bcd being similar. In short, the strength of the conclusions clearly does not match the known gradient and should be substantially toned down.

      The reviewer is right: several studies aiming to quantitatively measure the Bicoid protein gradient ended up with quite different decay lengths.

      A summary of the various decay lengths measured, and the method used for these measurements is given below:

      As indicated, these measurements are quite variable among the different studies and the differences can potentially be attributed to different methods of detection (antibody staining on fixed samples vs fluorescent measurements on live sample) or to the type of protein detected (endogenous Bicoid vs fluorescently tagged).

      We agree with the reviewer that given these discrepancies, the exact value of the Bcd protein gradient decay length is not known and that we only have measurements that put it in between 16 and 25 % EL (see the Table above). Therefore, we agree that we should tone down the difference between the protein vs activity gradient and focus on the measurements of the effective activity gradient decay length allowed by our synthetic reporters. This allows us to revisit the measurement of the Hill coefficient of the transcription step‐like response, which is based on the decay‐length for the Bcd protein gradient, and assumed in previous published work to be of 20% EL (Gregor et al., Cell, 2007a; Estrada et al., 2016; Tran et al., PLoS CB, 2018). Importantly, the new Hill coefficient allows us to set the Bcd system within the limits of an equilibrium model.
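
      The dependence of the inferred Hill coefficient on the assumed decay length can be made explicit with a small sketch. It assumes an exponential gradient c(x) = c0 * exp(-x / lambda) read out by a Hill function f(c) = c^H / (c^H + K^H); under these assumptions the slope of f at the boundary (where c = K) is -H / (4 * lambda), so H = 4 * lambda * |slope|. The function names and parameter values are illustrative and this is not presented as the fitting procedure used in the study.

```python
import math


def bcd_concentration(x, decay_length, c0=1.0):
    """Exponential gradient; x and decay_length in the same units (e.g. % EL)."""
    return c0 * math.exp(-x / decay_length)


def hill_response(c, k, h):
    """Hill-type readout with threshold k and Hill coefficient h."""
    return c**h / (c**h + k**h)


def boundary_position(k, decay_length, c0=1.0):
    """Position at which the gradient concentration equals the threshold k."""
    return -decay_length * math.log(k / c0)


def hill_from_boundary_slope(slope, decay_length):
    """Hill coefficient implied by the boundary slope for an assumed decay length."""
    return 4.0 * decay_length * abs(slope)
```

      For a given measured boundary slope, the inferred coefficient scales linearly with the assumed decay length: moving from 20% EL to 16.5% EL lowers it by a factor of 0.825.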

      As mentioned by the reviewer, it is possible that the decay length of the protein gradient measured using antibody staining (Houchmandzadeh et al., Nature, 2002) was not correct due to background controls. Such measurements were also performed in Xu et al. (2015), and they agree with the original measurements (Houchmandzadeh et al., Nature 2002). As indicated in the table above, all the other measurements of the Bcd protein gradient decay length were done using fluorescently tagged Bcd proteins and we cannot exclude the possibility that the wt vs tagged proteins might have different decay lengths due to potentially different diffusion coefficients or half‐lives. Before drawing any conclusion on the exact value of the endogenous Bcd protein gradient decay length, it is essential to measure it again in conditions that correct for the background issues for immuno‐staining, as was done in Liu et al., PNAS, 2013 for the Bcd‐eGFP protein. In that study, the authors only measured the decay length of the Bcd fusion protein using immuno‐staining for the Bcd protein; they did not measure again the decay length of the endogenous Bcd protein gradient using immuno‐staining and the same procedure for background control. Therefore, they do not firmly exclude the possibility that the endogenous vs tagged Bcd proteins might have different decay lengths.

      We thank the reviewer for this comment, which helped us to clarify the message. In addition, as there is clearly an issue with the measurements of the Bcd protein gradient, we added a section in the SI (Section E) and a Table (Table S4) describing the various decay lengths measured in previous studies for the Bcd or fluorescently tagged Bcd protein gradients. In the discussion, together with the possibility that the protein and activity gradients differ (as we originally proposed and believe is still a valid possibility), we also discuss the alternative proposed by the reviewer, namely that the protein and activity gradients have the same decay length but that the decay length of the Bcd protein gradient was potentially not correctly evaluated.

      ‐ All of the experiments are performed in a background with the hb gene present. Does this impact on the readout, as the synthetic lines are essentially competing with the wild‐type genes? What controls were done to account for this?

      We agree with the reviewer that this concern might be particularly relevant at the hb boundary where a nucleus has been shown to only contain ~ 700 Bicoid molecules (Gregor et al., Cell, 2007b). However, ~1000 Bicoid binding regions have been identified by ChIP seq experiments in nc14 embryos (Hannon et al., Elife, 2017) and given that several Bcd binding sites are generally clustered together in a Bcd region, the number of Bcd binding sites in the fly genome is likely larger than 1000. It is much greater than the number of Bicoid binding sites in our synthetic reporters. Therefore, we think that it is unlikely that adding the synthetic reporters (which in the case of B12 only represents at most 1/100 of the Bcd binding sites in the genome) will severely alter the competition for Bcd binding between the other Bcd binding sites in the genome. Additionally, the insertion of a BAC spanning the endogenous hb locus with all its Bcd‐dependent enhancers did not affect (as far as we can tell) the regulation of the wildtype gene (Lucas, Tran et al., 2018).

      We have added a sentence concerning this point in the main text (lines 108 to 111).

      ‐ Further, the activity of the synthetic reporters depends on the location of insertion. Erceg et al. PLoS Genetics 2014 showed that the same synthetic enhancer can have different readout depending on its genomic location. I'm aware that the authors use a landing site that appears to replicate similar hb kinetics, but did they try random insertion or other landing site? In short, how robust are their results to the specific local genome site? This should have been tested, especially given the boldly written conclusions from the work.

      This concern of the reviewer has been tested and is addressed in Fig. S1, where we compare two random insertions of the hb‐P2 transgene (on chromosomes II and III; Lucas, Tran et al., 2018) and the insertion at the VK33 landing site that was used for the whole study. As shown in Fig. S1, the dynamics of transcription (kymographs) are very similar. In the main text, the reference to Fig. S1 is found in the Materials and Methods section (bottom of the 1st paragraph concerning the Drosophila stocks, line 518).

      ‐ Related to the above, it's also not obvious that readout is linear ‐ i.e. as more binding sites are added, there could be cooperativity between binding domains. This may have been accounted for in the model but it is not clear to me how.

      The reviewer is totally correct. It is clear from our data that the readout is not linear: going from B6 to B9 (a 1.5 X increase in the number of BS) leads to a 4.5 X greater activation rate, which argues against independent activation of transcription by individual bound Bcd TFs. By contrast, adding 3 more sites when going from B9 to B12 has almost no impact (even though it corresponds to a 1.33 X increase in the number of BS). This issue has been rephrased in the main text (lines 200 to 203) and further developed for the modeling aspects in the SI section C and Figure S3. It is also discussed in the second paragraph of the discussion (lines 380 to 383).
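
      The fold-changes quoted above can be checked with a short calculation. This is only an illustrative sketch: the variable names are hypothetical, and the "scaling exponent" is a descriptive quantity introduced here (it would equal 1 if the activation rate were simply proportional to the number of sites), not a parameter of the model in the manuscript.

```python
import math

# Number of Bcd binding sites carried by each synthetic reporter.
binding_sites = {"B6": 6, "B9": 9, "B12": 12}

# Fold-increase in the number of sites between consecutive reporters.
site_fold_b6_b9 = binding_sites["B9"] / binding_sites["B6"]    # 1.5
site_fold_b9_b12 = binding_sites["B12"] / binding_sites["B9"]  # about 1.33

# Fold-increase in activation rate from B6 to B9, as quoted in the response.
activation_fold_b6_b9 = 4.5

# Exponent n such that activation scales as (number of sites) ** n.
# n = 1 would correspond to independent, additive contributions of each site.
scaling_exponent = math.log(activation_fold_b6_b9) / math.log(site_fold_b6_b9)

print(site_fold_b6_b9, site_fold_b9_b12, scaling_exponent)
```

      An exponent well above 1 between B6 and B9 is what the response refers to as synergy; the near absence of change between B9 and B12 indicates that this scaling does not continue.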

      ‐ It would be good in the Introduction/Discussion to give a broader perspective on the advantages and disadvantages of the synthetic approach to study gene regulation. The intro only discusses Tran et al. Yet, there is a strong history of using this approach, which has also helped to reveal some of the approaches shortcoming. E.g. Gertz et al. Nature 2009 and Sharon et al. Nature Biotechnology 2012. Again, I may have missed, but from my reading I cannot see any critical analysis of the pros/cons of the synthetic approach in development. This is necessary to give readers a clearer context.

      One sentence was added in the introduction concerning this point (lines 79 to 82).

      A short review concerning the synthetic approach in development has also been added at the beginning of the discussion (lines 347 to 359).

      Reviewer #2 (Public Review):

      It is known that Bicoid increases in concentration across the syncytial division cycles, the gradient length scale for Bicoid does not change, and hunchback also increases in concentration during the syncytial cycles but the sharp boundary of the hunchback gradient is constantly seen despite the change in concentration of Bicoid. This manuscript shows that by increasing the Bicoid concentration or by adding Zelda binding sites, the expression of hunchback can be recapitulated to that of a previously studied promoter for hunchback.

      I have the following comments to understand the implications of the study in the context of increasing concentrations of Bicoid during the syncytial division cycles:

      ‐ Bicoid itself is also increasing over the syncytial division cycles, how does this change in concentration of Bicoid affect the activation of the hunchback promoter given the cooperative binding of Bicoid and Bicoid and Zelda as documented by the study?

      We thank the reviewer for this remark about the dynamics of the Bcd gradient, which we may have taken for granted. A seminal work on the dynamics of the Bcd gradient using fluorescently tagged Bcd (Gregor et al., Cell, 2007a) has shown that the gradient of Bcd nuclear concentration (the nuclear concentration is the one that matters for transcription) remains stable over nuclear cycles, despite a global increase in the amount of Bcd in the embryo. This can be explained by the fact that Bcd molecules are imported into the nuclei and that the number of nuclei doubles at every cycle, such that the two processes compensate each other. Thus, we assumed that the gradient of Bcd nuclear concentration was stable over nc11 to nc13.

      We have clarified this assumption in the model section in the manuscript (lines 165‐168).

      Supporting our assumption, when looking at the transcription dynamics regulated by Bcd in Lucas et al., PLoS Gen, 2018, we observed very reproducible expression pattern dynamics of the hb‐P2 reporter at each cycle from nc11 to nc13. Such reproducibility in the pattern dynamics was also observed in the current work for the hb‐P2, B6, B9, B12 and H6B6 reporters (Fig. S6A). Also, in Lucas et al., PLoS Gen, 2018, the shift in the established boundary position of the hb‐P2 reporter between nc11 and nc13 is ~2 %EL (approximately one nucleus length, ~10 μm) and is thus marginal.

      In addition, as mentioned in the text (lines 105 to 107), we only focused our analysis on nc13 data which are statistically stronger given the higher number of nuclei analyzed. Thus, any change of Bcd nuclear concentration that would happen over nuclear cycles will not matter.

      Concerning Zelda: Zelda’s transcriptional activity, when measured on a reporter with only 6 Zld binding sites, changes drastically over the nuclear cycles, with strong activity at nc11 and much weaker activity at nc13 (Fig S4A). This indicates that the changes in expression pattern dynamics of Z2B6 from nc11 to nc13 are caused predominantly by decreasing Zelda activity: the effect of Zld on the Z2B6 promoter is very strong during nc11 and nc12. It is also very strong at the beginning of nc13 (even though the Z6 reporter is almost silent) and becomes a bit weaker in the second part of nc13 (Fig S4B‐D).

      ‐ Does the change in concentration of Bicoid across the nuclear cycles shift the gradient similar to the change in numbers of Bicoid binding sites?

      In both Lucas et al, PLoS Gen, 2018 and in this work (Fig. 1, Fig. 3 and Fig. S6A), we found that the positions of the expression boundary are very reproducible and stable in time for hb‐P2, B6, B9, B12, H6B6 during the interphase of nc12 to 13. For hb‐P2, the averaged shift of the established boundary position in nc11, 12 and 13 is within 2 %EL. This averaged shift between the cycles is of similar magnitude to the difference caused by embryo‐to‐embryo variability within nc13 (~2 %EL) (Gregor et al, Cell, 2007b, Lucas et al, PloS Gen, 2018). This shift is much smaller than the difference between the expression boundary positions of B6 and B9 (~ 8 % EL) and between B6 and Z2B6 (~17.5 %EL) in nc13.

      For these reasons, we conclude that the differences between the expression patterns of B6, B9 and Z2B6 are caused predominantly by changing the TF binding site configurations of the reporters, rather than by variability in the Bcd gradient.

      The assumption of gradient stability has been clarified in the previous answer and in the manuscript (lines 165‐168).

      ‐ The intensity is a little higher for B9 and B12 at the anterior in 2B? Is this statistically different? is this likely to change the amount of Bicoid expression at the locus and lead to more robust activation?

      We performed statistical tests to distinguish the spot intensities at the anterior pole for every pair of reporters in Fig. 2B (hb‐P2, B6, B9 and B12). All p‐values from pair‐wise KS tests are greater than 0.067, suggesting that the spot intensities at the anterior pole are not distinguishable between these reporters.

      We have clarified this in the manuscript (line 157).
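
      The pairwise comparison described above can be sketched as follows. This is a minimal illustration, not the analysis code used for Fig. 2B: the intensity values are randomly generated placeholders, the helper names `ks_statistic` and `ks_pvalue` are hypothetical, and the p-value uses the asymptotic Kolmogorov distribution without any small-sample correction.

```python
import itertools
import math
import random

def ks_statistic(a, b):
    """Two-sample KS statistic: largest gap between the two empirical CDFs."""
    a, b = sorted(a), sorted(b)
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        # Advance both samples past x so that i/len(a) and j/len(b) are the CDFs at x.
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d

def ks_pvalue(d, n, m):
    """Asymptotic two-sided p-value from the Kolmogorov distribution."""
    lam = math.sqrt(n * m / (n + m)) * d
    if lam < 0.05:  # the series converges slowly here; the p-value is close to 1
        return 1.0
    series = sum((-1) ** (k - 1) * math.exp(-2.0 * (k * lam) ** 2) for k in range(1, 101))
    return min(1.0, max(0.0, 2.0 * series))

# Placeholder spot intensities at the anterior pole (arbitrary units), NOT real data.
rng = random.Random(0)
intensities = {name: [rng.gauss(100.0, 15.0) for _ in range(40)]
               for name in ("hb-P2", "B6", "B9", "B12")}

# One KS test per unordered pair of reporters (6 pairs for 4 reporters).
pairwise_p = {}
for r1, r2 in itertools.combinations(intensities, 2):
    d = ks_statistic(intensities[r1], intensities[r2])
    pairwise_p[(r1, r2)] = ks_pvalue(d, len(intensities[r1]), len(intensities[r2]))

for pair, p in pairwise_p.items():
    print(pair, round(p, 3))
```

      With four reporters this yields six pairwise tests; the statement in the response corresponds to the smallest of these p-values being above 0.067.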

      ‐Are the fraction of active loci not changing across the syncytial cycles when the concentration of Bicoid also changes and consistent with the synthetic promoters?

      To measure the reproducibility of the expression pattern dynamics in different nuclear cycles, we compared the boundary position of the fraction of active loci pattern as a function of time for all hbP2 and synthetic reporters (Fig. S6A). In this figure panel, for all reporters except Z2B6, the curves in nc12 and nc13 largely overlap, suggesting high reproducibility in the pattern dynamics between cycles and consequently low sensitivity to the subtle variation in the Bcd nuclear concentration gradient between the cycles.

      For Z2B6, we attributed the difference in pattern dynamics between nc12 and nc13 to the changes in Zelda activity, as validated independently with a synthetic reporter with only 6 Zld binding sites (Fig. S4A).

      ‐How do the numbers of Hb BS change the expression of Hb? H6B6 has 6 Hb BS whereas the Hb‐P2 has 1? Are more controls needed to compare these 2 contexts?

      As our goal was to determine to which mechanistic step of our model each TF (Bcd, Hb, Zld) contributed, we added BS numbers that are much higher than in the hb‐P2 promoter. The added number of Hb BS remains very low when compared to the total number of Hb binding sites in the entire genome (Karplan et al, PLOS Gen, 2011); therefore, it is very unlikely to affect the endogenous expression of Hb protein.

      We clarified this in the manuscript (lines 211 to 212).

      Does Zelda concentration change across the syncytial division cycles? How does the change in concentration in the natural context affect the promoter activation of Hb?

      Zelda concentration is stable over the nuclear cycles, as observed with the fluorescently‐tagged Zld protein (Dufourt et al., Nat Com, 2018). However, Zelda’s transcriptional activity when measured on a reporter with only 6 Zld binding sites changes drastically over the nuclear cycles, with strong activity at nc11 and much weaker activity at nc13 (Fig S4A, this work).

      The impact of this change in Zld activity can be observed with the Z2B6 promoter, with the expression boundary moving from the posterior region toward the anterior region over the nuclear cycles (Fig. S4B‐D). However, we don’t detect any changes in the expression pattern dynamics of hb‐P2 over the nuclear cycles (Fig. S6A and in Lucas et al., PLoS Gen, 2018).

      We have clarified this in lines 250‐251 of the main manuscript.

      ‐Changing the dose of Bicoid shifts the boundary of hunchback expression. It would be nice to model or test this in the context of varying doses of zelda or even reason this with respect to varying doses of zelda across the syncytial division cycles.

      We thank the reviewer for this insight. Concerning Zelda, we did not perform any experiment reducing the amount of Zelda in the embryo. However, in a previous study (Lucas et al., PLoS Genetics, 2018), we observed that the boundary of hb was shifted towards the anterior when decreasing the amount of Zelda, consistent with the fact that the dose of Zelda is critical to set the boundary position and the threshold of Bcd concentration required for activation. However, as Zelda is distributed homogeneously along the AP axis, it cannot bring per se positional information to the system.

      Reviewer #3 (Public Review):

      I think the framing could be improved to better reflect the contribution of the work. From the abstract, for example, it's unclear to me what the authors think is the most meaningful conclusion. Is it the observations about the finer details of TF regulation (bursting dynamics), the fact that Bcd is probably the sole source of "positional information" for hb‐p2, that Bcd exists in active/inactive form, or the fact that an equilibrium model probably suffices to explain what we observe? The first sentence itself seems to suggest this paper will discuss "dynamic positional information", in which case it's somewhat misleading to say this kind of work is "largely unexplored"; Johannes Jaeger in particular has been a strong proponent of this view since at least 2004. On that note some particularly relevant recent papers in the Drosophila early embryo include:

      1) Jaeger and Verd (2020) Curr Topics Dev Biol

      2) Verd et al. (2017) PLoS Comp Biol

      3) Huang, Amourda, et al. and Saunders (2017) eLife

      4) Yang, Zhu, et al. (2020) eLife [see also the second half of Perkins (2021) PLoS Comp Biol for further discussion of that model]

      ‐Some reviews from James Briscoe also discuss this perspective.

      We agree with the reviewer that the phrasing of the abstract was not clear enough to emphasize the contribution of the work and we are also sorry if it suggested that the dynamic positional information is largely unexplored because this was not at all our intention.

      We rephrased the abstract aiming to better highlight the most meaningful conclusions.

      ‐I would also recommend modifying the title to reflect the biology found in the new results.

      We modified the title to better reflect the new results: “Synthetic reconstruction of the hunchback promoter specifies the role of Bicoid, Zelda and Hunchback in the dynamics of its transcription”

      ‐A major point that the authors should address is the design of the synthetic constructs. From table S1, the sites are often very closely linked (4‐7 base pairs). From the footprint of these proteins, we know they can cover DNA across this size (see, https://pubmed.ncbi.nlm.nih.gov/8620846/). As such, there may be direct competition/steric hindrance (see https://pubmed.ncbi.nlm.nih.gov/28052257/). What impact does this have on their interpretations? Note also that the native enhancer has spaced sites with variable identities.

      We completely agree with the reviewer's comment, in the sense that we named our reporters according to the number (N) of Bcd binding site sequences that they contain, even though we cannot prove definitively that they can effectively be bound simultaneously by N Bcd molecules. It is thus possible that B9 is not a B9 but an effective B6 (i.e. B9 can only be bound simultaneously by 6 molecules) if, for instance, the binding of a Bcd molecule to one site prevented the binding of another Bcd molecule to a nearby site (as proposed by the reviewer in the case of direct competition or steric hindrance).

      Even though we cannot exclude this possibility, we think that our use of B6, B9, B12, in reference to the 6 Bcd BS of the hb‐P2 promoter, is relevant for several reasons: i) some of the Bcd BS in the hb‐P2 promoter are also very close to each other (see Table S1); ii) the synthetic construct was designed by multimerizing a series of 3 strong Bcd binding sites with a spacing similar to that of the closest sites in the hb‐P2 promoter (as shown in Figure 1A and Table S1); iii) the binding of the Bicoid protein has been shown in in vitro footprinting experiments to be more efficient on sites of the hb‐P2 promoter that are close to each other, and this has even been interpreted as binding cooperativity (Ma et al., 1996); iv) even though these experiments were not performed with full‐length proteins, two molecules of the paired homeodomain (from the same family of DNA binding domains as Bcd) are able to simultaneously bind to two binding sites separated by only 2 base pairs. This binding to very close sites is even cooperative, whereas when the two sites are separated by 5 base pairs or more, the simultaneous binding to the two sites occurs without cooperativity (Wilson et al., 1993).

      Conversely, as it is very difficult to demonstrate that 9 Bcd molecules can effectively bind to our B9 promoter, it is very difficult to know exactly how many binding sites for Bcd the hb‐P2 contains, and a large debate concerning not only the number but also the identity of the Bcd sites in the hb promoter is still ongoing (Park et al., 2019; Ling et al., 2019).

      As we cannot exclude the possibility that B9 is an effective B6, it remains possible that B9 and hb‐P2 (which is supposed to contain only 6 sites) have the same number of effective Bcd binding sites, and this could explain why the two reporters have very similar transcription dynamics and features.

      Regarding other interpretations in the manuscript, we identified two other aspects that will be affected if our synthetic reporters have fewer effective sites than the number of sites they carry. The first one concerns the synergy, as the increase in the number of sites of 1.5 from B6 to B9 might be over‐estimated but this would even increase the synergistic effect given the 4.5 difference in activity of the two reporters (Fig. S3). The second one concerns the discussion on the Hill coefficient and the decay length where the effective number of binding sites (N) is required to determine the limit of concentration sensing (Fig. 5). This would particularly be important for the hb‐P2 promoter.

      Except for these specific points, we don’t think that the possibility that the reporters do not contain exactly as many effective binding sites as proposed has a major impact on our interpretations and the general message conveyed in this manuscript. Most importantly, it is very clear that our B6 and B9 reporters differ by only three Bcd binding sites and yet have very distinct expression dynamics: while B9 recapitulates almost all transcription features of hb‐P2, B6 is far from achieving it. Similarly, H6B6 and Z2B6 have very different transcription features from B6, and these differences have been key for understanding the mechanistic functions of the three TFs we studied.

      This point has been added to the discussion (lines 400 to 414).

    1. Author Response:

      Reviewer #3 (Public Review):

      This paper reports that levodopa administration to healthy volunteers enhances the guidance of model-free credit assignment (MFCA) by model-based (MB) inference without altering MF and MB learning per se. The issue addressed is fascinating, timely and clinically relevant, the experimental design and analysis strategy (reported previously) are complex, but sophisticated and clever and the results are tantalizing. They suggest that ldopa boosts model-based instruction about what (unobserved or inferred) state the model-free system might learn about. As such, the paper substantiates the hypothesis that dopamine plays a role specifically in the interaction between distinct model-based and model-free systems. This is really a very valuable contribution, one that my lab and I expect many other labs had already picked up immediately after it appeared as a preprint.

      Major strengths include the combination of pharmacology with a substantial sample size, clever theory-driven experimental design and application of advanced computational modeling. The key effect of ldopa on retroactive MF inference is not large, but substantiated by both model-agnostic and model-informed analyses and therefore the primary conclusion is supported by the results.

      The paper raises the following questions.

      What putative neural mechanism led the authors to predict this selective modulation of the interaction? The introduction states that "Given DA's contribution to both MF and MB systems, we set out to examine whether this aspect of MB-MF cooperation is subject to DA influence." This is vague. For the hypothesis to be plausible, it would need to be grounded in some idea about how the effect would be implemented. Where exactly does dopamine act to elicit an effect on retroactive MB inference, but not MB learning per se? If the mechanism is a modulation of working memory and/or replay itself, then shouldn't that lead to boosting of both MB learning as well as MB influences on MF learning? Addressing this involves specification of the mechanistic basis of the hypothesis in the introduction, but the question also pertains to the discussion section. Hippocampal replay is invoked, but can the authors clarify why a prefrontal working memory (retrieval) mechanism invoked in the preceding paragraph would not suffice. In any case, it seems that an effect of dopamine on replay would also alter MB choice/planning?

      In sum, we agree with this criticism and have now revised the relevant intro paragraph (p. 3/4).

      We now discuss DAergic manipulation of replay in particular (p. 24). We infer that a component of a MB influence over choice comes from the way it trains a putative MF system (something explicitly modelled in Mattar & Daw, 2018, and a new preprint from Antonov et al., 2021, referencing data from Eldar et al., 2020) – and consider what happens if this is boosted by DA manipulations. The difference between the standard two-step task and the present task is that in our task there is extra work for the MB system in order to perform inference so as to resolve uncertainty for MFCA. We later suggest that the anticorrelation we found between the effect of DA on MB influence over choice and MB guidance of MFCA arises from this extra work.

      The broader questions raised about (prefrontal) working memory and (hippocampal) replay pertain to recent and ongoing work, and we feel this should be part of the discussion, which we have rewritten to detail more clearly the different possible mechanistic explanations, pointing to how they might be tested in the future (p. 23/24).

      A second issue is that the critical drug effects seem somewhat marginally significant and the key plots (e.g. Fig3b and Fig 44b,c, but also other plots) do not visualize relevant variability in the drug effect. I would recommend plotting differences between LDopa and placebo, allowing readers to appreciate the relevant individual variability in the drug effects.

      We have now replotted the data in the new Figures 4 and 5 to reflect drug-related variability.

      Third, I do wonder how to reconcile the lack of a drug x common reward effect (the lack of a dopamine effect on MF learning) as well as the lack of a drug effect on choice generalization with the long literature on dopamine and MF reinforcement and newer literature on dopamine effects on MB learning and inference. The authors mention this in the discussion, but do not provide an account. Can they elaborate on what makes these pure MB and MF metrics here less sensitive than in various other studies, and/or what are the implications of the lack of these effects for our understanding of dopamine's contributions to learning?

      Regarding a lack of a drug effect on MF learning or control, we now elaborate on this on p. 22/23:

      “With respect to our current task, and an established two-step task designed to dissociate MF and MB influences (Daw et al., 2011), there is as yet no compelling evidence for a direct impact of DA on MF learning or control (Deserno et al., 2015a; Kroemer et al., 2019; Sharp et al., 2016; Wunderlich et al., 2012). A commonality of our novel and the two-step task is dynamically changing reward contingencies. As MF learning is by definition incremental, slowly accumulating reward value over extended time-periods, it follows that dynamic reward schedules may lessen a sensitivity to detect changes in MF processes (see Doll et al., 2016 for discussion). In line with this, experiments in humans indicate that value-based choices performed without feedback-based learning (for reviews see, Maia & Frank, 2011; Collins and Frank, 2014), as well as learning in stable environments (Pessiglione et al., 2006), are susceptible to DA drug influences (or genetic proxies thereof) as expected under an MF RL account. Thus, the impact of DA boosting agents may vary as a function of contextual task demands. This resonates with features of our pharmacological manipulation using levodopa, which impacts primarily on presynaptic synthesis. Thus, instead of necessarily directly altering phasic DA release, levodopa impacts on baseline storage (Kumakura and Cumming, 2009), likely reflected in overall DA tone. DA tone is proposed to encode average environmental reward rate (Mohebi et al., 2019; Niv et al., 2007), a putative environmental summary statistic that might in turn impact an arbitration between behavioural control strategies according to environmental demands (Cools, 2019).”

      As pointed out by the reviewer as well, in the present task we did not find an effect of levodopa on MB influences per se and now discuss this on p. 22:

      “In this context, a primary drug effect on prefrontal DA might result in a boosting of purely MB influences. However, we found no such influence at a group level – unlike that seen previously in tasks that used only a single measure of MB influences (Sharpe et al., 2017; Wunderlich et al., 2012). Our novel task systematically separates two MB processes: a guidance of MFCA by MB inference and pure MB control. While we found that only one of these, namely guidance of MFCA by MB inference, was sensitive to enhancement of DA levels at a group level, we did detect a negative correlation between the DA drug effects on MB guidance of MFCA and on pure MBCA. One explanation is that a DA-dependent enhancement in pure MB influences was masked by this boosting in the guidance of MFCA by MB inference. In this regard, our data is suggestive of between-subject heterogeneity in the effects of boosting DA on distinct aspects of MB influences.”

      Another open question remains as to why different task conditions (guidance of MFCA by MB vs. pure MB control) apparently differ in their sensitivity to the drug manipulation. We discuss this (p. 22) by proposing that a cost-benefit trade-off might play an important role (Westbrook et al., 2020).

      Fourth, the correlation with WM and drug effect on preferential MBCA for non-informative but not informative destination is really quite small, and while I understand that WM should be associated with preferential MBCA under placebo, it does not become clear what makes the authors predict specifically that WM predicts a dopa effect on this metric, rather than the metric taken under placebo, for example.

      Our initial reasoning was that MFCA based on reward at the non-informative destination should be particularly sensitive to WM, on the basis that the reward is no longer perceptually available once state uncertainty can be resolved by the MB system. However, we agree with the reviewer that this reasoning does not indicate why it should specifically affect the drug-induced change. In light of this critique, we have removed this part from the abstract, introduction and the main results but still report this relation to WM in Appendix 1 (p. 44/45, subheading “Drug effect on guidance of MFCA and working memory”, Appendix 1 - Figure 11) as an exploratory analysis as suggested in the editor’s summary.

      A fifth issue is that I am not quite convinced about the negative link between dopamine's effects on MBCA and on PMFCA. The rationale for including WM, informativeness as well as DA effects on MBCA in the model of DA effects on PMFCA wasn't clear to me. The reported correlation is statistically quite marginal, and given that it was probably not the first one tested and given the multiple factors involved, I am somewhat concerned about the degree to which this reflects overfitting. I also find the pattern of effects rather difficult to make sense of: in high WM individuals, the drug-effects on PMFCA and MBCA are negatively related for informative and non-informative destinations. In low WM individuals, the drug-effects on PMFCA and MBCA are negatively related for informative, but not non-informative destinations. It is unclear to me how this pattern leads to the conclusion that there is a tradeoff between PMFCA and MBCA. And even if so, why would this be the case? It would be relevant to report the simple effects, that is the pattern of correlations under placebo separately from those under ldopa.

      The reviewer’s critique is well taken. In connection with the working memory finding reported in the previous section of the initial manuscript, we reasoned that it would be necessary to include WM in the model as well. We still consider this analysis of inter-individual differences in drug effects across task conditions important because it connects our current work to previous work linking DA to MB control. However, we now perform a simplified analysis in which we leave out WM and instead average PMFCA across informative and non-informative destinations (since we had no prior hypothesis that these conditions should differ, p. 19/20). This results in a significant negative correlation between the drug-related changes in average PMFCA and in MB control (Figure 6A, r=-.31,p=.02 Pearson r=-.30, p=.017, Spearman r=-.33, p=.009). In addition, we also ran extended simulations to verify that this negative correlation does not result from correlations among model parameters (see Appendix 1 - Figure 10 for a control analysis verifying that this negative correlation survives control for parameter trade-off).

      Figure 6. Inter-individual differences in drug effects in MBCA and in preferential MFCA, averaged across informative and non-informative destinations (aPMFCA). A) Scatter plot of the drug effects (levodopa minus placebo; ∆ aPMFCA, ∆ MBCA). Dashed regression line and r Pearson correlation coefficient. B) Drug effects in credit assignment (∆ CA) based on a median on ∆ MBCA. Error bars correspond to SEM reflecting variability between participants.

      As suggested by the reviewer, we unpack this correlation further (p. 19/20) by taking the median on Δ MBCA (-0.019) and split the sample in lower/higher median groups. The higher median group showed a positive (M= 0.197, t(30)= 4.934, p<.001) and the lower-median group showed a negative (M= -0.267, t(30)= -7.97, p<.001) drug effect on MBCA, respectively (Figure 6B). In a mixed effects model (see Methods), we regressed aPMFCA against drug and a group indicator of lower/higher median Δ MBCA groups. This revealed a significant drug x Δ MBCA-split interaction (b=-0.17, t(120)=-2.05, p=0.042). In the negative Δ MBCA group (Figure 6B), a significantly positive drug effect on aPMFCA was detected (simple effect: b=.18, F(120,1)=10.35, p=.002) while in the positive Δ MBCA group a drug-dependent change in aPMFCA was not significant (Figure 6B, simple effect: b=.02, F(120,1)=0.10, p=.749).
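
      The median split described above can be sketched as follows. This is a minimal illustration under stated assumptions: the per-participant values are invented placeholders, the helper names `median_split` and `group_means` are hypothetical, values equal to the median are assigned to the lower group by convention, and the plain group means stand in for (but do not replace) the mixed effects model reported in the response.

```python
import statistics

def median_split(values):
    """Return the median and a 'higher'/'lower' label for each value."""
    med = statistics.median(values)
    labels = ["higher" if v > med else "lower" for v in values]
    return med, labels

def group_means(outcome, labels):
    """Mean of the outcome within each median-split group."""
    means = {}
    for label in sorted(set(labels)):
        members = [o for o, l in zip(outcome, labels) if l == label]
        means[label] = statistics.mean(members)
    return means

# Placeholder drug effects (levodopa minus placebo) per participant, NOT real data.
delta_mbca = [-0.3, -0.1, 0.0, 0.2, 0.4, -0.2]
delta_apmfca = [0.25, 0.15, 0.05, 0.0, -0.05, 0.2]

median_value, split_labels = median_split(delta_mbca)
apmfca_by_group = group_means(delta_apmfca, split_labels)

print(median_value, split_labels, apmfca_by_group)
```

      In the response, the split variable is Δ MBCA and the outcome is the drug effect on aPMFCA; the interaction test itself requires the mixed effects model described in the Methods.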

      We have changed the respective section of the results accordingly (p. 19/20). Further, we have motivated this exploratory analysis more clearly in the introduction (p. 3/4) in terms of it providing a link to previous relevant studies (Deserno et al., 2015a; Groman et al., 2019; Sharp et al., 2016; Wunderlich et al., 2012). Lastly, we have endeavoured to improve the discussion on this (p. 21/22).

      More generally I would recommend that the authors refrain from putting too much emphasis on these between-subject correlations. Simple power calculation indicates that the sample size one would need to detect a realistically small to medium between-subject effect (that interacts with all kinds of within-subject factors) is in any case much larger than the sample size in this study.

      We agree with this and have, as mentioned above, substantially adjusted the section on inter-individual differences. We have moved the WM analysis to Appendix 1 (p. 44/45, subheading “Drug effect on guidance of MFCA and working memory”, Appendix 1 - Figure 11) and greatly simplified the analysis of inter-individual differences in drug effects (see previous paragraph). We also mention the overall small to moderate effects in the limitations section (p. 25/26).

      Another question is how worried should we be that the critical MB guidance of MFCA effect was not observed under placebo (Figure 3b)? I realize that the computational model-based analyses do speak to this issue, but here I had some questions too. Are the results from the model-informed and model-agnostic analyses otherwise consistent? Model-agnostic analyses reveal a greater effect of LDopa on informative destination for the ghost-nominated than the ghost-rejected trials and no effect for noninformative destination. Conversely model-informed analyses reveal a nomination effect of ldopa across informative and noninformative trials. This was not addressed, or am I missing something? In fact, regarding the modeling, I am not the best person to evaluate the details of the model comparison, fitting and recovery procedures, but the question that does rise is, and I would make explicit in the current paper how does this model space, the winning model and the modeling exercise differ (or not) from that in the previous paper by Moran et al without LDopa administration.

      A detailed response to this was provided in reply to point 6 as summarized by the editor. We provide a summary here as well.

      Firstly, we clearly indicate discrepancies between our model-agnostic and computational modelling analyses and acknowledge that such discrepancies may be expected when effects of interest are weak to moderate (p. 25/26, limitations).

      Secondly, the results from the computational model are generally statistically stronger, which is not surprising given that they are based on influences from far more trials. We now include a discussion of this in more detail in the section on limitations (p. 25/26).

      Thirdly, although the computational model uses a slightly different parameterization from that reported in Moran et al. (2019), it is a formal extension of that model, allowing the strength of effects for informative and uninformative destinations to differ. We now include a reference to this change in parameterization in the limitation section (p. 25/26), and include a more detailed description in Appendix 1 (p. 45-47).

      Finally, to test if the current models support our main conclusion from Moran et al. (2019) that retrospective MB inference guides MFCA for both the informative and non-informative destinations, we reanalysed the Moran et al. (2019) data using the current novel models and found converging support, as we now report (Appendix 1 – Figure 8).

      Finally, the general story that dopamine boosts model-based instruction about what the model-free system should learn is reminiscent of the previous work showing that prefrontal dopamine alters instruction biasing of reinforcement learning (Doll and Frank) and I would have thought this might deserve a little more attention, earlier on in the intro.

      The reviewer is indeed correct and we now reference this line of work (Doll et al., 2009, 2011) in the intro (p. 4).

    1. Author Response

      Reviewer #1 (Public Review):

      While the mechanism about arm-races between plant and specialist herbivores has been studied, such as detoxification of specific secondary metabolites, the mechanism of the wider diet breadth, so-called generalist herbivores have been less studied. Since the heterogeneity of host plant species, the experimental validation of phylogenetic generalism of herbivores seemed as hard to be conducted. The authors declared the two major hypotheses about the large diet breadth ("metabolic generalism" and "multi-host metabolic specialism"), and carefully designed the experiment using Drosophila suzukii as a model herbivore species.

      By an untargeted metabolomics approach using UHPLC-MS, authors attempted to falsify the hypotheses both in qualitative- and quantitative metabolomic profiles. Intersections of four fruit (puree) samples and each diet-based fly individual samples from the qualitative data revealed that there were few ions that occur as the specific metabolite in each diet-based fly group, which could reject the "multi-host metabolic specialism" hypothesis. Quantitative data also showed results that could support the "metabolic generalism" hypothesis. Therefore, the wide diet breadth of D. suzukii seemed to be derived from the general metabolism rather than the adaptive traits of the diverse host plant species. On the other hand, the reduction of the metabolites (ions) set using GLM seemed logical and 2-D clustering from the reduced ions set showed that quantitative aspects of diet-associated ions could classify "what the flies ate". These interesting results could enhance the understanding of the diet breadth (niche) of herbivorous insects.

      The authors' approach seemed clear to falsify the hypotheses based on the appropriate data processing. The intersection of shared ions from the qualitative dataset could distinguish the diet-specific metabolites in flies and commonly occurring metabolites among flies and/or fruits. Also, filtering on the diet-specific ions seemed to be a logical and appropriate way. Meanwhile, the discussion about the results seemed to be focused on different points regarding the research hypotheses which were raised in the introduction part. Discussion about the results mainly focused on the metabolism of D. suzukii itself, rather than the research hypotheses and questions that were raised from the evolution of the wide diet breadth of generalist herbivores. In particular, the conclusion seems to be far from the main context of the authors' research; e.g. frugivory. It makes the implication of the study weaker.

      We wish to thank Reviewer #1 for their appreciation of our study. As recommended, we now focus our discussion more on the general aspect of our findings (relevant to insects, herbivores, or frugivores), and less on the peculiarities of the metabolism of D. suzukii itself. Specifically, we now only mention D. suzukii in one section (two sentences) of our Discussion, to serve as an example (l.387-396). Thanks to this comment, the Discussion may interest a broader readership, on the evolution of diet breadth in generalist herbivorous species and offers a better understanding of the general implications of our findings.

      Reviewer #2 (Public Review):

      The manuscript: "Metabolic consequences of various fruit-based diets in a generalist insect species" by Olazcuaga et al., addresses an interesting question. Using an untargeted metabolomics approach, the authors study how diet generalism may have evolved versus diet specialization which is generally more commonly observed, at least in drosophila species. Using the phytophagous species Drosophila suzukii, and by directly comparing the metabolomes of fruit purees and the flies that fed on them, the authors found evidence for "metabolic generalism". Metabolic generalism means that individuals of a generalist species process all types of diet in a similar way, which is in contrast to "multi-host metabolic specialism" which entails the use of specific pathways to metabolize unique compounds of different diets. The authors find strong evidence for the first hypothesis, as they could easily detect the signature of each fruit diet in the flies. The authors then go on to speculate on the evolutionary ramifications of this for how potentially diet specializations may have evolved from diet generalism. Overall, the paper is well written, the experiments well documented, and the conclusions convincing.

      We thank Reviewer #2 for their comments and appreciation of our work.

      Reviewer #3 (Public Review):

      Laure Olazcuaga et al. investigated the metabolomes of four fruit-based diets and corresponding individuals of Drosophila suzukii that reared on them using comparative metabolomics analysis. They observed that the four fruit-based diets are metabolically dissimilar. On the contrary, flies that fed on them are mostly similar in their metabolic response. From a quantitative point of view, they find that part of the fly metabolomes correlates well with that of the corresponding diet metabolomes, which is indicative of insect ingestive history. By further focusing on 71 metabolites derived from diet-specific fly ions and highly abundant fruit ions, the authors show that D. suzukii differentially accumulates diet metabolism in a compound-specific manner. The authors claim that the data support the metabolic generalism hypothesis while rejecting the multi-host metabolic specialism hypothesis. This study provides a valuable global chemical comparison of how diverse diet metabolites are processed by a generalist insect species.

      Strengths:

      The rapid advances in high-resolution mass spectrometry have recently accelerated the discovery of many novel post-ingestive compounds through comparative metabolomics analysis of insect/frass and plant samples. Untargeted metabolomics is thus a very powerful approach for the systematic comparison of global chemical shifts when diverse plant-derived specialized metabolites are further modified or quantitatively metabolized after ingestion by insects. The technique can be readily extended to a larger micro- or macro-evolutionary context for both generalist and specialist insects to systematically investigate how plant chemical diversity contributes to dietary generalism and specialism.

      We would like to thank Reviewer #3 for their insightful comments on the power of untargeted metabolomics to evaluate the fate of plant metabolites and their use by herbivores. We also agree that these techniques can be used to tackle eco-evolutionary issues, such as the origin and maintenance of dietary generalism and specialism here. We hope that our study will inspire other researchers to explore such techniques and experiments to gain a global overview of biochemistry fluxes and their evolution. We now mention it in the conclusion (L454-459).

      Weaknesses:

      The authors claim that their data support the hypothesis of metabolic generalism, however, a total analysis of insect metabolism may not generate a clean dataset for direct comparison of fruit-derived metabolites with those metabolized by D. suzukii, given that much of these metabolites would be "diluted" proportionally by insect-derived metabolites. If the insect-derived metabolites predominate, then, as the authors observed, a tight clustering of D. suzukii metabolomes in the PCA plot would be expected. It is therefore very difficult to interpret these patterns.

      We agree with Reviewer #3 that a careful examination of the different possible origins of metabolites should take place to distinguish between our two competing hypotheses.

      The only source of metabolites for insects in our experimental setup is a mixture of (i) a large proportion of fruit purees and (ii) a minor proportion of artificial medium consisting mainly of yeast. Our goal is thus to understand the fate of (i) “fruit-derived” metabolites (transformed and untransformed), while controlling for (ii) “artificial media-derived” metabolites, that constitute a nuisance signal but are necessary for a complete development in our system.

      By “fruit-derived” and “insect-derived” metabolites, it is our understanding that Reviewer #3 means “fruit” metabolites (when in insects, untransformed “fruit-derived” metabolites) and “artificial medium-derived” metabolites. It is true that we do wish to avoid a predominance of “artificial medium-derived” metabolites and focus on “fruit-derived” metabolites in insects. We also want to note that it is of primary importance in our study to distinguish between “fruit” metabolites that are carried as is (“fruit” metabolites present in insects, ie untransformed “fruit-derived” metabolites), and “fruit” metabolites that are used after transformation by the insect (i.e., transformed “fruit-derived” metabolites).

      We agree with Reviewer #3 that the presence of “artificial medium-derived” metabolites could be problematic in direct comparisons of fruits and insects (and not among fruits or among insects’ comparisons).

      However, we took some steps to avoid such problems:

      1. We included control fly samples in our experiment: at each experimental generation, flies developed only on artificial medium (without fruit puree) were collected and processed simultaneously with flies that developed on fruit media. Results using these artificial medium-reared flies as controls (by subtracting their ions levels and removing ions that were similar, respective of their generation) were similar to results using raw data and conclusions were identical (see below).

      2. We lowered the proportion of artificial medium in our fruit media so that it was kept to a minimum, compatible with larval development and adult survival.

      Consistent with the low impact of this “artificial medium” component on our conclusions, we also wish to point out the presence pattern of metabolites found only in flies and never in fruits when using raw data (Figure 3, yellow stack). Even in the most conservative hypothesis of 100% of these metabolites originating from our artificial medium (which is probably not the case), we observe that it constitutes only a minor proportion of metabolites common to all flies (15.7%).

      For your consideration, we include below the main Figures, using both raw data and artificial medium-controlled:

      Figure 2, left = raw data; right = artificial-media controlled:

      Figure 3, left = raw data; right = artificial-media controlled:

      Figure 3S1, left = raw data; right = artificial-media controlled:

      Figure 4, above = raw data; below = artificial-media controlled:

      We hope that we convinced the Editor/Reviewers that raw data and artificial-medium controlled data provide a single and same answer to all our analyses. We chose to present only raw data, to simplify the Materials & Methods section.

      We have, however, modified the current version of the manuscript to inform the reader that proper controls were done and that their inclusion does not modify any of our conclusions (l.110-113 and l.583-589).

      We also wish to point out three additional comments:

      • As Reviewer #1 also recommended, we modified the expectations drawn in Fig1G to better consider the general comment of “insect derived” metabolites being fundamentally different from plant metabolites (even if we do show in our study that only approx. 9% of metabolites are private to flies).

      • The main part of our care in the use of this global PCA analysis is that it follows two other analyses (global intersection and comparison of intersections among fruits and among flies) and precedes another one (fly-focused PCA). We hope that all these analyses help the readers get a comprehensive overview of the dataset and associated results, avoiding reliance on a single analysis.

      • We also help readers to explore and visualize all analyses presented in our manuscript by setting up a shiny application (in addition to our available dataset and R code), at https://fruitfliesmetabo.shinyapps.io/shiny/. This is now mentioned in the main text (l.588-589).

      We thank the Reviewer for their comment that greatly improved the manuscript.

      The authors generated a qualitative dataset using the peak list produced by XCMS which contains quantitative peak areas, it is unclear how the threshold was selected to determine if a peak is present or absent in a given sample. The qualitative dataset would influence the output of their data analysis.

      The referee is right in pointing out that the threshold used to determine if a peak is present or absent in a given sample was not clearly specified. This has now been corrected in the “Host use” section of the Materials & Methods (l.513-516). Briefly, a given replicate of a compound was considered present if the corresponding peak area following XCMS quantification was > 1000. This threshold was selected to be close to the practical quantification threshold of the Thermo Exactive mass spectrometer used in this study, so as to allow the quantification of low-abundance compounds, as many plant-derived diet compounds were expected to be present in trace amounts in flies. We additionally applied a stringent rule for presence of any given compound (presence in at least 3 biological replicates).
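
      The presence rule described above can be sketched in a few lines. This is only an illustration of the stated rule (peak area > 1000 in at least 3 biological replicates); the function name, argument names, and example peak areas are hypothetical and do not come from the authors' R code.

```python
AREA_THRESHOLD = 1000.0  # peak-area cutoff stated in the response
MIN_REPLICATES = 3       # minimum number of biological replicates above the cutoff


def is_present(peak_areas, area_threshold=AREA_THRESHOLD, min_replicates=MIN_REPLICATES):
    """Return True if enough replicates have a peak area strictly above the cutoff."""
    n_above = sum(1 for area in peak_areas if area > area_threshold)
    return n_above >= min_replicates


# Hypothetical peak areas for one ion across four replicates.
clearly_present = is_present([1500.0, 900.0, 2000.0, 1200.0])
borderline = is_present([1500.0, 900.0, 2000.0, 1000.0])
```

      In the second call only two replicates exceed the cutoff, because an area of exactly 1000 does not satisfy the strict inequality, so the ion is scored as absent.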

      The authors rely on in-source fragmentation for peak annotation when authentic standards are not available. The accuracy of the annotation thus requires further validation.

      The Supplementary Table 1 was unfortunately omitted in the first submission of the manuscript. This oversight has been now corrected and the Supplementary Table 1 details all information used for metabolite annotation. In particular, MS/MS data comparison with mass spectral databases as well as with published literature have been added to substantiate metabolite identifications. This MS/MS data was produced thanks to the comment of the Reviewer. We also provide four more annotations from standards to attain 30 / 71 identifications validated through chemical standards.

    1. Author Response:

      Reviewer #1 (Public Review):

      This manuscript elegantly demonstrates that the degradation of PTPN14 by human papillomavirus (HPV) 16 and 18 E7 proteins previously reported by the authors is essential for E7-mediated YAP1 activation. This is important for E7-mediated maintenance of basal cell state and presumably persistence of HPV infection. The authors use a series of innovative tissue models combined with validation in clinical samples to demonstrate the importance of YAP1 activation in high-risk HPV pathogenesis.

      The data are of high quality with excellent controls. The manuscript is well-written and the rationale of each experiment easy to follow. In general the results support the authors conclusions. I have the following suggestion to improve the manuscript: The enhanced nuclear expression of YAP in the basal cells of epithelia expressing HPV16/18 E7 is difficult to see in the low resolution IF images shown. The magnified images do show enhanced expression compared to HFK cultures, but to remove any bias in selection of enhanced areas, could the authors include quantification of the distribution of IF signal in the basal cells, compared to the suprabasal cells, of the epithelia shown with statistical analysis? Figure 2 would also benefit from quantification as described above.

      We appreciate the positive feedback and constructive suggestions from Reviewer #1. We used widefield images with the goal of presenting as many cells in organotypic cultures as possible, but at low magnification. We have further analyzed the imaging data and updated the manuscript as follows:

      1) We assessed YAP1 intensity in basal and suprabasal layers as suggested by the reviewer. Consistent with literature reports, YAP1 is expressed predominantly in basal cells in each of our organotypic cultures, independent of E7 status (see figure below).

      2) Because YAP1 is always more highly expressed in basal cells than in suprabasal cells and YAP1 is regulated at the level of nuclear/cytoplasmic localization, we anticipated that quantification of YAP1 nuclear localization in our organotypic cultures may be more useful to readers than basal/suprabasal quantification.

      Consequently, we conducted classification-based analyses to quantify YAP1 nuclear localization (a surrogate for YAP1 activity) in the cultures. Each image to be analyzed was deidentified and assigned a coded name. Each cell in the basal layer was then classified as having either predominantly nuclear YAP1 staining, predominantly cytoplasmic YAP1 staining, or YAP1 staining that is comparably distributed between the nucleus and cytoplasm. At least three fields were analyzed per raft. We assessed YAP1 localization in 8,323 cells (average 378.3 cells/culture shown in the text for almost all cultures). The quantifications are now included in Figure 1-figure supplement 2C-E, Figure 1-Figure supplement 5A-C, and Figure 2-figure supplement 1D-F.

      The new quantifications do not change our interpretations of the results nor our conclusion that HPV E7 degrades PTPN14 to activate YAP1 in basal cells. We noted that HPV E6 may promote YAP1 nuclear localization to some degree and have updated the text accordingly.

      Reviewer #2 (Public Review):

      Strengths: A major strength of this report is the use of several different technical approaches, the results from which converge to provide several types of data supporting their conclusions. These various techniques include genetic knockdown/overexpression in primary keratinocytes, organotypic raft cultures, laser-capture microdissection, cell fate monitoring assays, and analysis of publicly available datasets. The manuscript is well-written and the figures are well-made. Weaknesses: Overall, there are only a few minor weaknesses related to figure quality and presentation (which will be conveyed in the private recommendations to the authors).

      We appreciate the positive feedback and these thoughtful comments from reviewer #2.

      Are claims/conclusions justified by data? Overall, the authors' conclusions are adequately justified by the data. However, there were a few interpretations I felt were somewhat overstated given the experiments performed and data provided. 1. The first issue relates to the interpretation/conclusion of the results from experiments analyzing basal cell number. In Figure 2, the basal cell number was indeed reduced in R84S compared to WT E7. However, it was not reduced to parental HFK levels, suggesting other E7 activities are involved in increasing basal cell number. A similar observation is presented in Figure 7 (E-F), where the R84S E7 mutant still had significantly higher basal cell retention than the empty vector control, albeit lower than WT E7. While their data certainly indicates that the binding and subsequent degradation of PTPN14 is an E7 function important to increasing basal cell number and retention, there are clearly other E7 functions involved. While the authors don't necessarily overinterpret these findings, the possibility that other E7 functions are involved is not explicitly acknowledged or explored in the Discussion.

      Indeed, cells expressing HPV18 E7 R84S retain some capacity to increase basal cell number (Figure 2) and promote basal cell retention (Figure 7). It is possible that an activity of HPV E7 in addition to PTPN14 degradation influences these phenotypes. HPV18 E7 R84S retains the capacity to bind and degrade RB1 (Hatterschide et al., 2020). The basal cells in the HPV18 E7 R84S cell fate experiment were predominantly found in clusters indicative of possible clonal expansion. We hypothesize that such clusters reflect proliferation induced by RB1 inactivation and cause the ratio of basal to suprabasal cells to remain high even in the R84S mutant condition. Our hypothesis is now described in further detail in the text.

      2. The second issue pertains to the findings related to the effect on differentiation upon modulation of key Hippo pathway components (Figure 4). It does not appear that the authors performed these studies in the presence of any well-known stimuli that induce the differentiation process in keratinocytes grown in 2D culture (high calcium, high serum, etc) nor did they use these cells in organotypic rafts wherein differentiation occurs during the raft stratification process. This is particularly true in the studies exploring PTPN14 plus LATS1/2 silencing and the effect on repression of keratinocyte differentiation. Whereas it seems PTPN14 itself was serving as the differentiation stimulus in earlier experiments (Figure 4C/D), it does not appear any differentiation stimuli were provided in the experiments shown in Figures 4E-I. For these reasons, the interpretation drawn by the authors that "...inactivation of three different YAP1 inhibitors dampens differentiation gene expression" (Line 220-221) and "inactivation of LATS1 or LATS2...also repressed differentiation genes" (Lines 349-350) seems specific to endogenous levels of differentiation genes. It seems difficult to conclude that inactivation of the Hippo pathway is actively repressing the induction of differentiation if the cells are not being treated with stimuli to induce differentiation.

      Indeed, no differentiation stimuli were used in these experiments. We previously observed that PTPN14 knockout or E7 expression reduced differentiation gene expression both in undifferentiated cells and in cells stimulated to differentiate (Hatterschide et al., 2019, 2020). We anticipate that gene expression in unstimulated cells is reflective of gene expression in cells stimulated to differentiate. We altered the results and discussion text to emphasize that the experiment measures differentiation gene expression in unstimulated cells.

    1. Author Response:

      Reviewer #1 (Public Review):

      This paper provides experimental and modeling analysis of the inter-brain coupling of socially interacting bats, and reports that coordinated brain activity evolves at a slower time scale than the activity describing the differences. Specifically, the paper finds that there is an attracting submanifold corresponding to the mean (or "common mode") of neural activity, and that the dynamics in the orthogonal eigenmode, corresponding to the difference in brain activity, decays rapidly. These rapid decays in the difference mode are referred to as "catch up" activity.

      There are two main findings:

      1) Neural activity (especially higher frequency LFP activity in the 30-150Hz range) is modulated by social context. Specifically, the ratio of the averaged, moment-to-moment MEAN:DIFF ratio is much higher when the bats are in a single chamber, clearly indicating that the animals are coordinating their neural activity. This change also seems to hold -- although not as striking -- in lower-frequency LFP and spiking activity.

      2) The time scales of the mean vs. difference dynamics are segregated: the "difference dynamics" evolve at a faster time scale than "similarity dynamics", seems to be well supported.

      The basic finding is presented in Figure 1. The rest of the paper is focused on a modeling study to garner further insight into the dynamics.

      Weaknesses:

      This is an entirely phenomenological paper, and while it claims to garner "mechanistic insight", it is unclear what that means.

      We regret not clarifying sufficiently what we meant by “mechanistic insight.” The insight is the following: functional across-brain coupling acts as positive feedback to the mean component of neural activity, which amplifies it and slows it down; at the same time, it acts as negative feedback to the difference component, which suppresses it and speeds it up. Thus, findings (1) and (2) in the reviewer’s summary above can be explained by the same model mechanism. As the reviewer pointed out below, the details of the model are complex, which could have made the simple mechanism above opaque. Thus, we analyzed two simplified versions of the model to make the mechanistic insight clear. This is detailed below in our response to the reviewer’s comment on model complexity.

      The basic idea of the model is simple and somewhat interesting, but the details are extremely complex. There are many examples of this, but the method used to "regress out" the behavior was very hard to interpret.

      The method for regressing out behavior was described in Materials and Methods section 3.10, and we regret having neglected to reference it in the main text. We now reference it at the first instance in the main text where this is relevant.

      On the face of it, the model is extremely simple: a two-state linear dynamical system. However, this simplistic description buries extreme complexity. The model is extremely complex, as it involves a large number of parameters (e.g., time switching 'b' values, the values of which are completely unclear), the switching over time of these parameters based on hand-scored animal behavioral state, and the complex mix of Markovian and linear dynamical systems theoretic results.

      As the reviewer pointed out, the core of the model is very simple: a linear dynamical system that models neural activity coupling. The model mechanism of positive and negative feedback, which is responsible for reproducing the two experimental results summarized by the reviewer above, is contained in this core (see Materials and Methods section 3.7 for details). On top of this, the model has a layer of complexity, involving a Markov chain model of behavior and a large number of behavioral parameters. This layer of complexity is independent from the feedback mechanism of the core of the model. Thus, while it makes the model more biologically realistic, it is not required to reproduce the two main experimental results. To explicitly show this, and to better understand the dependence of model behavior on its parameters, we analyzed two reduced versions of the model. The first reduced model replaces the behavioral inputs with white noise. The original model is a linear dynamical system in which a is neural activity, a coupling matrix sets the functional coupling, b is behavioral modulation, and τ is a time constant. b is where the complexity lies, as it is simulated using a Markov chain and involves many parameters. To strip away this layer of complexity, we replaced b with noise having a simple structure, namely, the mean and difference components of b having identical, flat power spectra. Importantly, this noise input does not induce correlation between bats, and it amounts to inputs of the same magnitude and same timescales to the mean and difference components of a. The resulting reduced model has only two parameters, the functional self-coupling C_S and functional across-brain coupling C_I (for simplicity, τ can be absorbed into the other parameters). We are interested in the two results the reviewer summarized above: (1) the mean component of neural activity having a larger variance than the difference component; (2) the mean component having a slower timescale than the difference component. In the manuscript, these are respectively quantified using the variance ratio and the power spectral centroid ratio of the mean and difference components. The reduced model allowed us to derive analytical expressions for these two quantities (see Materials and Methods section 3.8 for details). We found that both the variance ratio (mean variance divided by difference variance) and the centroid ratio (mean centroid divided by difference centroid) have a very simple dependence on the functional coupling parameters.
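
      As a hedged illustration of how such a dependence can arise, the worked equations below assume a standard form for the two-bat linear system. The specific equation, the symmetric coupling matrix, and the definitions of the mean and difference components are assumptions made for this sketch and are not shown in the text of this response; the exact expressions are those of Materials and Methods section 3.8.

```latex
% Assumed form of the core model (not taken from the response text):
\tau \frac{d\mathbf{a}}{dt} = -\mathbf{a} + C\,\mathbf{a} + \mathbf{b},
\qquad
\mathbf{a} = \begin{pmatrix} a_1 \\ a_2 \end{pmatrix},
\qquad
C = \begin{pmatrix} C_S & C_I \\ C_I & C_S \end{pmatrix}.

% Mean and difference components decouple:
a_M = \tfrac{1}{2}(a_1 + a_2), \qquad a_D = \tfrac{1}{2}(a_1 - a_2),

\tau \frac{da_M}{dt} = -(1 - C_S - C_I)\,a_M + b_M,
\qquad
\tau \frac{da_D}{dt} = -(1 - C_S + C_I)\,a_D + b_D.

% With white-noise inputs b_M, b_D of equal intensity, each component is an
% Ornstein-Uhlenbeck process whose stationary variance is inversely
% proportional to its decay rate, so that
\frac{\operatorname{Var}(a_M)}{\operatorname{Var}(a_D)}
  = \frac{1 - C_S + C_I}{1 - C_S - C_I},
\qquad \text{valid for } 1 - C_S - |C_I| > 0 .
```

      Under these assumptions, a positive C_I lowers the decay rate of the mean component and raises that of the difference component, which is the qualitative pattern the response describes.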

      This parameter dependence is visualized below (note that the color maps are in log scale, and the white spaces are regions where the model is unstable).

      In the experimental data, the mean component had larger variance and lower power spectral centroid than the difference component. This corresponds to the parameter regime of (enclosed by dashed lines). Thus, a positive C_I acts as positive feedback to the mean component and negative feedback to the difference component, modulating their variance and timescales in opposite directions. This is consistent with the analysis of the original model in Materials and Methods section 3.7. In the revised manuscript, we’ve now added analysis of this reduced model to the Results section, and the above figure has been added as Figure 3I-J.
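      The qualitative behavior of the white-noise reduced model can be illustrated with a short simulation. This is only a sketch: it assumes a leaky linear form, tau * da/dt = -a + C a + noise, with a symmetric coupling matrix C = [[C_S, C_I], [C_I, C_S]] and independent white noise delivered to each bat. That exact form, the function names, and the parameter values (C_S = 0.2, C_I = 0.3) are illustrative assumptions and are not taken from the manuscript.

```python
import random


def simulate_reduced_model(c_s, c_i, tau=1.0, dt=0.01, n_steps=200_000, seed=0):
    """Euler-Maruyama simulation of two coupled units driven by white noise.

    Assumed form (not from the manuscript): tau * da/dt = -a + C a + noise,
    with C = [[c_s, c_i], [c_i, c_s]].
    """
    rng = random.Random(seed)
    a1 = a2 = 0.0
    mean_trace, diff_trace = [], []
    noise_scale = (dt ** 0.5) / tau
    for _ in range(n_steps):
        # Independent noise per bat: the mean and difference components
        # receive inputs of equal magnitude, with no shared drive.
        n1 = rng.gauss(0.0, 1.0)
        n2 = rng.gauss(0.0, 1.0)
        da1 = (-a1 + c_s * a1 + c_i * a2) * dt / tau + noise_scale * n1
        da2 = (-a2 + c_s * a2 + c_i * a1) * dt / tau + noise_scale * n2
        a1 += da1
        a2 += da2
        mean_trace.append((a1 + a2) / 2)
        diff_trace.append((a1 - a2) / 2)
    return mean_trace, diff_trace


def variance(xs):
    mu = sum(xs) / len(xs)
    return sum((x - mu) ** 2 for x in xs) / len(xs)


def autocorrelation(xs, lag):
    """Normalized autocorrelation at a given lag (in samples)."""
    mu = sum(xs) / len(xs)
    num = sum((xs[t] - mu) * (xs[t + lag] - mu) for t in range(len(xs) - lag))
    den = sum((x - mu) ** 2 for x in xs)
    return num / den


# Illustrative parameters with positive across-brain coupling.
mean_trace, diff_trace = simulate_reduced_model(c_s=0.2, c_i=0.3)
variance_ratio = variance(mean_trace) / variance(diff_trace)
mean_ac = autocorrelation(mean_trace, lag=100)  # lag of one time unit
diff_ac = autocorrelation(diff_trace, lag=100)
```

      Under this assumed form, a positive C_I slows the decay of the mean component and speeds the decay of the difference component, so the simulated mean component should show both a larger variance and a higher autocorrelation at a fixed lag.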

      The reviewer has stated a concern regarding the large number of parameters that set the input level according to behavioral state (b_resting, b_(social grooming), b_fighting, etc.). These parameters are important for ensuring that the model outputs realistic levels of behaviorally modulated neural activity (discussed below in our reply regarding model fit), but they are not important for the main results on variance and timescales. To demonstrate this, we studied a second reduced model. This model is identical to our original model except that, for each simulation, each of the behavior parameters (b_fighting, etc.) was independently drawn from the uniform distribution from 0 to 1. Despite the completely random behavioral parameters, this reduced model reproduces the variance and timescales results just like the original model, as shown in the figure below (compare with Figure 3E-F).

      To summarize, the reduced models allowed us to identify the simple parameter dependence of the modeling results, and showed that the simple linear dynamical system at the core of the original model is sufficient to reproduce the two main experimental observations.

      Indeed, a fundamental weakness of the model is that the Markov chain is taken as an "input" to the 2-state linear systems model, as if somehow the neural state does not affect the state transitions.

      Yes, this is a limitation of our model. We’ve added a discussion of this limitation, as well as future directions for overcoming it, in the Discussion section. The reason we did not model neural control of behavioral transitions is that it is under-constrained by existing data. While the brain obviously controls behaviors, not every part of the brain controls every behavior. Of the 11 behaviors observed in this study, we do not know which of them are controlled by the bat frontal cortex, and we do not know how they might be controlled (i.e., what specific spatiotemporal activity patterns affect behaviors in what ways). Without this knowledge, it’s unclear how to implement neural control of behavior in the model. This knowledge requires perturbation studies (lesion, inactivation, or activity manipulation) to establish causal relationships from neural activity to specific behaviors in the bat, which will be an important future direction.

      On the other hand, as the reviewer stated, our model included behavioral modulation of neural activity. It is well known that in mammals, arousal and movement modulate neural activity globally across cortex (McGinley et al., 2015, Neuron). Thus, given that different behaviors in general involve different levels of arousal and movement, our model included behavior-dependent modulation of frontal cortical neural activity. Finally, for the reviewer’s convenience, we also quote below the paragraph addressing this issue in the revised Discussion. “Another limitation of our model is the “open-loop” nature of the relationship between behavior and neural activity. Specifically, we modeled neural activity as being modulated by behavior, but behavior was modeled using a Markov chain that is independent from the neural activity. In reality, neural activity and behavior form a closed-loop, with different social behaviors being controlled by the neural activity of specific neural populations in specific brain regions. Thus, an important future direction is to close the loop by incorporating neural control of social behaviors into models of the inter-brain relationship in bats. This will require future experimental studies to identify which frontal cortical regions and populations in bats are necessary or sufficient to control social behaviors, as well as the detailed causal relationship from neural activity to social behavior. Furthermore, as social interactions can occur at multiple timescales, it will be interesting to investigate how these are controlled by neural activity at different timescales, and how those timescales are shaped by functional across-brain coupling. In summary, such a closed-loop model will shed light on how inter-brain activity patterns and dynamic social interactions co-evolve and feedback onto each other.”

      Further, the Markov assumption is not rigorously tested.

      We have now tested the Markov assumption, using the following methods. We compared three models of bat behaviors: (1) the independent model, where the behavioral state at a given time point is independent from the state at other time points; (2) the 1st-order dependency model, where the behavioral state at a given time point depends on the state at the previous time point only; (3) the 2nd-order dependency model, where the behavioral state at a given time point depends on the states at the two previous time points. The Markov assumption corresponds to model (2), which is used as a part of the main model of the paper. Note that models with longer time-dependencies (≥3) were not tested because the number of parameters grows exponentially with model order and our dataset is not large enough to fit them.

      To compare the three models, we split the behavioral data into a training set and a test set, fitted each model on the training set (Laplace smoothing was used to avoid assigning zero probability to unobserved events), and calculated the log-likelihood of the test set under each model. The figure below shows the cross-validated likelihoods for the behavioral data of one-chamber (A) and two-chambers (B) sessions, which were fitted separately; circles and error bars are means and standard deviations across 100 random splits of the data into training and test sets.

      As the figure above shows, the 1st-order model had the highest likelihood on average. This does not necessarily prove that bat behavior obeys the Markov assumption (if we had a lot more data, we might be able to fit better 2nd-order and higher-order models). But this does mean that, given the amount of data we have, the best model that we can fit is the 1st-order Markov chain. Thus, this result supports our usage of the Markov chain in the main model of the paper. In the revised manuscript, the above figure is included as Figure 3—figure supplement 2A-B, and the analysis is described in Materials and Methods section 3.5.
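      The comparison of dependency orders described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function names are hypothetical, the data are a synthetic "sticky" first-order chain rather than bat behavior, and splitting by whole sequence (session) is an assumption, since the response does not say how the training and test sets were formed. All models are scored on the same time points so that their log-likelihoods are comparable.

```python
import math
import random
from collections import Counter, defaultdict


def fit_markov(sequences, order, n_states, alpha=1.0):
    """Return a function giving Laplace-smoothed P(state | previous `order` states)."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for t in range(order, len(seq)):
            counts[tuple(seq[t - order:t])][seq[t]] += 1

    def prob(context, state):
        seen = counts.get(context, Counter())
        # Add-alpha smoothing keeps unseen transitions at nonzero probability.
        return (seen[state] + alpha) / (sum(seen.values()) + alpha * n_states)

    return prob


def log_likelihood(prob, sequences, order, start):
    """Sum of log P(state_t | context) for t >= start in every sequence."""
    total = 0.0
    for seq in sequences:
        for t in range(start, len(seq)):
            total += math.log(prob(tuple(seq[t - order:t]), seq[t]))
    return total


def cross_validated_ll(sequences, n_states, orders=(0, 1, 2), n_splits=20, seed=0):
    """Mean held-out log-likelihood per model order over random half splits."""
    rng = random.Random(seed)
    start = max(orders)  # score identical time points for every order
    results = {order: [] for order in orders}
    for _ in range(n_splits):
        shuffled = sequences[:]
        rng.shuffle(shuffled)
        half = len(shuffled) // 2
        train, test = shuffled[:half], shuffled[half:]
        for order in orders:
            prob = fit_markov(train, order, n_states)
            results[order].append(log_likelihood(prob, test, order, start))
    return {order: sum(v) / len(v) for order, v in results.items()}


def sample_sticky_chain(length, n_states, stay, rng):
    """Synthetic first-order chain that stays in its state with probability `stay`."""
    state = rng.randrange(n_states)
    seq = [state]
    for _ in range(length - 1):
        if rng.random() >= stay:
            state = rng.choice([s for s in range(n_states) if s != state])
        seq.append(state)
    return seq


rng = random.Random(1)
toy_sessions = [sample_sticky_chain(200, 3, 0.8, rng) for _ in range(40)]
mean_ll = cross_validated_ll(toy_sessions, n_states=3)  # keys: 0, 1, 2
```

      Order 0 corresponds to the independent model, order 1 to the Markov chain, and order 2 to the 2nd-order dependency model. On data like these, the independent model should score clearly worse than the 1st-order model, while the 2nd-order model pays a penalty for estimating more parameters from the same amount of data.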

      No model selection or other model validation appears to be done.

      To evaluate model fit, we simulated our model using experimentally observed behaviors (rather than simulating behaviors using a Markov chain), and compared the simulated neural activity with the experimentally observed activity (see Materials and Methods section 3.6 for detailed procedures). The comparison for an example experimental session is shown below, where we’ve plotted the experimentally observed neural activity and behaviors for bat 1 (A) and bat 2 (B), along with the simulated neural activity. The correlation coefficients between data and model are indicated above each plot. These are representative examples, as the average correlation over all sessions and bats is 0.72 (standard deviation is 0.10). This figure was added to the revised manuscript as Figure 3—figure supplement 1.

      In evaluating model fit, we realized that the model in the original manuscript produced outputs with a DC offset different from that of the data. Thus, in the revised manuscript (including the figure above), we added one more behavioral parameter (b_constant) that adjusts the DC offset, which is a parameter that reflects the effect of a baseline arousal level on neural activity (Materials and Methods section 3.4). Note that, since the only effect of this parameter is to adjust the DC offset of neural activity, it does not change any of the results in the paper.

      In short, the model, while very interesting, is so complex that it is literally impossible to evaluate. The authors report literally no shortcomings of their model. They do not report parameter estimation methods. They do not report fitting errors or other model validation metrics. The only evaluation is whether it can produce certain outputs that are similar to biological data. While the latter is certainly important, all models are wrong, and it is essential to have a model simple enough to understand, both in terms of how it works and how it fails.

      The comments on the complexity of the model and on fitting errors have been addressed above. Regarding parameter estimation methods, they were described in Materials and Methods section 3.14, and we regret having neglected to directly reference it in the original manuscript. We now reference the section in the legend of Figure 3A which is the first place to introduce the parameters. Briefly, the behavioral parameters (b_resting, b_fighting, etc.) were simply chosen to be the average neural activity during the respective behaviors from the data; the other parameters were chosen by hand to roughly match the levels of activity from the data, keeping within the parameter regime of identified from the analyses. As we showed above, these parameters provide a reasonable fit to the data.

      The reason we chose the parameters heuristically in this way, rather than by minimizing some error objective, is the following. Our goal was to build a model that could qualitatively reproduce the experimental findings in a robust manner, that is, without fine-tuning of parameters. Thus, we analyzed the model to understand how model behaviors depend on the parameters, and to identify the parameter regime that reproduces the qualitative trends seen in the data (Figure 3I-J; Materials and Methods sections 3.7 and 3.8). Guided by these analyses, we chose parameters heuristically without algorithmic fine-tuning.

      Finally, following suggestions from reviewer 1 and reviewer 3, we have added discussions of shortcomings of the models (the last two paragraphs of the Discussion). With these discussions of model limitations, along with the presentation of simple insights into model mechanism from the reduced models above, we believe we have now presented a model that is “simple enough to understand, both in terms of how it works and how it fails.”

      In general, while the basic finding is fairly interesting, and the experiments and their findings are highly relevant to the field, the modeling and its explication fall short.

      It is not that it is wrong or bad; however, it is not clear that such a complex model increases our understanding beyond the experimental findings in Figure 1, and if it does, there has to be a major caveat that the model itself is not carefully vetted.

      Based on the reviewer’s comments on the model’s complexity, we have analyzed reduced versions of the model to understand its simple underlying mechanisms, as described above. This goes beyond the experimental findings in Figure 1, as it provides a computational mechanism that could give rise to those experimental findings. Moreover, based on the reviewer’s comments, we have more carefully vetted the model, by evaluating model fit and testing different behavioral models that do or do not assume the Markov property. Finally, we now discuss caveats of the model in the Discussion section, including the open-loop nature of the model as pointed out by the reviewer.

    1. Author Response

      Reviewer #1 (Public Review):

      Overall, the science is sound and interesting, and the results are clearly presented. However, the paper falls in-between describing a novel method and studying biology. As a consequence, it is a bit difficult to grasp the general flow, central story and focus point. The study does uncover several interesting phenomena, but none are really studied in much detail and the novel biological insight is therefore a bit limited and lost in the abundance of observations. Several interesting novel interactions are uncovered, in particular for the SPS sensor and GAPDH paralogs, but these are not followed up on in much detail. The same can be said for the more general observations, eg the fact that different types of mutations (missense vs nonsense) in different types of genes (essential vs non-essential, housekeeping vs. stress-regulated...) cause different effects.

      This is not to say that the paper has no merit - far from it even. But, in its current form, it is a bit chaotic. Maybe there is simply too much in the paper? To me, it would already help if the authors would explicitly state that the paper is a "methods" paper that describes a novel technique for studying the effects of mutations on protein abundance, and then goes on to demonstrate the possibilities of the technology by giving a few examples of the phenomena that can be studied. The discussion section ends in this way, but it may be helpful if this was moved to the end of the introduction.

      We modified the manuscript as suggested.

      Reviewer #2 (Public Review):

      Schubert et al. describe a new pooled screening strategy that combines protein abundance measurements of 11 proteins determined via FACS with genome-wide mutagenesis of stop codons and missense mutations (achieved via a base editor) in yeast. The method allows the identification of genetic perturbations that affect steady state protein levels (vs transcript abundance), and in this way defines regulators of protein abundance. The authors find that perturbation of essential genes more often alters protein abundance than of nonessential genes and proteins with core cellular functions more often decrease in abundance in response to genetic perturbations than stress proteins. Genes whose knockouts affected the level of several of the 11 proteins were enriched in protein biosynthetic processes while genes whose knockouts affected specific proteins were enriched for functions in transcriptional regulation. The authors also leverage the dataset to confirm known and identify new regulatory relationships, such as a link between the SPS amino acid sensor and the stress response gene Yhb1 or between Ras/PKA signalling and GAPDH isoenzymes Tdh1, 2, and 3. In addition, the paper contains a section on benchmarking of the base editor in yeast, where it has not been used before.

      Strengths and weaknesses of the paper

      The authors establish the BE3 base editor as a screening tool in S. cerevisiae and very thoroughly benchmark its functionality for single edits and in different screening formats (fitness and FACS screening). This will be very beneficial for the yeast community.

      The strategy established here allows measuring the effect of genetic perturbations on protein abundances in highly complex libraries. This complements capabilities for measuring effects of genetic perturbations on transcript levels, which is important as for some proteins mRNA and protein levels do not correlate well. The ability to measure proteins directly therefore promises to close an important gap in determining all their regulatory inputs. The strategy is furthermore broadly applicable beyond the current study. All experimental procedures are very well described and plasmids and scripts are openly shared, maximizing utility for the community.

      There is a good balance between global analyses aimed at characterizing properties of the regulatory network and more detailed analyses of interesting new regulatory relationships. Some of the key conclusions are further supported by additional experimental evidence, which includes re-making specific mutations and confirming their effects on protein levels by mass spectrometry.

      The conclusions of the paper are mostly well supported, but I am missing some analyses on reproducibility and potential confounders and some of the data analysis steps should be clarified.

      The paper starts on the premise that measuring protein levels will identify regulators and regulatory principles that would not be found by measuring transcripts, but since the findings are not discussed in light of studies looking at mRNA levels it is unclear how the current study extends knowledge regarding the regulatory inputs of each protein.

      See response to Comment #10.

      Specific comments regarding data analysis, reproducibility, confounders

      1) The authors use the number of unique barcodes per guide RNA rather than barcode counts to determine fold-changes. For reliable fold changes the number of unique barcodes per gRNA should then ideally be in the 100s for each guide, is that the case? It would also be important to show the distribution of the number of barcodes per gRNA and their abundances determined from read counts. I could imagine that if the distribution of barcodes per gRNA or the abundance of these barcodes is highly skewed (particularly if there are many barcodes with only few reads) that could lead to spurious differences in unique barcode number between the high and low fluorescence pool. I imagine some skew is present as is normal in pooled library experiments. The fold-changes in the control pools could show whether spurious differences are a problem, but it is not clear to me if and how these controls are used in the protein screen.

      Because of the large number of screens performed in this study (11 proteins, with 8 replicates for each) we had to trade off sequencing depth and power against cell sorting time and sequencing cost, resulting in lower read and barcode numbers than what might be ideally aimed for. As described further in the response to Comment #5, we added a new figure to the manuscript that shows that the correlation of fold-changes between replicates is high (Figure 3–S1A). The second figure below shows that the correlation between the number of unique barcodes and the number of reads per gRNA is highly significant (p < 2.2e-16).

      2) I like the idea of using an additional barcode (plasmid barcode) to distinguish between different cells with the same gRNA - this would directly allow to assess variability and serve as a sort of replicate within replicate. However, this information is not leveraged in the analysis. It would be nice to see an analysis of how well the different plasmid barcodes tagging the same gRNA agree (for fitness and protein abundance), to show how reproducible and reliable the findings are.

      We agree with the reviewer that this would be nice to do in principle, but our sequencing depth for the sorted cell populations was not high enough to compare the same barcode across the low/unsorted/high samples. See also our response to Comment #5 for the replicate analyses.

      3) From Fig 1 and previous research on base editors it is clear that mutation outcomes are often heterogeneous for the same gRNA and comprise a substantial fraction of wild-type alleles, alleles where only part of the Cs in the target window or where Cs outside the target window are edited, and non C-to-T edits. How does this reflect on the variability of phenotypic measurements, given that any barcode represents a genetically heterogeneous population of cells rather than a specific genotype? This would be important information for anyone planning to use the base editor in future.

      We agree with the reviewer that the heterogeneity of editing outcomes is an important point to keep in mind when working with base editors. In genetic screens, like the ones described here, often the individual edit is less important, and the overall effects of the base editor are specific/localized enough to obtain insights into the effects of mutations in the area where the gRNA targets the genome. For example, in our test screens for Canavanine resistance and fitness effects, in which we used gRNAs predicted to introduce stop codons into the CAN1 gene and into essential genes, respectively, we see the expected loss-of-function effect for a majority of the gRNAs (canavanine screen: expected effect for 67% of all gRNAs introducing stop codons into CAN1; fitness screen: expected effect for 59% of all gRNAs introducing stop codons into essential genes) (Figure 2). In the canavanine screen, we also see that gRNAs predicted to introduce missense mutations at highly conserved residues are more likely to lead to a loss-of-function effect than gRNAs predicted to introduce missense mutations at less conserved residues, further highlighting the differentiated results that can be obtained with the base editor despite the heterogeneity in editing outcomes overall. We would certainly advise anyone to confirm by sequencing the base edits in individual mutants whenever a precise mutation is desired, as we did in this study when following up on selected findings with individual mutants.

      4) How common are additional mutations in the genome of these cells and could they confound the measured effects? I can think of several sources of additional mutations, such as off-target editing, edits outside the target window, or when 2 gRNA plasmids are present in the same cell (both target windows obtain edits). Could some of these events explain the discrepancy in phenotype for two gRNAs that should make the same mutation (Fig S4)? Even though BE3 has been described in mammalian cells, an off-target analysis would be desirable as there can be substantial differences in off-target behavior between cell types and organisms.

      Generally, we are not very concerned about random off-target activity of the base editor because we would not expect this to cause a consistent signal that would be picked up in our screen as a significant effect of a particular gRNA. Reproducible off-target editing with a specific gRNA at a site other than the intended target site would be problematic, though. We limited the chance of this happening by not using gRNAs that may target similar sequences to the intended target site in the genome. Specifically, we excluded gRNAs that have more than one target in the genome when the 12 nucleotides in the seed region (directly upstream of the PAM site) are considered (DiCarlo et al., Nucleic Acids Research, 2013).
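      The seed-uniqueness filter described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: the function names and toy sequences are hypothetical, the PAM is taken to be NGG (the SpCas9 PAM), and a "target" is assumed to mean any site on either strand where the 12-nucleotide seed is immediately followed by that PAM.

```python
import re

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(_COMPLEMENT)[::-1]


def count_seed_sites(genome, seed):
    """Count sites on both strands where `seed` is directly followed by an NGG PAM."""
    # A lookahead is used so that overlapping sites are all counted.
    pattern = re.compile(f"(?={re.escape(seed)}[ACGT]GG)")
    forward = sum(1 for _ in pattern.finditer(genome))
    reverse = sum(1 for _ in pattern.finditer(reverse_complement(genome)))
    return forward + reverse


def has_unique_seed(genome, guide, seed_length=12):
    """True if the PAM-proximal seed of `guide` has exactly one target in `genome`."""
    seed = guide[-seed_length:]  # the seed is the 3' end, next to the PAM
    return count_seed_sites(genome, seed) == 1


# Toy sequences (hypothetical): one intended site, then a second seed match added.
guide = "ACGTTGCATGCCATGATCGA"
genome_unique = "TTTT" + guide + "TGG" + "AAAA"
genome_duplicated = genome_unique + "CC" + guide[-12:] + "AGG" + "TT"

keep_in_unique = has_unique_seed(genome_unique, guide)
keep_in_duplicated = has_unique_seed(genome_duplicated, guide)
```

      In this toy case the guide is kept for the first sequence and excluded for the second, where the same seed plus PAM occurs twice.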

      We do observe some off-target editing right outside the target window, but generally at much lower frequency than the on-target editing in the target window (Figure 1B and Figure 1–S2). Since for most of our analyses we grouped perturbations per gene, such off-target edits should not affect our findings. In addition, we validated key findings with independent experiments. For our study, we used the Base Editor v3 (Komor et al., Nature, 2016); more recently, additional base editors have been developed that show improved accuracy and efficiency, and we would recommend these base editors when starting a new study (see, e.g., Anzalone et al., Nature Biotechnology, 2020).

      We are not concerned about cases in which one cell gets two gRNAs, since the chance that the same two gRNAs end up in one cell repeatedly is low, and such events would therefore not result in a significant signal in our screens.

      We don’t think that off-target mutations can explain the discrepancy between pairs of gRNAs that should introduce the same mutation (Figure 3–S1). The effects of the two gRNAs are actually well-correlated, but, often, one of the two gRNAs doesn’t pass our significance cut-off or simply doesn’t edit efficiently (i.e., most discrepancies arise from false negatives rather than false positives). We may therefore miss the effects of some mutations, but we are unlikely to draw erroneous conclusions from significant signals.

      5) In the protein screen normalization uses the total unique barcode counts. Does this efficiently correct for differences from sequencing (rather than total read counts or other methods)? It would be nice to see some replicate plots for the analysis of the fitness as well as the protein screen to be able to judge that.

      We made a new figure that shows a replicate comparison for the protein screen (see below; in the manuscript it is Figure 3–S1A) and commented on it in the manuscript. For this analysis, the eight replicates for each protein were split into two groups of four replicates each and analyzed the same way as the eight replicates. The correlation between the two groups of replicates is highly significant (p < 2.2e-16). The second figure shows that the total number of reads and the total number of unique barcodes are well correlated.

      For the fitness screen, we used read counts rather than barcode counts for the analysis since read counts better reflect the dropout of cells due to reduced fitness. The figure below shows a replicate comparison for the fitness screen. For this analysis, the four replicates were split into two groups of two replicates each and analyzed the same way as the four replicates. The correlation between the two groups of replicates is highly significant (p < 2.2e-16).
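      The split-replicate comparison can be sketched as follows. This is a simplified illustration with synthetic data: the function names are hypothetical, and averaging log2 fold-changes within each half is a stand-in for "analyzed the same way", since the full analysis pipeline is not reproduced here.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def split_half_correlation(fold_changes, group_a, group_b):
    """Correlate per-gRNA mean effects between two disjoint replicate groups.

    fold_changes[r][g] is the log2 fold-change of gRNA g in replicate r.
    """
    n_guides = len(fold_changes[0])
    mean_a = [sum(fold_changes[r][g] for r in group_a) / len(group_a)
              for g in range(n_guides)]
    mean_b = [sum(fold_changes[r][g] for r in group_b) / len(group_b)
              for g in range(n_guides)]
    return pearson_r(mean_a, mean_b)


# Synthetic example: 500 gRNAs, 8 replicates, a true effect plus replicate noise.
rng = random.Random(1)
true_effects = [rng.gauss(0.0, 1.0) for _ in range(500)]
replicates = [[effect + rng.gauss(0.0, 1.0) for effect in true_effects]
              for _ in range(8)]

r_split = split_half_correlation(replicates, group_a=[0, 1, 2, 3], group_b=[4, 5, 6, 7])
```

      A high correlation between the two halves indicates that per-gRNA effects are reproducible across replicates. The same function applies to the fitness screen with two groups of two replicates.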

      6) In the main text the authors mention very high agreement between gRNAs introducing the same mutation but this is only based on 20 or so gRNA pairs; for many more pairs that introduce the same mutation only one reaches significance, and the correlation in their effects is lower (Fig S4). It would be better to reflect this in the text directly rather than exclusively in the supplementary information.

      We clarified this in the manuscript main text: “For 78 of these gRNA pairs, at least one gRNA had a significant effect (FDR < 0.05) on at least one of the eleven proteins; their effects were highly correlated (Pearson’s R² = 0.43, p < 2.2e-16) (Figure 3–S1B). For the 20 gRNA pairs for which both gRNAs had a significant effect, the correlation was even higher (Pearson’s R² = 0.819, p = 8.8e-13) (Figure 3–S1C). These findings show that the significant gRNA effects that we identify have a low false positive rate, but they also suggest that many real gRNA effects are not detected in the screen due to limitations in statistical power.”

      7) When the different gRNAs for a targeted gene are combined, instead of using an averaged measure of their effects the authors use the largest fold-change. This seems not ideal to me as it is sensitive to outliers (experimental error or background mutations present in that strain).

      We agree that the method we used is more sensitive to outliers than averaging per gene. However, because many gRNAs have no effect either because they are not editing efficiently or because the edit doesn’t have a phenotypic consequence, an averaging method across all gRNAs targeting the same gene would be too conservative and not properly capture the effect of a perturbation of that gene.
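      The trade-off between the two gene-level summaries can be seen in a small example. The values and gene name below are hypothetical, and selecting the largest effect by absolute log2 fold-change is an assumption about how "largest fold-change" is defined.

```python
def strongest_effect(log2_fold_changes):
    """Gene-level summary: the gRNA effect with the largest absolute value."""
    return max(log2_fold_changes, key=abs)


def mean_effect(log2_fold_changes):
    """Gene-level summary: the average effect across all gRNAs for the gene."""
    return sum(log2_fold_changes) / len(log2_fold_changes)


# Hypothetical gene: three gRNAs that do not edit or have no consequence,
# and one gRNA with a clear effect on protein abundance.
guide_effects = {"GENE_A": [0.02, -0.05, 1.40, 0.01]}

summary_max = {gene: strongest_effect(fc) for gene, fc in guide_effects.items()}
summary_mean = {gene: mean_effect(fc) for gene, fc in guide_effects.items()}
```

      Here the maximum keeps the effect of the one active gRNA (1.40), whereas the mean dilutes it to about 0.35. The same property makes the maximum sensitive to a single outlying gRNA, which is the reviewer's concern.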

      8) Phenotyping is performed directly after editing, when the base editor is still present in the cells and could still interact with target sites. I could imagine this could lead to reduced levels of the proteins targeted for mutagenesis as it could act like a CRISPRi transcriptional roadblock. Could this enhance some of the effects or alter them in case of some missense mutations?

      To reduce potential “CRISPRi-like” effects of the base editor on gene expression, we placed the base editor under a galactose-inducible promoter. For both the fitness and protein screens we grew the cultures in media without galactose for another 24 hours (fitness screen) or 8-9 hours (protein screens) before sampling. In the latter case, this recovery time corresponded to more than three cell divisions, after which we assume base editor levels to have strongly decreased, and therefore to no longer interfere with transcription. This is also supported by our ability to detect discordant effects of gRNAs targeting the same gene (e.g., the two mutations leading to loss-of-function and gain-of-function of RAS2), which would otherwise be overshadowed by a CRISPRi effect.

      9) I feel that the main text does not reflect the actual editing efficiency very well (the main numbers I noticed were 95% C to T conversion and 89% of these occurring in a specific window). More informative for interpreting the results would be to know what fraction of the alleles show an edit (vs wild-type) and how many show the 'complete' edit (as the authors assume 100% of the genotypes generated by a gRNA to be conversion of all Cs to Ts in the target window). It would be important to state in the main text how variable this is for different gRNAs and what the typical purity of editing outcomes is.

      We now show the editing efficiency and purity in a new figure (Figure 1B), and discuss it in the main text as follows: “We found that the target window and mutagenesis pattern are very similar to those described in human cells: 95% of edits are C-to-T transitions, and 89% of these occurred in a five-nucleotide window 13 to 17 base pairs upstream of the PAM sequence (Figure 1A; Figure 1–S2) (Komor et al., 2016). Editing efficiency was variable across the eight gRNAs and ranged from 4% to 64% if considering only cases where all Cs in the window are edited; percentages are higher if incomplete edits are considered, too (Figure 1B).”

      Comments regarding findings

      10) It would be nice to see a comparison of the results to the effects of ~1500 yeast gene knockouts on cellular transcriptomes (https://doi.org/10.1016/j.cell.2014.02.054). This would show where the current study extends established knowledge regarding the regulatory inputs of each protein and highlight the importance of directly measuring protein levels. This would be particularly interesting for proteins whose abundance cannot be predicted well from mRNA abundance.

      We agree with the reviewer that it would be very interesting to compare the effect of perturbations on mRNA vs protein levels. We have compared our protein-level data to mRNA-level data from Kemmeren and colleagues (Kemmeren et al., Cell 2014), and we find very good agreement between the effects of gene perturbations on mRNA and protein levels when considering only genes with q < 0.05 and Log2FC > 0.5 in both studies (Pearson’s R = 0.79, p < 5.3e-15).

      Gene perturbations with effects detected only on mRNA but not protein levels are enriched in genes with a role in “chromatin organization” (FDR = 0.01; as a background for the analysis, only the 1098 genes covered in both studies were considered). This suggests that perturbations of genes involved in chromatin organization tend to affect mRNA levels but are then buffered and do not lead to altered protein levels. There was no enrichment of functional annotations among gene perturbations with effects on protein levels but not mRNA levels.

      We did not include these results in the manuscript because there are some limitations to the conclusions that can be drawn from these comparisons, including that our study has a relatively high number of false negatives, and that the genes perturbed in the Kemmeren et al. study were selected to play a role in gene regulation, meaning that differences in mRNA-vs-protein effects of perturbations are limited to this function, and other gene functions cannot be assessed.

      11) The finding that genes that affect only one or two proteins are enriched for roles in transcriptional regulation could be a consequence of 'only' looking at 10 proteins rather than a globally valid conclusion. Particularly as the 10 proteins were selected for diverse functions that are subject to distinct regulatory cascades. ('only' because I appreciate this was a lot of work.)

      We agree with this, and we think it is clear in the abstract and the main text of the manuscript that here we studied 11 proteins. We also made this point more explicit in the discussion, so that it is clear to readers that the findings are based on the 11 proteins and may not extrapolate to the entire yeast proteome.

      Reviewer #3 (Public Review):

      This manuscript presents two main contributions. First, the authors modified a CRISPR base editing system for use in an important model organism: budding yeast. Second, they demonstrate the utility of this system by using it to conduct an extremely high-throughput study of the effects of mutation on protein abundance. This study confirms known protein regulatory relationships and detects several important new ones. It also reveals trends in the type of mutations that influence protein abundances. Overall, the findings are of high significance and the method appears to be extremely useful. I found the conclusions to be justified by the data.

      One potential weakness is that some of the methods are not described in the main body of the paper, so the reader has to really dive into the methods section to understand particular aspects of the study, for example, how the fitness competition was conducted.

      We expanded the first section for better readability.

      Another potential weakness is the comparison of this study (of protein abundances) to previous studies (of transcript abundances) was a little cursory, and left some open questions. For example, is it remarkable that the mutations affecting protein abundance are predominantly in genes involved in translation rather than transcription, or is this an expected result of a study focusing on protein levels?

      We thank the reviewer for pointing out that this paragraph requires more explanation. We expanded it as follows: “Of these 29 genes, 21 (72%) have roles in protein translation—more specifically, in ribosome biogenesis and tRNA metabolism (FDR < 8.0e-4, Figure 5C). In contrast, perturbations that affect the abundance of only one or two of the eleven proteins mostly occur in genes with roles in transcription (e.g., GO:0006351, FDR < 1.3e-5). Protein biosynthesis entails both transcription and translation, and these results suggest that perturbations of translational machinery alter protein abundance broadly, while perturbations of transcriptional machinery can tune the abundance of individual proteins. Thus, genes with post-transcriptional functions are more likely to appear as hubs in protein regulatory networks, whereas genes with transcriptional functions are likely to show fewer connections.”

      Overall, the strengths of this study far outweigh these weaknesses. This manuscript represents a very large amount of work and demonstrates important new insights into protein regulatory networks.

    1. Author Response

      Reviewer #2 (Public Review):

      The authors seek to determine how various species combine their effects on the growth of a species of interest when part of the same community.

      To this end, the authors carry out an impressive experiment containing what I believe must be one of the largest pairwise + third-order co-culture experiments done to date, using a high-throughput co-culture system they had co-developed in previous work. The unprecedented nature of this data is a major strength of the paper. The authors also discover that species combine their effect through "dominance", i.e. the strongest effect masks the others. This is important as it calls into question the common assumption of additivity that is implicit in the choice of using Lotka-Volterra models.

      A stronger claim (i.e. in the abstract) is that the joint effect of multiple species on the growth of another can be derived from the effect of individual species. Unless I am misunderstanding something, this statement may have to be qualified a little, as the authors show that a model based on pairwise dominance (i.e. the strongest pairwise) does a somewhat better job (lower RMSD, though granted, not by much, 0.57 vs 0.63) than a model based on single species dominance. That is, the effect of the strongest pair predicts the effect of a trio better than does the effect of the strongest single species.

      This issue makes one wonder whether, had the authors included higher-order combinations of species (i.e. five-member consortia or higher), the strongest-effect trio would have predicted better than the strongest-effect pair, which in turn is a better predictor than the strongest-effect species. This is important, as it would help one determine to what extent the strongest-effect model would work in more diverse communities, such as those one typically finds in nature. Indeed, the authors find that the predictive ability of the strongest effect species is much stronger for pairs than it is for trios (RMSD of 0.28 vs 0.63). Does the predictive ability of the single species model decline faster and faster as diversity grows beyond 4-member consortia?

      Thank you for raising this important point. It is true that in our study we see that single species predict pairs better than trios, and that pairs predict trios better than single species. As we did not perform experiments on more diverse communities (n>4), we are not sure if or how these rules will scale up. We explicitly address these caveats in our revised discussion.

      Reviewer #3 (Public Review):

      A problem in synthetic ecology is that one can't brute-force complex community design because combinatorics make it basically impossible to screen all possible communities from a bank of possible species. Therefore, we need a way to predict phenomena in complex communities from phenomena in simple communities. This paper aims to improve this predictive ability by comparing a few different simple models applied to a large dataset obtained with the use of the authors' "kchip" microfluidics device. The main question they ask is whether the effect of two species on a focal species is predicted from the mean, the sum, or the max of the effect of each single "affecting" species on the focal species. They find that the max effect is often the best predictor, in the sense of minimizing the difference between predicted effect and measured effect. They also measure single-species trait data for their library of strains, including resource niche and antibiotic resistance, and then find that Pearson correlations between distance calculations generated from these metrics and the effect of added species are weak and unpredictive. This work is largely well-done, timely and likely to be of high interest to the field, as predicting ecosystem traits from species traits is a major research aim.
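
      The three candidate models named above can be written down in a few lines. The following is a minimal sketch under stated assumptions, not the authors' implementation: "strongest" is taken to mean the single-species effect with the larger absolute value, the error metric is a plain root-mean-square error rather than the normalized variant mentioned elsewhere in the review, and the toy effect values are invented.

```python
from math import sqrt

def additive(e1, e2):
    """Sum of the two single-species effects."""
    return e1 + e2

def mean_effect(e1, e2):
    """Average of the two single-species effects."""
    return (e1 + e2) / 2

def strongest(e1, e2):
    """The single-species effect with the larger magnitude (assumed definition)."""
    return e1 if abs(e1) >= abs(e2) else e2

def rmse(predicted, observed):
    """Root-mean-square error between predictions and observations."""
    n = len(observed)
    return sqrt(sum((p - o) ** 2 for p, o in zip(predicted, observed)) / n)

MODELS = {"additive": additive, "mean": mean_effect, "strongest": strongest}

def score_models(single_effects, observed_pair_effects):
    """Return {model name: RMSE} for a list of (e1, e2) single-species effects."""
    return {
        name: rmse([f(e1, e2) for e1, e2 in single_effects], observed_pair_effects)
        for name, f in MODELS.items()
    }

# Invented toy data: single-species effects and the measured pair effect.
singles = [(-1.0, -0.5), (-2.0, 0.5), (0.3, 0.1)]
observed = [-1.1, -1.8, 0.3]
scores = score_models(singles, observed)
best = min(scores, key=scores.get)
```

      Scoring the models separately per focal species, rather than pooled as in this sketch, is what exposes the species-specific differences raised in the criticism that follows.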

      My main criticism is that the main take-home from the paper (fig 3B)-that the strongest effect is the best predictor-is oversold. While it is true that, averaged over their six focal species, the "strongest effect" was the best overall predictor, when one looks at the species-specific data (S9), we see that it is not the best predictor for 1/3 of their focal species, and this fraction grows to 1/2 if one considers a difference in nRMSE of 0.01 to be negligible.

      As suggested, we have softened our language regarding the take-home message. This matter is addressed in detail above in response to 'Essential Revisions'. Briefly, we see that the strongest model works best when both single species have qualitatively similar effects, but is slightly less accurate when effects are mixed. We also see overall less accurate predictions for positive effects. In light of these findings, we propose that cases in which the strongest model is not the most accurate for a focal species are explained by the interaction types involved, and are not specific to the focal species.

      We made substantial changes to the manuscript, including the first paragraph of the discussion which more accurately describes these findings and emphasizes the relevant caveats:

      "By measuring thousands of simplified microbial communities, we quantified the effects of single species, pairs, and trios on multiple focal species. The most accurate model, overall and specifically when both single species effects were negative, was the strongest effect model. This is in stark contrast to models often used in antibiotic compound combinations, despite most effects being negative, where additivity is often the default model (Bollenbach 2015). The additive model performed well for mixed effects (i.e. one negative and one positive), but only slightly better than the strongest model, and poorly when both species had effects of the same sign. When both single species’ effects were positive, the strongest model was also the best, though the difference was less pronounced and all models performed worse for these interactions. This may be due to the small effect size seen with positive effects, as when we limited negative and mixed effects to a similar range of effects strength, their accuracy dropped to similar values (Figure 3–Figure supplement 5). We posit that the difference in accuracy across species is affected mainly by the effect type dominating different focal species' interactions, rather than by inherent species traits (Figure 3–Figure supplement 6)." (Lines 288-304)

      The same criticism applies to the result from figure 2-that pairs of affecting species have more negative effects than single species. Considered across all focal species this is true (though minor in effect size, Fig 2A). But there is only a significant effect within two individual species. Again, this points to the effects being focal-species-specific, and perhaps not as generalizable as is currently being claimed.

      Upon more rigorous analysis, and with regard to changes in the dataset after filtering, we see that the more accurate statement is that effects become stronger, not necessarily more negative (in line with the accuracy of the strongest model). The overall trend is towards more negative interactions, due to the majority of interactions being negative, but as stated this is not true for each individual focal. As such the following sentence in the manuscript has been changed:

      "The median effect on each focal was more negative by 0.28 on average, though the difference was not significant in all cases; additionally, focals with mostly positive single species interactions showed a small increase in median effect (Fig. 2D)" (Lines 151-154)

      As well as the title of this section: "Joint effects of species pairs tend to be stronger than those of individual affecting species" (Lines 127-128)

      Another thing that points to a focal-species-specific response is Fig 2D, which shows the distributions of responses of each focal species to pairs. Two of these distributions are unimodal, one appears bimodal, and three appear tri-modal. This suggests to me that the focal species respond in categorically different ways to species addition.

      We believe this distribution of pair effects is related to the distribution of single species effects, and not to the way in which different focal species respond to the addition of second species. Though this may be difficult to see from the swarm plots shown in the paper, below is a split violin plot that emphasizes this point.

      Fig R1: Distribution of single species and pair effects. Distribution of the effect of single and pairs of affecting species for each focal species individually. Dashed lines represent the median, while dotted lines the interquartile range.

      These differences occur even though the focal bacteria are all from the same family. This suggests to me that the generalizability may be even less when a more phylogenetically dispersed set of focal species are used.

      We have added the following sentence to the discussion explicitly emphasizing the phylogenetic limitations of our study:

      "Lastly, it is important to note that our focal species are all from the same order (Enterobacterales), which may also limit the purview of our findings." (Lines 364-366)

      Considering these points together, I argue that the conclusion should be shifted from "strongest effect is the best" to "in 3 of our focal species, strongest effect was the best, but this was not universal, and with only 6 focal species, we can't know if it will always be the best across a set of focal species".

      As mentioned above, we have softened our language regarding the take-home message in response to these evaluations.

      My second main criticism is that it is hard to understand exactly how the trait data were used to predict effects. It seems like it was just Pearson correlation coefficients between interspecies niche distances (or antibiotic distances) and the effect. I'm not very surprised these correlations were unpredictive, because the underlying measurements don't seem to be relevant to the environment tested. What if, rather than using niche data across 20 nutrients, only the growth data on glucose (the carbon source in the experiments) was used? I understand that in a field experiment, for example, one might not know what resources are available, and so measuring niche across 20 resources may be the best thing to do. Here though it seems imperative to test using the most relevant data.

      It is true that much of the profiling data is not directly related to the experimental conditions (different carbon sources and antibiotics), but in addition to these we do use measurements from experiments carried out in the same environment as the interaction assays (i.e. growth rate and carrying capacity when growing on glucose), which also showed poor correlation with the effects on focals. Additionally, we believe that these profiles contain relevant information regarding metabolic similarity between species (similar to metabolic models often constructed computationally). To improve clarity, we added the following sentence to the figure legend of Figure 3–Figure supplement 1:

      "The growth rate, and maximum OD shown in panel A were measured only in M9 glucose, similar to conditions used in the interaction assays." (Lines 591-592)

      Additionally and relatedly, it would be valuable to show the scatterplots leading to the conclusion that trait data were uninformative. Pearson's r only works on an assumption of linearity. But there could be strong relationships between the trait data and effect that are monotonic but not linear, or even that are non-monotonic yet still strong (e.g. U-shaped). For the first case, I recommend switching to Spearman's rho over Pearson's r, because it only assumes monotonicity, not linearity. If there are observable relationships that are not monotonic, a different test should be used.

      Per your suggestion, we have changed the measurement of correlation in this analysis from Pearson's r to Spearman's rho. As we observed similar, and still mostly weak, correlations, we did not investigate these relationships further. See Figure 3–Figure supplement 1.
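
      The distinction between the two coefficients can be illustrated with a short sketch. It is a generic illustration with invented data, not the analysis performed in the study; the helper names are hypothetical. Spearman's rho is computed here as Pearson's r on average ranks.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)

def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman_rho(xs, ys):
    """Spearman rank correlation: Pearson's r computed on the ranks."""
    return pearson_r(average_ranks(xs), average_ranks(ys))

# Monotonic but strongly non-linear relationship (invented data).
x_mono = [1, 2, 3, 4, 5, 6]
y_mono = [v ** 5 for v in x_mono]

# U-shaped, non-monotonic relationship (invented data).
x_u = [-2, -1, 0, 1, 2]
y_u = [v ** 2 for v in x_u]
```

      On the monotonic example Spearman's rho is 1 while Pearson's r is noticeably lower; on the U-shaped example both coefficients are zero despite a perfect functional dependence, which is the case where neither coefficient is informative and a different test would be needed.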

      Additionally, we generated heat maps including scatterplots mapping the data leading to these correlations. We found no notable dependency in these plots, and visually they were quite crowded and difficult to interpret. As this is not the central point of our study, we ultimately decided against adding this information to the plots.

      In general, I think the analyses using the trait data were too simplistic to conclude that the trait data are not predictive.

      We agree that more sophisticated analyses may help connect species traits and their effects on focal species. In fact, other members of our research group have recently used machine learning to accomplish similar predictions (https://doi.org/10.1101/2022.08.02.502471). As such, we have changed the wording to reflect that this correlation is difficult to find using simple analyses:

      "These results indicate that it may be challenging to connect the effects of single and pairs of species on a focal strain to a specific trait of the involved strains, using simple analysis." (Lines 157-159)

    1. Author Response

      Reviewer #1 (Public Review):

      Slusarczyk et al. present a very well-written manuscript focused on understanding the mechanisms underlying aging of erythrophagocytic macrophages in the spleen (RPM) and its relationship to iron loading with age. The manuscript is diffuse with a broad swath of data elements. Importantly, the manuscript demonstrates that RPM erythrophagocytic capacity is diminished with age and restored in aged mice fed an iron-restricted diet. In addition, the mechanism for declining RPM erythrophagocytic capacity appears to be ferroptosis-mediated, insensitive to heme as it is to iron, and to occur independently of ROS generation. These are compelling findings. However, some of the conclusions rely on conjecture, and a clear causal association is not established. The main conclusion of the manuscript points to the accumulation of unavailable insoluble forms of iron as both causing and resulting from decreased RPM erythrophagocytic capacity.

      We are proposing that intracellular iron accumulation progresses first and leads to global proteotoxic damage and increased lipid peroxidation. This eventually triggers the death of a fraction of aging RPMs, thus promoting the formation of extracellular iron-rich protein aggregates. More explanation can be found below. Besides, iron loading suppresses the erythrophagocytic activity of RPMs, hence further contributing to their functional impairment during aging.

      In addition, the finding that IR diet leads to increased TF saturation in aged mice is surprising.

      We believe that this observation implies better mobilization of splenic iron stores, and corroborates our conclusion that mice that age on an iron-reduced diet benefit from higher iron bioavailability, although these differences are relatively mild. More explanation can be found in our replies to Reviewer #2.

      Furthermore, whether the finding in RPMs is intrinsic or related to RBC-related changes with aging is not addressed.

      We have now addressed this issue by characterizing in more detail both iron and ROS levels in RBCs.

      Finally, these findings in a single strain and only female mice are intriguing but warrant tempered conclusions.

      We tempered the conclusions and provided a basic characterization of the RPM aging phenotype in Balb/c female mice.

      Major points:

      1) The main concern is that there is no clear explanation of why iron increases during aging although the authors appear to be saying that iron accumulation is both the cause of and a consequence of decreased RPM erythrophagocytic capacity. This requires more clarification of the main hypothesis on Page 4, line 17-18.

      We thank the reviewer for this comment. It was previously reported that iron accumulates substantially in the spleen during aging, especially in female mice (Altamura et al., 2014). Since RPMs are the cells that process most of the iron in the spleen, we aimed to explore the relationship between iron accumulation and RPM functions during aging. This investigation led us to uncover that iron accumulation is indeed both the cause and the consequence of RPM dysfunction. Specifically, we propose that intracellular iron loading of RPMs precedes extracellular deposition of iron in the form of protein-rich aggregates, driven by RPM damage. To support this, we now show that the proteome of RPMs overlaps with the proteins that are present in the age-triggered aggregates (Fig. 3F). Furthermore, corroborating our model, we now demonstrate that transient iron loading of RPMs via iron-dextran injection (new Fig. 3G) leads to the formation of protein-rich aggregates, closely resembling those present in aged spleens (new Fig. 3H). This implies that high iron content in RPMs is indeed a major driving factor that leads to aggregation of their proteome and cell damage. Importantly, we have now supported this model with studies using iRPMs. We demonstrated that iron loading and blockage of ferroportin by synthetic mini-hepcidin (PR73) (Stefanova et al., 2018) cause protein aggregation in iRPMs and lead to their decreased viability only in cells that were exposed to heat shock, a well-established trigger of proteotoxicity (new Fig. 5K and L). We propose that these two factors, namely age-triggered decrease in protein homeostasis and exposure to excessive iron levels, act in concert and render RPMs particularly sensitive to damage during aging (see also Discussion, p. 16).

      In parallel, our data imply that the increased iron content in aged RPMs drives their decreased erythrophagocytic activity, as we have now better documented with more extensive in vitro experiments in iRPMs (new Fig 6E-H). We cannot exclude that some of the senescent splenic RBCs that are retained in the red pulp and evade erythrophagocytosis due to RPM defects in aging may also contribute to the formation of the aggregates. This is supported by the fact that mice that lack RPMs also exhibit iron loading in the spleen (Kohyama et al., 2009; Okreglicka et al., 2021), and that the proteome of aggregates overlaps to some extent with the proteome of erythrocytes (new Fig. 3F).

      We believe that during aging intracellular iron accumulation is chiefly driven by ferroportin downregulation, as also suggested by Reviewer #3. We now show that ferroportin drops significantly already in mice aged 4 and 5 months (new Fig. 4H), preceding most of the other impairments. This drop coincides with the increase in hepcidin expression, but whether this is the sole reason for ferroportin suppression during early aging would require further investigation outside the scope of the present manuscript.

      In sum, to address this comment, we now modified the fragment of the introduction that refers to our hypothesis and major findings to be more clear (p. 4), we improved our manuscript by providing new data mentioned above and we added more explanation in the corresponding sections of the Results and Discussion.

      2) It is unclear if RPMs are in limited supply. Based on the introduction (page 4, line 13-15), they have limited self-renewal capacity and blood monocytes only partially replenished. Fig 4D suggests that there is a decrease in RPMs from aged mice. The %RPM from CD45+ compartment suggests that there may just be relatively more neutrophils or fewer monocytes recruited. There is not enough clarity on the meaning of this data point.

      Thank you for this comment. We fully agree that %RPMs of CD45+ splenocytes, although well-accepted in literature (Kohyama et al., 2009; Okreglicka et al., 2021), is only a relative number. Hence, we now included additional data and explanations regarding the loss of RPMs during aging.

      It was reported that the proportion of RPMs derived from bone marrow monocytes increases mildly but progressively during aging (Liu et al., 2019). This implies that, given the loss of the total RPM population illustrated by our data, the cells of embryonic origin are likely even more affected. We could confirm this assumption by re-analysis of the data from Liu et al., which we have now included in the manuscript as Fig. 5E. These data clearly show that the representation of embryonically-derived RPMs drops more drastically than the percent of total RPMs, whereas the replenishment rate from monocytes is not affected significantly during aging. Consistent with this, we have not observed any robust change in the population of monocytes (F4/80-low, CD11b-high) or pre-RPMs (F4/80-high, CD11b-high) in the spleen at the age of 10 months (Figure 5-figure supplement 2A and B). We have also detected a mild decrease, not an increase, in the number of granulocytes (new Figure 5-figure supplement 2C). Furthermore, we measured an in situ apoptosis marker and found clear signs of apoptosis in the aged spleen (especially in the red pulp area), a phenotype that is less pronounced in mice on an IR diet (new Fig. 5O). This is consistent with the observation that apoptosis markers can be elevated in tissues upon ferroptosis induction (Friedmann Angeli et al., 2014) and that the proteotoxic stress in aged RPMs, which we now emphasized better in our manuscript, may also lead to apoptosis (Brancolini & Iuliano, 2020). Taken together, we strongly believe that the functional defect of embryonically-derived RPMs chiefly contributes to their shortage during aging.

      3) Anemia of aging is complex and poorly understood mechanistically. In general, it is considered similar to anemia of chronic inflammation with increased Epo, mild drop in Hb, and erythroid expansion, similar to ineffective erythropoiesis / low Epo responsiveness. It is not surprising that IR diet did not impact this mild anemia. However, was the MCV or MCH altered in aged and IR aged mice?

      We now included the data for hematocrit, RBC counts, MCV, and MCH in Figure 1-figure supplement 5. Hematocrit shows a similar tendency as hemoglobin levels, but the values for RBC counts, MCV, and MCH seem not to be altered. We also show now that the erythropoietic activity in the bone marrow is not affected in aged versus young mice. Taken together, the anemic phenotype in female C57BL/6J mice at this age is very mild, which we emphasized in the main text, and is likely affected by other factors than serum iron levels (p. 6).

      4) Page 6, line 23 onward: the conclusion is that KC compensate for the decreased function of RPM in the spleen, based on the expansion of KC fraction in the liver. Is there evidence that KCs are engaged in more erythrophagocytosis in aged mice? Furthermore, iron accumulation in the liver with age does not demonstrate specifically enhanced erythrophagocytosis of KC. Please clarify why liver iron accumulation would not be simply a consequence of increased parenchymal iron similar to increased splenic iron with age, independent of erythrophagocytic activity in resident macrophages in either organ.

      Thanks for these questions. For the quantification of the erythrophagocytosis rate in KC, we show, as for the RPMs (Fig. 1K), the % of PKH67-positive macrophages, following transfusion of PKH67-stained stressed RBCs (Fig. 1M). The data imply a mild (not statistically significant) drop (of approx. 30%) in EP activity. We believe that it is overridden by a more pronounced (on average, 2-fold) increase in the representation of KCs (Fig. 1N). The mechanisms of iron accumulation in the spleen and the liver are very different. In the liver, we observed iron deposition in the parenchymal cells (not non-parenchymal, new Fig. 1P) that we are currently characterizing in more detail in a parallel manuscript. Our data demonstrate a drop in transferrin saturation in aged mice. Hence, it is highly unlikely that aging would be hallmarked by the presence of circulating non-transferrin-bound iron that would be sequestered by hepatocytes, as shown previously (Jenkitkasemwong et al., 2015). Thus, the iron released locally by KCs is the most likely contributor to progressive hepatocytic iron loading during aging. The mechanism of iron delivery to hepatocytes from erythrophagocytosing KCs was demonstrated by Theurl et al. (2016), and we propose that it may be operational, although on a much more prolonged time scale, during aging. We now discuss this point more thoroughly in our Results section (p. 7).

      5) Unclear whether the effect on RPMs is intrinsic or extrinsic. Would be helpful to evaluate aged iRPMs using young RBC vs. young iRPMs using old RBCs.

      We are skeptical that generating iRPMs from aged mice would be helpful: these cells are a specific type of primary macrophage culture, derived from bone marrow monocytes with MCSF1, and exposed additionally to heme and IL-33 for 4 days. We do not expect that bone marrow monocytes are heavily affected by aging, or that they would recapitulate aspects of aged RPMs from the spleen, especially after an 8-day in vitro culture. However, to address the concerns of the reviewer, we now provide additional data regarding RBC fitness. Consistent with the RBC life-span experiment (Fig. 2A), we show that oxidative stress is increased only in splenic, but not circulating, RBCs (new Fig. 2C, replacing the old Fig. 2B and C). In addition, we show no signs of age-triggered iron loading in RBCs, either in the spleen (new Fig. 2F) or in the circulation (new Fig. 2B). Hence, we do not envision a possibility that RPMs become iron-loaded during aging as a result of erythrophagocytosis of iron-loaded RBCs. In support of this, we have also observed that during aging RPMs’ FPN levels drop first, the erythrophagocytosis rate decreases afterward, and lastly RBCs start to exhibit significantly increased oxidative stress (presented now in new Fig. 4H, J and K).

      6) Discussion of aggregates in the spleen of aged mice (Fig 2G-2K and Fig 3) is very descriptive and non-specific. For example, if the iron-rich aggregates are hemosiderin, a hemosiderin-specific stain would be helpful. This data specifically is correlatory and difficult to extract value from.

      Thanks for these comments. To the best of our knowledge, Prussian blue Perls’ staining (Fig. 2J) is considered a hemosiderin staining. Our investigations aimed to better understand the nature and the origin of splenic iron deposits that to some extent are referred to as hemosiderin. Most importantly, as mentioned in our reply to point 1 of Reviewer #1, to assign causality to our data, we now demonstrate that iron accumulation in RPMs in response to iron-dextran (Fig. 3G) increases lipid peroxidation (Fig. 5F), tends to provoke RPM depletion (Fig. 5G) and triggers the formation of protein-rich aggregates (new Fig. 3H). Of note, we assume that the loss of embryonically-derived RPMs in this model may be masked by simultaneous replenishment of the niche from monocytes, a phenomenon that may be addressed by future studies using Ms4a3-driven reporter mice (as shown for aged mice in our new Fig. 5E).

      7) The aging phenotype in RPMs appears to be initiated sometime after 2 months of age. However, there is some reversal of the phenotype with increasing age, e.g. Fig 4B with decreased lipid peroxidation in 9 month old relative to 6 month old RPMs. What does this mean? Why is there a partial spontaneous normalization?

      Thanks for this comment and questions. Indeed, the degree of lipid peroxidation exhibits some kinetics, suggestive of partial normalization. Of note, such a tendency is not evident for other aging phenotypes of RPMs, hence, we did not emphasize this in the original manuscript. However, in a revised version of the manuscript, we now present the re-analysis of the published data which implies that the number of embryonically-derived RPMs drops substantially between mice at 20 weeks and 36 weeks (new Fig. 5E). We think that the higher proportion of monocyte-derived RPMs in total RPM population later in aging (9 months) might be responsible for the partial alleviation of lipid peroxidation. We now discussed this possibility in the Results sections (p. 12).

      8) Does the aging phenotype in RPMs respond to ferristatin? It appears that NAC, which is a glutathione generator and can reverse ferroptosis, does not reverse the decreased RPM erythrophagocytic capacity observed with age yet the authors still propose that ferroptosis is involved. A response to ferristatin is a standard and acceptable approach to evaluating ferroptosis.

      We fully agree with the Reviewer that using ferristatin or Liproxstatin-1 would be very helpful to fully characterize the mechanism of RPM depletion in mice. However, previous in vivo studies involving Liproxstatin-1 administration required daily injections of this ferroptosis inhibitor (Friedmann Angeli et al., 2014). This would be hardly feasible during aging. Regarding the experiments involving iron-dextran injection, using Liproxstatin-1 would require additional permission from the ethical committee, which takes time to be processed and received. However, to address this question we now provide data from iRPM cell cultures (new Fig. 5K-L). In essence, our results imply that proteotoxic stress and iron overload act in concert to trigger cytotoxicity in the RPM in vitro model. Interestingly, this phenomenon does not depend solely on the increased lipid peroxidation, but when we neutralize the latter with Liproxstatin-1, the cytotoxic effect is diminished (please see also Results on p. 13 and Discussion on p. 15/16).

      9) The possible central role for HO-1 in the pathophysiology of decreased RPM erythrophagocytic capacity with age is interesting. However, it is not clear how the authors arrived at this hypothesis and would be useful to evaluate in the least whether RBCs in young vs. aged mice have more hemoglobin as these changes may be primary drivers of how much HO-1 is needed during erythrophagocytosis.

      Thanks for this comment. We became interested in HO-1 levels based on the RNA sequencing data, which detected lower Hmox-1 expression in aged RPMs (Figure 3-figure supplement 1). We now show that the content of hemoglobin is not significantly altered in aged RBCs (MCH parameter, Figure 1-figure supplement 5E), hence we do not think that this is the major driver for Hmox-1 downregulation. Likewise, the levels of the Bach1 message, a gene encoding the Hmox-1 transcriptional repressor, are not significantly altered according to RNAseq data. Hence, the reason for the transcriptional downregulation of Hmox-1 is not clear. Of note, HO-1 protein levels in the total spleen are higher in aged versus young mice, and we also detected a clear appearance of its nuclear truncated and enzymatically-inactive form (see the figure below; we opt not to include this in the manuscript for better clarity). The appearance of truncated HO-1 seems to be partially rescued by the IR diet. It is well established that the nuclear form of HO-1 emerges via proteolytic cleavage and migrates to the nucleus under conditions of oxidative stress (Mascaro et al., 2021). This additionally confirms that the aging spleen is hallmarked by an increased burden of ROS. Moreover, we also detected HO-1 as one of the components of the protein iron-rich aggregates. Thus, we propose that the low levels of the cytoplasmic enzymatically active form of HO-1 in RPMs (that we preferentially detect with our intracellular staining and flow cytometry) may be underlain by its nuclear translocation and sequestration in protein aggregates that evade antibody binding [this is also supported by our observation that the protein aggregates, despite the high content of ferritin (as indicated by MS analysis), are negative for L-ferritin staining]. Of note, we also cannot exclude that other cell types in the aging spleen (e.g. lymphocytes) express higher levels of HO-1 in response to splenic oxidative stress.

      Fig. Total splenic levels of HO-1 in young, aged IR and aged mice.

      Reviewer #2 (Public Review):

      Slusarczyk et al. investigate the functional impairment of red pulp macrophages (RPMs) during aging. When red blood cells (RBCs) become senescent, they are recycled by RPMs via erythrophagocytosis (EP). This leads to an increase in intracellular heme and iron both of which are cytotoxic. The authors hypothesize that the continuous processing of iron by RPMs could alter their functions in an age-dependent manner. The authors used a wide variety of models: in vivo model using female mice with standard (200ppm) and restricted (25ppm) iron diet, ex vivo model using EP with splenocytes, and in vitro model with EP using iRPMs. The authors found iron accumulation in organs but markers for serum iron deficiency. They show that during aging, RPMs have a higher labile iron pool (LIP), decreased lysosomal activity with a concomitant reduction in EP. Furthermore, aging RPMs undergo ferroptosis resulting in a non-bioavailable iron deposition as intra and extracellular aggregates. Aged mice fed with an iron restricted diet restore most of the iron-recycling capacity of RPMs even though the mild-anemia remains unchanged.

      Overall, I find the manuscript to be of significant potential interest. But there are important discrepancies that need to be first resolved. The proposed model is that during aging both EP and HO-1 expression decreases in RPMs but iron and ferroportin levels are elevated. In their model, the authors show intracellular iron-rich proteinaceous aggregates. But if HO-1 levels decrease, intracellular heme levels should increase. If Fpn levels increase, intracellular iron levels should decrease. How does LIP stay high in RPMs under these conditions? I find these to be major conflicting questions in the model.

      We thank the Reviewer for her/his valuable feedback. As we mentioned in our replies, we can only assume that a small misunderstanding in the interpretation of the presented data underlies this comment. We show that ferroportin levels in RPMs (Fig. 1F) are modulated in a manner that fully reflects the iron status of these cells (both labile and total iron levels, Figs. 1H and I). FPN levels drop in aged RPMs and are rescued when mice are maintained on a reduced iron diet. As pointed out by Reviewer #3, and explained in our replies, we believe that ferroportin levels are critical for the observed phenotypes in aging. We now describe our data more clearly to avoid any potential misinterpretation (p. 6).

      Reviewer #3 (Public Review):

      This is a comprehensive study of the effects of aging of the function of red pulp macrophages (RPM) involved in iron recycling from erythrocytes. The authors document that insoluble iron accumulates in the spleen, that RPM become functionally impaired, and that these effects can be ameliorated by an iron-restricted diet. The study is well written, carefully done, extensively documented, and its conclusions are well supported. It is a useful and important addition for at least three distinct fields: aging, iron and macrophage biology.

      The authors do not explain why an iron-restricted diet has such a strong beneficial effect on RPM aging. This is not at all obvious. I assume that the number of erythrocytes that are recycled in the spleen, and are by far the largest source of splenic iron, is not changed much by iron restriction. Is the iron retention time in macrophages changed by the diet, i.e. the recycled iron is retained for a short time when diet is iron-restricted (making hepcidin low and ferroportin high), and long time when iron is sufficient (making hepcidin high and ferroportin low)? Longer iron retention could increase damage and account for the effect. Possibly, macrophages may not empty completely of iron before having to ingest another senescent erythrocyte, and so gradually accumulate iron.

      We are very grateful to this Reviewer for emphasizing the importance of the iron export capacity of RPMs as a possible driver of the observed phenotypes. Indeed, as mentioned above, we now show in the revised version of the manuscript that ferroportin drops early during aging (revised Fig. 4). Importantly, we now also observed that iron loading and limitation of iron export from iRPMs via ferroportin aggravate the impact of heat shock (a well-accepted trigger of proteotoxicity) on both protein aggregation and cell viability (new Fig. 5K and L). Physiologically, recent findings show that aging promotes a global decrease in protein solubility [BioRxiv manuscript (Sui X. et al., 2022)], and it is very likely that the constant exposure of RPMs to high iron fluxes renders these specialized cells particularly sensitive to proteome instability. This could be further aggravated by a build-up of iron due to the drop of ferroportin early during aging, ultimately leading to the appearance of the protein aggregates as early as 5 months of age in C57BL/6J females. Based on the new data, we emphasized this model in the revised version of the manuscript (please see Discussion on p. 16).

    1. Author Response:

      Reviewer #2:

      Cai & Padoa-Schioppa recorded from macaque dorsal anterior cingulate cortex (ACCd) while requiring animals to choose between different juice types offered in variable amounts and with different action costs. Authors compared neural activity in ACCd (present study) with previous, directly comparable, findings on this same task when recording in macaque orbitofrontal cortex. The behavioral task is very powerful and the analyses of both the choice behavior and neural data are rigorous. Authors conclude that ACCd is unique in representing more post-decision variables and in its encoding of chosen value and binary outcome in several reference frames (chosen juice, chosen cost, and chosen action), not offer value, like OFC. Indeed, the encoding of choice outcomes in ACCd was skewed toward a cost-based reference frame. Overall, this is important new information about primate ACCd. I have only a few suggestions to enhance clarity. Figures 5 and 7 are maximally informative, but it is not clear that Figure 6 adds much to the reported Results. It is also suggested to abbreviate the comparison with Hosokawa et al. as it presently takes up 3 paragraphs in the Discussion: it is clear the methods and task designs were different enough to not be so easily compared with the present study. An additional suggestion would be to include mention of the comparison with OFC in the abstract and possibly also in the title, since the finding and direct comparison in Figure 7 are some of the most novel and interesting effects of the paper. Other suggestions are minor, and have to do with definition of time windows, variables, and additional papers that authors may cite for a well-rounded Discussion.

      Please refer to Essential Revisions point #4. We also added “In contrast to the OFC” to the abstract to highlight the difference between these two regions.

      Essential Revisions Point #4 Response:

      We shortened the discussion from 3 paragraphs to 1 paragraph as follows.

      "In another study, Hosokawa, Kennerley et al. (2013) compared the neuronal coding in ACCd and OFC in a choice task involving cost-benefit tradeoff. Our findings differ in two aspects. First, Hosokawa et al. (2013) reported contralateral action value coding in ACCd while we did not discover significant offer value coding in either spatial- or action-based reference frames in our ACCd recordings. Second, they reported that there was no action-based value representation in the OFC and therefore concluded that OFC does not integrate action cost in economic choice. Two elements may help explain the discrepancies between our findings in ACCd and OFC (Cai and Padoa-Schioppa 2019) and those of Hosokawa et al. (2013). First, we recall that Hosokawa et al. (2013) only tested value-related variables such as the benefit, cost and discounted value in an action-based reference frame. Most importantly, they did not test the variable that is related to the saccade direction, which is highly correlated with the spatial value signal. As a consequence, the contralateral value signal may not be significant if chosen target location was included in their regression analysis. Indeed, in our analysis, saccade direction (or chosen target location) was identified as one of the variables that explained a significant portion of neuronal activity in ACCd (Cai and Padoa-Schioppa 2012, Cai and Padoa-Schioppa 2019). The second and often overlooked aspect is that value may be encoded in schemes other than the action-based reference frame. In their study, each unique combination of reward quantity and cost was presented by a unique picture. Thus, information on good attributes was conveyed to the animal with an “integrated” visual representation. Accordingly, a distinct group of neurons may have been recruited to encode the reward and cost conjunctively represented by a unique fractal, which would result in 16 groups of offer value coding neurons."

      Reviewer #3:

      Cai and Padoa-Schioppa present a paper titled 'Neuronal Activity in Dorsal Anterior Cingulate Cortex during Economic Choices under Variable Action Costs'. They used a binary choice task where both offers indicated the reward type, reward amount, and the action cost (but not the specific action.) Variable action costs were then operationalized by placing targets on concentric circles of different radius. Here, and in a previous study that included OFC recordings (Cai and Padoa-Schioppa, 2019), monkeys integrated action costs into their decisions. Single-unit recordings in ACCd revealed that neurons predominantly coded for post-decision variables, such as cost of the chosen target and the juice type of the chosen offer, but not pre-decision variables, such as offer values. Given this finding, the authors compared the percentage of neurons in OFC and ACCd that coded for decision variables. In OFC neurons, the activity was mostly restricted to the offer presentation phase, whereas ACCd neurons showed sustained coding of chosen value and costs that lasted until the appearance of the saccade targets. Overall, this is an interesting study that provides evidence that decision-related signals evolve from coding offer values in the OFC to representing chosen costs in the ACC. This finding could highlight the roles of ACC neurons in learning and decision making. We have only a few questions.

      1) Do any of the variables used in this study correlate with a conflict? When the authors previously studied ACC, they discarded the conflict monitoring hypothesis - a hypothesis that is well established for ACC hemodynamic responses - for ACC single cell activity based on neural data from 'difficult' decisions (Cai and Padoa-Schioppa, 2012). The definition of difficulty they used, then, was descriptive and based on reaction times (RTs). They defined the most difficult trials as those trials with the longest RTs and discovered that those trials had options with similar offer values. This definition of choice difficulty appears to be contrived from evidence accumulation models/tasks, where normatively harder judgments elicit longer RTs. However, there is no normative economic reason that trials with similar offer values are more difficult or should cause conflict. After all, according to theory, choosing between two options with the same value is as easy as flipping a coin. Here, it seems like the authors could have a more fitting definition of conflict. For example, conflict can be operationalized by considering trials when the animal must choose between a high value/high-cost option and a low-value/low-cost option. In that case, the costs and benefits are in conflict. What do the RTs look like? Do the RTs indicate conflict resolution? If so, is this reflected in neuronal responses?

      We thank the reviewer for raising this important point. First, we would like to clarify that both in this study and in our previous study of ACC (Cai and Padoa-Schioppa 2012) we imposed a delay between offer presentation and the go signal. Such delay is critical to disentangle value comparison from action selection. However, the delay effectively dissociates reaction times from the decision difficulty. Normally, we operationalize the decision difficulty (or conflict) with the variable value ratio = chosen value / unchosen value. In an early behavioral study conducted in capuchin monkeys, where no delay was imposed between offer presentation and the go signal, we found that reaction times were strongly correlated with the value ratio, as one would naturally expect (Padoa-Schioppa, Jandolo et al. 2006). In the previous study of ACC (Cai and Padoa-Schioppa 2012) we referenced that earlier result but, again, we did not analyze reaction times.

      Coming to the present study, we addressed this question by including in the variable selection analyses the two variables value ratio and cost/benefit conflict = cost of A * sign(offer value A – offer value B) (see also Table 2). The results of the updated analysis are illustrated in the new Figure 4, which we include here below. In essence, including these two variables did not affect the results of the variable selection analysis. That is, both the stepwise and best-subset methods selected the variables chosen value, chosen cost, chosen juice, chosen offer location only and chosen target location only.
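      The two added regressors can be written out explicitly. The following is only a minimal sketch of the definitions stated above (value ratio = chosen value / unchosen value; cost/benefit conflict = cost of A * sign(offer value A - offer value B)). The function names, the handling of a zero unchosen value, and the example numbers are illustrative assumptions, not taken from the manuscript; the authoritative definitions are those in Table 2.

```python
import math


def sign(x):
    """Return -1, 0 or 1 depending on the sign of x."""
    return (x > 0) - (x < 0)


def value_ratio(chosen_value, unchosen_value):
    """Decision-difficulty proxy: chosen value / unchosen value.

    Returning infinity for a zero unchosen value is an assumption
    made here only to keep the sketch well defined.
    """
    if unchosen_value == 0:
        return math.inf
    return chosen_value / unchosen_value


def cost_benefit_conflict(cost_a, offer_value_a, offer_value_b):
    """cost of A * sign(offer value A - offer value B)."""
    return cost_a * sign(offer_value_a - offer_value_b)


# Hypothetical trial values in arbitrary units.
ratio = value_ratio(4.0, 2.0)
conflict_when_a_better = cost_benefit_conflict(2.0, 5.0, 3.0)
conflict_when_b_better = cost_benefit_conflict(2.0, 3.0, 5.0)
```

      Under this reading, the conflict variable is positive when the more valuable offer A also carries a cost, negative when A is the less valuable offer, and zero when the two offer values are equal.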

      Figure 4. Population summary of ANCOVA (all time windows). (A) Explained responses. Rows and columns represent, respectively, time windows and variables. In each location, the number indicates the number of responses explained by the corresponding variable in that time window. For example, chosen value (juice) explained 34 responses in the post-offer time window. The same numbers are also represented in gray scale. Note that each response could be explained by more than one variable and thus could contribute to multiple bins in this panel. (B) Best fit. In each location, the number indicates the number of responses for which the corresponding variable provided the best fit (highest R² in that time window). For example, chosen value (juice) provided the best fit for 40 responses in the late-delay time window. The numerical values are also represented in gray scale. In this plot, each response contributes to at most one bin.

      2) The authors claimed that the ACCd neurons integrated juice identity, juice quantity and action costs later in the trial. As they acknowledge, the evidence for this claim is marginal. The conclusion the authors made in line 211, therefore, could be moderated. Given that the model containing cost-related variables is more complex, it is equally valid and more appropriate to write '… we cannot reject the null hypothesis that action cost was not integrated by chosen value responses later in the trial.'

      We acknowledge the complexity of this claim. However, results from previous studies (Kennerley, Dahmubed et al. 2009, Kennerley and Wallis 2009, Hosokawa, Kennerley et al. 2013) are in favor of establishing a null hypothesis of integration rather than non-integration. Therefore, we feel that it is more appropriate to keep the null hypothesis of cost integration while in the meantime acknowledging that in our study the evidence for cost integration is rather weak.

    1. Author Response

      Reviewer #1 (Public Review):

      1) It would be helpful to include some sort of comparison in Fig. 4, e.g. the regressions shown in Fig 3, to indicate to what extent the ICCl data corresponds to the "control range" of frequency tuning.

      Figure 4 was modified to show the frequency range typically found in the ICCls. This range is based on results from Wagner et al., 2007, which extensively surveyed ICCls responses. This modification shows that our ICCls recordings in the ruff-removed owls cover the normal frequency hearing range of the owl.

      2) A central hypothesis of the study is that the frequency preference of the high-frequency neurons is lower in ruff-removed owls because of the lowered reliability caused by a lack of the ruff. Yet, while lower, the frequency range of many neurons in juvenile and ruff-removed owls seems sufficiently high to be still responsive at 7-8 kHz. I think it would be important to know to what extent neurons are still ITD sensitive at the "unreliable high frequencies" even if the CFs are lower since the "optimization" according to reliability depends not on the best frequency of each neuron per se, but whether neurons are less ITD sensitive at the higher, less reliable frequencies.

      The concern regarding the frequency range that elicits responsivity was largely addressed above. Specifically, Figure L1 showing frequency tuning of frontally tuned ICx neurons in ruff-removed owls indicates that while there is some variability of tuning across neurons, there is little responsivity above 6 kHz. In contrast, equivalent analysis in juvenile owls (Figure L3), shows there is much more responsiveness and variability across neurons to high and low frequencies. This evidence supports our hypothesis that the juvenile owl brain is still highly plastic, which facilitates learning during development. Although the underlying data was already reported in Figure 7 of our previously submitted manuscript, we can include Figures L1 and L2, potentially as supplemental figures, if considered useful by editors and reviewers. Nevertheless, this argumentation was further expanded in the revised text (Line 229).

      Figure L1. Frequency tuning of frontally-tuned ICx neurons in ruff-removed owls. Tuning curves are normalized by the max response. Thick black line indicates the average tuning curve. Dashed black line indicates basal response.

      Figure L2. ITD sensitivity across frequencies in ruff-removed owl. Two example neurons shown in a and b. ITD tuning for tones (colored) and broadband (black) plotted by firing rate (non-normalized). Solid colored lines indicate responses to frequencies that are within the neuron’s preferred frequency range (i.e. above the half-height, see Methods), dashed lines indicate frequencies outside of the neuron’s frequency range.

      Figure L3. Frequency tuning of frontally-tuned ICx neurons in juvenile owls. Tuning curves are normalized by the max response. Thick black line indicates the average tuning curve. Dashed black line indicates basal response.
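      The "half-height" criterion mentioned in the Figure L2 caption can be illustrated with a short sketch. This is not the procedure from the Methods, which is not reproduced here; it assumes a tuning curve normalized by its maximum response (as in Figures L1 and L3) and a threshold at half of that maximum. The function name, the optional basal-rate subtraction, and the example firing rates are hypothetical.

```python
import numpy as np


def preferred_frequency_range(freqs_hz, rates, basal_rate=0.0):
    """Return (low, high) frequencies whose response is at or above half-height.

    Assumptions: the curve is normalized by its maximum response, and
    basal_rate (default 0) is subtracted before normalizing. If the
    above-threshold frequencies are not contiguous, only the outer
    bounds are returned.
    """
    freqs_hz = np.asarray(freqs_hz, dtype=float)
    driven = np.asarray(rates, dtype=float) - basal_rate
    peak = driven.max()
    if peak <= 0:
        return None  # no driven response above the basal rate
    normalized = driven / peak
    above = freqs_hz[normalized >= 0.5]
    return float(above.min()), float(above.max())


# Hypothetical tuning curve: firing rates (spikes/s) at 1-8 kHz.
freqs = [1000, 2000, 3000, 4000, 5000, 6000, 7000, 8000]
rates = [0, 2, 6, 10, 8, 4, 1, 0]
freq_range = preferred_frequency_range(freqs, rates)
```

      With these made-up rates, the frequencies at or above half of the peak response span 3 to 5 kHz, so tones at 6 kHz and above would be drawn as dashed lines in a plot like Figure L2.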

      3) It would be interesting to have an estimate of the time scale of experience dependency that induces tuning changes. Do the authors have any data on this question? I appreciate the authors' notion that the quantifications in Fig 7 might indicate that juvenile owls are already "beginning to be shaped by ITD reliability" (line 323 in Discussion). How many days after hearing onset would this correspond to? Does this mean that a few days will already induce changes?

      While tracking changes induced by ruff-removal over development were outside of the scope of this study, many other studies have assessed experience-dependent plasticity in the barn owl. The recordings in this study were performed approximately 20 days after hearing onset, suggesting that the juveniles had ample time to begin learning. These points were expanded upon in the discussion (Lines 254, 280-283).

      Reviewer #2 (Public Review):

      1) Why is IPD variability plotted instead of ITD variability (or indeed spatial reliability)? The relationship between these measures is likely to vary across frequency, which makes it difficult to compare ITD variability across frequency when IPDs are plotted. Normalizing data across frequencies also makes it difficult to compare different locations and acoustical conditions. For example, in Fig.1a and Fig.1b, the data shown for 3 kHz at ~160 degrees seems quantitatively and visually quite different, but the difference (in Fig.1c) appears to be negligible.

      Justification of why IPD variability is used as an estimate of ITD variability was added to introduction (Lines 55-60), results (Line 100) and methods (Lines 371-374) sections of the manuscript, explaining the fact that because ITD detection is based on phase locking by auditory nerve and ITD detector neurons tuned to narrow frequency bands, responses of ITD detector neurons forwarded to downstream midbrain regions are therefore determined by IPD variability. Additionally, ITD is calculated by dividing IPD by frequency, which makes comparisons of ITD reliability across frequency mathematically uninformative.
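      A small numerical sketch may help illustrate why the IPD-to-ITD conversion confounds comparisons across frequency. It assumes IPD is expressed in cycles, so that ITD = IPD / frequency; the function name and the example values are illustrative and are not taken from the manuscript.

```python
def ipd_to_itd_us(ipd_cycles, freq_hz):
    """Convert an interaural phase difference (in cycles) to an ITD in microseconds.

    Assumes ITD = IPD / frequency, with IPD expressed as a fraction of a cycle.
    """
    return ipd_cycles / freq_hz * 1e6


# The same IPD spread (0.1 cycle, a hypothetical value) at two frequencies.
itd_spread_2khz = ipd_to_itd_us(0.1, 2000.0)
itd_spread_8khz = ipd_to_itd_us(0.1, 8000.0)
```

      Under this assumption, an identical IPD spread corresponds to a four-fold smaller ITD spread at 8 kHz than at 2 kHz, purely because of the division by frequency, which is why IPD variability is the more comparable quantity across frequency bands.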

      2) How well do the measures of ITD reliability used reflect real-world listening? For example, the model used to calculate ITD reliability appears to assume the same (flat) spectral profile for targets and distractors, which are presented simultaneously with the same temporal envelope, and a uniform spatial distribution of sounds across space. It is therefore unclear how robust the study's results are to violations of these assumptions.

      While we agree that our analysis cannot completely capture real-world listening for the barn owl, a general analysis using similar flat spectral profiles for targets and concurrent sounds provides a broad assessment of reliability of ITD cues. While a full recapitulation of real-world listening is beyond the scope of this study (i.e. recording natural scenes from the ear canals of wild barn owls), we included additional analyses of ITD reliability in Figure 1-figure supplement 1, described above.

      3) Does facial ruff removal produce an isolated effect on ITD variability or does it also produce changes in directional gain, and the relationship between spatial cues and sound location? Although the study considers this issue in some places (e.g. Fig.2, Fig.5), a clearer presentation of the acoustical effects of facial ruff removal and their implications (for all locations, not just those to the front), as well as an attempt to understand how these acoustical changes lead to the observed changes in ITD reliability, would greatly strengthen the study. In addition, Fig.1 shows average ITD reliability across owls, but it would be helpful to know how consistent these measures are across owls, given individual variability in Head-Related Transfer Functions (HRTFs). This potentially has implications for the electrophysiological experiments, if the HRTFs of those animals were not measured. One specific question that is potentially very relevant is whether the facial ruff attenuates sounds presented behind the animal and whether it does so in a frequency-dependent way. In addition, if facial ruff removal enables ILDs to be used for azimuth, then ITDs may also become less necessary at higher frequencies, even if their reliability remains unchanged.

      Additional analysis was conducted to generate a representation of the changes in directional gain induced by ruff removal, which was added as a new figure (Fig 5). This analysis shows that changes in gain following ruff-removal are largely frequency-independent: there is a de-attenuation of peripherally and rearwardly located sounds, but the highest gain remains for high frequencies in frontal space. Although there is an additional increase in gain for high frequencies from rearward space, these changes would not explain the changes in frequency tuning we report. As mentioned in new additions to the manuscript, the changes at the most rearward-located auditory spatial locations are unlikely to have an effect on the auditory midbrain. No studies in the barn owl have found neurons in the ICx or optic tectum tuned to >120° (Knudsen, 1982; Knudsen, 1984; Cazettes et al., 2014). In addition, variability of IPD reliability across owls was analyzed and reported in the amended Figure 1, which shows very little change across owls. In this analysis, we did realize that the file of one of the HRTFs obtained from von Campenhausen et al. 2006 was mislabeled, which explains slight differences in revised Fig 1b. Nevertheless, the added analysis of IPD reliability across owls indicates that the pattern in ITD reliability is stable across owls (Fig. 1d,e), which supports our decision to not record HRTFs from owls used in this study. Finally, we added text to the discussion clarifying that the use of ILD for azimuth would not provide the same resolution as ITD would (Lines 295-303). We also do not believe that the use of ILD for azimuth would make “ITDs… less necessary at higher frequencies”, given that the ICCls is still computing ITD at these high frequencies (Fig 4), and that ILDs also have higher resolution at higher frequencies, with and without the facial ruff (Olsen et al., 1989; Keller et al., 1998; von Campenhausen et al., 2006).

      1) It is unclear why some analyses (Fig.5, Fig.7) are focused on frontal locations and frontally-tuned neurons. It is also unclear why neurons with a best ITDs of 0 are described as frontally tuned since locations behind the animal produce an ITD of 0 also. Related to this, in Fig.1, facial ruff removal appears to reduce IPD variability at low frequencies for locations to the rear (~160 degrees), where the ITD is likely to be close to 0. Neurons with a best ITD of 0 might therefore be expected to adjust their frequency tuning in opposite directions depending on whether they are tuned to frontal or rearward locations.

      An extensive explanation was added to the methods detailing why we do not believe the neurons recorded in this study are tuned to the rear. Namely, studies mapping the barn owl’s ICx and optic tectum have not reported neurons tuned to locations >120°, with the number of neurons representing a given spatial location decreasing with eccentricity (Knudsen, 1982; Knudsen, 1984; Cazettes et al., 2014). While we agree that there does seem to be a change in ITD reliability at ~160° following ruff-removal, the result is largely similar to the change that occurs in frontal space (Fig 1b), which is consistent with the ruff-removed head functioning as a sphere. Thus, we wouldn’t expect rearwardly-tuned neurons, if they could be readily found, to adjust their frequency tuning to higher frequencies. Finally, we want to clarify that we focused our analyses on frontally-tuned neurons because frontal space is where we observed the largest change in ITD reliability. Text was added to the Discussion section to clarify this point (Lines 313-321).

      2) The study suggests that information about high-frequency ITDs is not passed on to the ICX if the ICX does not contain neurons that have a high best frequency. However, neurons might be sensitive to ITDs at frequencies other than the best frequency, particularly if their frequency tuning is broader. It is also unclear whether the best frequency of a neuron always corresponds to the frequency that provides the most reliable ITD information, which the study implicitly assumes.

      The concern about ITD sensitivity at non-preferred frequencies was addressed under the essential revision #3, as well as under Reviewer 1’s concerns.

    1. Author Response

      Reviewer #1 (Public Review):

      This manuscript reports a systematic study of the cortical propagation patterns of human beta bursts (~13-35Hz) generated around simple finger movements (index and middle finger button presses).

      The authors deployed a sophisticated and original methodology to measure the anatomical and dynamical characteristics of the cortical propagation of these transient events. MEG data from another study (visual discrimination task) was repurposed for the present investigation. The data sample is small (8 participants). However, beta bursts were extracted over a +/- 2s time window about each button press, from single trials, yielding the detection and analysis of hundreds of such events of interest. The main finding consists of the demonstration that the cortical activity at the source of movement-related beta bursts follows two main propagation patterns: one along an anteroposterior direction (predominantly originating from pre-central motor regions), and the other along a medio-lateral (i.e., dorso-lateral) direction (predominantly originating from post-central sensory regions). Some differences are reported, post-hoc, in terms of amplitude/cortical spread/propagation velocity between pre- and post-movement beta bursts. Several control tests are conducted to ascertain the veracity of those findings, accounting for expected variations of signal-to-noise ratio across participants and sessions, cortical mesh characteristics and signal leakage expected from MEG source imaging.

      One major perceived weakness is the purely descriptive nature of the reported findings: no meaningful difference was found between bursts traveling along the two different principal modes of propagation, and importantly, no relation with behavior (response time) was found. The same stands for pre vs. post motor bursts, except for the expected finding that post-motor bursts are more frequent and tend to be of greater amplitude (yielding the observation of a so-called beta rebound, on average across trials).

      Overall, and despite substantial methodological explorations and the description of two modes of propagation, the study falls short of advancing our understanding of the functional role of movement related beta bursts.

      For these reasons, the expected impact of the study on the field may be limited. The data is also relatively limited (simple button presses), in terms of behavioral features that could be related to the neurophysiological observations. One missed opportunity to explain the functional role of the distinct propagation patterns reported would have been, for instance, to measure the cortical "destination" of their respective trajectories.

      In response to this comment, we would like to highlight two important points.

      First, our work constitutes the first non-invasive human confirmation of invasive work in animals (Balasubramanian et al., 2020; Roberts et al., 2019; Rule et al., 2018; Best et al., 2016; Rubino et al., 2006; Takahashi et al., 2011, 2015) and patients (Takahashi et al., 2011). Thus, these results bridge the gap between recordings limited to the size of multielectrode arrays (roughly 0.16 cm²; Balasubramanian et al., 2020; Best et al., 2016; Rubino et al., 2006; Takahashi et al., 2011, 2015) and human EEG recordings spanning large areas of the cortex and several functionally distinct regions (Alexander et al., 2016; Stolk et al., 2019). The ability to access these neural signatures non-invasively is important for cross-species comparison. It further enables us to provide an in-depth analysis of the spatiotemporal diversity of human MEG signals and a detailed characterisation of the two propagation directions, which significantly extends previous reports. We note that their functional role also remains undetermined in these animal studies, but being able to identify these signals in humans can now provide a stepping stone towards identifying their role.

      Second, and related, the reviewers are correct that we did not observe distinct propagation directions between pre- and post-movement bursts, nor a relationship with reaction time. However, such a null result is relevant, in our view, to understanding what the functional relevance of these signals, if any, might be. Recent work in macaques indicates that the spatiotemporal patterns of high-gamma activity carry kinematic information about the upcoming movement (Liang et al., 2023). The functional role of beta may therefore be more complex and not relate to reaction times or kinematics in a straightforward manner. We believe this is a relevant observation, in keeping with the continued efforts to identify how sensorimotor beta relates to behaviour. It is increasingly clear that spatiotemporal diversity in animal recordings and human E/MEG and intracranial recordings can constitute a substantial proportion of the measured dynamics. As such, our report is relevant in narrowing down what these signals may reflect.

      Together, we think that our work provides new insights into the multidimensional and propagating features of burst activity. This is important for the entire electrophysiology community, as it transforms how we commonly analyse and interpret these important brain signals. We anticipate that our work will guide and inspire future work on the mechanistic underpinnings of these dominant neural signals. We are confident that our article has the scope to reach out to the diverse readership of eLife.

      Reviewer #2 (Public Review):

      The authors devised novel and interesting experiments using high precision human MEG to demonstrate the propagation of beta oscillation events along two axes in the brain. Using careful analysis, they show different properties of beta events pre- and post-movement, including changes in amplitude. Due to beta's prominent role in motor system dynamics, these changes are therefore linked to behavior and offer insights into the mechanisms leading to movement. The linking of wave-like phenomena and transient dynamics in the brain offers new insight into two paradigms about neural dynamics, offering new ways to think about each phenomenon on its own.

      Although there is a substantial, and recent, body of literature supporting the conclusions that beta and other neural oscillations are transient, care must be taken when analyzing the data and the resulting conclusions about beta properties in both time and space. For example, modifying the threshold at which beta events are detected could alter their reported properties and expression in space and time. The authors should therefore perform parameter sweeps on e.g. the thresholds for detection of oscillation bursts to determine whether their conclusions on beta properties and propagation hold. If this additional analysis does not change their story, it would lend confidence to the results/conclusions.

      We thank the reviewing team for this comment. As suggested, we evaluated the effect of different burst thresholds on the burst parameters.

      The threshold in the main analysis was determined empirically from the data, as in previous work (Little et al., 2019). Specifically, trial-wise power was correlated with the burst probability across a range of different threshold values (from median to median plus seven standard deviations (std), in steps of 0.25, see Figure 6-figure supplement 1). The threshold value that retained the highest correlation between trial-wise power and burst probability was used to binarize the data.
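      The threshold-selection procedure described above can be sketched as follows. This is a minimal illustration in Python, not the authors' MATLAB implementation. The function names `pearson` and `select_burst_threshold` are hypothetical; trial-wise power is assumed to be the mean squared amplitude per trial, and burst probability is assumed to be the fraction of samples in a trial that exceed the threshold.

```python
import statistics


def pearson(x, y):
    """Pearson correlation of two equal-length lists; None if either is constant."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None
    return sxy / (sxx * syy) ** 0.5


def select_burst_threshold(amplitude, max_k=7.0, step=0.25):
    """Pick the threshold (median + k * std) whose burst probability
    correlates best with trial-wise power.

    amplitude: list of trials, each a list of beta amplitude samples.
    Returns (k, threshold, correlation).
    """
    samples = [v for trial in amplitude for v in trial]
    med = statistics.median(samples)
    sd = statistics.pstdev(samples)
    # Assumption: trial-wise power is the mean squared amplitude of the trial.
    trial_power = [statistics.fmean([v * v for v in trial]) for trial in amplitude]

    best = None
    for i in range(int(round(max_k / step)) + 1):
        k = i * step
        threshold = med + k * sd
        # Assumption: burst probability is the fraction of supra-threshold samples.
        burst_prob = [sum(v > threshold for v in trial) / len(trial)
                      for trial in amplitude]
        r = pearson(trial_power, burst_prob)
        if r is not None and (best is None or r > best[2]):
            best = (k, threshold, r)
    if best is None:
        raise ValueError("no threshold produced a defined correlation")
    return best
```

      Calling `select_burst_threshold(trials)` on a list of single-trial amplitude envelopes returns the selected multiplier `k`, the corresponding threshold, and the correlation achieved. When several thresholds tie, this sketch keeps the lowest one, which is a design choice of the example rather than something stated in the response.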

      We repeated our original analysis using four additional thresholds, i.e., original threshold - 0.5 std, -0.25 std, +0.25 std, +0.5 std. As one would expect, burst threshold is negatively related to the number of bursts (i.e., higher thresholds yield fewer bursts, Figure R4a [top]), and positively related to burst amplitude (i.e., higher thresholds yield higher burst amplitudes, Figure R4a [bottom]).

      Similarly, the temporal duration of bursts and apparent spatial width are modulated by the burst threshold: lowering the threshold leads to longer temporal duration and larger apparent spatial width, while increasing the threshold leads to shorter temporal duration and smaller apparent spatial width (Figure R4b). Note that for the temporal and spectral burst characteristics, the difference to the original threshold can be numerically zero, i.e., changing the burst threshold did not lead to changes exceeding the temporal and spectral resolution of the applied time-frequency transformation (i.e., 200 ms and 1 Hz, respectively).

      Importantly, across these threshold values, the propagation direction and propagation speed remain comparable.
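      The threshold sweep described in this response can be illustrated with a short sketch. It is a simplified one-dimensional example in Python under stated assumptions, not the authors' 3D burst analysis: the helper names `detect_bursts` and `sweep_thresholds` are hypothetical, a burst is taken to be a contiguous run of samples above threshold, and only burst count, mean peak amplitude and mean duration (in samples) are reported.

```python
def detect_bursts(envelope, threshold):
    """Return (start, end_exclusive, peak) for each contiguous supra-threshold run."""
    bursts, start = [], None
    for i, v in enumerate(envelope):
        if v > threshold and start is None:
            start = i  # a burst begins
        elif v <= threshold and start is not None:
            bursts.append((start, i, max(envelope[start:i])))
            start = None  # the burst ended at the previous sample
    if start is not None:  # burst still running at the end of the signal
        bursts.append((start, len(envelope), max(envelope[start:])))
    return bursts


def sweep_thresholds(envelope, base_threshold, sd,
                     offsets=(-0.5, -0.25, 0.0, 0.25, 0.5)):
    """Repeat burst detection at base_threshold + offset * sd.

    Returns one row per offset: (offset, n_bursts, mean_peak, mean_duration).
    mean_peak and mean_duration are None when no burst is found.
    """
    rows = []
    for off in offsets:
        bursts = detect_bursts(envelope, base_threshold + off * sd)
        n = len(bursts)
        mean_peak = sum(p for _, _, p in bursts) / n if n else None
        mean_dur = sum(e - s for s, e, _ in bursts) / n if n else None
        rows.append((off, n, mean_peak, mean_dur))
    return rows
```

      Comparing the rows across offsets shows how burst count, amplitude and duration depend on the chosen threshold. Checking propagation direction and speed, as done in the response, would require the spatial dimension that this sketch omits.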

      We now include this result as Figure 6-figure supplement 2 and refer to this analysis in the manuscript (page 28 line 717).

      “To explore the robustness of the results analyses were repeated using a range of thresholds (Figure 6-figure supplement 2).”

      Determining the generators of beta events at different locations is a tricky issue. The authors mentioned a single generator that is responsible for propagating beta along the two axes described. However, it is not clear through what mechanism the beta events could travel along the neural substrate without additional local generators along the way. Previous work on beta events examined how a sequence of synaptic inputs to supra and infragranular layers would contribute to a typical beta event waveform. Although it is possible other mechanisms exist, how might this work as the beta events propagate through space? Some further explanation/investigation on these issues is therefore warranted.

      Based on this and other comments (i.e., comments 7 and 8) we re-evaluated the use of the term ‘generator’ in this manuscript.

      While the term generator can be used across scales, from micro- to macroscale, for the purpose of the present paper we believe one should differentiate at least two concepts: a) generator of beta bursts, and b) generator of travelling waves.

      We realised that in the previous version of the manuscript the term ‘generator’ was at times used without context. We removed the term where no longer necessary.

      Further, the previous version of the manuscript discussed putative generators of travelling waves (page 19f.) but not generators of beta bursts. We now address this as follows:

      “Studies using biophysical modelling have proposed that beta bursts are generated by a broad infragranular excitatory synaptic drive temporally aligned with a strong supragranular synaptic drive (Law et al., 2022; Neymotin et al., 2020; Sherman et al., 2016; Shin et al., 2017) whereby layer specific inhibition acts to stabilise beta bursts in the temporal domain (West et al., 2023). The supragranular drive is thought to originate in the thalamus (E. G. Jones, 1998, 2001; Mo & Sherman, 2019; Seedat et al., 2020), indicating thalamocortical mechanisms (page 22f).”

      Once the mechanisms are better understood, a question remains as to how much the results generalize to other oscillation frequencies and other brain areas. On the first question of other oscillation frequencies, the authors could easily test whether nearby frequency bands (alpha and low gamma) have similar properties. This would help to determine whether the observations/conclusions are unique to beta, or more generally applicable to transient bursts/waves in the brain. On the second issue of applicability to other brain areas, the authors could relate their work to transient bursts and waves recorded using ECoG and/or iEEG. Some recent work on traveling waves at the brain-wide level would be relevant for such comparisons.

      We appreciate the enthusiasm and the suggestions. To comment on the frequency specificity of the observed effects we conducted the same analysis focusing on the gamma frequency range (60-90 Hz). For computational reasons, we limited this analysis to one subject. Figure R1 shows the polar probability histogram for the beta frequency range (left) and the gamma frequency range (right). In contrast to the beta frequency range, no dominant directions were observed for the gamma range and von Mises functions did not converge. These preliminary results suggest some frequency specificity of the spatiotemporal pattern in sensorimotor beta activity. We believe this paves the way for future analysis mapping propagation direction across frequency and space.

      Here we did not investigate the spatial specificity of the effects, as the beta frequency range is dominant in sensorimotor areas. Investigating beta bursts in other cortical areas would have likely resulted in very few bursts. We discuss our results across spatial scales in the section: Distinct anatomical propagation axes of sensorimotor beta activity. However, please note that most of the previous literature operates on a different spatial scale (roughly 4 mm; Balasubramanian et al., 2020; Best et al., 2016; Rubino et al., 2006; Rule et al., 2018; Takahashi et al., 2011, 2015) and different species (e.g., non-human primates). Non-invasive recordings in humans capture temporospatial patterns of a very different scale, i.e., often across the whole cortex (Alexander et al., 2016; Roberts et al., 2019). Comparing spatiotemporal patterns across different spatial scales is inherently difficult. Work investigating different spatial scales simultaneously, such as Sreekumar et al. (2020), is required to fully unpack the relationship between mesoscopic and macroscopic spatiotemporal patterns.

      Figure R1: Spatiotemporal organisation for the beta (β, 13-30 Hz) and gamma (γ, 60-90 Hz) frequency range for one exemplar subject. Same as Figure 4a, but for one exemplar subject.

      If the source code could be provided on github along with documentation and a standard "notebook" on use other researchers would benefit greatly.

      All analyses are performed using freely available tools in MATLAB. The code carrying out the analysis in this paper can be found here: [link provided upon acceptance]. The 3D burst analyses can be very computationally intensive even on a modern computer system. The analyses in this paper were computed on a MacBook Pro with a 2.6 GHz 6-Core Intel Core i7 and 32 GB of RAM. Details on the installation and setup of the dependencies can be found in the README.md file in the main study repository.

      This information has been added to the paper in the methods section on page 35.

    1. Author Response

      Reviewer #1 (Public Review):

      This manuscript provides a comprehensive investigation of the effects of the genetic ablation of three different transcription factors (Srf, Mrtfa, and Mrtfb) in the inner ear hair cells. Based on the published data, the authors hypothesized that these transcription factors may be involved in the regulation of the genes essential for building the actin-rich structures at the apex of hair cells, the mechanosensory stereocilia and their mechanical support - the cuticular plate. Indeed, the authors found that two of these transcription factors (Srf and Mrtfb) are essential for the proper formation and/or maintenance of these structures in the auditory hair cells. Surprisingly, Srf- and Mrtfb-deficient hair cells exhibited somewhat similar abnormalities in the stereocilia and in the cuticular plates even though these transcription factors have very different effects on the hair cell transcriptome. Another interesting finding of this study is that the hair cell abnormalities in Srf-deficient mice could be rescued by AAV-mediated delivery of Cnn2, one of the downstream targets of Srf. However, despite a rather comprehensive assessment of the novel mouse models, the authors do not yet have any experimentally testable mechanistic model of how exactly Srf and Mrtfb contribute to the formation of the actin cytoskeleton in the hair cells. The lack of any specific working model linking Srf and/or Mrtfb with stereocilia formation decreases the potential impact of this study.

      Major comments:

      Figures 1 & 3: The conclusion on abnormalities in the actin meshwork of the cuticular plate was based largely on the comparison of the intensities of phalloidin staining in separate samples from different groups. In general, any comparison of the intensity of fluorescence between different samples is unreliable, no matter how carefully one could try matching sample preparation and imaging conditions. In this case, two other techniques would be more convincing: 1) quantification of the volume of the cuticular plates from fluorescent images; and 2) direct examination of the cuticular plates by transmission electron microscopy (TEM).

      In fact, the manuscript provides no single TEM image of the F-actin abnormalities either in the cuticular plate or in the stereocilia, even though these abnormalities seem to be the major focus of the study. Overall, it is still unclear what exactly Srf or Mrtfb deficiencies do with F-actin in the hair cells.

      Yes, we agree. As suggested by the reviewer, to directly examine the defects in F-actin organization within the cuticular plate of mutant mice, we conducted Transmission Electron Microscopy (TEM) analyses. The results, as presented in the revised Figures 1 and 4 (panels F, G, and E, F, respectively), provide crucial insights into the structural changes in the cuticular plate. In addition, we compared the volume of the phalloidin-labeled cuticular plate after 3-D reconstruction using Imaris software; the results are shown in Author response image 1. The cuticular plate (CP) volume results were consistent with the relative F-actin intensity change of the cuticular plate in the revised Figures 1B and 4B. For the TEM analysis of the stereocilia, we regret that due to time constraints, we were unable to collect TEM images of stereocilia with sufficient quality for a meaningful comparison. However, we believe that the data we have presented sufficiently address the primary concerns, and we appreciate the reviewers’ understanding of these limitations.

      Author response image 1.

      Figures 2 & 4 represent another example of how deceiving a simple comparison of the intensity of fluorescence between the genotypes can be. It is not clear whether the reduced immunofluorescence of the investigated molecules (ESPN1, EPS8, GNAI3, or FSCN2) results from their mis-localization or represents a simple consequence of the fact that a thinner stereocilium would always have a smaller signal of the protein of interest, even though the ratio of this protein to the number of actin filaments remains unchanged. According to my examination of the representative images of these figures, loss of Srf produces mis-localization of the investigated proteins and irregular labeling in different stereocilia of the same bundle, while loss of Mrtfb does not. Obviously, a simple quantification of the intensity of fluorescence conceals these important differences.

      Yes, we agree. In addition to the quantification of tip protein intensity, we have added a few more analyses in the revised Figure 3 and Figure 6, such as the percentage of row 1 tip stereocilia with tip protein staining and the percentage of IHCs with tip protein staining on row 2 tip. Using the results mentioned above, the differences in the expression level, the row-specific distribution and the irregular labeling of tip proteins between the control and the mutants can be analyzed more thoroughly.

      Reviewer #2 (Public Review):

      The analysis of bundle morphology using both confocal and SEM imaging is a strength of the paper and the authors have some nice images, especially with SEM. Still, the main weakness is that it is unclear how significant their findings are in terms of understanding bundle development; the mouse phenotypes are not distinct enough to make it clear that they serve different functions so the reader is left wondering what the main takeaway is.

      Based on the reviewer’s comments, in this revised manuscript, we put more emphasis on describing the effects of SRF and MRTFB on key tip proteins’ localization pattern during stereocilia development, represented by ESPN1, EPS8 and GNAI3, as well as the effects of SRF and MRTFB on the F-actin organization of cuticular plate using TEM. We have made substantial efforts to interpret the mechanistic underpinnings of the roles of SRF and MRTFB in hair cells. This is reflected in the revised Figures 1, 3, 4, 6, and 10, where we provide more comprehensive insights into the mechanisms at play.

      We interpret our data as indicating that SRF and MRTFB regulate the development and maintenance of the hair cell’s actin cytoskeleton in a complementary manner. Deletion of either gene thus results in somewhat similar phenotypes in hair cell morphology, despite the surprising lack of overlap of SRF and MRTFB downstream targets in the hair cell.

      In Figure 1 and 3, changes in bundle morphology clearly don't occur until after P5. Widening still occurs to some extent but lengthening does not and instead the stereocilia appear to shrink in length. EPS8 levels appear to be the most reduced of all the tip proteins (Srf mutants) so I wonder if these mutants are just similar to an EPS8 KO if the loss of EPS8 occurred postnatally (P0-P5).

      To address this question, we performed EPS8 staining on the control and Srf cKO hair cells at P4 and P10. We found that the dramatic decrease of the row 1 tip signal for EPS8 was evident from P4 onward in Srf cKO IHCs. Although the major hair bundle phenotypes of Eps8 KO, including the defects of row 1 stereocilia lengthening and additional rows of short stereocilia, also appeared in Srf cKO IHCs, there are still some bundle morphology differences between Eps8 KO and Srf cKO. First, both Eps8 KO OHCs and IHCs showed additional rows of short stereocilia, but we only observed additional rows of short stereocilia in Srf cKO IHCs. Second, in the study by Zampini et al., SEM and TEM images did not show an obvious reduction of row 2 stereocilia widening (P18-P35), while our analysis of SEM images confirmed that the width of row 2 IHC stereocilia was drastically reduced by 40% in Srf cKO (P15). Overall, we think that although Srf cKO hair bundles are somewhat similar to Eps8 KO, the Srf cKO hair bundle phenotype might be governed by multiple candidate genes acting cooperatively.

      Reference:

      Valeria Zampini, et al. Eps8 regulates hair bundle length and functional maturation of mammalian auditory hair cells. PLoS Biol. 2011 Apr;9(4): e1001048.

      A major shortcoming is that there are few details on how the image analyses were done. Were SEM images corrected for shrinkage? How was each of the immunocytochemistry quantitation (e.g., cuticular plates for phalloidin and tip staining for antibodies) done? There are multiple ways of doing this but there are few indications in the manuscript.

      We apologize for not describing the image analysis procedure clearly enough. As described in a study from Nicolas Grillet’s group (Miller et al., 2021), live and mildly-fixed IHC stereocilia have similar dimensions, while SEM preparation results in a hair bundle at a 2:3 scale compared to the live preparation. In our study, the hair cells selected for SEM imaging and measurements were located in the basal turn (30-32kHz), while the hair cells selected for fluorescence-based imaging and measurements were located in the middle turn (20-24kHz) or the basal turn (32-36kHz). Although our SEM imaging and fluorescence-based imaging of basal turn hair bundles were not from exactly the same area, the control hair bundles imaged by SEM showed a 10%-20% reduction in row 1 stereocilia length compared to the control hair bundles imaged by fluorescence (revised Figure 2 and Figure 5). Overall, our stereocilia dimension data showed the shrinkage expected from SEM preparation.

      Recognizing the need for clarity, we have provided a detailed description of our image quantification and analysis procedures in the “Materials and Methods” section, specifically under “Immunocytochemistry.” This will aid readers in understanding our methodologies and ensure transparency in our approach.

      Reference:

      Katharine K Miller, et al. Dimensions of a Living Cochlear Hair Bundle. Front Cell Dev Biol. 2021 Nov 25:9:742529.

      The tip protein analysis in Figs 2 and 4 is nice but it would be nice for the authors to show the protein staining separately from the phalloidin so you could see how restricted to the tips it is (each in grayscale). This is especially true for the CNN2 labeling in Fig 7 as it does not look particularly tip specific in the x-y panels. It would be especially important to see the antibody staining in the reslices separate from phalloidin.

      Thank you for the suggestions. We have shown tip proteins staining in grayscale separately from the phalloidin in the revised Figure 3 and Figure 6. To clearly show the tip-specific localization of CNN2, we conducted CNN2 staining at different ages during hair bundle development and showed CNN2 labeling in grayscale and in reslices in revised Figure 9-figure supplement 1B.

      In Fig 6, why was the transcriptome analysis at P2 given that the phenotype in these mice occurs much later? While redoing the transcriptome analysis is probably not an option, an alternative would be to show more examples of EPS8/GNAI/CNN2 staining in the KO, but at younger ages closer to the time of PCR analysis, such as at P5. Pinpointing when the tip protein intensities start to decrease in the KOs would be useful rather than just showing one age (P10).

      We agree with the reviewer. To address this question, we have performed ESPN1, EPS8 and GNAI3 staining on the control and the mutant hair cells at P4, P10 and P15 (the revised Figures 3 and 6). According to the new results, the dramatic decreases of the row 1 tip signal for ESPN1 and EPS8 were evident from P4 onward in Srf cKO IHCs, consistent with the appearance of the mild reduction of row 1 stereocilia length in P5 Srf cKO IHCs. For Mrtfb cKO hair cells, an obvious reduction of the row 1 tip signal for ESPN1 was not observed until P10. However, a few genes related to cell adhesion and regulation of the actin cytoskeleton were significantly down-regulated in the P2 Mrtfb-deficient hair cell transcriptome. We think that in hair cells MRTFB may not play a major role in the regulation of stereocilia development, so the morphological defects of stereocilia appeared much later in the Mrtfb mutant than in the Srf mutant.

      While it is certainly interesting if it turns out CNN2 is indeed at tips in this phase, the experiments do not tell us that much about what role CNN2 may be playing. It is notable that in Fig 7E in the control+GFP panel, CNN2 does not appear to be at the tips. Those images are at P11 whereas the images in panel A are at P6 so perhaps CNN2 decreases after the widening phase. An important missing control is the Anc80L65-Cnn2 AAV in a wild-type cochlea.

      We agree with the reviewer. We have conducted more immunostaining experiments to confirm the expression pattern of CNN2 during the stereocilia development, from P0 to P11. The results were included in the revised Figure 9-figure supplement 1B. As the reviewer suggested, CNN2 expression pattern in control cochlea injected with Anc80L65-Cnn2 AAV has also been provided in revised Figure 9E.

    1. Author Response

      Reviewer #1 (Public Review):

      This is an awesome comprehensive manuscript. Authors start by sorting putative stromal cell-containing BM non-hematopoietic (CD235a-/CD45-) plus additional CD271+/CD235a-/CD45- populations to identify nine individual stromal identities by scRNA-seq. The dual sorting strategy is a clever trick as it enriches for rare stromal (progenitor) cell signals but may suffer a certain bias towards CD271+ stromal progenitors. The lack of readable signatures already among CD235a-/CD45- sorts might argue against this fear. This reviewer would appreciate a brief discussion on number & phenotype of putative additional MSSC phenotypes in light of the fact that the majority of 'blood lineage(s)'-negative scRNA-seq signatures identified blood cell progenitor identities (glycophorin A-negative & leukocyte common antigen-negative). The nine stromal cell entities share the CXCL12, VCAN, LEPR main signature. Perhaps the authors could speculate if future studies using VCAN or LEPR-based sort strategies could identify additional stromal progenitor identities?

      We would like to thank the reviewer for critically evaluating our work and for the generally positive evaluation of the paper. We apologize for delayed resubmission as it took a long time for a specific antibody to arrive to complete the confocal microscopy analyses.

      The reviewer asks for a brief discussion on the cell numbers and phenotypes of MSSC phenotypes. The cell numbers and percentages of MSSC in sorted CD45low/-CD235a- and CD45low/-CD235a-CD271+ cells can be found in Supplementary File 3 and we have added a summary of the phenotypes of MSSC in the new Supplementary File 7.

      Due to the extremely low frequency of stromal cells in human bone marrow, we chose a sorting strategy that also included CD45low cells (Fig 1A) to ensure that no stromal cells were excluded from the analysis. Although stromal elements are certainly enriched using this approach, the CD45low population contains several different hematopoietic cell types. These include CD34+ HSPCs which are characterized by low CD45 expression [2], as well as the CD45low-expressing fractions of other hematopoietic cell populations such as B cells, T cells, NK cells, megakaryocytes, monocytes, dendritic cells, and granulocytes. Furthermore, CD235a- late-stage erythroid progenitors, which are negative for CD45, are represented as well. Of note, our data are consistent with previously reported murine studies showing the presence of a number of hematopoietic populations in CD45- cells, which accounted for the majority of CD45-Ter119-CD31- murine BM cells [3,4]. However, despite a certain enrichment of stromal elements in the CD45low cell fraction, frequencies were still too low to allow for a detailed analysis of this important bone marrow compartment. This prompted us to adopt the stromal cell-enrichment strategy as described in the manuscript to achieve a better resolution of the stromal compartment. In fact, sorting based on CD45low/-CD235a-CD271+ allowed us to sufficiently enrich bone marrow stromal cells to be clearly detectable in scRNAseq analysis. According to the reviewer’s suggestion, a brief discussion on this issue is now included in the Discussion (page 28, lines 10-15).

      The reviewer also suggested using VCAN or LEPR-based sorting strategy to identify additional stromal identities in future studies.

      However, as VCAN is an extracellular matrix protein, FACS analysis of cellular VCAN expression can only be achieved based on its intracellular expression after fixation and permeabilization [5,6]. Additionally, while VCAN is highly and ubiquitously expressed by stromal clusters, VCAN is also expressed by monocytes (cluster 36). Therefore, VCAN is not an optimal marker to isolate viable stromal cells.

      LEPR is the marker that was reported to identify the majority of colony-forming cells in adult murine bone marrow [7]. We have previously reported that the majority of human adult bone marrow CFU-Fs is contained in the LEPR+ fraction [8]. In our current scRNAseq surface marker profiling analysis, group A cells showed high expression of several canonical stromal markers including VCAM1, PDGFRB, ENG (CD73), as well as LEPR (Fig. 4A). However, the four stromal clusters in Group A could not be separated based on the expression of LEPR. Therefore, we chose not to use LEPR as a marker to prospectively isolate the different stromal cell types.

      The authors furthermore localized CD271+, CD81+ and NCAM/CD56+ cells in BM sections in situ. Finally, referring to the strong background of the group in HSC research, in silico prediction by CellPhoneDB identified a wide range of interactions between stromal cells and hematopoietic cells. Evidence for functional interdependence of CFU-F forming cells completes the novel and clearer bone marrow stromal cell picture.

      We thank the reviewer for the positive comments.

      An illustrative abstract naming the top9 stromal identities in their top4 clusters by their "top10 markers" + functions would be highly appreciated.

      We thank the reviewer for the suggestion. A summary of the characteristics of stromal clusters is now shown in the new Supplementary File 7, which we hope matches the reviewer’s expectations.

      Reviewer #2 (Public Review):

      Knowledge about composition and function of the different subpopulations of the hematopoietic niche of the BM is limited. Although such knowledge about the mouse BM has been accumulating in recent years, a thorough study of the human BM still needs to be performed. The present manuscript of Li and coworkers fills this gap by performing single cell RNA sequencing (scRNAseq) on control BM as well as CD271+ BM cells enriched for non-hematopoietic niche cells.

      We apologize for delayed resubmission as it took a long time for a specific antibody to arrive to complete the confocal microscopy analyses. We thank the reviewer for the critical expert review and overall positive comments.

      Based on their scRNAseq, the authors propose 41 different BM cell populations, ten of which represented non-hematopoietic cells, including one endothelial cell cluster. The nine remaining skeletal subpopulations were subdivided into multipotent stromal stem cells (MSSC), four distinct populations of osteoprogenitors, one cluster of osteoblasts and three clusters of pre-fibroblasts. Using bioinformatic tools, the authors then compare their results and divisions of subpopulations to some previously published work from others and attempt to delineate lineage relationships using RNA velocity analyses. From these, they propose different paths from which MSSC enter the progenitor stages, and might differentiate into pre-osteoblasts and -fibroblasts.

      It is of interest to note that apparently adipo-primed cells may also differentiate into osteolineage cells, something that should be further explored or validated. Furthermore, although this analysis yields a large adipo-primed population, pre-adipocytes and mature adipocytes appear not to be included in the data set the authors used, which should also be explained.

      We thank the reviewer for this comment. We chose to annotate cluster 5 as the adipo-primed cluster based on its higher expression of adipogenic differentiation markers as well as a group of stress-related transcription factors (FOS, FOSB, JUNB, EGR1) (Fig. 2B-C, Figure 2-figure supplement 1C), some of which had been shown to mark bone marrow adipogenic progenitors (ref. 1). Although at considerably lower levels compared to adipogenic genes, osteogenic genes were also expressed in cluster 5 cells (Fig. 2B and D), indicating the multipotent potential of this cluster. Therefore, our initial annotation of these cells as adipo-primed progenitors was too narrow, as it did not include the possible osteogenic differentiation potential. We apologize for the confusion caused by the inappropriate annotation and, in order to avoid any further confusion, cluster 5 has now been re-annotated as ‘highly adipocytic gene-expressing progenitors’ (HAGEPs), which we believe is a better representation of the cells. We furthermore agree with the reviewer that in-vivo differentiation needs to be performed to address potential differentiation capacities in future studies.

      With regard to the lack of adipocytes in our data set, we described in the Materials and Methods section that human bone marrow cells were isolated based on density gradient centrifugation. After centrifugation, the mononuclear cell-containing monolayers were harvested for further analysis. However, the resulting supernatant containing mature adipocytic cells was discarded (ref. 14). Therefore, adipocyte clusters were not identified in our dataset. We have amended the manuscript accordingly (page 5, line 7).

      Regarding the pre-adipocytes, we are not aware of any specific markers for pre-adipocytes in the bone marrow. We examined the only known markers (ICAM1, PPARG, FABP4) that have been shown to mark committed pre-adipocytes in human adipose tissue (ref. 15). As illustrated in Fig. R1 (below), low expression of all three markers was not restricted to a single distinct cluster but could be found in almost all stromal clusters. These data thus allow us to neither confirm nor exclude the presence of pre-adipocytes in the dataset. Due to the lack of specific markers for pre-adipocytes and the absence of mature adipocytes in the current dataset, it is therefore difficult to identify a well-defined pre-adipocyte cluster.

      Figure R1. UMAP illustration of the normalized expression of the markers for pre-adipocytes in stromal clusters.

      In addition, based on a separate analysis of surface molecules, the authors propose new markers that could be used to prospectively isolate different human subpopulations of BM niche cells by using CD52, CD81 and NCAM1 (=CD56). Indeed, these analyses yield six different populations with differential abilities to form fibroblast-like colonies and differentiate into adipo-, osteo-, and chondrogenic lineages. To explore how the scRNAseq data may help to understand regulatory processes within the BM, the authors predict possible interactions between hematopoietic and non-hematopoietic subpopulations in the BM. These should be further validated to support statements such as the suggestion in the abstract that separate CXCL12- and SPP1-regulated BM niches might exist.

      We agree with the reviewer that functional validation of the CellPhoneDB results, using for example in vivo humanized mouse models, would be needed to demonstrate the presence of different niches in the bone marrow. At this point in time, we only put forward the hypothesis that different niche types exist, and we will work on providing experimental proof in our future studies.

      The scRNAseq analysis is indeed a strong and important resource, also for later studies meant to increase knowledge about the hematopoietic niche of the BM. Although the analyses using different bioinformatic tools are very helpful, they remain mostly speculative, since validatory experiments, as already mentioned, are missing. As such, I feel the authors did not succeed in achieving their goals of understanding how non-hematopoietic cells of the BM regulate the different hematopoietic processes within the BM. Nevertheless, they have created valuable resources, both in the scRNAseq data they generated, as well as the different predictions about different cell populations, their lineage relationships, and how they might interact with hematopoietic cells.

      We thank the reviewer for the appreciation of the value of this dataset. We agree with the reviewer that it is of great importance to validate the contribution of potential driver genes for stromal cell differentiation and verify the in vitro data and in-silico prediction using in-vivo models. As the main goal of the current study was to formulate hypotheses based on the scRNAseq data for future studies, we believe that in vivo validation experiments using engineered human bone marrow models or humanized bone marrow ossicles are out of the scope of the current study, but certainly need to be performed in the future.

      The impact of this work is difficult to envision, since validations still need to be performed. Also, it has to be borne in mind that humans are not mice, which can be studied in neat homogeneous inbred populations. Human populations, on the other hand, are quite diverse, so that the data generated in this manuscript and others will probably have to be combined to extrapolate data relevant to the whole of the human population. However, as it is equally difficult to generate reliable scRNAseq data from human BM, it seems likely that the data will indeed be an important resource when more data from different donors become available.

      We thank the reviewer for the generally positive evaluation of this study.

      Taken at face value, the authors provide evidence that human counterparts exist to several BM populations described in mice. In my opinion, the lineage relationships predicted using the RNA velocity analyses need more substance, as it seems the differentiation paths may diverge from what is known from mice. If so, this issue should be studied more stringently. Similarly, the paper would have been strengthened considerably if a relevant experimental validation had been attempted, perhaps by using genetically modified (knockdown) MSSC, similar to Battula et al. (doi: 10.1182/blood-2012-06-437988).

      In the study from Welner’s group, the stromal differentiation trajectory was inferred based on scRNAseq analysis of murine bone marrow cells using Velocyto (ref. 16). Velocyto identified MSCs as the ‘source’ cell state, with pre-adipocytes, pro-osteoblasts, and pro-chondrocytes being end states. In our study, the MSSC population was predicted to be at the apex of the trajectory and the pre-osteoblast cluster was placed close to the terminal state of differentiation, which is consistent with the murine study. However, different stromal cell types were identified in mice compared with humans. For example, we have identified pre-fibroblasts in our dataset which are absent in the murine study, while a well-defined murine pre-adipocyte population was not identified in our human dataset. Therefore, it is not surprising to find some discrepancies between human and murine stromal differentiation trajectories. Of course, as mentioned before, critical in-vivo functional validations need to be carried out to address these important issues in the future.

      In summary, this is a very interesting but also descriptive paper with highly important resources. However, to prospectively identify or isolate human non-hematopoietic/non-endothelial niche populations, more stringent validations should have been performed to strengthen the validity of the different analyses that have been performed. As such, it remains an open question which niche subpopulations have the most impact on the different hematopoietic processes important for normal and stress hematopoiesis, as well as malignancies.

      Thank you for this comment. We completely agree that more stringent validations are necessary but are outside of the aim of our current hypothesis-generating study. Accordingly, we are planning functional verification studies using genetically manipulated stromal cells in combination with in-vivo humanized ossicles. Furthermore, other groups will hopefully use our database and contribute with functional studies in model systems that are currently not available to us, e.g. iPS-derived bone marrow in-vitro proxies.

      Specific remarks

      • Since CD45, CD235a, and CD271 are used as distinguishing markers in the sample preparation of the scRNAseq, it would be helpful to highlight these markers in the different analyses (Figures 1D, 2B, 2C-F, and 4A), and restrict the analyses to those cells that also do not express CD45, CD235a (why use CD71?) and highly express CD271.

      Thank you for this comment. As shown in Fig. R2, we have modified figures Fig. 1D, 2B, and 4A showing now also the expression of PTPRC (CD45), GYPA (CD235a), and NGFR (CD271) on the top (Fig. 1D and 2B) or right (Fig. 4A) panel of the figures. To complement Fig. 2C-F, we have generated new stacked violin plots showing the expression level of three markers by all 9 stromal clusters (Fig. R2B). As we believe that including these three markers in the figures does not provide a better strategy to improve the analyses, we decided to leave the original figures unchanged in this respect.

      Figure R2. (A) Modified Fig. 1D, 2B and 4A with PTPRC (CD45), GYPA (CD235a) and NGFR (CD271) expression. (B) Stacked violin plots of PTPRC, GYPA and NGFR expressed by stromal clusters to complement Fig. 2C-F.

      With regard to cell exclusion based on CD45, as shown in the modified figure corresponding to Fig 1A in the manuscript (Fig. R2A), CD45 gene expression is also observed in the endothelial, basal, and neuronal clusters. These clusters represent non-hematopoietic clusters that we would like to keep in our dataset for further analysis, such as cell-cell interaction. Therefore, we chose not to restrict the analysis solely to CD45 non-expressing cells.

      With regard to CD235a (GYPA), expression of CD235a is not detected in any of the non-hematopoietic clusters. Thus, exclusion of CD235a-expressing cells is not necessary.

      For CD271, according to our previous results (own unpublished data, belonging to a dataset of which only significantly expressed genes were reported in Li et al. (ref. 8)), protein expression of CD271 is not necessarily reflected by gene expression. In other words, stromal cells with CD271 protein expression do not always have high mRNA expression. A significant fraction of stromal cells would be excluded if we restricted the analyses only to those cells that show high CD271 gene expression, which would not reflect the real cellular composition of human bone marrow stroma. In order to not risk losing stromal cells, we therefore kept our previous analyses, which included stromal cells with various CD271 expression levels.

      With regard to using CD71 as an exclusion marker, please see also the comments to reviewer 1. Briefly, according to our data, CD71 (TFRC)-expressing erythroid precursors could still be found after excluding CD45 and CD235a positive cells (Figure 1-figure supplement 1B and R3). As furthermore shown in Figure 1-figure supplement 1G and R2, CD71 expression in the stromal clusters is negligible. Therefore, we believe that this justifies the use of CD71 as an additional marker to exclude erythroid cells. We have amended the discussion to address this issue (page 19, lines 7-8).

      Figure R3. FACS plots illustrating the expression of (A) CD71 (TFRC) vs CD271 in CD45-CD235a- cells and (B) FSC-A vs CD81 in CD45-CD235a-CD271+CD71+ cells following exclusion of doublets and dead cells.

      • Despite a distinct neuronal cluster (39), there does not seem to be a distinctive marker for these cells. Is this true?

      Yes, the reviewer is correct that there is no significantly-expressed distinctive marker for neuronal cells. Multiple markers indicating the presence of different cell types were identified in cluster 39 (Supplementary File 4). Among them, several neuronal markers (NEUROD1, CHGB, ELAVL2, ELAVL3, ELAVL4, STMN2, INSM1, ZIC2, NNAT) were found to be enriched in this cluster (Supplementary File 4 and Fig. 1D) with higher fold changes compared to other identified genes. However, the expression of these genes was not statistically significant, which is mainly due to the heterogeneity of the cluster and thus does not allow us to draw any firm conclusions.

      Several genes including MALAT1, HNRNPH1, AC010970.1, and AD000090.1 were identified to be statistically highly expressed by cluster 39 (Supplementary File 4). The expression of these genes is not restricted to any specific cell type. It is therefore impossible to annotate the cluster based on this and our data thus indicated that cluster 39 is a heterogeneous population containing multiple cell types. Based on the expression of neuronal markers, we nevertheless chose to annotate Cluster 39 as “neuronal” as the prominent expression of neuronal markers indicated the presence of neurons in this cluster. To be more accurate, the annotation of cluster 39 has been changed to ‘neuronal cell-containing cluster’ to correctly reflect the presence of non-neuronal gene expressing cells as well (page 29, lines 3-8).

      • Since based on 2C and 2D, the authors are unable to distinguish adipo- from osteogenic cells, would the authors use the same molecules to distinguish different populations of 2C-D, or would they use other markers, if so which and why.

      We agree with the reviewer that at first glance adipo-primed cells (cluster 5, now annotated as “highly adipocytic gene-expressing progenitors”, HAGEPs), balanced progenitors (cluster 16), and pre-osteoblasts (cluster 38) shared a similar expression pattern according to the violin plots in Fig. 2C and 2D. However, as illustrated in the heatmap (Fig. 2B), the expression patterns of adipo-primed cells (HAGEPs) and balanced progenitors were quite different in terms of their expression of adipogenic and osteogenic markers. Both adipogenic and osteogenic marker expression was detected in HAGEPs, balanced progenitors, and pre-osteoblasts. Because violin plots summarize the overall expression level of a given marker in a given cluster, they tend to make it more difficult to detect differential expression patterns between clusters. In this case, the heatmap shown in Fig. 2B is a good complement to the violin plots, as it shows the expression pattern of every cell in the different stromal clusters.

      Additionally, cluster 5 showed the expression of a group of stress-related transcription factors (FOS, FOSB, JUNB, EGR1) (Fig. 2B and Figure 2-figure supplement 1C), some of which had been shown to mark bone marrow adipogenic progenitors (ref. 1). The expression of the above-mentioned stress-related transcription factors (putative adipogenic progenitor markers) was generally lower in cluster 38 compared to cluster 5, further demonstrating that the clusters were different.

      Furthermore, there was a gradual upregulation of more mature osteogenic markers such as RUNX1, CDH11, EBF1, and EBF3 from cluster 5 to cluster 16 and finally cluster 38. As shown in Fig. 2D, the expression of these markers was higher in cluster 38 compared to cluster 5. Therefore, cluster 38 was annotated as pre-osteoblasts.

      Most of the stromal clusters form a continuum (Fig. 2A), which correlates very well with the gradual transition of different cellular states during stromal cell development. It is highly unlikely that abrupt and dramatic gene expression changes would occur during the cellular state transition of cells of the same lineage. Therefore, it is not surprising to find that the gene expression profiles of the stromal clusters share a certain level of similarity.

      In summary, we rely on several factors to distinguish different stromal clusters, which include canonical adipo-, osteo- and chondrogenic markers, stress markers, heatmap, violin plots, and the gradual up-regulation of certain lineage-specific markers.

      To directly answer the reviewer’s question, we believe that we are able to distinguish different stromal clusters based on our data.

      • In de Jong et al., an inflammatory MSC population (iMSC) is defined. Since the Schneider group showed that inflammatory S100A8 and A9 are expressed by inflamed MSC, is it possible that some of the designated pre-fibroblasts actually correspond to these S100A8/A9-expressing iMSC?

      We thank the reviewer for raising this interesting question.

      First of all, we would like to point out that scRNAseq was performed using viably frozen bone marrow aspirates in de Jong’s study, while freshly isolated bone marrow was used in our study. There might be discrepancies between frozen and fresh bone marrow samples in terms of cellular composition, including stromal composition and, importantly, processing-induced stress-related gene expression profiles.

      To investigate if designated pre-fibroblasts actually correspond to iMSCs as suggested by the reviewer, we have re-examined the expression of some of the key iMSC genes as reported by de Jong et al. (ref. 17). As shown in Fig. R6, the markers that can distinguish iMSC from other MSC clusters in the de Jong et al. study were not exclusively expressed by pre-fibroblasts, but also by other stromal cell types including HAGEPs, balanced progenitors, and pre-osteoblasts.

      In the study by R. Schneider’s group (ref. 18), significant upregulation of S100A8/S100A9 was observed in stromal cells from patients with myelofibrosis. Furthermore, baseline expression of S100A8/A9 was also observed in the fibroblast clusters in the control group, which correlates very well with our data of S100A8/9 expression in pre-fibroblasts in normal donors (Fig. 2F). Our data thus indicate, in line with Schneider’s findings, that there is a baseline level of S100A8/9 expression in fibroblasts in hematologically normal samples and that the expression of S100A8/9 is not restricted to inflamed MSC.

      In summary, the gene expression profiles observed in our study do not indicate the presence of iMSC in the healthy bone marrow.

      • Figure 3A: Do human adipo-primed cells (cluster 5) indeed differentiate into osteogenic cells (clusters 6, 38, and 39)? This would be highly unexpected. Can the authors substantiate this "reliable outcome of the RNA velocity analysis"?

      Please refer to our previous responses regarding this topic. Briefly, as shown in Fig. 2B and D, both osteogenic and adipogenic genes are expressed in cluster 5, indicating the multipotent potential of this cluster. Although the cluster was initially annotated as adipo-primed progenitors, this was not intended to exclude the osteogenic differentiation potential of these progenitors. Nevertheless, this annotation did not correctly reflect the differentiation potential and might thus have caused confusion, for which we apologize. In order to more correctly describe the characteristics of these cells, cluster 5 has now been re-annotated as ‘highly adipocytic gene-expressing progenitors (HAGEPs)’.

      In general, the outcome of the RNA velocity analysis needs to be corroborated by in-vivo differentiation experiments. But we believe that functional verification, which would be extensive, is out of the scope of the current study and we will address these questions in future studies.

      • How statistically certain are the authors, that the populations in Figure 4B as defined by flow cytometry, correspond to MSSC, adipo-primed cells, osteoprogenitors, etc., as defined by scRNAseq?

      To address this question, we sorted the A1-A4 populations and performed RT-PCR to examine the CD81 expression level in each population. As shown in Figure 4-figure supplement 1B, CD81 expression levels were higher in A1 and A2 compared with A3 and A4, which is consistent with the scRNAseq data that showed the highest CD81 expression in MSSCs compared to other clusters (Supplementary File 4).

      The phenotypes defined in this study allowed us to isolate different stromal cell types which demonstrated significant functional differences as described in the manuscript (page 19, lines 17-25; page 20, lines 1-11). These results, in combination with the quantitative real-time PCR results (Figure 4-figure supplement 1B), demonstrated that the A1-A4 subsets in FACS are functionally distinct populations and are likely to be, at least in large part, identical or equivalent to the transcriptionally identified clusters in group A stromal cells. However, at this point, we have not performed the required experiments (scRNAseq of sorted cells) that would provide sufficient proof to confirm this statement statistically.

      • The immunohistochemistry results shown do not allow distinct conclusions as the colors give unequivocal mix-colors, and surface expression cannot be distinguished from intracellular expression. Please use a 3D (confocal) method for such statements.

      We thank the reviewer for the suggestion and have performed additional confocal microscopy analysis of human bone marrow biopsies as suggested. Representative confocal images are now presented in the middle and right panels of Fig. 6E. We also include a separate file (Supplemental confocal image file). Here, confocal scans of all marker combinations are shown as ortho views, in addition to detailed intensity profile analyses of the cells of interest that clearly distinguish surface staining from intracellular staining.

      Confocal analysis of bone marrow biopsies confirmed our findings presented in the manuscript. As observed in the scanning images, CD271-expressing cells were negative for CD45 and were located in perivascular, endosteal, and peri-adipocytic regions. CD271/CD81 double-positive cells could be found either in the peri-adipocytic regions or perivascular regions, while CD271/NCAM1 double-positive cells were exclusively situated at the bone-lining endosteal regions. The results of the confocal analysis have been added to the revised manuscript (page 21, lines 15-17).

      • Figure 5A: as all cells seem to interact with all other cells, this figure does not convey relevant information about BM regions using for instance CXCL12 or SPP1. Please reanalyze to show specificity of the interactions of the single clusters. Also, since it is unlikely the CellPhoneDB2-predicted interactions are restricted to hematopoietic responders, please also describe the possible interactions between non-hematopoietic cells.

      Fig. 5A was used to demonstrate the complexity of the interactions between hematopoietic cells and stromal cells.

      To gain a more detailed understanding of the interactions, we also performed an analysis with the top-listed ligand-receptor pairs as shown in Fig. 5B-C and Figure 5-figure supplement 1B. Here, each dot represents the interaction of a specific ligand-receptor pair listed on the x-axis between the two individual clusters indicated in the y-axis, which we believe shows what the reviewer is asking for.

      The specificity of the interactions between single clusters was shown in Fig. 5B-C and Figure 5-figure supplement 1B. The CXCL12- and SPP1-mediated interactions between MSSC/OC and hematopoietic clusters clearly suggested stromal cell type-specific interactions.

      Regarding non-hematopoietic cells, both inter- and intra-stromal interactions were identified to be operative between different stromal subsets as well as within the same stromal cell population as shown in Figure 5-figure supplement 3B. In addition, we have also analyzed the interaction pattern between endothelial cells and hematopoietic cells as shown in Fig. 7A, and thus we believe that we have sufficiently described these interactions as requested by the reviewer.

    1. Author Response

      Reviewer #2 (Public Review):

      Point 1: The transcriptomic analysis of E12.5 endocardial cushion cells in the various mouse models is informative in the extraction of Igf2- and H19-specific gene functions. In Fig. 6D, a huge sex effect is obvious with many more DEGs in female embryos compared to males. How can this be explained given that Igf2/H19 reside on Chr7 and do not primarily affect gene expression on the X chromosome? Is any chromosomal bias observed in the genomic distribution of DEGs?

      We examined the chromosomal distribution of DEGs between WT and +/hIC1 (Supplemental Figure 6D) and did not see any bias on the X chromosome. We described this result on lines 278-280: “Although the number of +/hIC1-specific DEGs largely differed between males and females, there was no sex-specific bias on the X chromosome (Supplemental Figure 6D).” Additionally, we agree with the reviewer that it is noteworthy that the dysregulated H19/Igf2 expression affected the transcriptome in a sex-specific manner, especially when the mutation is located on an autosome. Although investigating the role of hormones versus sex chromosomes in these effects would be quite interesting, it is beyond the scope of the current study.
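
      As an illustration only, one way a chromosomal bias of this kind could be checked is a one-sided hypergeometric test that asks whether the DEG list contains more X-linked genes than expected by chance. The sketch below is not the analysis behind Supplemental Figure 6D; the function name, its arguments, and the numbers in the usage line are hypothetical.

```python
from math import comb


def hypergeom_upper_tail(k: int, K: int, n: int, N: int) -> float:
    """Return P(X >= k) for a hypergeometric draw.

    N: total number of genes tested
    K: number of tested genes located on the chromosome of interest (e.g. X)
    n: number of DEGs
    k: number of DEGs located on the chromosome of interest
    """
    total = comb(N, n)
    upper = min(K, n)
    # Sum the probability mass for k, k+1, ..., upper hits on the chromosome.
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / total


# Toy numbers (hypothetical): 10 genes tested, 3 on X, 2 DEGs, 1 of them on X.
p_value = hypergeom_upper_tail(k=1, K=3, n=2, N=10)
```

      A small p-value would point to an over-representation of X-linked genes among the DEGs; a large one is consistent with no detectable bias.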

      Point 2: A separate issue is raised by Fig. 6E that shows a most dramatic dysregulation of a single gene in the delta3.8/hIC1 "rescue" model. Interestingly, this gene is Shh. Hence, these embryos should exhibit some dramatic skeletal abnormalities or other defects linked to sonic hedgehog function.

      The reason why Shh appeared to be differentially expressed between wild-type and d3.8/hIC1 samples was that Shh expression was 0 across all the samples except for two wild-type samples. In order to detect all the DEGs that might be lowly expressed, we did not want to filter DEGs based on the level of total expression. As a result, Shh was represented as significantly differentially expressed in d3.8/hIC1 samples, although its expression in our samples appears to be too low to have any significant effect on development. This explanation was added to lines 310-312. To confirm that this was an exceptional case, we analyzed the expression of DEGs obtained from other pairwise comparisons. In the volcano plots below, genes whose expression is not statistically different between the two groups are marked grey. Genes whose expression is statistically different and detected in both groups are marked red. Genes whose expression is statistically different but not detected at all in one group, such as Shh, are marked blue (Figure G). It is clear that almost all of our DEGs are expressed consistently across the groups, and genes with no expression detected in one group are very rare.
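
      The three-colour scheme described above can be sketched as a simple classification rule. This is a minimal illustration rather than the code used to draw the volcano plots: the `GeneResult` type, the `classify` function, the 0.05 significance threshold, and the count values are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class GeneResult:
    name: str
    padj: float            # adjusted p-value from the differential expression test
    counts_a: List[float]  # normalized counts per sample, group A (e.g. wild-type)
    counts_b: List[float]  # normalized counts per sample, group B (e.g. mutant)


def classify(gene: GeneResult, alpha: float = 0.05) -> str:
    """Assign a volcano-plot colour following the scheme described in the text."""
    if gene.padj >= alpha:
        return "grey"  # not statistically different
    detected_a = any(c > 0 for c in gene.counts_a)
    detected_b = any(c > 0 for c in gene.counts_b)
    if detected_a and detected_b:
        return "red"   # significant and detected in both groups
    return "blue"      # significant but undetected in one group


# Hypothetical Shh-like case: reads in only two samples of one group.
shh_like = GeneResult("Shh", 0.01, [3.0, 5.0, 0.0], [0.0, 0.0, 0.0])
colour = classify(shh_like)
```

      Under these assumptions, a gene that is detected in only a couple of samples of one group and absent from the other is flagged blue, which separates it from DEGs that are expressed in both groups.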

      Point 3: The placental analysis needs to be strengthened. Placentas should be consistently positioned with the decidua facing up, and the chorionic plate down. The placentas in Fig. 3F are sectioned at an angle and the chorionic plate is missing. These images must be replaced with better histological sections.

      As requested, we have replaced placental images with better representative sections (Figure 3F and 4E). In addition, we have improved alignment of placental histology figures.

      Point 4: The CD34 staining has not worked and does not show any fetal vasculature, in particular not in the WT sample.

      As requested, we have replaced the CD34 vascular stained images with those that better represent fetal vasculature (Figure 3G).

      Point 5: The "thrombi" highlighted in Fig. 4E are well within the normal range, to make the point that these are persistent abnormalities more thorough measurements would need to be performed (number, size, etc).

      As requested, we measured the number and relative size of the thrombi that are found in dH19/hIC1 placentas with lesions. No thrombi were found in wild-type placentas, whereas an average of 1.3 thrombi were found in six dH19/hIC1 placentas. The size of the thrombi varied widely, but they occupied an average of 2.58% of the labyrinth zone where these lesions were found (Supplemental Figure 4D). Additionally, we replaced the image in Figure 4E with a section that better represents the lesion.

      Point 6: The statement that H19 is disproportionately contributing to the labyrinth phenotype (lines 154/155) is not warranted as Igf2 expression is reduced to virtually nothing in these mice. Even though there is more H19 in the labyrinth than in the junctional zone, the phenotype may still be driven by a loss of Igf2. Given the quasi Igf2-null situation in +/hIC1 mice, is the glycogen cell type phenotype recapitulated in these mice, and how do glycogen numbers compare in the other mouse models?

      The sentence was edited in line 157. We performed Periodic acid Schiff (PAS) staining on +/hIC1 placentas to address whether glycogen cells are affected by abnormal H19/Igf2 expression (Supplemental Figure 1E). In contrast to previous reports in which Igf2-null mice had a lower placental glycogen concentration (Lopez et al., 1996) and H19 deletion led to increased placental glycogen storage (Esquiliano et al., 2009), our quantification of PAS-stained images showed that the glycogen content is not significantly different between wild-type and +/hIC1 placentas. We have described this result in lines 166-168.

      Point 7: How do delta3.8/+ and delta3.8/hIC1 mice with a VSD survive? Is it resolved some time after birth such that heart function is compatible with postnatal viability? And more importantly, do H19 expression levels correlate with phenotype severity on an individual basis?

      Our study was limited to phenotypes prior to birth, thus postnatal/adult phenotypes were not examined. Because the VSD showed only partial penetrance in these mice, we cannot state that the d3.8/+ or d3.8/hIC1 mice with VSDs survive. It has also been previously reported in another mouse model with incomplete penetrance of a VSD that the mice which survived to adulthood did not have VSDs (Sakata et al., 2002). We find it highly unlikely that either mouse model would survive significantly past the postnatal timepoint with a VSD. We have examined two PN0 d3.8/hIC1 neonates, and neither had a VSD.

      Regarding the second point, the only way to quantitatively address this question would be to do qPCR or RNA-seq on individual hearts, which then makes it impossible for those hearts to be examined by histology to confirm the VSD. Thus, hearts used to identify VSDs via histology could not also be used for quantitative H19 measurements. One thing to note is that the H19/Igf2 expression in independent replicates of d3.8/hIC1 cardiac ECs used in our RNA-seq experiment is quite variable, not clustering together in contrast to other mouse models used in this study (Fig. 6A). Such a wide range of variability in the extent of H19/Igf2 dysregulation suggests that H19/Igf2 levels could have an impact on the penetrance or the severity of the VSD phenotype in d3.8/hIC1 embryos.

    1. Author Response

      Reviewer #2 (Public Review):

      Weaknesses:

      1) The relevance of the LPS-induced calvarial osteolysis model is not clear. Calvaria is mostly composed of cortical bone-like structures lacking marrow space, though small marrow space exists near the suture. Osteolysis appears to occur in areas apart from where marrow is located. The authors did not show in the manuscript which cells Adipoq-Cre marks in the calvaria.

      We have shown in a recent publication that MALPs exist in the calvarial bone marrow (2). As shown in Fig. R1A, Td+ cells are present in the calvarial bone marrow, which is surrounded by a thin layer of cortical bone (Fig. R1B, blue arrows). In WT mice, after LPS injection, the normal bone structure, including suture and cortical bone, was mostly eroded and filled with inflammatory cells (green arrows). Thus, osteolysis does occur in the area where bone marrow is originally located. In contrast, calvarial bone structure was preserved in the CKO mice, demonstrating that Csf1 deficiency in MALPs suppresses LPS-induced osteolysis. We included the H&E staining data in the revised manuscript:

      "H&E staining showed that calvarial bone marrow is surrounded by a thin layer of cortical bone (Fig. 5C). After the LPS injection, normal calvarial structure, including suture and cortical bone, were mostly eroded and filled with inflammatory cells in WT mice, but unaltered in CKO mice."

      Figure R1. Calvarial bone marrow structure. (A) Representative coronal section of 1.5-month-old Adipoq/Td mouse calvaria. Bone surfaces are outlined by dashed lines. Boxed areas in the low magnification image (top) are enlarged to show periosteum (bottom left), suture (bottom middle), and bone marrow (BM, bottom right) regions. Red: Td; Blue: DAPI. Adapted from our previous publication (2). (B) H&E staining of coronal sections of WT and Csf1 CKOAdipoq mice after LPS injection. Blue arrows point to bone marrow space close to suture (indicated by *). Green arrows point to the osteolytic lesion where cortical bone was eroded and the space was filled with inflammatory cells.

      2) Although the contrast between the two Csf1 conditional deletion models (Adipoq-Cre and Prx1-Cre) is very interesting, the relationship between these two cell populations is not well described. The authors did not clarify if MALPs are also targeted by Prx1-Cre, or if these two cell types are from different cell lineages. "Other mesenchymal lineage cells" in the subtitle is not extremely helpful to place this finding in context.

      We thank the Reviewer for this comment. The original article constructing the Prx1-Cre mouse line demonstrates that Prx1-Cre targets all mesenchymal cells in the limb bud as early as 10.5 dpc (10). This early expression pattern ensures that all bone marrow mesenchymal lineage cells, including MALPs, are targeted by Prx1-Cre. In addition, based on our scRNA-seq data (1), Adipoq is mainly expressed in MALPs, while Prrx1 (Prx1) is highly expressed not only in MALPs but also in EMPs, IMPs, LMPs, LCPs, and OBs (Fig. R2). Thus, the fact that Prx1-Cre driven CKO mice have much more severe bone phenotypes than Adipoq-Cre driven CKO mice indicates that mesenchymal lineage cells other than MALPs also contribute Csf1 to regulate bone resorption. To avoid confusion, we changed the title and the first sentence in the Results section about Prx1 mice to the following:

      "Csf1 from mesenchymal lineage cells other than MALPs regulate bone structure.

      To explore whether Csf1 from MALPs plays a dominant role in regulating bone structure, we generated Prx1-Cre Csf1flox/flox (Csf1 CKOPrx1) mice to knockout Csf1 in all mesenchymal lineage cells in bone (10), including MALPs."

      Figure R2. Dotplot of Prrx1 and Adipoq expression in bone marrow mesenchymal lineage cells based on our scRNA-seq analysis of 1-month-old mice.

      3) The data supporting defective bone marrow hematopoiesis in Csf1 CKO mice are not particularly strong. They observed a reduction in bone marrow cellularity, but this was only associated with an expected reduction in macrophages and a mild reduction in overall HSPC populations. More in-depth analyses might be required to define mechanisms underlying reduced bone marrow cellularity in CKO mice.

      We thank the Reviewer for this constructive comment. Accordingly, we performed a thorough analysis of bone marrow hematopoietic compartments and observed significant decreases of monocytes and erythroid progenitors in CKO mice compared to WT mice. These results are now included as Fig. 6E.

      4) Some of the phenotypic analyses are still incomplete. The authors did not report whether CHet (Adipoq-Cre Csf1(flox/+)) showed any bone phenotype. Further, the authors did not report whether Csf1 mRNA or M-Csf protein is indeed expressed by MALPs, with current evidence solely reliant on scRNAseq and qPCR data of bulk-isolated cells. More specific histological methods will be helpful to support the premise of the study.

      A pilot microCT study revealed the same femoral trabecular bone structure in WT and Adipoq-Cre Csf1flox/+ (Csf1 Het) mice at 3 months of age (Fig. R3). While the sample number for Het is low, we are confident about this conclusion.

      Figure R3. MicroCT measurement of trabecular bone structural parameters from WT and Csf1 Het mice. BV/TV: bone volume fraction; BMD: bone mineral density; Tb.N: trabecular number; Tb.Th: trabecular thickness; Tb.Sp: trabecular separation; SMI: structural model index. n=3-8 mice/group.

    1. Author response:

      Reviewer #1 (Public Review):

      In this paper, Tompary & Davachi present work looking at how memories become integrated over time in the brain, and relating those mechanisms to responses on a priming task as a behavioral measure of memory linkage. They find that remotely but not recently formed memories are behaviorally linked and that this is associated with a change in the neural representation in mPFC. They also find that the same behavioral outcomes are associated with the increased coupling of the posterior hippocampus with category-sensitive parts of the neocortex (LOC) during a post-learning rest period-again only for remotely learned information. There was also correspondence in rest connectivity (posterior hippocampus-LOC) and representational change (mPFC) such that for remote memories specifically, the initial post-learning connectivity enhancement during rest related to longer-term mPFC representational change.

      This work has many strengths. The topic of this paper is very interesting, and the data provide a really nice package in terms of providing a mechanistic account of how memories become integrated over a delay. The paper is also exceptionally well-written and a pleasure to read. There are two studies, including one large behavioral study, and the findings replicate in the smaller fMRI sample. I do however have two fairly substantive concerns about the analytic approach, where more data will be required before we can know whether the interpretations are an appropriate reflection of the findings. These and other concerns are described below.

      Thank you for the positive comments! We are proud of this work, and we feel that the paper is greatly strengthened by the revisions we made in response to your feedback. Please see below for specific changes that we’ve made.

      1) One major concern relates to the lack of a pre-encoding baseline scan prior to recent learning.

      a) First, I think it would be helpful if the authors could clarify why there was no pre-learning rest scan dedicated to the recent condition. Was this simply a feasibility consideration, or were there theoretical reasons why this would be less "clean"? Including this information in the paper would be helpful for context. Apologies if I missed this detail in the paper.

      This is a great point and something that we struggled with when developing this experiment. We considered several factors when deciding whether to include a pre-learning baseline on day two. First, the day 2 scan session was longer than that of day 1 because it included the recognition priming and explicit memory tasks, and the addition of a baseline scan would have made the length of the session longer than a typical scan session – about 2 hours in the scanner in total – and we were concerned that participant engagement would be difficult to sustain across a longer session. Second, we anticipated that the pre-learning scan would not have been a ‘clean’ measure of baseline processing, but rather would include signal related to post-learning processing of the day 1 sequences, as multi-variate reactivation of learned stimuli has been observed in rest scans collected 24 hours after learning (Schlichting & Preston, 2014). We have added these considerations to the Discussion (page 39, lines 1047-1070).

      b) Second, I was hoping the authors could speak to what they think is reflected in the post-encoding "recent" scan. Is it possible that these data could also reflect the processing of the remote memories? I think, though am not positive, that the authors may be alluding to this in the penultimate paragraph of the discussion (p. 33) when noting the LOC-mPFC connectivity findings. Could there be the reinstatement of the old memories due to being back in the same experimental context and so forth? I wonder the extent to which the authors think the data from this scan can be reflected as strictly reflecting recent memories, particularly given it is relative to the pre-encoding baseline from before the remote memories, as well (and therefore in theory could reflect both the remote + recent). (I should also acknowledge that, if it is the case that the authors think there might be some remote memory processing during the recent learning session in general, a pre-learning rest scan might not have been "clean" either, in that it could have reflected some processing of the remote memories-i.e., perhaps a clean pre-learning scan for the recent learning session related to point 1a is simply not possible.)

      We propose that theoretically, the post-learning recent scan could indeed reflect a mixture of remote and recent sequences. This is one of the drawbacks of splitting encoding into two sessions rather than combining encoding into one session and splitting retrieval into an immediate and delayed session; any rest scans that are collected on Day 2 may have signal that relates to processing of the Day 1 remote sequences, which is why we decided against the pre-learning baseline for Day 2, as you had noted.

      You are correct that we alluded to this in our original submission when discussing the LOC-mPFC coupling result, and we have taken steps to discuss this more explicitly. In brief, we find greater LOC-mPFC connectivity only after recent learning relative to the pre-learning baseline, and cortical-cortical connectivity could be indicative of processing memories that already have undergone some consolidation (Takashima et al., 2009; Smith et al., 2010). From another vantage point, the mPFC representation of Day 1 learning may have led to increased connectivity with LOC on Day 2 due to Day 1 learning beginning to resemble consolidated prior knowledge (van Kesteren et al., 2010). While this effect is consistent with prior literature and theory, it's unclear why we would find evidence of processing of the remote memories and not the recent memories. Furthermore, the change in LOC-mPFC connectivity in this scan did not correlate with memory behaviors from either learning session, which could be because signal from this scan reflects a mix of processing of the two different learning sessions. With these ideas in mind, we have fleshed out the discussion of the post-encoding ‘recent’ scan in the Discussion (page 38-39, lines 1039-1044).

      c) Third, I am thinking about how both of the above issues might relate to the authors' findings, and would love to see more added to the paper to address this point. Specifically, I assume there are fluctuations in baseline connectivity profile across days within a person, such that the pre-learning connectivity on day 1 might be different from on day 2. Given that, and the lack of a pre-learning connectivity measure on day 2, it would logically follow that the measure of connectivity change from pre- to post-learning is going to be cleaner for the remote memories. In other words, could the lack of connectivity change observed for the recent scan simply be due to the lack of a within-day baseline? Given that otherwise, the post-learning rest should be the same in that it is an immediate reflection of how connectivity changes as a function of learning (depending on whether the authors think that the "recent" scan is actually reflecting "recent + remote"), it seems odd that they both don't show the same corresponding increase in connectivity-which makes me think it may be a baseline difference. I am not sure if this is what the authors are implying when they talk about how day 1 is most similar to prior investigation on p. 20, but if so it might be helpful to state that directly.

      We agree that it is puzzling that hippocampal-LOC connectivity does not also increase after recent learning, as it does after remote learning. However, the fact that there is an increase from baseline rest to post-recent rest in mPFC – LOC connectivity suggests that it’s not an issue with baseline, but rather that the post-recent learning scan is reflecting processing of the remote memories (although as a caveat, there is no relationship with priming).

      On what is now page 23, we were referring to the notion that the Day 1 procedure (baseline rest, learning, post-learning rest) is the most straightforward replication of past work that finds a relationship between hippocampal-cortical coupling and later memory. In contrast, the Day 2 learning and rest scan are less ‘clean’ of a replication in that they are taking place in the shadow of Day 1 learning. We have clarified this in the Results (page 23, lines 597-598).

      d) Fourth and very related to my point 1c, I wonder if the lack of correlations for the recent scan with behavior is interpretable, or if it might just be that this is a noisy measure due to imperfect baseline correction. Do the authors have any data or logic they might be able to provide that could speak to these points? One thing that comes to mind is seeing whether the raw post-learning connectivity values (separately for both recent and remote) show the same pattern as the different scores. However, the authors may come up with other clever ways to address this point. If not, it might be worth acknowledging this interpretive challenge in the Discussion.

      We thought of three different approaches that could help us to understand whether the lack of correlations between coupling and behavior in the recent scan was due to noise. First, we correlated recognition priming with raw hippocampal-LOC coupling separately for pre- and post-learning scans, as in Author response image 1:

      Author response image 1.

      Note that the post-learning chart depicts the relationship between post-remote coupling and remote priming and between post-recent coupling and recent priming (middle). Essentially, post-recent learning coupling did not relate to priming of recently learned sequences (middle; green) while there remains a trend for a relationship between post-remote coupling and priming for remotely learned sequences (middle; blue). However, the significant relationship between coupling and priming that we reported in the paper (right, blue) is driven both by the initial negative relationship that is observed in the pre-learning scan and the positive relationship in the post-remote learning scan. This highlights the importance of using a change score, as there may be spurious initial relationships between connectivity profiles and to-be-learned information that would then mask any learning- and consolidation-related changes.
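
      The role of the change score described above can be illustrated with a minimal sketch. All values below are hypothetical (they are not data from the study), and the helper name pearson is our own; the sketch only shows how a negative pre-learning relationship and a positive post-learning relationship both feed into the correlation between the change score and priming.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical values for five participants (illustration only).
pre_coupling = [0.30, 0.20, 0.10, 0.05, 0.00]    # coupling at baseline rest
post_coupling = [0.10, 0.15, 0.20, 0.22, 0.30]   # coupling at post-learning rest
priming = [10.0, 20.0, 30.0, 40.0, 50.0]         # behavioral priming effect

# Change score: post-learning coupling minus pre-learning coupling.
change = [post - pre for pre, post in zip(pre_coupling, post_coupling)]

r_pre = pearson(pre_coupling, priming)     # relationship already present at baseline
r_post = pearson(post_coupling, priming)   # relationship after learning
r_change = pearson(change, priming)        # relationship with the learning-related change
```

      In this toy example the baseline relationship is negative and the post-learning relationship is positive, so both contribute to the positive correlation between the change score and priming.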

      We also reasoned that if comparisons between the post-recent learning scan and the baseline scan are noisier than between the post-remote learning and baseline scan, there may be differences in the variance of the change scores across participants, such that changes in coupling from baseline to post-recent rest may be more variable than coupling from baseline to post-remote rest. We conducted F-tests to compare the variance of the change in these two hippocampal-LO correlations and found no reliable difference (ratio of difference: F(22, 22) = 0.811, p = .63).
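
      As a rough sketch of the variance comparison described above, the F statistic is the ratio of the two sample variances, with n − 1 degrees of freedom for each set of scores (so F(22, 22) corresponds to 23 scores per set). The numbers and the function name variance_ratio below are hypothetical, and the p-value, which would come from the F distribution, is not computed here.

```python
from statistics import variance

def variance_ratio(a, b):
    """Return (F, df_a, df_b), where F is the ratio of the sample variances of a and b."""
    return variance(a) / variance(b), len(a) - 1, len(b) - 1

# Hypothetical change scores for five participants (illustration only).
change_recent = [0.05, -0.10, 0.12, 0.00, -0.07]
change_remote = [0.10, -0.20, 0.24, 0.00, -0.14]  # twice as spread out as change_recent

f_stat, df1, df2 = variance_ratio(change_recent, change_remote)
```

      A ratio near 1 indicates similar variability in the two sets of change scores; in this toy example the second set has four times the variance of the first.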

      Finally, we explored whether hippocampal-LOC coupling is more stable across participants if compared across two rest scans within the same imaging session (baseline and post-remote) versus across two scans across two separate sessions (baseline and post-recent). Interestingly, coupling was not reliably correlated across scans in either case (baseline/post-remote: r = 0.03, p = 0.89; baseline/post-recent: r = 0.07, p = .74).

      Finally, we evaluated whether hippocampal-LOC coupling was correlated across different rest scans (see Author response image 2). We reasoned that if such coupling was more correlated across baseline and post-remote scans relative to baseline and post-recent scans, that would indicate a within-session stability of participants’ connectivity profiles. At the same time, less correlation of coupling across baseline and post-recent scans would be an indication of a noisier change measure as the measure would additionally include a change in individuals’ connectivity profile over time. We found that there was no difference in the correlation of hipp-LO coupling across sessions, and the correlation was not reliably significant for either session (baseline/post-remote: r = 0.03, p = 0.89; baseline/post-recent: r = 0.07, p = .74; difference: Steiger’s t = 0.12, p = 0.9).

      Author response image 2.

      We have included the raw correlations with priming (page 25, lines 654-661, Supplemental Figure 6) as well as text describing the comparison of variances (page 25, lines 642-653). We did not add the comparison of hippocampal-LOC coupling across scans to the current manuscript, as an evaluation of stability of such coupling in the context of learning and reactivation seems out of scope of the current focus of the experiment, but we find this result to be worthy of follow-up in future work.

      In summary, further analysis of our data did not reveal any indication that a comparison of rest connectivity across scan sessions inserted noise into the change score between baseline and post-recent learning scans. However, these analyses cannot fully rule that possibility out, and the current analyses do not provide concrete evidence that the post-recent learning scan comprises signals that are a mixture of processing of recent and remote sequences. We discuss these drawbacks in the Discussion (page 39, lines 1047-1070).

      2) My second major concern is how the authors have operationalized integration and differentiation. The pattern similarity analysis uses an overall correspondence between the neural similarity and a predicted model as the main metric. In the predicted model, C items that are indirectly associated are more similar to one another than they are C items that are entirely unrelated. The authors are then looking at a change in correspondence (correlation) between the neural data and that prediction model from pre- to post-learning. However, a change in the degree of correspondence with the predicted matrix could be driven by either the unrelated items becoming less similar or the related ones becoming more similar (or both!). Since the interpretation in the paper focuses on change to indirectly related C items, it would be important to report those values directly. For instance, as evidence of differentiation, it would be important to show that there is a greater decrease in similarity for indirectly associated C items than it is for unrelated C items (or even a smaller increase) from pre to post, or that C items that are indirectly related are less similar than are unrelated C items post but not pre-learning. Performing this analysis would confirm that the pattern of results matches the authors' interpretation. This would also impact the interpretation of the subsequent analyses that involve the neural integration measures (e.g., correlation analyses like those on p. 16, which may or may not be driven by increased similarity among overlapping C pairs). I should add that given the specificity to the remote learning in mPFC versus recent in LOC and anterior hippocampus, it is clearly the case that something interesting is going on. However, I think we need more data to understand fully what that "something" is.

      We recognize the importance of understanding whether model fits (and changes to them) are driven by similarity of overlapping pairs or non-overlapping pairs. We have modified all figures that visualize model fits to the neural integration model to separately show fits for pre- and post-learning (Figure 3 for mPFC, Supp. Figure 5 for LOC, Supp. Figure 9 for AB similarity in anterior hippocampus & LOC). We have additionally added supplemental figures to show the complete breakdown of similarity in each region in a 2 (pre/post) x 2 (overlapping/non-overlapping sequence) x 2 (recent/remote) chart. We decided against replacing the model fits with these latter charts, since the model fits strike a good balance between information and readability. We have also modified text in various sections to focus on these new results.
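
      The ambiguity raised by the reviewer can be sketched as follows. This is an illustration under our own assumptions (a binary model coded 1 for overlapping pairs and 0 for non-overlapping pairs, hypothetical similarity values, and our own function names), not the analysis code used in the study. It shows that the same drop in model fit can arise from two different patterns, which is why the per-group breakdown is informative.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def model_fit(similarity, overlapping):
    """Correlate pairwise neural similarity with a binary model (1 = overlapping pair)."""
    model = [1.0 if flag else 0.0 for flag in overlapping]
    return pearson(similarity, model)

def group_means(similarity, overlapping):
    """Mean similarity for overlapping and for non-overlapping pairs."""
    over = [s for s, flag in zip(similarity, overlapping) if flag]
    non = [s for s, flag in zip(similarity, overlapping) if not flag]
    return sum(over) / len(over), sum(non) / len(non)

# Hypothetical pairwise similarities (illustration only).
overlapping = [True, True, False, False, False, False]
pre = [0.20, 0.22, 0.10, 0.12, 0.11, 0.09]
post_a = [0.10, 0.12, 0.10, 0.12, 0.11, 0.09]  # overlapping pairs became less similar
post_b = [0.20, 0.22, 0.20, 0.22, 0.21, 0.19]  # non-overlapping pairs became more similar

fit_pre = model_fit(pre, overlapping)
fit_a = model_fit(post_a, overlapping)
fit_b = model_fit(post_b, overlapping)
means_pre = group_means(pre, overlapping)
means_a = group_means(post_a, overlapping)
means_b = group_means(post_b, overlapping)
```

      Both hypothetical post-learning patterns lower the model fit relative to pre-learning, but only the group means reveal whether the overlapping or the non-overlapping pairs changed.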

      In brief, the decrease in model fit for mPFC for the remote sequences was driven primarily by a decrease in similarity for the overlapping C items and not the non-overlapping ones (Supplementary Figure 3, page 18, lines 468-472).

      Interestingly, in LOC, all C items grew more similar after learning, regardless of their overlap or learning session, but the increase in model fit for C items in the recent condition was driven by a larger increase in similarity for overlapping pairs relative to non-overlapping ones (Supp. Figure 5, page 21, lines 533-536).

      We also visualized AB similarity in the anterior hippocampus and LOC in a similar fashion (Supplementary Figure 9).

      We have also edited the Methods sections with updated details of these analyses (page 52, lines 1392-1397). We think that including these results considerably strengthens our claims and we are pleased to have them included.

      3) The priming task occurred before the post-learning exposure phase and could have impacted the representations. More consideration of this in the paper would be useful. Most critically, since the priming task involves seeing the related C items back-to-back, it would be important to consider whether this experience could have conceivably impacted the neural integration indices. I believe it never would have been the case that unrelated C items were presented sequentially during the priming task, i.e., that related C items always appeared together in this task. I think again the specificity of the remote condition is key and perhaps the authors can leverage this to support their interpretation. Can the authors consider this possibility in the Discussion?

      It's true that only C items from the same sequence were presented back-to-back during the priming task, and that this presentation may interfere with observations from the post-learning exposure scan that followed it. We agree that it is worth considering this caveat and have added language in the Discussion (page 40, lines 1071-1086). When designing the study, we reasoned that it was more important for the behavioral priming task to come before the exposure scans, as all items were shown only once in that task, whereas they were shown 4-5 times in a random order in the post-learning exposure phase. Because of this difference in presentation times, and because behavioral priming findings tend to be very sensitive, we concluded that it was more important to protect the priming task from the exposure scan instead of the reverse.

      We reasoned, however, that the additional presentation of the C items in the recognition priming task would not substantially override the sequence learning, as C items were each presented 16 times in their sequence (ABC1 and ABC2 16 times each). Furthermore, as this reviewer suggests, the order of C items during recognition was the same for recent and remote conditions, so the fact that we find a selective change in neural representation for the remote condition and don’t also see that change for the recent condition is additional assurance that the recognition priming order did not substantially impact the representations.

      4) For the priming task, based on the Figure 2A caption it seems as though every sequence contributes to both the control and primed conditions, but (I believe) this means that the control transition always happens first (and they are always back-to-back). Is this a concern? If RTs are changing over time (getting faster), it would be helpful to know whether the priming effects hold after controlling for trial numbers. I do not think this is a big issue because if it were, you would not expect to see the specificity of the remotely learned information. However, it would be helpful to know given the order of these conditions has to be fixed in their design.

      This is a correct understanding of the trial orders in the recognition priming task. We chose to involve the baseline items in the control condition to boost power – this way, priming of each sequence could be tested, while only presenting each item once in this task, as repetition in the recognition phase would have further facilitated response times and potentially masked any priming effects. We agree that accounting for trial order would be useful here, so we ran a mixed-effects linear model to examine response times both as a function of trial number and of priming condition (primed/control). While there is indeed a large effect of trial number such that participants got faster over time, the priming effect originally observed in the remote condition still holds. We now report this analysis in the Results section (page 14, lines 337-349 for Expt 1 and pages 14-15, lines 360-362 for Expt 2).
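
      The logic of separating trial-order effects from priming effects can be sketched with a deliberately simplified model. Note that this is an ordinary least-squares regression with fixed effects only, not the mixed-effects model reported above (it has no per-participant random effects), and the response times, the function name ols, and the effect sizes are all hypothetical.

```python
def ols(rows, y):
    """Least-squares coefficients for y ~ rows (each row includes the intercept term)."""
    k = len(rows[0])
    # Build the normal equations (X'X) b = X'y as an augmented matrix.
    aug = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
           + [sum(r[i] * t for r, t in zip(rows, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda idx: abs(aug[idx][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(k):
            if row != col:
                factor = aug[row][col] / aug[col][col]
                aug[row] = [a - factor * b for a, b in zip(aug[row], aug[col])]
    return [aug[i][k] / aug[i][i] for i in range(k)]

# Hypothetical trials: the control transition always precedes the primed one.
trials = list(range(1, 9))
primed = [0.0 if t % 2 == 1 else 1.0 for t in trials]
# Simulated RTs: 10 ms faster per trial, plus a 30 ms priming benefit.
rt = [900.0 - 10.0 * t - 30.0 * p for t, p in zip(trials, primed)]

# Naive difference between primed and control means (confounded with trial order).
primed_rts = [r for r, p in zip(rt, primed) if p == 1.0]
control_rts = [r for r, p in zip(rt, primed) if p == 0.0]
naive_effect = sum(primed_rts) / len(primed_rts) - sum(control_rts) / len(control_rts)

# Regression with trial number as a covariate recovers the priming term separately.
design = [[1.0, float(t), p] for t, p in zip(trials, primed)]
b_intercept, b_trial, b_primed = ols(design, rt)
```

      In this toy example the naive difference overstates the priming benefit because primed trials always come later, while the regression attributes the general speed-up to trial number and isolates the priming term.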

      5) The authors should be cautious about the general conclusion that memories with overlapping temporal regularities become neurally integrated - given their findings in MPFC are more consistent with overall differentiation (though as noted above, I think we need more data on this to know for sure what is going on).

      We realize this conclusion was overly simplistic and, in several places, have revised the general conclusions to be more specific about the nuanced similarity findings.

      6) It would be worth stating a few more details and perhaps providing additional logic or justification in the main text about how the pre- and post-exposure phases were set up and why. How many times was each object presented pre and post, and how was the sequencing determined (were any constraints put in place, e.g., such that C1 and C2 did not appear close in time)? What was the cover task (I think this is important to the interpretation & so belongs in the main paper)? Were there considerations involving the fact that this is a different sequence of the same objects the participants would later be learning - e.g., interference, etc.?

      These details can be found in the Methods section (pages 50-51, lines 1337-1353) and we’ve added a new summary of that section in the Results (page 17, lines 424- 425 and 432-435). In brief, a visual hash tag appeared on a small subset of images and participants pressed a button when this occurred, and C1 and C2 objects were presented in separate scans (as were A and B objects) to minimize inflated neural similarity due to temporal proximity.

      Reviewer #2 (Public Review):

      The manuscript by Tompary & Davachi presents results from two experiments, one behavior only and one fMRI plus behavior. They examine the important question of how to separate object memories (C1 and C2) that are never experienced together in time and become linked by shared predictive cues in a sequence (A followed by B followed by one of the C items). The authors developed an implicit priming task that provides a novel behavioral metric for such integration. They find significant C1-C2 priming for sequences that were learned 24h prior to the test, but not for recently learned sequences, suggesting that associative links between the two originally separate memories emerge over an extended period of consolidation. The fMRI study relates this behavioral integration effect to two neural metrics: pattern similarity changes in the medial prefrontal cortex (mPFC) as a measure of neural integration, and changes in hippocampal-LOC connectivity as a measure of post-learning consolidation. While fMRI patterns in mPFC overall show differentiation rather than integration (i.e., C1-C2 representational distances become larger), the authors find a robust correlation such that increasing pattern similarity in mPFC relates to stronger integration in the priming test, and this relationship is again specific to remote memories. Moreover, connectivity between the posterior hippocampus and LOC during post-learning rest is positively related to the behavioral integration effect as well as the mPFC neural similarity index, again specifically for remote memories. Overall, this is a coherent set of findings with interesting theoretical implications for consolidation theories, which will be of broad interest to the memory, learning, and predictive coding communities.

      Strengths:

      1) The implicit associative priming task designed for this study provides a promising new tool for assessing the formation of mnemonic links that influence behavior without explicit retrieval demands. The authors find an interesting dissociation between this implicit measure of memory integration and more commonly used explicit inference measures: a priming effect on the implicit task only evolved after a 24h consolidation period, while the ability to explicitly link the two critical object memories is present immediately after learning. While speculative at this point, these two measures thus appear to tap into neocortical and hippocampal learning processes, respectively, and this potential dissociation will be of interest to future studies investigating time-dependent integration processes in memory.

      2) The experimental task is well designed for isolating pre- vs post-learning changes in neural similarity and connectivity, including important controls of baseline neural similarity and connectivity.

      3) The main claim of a consolidation-dependent effect is supported by a coherent set of findings that relate behavioral integration to neural changes. The specificity of the effects on remote memories makes the results particularly interesting and compelling.

      4) The authors are transparent about unexpected results, for example, the finding that overall similarity in mPFC is consistent with a differentiation rather than an integration model.

      Thank you for the positive comments!

      Weaknesses:

      1) The sequence learning and recognition priming tasks are cleverly designed to isolate the effects of interest while controlling for potential order effects. However, due to the complex nature of the task, it is difficult for the reader to infer all the transition probabilities between item types and how they may influence the behavioral priming results. For example, baseline items (BL) are interspersed between repeated sequences during learning, and thus presumably can only occur before an A item or after a C item. This seems to create non-random predictive relationships such that C is often followed by BL, and BL by A items. If this relationship is reversed during the recognition priming task, where the sequence is always BL-C1-C2, this violation of expectations might slow down reaction times and deflate the baseline measure. It would be helpful if the manuscript explicitly reported transition probabilities for each relevant item type in the priming task relative to the sequence learning task and discussed how a match vs mismatch may influence the observed priming effects.

      We have added a table of transition probabilities across the learning, recognition priming, and exposure scans (now Table 1, page 48). We have also included some additional description of the change in transition probabilities across different tasks in the Methods section. Specifically, if participants are indeed learning item types and rules about their order, then both the control and the primed conditions would violate that order. Since C1 and C2 items never appeared together, viewing C1 would give rise to an expectation of seeing a BL item, which would also be violated. This suggests that our priming effects are driven by sequence-specific relationships rather than learning of the probabilities of different item types. We’ve added this consideration to the Methods section (page 45, lines 1212-1221).

      Another critical point to consider (and that the transition probabilities do not reflect) is that during learning, while C is followed either by A or BL, they are followed by different A or BL items. In contrast, a given A is always followed by the same B object, which is always followed by one of two C objects. While the order of item types is semi-predictable, the order of the objects (specific items) themselves is not. This can be seen in the response times during learning, such that response times for A and BL items are always slower than for B and C items. We have explained this nuance in the caption for Table 1.
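
      To make the item-type regularities concrete, the following is a minimal sketch of how first-order transition probabilities between item types could be tabulated for two tasks. The trial orders are hypothetical toy lists, not the study's actual sequences, and the function name `transition_probabilities` is illustrative only.

```python
from collections import Counter, defaultdict


def transition_probabilities(item_types):
    """Estimate P(next type | current type) from one ordered list of labels."""
    # Count each adjacent (current, next) pair in the trial order.
    pair_counts = Counter(zip(item_types, item_types[1:]))
    # Every label except the last one has a successor.
    totals = Counter(item_types[:-1])
    probs = defaultdict(dict)
    for (current, nxt), count in pair_counts.items():
        probs[current][nxt] = count / totals[current]
    return dict(probs)


# Hypothetical learning order: ABC triplets with baseline (BL) items interspersed.
learning_order = ["A", "B", "C", "BL", "A", "B", "C",
                  "A", "B", "C", "BL", "A", "B", "C"]
# Hypothetical recognition-priming order: a baseline item followed by two C items.
priming_order = ["BL", "C", "C", "BL", "C", "C"]

learn_probs = transition_probabilities(learning_order)
prime_probs = transition_probabilities(priming_order)

for task, probs in (("learning", learn_probs), ("priming", prime_probs)):
    for current, successors in sorted(probs.items()):
        print(task, current, successors)
```

      In this toy example, BL is always followed by A during learning but always followed by C in the priming order, which is the kind of mismatch in item-type order raised in the reviewer's comment.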

      2) The choice of what regions of interest to include in the different sets of analyses could be better motivated. For example, even though briefly discussed in the intro, it remains unclear why the posterior but not the anterior hippocampus is of interest for the connectivity analyses, and why the main target is LOC, not mPFC, given past results including from this group (Tompary & Davachi, 2017). Moreover, for readers not familiar with this literature, it would help if references were provided to suggest that a predictable > unpredictable contrast is well suited for functionally defining mPFC, as done in the present study.

      We have clarified our reasoning for each of these choices throughout the manuscript and believe that our logic is now much more transparent. For an expanded reasoning of why we were motivated to look at posterior and not anterior hippocampus, see pages 6-7, lines 135-159, and our response to R2. In brief, past research focusing on post-encoding connectivity with the hippocampus suggests that the posterior aspect is more likely to couple with category-selective cortex after learning neutral, non-rewarded objects much like the stimuli used in the present study.

      We also clarify our reasoning for LOC over mPFC. While theoretically, mPFC is thought to be a candidate region for coupling with the hippocampus during consolidation, the bulk of empirical work to date has revealed post-encoding connectivity between the hippocampus and category-selective cortex in the ventral and occipital lobes (page 6, lines 123-134).

      As for the use of the predictable > unpredictable contrast for functionally defining cortical regions, we reasoned that cortical regions that were sensitive to the temporal regularities generated by the sequences may be further involved in their offline consolidation and long-term storage (Danker & Anderson, 2010; Davachi & Danker, 2013; McClelland et al., 1995). We have added this justification to the Methods section (page 18, lines 454-460).

      3) Relatedly, multiple comparison corrections should be applied in the fMRI integration and connectivity analyses whenever the same contrast is performed on multiple regions in an exploratory manner.

      We now correct for multiple comparisons using Bonferroni correction, and this correction depends on the number of regions in which each analysis is conducted. Please see page 55, lines 1483-1490, in the Methods section for details of each analysis.
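
      As a rough illustration of a correction that scales with the number of regions tested, here is a minimal Bonferroni sketch. The p-values and the choice of three regions are hypothetical and are not taken from the manuscript; `bonferroni` is an illustrative helper, not the authors' code.

```python
def bonferroni(p_values, alpha=0.05):
    """Return Bonferroni-adjusted p-values and reject flags for one family of tests."""
    m = len(p_values)  # number of regions in which the same contrast is tested
    # Multiply each p-value by the number of tests, capping at 1.
    adjusted = [min(1.0, p * m) for p in p_values]
    reject = [p_adj <= alpha for p_adj in adjusted]
    return adjusted, reject


# Hypothetical uncorrected p-values for one contrast tested in three regions.
p_by_roi = {"mPFC": 0.010, "LOC": 0.030, "posterior hippocampus": 0.200}
adjusted, reject = bonferroni(list(p_by_roi.values()))

for roi, p_adj, significant in zip(p_by_roi, adjusted, reject):
    print(roi, round(p_adj, 3), significant)
```

      Comparing adjusted p-values against alpha is equivalent to comparing the raw p-values against alpha divided by the number of regions.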

      Reviewer #3 (Public Review):

      The authors of this manuscript sought to illuminate a link between a behavioral measure of integration and neural markers of cortical integration associated with systems consolidation (post-encoding connectivity, change in representational neural overlap). To that aim, participants incidentally encoded sequences of objects in the fMRI scanner. Unbeknownst to participants, the first two objects of the presented ABC triplet sequences overlapped for a given pair of sequences. This allowed the authors to probe the integration of unique C objects that were never directly presented in the same sequence, but which shared the same preceding A and B objects. They encoded one set of objects on Day 1 (remote condition), another set of objects 24 hours later (recent condition) and tested implicit and explicit memory for the learned sequences on Day 2. They additionally collected baseline and post-encoding resting-state scans. As their measure of behavioral integration, the authors examined reaction time during an Old/New judgement task for C objects depending on if they were preceded by a C object from an overlapping sequence (primed condition) versus a baseline object. They found faster reaction times for the primed objects compared to the control condition for remote but not recently learned objects, suggesting that the C objects from overlapping sequences became integrated over time. They then examined pattern similarity in a priori ROIs as a measure of neural integration and found that participants showing evidence of integration of C objects from overlapping sequences in the medial prefrontal cortex for remotely learned objects also showed a stronger implicit priming effect between those C objects over time. When they examined the change in connectivity between their ROIs after encoding, they also found that connectivity between the posterior hippocampus and lateral occipital cortex correlated with larger priming effects for remotely learned objects, and that lateral occipital connectivity with the medial prefrontal cortex was related to neural integration of remote objects from overlapping sequences.

      The authors' aim to provide evidence of a relationship between behavioral and neural measures of integration with consolidation is interesting, important, and difficult to achieve given the longitudinal nature of studies required to answer this question. Strengths of this study include a creative behavioral task, and solid modelling approaches for fMRI data with careful control for several known confounds such as BOLD activation on pattern analysis results, motion, and physiological noise. The authors replicate their behavioral observations across two separate experiments, one of which included a large sample size, and found similar results that speak to the reliability of the observed behavioral phenomenon. In addition, they document several correlations between neural measures and task performance, lending functional significance to their neural findings.

      Thank you for this positive assessment of our study!

      However, this study is not without notable weaknesses that limit the strength of the manuscript. The authors report a behavioral priming effect suggestive of integration of remote but not recent memories, leading to the interpretation that the priming effect emerges with consolidation. However, they did not observe a reliable interaction between the priming condition and learning session (recent/remote) on reaction times, meaning that the priming effect for remote memories was not reliably greater than that observed for recent. In addition, the emergence of a priming effect for remote memories does not appear to be due to faster reaction times for primed targets over time (the condition of interest), but rather, slower reaction times for control items in the remote condition compared to recent. These issues limit the strength of the claim that the priming effect observed is due to C items of interest being integrated in a consolidation-dependent manner.

      We acknowledge that the lack of a day by condition interaction in the behavioral priming effect should be discussed, and we now discuss these data in a more nuanced manner. While it’s true that the priming effect emerges due to a slowing of the control items over time, this slowing is consistent with classic time-dependent effects demonstrating slower response times for more delayed memories. The fact that the response times in the primed condition do not show this slowing can be interpreted as a protection against this slowing that would otherwise occur. Please see page 29, lines 758-766, for this added discussion.

      Similarly, the interactions between neural variables of interest and learning session needed to strongly show a significant consolidation-related effect in the brain were sometimes tenuous. There was no reliable difference in neural representational pattern analysis fit to a model of neural integration between the short and long delays in the medial prefrontal cortex or lateral occipital cortex, nor was the posterior hippocampus-lateral occipital cortex post-encoding connectivity correlation with subsequent priming significantly different for recent and remote memories. While the relationship between integration model fit in the medial prefrontal cortex and subsequent priming (which was significantly different from that occurring for recent memories) was one of the stronger findings of the paper in favor of a consolidation-related effect on behavior, is it possible that lack of a behavioral priming effect for recent memories due to possible issues with the control condition could mask a correlation between neural and behavioral integration in the recent memory condition?

      While we acknowledge the lack of a statistically reliable interaction between neural measures and behavioral priming in many cases, we are heartened by the reliable difference in the relationship between mPFC similarity and priming over time, which was our main planned prediction. In addition to adding caveats in the discussion about the neural measures and behavioral findings in the recent condition (see our response to R1.1 and R1.4 for more details), we have added language throughout the manuscript noting the need to interpret these data with caution.

      These limitations are especially notable when one considers that priming does not classically require a period of prolonged consolidation to occur, and prominent models of systems consolidation rather pertain to explicit memory. While the authors have provided evidence that neural integration in the medial prefrontal cortex, as well as post-encoding coupling between the lateral occipital cortex and posterior hippocampus, are related to faster reaction times for primed objects of overlapping sequences compared to their control condition, more work is needed to verify that the observed findings indeed reflect consolidation dependent integration as proposed.

      We agree that more work is needed to provide converging evidence for these novel findings. However, we wish to counter the notion that systems consolidation models are relevant only for explicit memories. Although models of systems consolidation often mention transformations from episodic to semantic memory, the critical mechanisms that define the models involve changes in the neural ensembles of a memory that is initially laid down in the hippocampus and is taught to cortex over time. This transformation of neural traces is not specific to explicit/declarative forms of memory. For example, implicit statistical learning initially depends on intact hippocampal function (Schapiro et al., 2014) and improves over consolidation (Durrant et al., 2011, 2013; Kóbor et al., 2017).

      Second, while there are many classical findings of priming during or immediately after learning, there are several instances of priming used to measure consolidation-related changes to newly learned information. For instance, priming has been used as a measure of lexical integration, demonstrating that new word learning benefits from a night of sleep (Wang et al., 2017; Gaskell et al., 2019) or a 1-week delay (Tamminen & Gaskell, 2013). The issue is not whether priming can occur immediately, it is whether priming increases with a delay.

      Finally, it is helpful to think about models of memory systems that divide memory representations not by their explicit/implicit nature, but along other important dimensions such as their neural bases, their flexibility vs rigidity, and their capacity for rapid vs slow learning (Henke, 2010). Considering this evidence, we suggest that systems consolidation models are most useful when considering how transformations in the underlying neural memory representation affect its behavioral expression, rather than focusing on the extent that the memory representation is explicit or implicit.

      With all this said, we have added text to the discussion reminding the reader that there was no statistically significant difference in priming as a function of the delay (page 29, lines 764 - 766). However, we are encouraged by the fact that the relationship between priming and mPFC neural similarity was significantly stronger for remotely learned objects relative to recently learned ones, as this is directly in line with systems consolidation theories.

      References

      Abolghasem, Z., Teng, T. H.-T., Nexha, E., Zhu, C., Jean, C. S., Castrillon, M., Che, E., Di Nallo, E. V., & Schlichting, M. L. (2023). Learning strategy differentially impacts memory connections in children and adults. Developmental Science, 26(4), e13371. https://doi.org/10.1111/desc.13371

      Dobbins, I. G., Schnyer, D. M., Verfaellie, M., & Schacter, D. L. (2004). Cortical activity reductions during repetition priming can result from rapid response learning. Nature, 428(6980), 316–319. https://doi.org/10.1038/nature02400

      Durrant, S. J., Cairney, S. A., & Lewis, P. A. (2013). Overnight consolidation aids the transfer of statistical knowledge from the medial temporal lobe to the striatum. Cerebral Cortex, 23(10), 2467–2478. https://doi.org/10.1093/cercor/bhs244

      Durrant, S. J., Taylor, C., Cairney, S., & Lewis, P. A. (2011). Sleep-dependent consolidation of statistical learning. Neuropsychologia, 49(5), 1322–1331. https://doi.org/10.1016/j.neuropsychologia.2011.02.015

      Gaskell, M. G., Cairney, S. A., & Rodd, J. M. (2019). Contextual priming of word meanings is stabilized over sleep. Cognition, 182, 109–126. https://doi.org/10.1016/j.cognition.2018.09.007

      Henke, K. (2010). A model for memory systems based on processing modes rather than consciousness. Nature Reviews Neuroscience, 11(7), 523–532. https://doi.org/10.1038/nrn2850

      Kóbor, A., Janacsek, K., Takács, Á., & Nemeth, D. (2017). Statistical learning leads to persistent memory: Evidence for one-year consolidation. Scientific Reports, 7(1), 760. https://doi.org/10.1038/s41598-017-00807-3

      Kuhl, B. A., & Chun, M. M. (2014). Successful remembering elicits event-specific activity patterns in lateral parietal cortex. The Journal of Neuroscience, 34(23), 8051–8060. https://doi.org/10.1523/JNEUROSCI.4328-13.2014

      Richter, F. R., Chanales, A. J. H., & Kuhl, B. A. (2016). Predicting the integration of overlapping memories by decoding mnemonic processing states during learning. NeuroImage, 124, Part A, 323–335. https://doi.org/10.1016/j.neuroimage.2015.08.051

      Schapiro, A. C., Gregory, E., Landau, B., McCloskey, M., & Turk-Browne, N. B. (2014). The necessity of the medial-temporal lobe for statistical learning. Journal of Cognitive Neuroscience, 1–12. https://doi.org/10.1162/jocn_a_00578

      Schlichting, M. L., & Preston, A. R. (2014). Memory reactivation during rest supports upcoming learning of related content. Proceedings of the National Academy of Sciences, 111(44), 15845–15850. https://doi.org/10.1073/pnas.1404396111

      Smith, J. F., Alexander, G. E., Chen, K., Husain, F. T., Kim, J., Pajor, N., & Horwitz, B. (2010). Imaging systems level consolidation of novel associate memories: A longitudinal neuroimaging study. NeuroImage, 50(2), 826–836. https://doi.org/10.1016/j.neuroimage.2009.11.053

      Takashima, A., Nieuwenhuis, I. L. C., Jensen, O., Talamini, L. M., Rijpkema, M., & Fernández, G. (2009). Shift from hippocampal to neocortical centered retrieval network with consolidation. The Journal of Neuroscience, 29(32), 10087–10093. https://doi.org/10.1523/JNEUROSCI.0799-09.2009

      Tamminen, J., & Gaskell, M. G. (2013). Novel word integration in the mental lexicon: Evidence from unmasked and masked semantic priming. The Quarterly Journal of Experimental Psychology, 66(5), 1001–1025. https://doi.org/10.1080/17470218.2012.724694

      van Kesteren, M. T. R., Fernández, G., Norris, D. G., & Hermans, E. J. (2010). Persistent schema-dependent hippocampal-neocortical connectivity during memory encoding and postencoding rest in humans. Proceedings of the National Academy of Sciences, 107(16), 7550–7555. https://doi.org/10.1073/pnas.0914892107

      Wang, H.-C., Savage, G., Gaskell, M. G., Paulin, T., Robidoux, S., & Castles, A. (2017). Bedding down new words: Sleep promotes the emergence of lexical competition in visual word recognition. Psychonomic Bulletin & Review, 24(4), 1186–1193. https://doi.org/10.3758/s13423-016-1182-7

    1. Author Response:

      Reviewer #3:

      The authors modified a previously reported hybrid cytochrome bcc-aa3 supercomplex, consisting of bcc from M. tuberculosis and aa3 from M. smegmatis (Kim et al., 2015), by appending an affinity tag facilitating purification. The cryo-EM experiments are based on the authors' earlier work (Gong et al., 2018) on the structure of the bcc-aa3 supercomplex from M. smegmatis. The authors then determine the structure of the bcc part alone and in complex with Q203 and TB47.

      The manuscript is well written and the obtained results are presented in a concise, clear-cut manner. In general, the data support the conclusions drawn.

      We thank the reviewer for this evaluation.

      To this reviewer, the following points are unclear:

      1. The purified enzyme elutes from the gel filtration column as one peak, but there seems to be no information given on the subunit composition and the enzymatic activity of the purified hybrid cytochrome bcc-aa3 supercomplex.

      See answers to Question 1 from the major Essential Revisions and Question 1 from the minor Essential Revisions.

      "We have now shown that the purified chimeric supercomplex is a functional assembly with a (mean ± s.d., n = 4), in agreement with the previous study that shows M. tuberculosis CIII can functionally complement native M. smegmatis CIII and maintain the growth of M. smegmatis (Kim et al., 2015). The in vitro inhibition of this enzyme by Q203 and TB47 was determined by means of a DMNQH2/oxygen oxidoreductase activity assay. In the assay, 500 nM Q203 or TB47 was chosen, which is close to the median inhibitory concentration (IC50) obtained from the menadiol-induced oxygen consumption in our previous study (Gong et al., 2018). After addition of Q203 and TB47, the turnover number of the hybrid supercomplex is reduced from 23.3 ± 2.4 e⁻ s⁻¹ to 5.8 ± 2.4 e⁻ s⁻¹ (Figure 4-figure supplement 4) and 5.1 ± 2.9 e⁻ s⁻¹ (Figure 5-figure supplement 4), respectively. We have incorporated these new data into the text (lines 90-93, 187-189, 206-209)."

      "The subunit composition of the purified enzyme has now been provided in Figure 2-figure supplement 1."

      2. It is unclear what the conclusion of the structure comparison (Fig 6) is regarding the affinity of Q203 for M. smegmatis.

      The structural comparison indicates that Q203 should have a similar binding mechanism and a similar effect on the activity of cytochrome bcc from M. smegmatis and M. tuberculosis. This is in good agreement with previous antimycobacterial activity data and inhibition data for the bcc complexes from M. smegmatis and M. tuberculosis (Gong et al., 2018; Lu et al., 2018a). These points have now been incorporated into the revised manuscript (lines 223-227).

    1. Author Response

      Reviewer #1 (Public Review):

      This study used a multi-day learning paradigm combined with fMRI to reveal neural changes reflecting the learning of new (arbitrary) shape-sound associations. In the scanner, the shapes and sounds are presented separately and together, both before and after learning. When they are presented together, they can be either consistent or inconsistent with the learned associations. The analyses focus on auditory and visual cortices, as well as the object-selective cortex (LOC) and anterior temporal lobe regions (temporal pole (TP) and perirhinal cortex (PRC)). Results revealed several learning-induced changes, particularly in the anterior temporal lobe regions. First, the LOC and PRC showed a reduced bias to shapes vs sounds (presented separately) after learning. Second, the TP responded more strongly to incongruent than congruent shape-sound pairs after learning. Third, the similarity of TP activity patterns to sounds and shapes (presented separately) was increased for non-matching shape-sound comparisons after learning. Fourth, when comparing the pattern similarity of individual features to combined shape-sound stimuli, the PRC showed a reduced bias towards visual features after learning. Finally, comparing patterns to combined shape-sound stimuli before and after learning revealed a reduced (and negative) similarity for incongruent combinations in PRC. These results are all interpreted as evidence for an explicit integrative code of newly learned multimodal objects, in which the whole is different from the sum of the parts.

      The study has many strengths. It addresses a fundamental question that is of broad interest, the learning paradigm is well-designed and controlled, and the stimuli are real 3D stimuli that participants interact with. The manuscript is well written and the figures are very informative, clearly illustrating the analyses performed.

      There are also some weaknesses. The sample size (N=17) is small for detecting the subtle effects of learning. Most of the statistical analyses are not corrected for multiple comparisons (ROIs), and the specificity of the key results to specific regions is also not tested. Furthermore, the evidence for an integrative representation is rather indirect, and alternative interpretations for these results are not considered.

      We thank the reviewer for their careful reading and the positive comments on our manuscript. As suggested, we have conducted additional analyses of theoretically-motivated ROIs and have found that temporal pole and perirhinal cortex are the only regions to show the key experience-dependent transformations. We are much more cautious with respect to multiple comparisons, and have removed a series of post hoc across-ROI comparisons that were irrelevant to the key questions of the present manuscript. The revised manuscript now includes much more discussion about alternative interpretations as suggested by the reviewer (and also by the other reviewers).

      Additionally, we looked into scanning more participants, but our scanner has since had a full upgrade and the sequence used in the current study is no longer supported by our scanner. However, we note that while most analyses contain 17 participants, we employed a within-subject learning design that is not typically used in fMRI experiments and increases our power to detect an effect. This is supported by the robust effect size of the behavioural data, whereby 17 out of 18 participants showed a learning effect (Cohen’s d = 1.28) and which was replicated in a follow-up experiment with a larger sample size.
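
      For readers less familiar with effect sizes in within-subject designs, the sketch below computes one common paired-samples variant of Cohen's d (mean of the difference scores divided by their standard deviation). The response does not state which variant was used, so this choice is an assumption, and the scores are hypothetical.

```python
from statistics import mean, stdev


def cohens_dz(before, after):
    """Paired-samples effect size: mean difference divided by the SD of the differences."""
    diffs = [a - b for a, b in zip(after, before)]
    return mean(diffs) / stdev(diffs)


# Hypothetical per-participant scores before and after learning (not study data).
before_learning = [0.50, 0.55, 0.45, 0.60, 0.52]
after_learning = [0.70, 0.65, 0.60, 0.58, 0.66]

d = cohens_dz(before_learning, after_learning)
print(round(d, 2))
```

      Because each participant serves as their own control, between-participant variability drops out of the difference scores, which is why a within-subject design can detect an effect with a modest sample.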

      We address the other reviewer comments point by point below.

      Reviewer #2 (Public Review):

      Li et al. used a four-day fMRI design to investigate how unimodal feature information is combined, integrated, or abstracted to form a multimodal object representation. The experimental question is of great interest and understanding how the human brain combines featural information to form complex representations is relevant for a wide range of researchers in neuroscience, cognitive science, and AI. While most fMRI research on object representations is limited to visual information, the authors examined how visual and auditory information is integrated to form a multimodal object representation. The experimental design is elegant and clever. Three visual shapes and three auditory sounds were used as the unimodal features; the visual shapes were used to create 3D-printed objects. On Day 1, the participants interacted with the 3D objects to learn the visual features, but the objects were not paired with the auditory features, which were played separately. On Day 2, participants were scanned with fMRI while they were exposed to the unimodal visual and auditory features as well as pairs of visual-auditory cues. On Day 3, participants again interacted with the 3D objects but now each was paired with one of the three sounds that played from an internal speaker. On Day 4, participants completed the same fMRI scanning runs they completed on Day 2, except now some visual-auditory feature pairs corresponded with Congruent (learned) objects, and some with Incongruent (unlearned) objects. Using the same fMRI design on Days 2 and 4 enables a well-controlled comparison between feature- and object-evoked neural representations before and after learning. The notable results corresponded to findings in the perirhinal cortex and temporal pole. The authors report (1) that a visual bias on Day 2 for unimodal features in the perirhinal cortex was attenuated after learning on Day 4, (2) a decreased univariate response to congruent vs. incongruent visual-auditory objects in the temporal pole on Day 4, (3) decreased pattern similarity between congruent vs. incongruent pairs of visual and auditory unimodal features in the temporal pole on Day 4, (4) in the perirhinal cortex, visual unimodal features on Day 2 do not correlate with their respective visual-auditory objects on Day 4, and (5) in the perirhinal cortex, multimodal object representations across Days 2 and 4 are uncorrelated for congruent objects and anticorrelated for incongruent. The authors claim that each of these results supports the theory that multimodal objects are represented in an "explicit integrative" code separate from feature representations. While these data are valuable and the results are interesting, the authors' claims are not well supported by their findings.

      We thank the reviewer for the careful reading of our manuscript and positive comments. Overall, we now stay closer to the data when describing the results and provide our interpretation of these results in the discussion section while remaining open to alternative interpretations (as also suggested by Reviewer 1).

      (1) In the introduction, the authors contrast two theories: (a) multimodal objects are represented in the co-activation of unimodal features, and (b) multimodal objects are represented in an explicit integrative code such that the whole is different than the sum of its parts. However, the distinction between these two theories is not straightforward. An explanation of what is precisely meant by "explicit" and "integrative" would clarify the authors' theoretical stance. Perhaps we can assume that an "explicit" representation is a new representation that is created to represent a multimodal object. What is meant by "integrative" is more ambiguous-unimodal features could be integrated within a representation in a manner that preserves the decodability of the unimodal features, or alternatively the multimodal representation could be completely abstracted away from the constituent features such that the features are no longer decodable. Even if the object representation is "explicit" and distinct from the unimodal feature representations, it can in theory still contain featural information, though perhaps warped or transformed. The authors do not clearly commit to a degree of featural abstraction in their theory of "explicit integrative" multimodal object representations which makes it difficult to assess the validity of their claims.

      Due to its ambiguity, we removed the term “explicit” and now make it clear that our central question was whether crossmodal object representations require only unimodal feature-level representations (e.g., frogs are created from only the combination of shape and sound) or whether crossmodal object representations also rely on an integrative code distinct from the unimodal features (e.g., there is something more to “frog” than its original shape and sound). We now clarify this in the revised manuscript.

      “One theoretical view from the cognitive sciences suggests that crossmodal objects are built from component unimodal features represented across distributed sensory regions.8 Under this view, when a child thinks about “frog”, the visual cortex represents the appearance of the shape of the frog whereas the auditory cortex represents the croaking sound. Alternatively, other theoretical views predict that multisensory objects are not only built from their component unimodal sensory features, but that there is also a crossmodal integrative code that is different from the sum of these parts.9,10,11,12,13 These latter views propose that anterior temporal lobe structures can act as a polymodal “hub” that combines separate features into integrated wholes.9,11,14,15” – pg. 4

      For this reason, we designed our paradigm to equate the unimodal representations, such that neural differences between the congruent and incongruent conditions provide evidence for a crossmodal integrative code different from the unimodal features (because the unimodal features are equated by default in the design).

      “Critically, our four-day learning task allowed us to isolate any neural activity associated with integrative coding in anterior temporal lobe structures that emerges with experience and differs from the neural patterns recorded at baseline. The learned and non-learned crossmodal objects were constructed from the same set of three validated shape and sound features, ensuring that factors such as familiarity with the unimodal features, subjective similarity, and feature identity were tightly controlled (Figure 2). If the mind represented crossmodal objects entirely as the reactivation of unimodal shapes and sounds (i.e., objects are constructed from their parts), then there should be no difference between the learned and non-learned objects (because they were created from the same three shapes and sounds). By contrast, if the mind represented crossmodal objects as something over and above their component features (i.e., representations for crossmodal objects rely on integrative coding that is different from the sum of their parts), then there should be behavioral and neural differences between learned and non-learned crossmodal objects (because the only difference across the objects is the learned relationship between the parts). Furthermore, this design allowed us to determine the relationship between the object representation acquired after crossmodal learning and the unimodal feature representations acquired before crossmodal learning. That is, we could examine whether learning led to abstraction of the object representations such that it no longer resembled the unimodal feature representations.” – pg. 5

      Furthermore, we agree with the reviewer that our definition and methodological design do not directly capture the structure of the integrative code. With experience, the unimodal feature representations may be completely abstracted away, warped, or changed in a nonlinear transformation. We suggest that crossmodal learning forms an integrative code that is different from the original unimodal representations in the anterior temporal lobes; however, we agree that future work is needed to more directly capture the structure of the integrative code that emerges with experience.

      “In our task, participants had to differentiate congruent and incongruent objects constructed from the same three shape and sound features (Figure 2). An efficient way to solve this task would be to form distinct object-level outputs from the overlapping unimodal feature-level inputs such that congruent objects are made to be orthogonal from the representations before learning (i.e., measured as pattern similarity equal to 0 in the perirhinal cortex; Figure 5b, 6, Supplemental Figure S5), whereas non-learned incongruent objects could be made to be dissimilar from the representations before learning (i.e., anticorrelation, measured as pattern similarity less than 0 in the perirhinal cortex; Figure 6). Because our paradigm could decouple neural responses to the learned object representations (on Day 4) from the original component unimodal features at baseline (on Day 2), these results could be taken as evidence of pattern separation in the human perirhinal cortex.11,12 However, our pattern of results could also be explained by other types of crossmodal integrative coding. For example, incongruent object representations may be less stable than congruent object representations, such that incongruent object representations are warped to a greater extent than congruent object representations (Figure 6).” – pg. 18

      “As one solution to the crossmodal binding problem, we suggest that the temporal pole and perirhinal cortex form unique crossmodal object representations that are different from the distributed features in sensory cortex (Figure 4, 5, 6, Supplemental Figure S5). However, the nature by which the integrative code is structured and formed in the temporal pole and perirhinal cortex following crossmodal experience – such as through transformations, warping, or other factors – is an open question and an important area for future investigation.” – pg. 18

      (2) After participants learned the multimodal objects, the authors report a decreased univariate response to congruent visual-auditory objects relative to incongruent objects in the temporal pole. This is claimed to support the existence of an explicit, integrative code for multimodal objects. Given the number of alternative explanations for this finding, this claim seems unwarranted. A simpler interpretation of these results is that the temporal pole is responding to the novelty of the incongruent visual-auditory objects. If there is in fact an explicit, integrative multimodal object representation in the temporal pole, it is unclear why this would manifest in a decreased univariate response.

      We thank the reviewer for identifying this issue. Our behavioural design controls unimodal feature-level novelty but allows object-level novelty to differ. Thus, neural differences between the congruent and incongruent conditions reflect sensitivity to the object-level differences between the combination of shape and sound. However, we agree that there are multiple interpretations regarding the nature of how the integrative code is structured in the temporal pole and perirhinal cortex. We have removed the interpretation highlighted by the reviewer from the results. Instead, we now provide our preferred interpretation in the discussion, while acknowledging the other possibilities that the reviewer mentions.

      As one possibility, these results in the temporal pole may reflect “conceptual combination”: a congruent pairing such as “hummingbird” may require fewer neural resources than an incongruent pairing such as “bark-frog”.

      “Furthermore, these distinct anterior temporal lobe structures may be involved with integrative coding in different ways. For example, the crossmodal object representations measured after learning were found to be related to the component unimodal feature representations measured before learning in the temporal pole but not the perirhinal cortex (Figure 5, 6, Supplemental Figure S5). Moreover, pattern similarity for congruent shape-sound pairs was lower than the pattern similarity for incongruent shape-sound pairs after crossmodal learning in the temporal pole but not the perirhinal cortex (Figure 4b, Supplemental Figure S3a). As one interpretation of this pattern of results, the temporal pole may represent new crossmodal objects by combining previously learned knowledge.8,9,10,11,13,14,15,33 Specifically, research into conceptual combination has linked the anterior temporal lobes to compound object concepts such as “hummingbird”.34,35,36 For example, participants during our task may have represented the sound-based “humming” concept and visually-based “bird” concept on Day 1, forming the crossmodal “hummingbird” concept on Day 3 (Figure 1, 2), which may recruit less activity in temporal pole than an incongruent pairing such as “barking-frog”. For these reasons, the temporal pole may form a crossmodal object code based on pre-existing knowledge, resulting in reduced neural activity (Figure 3d) and pattern similarity towards features associated with learned objects (Figure 4b).” – pg. 18

      (3) The authors ran a neural pattern similarity analysis on the unimodal features before and after multimodal object learning. They found that the similarity between visual and auditory features that composed congruent objects decreased in the temporal pole after multimodal object learning. This was interpreted to reflect an explicit integrative code for multimodal objects, though it is not clear why. First, behavioral data show that participants reported increased similarity between the visual and auditory unimodal features within congruent objects after learning, the opposite of what was found in the temporal pole. Second, it is unclear why an analysis of the unimodal features would be interpreted to reflect the nature of the multimodal object representations. Since the same features corresponded with both congruent and incongruent objects, the nature of the feature representations cannot be interpreted to reflect the nature of the object representations per se. Third, using unimodal feature representations to make claims about object representations seems to contradict the theoretical claim that explicit, integrative object representations are distinct from unimodal features. If the learned multimodal object representation exists separately from the unimodal feature representations, there is no reason why the unimodal features themselves would be influenced by the formation of the object representation. Instead, these results seem to more strongly support the theory that multimodal object learning results in a transformation or warping of feature space.

      We apologize for the lack of clarity. We have now overhauled this aspect of our manuscript in an attempt to better highlight key aspects of our experimental design. In particular, because the unimodal features composing the congruent and incongruent objects were equated, neural differences between these conditions would provide evidence for an experience-dependent crossmodal integrative code that is different from its component unimodal features.

      Related to the second and third points, we were looking at the extent to which the original unimodal representations change with crossmodal learning. Before crossmodal learning, we found that the perirhinal cortex tracked the similarity between the individual visual shape features and the crossmodal objects that were composed of those visual shapes – however, there was no evidence that perirhinal cortex was tracking the unimodal sound features on those crossmodal objects. After crossmodal learning, we see that this visual shape bias in perirhinal cortex was no longer present – that is, the representation in perirhinal cortex started to look less like the visual features that comprise the objects. Thus, crossmodal learning transformed the perirhinal representations so that they were no longer predominantly grounded in a single visual modality, which may be a mechanism by which object concepts gain their abstraction. We have now tried to be clearer about this interpretation throughout the paper.

      Notably, we suggest that experience may change both the crossmodal object representations, as well as the unimodal feature representations. For example, we have previously shown that unimodal visual features are influenced by experience in parallel with the representation of the conjunction (e.g., Liang et al., 2020; Cerebral Cortex). Nevertheless, we remain open to the myriad possible structures of the integrative code that might emerge with experience.

      We now clarify these points throughout the manuscript. For example:

      “We then examined whether the original representations would change after participants learned how the features were paired together to make specific crossmodal objects, conducting the same analysis described above after crossmodal learning had taken place (Figure 5b). With this analysis, we sought to measure the relationship between the representation for the learned crossmodal object and the original baseline representation for the unimodal features. More specifically, the voxel-wise activity for unimodal feature runs before crossmodal learning was correlated to the voxel-wise activity for crossmodal object runs after crossmodal learning (Figure 5b). Another linear mixed model which included modality as a fixed factor within each ROI revealed that the perirhinal cortex was no longer biased towards visual shape after crossmodal learning (F1,32 = 0.12, p = 0.73), whereas the temporal pole, LOC, V1, and A1 remained biased towards either visual shape or sound (F1,30-32 between 16.20 and 73.42, all p < 0.001, η2 between 0.35 and 0.70).” – pg. 14

      “To investigate this effect in perirhinal cortex more specifically, we conducted a linear mixed model to directly compare the change in the visual bias of perirhinal representations from before crossmodal learning to after crossmodal learning (green regions in Figure 5a vs. 5b). Specifically, the linear mixed model included learning day (before vs. after crossmodal learning) and modality (visual feature match to crossmodal object vs. sound feature match to crossmodal object). Results revealed a significant interaction between learning day and modality in the perirhinal cortex (F1,775 = 5.56, p = 0.019, η2 = 0.071), meaning that the baseline visual shape bias observed in perirhinal cortex (green region of Figure 5a) was significantly attenuated with experience (green region of Figure 5b). After crossmodal learning, a given shape no longer invoked significant pattern similarity between objects that had the same shape but differed in terms of what they sounded like. Taken together, these results suggest that prior to learning the crossmodal objects, the perirhinal cortex had a default bias toward representing the visual shape information and was not representing sound information of the crossmodal objects. After crossmodal learning, however, the visual shape bias in perirhinal cortex was no longer present. That is, with crossmodal learning, the representations within perirhinal cortex started to look less like the visual features that comprised the crossmodal objects, providing evidence that the perirhinal representations were no longer predominantly grounded in the visual modality.” – pg. 13

      “Importantly, the initial visual shape bias observed in the perirhinal cortex was attenuated by experience (Figure 5, Supplemental Figure S5), suggesting that the perirhinal representations had become abstracted and were no longer predominantly grounded in a single modality after crossmodal learning. One possibility may be that the perirhinal cortex is by default visually driven as an extension to the ventral visual stream,10,11,12 but can act as a polymodal “hub” region for additional crossmodal input following learning.” – pg. 19

      (4) The most compelling evidence the authors provide for their theoretical claims is the finding that, in the perirhinal cortex, the unimodal feature representations on Day 2 do not correlate with the multimodal objects they comprise on Day 4. This suggests that the learned multimodal object representations are not combinations of their unimodal features. If unimodal features are not decodable within the congruent object representations, this would support the authors' explicit integrative hypothesis. However, the analyses provided do not go all the way in convincing the reader of this claim. First, the analyses reported do not differentiate between congruent and incongruent objects. If this result in the perirhinal cortex reflects the formation of new multimodal object representations, it should only be true for congruent objects but not incongruent objects. Since the analyses combine congruent and incongruent objects it is not possible to know whether this was the case. Second, just because feature representations on Day 2 do not correlate with multimodal object patterns on Day 4 does not mean that the object representations on Day 4 do not contain featural information. This could be directly tested by correlating feature representations on Day 4 with congruent vs. incongruent object representations on Day 4. It could be that representations in the perirhinal cortex are not stable over time and all representations-including unimodal feature representations-shift between sessions, which could explain these results yet not entail the existence of abstracted object representations.

      We thank the reviewer for this suggestion and have conducted the two additional analyses. Specifically, we split the congruent and incongruent conditions and also investigated correlations between unimodal representations on Day 4 with crossmodal object representations on Day 4. There was no significant interaction between modality and congruency in any ROI across or within learning days. One possible explanation for these findings is that both congruent and incongruent crossmodal objects are represented differently from their underlying unimodal features, and all of these representations can transform with experience.

      However, the new analyses also revealed that perirhinal cortex was the only region without a modality-specific bias after crossmodal learning (e.g., Day 4 Unimodal Feature runs x Day 4 Crossmodal Object runs; now shown in Supplemental Figure S5). Overall, these results are consistent with the notion of a crossmodal integrative code in perirhinal cortex that has changed with experience and is different from the component unimodal features. Nevertheless, we explore alternative interpretations for how the crossmodal code emerges with experience in the discussion.

      “To examine whether these results differed by congruency (i.e., whether any modality-specific biases differed as a function of whether the object was congruent or incongruent), we conducted exploratory linear mixed models for each of the five a priori ROIs across learning days. More specifically, we correlated: 1) the voxel-wise activity for Unimodal Feature Runs before crossmodal learning to the voxel-wise activity for Crossmodal Object Runs before crossmodal learning (Day 2 vs. Day 2), 2) the voxel-wise activity for Unimodal Feature Runs before crossmodal learning to the voxel-wise activity for Crossmodal Object Runs after crossmodal learning (Day 2 vs Day 4), and 3) the voxel-wise activity for Unimodal Feature Runs after crossmodal learning to the voxel-wise activity for Crossmodal Object Runs after crossmodal learning (Day 4 vs Day 4). For each of the three analyses described, we then conducted separate linear mixed models which included modality (visual feature match to crossmodal object vs. sound feature match to crossmodal object) and congruency (congruent vs. incongruent)….There was no significant relationship between modality and congruency in any ROI between Day 2 and Day 2 (F1,346-368 between 0.00 and 1.06, p between 0.30 and 0.99), between Day 2 and Day 4 (F1,346-368 between 0.021 and 0.91, p between 0.34 and 0.89), or between Day 4 and Day 4 (F1,346-368 between 0.01 and 3.05, p between 0.082 and 0.93). However, exploratory analyses revealed that perirhinal cortex was the only region without a modality-specific bias and where the unimodal feature runs were not significantly correlated to the crossmodal object runs after crossmodal learning (Supplemental Figure S5).” – pg. 14
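
The comparisons described above can be illustrated with a small sketch. It is not the authors' pipeline: the voxel patterns are synthetic, and the names `pattern_similarity`, `n_voxels`, `object_before`, and `object_after` are hypothetical. It only shows how a modality bias can be read off as the difference between shape-to-object and sound-to-object pattern similarity, using a Fisher z-transformed Pearson correlation.

```python
import numpy as np

def pattern_similarity(pattern_a, pattern_b):
    """Fisher z-transformed Pearson correlation between two voxel patterns."""
    r = np.corrcoef(pattern_a, pattern_b)[0, 1]
    # Clip so that arctanh stays finite when r is exactly +1 or -1.
    return float(np.arctanh(np.clip(r, -0.999999, 0.999999)))

rng = np.random.default_rng(0)
n_voxels = 200  # hypothetical ROI size, chosen only for illustration

# Synthetic voxel-wise activity for the unimodal feature runs.
shape_feature = rng.standard_normal(n_voxels)
sound_feature = rng.standard_normal(n_voxels)

# Toy "before learning" object pattern: dominated by the shape pattern.
object_before = shape_feature + 0.5 * rng.standard_normal(n_voxels)
# Toy "after learning" object pattern: unrelated to either unimodal pattern.
object_after = rng.standard_normal(n_voxels)

# Modality bias = shape-to-object similarity minus sound-to-object similarity.
visual_bias_before = (pattern_similarity(shape_feature, object_before)
                      - pattern_similarity(sound_feature, object_before))
visual_bias_after = (pattern_similarity(shape_feature, object_after)
                     - pattern_similarity(sound_feature, object_after))

print(f"visual bias before: {visual_bias_before:.2f}")
print(f"visual bias after:  {visual_bias_after:.2f}")
```

In this toy case the bias is large before learning and close to zero afterwards, which is the qualitative pattern reported for the perirhinal cortex. The statistical test in the manuscript is a linear mixed model, which this sketch does not attempt to reproduce.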

      “Taken together, the overall pattern of results suggests that representations of the crossmodal objects in perirhinal cortex were heavily influenced by their consistent visual features before crossmodal learning. However, the crossmodal object representations were no longer influenced by the component visual features after crossmodal learning (Figure 5, Supplemental Figure S5). Additional exploratory analyses did not find evidence of experience-dependent changes in the hippocampus or inferior parietal lobes (Supplemental Figure S4c-e).” – pg. 14

      “The voxel-wise matrix for Unimodal Feature runs on Day 4 was correlated to the voxel-wise matrix for Crossmodal Object runs on Day 4 (see Figure 5 in the main text for an example). We compared the average pattern similarity (z-transformed Pearson correlation) between shape (blue) and sound (orange) features specifically after crossmodal learning. Consistent with Figure 5b, perirhinal cortex was the only region without a modality-specific bias. Furthermore, perirhinal cortex was the only region where the representations of both the visual and sound features were not significantly correlated to the crossmodal objects. By contrast, every other region maintained a modality-specific bias for either the visual or sound features. These results suggest that perirhinal cortex representations were transformed with experience, such that the initial visual shape representations (Figure 5a) were no longer grounded in a single modality after crossmodal learning. Furthermore, these results suggest that crossmodal learning formed an integrative code different from the unimodal features in perirhinal cortex, as the visual and sound features were not significantly correlated with the crossmodal objects. * p < 0.05, ** p < 0.01, *** p < 0.001. Horizontal lines within brain regions indicate a significant main effect of modality. Vertical asterisks denote pattern similarity comparisons relative to 0.” – Supplemental Figure S5

      “We found that the temporal pole and perirhinal cortex – two anterior temporal lobe structures – came to represent new crossmodal object concepts with learning, such that the acquired crossmodal object representations were different from the representation of the constituent unimodal features (Figure 5, 6). Intriguingly, the perirhinal cortex was by default biased towards visual shape, but this initial visual bias was attenuated with experience (Figure 3c, 5, Supplemental Figure S5). Within the perirhinal cortex, the acquired crossmodal object concepts (measured after crossmodal learning) became less similar to their original component unimodal features (measured at baseline before crossmodal learning; Figure 5, 6, Supplemental Figure S5). This is consistent with the idea that object representations in perirhinal cortex integrate the component sensory features into a whole that is different from the sum of the component parts, which might be a mechanism by which object concepts obtain their abstraction… As one solution to the crossmodal binding problem, we suggest that the temporal pole and perirhinal cortex form unique crossmodal object representations that are different from the distributed features in sensory cortex (Figure 4, 5, 6, Supplemental Figure S5). However, the nature by which the integrative code is structured and formed in the temporal pole and perirhinal cortex following crossmodal experience – such as through transformations, warping, or other factors – is an open question and an important area for future investigation.” – pg. 18

      In sum, the authors have collected a fantastic dataset that has the potential to answer questions about the formation of multimodal object representations in the brain. A more precise delineation of different theoretical accounts and additional analyses are needed to provide convincing support for the theory that “explicit integrative” multimodal object representations are formed during learning.

      We thank the reviewer for the positive comments and helpful feedback. We hope that our changes to our wording and clarifications to our methodology now more clearly support the central goal of our study: to find evidence of crossmodal integrative coding different from the original unimodal feature parts in anterior temporal lobe structures. We furthermore agree that future research is needed to delineate the structure of the integrative code that emerges with experience in the anterior temporal lobes.

      Reviewer #3 (Public Review):

      This paper uses behavior and functional brain imaging to understand how neural and cognitive representations of visual and auditory stimuli change as participants learn associations among them. Prior work suggests that areas in the anterior temporal (ATL) and perirhinal cortex play an important role in learning/representing cross-modal associations, but the hypothesis has not been directly tested by evaluating behavior and functional imaging before and after learning cross-modal associations. The results show that such learning changes both the perceived similarities amongst stimuli and the neural responses generated within ATL and perirhinal regions, providing novel support for the view that cross-modal learning leads to a representational change in these regions.

      This work has several strengths. It tackles an important question for current theories of object representation in the mind and brain in a novel and quite direct fashion, by studying how these representations change with cross-modal learning. As the authors note, little work has directly assessed representational change in ATL following such learning, despite the widespread view that ATL is critical for such representation. Indeed, such direct assessment poses several methodological challenges, which the authors have met with an ingenious experimental design. The experiment allows the authors to maintain tight control over both the familiarity and the perceived similarities amongst the shapes and sounds that comprise their stimuli so that the observed changes across sessions must reflect learned cross-modal associations among these. I especially appreciated the creation of physical objects that participants can explore and the approach to learning in which shapes and sounds are initially experienced independently and later in an associated fashion. In using multi-echo MRI to resolve signals in ventral ATL, the authors have minimized a key challenge facing much work in this area (namely the poor SNR yielded by standard acquisition sequences in ventral ATL). The use of both univariate and multivariate techniques was well-motivated and helpful in testing the central questions. The manuscript is, for the most part, clearly written, and nicely connects the current work to important questions in two literatures, specifically (1) the hypothesized role of the perirhinal cortex in representing/learning complex conjunctions of features and (2) the tension between purely embodied approaches to semantic representation vs the view that ATL regions encode important amodal/crossmodal structure.

      There are some places in the manuscript that would benefit from further explanation and methodological detail. I also had some questions about the results themselves and what they signify about the roles of ATL and the perirhinal cortex in object representation.

      We thank the reviewer for their positive feedback and address the comments in the below point-by-point responses.

      (A) I found the terms "features" and "objects" to be confusing as used throughout the manuscript, and sometimes inconsistent. I think by "features" the authors mean the shape and sound stimuli in their experiment. I think by "object" the authors usually mean the conjunction of a shape with a sound---for instance, when a shape and sound are simultaneously experienced in the scanner, or when the participant presses a button on the shape and hears the sound. The confusion comes partly because shapes are often described as being composed of features, not features in and of themselves. (The same is sometimes true of sounds). So when reading "features" I kept thinking the paper referred to the elements that went together to comprise a shape. It also comes from ambiguous use of the word object, which might refer to (a) the 3D-printed item that people play with, which is an object, or (b) a visually-presented shape (for instance, the localizer involved comparing an "object" to a "phase-scrambled" stimulus---here I assume "object" refers to an intact visual stimulus and not the joint presentation of visual and auditory items). I think the design, stimuli, and results would be easier for a naive reader to follow if the authors used the terms "unimodal representation" to refer to cases where only visual or auditory input is presented, and "cross-modal" or "conjoint" representation when both are present.

      We thank the reviewer for this suggestion and agree. We have replaced the terms “features” and “objects” with “unimodal” and “crossmodal” in the title, text, and figures throughout the manuscript for consistency (i.e., “crossmodal binding problem”). To simplify the terminology, we have also removed the localizer results.

      (B) There are a few places where I wasn't sure what exactly was done, and where the methods lacked sufficient detail for another scientist to replicate what was done. Specifically:

      (1) The behavioral study assessing perceptual similarity between visual and auditory stimuli was unclear. The procedure, stimuli, number of trials, etc, should be explained in sufficient detail in methods to allow replication. The results of the study should also minimally be reported in the supplementary information. Without an understanding of how these studies were carried out, it was very difficult to understand the observed pattern of behavioral change. For instance, I initially thought separate behavioral blocks were carried out for visual versus auditory stimuli, each presented in isolation; however, the effects contrast congruent and incongruent stimuli, which suggests these decisions must have been made for the conjoint presentation of both modalities. I'm still not sure how this worked. Additionally, the manuscript makes a brief mention that similarity judgments were made in the context of "all stimuli," but I didn't understand what that meant. Similarity ratings are hugely sensitive to the contrast set with which items appear, so clarity on these points is pretty important. A strength of the design is the contention that shape and sound stimuli were psychophysically matched, so it is important to show the reader how this was done and what the results were.

      We agree and apologize for the lack of sufficient detail in the original manuscript. We now include much more detail about the similarity rating task. The methodology and results of the behavioral rating experiments are now shown in Supplemental Figure S1. In Figure S1a, the similarity ratings are visualized on a multidimensional scaling plot. The triangular geometry for shape (blue) and sound (red) indicates that the subjective similarity was equated within each unimodal feature across individual participants. Quantitatively, there was no difference in similarity between the congruent and incongruent pairings in Figure S1b and Figure S1c prior to crossmodal learning. In addition to providing more information on these methods in the Supplemental Information, we also now provide a more detailed description of the task in the manuscript itself. For convenience, we reproduce these sections below.

      “Pairwise Similarity Task. Using the same task as the stimulus validation procedure (Supplemental Figure S1a), participants provided similarity ratings for all combinations of the 3 validated shapes and 3 validated sounds (each of the six features was rated in the context of every other feature in the set, with 4 repeats of the same feature, for a total of 72 trials). More specifically, three stimuli were displayed on each trial, with one at the top and two at the bottom of the screen in the same procedure as we have used previously27. The 3D shapes were visually displayed as a photo, whereas sounds were displayed on screen in a box that could be played over headphones when clicked with the mouse. The participant made an initial judgment by selecting the more similar stimulus on the bottom relative to the stimulus on the top. Afterwards, the participant made a similarity rating between each bottom stimulus with the top stimulus from 0 being no similarity to 5 being identical. This procedure ensured that ratings were made relative to all other stimuli in the set.” – pg. 28

      “Pairwise similarity task and results. In the initial stimulus validation experiment, participants provided pairwise ratings for 5 sounds and 3 shapes. The shapes, which had been selected from a well-characterized perceptually uniform stimulus space27, were equated in their subjective similarity, and the pairwise ratings followed the same procedure as described in ref 27. Based on this initial experiment, we then selected the 3 sounds from the set that were most closely equated in their subjective similarity. (a) 3D-printed shapes were displayed as images, whereas sounds were displayed in a box that could be played when clicked by the participant. Ratings were averaged to produce a similarity matrix for each participant, and then averaged to produce a group-level similarity matrix. Shown as triangular representational geometries recovered from multidimensional scaling in the above, shapes (blue) and sounds (orange) were approximately equated in their subjective similarity. These features were then used in the four-day crossmodal learning task. (b) Behavioral results from the four-day crossmodal learning task paired with multi-echo fMRI described in the main text. Before crossmodal learning, there was no difference in similarity between shape and sound features associated with congruent objects compared to incongruent objects – indicating that similarity was controlled at the unimodal feature-level. After crossmodal learning, we observed a robust shift in the magnitude of similarity. The shape and sound features associated with congruent objects were now significantly more similar than the same shape and sound features associated with incongruent objects (p < 0.001), evidence that crossmodal learning changed how participants experienced the unimodal features (observed in 17/18 participants). (c) We replicated this learning-related shift in pattern similarity with a larger sample size (n = 44; observed in 38/44 participants). *** denotes p < 0.001. Horizontal lines denote the comparison of congruent vs. incongruent conditions.” – Supplemental Figure S1
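
The link between equated similarity ratings and a triangular geometry can be checked with a minimal sketch. The caption does not say which multidimensional scaling variant was used, so classical (Torgerson) MDS is assumed here. The rating values and the conversion from similarity to dissimilarity (5 minus the rating) are also assumptions made for illustration, and `classical_mds` is a hypothetical helper.

```python
import numpy as np

def classical_mds(dissimilarity, n_dims=2):
    """Classical (Torgerson) MDS: embed items so distances match dissimilarities."""
    d = np.asarray(dissimilarity, dtype=float)
    n = d.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    # Double-centre the squared dissimilarities to obtain a Gram matrix.
    gram = -0.5 * centering @ (d ** 2) @ centering
    eigvals, eigvecs = np.linalg.eigh(gram)
    # Keep the n_dims largest eigenvalues and their eigenvectors.
    order = np.argsort(eigvals)[::-1][:n_dims]
    scale = np.sqrt(np.clip(eigvals[order], 0.0, None))
    return eigvecs[:, order] * scale

# Hypothetical group-level ratings for three stimuli on the 0-5 scale:
# every pair is rated equally similar, i.e. the stimuli are "equated".
similarity = np.array([[5.0, 2.0, 2.0],
                       [2.0, 5.0, 2.0],
                       [2.0, 2.0, 5.0]])
dissimilarity = 5.0 - similarity  # assumed conversion, 0 on the diagonal

coords = classical_mds(dissimilarity)
side_lengths = [float(np.linalg.norm(coords[i] - coords[j]))
                for i, j in [(0, 1), (0, 2), (1, 2)]]
print(side_lengths)
```

With equal pairwise ratings, the three recovered points sit at the corners of an equilateral triangle, so unequal side lengths in such a plot would indicate that the stimuli were not equated.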

      (2) The experiences through which participants learned/experienced the shapes and sounds were unclear. The methods mention that they had one minute to explore/palpate each shape and that these experiences were interleaved with other tasks, but it is not clear what the other tasks were, how many such exploration experiences occurred, or how long the total learning time was. The manuscript also mentions that participants learn the shape-sound associations with 100% accuracy but it isn't clear how that was assessed. These details are important partly because it seems like very minimal experience to change neural representations in the cortex.

      We apologize for the lack of detail and agree with the reviewer’s suggestions – we now include much more information in the methods section. Each behavioral day required about 1 hour of total time to complete, and indeed, participants rapidly learned their associations with minimal experience. For example:

      “Behavioral Tasks. On each behavioral day (Day 1 and Day 3; Figure 2), participants completed the following tasks, in this order: Exploration Phase, one Unimodal Feature 1-back run (26 trials), Exploration Phase, one Crossmodal 1-back run (26 trials), Exploration Phase, Pairwise Similarity Task (24 trials), Exploration Phase, Pairwise Similarity Task (24 trials), Exploration Phase, Pairwise Similarity Task (24 trials), and finally, Exploration Phase. To verify learning on Day 3, participants also additionally completed a Learning Verification Task at the end of the session.” – pg. 27

      “The overall procedure ensured that participants extensively explored the unimodal features on Day 1 and the crossmodal objects on Day 3. The Unimodal Feature and the Crossmodal Object 1-back runs administered on Day 1 and Day 3 served as practice for the neuroimaging sessions on Day 2 and Day 4, during which these 1-back tasks were completed. Each behavioral session required less than 1 hour of total time to complete.” – pg. 27

      “Learning Verification Task (Day 3 only). As the final task on Day 3, participants completed a task to ensure that participants successfully formed their crossmodal pairing. All three shapes and sounds were randomly displayed in 6 boxes on a display. Photos of the 3D shapes were shown, and sounds were played by clicking the box with the mouse cursor. The participant was cued with either a shape or sound, and then selected the corresponding paired feature. At the end of Day 3, we found that all participants reached 100% accuracy on this task (10 trials).” – pg. 29

      (3) I didn't understand the similarity metric used in the multivariate imaging analyses. The manuscript mentions Z-scored Pearson's r, but I didn't know if this meant (a) many Pearson coefficients were computed and these were then Z-scored, so that 0 indicates a value equal to the mean Pearson correlation and 1 is equal to the standard deviation of the correlations, or (b) whether a Fisher Z transform was applied to each r (so that 0 means r was also around 0). From the interpretation of some results, I think the latter is the approach taken, but in general, it would be helpful to see, in Methods or Supplementary information, exactly how similarity scores were computed, and why that approach was adopted. This is particularly important since it is hard to understand the direction of some key effects.

      The reviewer is correct that the Fisher Z transform was applied to each individual r before averaging the correlations. This approach is generally recommended when averaging correlations (see Corey, Dunlap, & Burke, 1998). We are now clearer on this point in the manuscript:

      “The z-transformed Pearson’s correlation coefficient was used as the distance metric for all pattern similarity analyses. More specifically, each individual Pearson correlation was Fisher z-transformed and then averaged (see 61).” – pg. 32
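
      The averaging step described above can be sketched in a few lines. This is a minimal illustration of Fisher z-transforming each correlation before averaging, not the authors' analysis code; the function names and the example values are hypothetical.

```python
import math


def fisher_z(r):
    """Fisher z-transform of a Pearson correlation (requires -1 < r < 1)."""
    return math.atanh(r)


def mean_fisher_z(correlations):
    """Fisher z-transform each correlation, then average the z values."""
    zs = [fisher_z(r) for r in correlations]
    return sum(zs) / len(zs)


# Hypothetical correlations, for illustration only (not data from the study).
example_rs = [0.10, -0.05, 0.30]
mean_z = mean_fisher_z(example_rs)

# tanh maps the averaged z back to the correlation scale if that is needed.
mean_r_equivalent = math.tanh(mean_z)
```

      Under this scheme a similarity score of 0 corresponds to r = 0, which matches option (b) in the reviewer's question.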

      (C) From Figure 3D, the temporal pole mask appears to exclude the anterior fusiform cortex (or the ventral surface of the ATL generally). If so, this is a shame, since that appears to be the locus most important to cross-modal integration in the "hub and spokes" model of semantic representation in the brain. The observation in the paper that the perirhinal cortex seems initially biased toward visual structure while more superior ATL is biased toward auditory structure appears generally consistent with the "graded hub" view expressed, for instance, in our group's 2017 review paper (Lambon Ralph et al., Nature Reviews Neuroscience). The balance of visual- versus auditory-sensitivity in that work appears balanced in the anterior fusiform, just a little lateral to the anterior perirhinal cortex. It would be helpful to know if the same pattern is observed for this area specifically in the current dataset.

      We thank the reviewer for this suggestion. After close inspection of Lambon Ralph et al. (2017), we believe that our perirhinal cortex mask appears to be overlapping with the ventral ATL/anterior fusiform region that the reviewer mentions. See Author response image 1 for a visual comparison:

      Author response image 1.

      The top four figures are sampled from Lambon Ralph et al (2017), whereas the bottom two figures visualize our perirhinal cortex mask (white) and temporal pole mask (dark green) relative to the fusiform cortex. The ROIs visualized were defined from the Harvard-Oxford atlas.

      We now mention this area of overlap in our manuscript and link it to the hub and spokes model:

      “Notably, our perirhinal cortex mask overlaps with a key region of the ventral anterior temporal lobe thought to be the central locus of crossmodal integration in the “hub and spokes” model of semantic representations.9,50” – pg. 20

      (D) While most effects seem robust from the information presented, I'm not so sure about the analysis of the perirhinal cortex shown in Figure 5. This compares (I think) the neural similarity evoked by a unimodal stimulus ("feature") to that evoked by the same stimulus when paired with its congruent stimulus in the other modality ("object"). These similarities show an interaction with modality prior to cross-modal association, but no interaction afterward, leading the authors to suggest that the perirhinal cortex has become less biased toward visual structure following learning. But the plots in Figures 4a and b are shown against different scales on the y-axes, obscuring the fact that all of the similarities are smaller in the after-learning comparison. Since the perirhinal interaction was already the smallest effect in the pre-learning analysis, it isn't really surprising that it drops below significance when all the effects diminish in the second comparison. A more rigorous test would assess the reliability of the interaction of comparison (pre- or post-learning) with modality. The possibility that perirhinal representations become less "visual" following cross-modal learning is potentially important so a post hoc contrast of that kind would be helpful.

      We apologize for the lack of clarity. We conducted a linear mixed model to assess the interaction between modality and crossmodal learning day (before and after crossmodal learning) in the perirhinal cortex as described by the reviewer. The critical interaction was significant, which is now clarified in the text as well as in the rescaled figure plots.

      “To investigate this effect in perirhinal cortex more specifically, we conducted a linear mixed model to directly compare the change in the visual bias of perirhinal representations from before crossmodal learning to after crossmodal learning (green regions in Figure 5a vs. 5b). Specifically, the linear mixed model included learning day (before vs. after crossmodal learning) and modality (visual feature match to crossmodal object vs. sound feature match to crossmodal object). Results revealed a significant interaction between learning day and modality in the perirhinal cortex (F(1, 775) = 5.56, p = 0.019, η² = 0.071), meaning that the baseline visual shape bias observed in perirhinal cortex (green region of Figure 5a) was significantly attenuated with experience (green region of Figure 5b). After crossmodal learning, a given shape no longer invoked significant pattern similarity between objects that had the same shape but differed in terms of what they sounded like. Taken together, these results suggest that prior to learning the crossmodal objects, the perirhinal cortex had a default bias toward representing the visual shape information and was not representing sound information of the crossmodal objects. After crossmodal learning, however, the visual shape bias in perirhinal cortex was no longer present. That is, with crossmodal learning, the representations within perirhinal cortex started to look less like the visual features that comprised the crossmodal objects, providing evidence that the perirhinal representations were no longer predominantly grounded in the visual modality.” – pg. 13
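
      The logic of the learning day by modality interaction can be illustrated as a difference of differences on the four cell means. This is only a sketch of what the interaction term captures, not the linear mixed model itself, which additionally tests whether the contrast is reliable given repeated measures within participants. The function name and all numbers below are hypothetical.

```python
def interaction_contrast(cell_means):
    """Difference of differences: visual bias before minus visual bias after.

    cell_means maps (learning_day, modality) to a mean pattern similarity.
    """
    bias_before = cell_means[("before", "visual")] - cell_means[("before", "sound")]
    bias_after = cell_means[("after", "visual")] - cell_means[("after", "sound")]
    return bias_before - bias_after


# Hypothetical cell means, for illustration only (not values from the study).
hypothetical_means = {
    ("before", "visual"): 0.06,
    ("before", "sound"): 0.01,
    ("after", "visual"): 0.01,
    ("after", "sound"): 0.01,
}

# A positive contrast means the visual bias shrank from before to after learning.
contrast = interaction_contrast(hypothetical_means)
```

      A contrast of zero would mean that the visual and sound conditions changed by the same amount across learning days, which is the scenario the reviewer raised when all similarities shrink together.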

      We note that not all effects drop in Figure 5b (even in regions with a similar numerical pattern similarity to PRC, like the hippocampus – also see Supplemental Figure S5 for a comparison for patterns only on Day 4), suggesting that the change in visual bias in PRC is not simply due to noise.

      “Importantly, the change in pattern similarity in the perirhinal cortex across learning days (Figure 5) is unlikely to be driven by noise, poor alignment of patterns across sessions, or generally reduced responses. Other regions with numerically similar pattern similarity to perirhinal cortex did not change across learning days (e.g., visual features x crossmodal objects in A1 in Figure 5; the exploratory ROI hippocampus with numerically similar pattern similarity to perirhinal cortex also did not change in Supplemental Figure S4c-d).” – pg. 14

      (E) Is there a reason the authors did not look at representation and change in the hippocampus? As a rapid-learning, widely-connected feature-binding mechanism, and given the fairly minimal amount of learning experience, it seems like the hippocampus would be a key area of potential import for the cross-modal association. It also looks as though the hippocampus is implicated in the localizer scan (Figure 3c).

      We thank the reviewer for this suggestion and now include additional analyses for the hippocampus. We found no evidence of crossmodal integrative coding different from the unimodal features. Rather, the hippocampus seems to represent the convergence of unimodal features, as evidenced by its sensitivity to both visual and sound features without any differentiation between congruent and incongruent pairings. We provide these results in the Supplemental Information and describe them in the main text:

      “Analyses for the hippocampus (HPC) and inferior parietal lobe (IPL). (a) In the visual vs. auditory univariate analysis, there was no visual or sound bias in HPC, but there was a bias towards sounds that increased numerically after crossmodal learning in the IPL. (b) Pattern similarity analyses between unimodal features associated with congruent objects and incongruent objects. Similar to Supplemental Figure S3, there was no main effect of congruency in either region. (c) When we looked at the pattern similarity between Unimodal Feature runs on Day 2 to Crossmodal Object runs on Day 2, we found that there was significant pattern similarity when there was a match between the unimodal feature and the crossmodal object (e.g., pattern similarity > 0). This pattern of results held when (d) correlating the Unimodal Feature runs on Day 2 to Crossmodal Object runs on Day 4, and (e) correlating the Unimodal Feature runs on Day 4 to Crossmodal Object runs on Day 4. Finally, (f) there was no significant pattern similarity between Crossmodal Object runs before learning correlated to Crossmodal Object after learning in HPC, but there was significant pattern similarity in IPL (p < 0.001). Taken together, these results suggest that both HPC and IPL are sensitive to visual and sound content, as the (c, d, e) unimodal feature-level representations were correlated to the crossmodal object representations irrespective of learning day. However, there was no difference between congruent and incongruent pairings in any analysis, suggesting that HPC and IPL did not represent crossmodal objects differently from the component unimodal features. 
      For these reasons, HPC and IPL may represent the convergence of unimodal feature representations (i.e., because HPC and IPL were sensitive to both visual and sound features), but our results do not seem to support these regions in forming crossmodal integrative coding distinct from the unimodal features (i.e., because representations in HPC and IPL did not differentiate the congruent and incongruent conditions and did not change with experience). * p < 0.05, ** p < 0.01, *** p < 0.001. Asterisks above or below bars indicate a significant difference from zero. Horizontal lines within brain regions in (a) reflect an interaction between modality and learning day, whereas horizontal lines within brain regions reflect main effects of (b) learning day, (c-e) modality, or (f) congruency.” – Supplemental Figure S4.

      “Notably, our perirhinal cortex mask overlaps with a key region of the ventral anterior temporal lobe thought to be the central locus of crossmodal integration in the “hub and spokes” model of semantic representations.9,50 However, additional work has also linked other brain regions to the convergence of unimodal representations, such as the hippocampus51,52,53 and inferior parietal lobes.54,55 This past work on the hippocampus and inferior parietal lobe does not necessarily address the crossmodal binding problem that was the main focus of our present study, as previous findings often do not differentiate between crossmodal integrative coding and the convergence of unimodal feature representations per se. Furthermore, previous studies in the literature typically do not control for stimulus-based factors such as experience with unimodal features, subjective similarity, or feature identity that may complicate the interpretation of results when determining regions important for crossmodal integration. Indeed, we found evidence consistent with the convergence of unimodal feature-based representations in both the hippocampus and inferior parietal lobes (Supplemental Figure S4), but no evidence of crossmodal integrative coding different from the unimodal features. The hippocampus and inferior parietal lobes were both sensitive to visual and sound features before and after crossmodal learning (see Supplemental Figure S4c-e). Yet the hippocampus and inferior parietal lobes did not differentiate between the congruent and incongruent conditions or change with experience (see Supplemental Figure S4).” – pg. 20

      (F) The direction of the neural effects was difficult to track and understand. I think the key observation is that TP and PRh both show changes related to cross-modal congruency - but still it would be helpful if the authors could articulate, perhaps via a schematic illustration, how they think representations in each key area are changing with the cross-modal association. Why does the temporal pole come to activate less for congruent than incongruent stimuli (Figure 3)? And why do TP responses grow less similar to one another for congruent relative to incongruent stimuli after learning (Figure 4)? Why are incongruent stimulus similarities anticorrelated in their perirhinal responses following cross-modal learning (Figure 6)?

      We thank the reviewer for identifying this issue, which was also raised by the other reviewers. The reviewer is correct that the key observation is that the TP and PRC both show changes related to crossmodal congruency (given that the unimodal features were equated in the methodological design). However, the structure of the integrative code is less clear, which we now emphasize in the main text. Our findings provide evidence of a crossmodal integrative code that is different from the unimodal features, and future studies are needed to better understand the structure of how such a code might emerge. We now more clearly highlight this distinction throughout the paper:

      “By contrast, perirhinal cortex may be involved in pattern separation following crossmodal experience. In our task, participants had to differentiate congruent and incongruent objects constructed from the same three shape and sound features (Figure 2). An efficient way to solve this task would be to form distinct object-level outputs from the overlapping unimodal feature-level inputs such that congruent objects are made to be orthogonal from the representations before learning (i.e., measured as pattern similarity equal to 0 in the perirhinal cortex; Figure 5b, 6, Supplemental Figure S5), whereas non-learned incongruent objects could be made to be dissimilar from the representations before learning (i.e., anticorrelation, measured as pattern similarity less than 0 in the perirhinal cortex; Figure 6). Because our paradigm could decouple neural responses to the learned object representations (on Day 4) from the original component unimodal features at baseline (on Day 2), these results could be taken as evidence of pattern separation in the human perirhinal cortex.11,12 However, our pattern of results could also be explained by other types of crossmodal integrative coding. For example, incongruent object representations may be less stable than congruent object representations, such that incongruent object representations are warped to a greater extent than congruent object representations (Figure 6).” – pg. 18

      “As one solution to the crossmodal binding problem, we suggest that the temporal pole and perirhinal cortex form unique crossmodal object representations that are different from the distributed features in sensory cortex (Figure 4, 5, 6, Supplemental Figure S5). However, the nature by which the integrative code is structured and formed in the temporal pole and perirhinal cortex following crossmodal experience – such as through transformations, warping, or other factors – is an open question and an important area for future investigation. Furthermore, these anterior temporal lobe structures may be involved with integrative coding in different ways. For example, the crossmodal object representations measured after learning were found to be related to the component unimodal feature representations measured before learning in the temporal pole but not the perirhinal cortex (Figure 5, 6, Supplemental Figure S5). Moreover, pattern similarity for congruent shape-sound pairs was lower than the pattern similarity for incongruent shape-sound pairs after crossmodal learning in the temporal pole but not the perirhinal cortex (Figure 4b, Supplemental Figure S3a). As one interpretation of this pattern of results, the temporal pole may represent new crossmodal objects by combining previously learned knowledge.8,9,10,11,13,14,15,33 Specifically, research into conceptual combination has linked the anterior temporal lobes to compound object concepts such as “hummingbird”.34,35,36 For example, participants during our task may have represented the sound-based “humming” concept and visually-based “bird” concept on Day 1, forming the crossmodal “hummingbird” concept on Day 3 (Figure 1, 2), which may recruit less activity in temporal pole than an incongruent pairing such as “barking-frog”. 
For these reasons, the temporal pole may form a crossmodal object code based on pre-existing knowledge, resulting in reduced neural activity (Figure 3d) and pattern similarity towards features associated with learned objects (Figure 4b).” – pg. 18

      This work represents a key step in advancing our understanding of object representations in the brain. The experimental design provides a useful template for studying neural change related to cross-modal association that may prove useful to others in the field. Given the broad variety of open questions and potential alternative analyses, an open dataset from this study would also likely be a considerable contribution to the field.

    1. Author Response

      Reviewer #3 (Public Review):

      Gavanetto et al. propose an interesting method to identify membrane proteins based on the analysis of single-molecule AFM (smAFM) force-extension traces obtained from native plasma membranes. In the proposed pipeline, the authors use smAFM to non-specifically probe isolated plasma membranes by recording a large number (millions) of force-extension traces. While, as expected, most of them lack any binding or represent spurious events, the authors use an unsupervised clustering algorithm to identify groups of force-extension curves with a similar mechanical pattern, suggesting that each cluster corresponds to a unique protein species that can be fingerprinted by its specific force-extension pattern. By implementing a Bayesian framework, the authors contrast the identified groups with proteomics databases, which provide the most likely proteins that correspond to the identified force-extension clusters. A set of control experiments complements the manuscript to validate the proposed methodology, such as the application of their pipeline using purified samples or overexpressing a specific protein species to enrich its population.

      The primary strength of the manuscript is its originality, as it proposes a novel application of smAFM as a protein-detection method that can be applied in native samples. This methodology combines ingredients from conventional mass spectrometry and cryoEM; the contour length released upon extending a protein is a direct measure of its sequence extension (related to its mass), but the force pattern contains insightful information about the protein's structure. In this sense, the authors' proposal is very smart. However, the relationship between protein structure and mechanics is far from straightforward, and here perhaps lies one of the main limitations of the proposed method. This is particularly true for the case of membrane proteins, where we cannot talk about protein unfolding in its classical sense but rather about pullout events which is likely what each peak corresponds to (indeed, the authors speak throughout the paper about unfolding events, which I believe is not the correct term).

      We fully agree with the semantic concern of reviewer #3 about the term unfolding. A membrane protein when pulled with the tip of the AFM is pulled out of the membrane (see 2 in the image below) and, simultaneously, the segment that is pulled out unfolds (see 3). To our knowledge, force peaks corresponding to a contour length equal to 2 were not consistently observed or reported (when e.g. a transmembrane alpha helix is out of the membrane but folded).

      Since the field evolved with the practice of using the term ‘unfolding’ even for membrane proteins (see for instance (Kessler and Gaub, 2006; Oesterhelt et al., 2000; Yu et al., 2017) and many others), we would prefer to stick with this term.

      In the context of membrane proteins, the term unfolding therefore refers to at least the tertiary structure of the protein, because it is not clear when and at which timescale the secondary structures really unfold.

      We pointed this out in Line 131 (and following Lines).

    1. Author Response

      Reviewer #1 (Public Review):

      This is a well-performed study demonstrating the antiviral function and viral antagonism of the dynein activating adapter NINL. The results are clearly presented and support the conclusions.

      This reviewer has only one minor suggestion to improve the manuscript.

      Add a discussion of (1) why the folds of reduction among VSV, SinV and CVB3 were different in the NINL KO cells and (2) why the folds of reduction of VSV differed between the NINL KO A549 and U-2 OS cells.

      Thank you for this suggestion. We have amended the results section to include additional information about these observations and possible explanations for these results.

      Reviewer #2 (Public Review):

      This manuscript is of interest to readers interested in host-viral co-evolution. This study has identified a novel human-virus interaction point, NINL-viral 3C protease, where NINL is actively evolving under selection pressure from viral infection and viral 3Cpro cleavage. This study demonstrates that the viral 3Cpro-mediated cleavage of host NINL disrupts its adaptor function in dynein motor-mediated cargo transportation to the centrosome, and this disruption is both host- and virus-specific. In addition, this paper indicates the role of NINL in the IFN signaling pathway. Data shown in this manuscript support the major claims.

      In this paper, the authors have identified a novel host-viral interaction, where viral 3C proteases (3Cpro) cleave at specific sites on a host activating adaptor of dynein intracellular transportation machinery, ninein-like protein (NINL or NLP in short) and inhibit its role in the antiviral innate immune response.

      The authors first found that, unlike other activating adaptors of dynein intracellular transportation machinery, NINL (or NLP) is rapidly evolving. Thus, the authors hypothesized that this rapid evolution of NINL was caused by its interaction with viral infection. The authors found that viruses replicated to higher levels in NINL knock-out (KO) cells than in wild-type (WT) cells and the replication level was not attenuated upon IFNa treatment in NINL KO cells, unlike in WT cells. Next, the authors investigated the role of NINL in type I IFN-mediated immune response and found that the induction of Janus kinase/signal transducer and activator of transcription (JAK/STAT) genes was attenuated in NINL KO cells upon IFNa treatment. The authors further showed that the reduction in replication of an IFNa-sensitive vaccinia virus mutant upon IFNa treatment was decreased in NINL KO A549 cells compared to WT cells. The authors further showed that the virus antagonized NINL function by cleaving it with viral 3Cpro at its specific cleavage sites. A NINL-peroxisome ligation-based cargo trafficking visualization assay showed that the redistribution of immobile membrane-bound peroxisomes was disrupted by cleavage of NINL or viral infection.

      This paper has revealed a novel host-virus interaction, and an antiviral function of a rapidly evolving activating adaptor of dynein intracellular transportation machinery, NINL. The major conclusions of this paper are well supported by data, but several aspects can be improved.

      1) It would be necessary to include a couple of other pathways involved in innate immune response besides JAK/STAT pathway.

      We are very interested in this question as well. Our RNAseq data (Supplementary file 4 and Figure 3 – Figure supplement 4) suggest that there are several transcriptional changes that result from NINL KO. Our goal in this manuscript was to focus on IFN signaling in order to understand this specific effect of NINL KO since it might have wide-ranging consequences on viral replication. While we agree that broadening our studies to other signaling pathways, including other pathways involved in innate immune response, is a good idea, we feel that those experiments would take longer than two months to perform and therefore fall outside of the scope of this paper.

      2) The in-cell cleavages of NINL by viral 3Cpros were well demonstrated and supported by data of high quality. A direct biochemical demonstration of the cleavage is needed with purified proteins.

      We agree with the reviewer that a direct biochemical cleavage assay would further demonstrate that viral 3Cpros cleave NINL specifically. However, our attempts to purify full-length NINL have been unsuccessful due to solubility issues (see example gel below), which is not surprising given that NINL is a >150 kDa human protein that has multiple surfaces that bind to other human proteins. As such, we focused our efforts on in-cell cleavage assays using specificity controls for cleavage. Specifically, we used catalytically inactive CVB3 3Cpro to show a dependence on protease catalytic activity and a variety of NINL constructs in which the glutamine in the P1 position is replaced by an arginine to show site specificity of cleavage. Notably, the cleavage sites in NINL that we mapped using this mutagenesis were predicted bioinformatically from known sites of 3Cpro cleavage in viral polyproteins, further indicating that cleavage is 3Cpro-dependent. We believe these results thus demonstrate that cleavage of NINL is dependent on viral protease activity and occurs in a sequence-specific manner. In light of the difficulty of purifying full-length NINL that would make biochemical experiments very challenging and likely take longer than two months to perform, we believe that our in cell data should be sufficient to demonstrate activity-dependent site-specific cleavage of NINL by viral 3Cpros.

      Sypro stained SDS-PAGE gel showing supernatant (S) and insoluble pellet (P) fractions across multiple purifications with altered buffer conditions.

      3) The author used different cell types in different assays. Explain the rationale with a sentence for each assay.

      Throughout this work, we chose to use a variety of cell lines for specific purposes. A549 cells were chosen as our main cell line as they are widely used in virology, are susceptible to the viruses we used, are responsive to interferon, and express both NINL and our control NIN at moderate levels. In the case of our virology and ISG expression data, we performed the same experiments with NINL KOs in other cell lines to confirm that the phenotypes we observed in A549 cells could be attributed to the absence of NINL rather than off-target CRISPR perturbations or cell-line specific effects. All cleavage experiments were performed in HEK293T cells for their ease of transfection and protein expression. The inducible peroxisome trafficking assays were performed in U-2 OS cells as their morphology is ideal for observing the spatial organization of peroxisomes via confocal microscopy, and based on the fact that we had recapitulated the virology results and ISG expression results in those cells. At the suggestion of the reviewer, we have amended the text to include rationales where appropriate.

      4) While cell-based assays well support the conclusions in this paper, further demonstration in vivo would be helpful to provide an implication on the pathogenicity impact of NINL.

      We agree. However, we believe that examining the impact of the loss of or antagonism of NINL on the pathogenesis of infectious diseases in an in vivo model is outside the scope of this study.

      In summary, this manuscript contributes to a novel antiviral target. In addition, it is important to understand the host-virus co-evolution. The use of the evolution signatures to identify the "conflict point" between host and virus is novel.

    1. Author Response

      Reviewer #1 (Public Review):

      Cui and colleagues have performed a longitudinal analysis of blood cell counts in a cohort of ALS patients. The major findings include increases in neutrophils and monocytes that negatively correlated with ALSFRS-R score, but not disease progression rate. Increases in NK and central memory TH2 T cells correlated with a lower risk of death, while increased CD4 CD45RA effector memory and CD8 T cells were correlated with a higher risk of death.

      Strengths of the study include the sample size and effort to broadly include data.

      Thank you for the positive comment.

      Limitations of the study include indication bias, as the authors acknowledge, because the timing of the blood draws is not predefined. The specific review for possibility of infection does not, in this reviewer's opinion, sufficiently address this potential for bias. Also concerning is the fact that half the subjects have only a single measurement, and how well the findings generalize to more or later measurements is not clear. Similarly, the number of later measurements driving some of the main findings is much lower, further raising concern about the potential bias. Given these issues, one really would want to see disease controls, and how the different cell counts change in another disease. Finally, there is no discussion about how or whether treatments, or changes in treatment, could influence observed counts.

      We agree with the reviewer regarding indication bias and that is precisely why we performed the sensitivity analyses including 1) restricting the analysis to the first cell measure of each patient and 2) excluding cell measures with signs of ongoing infection at the time of blood draw. Reassuringly, both analyses provided rather similar results as those of the main analysis. We also agree with the reviewer regarding the varying numbers of measurements between patients. This is an unavoidable challenge to any longitudinal study of ALS patients, primarily due to the high mortality rate of this patient group. We have now added this limitation to the discussion:

      “First, the main cohort was heterogeneous in terms of the numbers of cell measurements and the time intervals between measurements, as the timing of blood sampling was not predefined. Indication bias due to, for example, ongoing infections might therefore be a concern. The sensitivity analysis excluding all samples taken at the time of infections provided however rather similar results. Further, the longitudinal analysis of cell counts should be interpreted with caution because not all patients contributed repeated cell measurements. This is however an unavoidable problem for any longitudinal study of ALS patients, given the high mortality rate of this patient group. Regardless, when focusing on the first cell measures, we obtained similar results as in the main analysis.”

      We further agree with the reviewer regarding the use of disease control. We have access to a cohort of patients with relapsing-remitting MS (RRMS) treated by rituximab (n=34), who had been measured with all the studied cell populations at the start of treatment and the 6-month follow-up. These cell measurements were processed during the same time-period using the identical setup at Karolinska University Hospital as the ones studied in the present study. In brief, we found different longitudinal changes of the studied immune cell populations between RRMS patients and ALS patients (please see below figure for details). The declining B cells are most likely due to rituximab treatment.

      Given the largely different disease mechanisms, phenotypes, and treatments between RRMS and ALS, we are not confident that RRMS would be a good disease control for the present study. We are certainly willing to reconsider our position if the reviewer and editors would disagree with us. We have regardless now added discussion about this in the manuscript:

      “It would therefore be interesting to compare ALS with other diseases, especially other neurodegenerative diseases, regarding the studied cell counts, in terms of both their longitudinal trajectories during disease course and their prognostic values in predicting patient outcome.”

      Finally, we agree that it is interesting to consider treatment in the analysis of cell counts. Among the ALS patients of the main cohort, the majority (89.6%) were treated with Riluzole. We have now added a supplementary figure to demonstrate the leukocyte counts before and after start of Riluzole treatment. The corresponding analysis is however not possible for the FlowC cohort, as the majority of the patients started Riluzole treatment around the time of diagnosis and almost all measurements were taken after Riluzole treatment.

      [Figure panel titles: Th17 of CD4+ CM cells; CD4+ EMRA cells; CD8+ T cells; Naïve CD8+ T cells; CD8+ EM cells; CD8+ CM cells; CD8+ EMRA cells; CD4+ HLA-DR+ CD38- cells; CD4+ HLA-DR+ CD38+ cells; CD8+ HLA-DR+ CD38- cells; CD8+ HLA-DR+ CD38+ cells]

      We have now added this analysis to Methods and Results, including a new Figure 1—figure supplement 2.

      “To evaluate whether ALS treatment would influence the cell counts, we further visualized the temporal patterns of differential leukocyte counts before and after Riluzole treatment.”

      “The levels of leukocytes, neutrophils and monocytes increased, whereas the levels of lymphocytes decreased, after Riluzole treatment, compared with before such treatment (Figure 1—figure supplement 2).”

      Reviewer #2 (Public Review):

      Cui et al. investigated the correlation of immune profiles in ALS patients to functional status (by ALSFRS-R score), disease progression (rate of ALSFRS-R decline) and/or risk of death (or invasive ventilation use). The study longitudinally assessed basic immune profiles from a large cohort of ALS patients (n=288). Additionally, they deeply immunophenotyped a subset of ALS patients (n=92) to examine immune cell subtypes on ALS status, progression rate, and survival. The longitudinal design, deep immunophenotyping, and large cohort are significant strengths. Using various statistical models, the authors found leukocyte, neutrophil, and monocyte counts increased gradually over time as ALSFRS-R score declined. Within lymphocyte subpopulations, increasing natural killer cells and Th2-differentiated CD4+ central memory T cell counts correlated with a lower risk of death. Increasing CD4+ effector memory cells re-expressing CD45RA T cell and CD8+ T cell levels associated with a higher risk of death. These findings have broad implications for ALS pathogenesis and the development of immune-based ALS therapies tailored to specific immune cell populations.

      Thank you for the very positive comments.

    1. Author Response:

      Reviewer #1:

      In this manuscript Hill et al. analyze immune responses to vaccination of adults with the seasonal influenza vaccine. They perform a detailed analysis of the hemagglutinin-specific binding antibody responses against several different strains of influenza, and antigen-specific CD4+ T cells/T follicular cells, and cytokines in the plasma. Their analysis reveals that: (i) tetramer positive, HA-specific T follicular cells induced 7 days post vaccination correlate with the binding Ab response measured 42 days later; (ii) the HA-specific Tfh have a diverse TCR repertoire; (iii) differentiation of HA-specific Tfh is impaired in the elderly; and (iv) an "inflammatory" gene signature within Tfh in the elderly is associated with the impaired development of HA-specific Tfh.

      The paper addresses a topic of considerable interest in the fields of human immunology and vaccinology. In general the experiments appear well performed, and support the conclusions. However, the following points should be addressed to enhance the clarity of the paper, and add support to the key conclusions drawn.

      We thank the reviewer for their supportive evaluation of the manuscript, and have provided the details of how we have addressed each of the points raised below.

      1) Abstract: "(cTfh) cells are the best predictor of high titre antibody responses.." Since the authors have not done any blind prediction using machine learning tools with an independent cohort, the sentence should be rephrased thus: "(cTfh) cells were associated with high titre antibody responses."

      We agree that this phrasing better reflects the presented data. The sentence in the abstract (page 2) now reads “we show that formation of circulating T follicular helper (cTfh) cells was associated with high titre antibody responses.”

      2) Figure 1A: Please indicate the age range of the subjects.

      Figure 1 has been updated to include the age range of the subjects.

      3) Almost all the data in the paper shows binding Ab titers. Yet, typically HAI titers or MN titers are used to assess Ab responses to influenza. Fig 1C shows HAI titers against the H1N1 Cal 09 strain. Can the authors show HAI titers for Cal 09 and the other A and B strains contained in the 2 vaccine cohorts? Do such HAI titers correlate with the tetramer positive cells, similar to the correlations shown in Fig 2e?

      In this manuscript we have deliberately focussed on the immune response to the H1N1 Cal09 strain, as it is the only influenza strain in the vaccine common to both cohorts. The HAI titre for this strain is now shown as supplementary figure 4. In addition, the class II tetramers were specifically selected to recognise unique epitopes in the Cal 09 strain (J. Yang, {..} W. W. Kwok, CD4+ T cells recognize unique and conserved 2009 H1N1 influenza hemagglutinin epitopes after natural infection and vaccination. Int Immunol 25, 447-457, 2013). Because of this, we do not think it is appropriate to correlate HAI titres for the non-Cal 09 strains with tetramer positive cells. We agree that showing the correlation of cTfh and other immune parameters with the HAI titres for Cal 09 is important and have included this as supplementary figure 7. The new data and text are presented below:

      Figure 1-figure supplement 4: HAI responses before and after vaccination A) Log2 HAI titres at baseline (d0), d7 and d42 for cohort 1 (n=16) and B) cohort 2 (n = 21). C) Correlation between HAI and A.Cali09 IgG as measured by Luminex assay for cohort 1 and 2 combined. p-values determined using paired Wilcoxon signed rank-test, and Pearson’s correlation.

      Text changes. Page 4. “The increase in anti-HA antibody titre was coupled with an increase in hemagglutination inhibitory antibodies to A.Cali09, the one influenza A strain contained in the TIVs that was shared across the two cohorts and showed a positive correlation with the A.Cali09 IgG titres measured by Luminex assay (Fig. 1C, Figure 1-figure supplement 4).”

      Figure 2-figure supplement 1: Correlations between HAI assay titres and selected immune parameters. Correlation between vaccine-induced A.Cali09 HAI titres at d42 with selected immune parameters in both Cohort 1 and Cohort 2 (n=37). Dot color corresponds to the cohort (black = Cohort 1, grey = Cohort 2). Coefficient (Rho) and p-value determined using Spearman’s correlation, and line represents linear regression fit.
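
      For readers less familiar with the statistic named in this legend, Spearman's Rho is the Pearson correlation computed on ranks. The following is a minimal stdlib-only sketch, not the authors' analysis code; the function names and the example values are hypothetical, and no p-value is computed here.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Illustrative values only (e.g. log2 HAI titre vs. % cTfh); not study data.
hai_titre = [3, 5, 4, 7, 6]
ctfh_pct = [0.1, 0.4, 0.2, 0.9, 0.5]
rho = spearman_rho(hai_titre, ctfh_pct)
```

      Because only ranks enter the calculation, any monotone relationship gives a coefficient of magnitude 1, which is why the rank-based statistic is often preferred for skewed immunological readouts.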

      Results text changes: Page 5. “Similar trends were seen when these immune parameters were correlated to HAI titres against A/Cali09 (Figure 2-figure supplement 1).”

      4) Fig 2d to i: what % of all bulk activated Tfh at day 7 are tetramer positive? The tetramer positive T cells constitute roughly 0.094% of all CD4 T cells (Fig 2d), of which 1/3rd are CXCR5+, PD1+ (i.e. ~0.03% of CD4 T cells). What fraction of all activated Tfh is this subset of tetramer positive cells? Presumably, there will also be Tfh generated against other viral proteins in the vaccine, and these will constitute a significant fraction of all activated Tfh.

      This is an important point, as the tetramers only recognise one peptide epitope of the Cal.09 HA protein, so there will be many other influenza reactive CD4+ T cells that are responding to other Cal 09 epitopes as well as other proteins in the vaccine. The analysis suggested by the reviewer shows that the frequency of Tet+ cells amongst bulk cTfh cells ranges from 0.14%-1.52% in cohort 1, and from 0.022-2.7% in cohort 2. These data have been included as Figure 1-figure supplement 6C, D in the revised manuscript. In addition, Tet+ cells as a percentage of bulk cTfh cells were reduced in older people compared to younger adults. This data has been included in Figure 5-figure supplement 1C in the revised manuscript.

      Figure 1-figure supplement 6: Percentage of cTfh cells that are Tet+ and CXCR3 and CCR6 expression on HA-specific CD4+ T cells. A) Representative flow cytometry gating strategy for CXCR5+PD-1+ cTfh cells on CD4+CD45RA- T cells, and the proportion of HA-specific Tet+ cells within the CXCR5+PD-1+ cTfh cell gate. B) Percentage Tet+ cells within the CXCR5+PD-1+ cTfh cell population. Within-cohort age group differences were determined using the Mann-Whitney U test.
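
      The Mann-Whitney U test used for the age-group comparisons can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the p-value uses the normal approximation without tie or continuity correction, which is only rough for small groups (an exact test would be preferable there), and the example values are hypothetical.

```python
import math


def mann_whitney_u(x, y):
    """Return (U for sample x, two-sided p by normal approximation)."""
    # U counts pairs where the x value exceeds the y value; ties count 0.5.
    u = sum(
        1.0 if a > b else 0.5 if a == b else 0.0
        for a in x
        for b in y
    )
    n1, n2 = len(x), len(y)
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean_u) / sd_u
    # Two-sided tail area of the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2))
    return u, p


# Illustrative % Tet+ of cTfh values only; not study data.
younger = [1.2, 0.9, 1.5, 0.8, 1.1]
older = [0.3, 0.5, 0.2, 0.7, 0.4]
u_stat, p_value = mann_whitney_u(younger, older)
```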

      Results text, page 4: These antigen-specific T cells had upregulated ICOS after immunisation, indicating that they have been activated by vaccination (Fig. 1F, G). In addition, a median of one third of HA-specific T cells upregulated the Tfh markers CXCR5 and PD1 on d7 after immunisation (Fig. 1H, I). The tetramer binding cells represented between 0.022-2.7% of the total CXCR5+PD-1+ bulk population (Figure 1-figure supplement 6A, B).

      Figure 5-figure supplement 1C: Age-related differences in cytokines and HA-specific CD4+ T cell parameters. C) Percentage Tet+ cells within the CXCR5+PD-1+ cTfh cell population. Within-cohort age group differences were determined using the Mann-Whitney U test.

      Results text, page 8: Across both cohorts, the only CD4+ T cell parameters consistently reduced in older individuals at d7 were the frequency of polyclonal cTfh cells and HA-specific Tet+ cTfh cells, with the strongest effect within the antigen-specific cTfh cell compartment (Fig. 5H-J, Figure 5-figure supplement 1C).

      Reviewer #2:

      Hill and colleagues present a comprehensive dataset describing the recall and expansion of HA-specific cTFH cells following influenza immunisation in two cohorts. Using class II tetramers, IgG titres against a large panel of HA antigens, and quantification of plasma cytokines, they find that activated and HA-specific cTFH cells were a strong predictor of the IgG response against the vaccine after 6 weeks. Using RNAseq and TCR clonotype analysis, they find that, in 10/15 individuals, the HA-specific cTFH response at day 7 post-vaccination is recalled from the available CD4 T cell memory pool present prior to vaccination. Post-vaccination HA-specific cTFH cells exhibited a transcriptional profile consistent with lymph node-derived GC TFH, as well as evidence of downregulation of IL-2 signaling pathways relative to pre-vaccine CD4 memory cells.

      The authors then apply these findings to a comparison of vaccine immunogenicity between younger (18-36) and older (>65) adults. As expected, they found lower levels of vaccine-specific IgG responses among the older cohort. Analysis of HA-specific T cell responses indicated that tet+ cTFH fail to properly develop in the older cohort following vaccination. Further analysis suggests that development of HA-specific cTFH in older individuals is not caused by a lack of TCR diversity, but is associated with higher expression of inflammation-associated transcripts in tet+ cTFH.

      Overall this is an impressive study that provides clarity around the recall of HA-specific CD4 T cell memory, and the burst of HA-specific cTFH cells observed 7 days post-vaccination. The association between defective cTFH recall and lower IgG titres post-vaccination in older individuals provides new targets for improving influenza vaccine efficacy in this age group. However, as currently presented, the model of impaired cTFH differentiation in the older cohort and the link to inflammation is somewhat unclear. There are several issues that could be clarified to improve the manuscript in its current form:

      We thank the reviewer for their supportive and comprehensive summary of our work. We agree that the link between inflammation and impaired cTfh differentiation is correlative. We have added new data to address this, including mechanistic data to support chronic IL-2 signalling as antagonistic to cTfh development, and we provide new analyses to address the other points raised.

      1) It is somewhat unclear the extent to which the reduction in HA-specific cTFH in the older cohort is also related to an overall reduction in T cell expansion - cohort 1 shows a significant reduction in total tet+ CD4 T cells post-vaccination as well as in the cTFH compartment, and while this difference may not reach statistical significance, a similar trend is shown for cohort 2.

      We agree that a possible interpretation is a global failure in T cell expansion in the older individuals. To determine whether there is a relationship between the degree of Tet+ CD4+ T cell expansion and cTfh cell differentiation with age, we performed correlation analyses. There is no correlation between the expansion of Tet+ cells and the frequency of cTfh cells formed seven days after immunisation in either age group. This suggests that the impaired cTfh cell differentiation in older persons is most likely caused by factors other than the capacity of CD4+ T cells to expand after vaccination. These data have been added as Figure 5-figure supplement 1D, and included in the results text on page 8.

      Figure 5-figure supplement 1D: Age-related differences in cytokines and HA-specific CD4+ T cell parameters. D) Correlation between Tet+ cells (d7-d0, % of CD4+) and cTfh (d7-d0, % of TET+) in both cohorts for each age-group (18-36 y.o. n=37, 65+ y.o. n=39). Dot color corresponds to the cohort (black = Cohort 1, grey = Cohort 2). Coefficient (Rho) and p-value determined using Spearman’s correlation, and line represents linear regression fit.

      Text changes, Page 8: There was no consistent difference in the total d7 Tet+ HA-specific T cell population with age for both cohorts (Fig. 5H) and we observed no age-related correlation between the ability of an individual to differentiate Tet+ cells into a cTfh cell and the overall expansion of Tet+ HA-specific T cell population (Figure 5-figure supplement 1D). Thus, our data suggests that the poor vaccine antibody responses in older individuals is impacted by impaired cTfh cell differentiation (Fig. 5J) rather than size of the vaccine-specific CD4+ T cell pool.

      2) Transcriptomic analysis indicates that HA-specific cTFH in the older cohort show impaired downregulation of inflammation, TNF and IL-2-related signaling pathways. The authors therefore conclude that excess inflammation can limit the response to vaccination. In its current presentation, the data does not necessarily support this conclusion. While it is clear that downregulation of TNF and IL-2 signalling pathways occur during cTFH/TFH differentiation, there is no evidence presented to support the idea that (a) vaccination results in increased pro-inflammatory cytokine production in lymphoid organs in older individuals or that (b) these pro-inflammatory cytokines actively promote CXCR5-, rather than cTFH, differentiation of existing memory T cells.

      We agree with the reviewer that the data presented in figure 7 are correlative, rather than causative. Unfortunately, we do not have access to secondary lymphoid tissues from younger and older people after vaccination to test point (a) above. In order to test the hypothesis that increased inflammatory cytokine production in lymphoid organs limits Tfh cell differentiation we have used Il2cre/+; Rosa26stop-flox-Il2/+ transgenic mice. In this mouse model, IL-2-dependent cre-recombinase activity facilitates the expression of low levels of IL-2 in cells that have previously expressed IL-2. This creates a scenario in which cells that physiologically express IL-2 cannot turn its expression off, thereby increasing IL-2 expression after antigenic stimulation (mice reported in Whyte et al., bioRxiv, 2020, doi: https://doi.org/10.1101/2020.12.18.423431).

      Twelve days after influenza A infection, Il2cre/+; Rosa26stop-flox-Il2/+ transgenic mice have fewer Tfh cells in the draining mediastinal lymph node and in the spleen (Fig. 8A-C); this is accompanied by a reduction in the magnitude of the GC B cell response (Fig. 8D-E). These data provide a proof of concept that sustained IL-2 production limits the formation of Tfh cells, consistent with the negative correlation of an IL-2 signalling gene signature and cTfh cell formation in humans (Figure 7). These new data support the conclusion that excess IL-2 signalling can limit the Tfh cell response. These data are presented in Figure 8, and are discussed on page 12 in the results, and pages 12-13 in the discussion.

      Figure 8: Increased IL-2 production impairs Tfh cell formation and the germinal centre response. Assessment of the Tfh cell and germinal centre response in Il2cre/+; Rosa26stop-flox-Il2/+ transgenic mice that do not switch off IL-2 production, and Il2cre/+; Rosa26+/+ control mice 12 days after influenza A infection. Flow cytometric contour plots (A) and quantification of the percentage of CXCR5highPD-1highFoxp3-CD4+ Tfh cells in the mediastinal lymph node (B) and spleen (C). Flow cytometric contour plots (D) and quantification of the percentage of Bcl6+Ki67+B220+ germinal centre B cells in the mediastinal lymph node (E) and spleen (F). The height of the bars indicates the median, each symbol represents one mouse, data are pooled from two independent experiments. P-values calculated between genotype-groups by Mann Whitney U test.

      Results text, page 12: Sustained IL-2 production inhibits Tfh cell frequency and the germinal centre response. To test the hypothesis that cytokine signalling needs to be curtailed to facilitate Tfh cell differentiation, we turned to a genetically modified mouse model in which cells that have initiated IL-2 production cannot switch it off, Il2cre/+; Rosa26stop-flox-Il2/+ mice (37). Twelve days after influenza infection Il2cre/+; Rosa26stop-flox-Il2/+ mice have fewer Tfh cells in the draining lymph node and spleen (Fig. 8A-C), which is associated with a reduced frequency of germinal center B cells (Fig. 8D-F). This provides a proof of concept that proinflammatory cytokine production needs to be limited to enable full Tfh cell differentiation in secondary lymphoid organs.

      Discussion text, pages 12, 13: These enhanced inflammatory signatures associated with poor antibody titre in an independent cohort of influenza vaccinees. The dampening of Tfh cell formation by enhanced cytokine production was confirmed by the use of genetically modified mice where IL-2 production is restricted to the appropriate anatomical and cellular compartments, but once initiated cannot be inactivated. Together, this suggests that formation of antigen-specific Tfh cells is essential for high titre antibody responses, and that excessive inflammatory factors can contribute to poor cTfh cell responses.

    1. Author Response

      Reviewer #1 (Public Review):

      In the article "Neuroendocrinology of the lung revealed by single cell RNA sequencing", Kuo et al. described various aspects of pulmonary neuroendocrine cells (PNECs) including the scRNA-seq profile of one human lung carcinoid sample. Overall, although this manuscript does not have any specific storyline, it is informative and would be an asset for researchers exploring various new roles of PNECs.

      Thank you for appreciating the significance of the data presented. Our storyline focuses on the newly uncovered molecular diversity of PNECs and the extraordinary repertoire of peptidergic signals they express and cell types these signals can directly target in (and outside) the lung, in mice and human, and in health and disease (human carcinoid tumor).

      Major comments:

      The major concern about the work is most results are preliminary, and at a descriptive level, conclusions or sub-conclusions are derived from scRNA-seq analysis only, lacking in-depth functional analysis and validation in other methods or systems. There are many open-end results that have been predicted by the authors based on their scRNA-seq data analysis without functional validation. In order to give them a constructive roadmap, it would be better to investigate literature and put them in a potential or probable hypothesis by citing the available literature. This should be done in each section of the result part. The paper lacks a main theme or specific biology question to address. In addition, the description about the human lung carcinoid by scRNA-seq is somehow disconnected from the main study line. Also, these results are derived from the study on only one single patient, lacking statistical power.

      We agree that much of the data and analysis presented in the paper is descriptive and hypothesis-generating for PNECs, however we do not consider it preliminary. We focused on validating two key conclusions from the scRNA-seq analysis: PNECs are extraordinarily diverse molecularly (as validated by multiplex in situ hybridization and immunostaining) and they express many different combinations of peptidergic signals (and appear to package them in separate vesicles). From the lung expression profiles of the cognate receptors, we also predicted the direct lung targets of the dozens of new PNEC peptidergic signals we uncovered, and validated the cell target (PSN4, a recently identified subtype of pulmonary sensory neuron) of one of the newly identified PNEC signals (the classic hormone angiotensin) by confirming expression of the cognate receptor gene in PSN4 neurons that innervate PNECs and showing that the hormone can directly activate PSN4 neurons. The characterized human carcinoid provided evidence that during tumorigenesis, the amplified PNECs retain a memory (albeit imperfect) of the molecular subtype of PNEC from which they originated. As suggested by the Reviewer, we have provided more background in Results by adding additional citations from the literature to clarify the rationale for each analysis and what was known prior to the analysis. We feel that our paper provides a broad foundation for exploring the diversity and signaling functions of PNECs, and although each molecular type of PNEC and new PNEC peptidergic signal we uncovered and potential target cell in (and outside) the lung warrants follow up (as do the sensory and other properties of PNECs we inferred from their expression profiles), such studies will require the effort of many individuals in many labs studying both normal and disease physiology in mouse and human, and exploiting the data, hypotheses, approaches, and framework we provide.

      Reviewer #2 (Public Review):

      Pulmonary neuroendocrine cells (PNECs) are known to monitor oxygen levels in the airway and can serve as stem cells that repair the lung epithelium after injury. Due to their rarity, however, their functions are still poorly understood. To identify potential sensory functions of PNECs, the authors have used single-cell RNA-sequencing (scRNA-seq) to profile hundreds of mouse and human PNECs. They report that PNECs express over 40 distinct peptidergic genes, and over 150 distinct combinations of these genes can be detected. Receptors for these neuropeptides and peptide hormones are expressed in a wide range of lung cell types, suggesting that PNECs may have mechanical, thermal, acid, and oxygen sensory roles, among others. However, since some of these cognate receptors are not expressed in the lung, PNECs may also have systemic endocrine functions. Although these data are largely descriptive, the results represent a significant resource for understanding the potential roles of PNECs in normal biology as well as in pulmonary diseases and cancer and are likely to be relevant for understanding neuroendocrine cells in other tissue contexts.

      However, there are several aspects of the data analysis that are unclear and require clarification, most notably the definition of a neuroendocrine cell (points #1 and #2 below).

      1) Figure S1 shows the sorting strategy used for isolation of putative PNECs from Ascl1CreER/+; Rosa26ZsGreen/+ mice, and distinguishes neuroendocrine cells defined as ZsGreen+ EpCAM+ and "neural" cells defined as ZsGreen+ EpCAM-; the figure legend also refers to the ZsGreen+ EpCAM- cells as "control" cells. However, the table shown in panel D indicates that the NE population combines 112 ZsGreen+ EpCAM+ cells together with 64 ZsGreen+ EpCAM- cells to generate the 176 cells used for subsequent analyses. Why are these ZsGreen+ EpCAM- cells initially labeled as neural or control, but are then defined as neuroendocrine? If these do not express an epithelial marker, can they be rigorously considered as neuroendocrine?

      As explained above in the response to Essential Revision point 1, we define pulmonary neuroendocrine cells (PNECs) throughout the paper by their transcriptomic clustering and signatures, which includes the dozens of newly identified PNEC markers as well as the few extant marker genes available before this study (listed in Table S2). The confusion here arises from the two previously known markers (Ascl1 lineage marker ZsGreen, EpCAM) we used for flow sorting to enrich for these rare cells for transcriptomic profiling (Fig. S1). Although most of the cells with PNEC transcriptomic profiles were from the ZsGreenhi EpCAMhi sorted population (as expected), some were from the ZsGreenhi EpCAMlo sorted population. The latter resulted from the high EpCAM gating threshold we used during flow sorting, which excluded some PNECs with intermediate levels of surface EpCAM. Indeed, nearly all PNECs (> 95%) expressed EpCAM by scRNAseq, and there was no difference in EpCAM transcript levels or transcriptomic clustering of PNECs that were from the ZsGreenhi EpCAMhi vs. ZsGreenhi EpCAMlo sorted populations, as we now show in the new panels (C', C'') added to Fig S1C. This point is now clarified in the legend to Fig. S1C, and it nicely demonstrates that transcriptomic profiling is a more robust method of identifying PNECs than flow sorting based on two classical markers.

      2) Similarly, in the human scRNA-seq analysis, how were PNECs defined? The methods description states that these cells were identified by their expression of CALCA and ASCL1, but does not indicate whether they also expressed epithelial markers.

      Human PNECs were identified in the single cell transcriptomic analysis by the same strategy described above for mouse PNECs: by their transcriptomic clustering and signatures, which includes the dozens of newly identified PNEC markers as well as the few extant marker genes available before this study (listed in Table S2). In addition to expression of classic and new markers, the human PNEC cluster defined by scRNA-seq indeed showed the expected expression of epithelial markers (e.g., EPCAM, see dotplot below), like other epithelial cells.

      3) The presentation of sensitivity and specificity in Figure 1 is confusing and potentially misleading. According to Figure 1B, Pcsk1 and Nov are two of the top-ranked differentially expressed genes in PNECs with respect to both sensitivity and specificity. However, the specificity of these two genes appears to be lower than that of Scg5, Chgb, and several other genes, as suggested in Figure 1C and Figure S1E. In contrast, Chgb appears to have higher specificity and sensitivity than Pcsk1 in Figures 1C and E but is not shown in the list of markers in Figure 1B.

      As explained above in the response to Essential Revision point 2, because different marker features are important for different applications, we have provided several different graphical formats (Figs. 1B,C, Fig. S1E) and a table (Table S1) to aid in selection of the optimal markers for each application. Fig. 1B shows the most sensitive and specific PNEC markers identified by ratio of the natural logs of the average expression of the marker in PNECs vs. non-PNEC epithelial cells (Table S1), and we have added a two-dimensional plot of this sensitivity and specificity for a large set of PNEC markers (new panel E of Fig. S1). The violin plots in Fig. 1C allow visual comparison of expression of selected markers across PNECs and 40 other lung cell types including non-epithelial cells (from our extensive mouse lung atlas in Travaglini, Nabhan et al, Nature 2020). Pcsk1 and Nov score high in the analysis of Fig. 1B because they are highly sensitive and specific markers within the pulmonary epithelium, and they are also valuable markers because they are highly expressed in PNECs. However, they appear slightly less specific in the violin plots of Fig. 1C (Pcsk1) and Fig. S1F (Nov) because of expression (though at much lower levels) in individual lung cell types outside the epithelium: Pcsk1 is expressed also at low levels in some Alox5+ lymphocytes, and Nov is expressed at low levels in some smooth muscle cells. Chgb is a new PNEC marker that did not make the cutoff for the list in Fig. 1B because it is expressed in a slightly higher percentage of non-PNEC epithelial cells than the markers shown, which ranked slightly above it by this metric (see Table S1).

      4) The expression of serotonin biosynthetic genes in mouse versus human PNECs deserves some comment. The authors fail to detect the expression of Tph1 and Tph2 in any of the mouse PNECs analyzed, but TPH1 is expressed in 76% of the human PNECs (Table S8). Is it possible that Tph1 and Tph2 are not detected in the mouse scRNA-seq data due to gene drop-out? If serotonin signaling by mouse PNECs is due to protein reuptake, as implied on p. 5, is there a discrepancy between serotonin expression as detected by smFISH versus immunostaining?

      It is always possible that the failure to detect expression of Tph1 and Tph2 in the mouse scRNA-seq dataset is due to technical dropout; however, when we analyzed this in our other mouse PNEC scRNA-seq dataset obtained using a microfluidic platform and also deeply-sequenced (Ouadah et al, Cell 2019), we found similar values as in the previously analyzed dataset: no Tph2 expression was detected and only 3% (3 of 92) of PNECs had detected Tph1 expression, whereas 24% (22 of 92) had detected expression of serotonin re-uptake transporter Slc6a4. Because our mouse and human scRNA-seq datasets were prepared similarly and sequenced to a similar depth (10^5 to 10^6 reads/cell), the difference observed in Tph1/TPH1 expression between mouse (0-3% PNECs) and human (76% PNECs) is more likely a true biological difference. We also analyzed serotonin levels in mouse PNECs by immunohistochemistry (not shown) and detected serotonin in nearly all (~90%) embryonic PNECs but only ~10% of adult PNECs. Systematic follow up studies will be necessary to resolve the mechanism of serotonin biogenesis and uptake in PNECs, and the potential stage and species-specific differences in these processes suggested by this initial data.

      5) The smFISH and immunostaining analyses are often presented without any indication of the number of independent replicate samples analyzed (e.g., Figure 2B, Figure 3F, G).

      The numbers of samples analyzed have been added (the values for Fig. 2B are given in the legend to Fig. 2C, the quantification of Fig. 2B).

      6) It would be helpful to provide a statistical analysis of the similarities and differences shown in the graphs in Figures 1E and G.

      We added a statistical analysis (Fisher's exact test, two-sided) of Fig. 1E comparing expression of each examined gene in the two scRNA-seq datasets (Table S4). We added a similar statistical analysis of Fig. 1G comparing the expression values of each examined gene by scRNA-seq vs smFISH (see Fig. 1G legend).
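
      As an illustration of the two-sided Fisher's exact test mentioned above, here is a minimal standard-library sketch. The function name and the example counts are hypothetical and are not taken from Table S4; each 2x2 table is assumed to hold the number of cells with and without detected expression of one gene in each of the two datasets.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]].

    Rows are the two datasets; columns are cells with / without detected
    expression of the gene (an assumed layout, for illustration only).
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x expressing cells in the first row
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one;
    # the small relative tolerance guards against floating-point ties.
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, p)


# Hypothetical counts: gene detected in 40 of 92 cells vs. 30 of 80 cells
p_value = fisher_exact_two_sided(40, 52, 30, 50)
```

      When many genes are compared in this way, a multiple-testing correction would normally be applied to the resulting p-values; whether and how that was done is not stated here.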

    1. Author Responses

      Reviewer #1 (Public Review):

      This study uses a nice longitudinal dataset and performs relatively thorough methodological comparisons. I also appreciate the systematic literature review presented in the introduction. The discussion of confound control is interesting and it is great that a leave-one-site-out test was included. However, the prediction accuracy drops in these important leave-one-site-out analyses, which should be assessed and discussed further.

      Furthermore, I think there is a missed opportunity to test longitudinal prediction using only pre-onset individuals to gain clearer causal insights. Please find specific comments below, approximately in order of importance.

      We thank the reviewers for their positive remarks and for providing important suggestions to improve the analysis. Please see our detailed comments below.

      1) The leave-one-site-out results fail to achieve significant prediction accuracy for any of the phenotypes. This reveals a lack of cross-site generalizability of all results in this work. The authors discuss that this variance could be caused by distributed sample sizes across sites resulting in uneven folds or site-specific variance. It should be possible to test these hypotheses by looking at the relative performance across CV folds. The site-specific variance hypothesis may be likely because for the other results confounds are addressed using oversampling (i.e., sampling with replacement) which creates a large sample with lower variance than a random sample of the same size. This is an important null finding that may have important implications, so I do not think that it is cause for rejection. However, it is a key element of this paper and I think it should be assessed further and discussed more widely in the abstract and conclusion.

      We thank the reviewer for raising this point and providing specific suggestions. As mentioned by the reviewer, the leave-one-site-out results showed high variance across sites, that is, across cross-validation (CV) folds. Therefore, as suggested by the reviewer, we further investigated the source of this variance by examining how the model accuracy at each site correlates with that site's sample size, ratio of AAM to controls, and sex distribution. We ranked the sites from low to high accuracy and examined different performance metrics such as sensitivity and specificity:

      As shown, the models performed close-to-chance for sites ‘Dublin’, ‘Paris’ and ‘Berlin’ (<60% mean balanced accuracy) in the leave-one-site-out experiment, across all time-points and metrics. Notably, the order of the performance at each site does not correspond to the sample sizes (please refer to the ‘counts’ column in the above figure). It also does not correspond to the ratio of AAM-to-controls, or to the sex distribution.

      To investigate this further, we performed an additional leave-one-site-out experiment with all 8 sites. Here, we repeated the ML (Machine Learning) exploration using the entire dataset, including the data from the Nottingham site that had been kept aside as the holdout. Since there are now 8 sites, we used an 8-fold cross-validation and observed how the model accuracy varied across sites:

      The results were comparable to the original leave-one-site-out experiment. Along with ‘Dublin’ and ‘Berlin’, the models additionally performed poorly on the ‘Nottingham’ site. Results on ‘London’ and ‘Paris’ also fell below 60% mean balanced accuracy.

      Finally, we compared the above two results to the main experiment from the paper where the test samples were randomly sampled across all sites. The performance on test subjects from each site was compared:

      As seen, the models struggled with subjects from ‘Dublin’, followed by ‘Nottingham’, ‘London’ and ‘Berlin’ respectively, and performed well on subjects from ‘Dresden’, ‘Mannheim’, ‘Hamburg’ and ‘Paris’.

      Across all three results discussed above, the models consistently struggled to generalize to subjects from ‘Dublin’ and ‘Nottingham’ in particular. As already pointed out by the reviewer, the variance in the main experiment in the manuscript is lower because of the random sampling of the test set across all sites. Since these results have important implications, we have included them in the manuscript and also provided these figures in the Appendix.
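
      The leave-one-site-out scheme and the balanced-accuracy metric discussed in this response can be sketched as follows. This is a simplified standard-library illustration, not the authors' pipeline; the function names and the toy site labels are assumptions.

```python
from collections import defaultdict


def leave_one_site_out(sites):
    """Yield (held_out_site, train_idx, test_idx), one fold per distinct site."""
    by_site = defaultdict(list)
    for i, site in enumerate(sites):
        by_site[site].append(i)
    for held_out in sorted(by_site):
        test_idx = by_site[held_out]
        # Every subject from every other site goes into the training set
        train_idx = sorted(
            i for site, idx in by_site.items() if site != held_out for i in idx
        )
        yield held_out, train_idx, test_idx


def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity for binary labels (1 = case, 0 = control)."""
    pos = sum(1 for t in y_true if t == 1)
    neg = len(y_true) - pos
    if pos == 0 or neg == 0:
        raise ValueError("both classes must be present in y_true")
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return (tp / pos + tn / neg) / 2


# Toy example with hypothetical site labels for four subjects
folds = list(leave_one_site_out(["Dublin", "Paris", "Dublin", "Berlin"]))
```

      Because each fold tests on a single site, per-fold balanced accuracy can be read directly as per-site generalization, which is what makes the site ranking described above possible.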

      2) The authors state that "83.3% of subjects reported having no or just one binge drinking experience until age 14". To gain clearer insights into the causality, I recommend repeating the MRIage14 → AAMage22 prediction using only these 83% of subjects.

      We thank the reviewer for this valuable comment. As suggested by the reviewer, we have now repeated the MRIage14 → AAMage22 analysis by including (a) only the subjects who had no binge drinking experiences (n=477) by age 14 and (b) subjects who had one or fewer binge drinking experiences (n=565). The results are shown below. The balanced accuracies on the holdout set were 72.9 +/- 2% and 71.1 +/- 2.3% respectively, which are comparable to the main result of 73.1 +/- 2%.

      These results provide further evidence that some form of cerebral predisposition might precede the observed alcohol misuse behavior in the IMAGEN dataset. We now discuss these results in the Results section and the 2nd paragraph of the Discussion.

      3) The feature importance results for brain regions are quite inconsistent across time points. As such, the study doesn't really address one of the main challenges with previous work discussed in the introduction: "brain regions reported were not consistent between these studies either and do not tell a coherent story". This would be worth looking into further, for example by looking at other indices of feature importance such as permutation-based measures and/or investigating the stability of feature importance across bootstrapped CV folds.

      The feature importance results shown in Figure 9 are intended to be illustrative and to show where in the brain the most informative structural features are mainly clustered, for each time point. We acknowledge that this figure could be somewhat confusing. Hence, we have now provided an exhaustive table in the Appendix, consisting of all important features and their respective SHAP scores obtained across the seven repeated runs. In addition, we address the inconsistencies across time points in the 3rd paragraph of the Discussion chapter and contrast our findings with previous studies. These claims can now be verified from the table of features provided in the Appendix.

      Addressing the reviewer's suggestions, we would like to point out that SHAP is itself a type of permutation-based measure of feature importance. Because it derives from the theoretically sound Shapley values, is model-agnostic, and has already been used in biomedical applications, we believe that running another permutation-based analysis would not be beneficial. We have also investigated the stability of our feature importance scores by repeating the SHAP estimation with different random permutations. This process is explained in the Methods section Model Interpretation.

      Additionally, the SHAP scores across the seven repetitions are now provided in Appendix Table 6 for verification.
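
      For readers unfamiliar with the Shapley values that SHAP approximates, the following toy sketch computes them exactly by enumerating feature subsets. It is not the SHAP library and not the authors' analysis; the value function and its weights are invented purely for illustration, and exact enumeration is only feasible for a handful of features.

```python
from itertools import combinations
from math import factorial


def shapley_values(value_fn, n_features):
    """Exact Shapley values for value_fn, a function of a frozenset of feature indices."""
    n = n_features
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Weight of a coalition of this size in the Shapley average
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                # Marginal contribution of feature i when added to coalition s
                phi[i] += weight * (value_fn(s | {i}) - value_fn(s))
    return phi


def toy_model_value(subset):
    """Hypothetical model output when only the features in `subset` are 'known'."""
    weights = {0: 2.0, 1: 1.0, 2: 0.0}
    value = sum(weights[i] for i in subset)
    if 0 in subset and 1 in subset:
        value += 1.0  # interaction between features 0 and 1
    return value


phi = shapley_values(toy_model_value, 3)
```

      In this toy case the interaction term is split equally between features 0 and 1, and the attributions sum to the difference between the full-model value and the empty-set value, which is the property that makes such scores comparable across repeated runs.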

    1. Author Response

      Reviewer #1 (Public Review):

      This paper tests the hypothesis that 1/f exponent of LFP power spectrum reflects E-I balance in a rodent model and Parkinson's patients. The authors suggest that their findings fit with this hypothesis, but there are concerns about confirmation bias (elaborated on below) and potential methodological issues, despite the strength of incorporating data from both animal model and neurological patients.

      First, the frequency band used to fit the 1/f exponent varies between experiments and analyses, inviting concerns about potentially cherry-picking the data to fit with the prior hypothesis. The frequency band used for fitting the exponent was 30-100 Hz in Experiment 1 (rodent model), 40-90 Hz in Experiment 2 (PD, levodopa), and 10-50 Hz in Experiment 3 (PD, DBS). Ad-hoc reasons were given to justify these choices, such as " to avoid a spectral plateau starting > 50 Hz" in Experiment 3. However, at least in Experiment 3 (Fig. 3), if the frequency range was shifted to 1-10 Hz, the authors would have uncovered the opposite effect, where the exponent is smaller for DBS-on condition.

      We agree that parameter choice is crucial, in particular, choice of the fitting range. In addition to the 40-90 Hz range (Figure 2C), we have performed aperiodic fitting for five other frequency ranges to test to what extent the reported results are sensitive to the selected frequency range (Figure S2A). This analysis showed that the results are robust when a broad frequency range from 30 to 95 Hz was chosen, which is consistent with what has been suggested by Gao et al., 2017 to make inferences on the E/I ratio.

      Accordingly, we have now repeated the analyses for the animal data with the same fitting range used for the ON-OFF medication comparison in humans. Along with Figure S2A where different frequency ranges were tested for data used in Figure 2, this shows that the results in Figure 1 and 2 hold up with higher aperiodic exponents when STN spiking is low and vice versa. Therefore, a broad fitting range from 30 to 90 Hz (excluding harmonics of mains interference) generates consistent results for both human and animal data.

      We opted against a fitting range from 1-10 Hz because of two constraints highlighted in Gerster et al., 2022. First, a fitting range starting at 1 Hz could have a larger y-intercept due to the presence of low-frequency oscillations. This could lead to a larger aperiodic exponent and could be misinterpreted as stronger neural inhibition. Therefore, the lower fitting bound should be chosen to best avoid known oscillations in the delta/theta range (Gerster et al., 2022). Second, frequencies should be chosen to avoid oscillations crossing fitting range limits. In Figure 3A, oscillations in the theta/alpha band both ON and OFF stimulation would complicate parameterisation and would likely result in spurious fits.

      We also tested the effect of changing the peak threshold, peak width limits and the aperiodic fitting mode on FOOOF parameterisation. Increasing and decreasing the peak threshold from its default value (at 2 standard deviations) did not change results (Figure S2B). Similarly, adapting the peak width limits did not affect the exponent difference between medication states (Figure S2C). Finally, choosing the ‘knee’ mode instead of ‘fixed’ resulted in fundamentally different aperiodic fits that did not differ anymore with medication (Figure S2D). This is most likely a consequence of the near linear PSD in log-log space from 40 to 90 Hz (Figure 2B). If there is no bend in the PSD, the FOOOF algorithm will be forced to assign a ‘random’ knee and the aperiodic fit will then mostly reflect the slope of the spectrum above the knee point.
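
      The dependence of the fitted exponent on the chosen frequency range can be illustrated with a deliberately simplified sketch. It fits a straight line to the spectrum in log-log space (the idea behind a 'fixed' aperiodic fit) using only the standard library. It is not the FOOOF algorithm: no oscillatory peaks are modelled or removed, and the synthetic spectra and their parameters are assumptions made for illustration.

```python
import math


def fit_aperiodic_exponent(freqs, power, f_min, f_max):
    """Least-squares fit of log10(P) = offset - exponent * log10(f) on [f_min, f_max]."""
    pts = [(math.log10(f), math.log10(p))
           for f, p in zip(freqs, power) if f_min <= f <= f_max]
    if len(pts) < 2:
        raise ValueError("need at least two frequency bins in the fitting range")
    n = len(pts)
    mean_x = sum(x for x, _ in pts) / n
    mean_y = sum(y for _, y in pts) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pts)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pts)
    slope = sxy / sxx
    offset = mean_y - slope * mean_x
    return offset, -slope  # the exponent is the negated log-log slope


freqs = [float(f) for f in range(1, 101)]
# Synthetic pure power law with exponent 1.5 (no knee, no peaks)
power_law = [10.0 * f ** -1.5 for f in freqs]
# Synthetic spectrum with a bend: flat at low frequencies, steeper above it
knee_spectrum = [1.0 / (1000.0 + f ** 2.0) for f in freqs]

_, exp_pure = fit_aperiodic_exponent(freqs, power_law, 40, 90)
_, exp_low = fit_aperiodic_exponent(freqs, knee_spectrum, 1, 10)
_, exp_high = fit_aperiodic_exponent(freqs, knee_spectrum, 40, 90)
```

      For the pure power law the fit recovers the same exponent in any range, whereas for the spectrum with a bend the low-frequency and high-frequency ranges give clearly different exponents. This is why the fitting range and the choice between a single-slope and a knee model have to be justified per dataset.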

      Second, there are important, fine-grained features in the spectra that are ignored in the analyses, which confounds the interpretation.

      One salient example of this is Fig. 2, where based on the plots in B, one would expect that the power of beta-band oscillations to be higher in the Med-On condition, as the oscillatory peaks rise higher above the 1/f floor and reach the same amplitude level as the Med-OFF condition (in other words, similar total power is subtracted by a smaller 1/f power in the Med-ON condition). But this impression is opposite to the model-fitting results in C, where beta power is lower in the Med-ON condition.

      We agree that PSDs over a broad frequency range (e.g. 5-90 Hz) typically do not have a single 1/f property. Instead, there can be multiple oscillatory peaks and ‘knees/bends’ in the aperiodic component. For these cases, fitting should be performed using the knee mode. To extract periodic beta power, we parameterise the PSD between 5 and 90 Hz and select the largest oscillatory component between 8 and 35 Hz (this range was extended to include the large oscillatory peaks in hemispheres 27 and 28 at ~ 10 Hz, see Figure R1). We now use the knee mode, to model the aperiodic component between 5 and 90 Hz when periodic beta power is calculated (see our previous comments). Figure R1 provides an overview of all PSDs ON and OFF medication, the aperiodic fits (5-90 Hz (knee) and 40-90 Hz (fixed)) and the detected beta peaks. In spite of this modification in our pipeline, periodic beta power is still larger OFF medication (Figure 2C), in keeping with previous studies (Kim et al., 2022; Kühn et al., 2006; Neumann et al., 2017; Ray et al., 2008). We acknowledge the reviewer’s point that the average spectra in Figure 2B are misleading in that respect and for clarity provide here all 30 spectra in both conditions. Note that the calculation of aperiodic exponents between 40 and 90 Hz is not affected by this change in our pipeline. Figures 2B, D+E were revised accordingly.

      We have repeated the analysis of our animal data using the ‘knee mode’ with a fitting range from 30 to 100 Hz. However, using the knee mode did not improve the goodness of fit or fitting error and, in fact, made them slightly worse (Figure S5). Based on this, we think the fixed mode would provide a more holistic model for the PSDs used in this analysis. We have now added this comparison in Figure S5 to justify the choice of the fixed mode.

      Figure R1. PSDs from all 30 hemispheres ON and OFF medication. Aperiodic fits are shown between 5-90 Hz (knee mode), which was used to calculate the power of beta peaks, and between 40-90 Hz (fixed mode), which was used to estimate the aperiodic exponent of the spectrum.

      Another example is Fig. 1C, where the spectra for high and low STN spiking epochs are identical between 10 and 20 Hz, and the difference in higher frequency range could be well-explained by an overall increase of broadband gamma power (e.g. as observed in Manning et al., J Neurosci 2012, Ray & Maunsell PLoS Biol 2011). This increase of broadband gamma power is trivially expected, as broadband gamma power is tightly coupled with population spiking rate, which was used to define the two conditions.

      We agree with the reviewer that in Figure 1C, high and low STN spiking states could well be separated by average gamma power (Figure 1E), too. However, the difference of aperiodic exponents is more prominent between both conditions (Figure 1D+E, based on p-values). What is more, in human LFP data recorded from clinical macroelectrodes, medication states can be reasonably well distinguished using the aperiodic exponent between 40-90 Hz (Figure 2C), but average gamma power does not separate both states (Figure S3A). This suggests that the aperiodic exponent reflects more than just power differences in the high gamma regions. In addition, power changes do not inevitably change the aperiodic exponent and vice versa as elaborated in (Donoghue et al., 2020).

      Manning et al., 2009 show that the power spectrum is shifted to higher power values at all observed frequencies (2-150 Hz) as firing rates increase. As the reviewer points out, power spectra of our data are almost identical between 10-20 Hz (despite the marked spiking differences) and only drift apart from > 20 Hz (Figure 1C). This is a relevant difference between our study and Manning et al., 2009 and suggests that power differences in the gamma range are not solely explained by differences in spiking. This is confirmed when cortical activity at different spikes/sec is modelled (Miller et al., 2009). The entire spectrum is shifted to higher power values if spiking rates increase.

      Ray & Maunsell, 2011 reported low (30-80 Hz) and high (> 80 Hz) gamma activity in the macaque visual cortex, with a positive correlation between spiking activity and high gamma activity. However, activity in the low gamma range (30-80 Hz), which largely overlaps with the frequency range in our study, does not necessarily correlate with firing rates.

      In conclusion, the link between gamma power and spiking activity is not as strong as alluded. Even if the change in spiking activities can lead to changes of both gamma power and the aperiodic exponent, the aperiodic exponent would still constitute a measure to separate E/I levels and medication states.

      The above consideration also speaks to a major weakness of the general approach of considering the 1/f spectrum a monolithic spectrum that can be captured by a single exponent. As the authors' Fig. 1C shows, there are distinct frequency regions within the 1/f spectrum that have different slopes. Indeed, this tripartite shape of the 1/f spectrum, including a "knee" feature around 40-70 Hz which is well visible here, was described in multiple previous papers (Miller et al., PLoS Comput Biol 2009; He et al., Neuron 2010), and have been successfully modeled with a neural network model using biologically plausible mechanisms (Chaudhuri et al., Cereb Cortex, 2017). The neglect of these fine-grained features confounds the authors' model fitting, because an overall increase in the broadband gamma power - which can be explained straightforwardly by the change in population firing rates - can result in the exponent, fit over a larger spectral frequency region, to decrease. However, this is not due to the exponent actually changing, but the overall increase of power in a specific sub-frequency-region of the broadband 1/f activity.

      We have now used the knee mode for aperiodic fits between 5 and 90 Hz when periodic beta power is calculated. We agree that this broad frequency range is unlikely to have a single 1/f component.

      We have also repeated the analysis of our animal data using the knee mode for aperiodic fits between 30 and 100 Hz (Figure S5). However, the goodness of fit barely changed. In fact, the R² and error became slightly worse. In addition, the knee parameter complicates interpretation of the aperiodic exponent and has to be considered along with the knee frequency. What is more, we do not see this bend around 40-70 Hz in all subjects. We show PSDs of representative LFP channels in Figure R2 and need to assert that the knee around 40-70 Hz is not a robust finding in our data set. Therefore, we chose the fixed mode for parameterisation within this frequency band.

      Please see our answer to the previous comment regarding the link between broad gamma power and changes in population firing rates.

      Figure R2. PSDs of representative LFP channels for each animal (data used in Figure 1C). The knee around 40-70 Hz is not a robust finding in all PSDs.

    1. Author Response:

      Reviewer #1:

      The manuscript by Lalanne and Li aims to provide an intuitive and quantitative understanding of the expression of translation factors (TFs) from first principles. The authors first find that the steady-state solutions for translation sub-processes are largely independent at optimality. With a coarse-grained model, the authors derive the optimal expression of translation factors for all important sub-processes. The authors show that intuitive scaling factors can explain the differential expression of translation factors.

      The results are impressive. However, as detailed in the major comments, the choice of some important parameters is not sufficiently justified in the current version. In particular, it is not clear to what extent parameter choice and rescaling was biased toward achieving a good agreement with the experimental data.

      Major comments:

      1) The work assumes that reaction times per TF are constant. That may be true at the highest growth rates, but it might not hold for conditions with lower growth rates. The data of Schmidt et al. (Nat. Biotechnol. 34, 104 (2016)) would allow to compare the predictions to proteome partitioning in E. coli across growth rates. It is ok to restrict the present work to maximal growth rates, but then this caveat should be made explicit. This last point also concerns ignoring the offset in the bacterial growth laws, which is only permissible at fast growth; that also should be stated more prominently in the manuscript; see also the legend of Fig. 1, "Our framework of flux optimization under proteome allocation constraint addresses what ribosome and translation factor abundances maximize growth rate".

      We see two distinct but related points made by the reviewer, which we address in turn.

      First, we thank the reviewer for highlighting the important and interesting point of the growth rate dependence of expression in components of the translational machinery, which encouraged us to investigate this aspect further. Leveraging other existing ribosome profiling datasets (which provide better quantitation than mass spectrometry data, see response to minor point #6 below) across multiple growth conditions and species, we compared the predicted optimal translation factor abundance in these conditions (using the same formula for the optima). The new conditions and species now include E. coli at much slower growth rates, C. crescentus in two different media, and others. We found similar degrees of agreement between predicted and observed levels (shown in Figure 4-Figure supplement 1). One exception is aaRS in C. crescentus, and the discrepancy likely arises from a lack of quantification of tRNA abundance, which is a parameter we use to predict the optimal aaRS levels.

      These additional data also provided another way to examine the model predictions. Specifically, we assessed the predicted square-root scaling of translation factor abundance with growth rate. While the expression stoichiometry remains constant across growth rates (see response to minor point #6 below), the overall abundance decreases following our predicted scaling (Figure 4-Figure supplement 2B). We now describe these new analyses and results in the main text (p. 7, line 216):

      "Analysis of tlF expression across slower growth conditions supports the derived square root dependence (Figure 4-Figure supplement 2)."

      The second point made by the reviewer pertains to the “offset in bacterial growth law” that corresponds to inactive ribosomes, which make up a substantial fraction of ribosomes at very slow growth rates. We note that the derivation of the optimality condition, equation 5, does not rely on all ribosomes being active. What is necessary is that there is a direct proteomic trade-off between ribosomes and translation factors (see response to minor point 1 below). To rigorously place our work in the context of previous literature, we have replaced mention of ribosome with “active ribosome” (as well as in equation 1 and Figure 1), which we define as those functionally engaged in the translation cycle. We also formally include the proteome fraction of inactive ribosomes in equations 2 and 3 leading to the optimality condition.

      2) The diffusion-limited regime considers only the free and idle reactants. For some translation factors, the free state only accounts for a small fraction of its total concentration. In this case, the diffusion-limited regime only explains a small fraction of the TFs. For example, most of EF-Ts may not be in its free state: in simulations with in vitro kinetics, free EF-Ts accounts for 6%-48% of its total concentration (Supplementary Data 3 in [21]). Can the authors use in vitro parameters (or other ways) to provide a rough estimate of the fraction of free TFs? Including this might allow to make quantitative statements about some of the deviations seen in Fig. 4, as most of the TFs are underestimated.

      We thank the reviewer for the suggestion that deviations between the diffusion-limited prediction and the observed abundance might be quantitatively explained by the finite catalytic activity of the respective factors. However, to do so requires accurate values of kcat, which are often not available. In the Supplement of the initial submission, we provided an example of the in vitro kcat being not compatible with the protein synthesis rates in vivo, which we have now moved to the main text (reproduced below).

      Another experimental approach that can feasibly be used to infer the bound fraction of translation factors in live cells is fluorescence microscopy of tagged proteins. Indeed, by quantifying the diffusive states of a tagged EF-Tu protein, Volkov et al (1) could estimate that <10% of EF-Tu was in its bound state, which is consistent with the agreement between our diffusion-limited prediction and observed abundance for that factor.

      We now discuss these possibilities and the facts about EF-Ts in a paragraph in the Discussion (p. 13, line 471):

      "Our optimization model can also be solved analytically in the non-diffusion-limited regime (Table 2), with the finite catalytic rate leading to an additional contribution of the form ∝ l 𝜆*/kcat. Recent detailed modeling of the EF-Ts cycle (Hu et al., 2020) estimated that a minor fraction (6 to 48%) of its abundance was in the free form in the cell, consistent with the large deviation we observe for this factor from our diffusion only prediction. However, the numerical values for these solutions are in general difficult to obtain because measurements of catalytic rates are sparse and often inconsistent with estimates of kinetics in live cells. As an example, the catalytic rates for aaRSs (Jeske et al., 2019) measured in vitro is ≈3 s-1 (median across different aaRSs), which is well below the minimal value of 15 s-1 required to sustain translation flux at the measured translation elongation rate (Appendix 5), suggesting substantial deviation between in vitro and in vivo kinetics. Although technically demanding, the fraction of free vs. bound factors can in principle be determined through live cell microscopy of tagged factors based on the partitioning the diffusive states of enzymes. Using that approach, (Volkov et al., 2018) estimated that EF-Tu was in its bound state <10% of the time (consistent with the agreement between our diffusion-limited prediction and the observed value for this factor)."

      3) "A factor-independent time τ_ind (e.g., peptidyl transfer), which does not come into play in our optimization framework, was added to account for additional steps making up the full elongation cycle." - what happened to this time? I couldn't find it anywhere else in the paper. What value was chosen, and by what rationale?

      We thank the reviewer for pointing out a lack of clarity in our presentation. The factor-independent time τ_ind in fact did not appear in our optimization procedure at all (by virtue of obeying dτ_ind/d𝜙_TFi = 0 by definition), and was only included for generality to account for steps such as peptidyl transfer (extremely fast (2)). In line with the parsimony of our model, and to avoid any confusion, we have now removed this factor from our model and description altogether.

      4) Fig. 4: The agreement is very impressive, especially given the simplifying assumptions. However, there are some questions relating the choice of parameters.

      a) Were any parameters fitted? Which, how? What about τ_ind, for example (see above)?

      Our approach does not include any fitted parameter. We instead rely on biophysically measured quantities such as diffusion constants, protein sizes, tRNA abundances, cell doubling times (growth rates), and in vivo kinetic estimates. (In line with Major Comment #3 above, we have removed τ_ind for clarity.) We now include all quantities needed to predict the optimal translation factor abundances (using the formulas listed in section “Summary of optimal solutions”, Table 2) in Appendix 5-Tables 1-3, including new Appendix 5-Tables 2-3, reproduced below.

      b) The "predicted" value for ribosomes is calculated from observed data (in a way described on p. S34 that I found incomprehensible, and would likely look very similar regardless of the predicted values for the TFs). According to the section "Equipartition between TF and corresponding ribosomes", the corresponding ribosomes can be quantified in the authors' scheme, too, by the method used for deriving optimal TF concentrations in equation 5. Why didn't the authors directly use the sum of these estimations as the optimal ribosome concentration in Fig. 4? In the current state, it does not seem fair to include the ribosome with the other predictions.

      We agree that the nature of the prediction for ribosomes was different than for other translation factors in our original manuscript in a way that might have lacked clarity. We now exclude ribosomes from Fig. 4 to avoid any possible confusion.

      It is interesting to directly estimate ribosome abundance using the equipartition principle. This estimation is however limited by the fact that the equipartition principle only accounts for ribosomes that are waiting for factor-dependent binding steps. Substantial fractions of ribosomes may be engaged at factor-free steps (e.g., peptidyl transfer catalyzed by the ribosome itself) and factor-dependent catalytic steps after binding. Although the latter could be estimated using the observed tlF concentrations (by considering that the tlF in excess of the binding-limited predictions is sequestered in catalytic steps), the former is not estimated in our model. Furthermore, some other ribosomes may not be fully assembled yet or are inactive (3). Indeed, the predicted factor-dependent ribosome abundance using the equipartition principle with observed tlF abundances constitutes only a fraction (40%) of the measured total ribosome abundance.

      c) Predictions are for a specific growth rate (doubling time 21min). Was this growth rate also averaged over the three organisms? What were the individual values? These points would need to be discussed in the main text.

      The reviewer is correct. In the initial submission, we used the average growth rate of E. coli (doubling time 21.5±0.4 min), B. subtilis (doubling time 21±1 min), and V. natriegens (doubling time 19±1 min). A note has been added in the main text (p. 11, line 448):

      "We take the growth rate 𝜆* to be the average of the fast-growing species considered, corresponding to a doubling time of 21±1 min (E. coli: 21.5±1 min, B. subtilis: 21±1 min, V. natriegens: 19±1 min)."

      In addition, we now include predictions for different growth rates and compared them with several bacterial species grown in a wide range of conditions (Figure 4-Figure supplement 1) (see response to Major Comment #1 and to reviewer 2’s third request). These predictions and data are now included in Supplementary Files 1-4.

      5) In the same vein, in a footnote (!) to Table S4: "#For the ternary complex, the total mass of tRNA+EF-Tu was converted to an equivalent amino acid length." - I can see that this is important to get reasonable results, but it constitutes a major deviation from the strategy proclaimed throughout the main text: that the predicted effects result from a competition for fractions of the limited proteome. That rationale has to be changed (and explained in the main text), or the predictions in Fig. 4 should be based on calculations using only the protein part of TCs (i.e., EF-Tu).

      We are sorry for the confusion. The procedure of converting tRNA size to protein size was only used to estimate diffusion coefficients for the ternary complex (described in Appendix 5 Table 2), and not for the competition within the proteome. For factors for which no direct experimental estimates exist for the in vivo diffusion coefficient, we used the relationship $D_A = (\ell_{TC}/\ell_A)^{1/3} D_{TC}$. The resulting estimated diffusion coefficients were then used to rescale the association rate inferred from in vivo measurements for the ternary complex (see response to point 6 below as well) to obtain association rates for other factors.

      6) S9: "we anchored our association rates to the estimated in vivo association rate for the ternary complex, 𝑘^𝑇𝐶 = 6.4 μM−1s−1 [13], and rescale the association rate by diffusion of related components" - in comparison, the diffusion limited k^TC is >100. If I understand this correctly, you simply rescale ALL on-rates by 100/6.4 = 15.6. If that is (qualitatively) correct, you would need to discuss this point (and the derivation of the scaling factor) explicitly in the main text.

      The reviewer is correct in his interpretation of our approach, and we are grateful for his remark, as it led us to spot a mistake in our choice of parameter (capture radius R). Indeed, while the ternary complex has a largest physical dimension of about 10 nm (from structural data (4)), the appropriate capture radius is closer to 2 nm (the size of the portion binding to the ribosome) (5). Correcting for the appropriate capture radius alone brings the estimate to 45 μM⁻¹s⁻¹, which is, however, still several-fold higher than the measured value of 6.4 μM⁻¹s⁻¹. Whereas a part of this could be due to systematic overestimation of the diffusion coefficient, a large portion of the discrepancy is assuredly due to the many simplifying assumptions underlying the Smoluchowski estimate, which serve to place an absolute upper bound on the reaction rate (perfectly/instantaneously absorbing spheres, and hence no notion of specific reaction position or molecular orientation).
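
      The dependence of the Smoluchowski upper bound on the capture radius can be made concrete with a short Python sketch. It uses the standard diffusion-limited expression k = 4πDRN_A for perfectly absorbing spheres. The function name and the diffusion coefficient passed in are illustrative assumptions; the manuscript's value of D is not given in this response, so the numbers below are not meant to reproduce the 45 μM⁻¹s⁻¹ quoted above.

```python
import math

AVOGADRO = 6.02214076e23  # particles per mole


def smoluchowski_rate_uM(d_um2_per_s, r_nm):
    """Diffusion-limited on-rate k = 4*pi*D*R*N_A in uM^-1 s^-1.

    d_um2_per_s: relative diffusion coefficient of the two partners
                 in um^2/s (an assumed input, not from the manuscript).
    r_nm:        capture radius in nm.
    """
    d_m2 = d_um2_per_s * 1e-12        # um^2/s -> m^2/s
    r_m = r_nm * 1e-9                 # nm -> m
    k_m3 = 4.0 * math.pi * d_m2 * r_m * AVOGADRO  # m^3 mol^-1 s^-1
    k_per_molar = k_m3 * 1e3          # 1 m^3 = 1000 L -> M^-1 s^-1
    return k_per_molar * 1e-6         # M^-1 s^-1 -> uM^-1 s^-1
```

      Because the estimate is linear in R, replacing a 10 nm capture radius with 2 nm lowers the bound five-fold for any fixed D, which is the kind of correction described above.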

      The estimate for capture radius R has been corrected (p. 47, line 1605) and a new sentence has now been included in the main text (p. 11, line 441):

      "Importantly, the absolute values of the optimal concentrations can be anchored by the association rate constant between TC and the ribosome obtained from translation elongation kinetic measurements in vivo (Dai et al., 2016). The latter was found to be several-fold smaller than the simplest and absolute upper bound of a Smoluchowski estimate of perfectly absorbing spheres (section Estimation of optimal abundances), and we assume that the rescaling factor is the same for all reactions."

    1. Author Response

      Reviewer #1 (Public Review):

      Iyer et al. address the problem of how cells exposed to a graded but noisy morphogen concentration are able to infer their position reliably, in other words how the positional information of a realistic morphogen gradient is decoded through cell-autonomous ligand processing. The authors introduce a model of a ligand processing network involving multiple “branches” (receptor types) and “tiers” (compartments where ligand-bound receptors can be located). Receptor levels are allowed to vary with distance from the source independently of the morphogen concentration. All rates, except for the ligand binding and unbinding rates, are potentially under feedback control. The authors assume that the cells can infer their position from the output of the signalling network in an optimal way. The resulting parameter space is then explored to identify optimal “network architectures” and parameters, i.e. those that maximise the fidelity of the positional inference. The analysis shows how the presence of both specific and non-specific receptors, graded receptor expression and feedback loops can contribute to improving positional inference. These results are compared with known features of the Wnt signalling system in Drosophila wing imaginal disc.

      The authors are doing an interesting study of how feedback control of the signalling network reading a morphogen gradient can influence the precision of the read-out. The main strength of this work is the attention to the development of the mathematical framework. While the family of network architectures introduced here is not completely generic, there is enough flexibility to explore various features of realistic signalling systems. It is exciting to find that some network topologies are particularly efficient at reducing the noise in the morphogen gradient. The comparison with the Wnt system in Drosophila is also promising.

      Major comments:

      1) The authors assume that the cell estimates its position through the maximum a posteriori estimate, Eq.(5), which is a well-defined mathematical object; it seems to us however that whether the cell is actually capable of performing this measurement is uncertain (it is an optimal measurement in some sense, but there is no guarantee that the cell is optimal in that respect). Notably, this entails evaluating p(theta), which is a probability distribution over the entire tissue, so this estimate can not be done with purely local measurements. Can the authors comment on this and how the conclusions would change if a different position measurement was performed?

      This is indeed an important question. Our viewpoint is to ask: if the cells were to use a maximum a posteriori (MAP) estimate (Eq. 5) to decode their positions, what features of the channel architecture would lead to small errors in positional inference? Whether the cell employs the maximum a posteriori estimate, or some other estimate, is an important but difficult question to address. Our choice has been motivated by how this estimate has allowed the precise determination of developmental fates in the context of gap gene expression in the Drosophila embryo [1, 2, 3]. We had earlier computed the inference error with a different estimate, i.e.

      $$\sigma_x^2(x) = \int dx^*\,(x^* - x)^2\, p(x^*|x),$$

      which computes the mean squared deviations of the inferred positions from the true position for each x, taking into account the entire distribution p(x∗|x). While the qualitative results are the same, the inference errors showed spurious jitters from outliers in sampling the noisy morphogen input distribution. This consistency might suggest that our qualitative results are insensitive to the choice of the estimate.

      Further, when evaluating the MAP estimate, the term p(θ) in the denominator serves as a normalisation factor to ensure p(x|θ) is a probability density. This is not strictly necessary for MAP estimation. Since p(θ) does not depend on x, the MAP estimate can be written as follows

      $$x^*_{\mathrm{MAP}}(\theta) = \arg\max_x \; p(\theta|x)\, p(x),$$

      without the need for evaluating p(θ). In the case of a uniform prior, it would be equivalent to the maximum likelihood estimate (MLE), i.e.

      $$x^*_{\mathrm{MLE}}(\theta) = \arg\max_x \; p(\theta|x).$$
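
      The point that the normalisation p(θ) plays no role in the decoding can be checked numerically. The Python sketch below decodes a position on a discrete grid by maximising the unnormalised log posterior. It is a minimal illustration, not the model of the manuscript: the Gaussian output noise with position-independent width, the function name, and the exponential mean output used in the usage note are all assumptions made for the example.

```python
import numpy as np


def map_decode(theta, positions, mean_output, sigma, prior=None):
    """Return the grid position maximising p(theta | x) * p(x).

    theta:       observed output (scalar).
    positions:   1D array of candidate positions x.
    mean_output: mean output at each position (same shape).
    sigma:       output noise width, ASSUMED Gaussian and
                 independent of x, so its normalisation drops out.
    prior:       unnormalised prior p(x); uniform if None.
    """
    positions = np.asarray(positions, dtype=float)
    mean_output = np.asarray(mean_output, dtype=float)
    if prior is None:
        prior = np.ones_like(positions)  # uniform prior -> MLE
    log_like = -0.5 * ((theta - mean_output) / sigma) ** 2
    with np.errstate(divide="ignore"):   # allow log(0) = -inf
        log_post = log_like + np.log(prior)
    # p(theta) would only add a constant here, so it is never computed.
    return positions[np.argmax(log_post)]
```

      For instance, with `positions = np.linspace(0, 1, 101)` and a hypothetical mean output `np.exp(-positions / 0.3)`, multiplying the prior by any positive constant leaves the decoded position unchanged, and a uniform prior reproduces the maximum likelihood estimate.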

      2) One of the features of the signalling networks studied in the manuscript is the ability of the system to form a complex (termed a conjugated state, Q) made of two ligands L, one receptor and one non-signalling receptor. While there are clear examples of a single ligand binding to two signalling receptors (e.g. Bmps), are there also known situations where such a complex with two ligands, one receptor, and one non-signalling receptor can form? In the Wnt example (Fig. 10a), it is not clear what this complex would be. In general, it would be great to have a more extended discussion of how the model hypothesis for the signalling networks could relate to real systems.

      This is a good suggestion. We have now added a discussion on the various possible realisations of the “conjugate state” Q in Section 3.6. We have also explored the various states in the context of different signalling contexts such as Dpp, Hh, Fgf in the Discussion section.

      The conjugated state ‘Q’ represents a combination of the readings from the two branches, i.e. receptor types. This could be realised through processes like ligand exchange or complex formation, both in a shared spatial location such as a compartment. As discussed in the original manuscript (Section 3.6 of the revised manuscript), the ligand Wg in the Wg signalling pathway is internalised through two separate endocytic pathways associated with the receptor types: the signalling receptor Frizzled (via clathrin-mediated endocytosis (CME)) and the non-signalling receptor HSPGs (via the CLIC/GEEC pathway; CLIC: clathrin-independent carriers, GEEC: GPI-anchored protein-enriched early endosomal compartments). Both pathways meet in a common early endosomal compartment where the ligands may be exchanged between the two receptors [4]. In previous work by Hemalatha et al. [4], we showed that there are more Wg-DFz2 interactions in the endosomal compartment (measured through FRET) than on the cell surface. Therefore, the non-signalling receptors directing Wg through the CLIC/GEEC pathway titrate the amount of Wg interaction with the signalling receptor, DFz2.

      As mentioned in the original manuscript (Section 3.3 and subsection 4.2 of the Discussion in the revised manuscript), apart from Wg signalling, non-signalling receptors such as the HSPGs have also been proposed to act as co-receptors for Dpp, Hh, FGF (reviewed in [5, 6]). Although some ligands bind to the core protein of HSPG, the majority of the ligands bind to the negatively charged HS chains [7, 8]. Here, the coreceptors HSPGs aid in capturing diffusible ligands and presenting the same to signalling receptors (either on the cell surface or within endosomes).

      3) The authors consider feedback on reaction rates - it would seem natural to also consider feedback on the total number of receptors; notably, since there are known examples of receptors transcriptionally down-regulated by their ligands (e.g. Dpp/Tkv)? Also it is not clear in insets such as in Fig. 7b, if the concentration plotted corresponds to the concentration of receptors bound to ligands?

      As mentioned in the original manuscript (Section 2.2 of the revised manuscript), we have indeed considered control on reaction rates and receptors, although the control on the latter is done with the constraint of receptor profiles being monotonic. Further, while the control on reaction rates is considered via feedbacks explicitly, the control on receptors is done via an approach akin to the open-loop control used in control theory. In reality, cellular control on receptors will involve transcriptional up- or down-regulation of receptors and thus warrant a feedback control approach – however, the timescales involved in such a control are different from the binding-unbinding and signalling timescales.

      Therefore, in the current work, we take the morphogen profile to be given i.e. independent of receptor concentrations, and we ask for the receptor concentrations that would help reduce the inference errors.

      Our predictions of increasing signalling receptor and decreasing non-signalling receptors in a two-branch channel architecture are consistent with the known transcriptional up-regulation of Dally/Dlp and down-regulation of Fz by Wg signalling [9].

      In a future work, we will extend the control on receptors to include feedbacks explicitly. Furthermore, the explicit feedback control on receptors may need to be considered concomitantly with the effect of receptors on morphogen dynamics (i.e. morphogen sculpting by receptors) along with the possibility of spatial correlations in receptor concentrations through neighbouring cell-cell interactions.

      As mentioned in the original manuscript (Section 2.2 of the revised manuscript), the variables ψ and φ stand for the total (bound + unbound) surface receptor concentrations of the signalling and the non-signalling receptors respectively. Therefore, the insets showing receptor profiles such as in Fig. 6b, 7b, and Appendix H Fig.8b,e correspond to the total surface receptor concentrations.

      4) The authors are clear about the fact that they consider the morphogen gradient to be fixed independently of the reaction network; however, that seems like a very strong assumption; in the Dpp morphogen gradient for instance over expression of the Tkv receptor leads to gradient shortening. Can the authors comment on this?

      This point is related to the previous question (comment 3). As discussed in the Discussion of the original manuscript (subsection 4.3 of the revised manuscript), we focus on finding the optimal receptor concentration profiles and reaction networks that enable precision and robustness in positional information from a given noisy morphogen profile. The framework and the optimisation scheme within it will prescribe different receptor profiles and reaction networks for different monotonically behaving, noisy morphogen profiles. It is possible that cells may achieve the optimal receptor concentrations via feedback control on production of the receptors.

      Broadly, morphogen dynamics depends on cell surface receptors, which could participate in both the inference and the sculpting of the morphogen profile, and factors independent of them such as extracellular degradation, transport and production, etc. In our present work, we have taken the receptors involved in sculpting and inference as being independent.

      In a more general case, feedback control on receptors will change the receptor concentrations as well as the morphogen profile. We are currently working on realising such a feedback control on receptors within the same broader information theoretic framework proposed in the current work.

      5) Fig. 10f is showing an exciting result on the change in endocytic gradient CV in the WT and in DN mutant of Garz. Can the authors check that the Wg morphogen gradient is not changing in these two conditions? And can they also show the original gradient, and not only its CV?

      The reviewer raises a legitimate concern – could the observed changes in CV upon perturbation of endocytic machinery be attributed to a systematic change in the mean levels of the endocytosed Wg alone? In the original manuscript (Appendix O Fig.17b,c of the revised manuscript), we show the normalised profiles of endocytic Wg in control and myr-Garz-DN cases. Here, in Fig.1 below, we show a comparison between the mean Wg concentrations (measured as fluorescence intensity) in control wing discs and discs wherein CLIC/GEEC endocytic pathway is removed using UAS-myr-Garz-DN. For clarity, we show the discs with largest and smallest fluorescence intensities from the control and myr-Garz-DN discs. It is hard to conclude that the mean concentrations are significantly different in the two cases.
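
      The distinction drawn here, between a change in the mean level and a change in variability, can be illustrated with a short Python sketch that computes the mean profile and the coefficient of variation (CV) across discs at each position. This is a hedged illustration only: the function name, the array layout (one row per disc, one column per position) and the use of the sample standard deviation are assumptions, not the analysis pipeline of the manuscript.

```python
import numpy as np


def cv_profile(intensity):
    """Mean and coefficient of variation across discs, per position.

    intensity: array-like of shape (n_discs, n_positions) holding
               fluorescence intensities (hypothetical layout).
    Returns (mean, cv), each of shape (n_positions,). CV is NaN
    wherever the mean is not positive.
    """
    intensity = np.asarray(intensity, dtype=float)
    if intensity.ndim != 2 or intensity.shape[0] < 2:
        raise ValueError("need a 2D array with at least two discs")
    mean = intensity.mean(axis=0)
    std = intensity.std(axis=0, ddof=1)  # sample standard deviation
    cv = np.full_like(mean, np.nan)
    np.divide(std, mean, out=cv, where=mean > 0)
    return mean, cv
```

      Since the CV is a ratio, multiplying every profile by a common factor changes the mean but leaves the CV untouched, so a uniform shift in overall Wg level would not by itself alter the CV profile.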

      Reviewer #2 (Public Review):

      The work of Iyer et al. uses a computational approach to investigate how cells using multiple tiers of processing and multiple parallel receptor types allow more accurate reading of position from a noisy signal. Authors find that combining signaling and non-signaling types of receptors together with additional feedback increases the accuracy of positional readout against extrinsic noise that is conveyed in the morphogen signal. Further, extending the number of layers of signal processing counteracts the intrinsic stochasticity of the signal reading and processing steps. The mathematical formulation of the model is general but comprehensive in the way it handles the difference between branches and tiers for the processing of channels with feedbacks. The results of the model are presented from simple one-branch and one-tier architecture to two-branch and two-tier architecture with feedbacks. Interestingly authors find that adding more tiers results in only very small improvements in the accuracy of positional readout. The model is tested against a perturbation experiment that impairs one of the signaling branches in the Drosophila wing disc, but the comparison is only qualitative as further experiment-oriented work is planned in a separate paper.

      Strengths

      There is a clear statement of objectives, model, and how the model is evaluated. In particular, the objective is to find which number of receptor types, and which receptor concentrations, result in the most accurate positional readout for a given number of tiers and feedback types. The employed optimization procedure is capable of finding signalling architectures that result in one-cell-diameter positional precision for most of the tissue, with 3-4 cells at the tissue end that is most distant from the morphogen source. This demonstrates that employing additional complexity in signal processing results in a very accurate positional readout, which is comparable with estimates of positional precision obtained in other developmental systems (Petkova et al., Cell 2019, Zagorski et al., Science 2017).

      The optimal signalling architectures indicate that both signalling (specific) and non-signalling (nonspecific) receptors affect the precision of positional readout, but the contributions of each type of these receptors are qualitatively different. Even slight perturbation of signalling receptors drives the system out of optimum, resulting in a decrease in positional precision. In contrast, the non-signalling receptors could accommodate much larger perturbations. This observation could provide a biophysical explanation for how cross-talk between different morphogen species could be realized in a way that positional precision is kept at the optimum when morphogen signaling undergoes extrinsic and intrinsic perturbations.

      Last, the model formulation allows perturbations of signalling and feedbacks to be addressed specifically, which could be explored to validate model predictions experimentally in the Drosophila wing disc, but also in other developmental tissues. The authors present a proof-of-concept by obtaining consistent results for the variation of output profiles in two-tier two-branch architectures with the non-signalling branch removed and for intensity profiles of Wg in wing discs where the CLIC/GEEC endocytic pathway was perturbed.

      Weaknesses

      The list of model parameters is long, including more than 20 entries for two-tier two-branch architectures. This is expected, as the aim of the model is to describe the sophisticated signalling architecture mimicking the biological system. However, this also makes it very challenging or impossible to provide guiding principles or understanding of the system behaviour for the complete space of signalling architectures that optimize positional readout. Although the employed optimization procedure finds solutions that exhibit very high positional accuracy, there is only a very limited notion of how these solutions depend on variation of different parameters. The authors do not address whether these solutions correspond to broad global optima in the space of all solutions, or were rather fine-tuned by the optimization procedure and are quite rare.

      It is unclear how contributions from the intrinsic noise affect the system behaviour compared to contributions from extrinsic noise. In principle, the two-branch one-tier architecture results in an already very accurate positional readout across the tissue. The adding of another tier seems to provide only a very weak improvement over a one-tier solution. It is possible that contributions from intrinsic noise for the investigated signalling architectures are only mildly affecting the system compared with contributions from extrinsic noise. Hence, it is difficult to assess whether the claim of reducing intrinsic noise by adding another tier is supported by the presented data, as the contributions from intrinsic noise could overall very weakly affect the positional readout.

      The optimal responses of the channel to extrinsic and intrinsic noise are very distinct. As noted correctly by the reviewer, an additional tier provides only a marginal improvement in inference error due to extrinsic noise (compare Fig.7 and Fig.8 in the revised manuscript). However, as shown in Fig.9c of the revised manuscript (same as in the original manuscript), adding an extra tier provides a substantial improvement in inference errors due to intrinsic noise.

      References

      [1] Gašper Tkačik, Julien O Dubuis, Mariela D Petkova, and Thomas Gregor. Positional information, positional error, and readout precision in morphogenesis: a mathematical framework. Genetics, 199:39–59, 2015.

      [2] Mariela D Petkova, Gašper Tkačik, William Bialek, Eric F Wieschaus, and Thomas Gregor. Optimal decoding of cellular identities in a genetic network. Cell, 176:844–855, 2019.

      [3] Julien O Dubuis, Gašper Tkačik, Eric F Wieschaus, Thomas Gregor, and William Bialek. Positional information, in bits. Proceedings of the National Academy of Sciences, 110:16301–16308, 2013.

      [4] Anupama Hemalatha, Chaitra Prabhakara, and Satyajit Mayor. Endocytosis of wingless via a dynamin-independent pathway is necessary for signaling in drosophila wing discs. Proceedings of the National Academy of Sciences, 113:E6993–E7002, 2016.

      [5] Xinhua Lin. Functions of heparan sulfate proteoglycans in cell signaling during development. Development, 131:6009–6021, 2004.

      [6] Stephane Sarrazin, William C Lamanna, and Jeffrey D Esko. Heparan sulfate proteoglycans. Cold Spring Harbor perspectives in biology, 3(7):a004952, 2011.

      [7] Catherine A Kirkpatrick, Sarah M Knox, William D Staatz, Bethany Fox, Daniel M Lercher, and Scott B Selleck. The function of a drosophila glypican does not depend entirely on heparan sulfate modification. Developmental biology, 300(2):570–582, 2006.

      [8] Mariana I Capurro, Ping Xu, Wen Shi, Fuchuan Li, Angela Jia, and Jorge Filmus. Glypican-3 inhibits hedgehog signaling during development by competing with patched for hedgehog binding. Developmental cell, 14(5):700–711, 2008.

      [9] Kenneth M Cadigan, Matthew P Fish, Eric J Rulifson, and Roel Nusse. Wingless repression of drosophila frizzled 2 expression shapes the wingless morphogen gradient in the wing. Cell, 93(5):767–777, 1998.

    1. Author Response

      Reviewer #1 (Public Review):

      Strength: The study is summarizing a large cohort of human samples of blood, nasal swabs and nasopharyngeal aspirates. This is very uncommon as most of the time studies focus on the blood and serum of patients. Within the study, 3 monocyte and 3 DC subsets have been followed in healthy and Influenza A virus-infected persons. The study also includes functional data on the responsiveness of Influenza A virus-infected DC and monocyte populations. The authors achieved their aims in that they were able to show that the tissue microenvironment is important to understand subset specific migration and activation behavior in Influenza A virus infection and in addition that it matters with which kind of agent a person is infected. Thus, this study also impacts a better understanding of vaccine design for respiratory viruses.

      We thank Reviewer 1 for highlighting what we believe to be the greatest strengths of our study. The key feature of this study was to generate a comprehensive description of monocytes and dendritic cells (DC) in the human nasopharynx during influenza A virus infection, and to provide a comparison with healthy and convalescent individuals. Further, we wished to emphasize the value of studying the nasopharynx during respiratory viral infections, particularly in light of the ongoing COVID-19 pandemic. We describe a non-invasive method to (longitudinally) sample this anatomical compartment that allows retrieval of intact immune cells as well as mucosal fluid for soluble marker analysis. We also believe that the addition of proteomic profiles in the different compartments (new Figure 7) further highlights the importance of the tissue microenvironment.

      Weakness: In the described study, the authors used a different nomenclature to introduce the DC subsets. This is confusing and the authors should stick to the nomenclature introduced by Guilliams et al., 2014 (doi.org/10.1038/nri3712) and commented in Ginhoux et al., 2022 (DOI: 10.1038/s41577-022-00675-7 ) or at least should introduce the alternative names (cDC1, cDC2, expression markers XCR1, CD172a/Sirpa). Further, Segura et al., 2013 (doi: 10.1084/jem.20121103) showed that all three DC subpopulations were able to perform cross-presentation when directly isolated. Overall, a more up-to-date introduction would be useful.

      Reviewer 1 commented on the DC nomenclature used in the manuscript. We agree that our manuscript would benefit from appropriately updating the DC nomenclature. We therefore revised the text, and now we refer to the subsets previously described as CD1c+ and CD141+ myeloid DCs (MDC) as cDC2 and cDC1 subsets, respectively. We have also modified the text in the Introduction of the revised manuscript to reflect the same and give a more up-to-date introduction of DC subsets (marked-up version lines 75-81).

      As the data of this was already obtained in 2016-2018 it is clear that the FACS panel was not developed to study DC3. If possible, the authors might be able to speculate about the role of this subset in their data set. Moreover, there were other studies on SARS-CoV-2 infection and DC subset analyses in blood (line 87, and line 489) e.g. Winheim et al., (DOI: 10.1371/journal.ppat.1009742 ), which the authors should introduce and discuss in regard to their own data.

      As reviewer 1 accurately pointed out, the flow cytometry panel used in this study was indeed not developed to study the DC3 subset. The data were obtained in 2016-2018, and lack the typical markers used to identify the DC3 subset, such as CD163, BTLA and CD5 (Cytlak et al, https://doi.org/10.1016/j.immuni.2020.07.003, Villani et al, https://doi.org/10.1126/science.aah4573). Due to the constraints of the panel, we would not be able to accurately identify DC3s. However, in an attempt to dig deeper into the data that are available, we re-analyzed the data to identify CD14+CD1c+ cells among the lineage–HLADR+CD16–CD14+ cells, here collectively called “mo-DC”. This population is likely a combination of monocytes upregulating CD1c and bona fide DC3 expressing CD14. Accordingly, the gating strategy was updated in Supplementary figure 1 (marked-up version lines 192-194), and a new data plot in Figure 2H (marked-up version lines 208-220) summarizes the changes observed in mo-DC numbers in IAV patients between blood and the nasopharynx. Parallel to the pattern seen in other DC subsets, mo-DC frequencies are reduced in blood, and we observed an increase (not significant) in the nasopharynx.

      As CD88 was not included in the original panel, it was not possible to discriminate between bona fide monocytes and DC3s. We performed a staining of PBMCs (buffy coat) with CD88 (FITC) added to the original flow panel used in the study, to assess whether CD88 can be helpful for future studies (Reviewer figure 1). The staining showed that some cells in the mo-DC population are CD88 positive, indicating a bona fide monocyte origin, whereas some are negative, indicating that they are bona fide DC3 expressing CD14 (Bourdely et al, https://doi.org/10.1016/j.immuni.2020.06.002).

      Reviewer figure 1. Expression of CD88 in the “mo-DC” population. Cells from a buffy coat were stained with the flow cytometry panel used in the manuscript, with the addition of CD88 (FITC). Within the CD14+CD1c+ population, the “mo-DC” population, we identified both CD88+ and CD88- cells.

      Reviewer 1 also suggested citing Winheim et al (https://doi.org/10.1371/journal.ppat.1009742), and we thank them for their suggestion. We have now cited Winheim et al, and two additional reports (Kvedaraite et al, https://doi.org/10.1073/pnas.2018587118 and Affandi et al, https://doi.org/10.3389/fimmu.2021.697840) describing a depletion of DC3s (and other DC subsets) from circulation, and functional impairment of DCs following SARS-CoV-2 infection. Further, Winheim et al observed an increased frequency of a CD163+CD14+ subpopulation within the DC3s, which correlated with systemic inflammatory responses in SARS-CoV-2 infection. We speculate that perhaps in IAV infection too, DC3s may follow the trend of other DC subsets and be found in increased numbers in the nasopharynx (marked-up version lines 75-81 and 543-552).

      Taken together, although the data are very important and very interesting, my overall impression of the manuscript is that in the era of RNA seq and scRNA seq analyses the study lacks a bit of comprehensiveness.

      The final comment from reviewer 1 is well taken, in that our study does not include RNA-seq analyses. Again, we ask Reviewer 1 to take into consideration the challenging material we worked with in our study, in combination with the COVID-19 pandemic, which subsequently excluded recruitment of new influenza patients to the study. The cell numbers and viability in the nasopharyngeal aspirates limit which experimental approaches can be done simultaneously, and flow cytometry seemed to be the best approach for the study. However, we agree that future studies, both our own and those of others in the field, will greatly benefit from single cell analysis of nasopharyngeal immune cells, and from generating transcriptomic or epigenetic profiles of these cells. Unfortunately, it is a limitation that we are currently unable to overcome within the scope of this revision. Despite this weakness, we agree with Reviewer 1 that the methods we developed and the data we generated are important and interesting.

      Moreover, we have added proteomics data from both NPA and plasma from influenza and COVID-19 patients, using the SomaScan platform (new Figure 7) (marked-up version lines 472-511, 738-755 and 768-792). We also included a supplementary table listing enriched pathway data from gProfiler. Briefly, our data showed sizeable changes within the blood and nasopharyngeal proteome during respiratory virus infection (IAV or SARS-CoV-2), as compared to healthy controls. Importantly, we found several differentially expressed proteins unique to the nasopharynx that were not seen in blood, and pathway analysis highlighted “host immune responses” and “innate immunity” pathways, containing TNF, IL-6, ISG15, IL-18R, CCL7, CXCL10 (IP-10), CXCL11, GZMB, SEMA4A, S100A8, S100A9. These findings are in line with our flow cytometry data, and support our hypothesis that the immunological response to viral infection in the upper airways differs from that in matching plasma samples. One of the main messages in this manuscript is the importance of looking at the site of infection, and not only at systemic immune responses, to better understand respiratory viral infections in humans. We believe that the addition of the proteomics data serves to further highlight this point.

      Reviewer #2 (Public Review):

      This study aims to describe the distribution and functional status of monocytes and dendritic cells in the blood and nasopharyngeal aspirate (NPA) after respiratory viral infection in more than 50 patients affected by influenza A, B, RSV and SARS-CoV2. The authors use flow cytometry to define HLA-DR+ lineage negative cells, and within this gate, classical, intermediate and non-classical monocytes and CD1c+, CD141+, and CD123+ dendritic cells (DC). They show a large increase in classical monocytes in NPA and an increase in intermediate monocytes in blood and NPA, with more subtle changes in non-classical monocytes. Changes in intermediate monocytes were age-dependent and resolution was seen with convalescence. While blood monocytes tended to increase in blood and NPA, DC frequency was reduced in blood but also increased in NPA. There were signs of maturation in monocytes and DC in NPA compared with blood as judged by expression of HLA-DR and CD86. Cytokine levels in NPA were increased in infection in association with enrichment of cytokine-producing cells. Various patterns were observed in different viral infections suggesting some specificity of pathogen response. The work did not fully document the diversity of human myeloid cells that have arisen from single-cell transcriptomics over the last 5 years, notably the classification of monocytes which shows only two distinct subsets (intermediate cannot be distinguished from classical), distinct populations of DC1, DC2 and DC3 (DC2 and 3 both having CD1c, but different levels of monocyte antigens), and the lack of distinction provided by CD123 which also includes a precursor population of AXL+SIGLEC6+ myeloid cells in addition to plasmacytoid DC. Furthermore, some greater precision of the gating could have been achieved for the subsets presented. 
Specifically, CD34+ cells were not excluded from the HLA-DR+ lineage- gate, and the threshold of CD11c may have excluded some DC1 owing to the low expression of this antigen. Overall, the work shows that interesting results can be obtained by comparing myeloid populations of blood and NPA during viral infection and that lineage, viral and age-specific patterns are observed. However, the mechanistic insights for host defense provided by these observations remain relatively modest.

      We thank Reviewer 2 for their assessment of our manuscript and for summarizing our key findings in their public review. As Reviewer 2 noted, our study describes changes in frequencies of monocytes and DCs during acute IAV infection, in blood and in the nasopharynx. Additionally, we demonstrate pathogen-specific changes in both compartments. Reviewer 2 also highlighted a drawback of our study: the approach did not fully capture the breadth of monocyte and DC diversity as it is currently understood. Despite this, the findings we presented here laid the groundwork for continued research and led to significant progress, including mechanistic insights (Falck-Jones et al, https://doi.org/10.1172/JCI144734 and Cagigi et al, https://doi.org/10.1172/jci.insight.151463, Havervall et al. https://doi.org/10.1056/nejmc2209651 and Marking et al. Lancet Infectious Diseases in press), in understanding the role of myeloid cells in the human airways during viral infections.

    1. Author Response

      Reviewer #1 (Public Review):

      In the article "Whole transcriptome-sequencing and network analysis of CD1c+ human dendritic cells identifies cytokine-secreting subsets linked to type I IFN-negative autoimmunity to the eye," Hiddingh, Pandit, Verhagen, et al., analyze peripheral antigen presenting cells from patients with active uveitis and control patients, and find several differentially expressed transcription factors and surface markers. In addition, they find a subset of antigen presenting cells that is decreased in frequency in patients with uveitis that in previous publications was shown to be increased in the eye of patients with active uveitis. The greatest strength of this paper is the ability to obtain such a large number of samples from active uveitis patients that are not currently on systemic therapy. While the validation experiments have methodologic flaws that decrease their usefulness, this study will still serve as a valuable resource in generating hypotheses about the pathogenesis of uveitis that can be tested in future projects.

      We thank the reviewer for the constructive comments and effort to review our work in detail.

      Since all CD36+CX3CR1+ cells are CD14+ (Figure 4D), how CX3CR1 ended up being differentially regulated in a similar way despite this population was excluded from 2nd bulk RNAseq data set should be commented on by the authors.

      We agree with the reviewer that the CD14 surface expression in relation to the black-gene module and CD36+CX3CR1+ DC3s requires more detailed analysis. As described in the results section, genes in this module are linked to both CD1c+ DCs and inflammatory CD14+ monocytes, which we cannot distinguish by bulk RNA-seq analysis. Therefore, we aimed to use an approach to demonstrate that the black module is a bona fide CD1c+ DC gene signature not dependent on CD14 surface expression. We showed that there was no difference in CD14+ cell fractions in the samples for RNA-seq between patient and control samples (see Fig. 1F). We have now investigated this further with additional data and experiments. We now show in Figure 2 Supplement 2A that CD14, as expected, does not correlate with the black module. To confirm this experimentally, we purified CD14+CD1c+ and CD14-CD1c+ DCs from 6 donors and subjected these to qPCR analysis to evaluate the expression of key genes from the black module (see revised Figure 2A). As illustrated in revised Figure 2 panel B, the expression levels of genes, including CD36 and CX3CR1, are not significantly altered between CD14+ and CD14- CD1c+ DCs, which supports that the identified gene module is not dependent on CD14 surface expression by CD1c+ DCs. To assess whether the expression of the black module was also independent of CD14 in inflammatory disease, we used RNA-seq data from FACS-sorted CD14+CD1c+ DCs and CD14-CD1c+ DCs from patients with SLE and Scleroderma (GSE13731) and confirmed that the expression of the black module genes is independent of CD14 surface expression (see revised Figure 2 panel C). Finally, we removed CD14+ cells from the analysis in the 2nd bulk RNA-seq experiment to prove that the black module is associated with uveitis independently of CD14 expression, which allowed us to attribute the black module to CD1c+ DCs by bulk RNA-seq analyses. 
Also, more detailed analyses by flow cytometry (Revised Figure 4) and scRNA-seq (Figure 6) confirm these findings. For example, we show that the CD36+ CX3CR1+ DC3s are in fact a subset of CD14+ CD1c+ DCs (Figure 2 – Supplement 2) and that eye-infiltrating CD1c+ DCs that harbor the black module gene signature show increased CD36 and CX3CR1, but not CD14 (Figure 6C). We have addressed all these experiments and data in the result section on pages 12-13, 16, and 17, and in the discussion section on page 19. We hope the reviewer agrees that this has now been sufficiently addressed.

      Line 153: "...substantiates this gene set as a core transcriptional feature of human autoimmune uveitis." It would be difficult to argue that when only 137 of the 1236 DEGs from the first module are repeated in a validation data set that this is the core transcriptions set that defines the population in any uveitis. Further concerns include that the validation data set is not the same population, but rather a subset not containing CD14.

      We agree with the reviewer and have changed this in the result section to “substantiates this gene set as a robust and bona fide transcriptional feature of CD1c+ DCs in human non-infectious uveitis” at page 13. We agree that, as expected, the removal of CD14+ cells impacted the sensitivity of our analysis, but this strategy was required to attribute the black module to CD1c+ DCs. Our data support that the black module gene signature is not restricted to CD14+ CD1c+ DCs by demonstrating that its dysregulation in non-infectious uveitis can even be perceived in CD14- CD1c+ DCs. We now show that the replication of only a fraction of the genes of the black module is a consequence of reduced sensitivity to detect differentially expressed genes (Figure 2 – Supplement 1C), most likely due to the lower cell number after sorting out CD14+ cells. We have outlined this in greater detail in the result section on page 13. We hope the reviewer agrees this has now been adequately described.

      Line 220: Notch-dll experiments: with the experiments presented it is not possible to say that the changes are due to maintenance of CD1c+ DCs without further experiments outlining what NOTCH2 signaling changes throughout time. Is the population fully developed in the first 7 days of culture prior to adding NOTCH2 or ADAM10 inhibitors? Is there more apoptosis in this pathway? Less proliferation? It would be more accurate to say that there are fewer cDC2s after 14 days of culture without speculating the cause. In this experiment it is unclear why the gate of CD141/CD1c was chosen, as this appears to be in the middle of the population. In normal PBMCs CD141+ DCs would be CD1c negative; therefore why exclude the CD141hiCD1c+ and CD141loCD1c+ populations?

      We agree with the reviewer that in the current state the additional Notch-DLL experiments are inconclusive. Based on the comments from this reviewer, we believe the most appropriate experiments would be to show changes in the surface protein expression of CD36, CX3CR1 and other key surface markers of the black module upon inhibition of NOTCH2 or ADAM10. To this end, we repeated the experiments with human CD34+ HPC-derived DCs to measure cell subsets by flow cytometry using the same panel we used for the PBMCs. However, we experienced substantial autofluorescence of human CD34+ HPC-derived cultures (expected for the complex heterogeneous cellularity of these cultures and as previously reported for CD34+ cells; Donnenberg et al., Methods 2015), which introduced significant artifacts and interfered with optimal identification of CD1c+ DCs and their subsets (see example below). Unfortunately, we have so far been unable to control for this. Since we agree with the reviewer that in the current form the supplemental figure does not significantly contribute to the manuscript, we removed the supplemental figure entirely from the manuscript. We hope the reviewer agrees that we already provide several complementary lines of evidence that link NOTCH-RUNX3 signaling to the black module (Figure 3A-D), including RNA-seq data from NOTCH2-DLL experiments, and that the current data are sufficient to support the main conclusions of the manuscript. We hope the reviewer agrees with this proposal.

      Author response figure 1: Manual gating example of human CD34-HPC derived DCs shows substantial autofluorescence.

      Line 256: The hypothesis that the loss of CD36+CX3CR1+ cells was due to migration to the eye doesn't make sense based on volume and number of cells. 0.1% of all PBMC is ~1x10^7 cells, and distributed throughout the eye would give about 1.3x10^6 cells/mL of eye volume. This would make the eye turbid, which is not consistent with birdshot chorioretinopathy and would be rare in HLA-B27 anterior uveitis and intermediate uveitis.

      We agree with the reviewer and have changed this in the manuscript section to “We speculated that the decrease in blood CD36+CX3CR1+ CD1c+ DCs was in part the result of migration of these cells to peripheral tissues (lymph nodes) and that these cells may also infiltrate the eye during active uveitis” on page 17.
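
      The reviewer's back-of-envelope estimate for Line 256 can be checked with a few lines of arithmetic. The sketch below uses only the two figures quoted in the comment (about 1x10^7 cells and about 1.3x10^6 cells/mL); the eye volume and the total PBMC pool are not stated in the review and are back-calculated here purely for illustration, and the variable names are ours.

```python
# Figures quoted in the reviewer's comment
cells_in_subset = 1e7          # ~0.1% of all PBMC, as stated
density_cells_per_ml = 1.3e6   # stated density if all these cells entered the eye
subset_fraction = 0.001        # 0.1%

# Back-calculated quantities (assumptions implied by, not stated in, the review)
implied_eye_volume_ml = cells_in_subset / density_cells_per_ml
implied_total_pbmc = cells_in_subset / subset_fraction

print(f"implied eye volume: {implied_eye_volume_ml:.1f} mL")
print(f"implied total PBMC pool: {implied_total_pbmc:.1e} cells")
```

      Under these figures the comment implicitly assumes an eye volume of roughly 7.7 mL and a circulating pool of about 10^10 PBMC, so the stated numbers are internally consistent.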

      Line 267: Would have liked to see the gating of CX3CR1/CD36 cells be more consistent (there are overlapping CX3CR1+ and CX3CR1- populations in 5A, but in Figure 4 quadrants were used to define the populations when evaluating the numbers in uveitis and healthy controls). The populations in Figure 5 are more separated by CD36.

      We agree with the reviewer and have added a more detailed example of the gating strategy used to sort CD36/CX3CR1 subsets in Figure 5 – Supplement 1 including the expression of CX3CR1 and CD36 in the sorted populations.

      Line 269, IN VITRO stimulation: The experimental paradigm is set up to find a difference between cells but does not test any biologically relevant scenario. By sorting on a surface marker, then stimulating with the ligand for that receptor, the result better proves that CD36 is important in TLR2 signaling than it gives any information on how these dendritic cells might behave in uveitis.

      We agree with the reviewer that the connection between the cytokine expression of the CD1c+ subsets and non-infectious uveitis may benefit from additional experimental data. To this end, we profiled available eye fluid biopsies and paired plasma by Olink proteomics to measure 92 immune mediators from patients and controls from this study (and several additional samples, including aqueous humor from non-inflammatory cataract controls – see revised Figure 5 panel D). This analysis shows that cytokines produced by CD36+CX3CR1+ DCs such as TNF-alpha and IL-6 are specifically increased in eye tissue of patients, but not in blood. We hope the reviewer agrees that we have provided additional experimental data that links the functional differences in DC subsets to cytokines implicated in the pathogenesis of non-infectious uveitis.

      Reviewer #3 (Public Review):

      First, a note on nomenclature. The authors use the term 'auto-immune' uveitis to encapsulate three different conditions -- HLA-B27 anterior uveitis, idiopathic intermediate uveitis, and birdshot choroidopathy. While I would agree with this terminology for the third set, there is substantial controversy as to whether HLA-B27 is truly autoimmune or autoinflammatory. Indeed, one major hypothesis is that this condition is driven by changes in gut microbiome. Intermediate uveitis is even more problematic; a substantial number of cases of this condition will turn out to be associated with demyelinating disease, which has recently been linked to Epstein Barr virus disease. To my knowledge in none of these diseases has a definitive autoantigen been identified nor passive transfer via transfusion shown; I would suggest the authors abandon this terminology and simply refer to the conditions as they are called.

      We would like to thank the reviewer for the constructive suggestions. We agree and have changed the term “autoimmune uveitis” to “non-infectious uveitis” throughout the manuscript.

      Further, it would have been very desirable to compare the DC transcriptome for the other class of uveitic disease -- infectious -- for acute retinal necrosis or similar. As well it would have been very useful to compare profiles to other, related immune-mediated diseases such as ankylosing spondylitis.

      We agree with the reviewer that comparison of DC transcriptomes is useful for interpretation of biological mechanisms involved. This is precisely the reason we use (in Figure 3) comparison of our DC transcriptomic data to well-controlled transgenic models and DC culture systems. This revealed NOTCH2-RUNX3 signaling driving the uveitis-associated CD1c+ DC signature. We have now included transcriptomic data from CD1c+ DC subsets of type I IFN diseases SLE and Systemic Sclerosis in Figure 2. Although we agree that comparison to infectious uveitis would be interesting, bulk RNA-seq data from CD1c+ DCs are – to the best of our knowledge – unfortunately not available.

      Finally, it must be noted that looking for systemic signals in dendritic gene expression may be a bit of a needle in the haystack approach. Presumably, the function of the dendritic cells in uveitis is largely centered on those cells in the eye. It would have been highly desirable to examine the expression profile of intraocular DCs in at least a subset of patients who may have come to surgery (for instance, steroid implantation or vitrectomy).

      We agree with the reviewer that analysis of blood requires enormous efforts and controls to dissect disease-relevant changes in gene profiles of cDC2 subsets. We therefore designed a strategy that focuses on replication of gene modules and uses independent cohorts and complementary immunophenotyping technologies to detect key changes in specific subsets of CD1c+ DCs in uveitis patients. To further extend these analyses, we have now also detailed our analysis of intraocular DCs using single-cell RNA-seq of eye fluid biopsies (aqueous humor) of HLA-B27 anterior uveitis (identical to our “AU” group of patients). As shown in revised Figure 6, we detected eye-infiltrating CD1c+ DCs and were able to cluster cells positive for the uveitis-associated black module (revised Figure 6B), which showed, as expected, that “black-module+” CD1c+ DCs show higher expression of CD36 and CX3CR1, and lower RUNX3, but not CD14 (revised Figure 6C), closely corroborating our blood CD1c+ DC analyses. These DC3s were also found at higher frequency in the eye of patients with AU (Figure 6D). We hope the reviewer agrees we have substantially improved the analysis of intraocular DCs and that this has now been sufficiently addressed.

      It is also problematic that no effort has been made to assess the severity of uveitis. Flares of disease can range from extremely mild to debilitating. Similarly, intermediate uveitis and BSCR can range greatly in severity. Without normalizing for disease severity it is difficult to fully understand the range of transcriptional changes between cases.

      In our view, a key limitation in determining uveitis severity for molecular analysis is that objective biomarkers that assess disease severity across uveitis entities are lacking. Currently, disease severity is dependent on an array of clinical features (i.e., SUN criteria), which cannot be applied consistently to anterior, intermediate and posterior uveitis. For example, the severity of anterior uveitis is in part assessed by grading of inflammation in the anterior chamber, while the anterior chamber is (typically) not involved in Birdshot Uveitis (BU in this study). However, to allow the study of patients with high disease activity, we exclusively used systemic treatment-free patients who all had active uveitis at sampling at our academic institute, making the results highly relevant for understanding the pathophysiology of non-infectious uveitis. For this reviewer’s convenience, we have conducted additional analysis that includes key clinical parameters (anterior chamber cells, vitreous cells, and macular thickness for patients from cohort I). These data showed no clear clustering of patients based on any of the clinical parameters (revised Figure 1 – Supplement 2). We hope the reviewer agrees this has been addressed in sufficient detail.

      The use of principal component analysis for clustering may be underpowered; I would suggest the authors apply UMAP to determine if higher dimensional component analyses correlate with disease type.

      At the reviewer's request, we have conducted UMAP (with different tuning of hyperparameters) on the DEGs (cohort I, see image below). We believe that UMAP analysis did not provide additional insights or correlate with disease type. We hope the reviewer agrees.

      The false-discovery rate in large transcriptomic projects is challenging. While the authors are to be commended for employing a validation set, it would be useful to employ a Monte Carlo simulation in which groups are arbitrarily relabeled to determine the number of expected false discoveries within this data set (i.e. akin to Significance Analysis of Microarray techniques).

      We determined the adjusted P values via the DESeq2 package (for false-discovery rate of 5% and Benjamini-Hochberg Procedure). The results are shown in Supplemental File 1K-1M and analysis in Figure 1A.
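
      For readers unfamiliar with the adjustment named in this response, the Benjamini-Hochberg step-up procedure can be sketched as follows. This is a minimal plain-Python illustration of the general procedure, not the DESeq2 implementation used for the manuscript; the function name and the example p-values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values, in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, scaling by m / rank and
    # enforcing that adjusted values never increase as rank decreases
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical p-values for four genes (illustration only)
example_p = [0.01, 0.04, 0.03, 0.005]
example_adj = benjamini_hochberg(example_p)
# Genes called significant at a false-discovery rate of 5%
significant = [i for i, q in enumerate(example_adj) if q <= 0.05]
```

      This controls the expected proportion of false discoveries analytically. The label-permutation (Monte Carlo) approach suggested by the reviewer would instead estimate that proportion empirically by repeating the differential expression test on randomly relabeled samples.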

      I do not fully understand the significance of the mouse CD11c-Runx3delta mice. It appears these data were derived from previous datasets or from bone marrow stromal line cultures. Did the authors attempt to generate autoimmune uveitis (i.e. EAU) in these animals? Without this the relevance for uveitis is unclear.

      We did not attempt to induce experimental autoimmune uveitis in CD11c-Runx3delta mice. We used transcriptomic data from dendritic cells purified from this model to show that loss of RUNX3 induces a gene signature highly reminiscent of the gene module identified in non-infectious uveitis patients. Using enrichment analysis, we show that the transcriptome of patients is highly enriched for this signature which indicates that the decreased RUNX3 observed in patients underlies the upregulation of CD36, CX3CR1 and other surface genes. In other words, we used data from transgenic models to dissect which of the altered transcription factors were driving this gene module and we identified the RUNX3-NOTCH2 axis as an important contributor.

    1. Author Response

      We thank the reviewers for their positive feedback and thoughtful suggestions that will improve our manuscript. Here we summarise our plan for immediate action. We will resubmit our manuscript once additional experiments have been performed to clarify all the major and minor concerns of the reviewers and the manuscript has been revised. At that point, we will respond to all of the reviewers’ points and highlight the changes made in the text.

      Reviewer #1 (Public Review):

      The authors have tried to correlate changes in the cellular environment by means of altering temperature, the expression of key cellular factors involved in the viral replication cycle, and small molecules known to affect key viral protein-protein interactions with some physical properties of the liquid condensates of viral origin. The ideas and experiments are extremely interesting as they provide a framework to study viral replication and assembly from a thermodynamic point of view in live cells.

      The major strengths of this article are the extremely thoughtful and detailed experimental approach; although this data collection and analysis are most likely extremely time-consuming, the techniques used here are so simple that the main goal and idea of the article become elegant. A second major strength is that, in order to understand some of the physicochemical properties of the viral liquid inclusion, they used stimuli that have been very well studied, and thus one can really focus on a relatively easy interpretation of most of the data presented here.

      There are three major weaknesses in this article. The first is the way it is written, which, especially at the beginning, is extremely confusing. I would suggest the authors check and review extensively for improvements to the use of English; in particular, the abstract and introduction are extremely hard to understand. In addition, in the abstract and introduction, the authors use terms such as "hardening", "perturbing the type/strength of interactions", "stabilization", and "material properties", to cite just a few. It is clear that the authors do know exactly what they are referring to, but the definitions come so late in the text that it all becomes confusing. The second major weakness is that there is a lack of deep discussion of the physical meaning of some of the measured parameters like "C dense vs inclusion", and "nuclear density and supersaturation". There is a need to explain further the physical consequences of all the graphs. Most of them are discussed in a very superficial manner. The third major weakness is a lack of analysis of phase separations. Some of their data suggest phase transition and/or phase separation; thus, a more in-depth analysis is required. For example, could they calculate the change of entropy and enthalpy of some of these processes? Could they find some boundaries for these transitions between the "hard" (whatever that means) and the liquid?

      The authors have achieved almost all their goals, with the caveat of the third weakness I mentioned before. Their work presented in this article is of significant interest and can become extremely important if a more detailed analysis of the thermodynamics parameters is assessed and a better description of the physical phenomenon is provided.

      We thank reviewer 1 for the comments and, in particular, for being so positive regarding the strengths of our manuscript and for raising concerns that will surely improve the manuscript. At this point, we propose the following actions to address the concerns of Reviewer 1:

      1) We will extensively revise the use of English, particularly, in the abstract and introduction, defining key terms as they come along in the text to make the argument clearer.

      2) We acknowledge the importance of discussing our data in more detail and we propose the following. We will discuss the graphs and what they mean as exemplified in the paragraph below.

      Regarding Figure 3 - As the concentration of vRNPs increases, we observe an increase in supersaturation until 12 hpi. This means that, contrary to what is observed in a binary mixture, in which the Cdilute is constant (Klosin et al., 2020), the Cdilute in our system increases with concentration. It has been reported that Cdilute increases in a multi-component system with bulk concentration (Riback et al., 2020). Our findings have important implications for how we think about the condensates formed during influenza infection. As the 8 different genomic vRNPs have a similar overall structure, they could, in theory, behave as a binary system between units of vRNPs and Rab11a. However, a change in Cdilute with concentration shows that our system behaves as a multi-component system. This means that the differences in length, RNA sequence and valency of each vRNP are key for the integrity of condensates.

      3) The reviewer calls our attention to the lack of analysis of phase separations. We think that phase separation (or percolation coupled to phase separation) governs the formation of influenza A virus condensates. However, we think we ought to exercise caution at this point, as the condensates we are working with are very complex and the physics of our system in cells may not be sufficient to claim phase separation without an in vitro reconstitution system. In fact, IAV inclusions contain cellular membranes, different vRNPs and Rab11a. So far, we can only speculate that the liquid character of IAV inclusions may arise from a network of interacting vRNPs that bridge several cognate vRNP-Rab11 units on flexible membranes, similarly to what happens in phase separated vesicles in neurological synapses. However, the speculative model for our system, although supported by correlative light and electron microscopy, currently lacks formal experimental validation.

      For this reason, we thought of developing the current work as an alternative to explore the importance of the liquid material properties of IAV inclusions. By finding an efficient method to alter the material properties of IAV inclusions, we provide proof of principle that it is possible to impose controlled phase transitions that reduce the dynamics of vRNPs in cells and negatively impact progeny virion production. Despite having discussed these issues in the limitations of the study, we will make our point clearer.

      We are currently establishing an in vitro reconstitution system to formally demonstrate, in an independent publication, that IAV inclusions are formed by phase separation. For this future work, we teamed up with Pablo Sartori, a theoretical physicist, to derive an in-depth analysis of the thermodynamics of the viral liquid condensates. Collectively, we think that cells have too many variables to derive meaningful physics parameters (such as entropy and enthalpy) or models, and that cell-based work needs to be complemented by in vitro systems. For example, increasing the concentration inside a cell is not a simple endeavour as it relies on cellular pathways to deliver material to a specific place. At the same time, the 8 vRNPs, as mentioned above, have different size, valency and RNA sequence and can behave very differently in the formation of condensates and maintenance of their material properties. Ideally, they should be analysed individually or in selected combinations. For the future, we will combine data from in vitro reconstitution systems and cells to address this very important point raised by the reviewer.

      From the paper on the section Limitations of the study: “Understanding condensate biology in living cells is physiologically relevant but complex because the systems are heterotypic and away from equilibria. This is especially challenging for influenza A liquid inclusions that are formed by 8 different vRNP complexes, which although sharing the same structure, vary in length, valency, and RNA sequence. In addition, liquid inclusions result from an incompletely understood interactome where vRNPs engage in multiple and distinct intersegment interactions bridging cognate vRNP-Rab11 units on flexible membranes (Chou et al., 2013; Gavazzi et al., 2013; Haralampiev et al., 2020; Le Sage et al., 2020; Shafiuddin & Boon, 2019; Sugita, Sagara, Noda, & Kawaoka, 2013). At present, we lack an in vitro reconstitution system to understand the underlying mechanism governing demixing of vRNP-Rab11a-host membranes from the cytosol. This in vitro system would be useful to explore how the different segments independently modulate the material properties of inclusions, explore if condensates are sites of IAV genome assembly, determine thermodynamic values, thresholds accurately, perform rheological measurements for viscosity and elasticity and validate our findings”.

      Reviewer #2 (Public Review):

      During Influenza virus infection, newly synthesized viral ribonucleoproteins (vRNPs) form cytosolic condensates, postulated as viral genome assembly sites and having liquid properties. vRNP accumulation in liquid viral inclusions requires its association with the cellular protein Rab11a directly via the viral polymerase subunit PB2. Etibor et al. investigate and compare the contributions of entropy, concentration, and valency/strength/type of interactions, on the properties of the vRNP condensates. For this, they subjected infected cells to the following perturbations: temperature variation (4, 37, and 42 °C), the concentration of viral inclusion drivers (vRNPs and Rab11a), and the number or strength of interactions between vRNPs using nucleozin, a well-characterized vRNP sticker. Lowering the temperature (i.e. decreasing the entropic contribution) leads to a mild growth of condensates that does not significantly impact their stability. Altering the concentration of drivers of IAV inclusions impacts their size but not their material properties. The most spectacular effect on condensates was observed using nucleozin. The drug dramatically stabilizes vRNP inclusions, acting as a condensate hardener. Using a mouse model of influenza infection, the authors provide evidence that the activity of nucleozin is retained in vivo. Finally, using a mass spectrometry approach, they show that the drug affects vRNP solubility in a Rab11a-dependent manner without altering the host proteome profile.

      The data are compelling and support the idea that drugs that affect the material properties of viral condensates could constitute a new family of antiviral molecules as already described for the respiratory syncytial virus (Risso Ballester et al. Nature. 2021).

      Nevertheless, there are some limitations in the study. Several of them are mentioned in a dedicated paragraph at the end of a discussion. This includes the heterogeneity of the system (vRNP of different sizes, interactions between viral and cellular partners far from being understood), which is far from equilibrium, and the absence of minimal in vitro systems that would be useful to further characterize the thermodynamic and the material properties of the condensates.

      We thank reviewer 2 for highlighting specific details that need improving and raising such interesting questions to validate our findings. We will address all the minor comments of Reviewer 2. To address the comments of Reviewer 2, we propose the actions described in blue below each point raised that is written in italics.

      1) The concentrations are mostly evaluated using antibodies. This may be correct for Cdilute. However, measurement of Cdense should be viewed with caution as the antibodies may have some difficulty accessing the inner of the condensates (as already shown in other systems), and this access may depend on some condensate properties (which may evolve along the infection). This might induce artifactual trends in some graphs (as seen in panel 2c), which could, in turn, affect the calculation of some thermodynamic parameters.

      The concern of using antibodies to calculate Cdense is valid. We will address this concern by validating our results using a fluorescent tagged virus that has mNeon Green fused to the viral polymerase PA (PA-mNeonGreen PR8 virus). Like NP, PA is a component of vRNPs and labels viral inclusions, colocalising with Rab11 when vRNPs are in the cytosol without the need of using antibodies.

      This virus would be the best to evaluate inclusion thermodynamics, were it not an attenuated virus (Figure 1A below) with a delayed infection, as demonstrated by the reduced levels of viral proteins (Figure 1B below). Consistently, it shows differences in the accumulation of vRNPs in the cytosol, and viral inclusions form later in infection. After their emergence, inclusions behave as in the wild-type virus (PR8-WT), fusing and dividing (Figure 1C below) and displaying liquid properties. The differences in concentration may shift or alter thermodynamic parameters such as time of nucleation, nucleation density, inclusion maturation rate, Cdense and Cdilute. This is the reason why we performed the thermodynamics profiling using antibodies upon PR8-WT infection. For validating our results, and taking into account possibly delayed kinetics and differences that may occur because of reduced vRNP accumulation in the cytosol, this virus will be useful, and we will therefore repeat the thermodynamics profiling using it.

      As a side note, vRNPs are composed of viral RNA coated with several molecules of NP and each vRNP also contains 1 copy of the trimeric RNA dependent RNA polymerase formed by PA, PB1 and PB2. It is well documented that in the cytosol the vast majority of PA (and other components of the polymerase) is in the form of vRNPs (Avilov, Moisy, Munier, et al., 2012; Avilov, Moisy, Naffakh, & Cusack, 2012; Bhagwat et al., 2020; Lakdawala et al., 2014), and thus we can use this virus to label vRNPs on condensates to corroborate our studies using antibodies.

      Figure 1 – The PA-mNeonGreen virus is attenuated in comparison to the WT virus. A. Cells (A549) were infected or mock-infected with PR8 WT or PA-mNeonGreen (PA-mNG) viruses, at a multiplicity of infection (MOI) of 3, for the indicated times. Viral production was determined by plaque assay and plotted as plaque forming units (PFU) per milliliter (mL) ± standard error of the mean (SEM). Data are a pool from 2 independent experiments. B. The levels of viral PA, NP and M2 proteins and actin in cell lysates at the indicated time points were determined by western blotting. C. Cells (A549) were transfected with a plasmid encoding mCherry-NP and co-infected with PA-mNeonGreen virus for 16 h, at an MOI of 10. Cells were imaged under time-lapse conditions starting at 16 hpi. White boxes highlight vRNPs/viral inclusions in the cytoplasm in the individual frames. The dashed white and yellow lines mark the cell nucleus and the cell periphery, respectively. The yellow arrows indicate the fission/fusion events and movement of vRNPs/viral inclusions. Bar = 10 µm. Bar in insets = 2 µm.

      2) Although the authors have demonstrated that vRNP condensates exhibit several key characteristics of liquid condensates (they fuse and divide, they dissolve upon hypotonic shock or upon incubation with 1,6-hexanediol, FRAP experiments are consistent with a liquid nature), their aspect ratio (with a median above 1.4) is much higher than the aspect ratio observed for other cellular or viral liquid compartments. This is intriguing and might be discussed.

      IAV inclusions have been shown to interact with microtubules and the endoplasmic reticulum, which confer movement, and also to undergo fusion and fission events. We propose that these interactions and movements exert forces that deform inclusions, making them less spherical. To test this assumption, we compared the aspect ratio of viral inclusions in the absence and presence of nocodazole (which abrogates microtubule-based movement). The data in Figure 2 show that in the presence of nocodazole, the aspect ratio decreases from 1.42 ± 0.36 to 1.26 ± 0.17, supporting our assumption.

      Figure 2 – Treatment with nocodazole reduces the aspect ratio of influenza A virus inclusions. Cells (A549) were infected with PR8 WT and treated with nocodazole (10 µg/mL) for 2 h, after which the movement of influenza A virus inclusions was captured by live cell imaging. Viral inclusions were segmented, and the aspect ratio was measured in ImageJ, then analysed and plotted in R.
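
      For readers unfamiliar with the metric, the sketch below shows one way an aspect ratio can be obtained from a segmented inclusion: the ratio of the major to the minor axis of the ellipse that has the same second moments as the object. It is a minimal NumPy illustration, comparable in spirit to ImageJ's fitted-ellipse measurement, and is not the pipeline used for the figure. The synthetic masks and helper names are hypothetical.

```python
import numpy as np


def aspect_ratio(mask: np.ndarray) -> float:
    """Major/minor axis ratio of the ellipse with the same second moments as the mask."""
    ys, xs = np.nonzero(mask)
    if xs.size < 3:
        raise ValueError("mask needs at least three foreground pixels")
    cov = np.cov(np.vstack([xs, ys]))          # 2x2 covariance of pixel coordinates
    minor_var, major_var = np.linalg.eigvalsh(cov)  # eigenvalues in ascending order
    if minor_var <= 0:
        raise ValueError("degenerate (line-like) object")
    # Axis lengths scale with the square root of the variance along each axis.
    return float(np.sqrt(major_var / minor_var))


def ellipse_mask(shape, center, a, b):
    """Synthetic filled ellipse with semi-axes a (x direction) and b (y direction)."""
    yy, xx = np.indices(shape)
    cy, cx = center
    return ((xx - cx) / a) ** 2 + ((yy - cy) / b) ** 2 <= 1.0


round_blob = ellipse_mask((101, 101), (50, 50), 20, 20)      # near-spherical inclusion
stretched_blob = ellipse_mask((101, 101), (50, 50), 30, 15)  # deformed inclusion
print(aspect_ratio(round_blob), aspect_ratio(stretched_blob))
```

      A perfectly round object gives a value of 1, so a drop in the median from 1.42 towards 1.26 corresponds to inclusions becoming closer to spherical.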

      3) Similarly, the fusion event presented at the bottom of figure 3I is dubious. It might as well be an aggregation of condensates without fusion.

      We will change this, thank you for the suggestion.

      4) The authors could have more systematically performed FRAP/FLAPh experiments on cells expressing fluorescent versions of both NP and Rab11a to investigate the influence of condensate size, time after infection, or global concentrations of Rab11a in the cell (using the total fluorescence of overexpressed GFP-Rab11a as a proxy) on condensate properties.

      We will do our best to comply with this suggestion, as we think it is important.

      Reviewer #3 (Public Review):

      This study aims to define the factors that regulate the material properties of the viral inclusion bodies of influenza A virus (IAV). In a cellular model, it shows that the material properties were not affected by lowering the temperature nor by altering the concentration of the factors that drive their formation. Impressively, the study shows that IAV inclusions may be hardened by targeting vRNP interactions via the known pharmacological modulator (also an IAV antiviral), nucleozin, both in vitro and in vivo. The study employs current state-of-the-art methodology in both influenza virology and condensate biology, and the conclusions are well-supported by data and proper data analysis. This study is an important starting point for understanding how to pharmacologically modulate the material properties of IAV viral inclusion bodies.

      We thank this reviewer for all the positive comments. We will fully address the minor issues brought to our attention, including changing the title of the manuscript, and we will investigate the formation and material properties of IAV inclusions in the presence and absence of nucleozin for the nucleozin escape mutant NP-Y289H.

      References

      Avilov, S. V., Moisy, D., Munier, S., Schraidt, O., Naffakh, N., & Cusack, S. (2012). Replication-competent influenza A virus that encodes a split-green fluorescent protein-tagged PB2 polymerase subunit allows live-cell imaging of the virus life cycle. J Virol, 86(3), 1433-1448. doi:10.1128/JVI.05820-11

      Avilov, S. V., Moisy, D., Naffakh, N., & Cusack, S. (2012). Influenza A virus progeny vRNP trafficking in live infected cells studied with the virus-encoded fluorescently tagged PB2 protein. Vaccine, 30(51), 7411-7417. doi:10.1016/j.vaccine.2012.09.077

      Bhagwat, A. R., Le Sage, V., Nturibi, E., Kulej, K., Jones, J., Guo, M., . . . Lakdawala, S. S. (2020). Quantitative live cell imaging reveals influenza virus manipulation of Rab11A transport through reduced dynein association. Nat Commun, 11(1), 23. doi:10.1038/s41467-019-13838-3

      Chou, Y. Y., Heaton, N. S., Gao, Q., Palese, P., Singer, R. H., & Lionnet, T. (2013). Colocalization of different influenza viral RNA segments in the cytoplasm before viral budding as shown by single-molecule sensitivity FISH analysis. PLoS Pathog, 9(5), e1003358. doi:10.1371/journal.ppat.1003358

      Gavazzi, C., Yver, M., Isel, C., Smyth, R. P., Rosa-Calatrava, M., Lina, B., . . . Marquet, R. (2013). A functional sequence-specific interaction between influenza A virus genomic RNA segments. Proc Natl Acad Sci U S A, 110(41), 16604-16609. doi:10.1073/pnas.1314419110

      Haralampiev, I., Prisner, S., Nitzan, M., Schade, M., Jolmes, F., Schreiber, M., . . . Herrmann, A. (2020). Selective flexible packaging pathways of the segmented genome of influenza A virus. Nat Commun, 11(1), 4355. doi:10.1038/s41467-020-18108-1

      Klosin, A., Oltsch, F., Harmon, T., Honigmann, A., Julicher, F., Hyman, A. A., & Zechner, C. (2020). Phase separation provides a mechanism to reduce noise in cells. Science, 367(6476), 464-468. doi:10.1126/science.aav6691

      Lakdawala, S. S., Wu, Y., Wawrzusin, P., Kabat, J., Broadbent, A. J., Lamirande, E. W., . . . Subbarao, K. (2014). Influenza a virus assembly intermediates fuse in the cytoplasm. PLoS Pathog, 10(3), e1003971. doi:10.1371/journal.ppat.1003971

      Le Sage, V., Kanarek, J. P., Snyder, D. J., Cooper, V. S., Lakdawala, S. S., & Lee, N. (2020). Mapping of Influenza Virus RNA-RNA Interactions Reveals a Flexible Network. Cell Rep, 31(13), 107823. doi:10.1016/j.celrep.2020.107823

      Riback, J. A., Zhu, L., Ferrolino, M. C., Tolbert, M., Mitrea, D. M., Sanders, D. W., . . . Brangwynne, C. P. (2020). Composition-dependent thermodynamics of intracellular phase separation. Nature, 581(7807), 209-214. doi:10.1038/s41586-020-2256-2

      Shafiuddin, M., & Boon, A. C. M. (2019). RNA Sequence Features Are at the Core of Influenza a Virus Genome Packaging. J Mol Biol. doi:10.1016/j.jmb.2019.03.018

      Sugita, Y., Sagara, H., Noda, T., & Kawaoka, Y. (2013). Configuration of viral ribonucleoprotein complexes within the influenza A virion. J Virol, 87(23), 12879-12884. doi:10.1128/JVI.02096-13

    1. Author Response

      Reviewer #2 (Public Review):

      1) The main limitation of this study is that the results are primarily descriptive in nature, and thus, do not provide mechanistic insight into how Ryr1 disease mutations lead to the muscle-specific changes observed in the EDL, soleus and EOM proteomes.

      An intrinsic feature of high-throughput proteomic analysis is the generation of lists of differentially expressed proteins (DEPs) in different muscles from WT and mutated mice. Although defining mechanistic insights related to changes in dozens of proteins is very interesting, it is a difficult task to accomplish and goes beyond the goal of the high-throughput proteomic analysis presented here. Nevertheless, the analysis of DEPs may indeed provide arguments to speculate on the pathogenesis of the phenotype linked to recessive RyR1 mutations. In the unrevised manuscript, we pointed out that the fiber type I predominance observed in congenital myopathies linked to recessive Ryr1 mutations is consistent with the high expression level of heat shock proteins in slow twitch muscles. However, as suggested by Reviewer 3, we have removed "vague statements" concerning major insights into pathophysiological mechanisms from the text of the revised manuscript, since we are aware that the mechanistic information, if any, that we can extract from the data set cannot go beyond the intrinsic limitations of high-throughput proteomic technology.

      b) Results comparing fast twitch (EDL) and slow twitch (soleus) muscles from WT mice confirmed several known differences between the two muscle types. Similar analyses between EOM/EDL and EOM/soleus muscles from WT mice were not conducted.

      We agree with the point raised by the Reviewer. In the revised manuscript we have changed Figure 2. The new Figure 2 shows the analysis of differentially expressed proteins in EDL, soleus and EOMs from WT mice. We have also added 2 new Tables (new Supplementary Tables 2 and 3) and have inserted our findings in the revised Results section (page 7, lines 157-176; pages 8 and 9).

      c) While a reactome pathway analysis for proteins changes observed in EDL is shown in Supplemental Figure 1, the authors do not fully discuss the nature of the proteins and corresponding pathways impacted in the other two muscle groups analyzed.

      We have now included in the revised manuscript a new Figure 2 which includes the Reactome pathway analysis comparing EDL with soleus, EDL with EOM and soleus with EOM (panels C, F and I, respectively). We have also inserted into the revised manuscript a brief description of the pathways showing the greatest changes in protein content (page 7 line 156-175, pages 8 and 9). We agree that the data showing changes in protein content between the 3 muscle groups of the WT mice are important also because they validate the results of the proteomic approach. Indeed, the present results confirm that many proteins including MyHCIIb, calsequestrin 1, SERCA1, parvalbumin etc are more abundantly expressed in fast twitch EDL muscles compared to soleus. Similarly, our results confirm that EOMs are enriched in MyHC-EO as well as cardiac isoforms of ECC proteins. This point has been clarified in the revised version of the manuscript (page 8, lines 198-213; page 9 lines 214-228). Nevertheless, we would like to point out that the main focus of our study is to compare the changes of protein content induced by the presence of recessive RyR1 mutations.

      Reviewer #3 (Public Review):

      a) it would be useful to determine whether changes in protein levels correlated with changes in mRNA levels …….

      We performed qPCR analysis of Stac3 and Cacna1s in EDL, soleus and EOM from WT mice (see Figure 1 below). The expression of transcripts encoding Cacna1s and Stac3 is approximately 9-fold higher in EDL compared to soleus. The fold change of Stac3 and Cacna1s transcripts in EDL muscles is higher than the differences we observed by mass spectrometry at the protein level between EDL and soleus. Indeed, we found that the content of the Stac3 protein in EDL is 3-fold higher than that in soleus. Although there is no apparent linear correlation between mRNA and protein levels, we believe that a few plausible conclusions can be drawn, namely: (i) the expression level of both transcripts and proteins is higher in EDL compared to EOM and soleus muscles, respectively; (ii) the expression level of transcripts encoding Stac3 correlates with that of transcripts encoding Cacna1s and confirms the proteomic data. In addition, the level of the Stac3 transcript does not change between WT and dHT, confirming our proteomic data, which show that the Stac3 protein content in muscles from dHT is similar to that found in WT littermates. Altogether these results support the concept that the differences in Stac3 content between EDL and soleus occur at both the protein and transcript levels, namely a high Stac3 mRNA level correlates with higher protein content (EDL) and a low mRNA level correlates with low Stac3 protein content in soleus muscles (see Figure 1 below).

      Figure 2: qPCR of Cacna1s and Stac3 in muscles from WT mice. The expression levels of the transcripts encoding Cacna1s and Stac3 are the highest in EDL muscles and the lowest in soleus muscles (top panels). There are no significant changes in their relative expression levels in dHT vs WT. Each symbol represents the value from a single mouse. * p=0.028, Mann-Whitney test. qPCR was performed as described in Elbaz et al., 2019 (Hum Mol Genet 28, 2987-2999).
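
      To make the notion of a transcript fold change concrete, the sketch below applies the widely used 2^(-ΔΔCt) calculation to hypothetical Ct values. The quantification method used for the data above is the one described in Elbaz et al., 2019, and may differ; the Ct values, the reference gene and the function name here are assumptions chosen only for illustration, and the calculation presumes roughly 100% amplification efficiency.

```python
def relative_expression(ct_target_sample: float, ct_ref_sample: float,
                        ct_target_calibrator: float, ct_ref_calibrator: float) -> float:
    """Fold change of a target transcript in a sample relative to a calibrator (2^-ddCt)."""
    delta_sample = ct_target_sample - ct_ref_sample              # normalise to reference gene
    delta_calibrator = ct_target_calibrator - ct_ref_calibrator  # same for the calibrator tissue
    return 2.0 ** (-(delta_sample - delta_calibrator))


# Hypothetical Ct values: target transcript in EDL (sample) vs soleus (calibrator),
# each normalised to a hypothetical reference gene with Ct 18 in both muscles.
fold_edl_vs_soleus = relative_expression(22.0, 18.0, 25.0, 18.0)
print(fold_edl_vs_soleus)
```

      Because Ct is logarithmic, a difference of three cycles already corresponds to an eight-fold difference in transcript abundance, which is one reason mRNA fold changes can exceed those seen at the protein level.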

      ….and whether or not the protein present was functional, and whether Stac3 was in fact stoichiometrically depleted in relation to Cacna1s.

      We thought about this point but think that there are no plausible arguments to believe that Stac3 is not functional, one simple reason being that our WT mice do not have a phenotype which would be associated with the absence of Stac3 (Reinholt et al., PLoS One 8, e62760 2013, Nelson et al. Proc. Natl. Acad. Sci. USA 110:11881 2013).

      b) In the abstract, the authors stated that skeletal muscle is responsible for voluntary movement. It is also responsible for non-voluntary. The abstract needs to be refocused on the mutation and on what we learn from this study. Please avoid vague statements like "we provide important insights to the pathophysiological mechanisms..." mainly when the study is descriptive and not mechanistic.

      The abstract of the revised manuscript has been rewritten. In particular, we removed statements referring to important “pathophysiological mechanistic insight”.

      c) The author should bring up the mutation name, location and phenotype early in the introduction.

      In the revised manuscript we provide the information requested by the Reviewer (page 2 lines 36-38 and page 4, lines 98-102).

      d) This reviewer also suggests that the authors refocus the introduction on the mutation location in the 3D RyR1 structure (available cryo-EM structure), if there is any nearby ligand binding site, protomers junction or any other known interacting protein partners. This will help the reader to understand how this mutation could be important for the channel's function

      The residue Ala4329 is located inside the TMx (auxiliary transmembrane helices) domain, which spans from residue 4322 to 4370 and interposes structurally (des Georges A et al. 2016 Cell 167, 145-57; Chen W, et al. 2020 EMBO Rep. 21, e49891). Although the structural resolution of the region has been improved (des Georges et al, 2016), parts of the domain still have no defined atomic coordinates, especially the region encompassing a.a. E4253 – F4540. Because of the undefined atomic coordinates of the region E4253-F4540, we are not able to determine the real orientation and disposition of the amino acids in this region, including the A4329 residue. As a reference, structure PDB: 5TAL of des Georges et al, 2016 was analyzed with UCSF Chimera (production version 1.16) (Pettersen et al. J. Comput. Chem. 25: 1605-1612. doi: 10.1002/jcc.20084).

    1. Author Response

      Reviewer #1 (Public Review):

      The authors reveal dual regulatory activity of the complex nuclear receptor element (cNRE; contains hexads A+B+C) in cardiac chambers and its evolutionary origin using computational and molecular approaches. Building upon a previous observation that hexads A and B act as ventricular repressor sequences, in this study the authors identify a novel hexad C sequence with preferential atrial expression. The authors also reveal that the cNRE emerged from an endogenous viral element using comparative genomic approaches. The strength of this study is in a combination of in silico evolutionary analyses with in vivo transgenic assays in both zebrafish and mouse models. Rapid, transient expression assays in zebrafish together with assays using stable, transgenic mice demonstrate dual functionality of cNRE depending on the chamber context. This is especially intriguing given that the cNRE is present only in Galliformes and has originated likely through viral infection. Interestingly, there seem to be some species-specific differences between zebrafish and mouse models in expression response to mutations within the cNRE. Taken together, these findings bear significant implications for our understanding of dual regulatory elements in the evolutionary context of organ formation.

      We thank reviewer 1 for the thorough review and are very satisfied with his favorable view of our manuscript. We also thank reviewer 1 for suggestions and opportunities to further clarify some relevant issues.

      Reviewer #2 (Public Review):

      Nunes Santos et al. investigated the gene regulatory activity of the promoter of the quail myosin gene, SMyHC III, that is expressed specifically in the atria of the heart in quails. To do so, they computationally identified a novel 6-bp sequence within the promoter that is putatively bound by a nuclear receptor transcription factor, and hence is a putative regulatory sequence. They tested this sequence for regulatory activity using transgenic assays in zebrafish and mice, and subjected this sequence to mutagenesis to investigate whether gene regulatory effects are abrogated. They define this sequence, together with two additional known 6-bp regulatory sequences, as a novel regulatory sequence (denoted cNRE) necessary and sufficient for driving atrial-specific expression of SMyHC III. This cNRE sequence is shared across several galliform species but appears to be absent in other avian species. The authors find that there is sequence homology between the cNRE and several virus genomes, and they conclude that this regulatory sequence arose in the quail genome by viral integration.

      Strengths: The evolutionary origins of gene regulatory sequences and their impact on directing tissue-specific expression are of great interest to geneticists and evolutionary biologists. The authors of this paper attempt to bring this evolutionary perspective to the developmental biology question of how genes are differentially expressed in different chambers of the heart. The authors test for regulatory activity of the putative regulatory sequence they identified computationally in both zebrafish and mouse transgenic assays. The authors disrupt this sequence using deletions and mutagenesis, and introduce a tandem repeat of the sequence to a reporter gene to determine its consequences on chamber activity. These experiments demonstrate that the identified sequence has regulatory activity.

      We appreciate the thorough review of our manuscript and are very stimulated by the reviewer’s understanding of the contents we presented. We will take the liberty to comment after the reviewer’s considerations, in the hope to better answer the relevant points.

      Weaknesses: There are several decisions and assumptions that have been made by the authors, the reasons for which have not been articulated. Firstly, the rationale for the approach is not clear. The study is a follow-up to work previously performed by the authors which identified two 6-bp sequences important for controlling atrial-specific expression of the quail SMyHC III gene. This study appears to be motivated by the fact that these two sequences, bound by nuclear receptors, do not fully direct chamber-specific expression, and therefore this study aims to find additional regulatory sequences. It is assumed that any additional regulatory sequences should also be bound by nuclear receptors, and be 6-bp in length, and therefore the authors search for 6-bp sequences bound by nuclear receptors. It is not clear what the input sequence for this analysis was.

      Thank you for giving us the opportunity to clarify our rationale. Our approach is justified by the natural progression in the understanding of the mechanisms involved in preferential atrial expression by the SMyHC III promoter. The groundwork was solidly laid down by Wang and colleagues (see references below). They mapped potential atrial stimulators and ventricular repressors throughout the SMyHC III promoter using atrial and ventricular cultures, respectively. Wang and colleagues pinned down the relevant regulators step by step: first between -840 and -680 bp upstream from the transcription start site, then inside this nucleotide stretch, in the 72-bp fragment contained between -840 and -680 bp, and finally by identifying the ventricular repressor in Hexads A and B inside the 72-bp sequence (see references below). In this manuscript, we contributed the identification of Hexad C (immediately downstream of Hexads A and B) as a potential nuclear receptor binding site and as a bona fide atrial activator. In summary, our work represents a logical conclusion of previous work by Wang and colleagues. We continued the process of narrowing down sequences previously proven to contain atrial activators (which were unknown before our present work) and ventricular repressors (which were already described).

      Why did we use nuclear receptors as models for the putative cardiac chamber regulators binding to the cNRE? This is because previous work by Wang et al., 1996, 1998, 2001 and by Bruneau et al., 2001 showed that the 5’ portion of the cNRE (Hexads A and B) is indeed a hub for the integration of signals conveyed by nuclear receptors. Originally, Wang et al., 1996 showed that the VDR response element is a ventricular repressor acting via the 5’ portion of the cNRE. In a subsequent manuscript, Wang et al., 1998 showed that both RAR and VDR bind the 5’ portion of the cNRE. Bruneau et al., 2001 showed, by crossing IRX4 knockout mice with SMyHC III-HAP mice (Xavier-Neto et al., 1999), that IRX4 plays the role of a repressor of SMyHC III-HAP expression. Finally, Wang et al., 2001 showed that IRX4 interacts with RXR bound to the 5’ portion of the cNRE to inhibit ventricular expression.

      Why was the 3’ Hexad included as a research subject? Very early on in our work it was noted that 3’ of the original VDR response element (Hexads A and B), described by Wang et al., 1996 and 1998 as a ventricular repressor, there was a sequence (Hexad C) with almost equal binding potential to nuclear receptors as Hexads A and B (as initially judged on the basis of comparisons with canonical nuclear receptor binding sequences, but later on confirmed by in silico profiling of nuclear receptor binding, see below). This discovery prompted us to design point mutants in the 3’ portion of the cNRE to investigate whether Hexad C contained relevant regulators of heart chamber expression. These analyses revealed a strong atrial activator in the mouse (the missing atrial activator from Wang et al., 1996, 1998, 2001).

      Wang, G. F., Nikovits, W., Schleinitz, M., and Stockdale, F. E. (1996). Atrial chamber-specific expression of the slow myosin heavy chain 3 gene in the embryonic heart. J. Biol. Chem. 271, 19836-19845.

      Wang, G. F., Nikovits, W. Jr., Schleinitz, M., and Stockdale, F. E. (1998). A positive GATA element and a negative vitamin D receptor-like element control atrial chamber-specific expression of a slow myosin heavy-chain gene during cardiac morphogenesis. Mol. Cell Biol. 18, 6023-6034.

      Xavier-Neto, J., Neville, C. M., Shapiro, M. D., Houghton, L., Wang, G. F., Nikovits, W. Jr, Stockdale, F. E., and Rosenthal, N. (1999). A retinoic acid-inducible transgenic marker of sino-atrial development in the mouse heart. Development 126, 2677-2687.

      Bruneau, B. G., Bao, Z. Z., Fatkin, D., Xavier-Neto, J., Georgakopoulos, D., Maguire, C. T., Berul, C. I., Kass, D. A., Kuroski-de Bold, M. L., de Bold, A. J., Conner, D. A., Rosenthal, N., Cepko, C. L., Seidman, C. E., and Seidman, J. G. (2001). Cardiomyopathy in Irx4-deficient mice is preceded by abnormal ventricular gene expression. Mol. Cell Biol. 21, 1730-1736.

      Wang, G. F., Nikovits, W. Jr., Bao, Z.Z., and Stockdale, F.E. (2001). Irx4 forms an inhibitory complex with the vitamin D and retinoic X receptors to regulate cardiac chamber-specific slow MyHC3 expression. J Biol Chem. 276, 28835-28841.

      The methods section mentions the cNRE sequence, but this is their newly defined regulatory sequence based on the newly identified 6-bp sequence. It is therefore unclear why Hexad C was identified to be of interest, and not the GATA binding site for example, and whether other sequences in the promoter might have stronger effects on driving atrial-specific expression.

      As for the existence of binding sites other than Hexads A, B, and C, we cannot formally exclude the possibility that there may be other relevant regulators of the SMyHC III gene. But we note that the sequences that we utilized were previously mapped through a deletion-mutant promoter approach by Wang et al., 1996 as the most powerful atrial activator(s) and ventricular repressor(s). We addressed these concerns in a new section entitled “Limitations of our work”.

      Concerning GATA regulation, Wang et al., 1996, 1998 characterized a GATA-4 site that drives generalized (atrial and ventricular) cardiac expression in quail cultures. However, we were unable to identify any relevant changes in cardiac expression in mutant GATA SMyHC III-HAP transgenic mouse lines produced with the same mutated promoter sequences described by Wang et al., 1996, 1998.

      The identification of Hexad C as an atrial activator was an experimental finding. We identified it as such because we had two important inputs. First, in 1997, we consulted with Ralff Ribeiro, a specialist on nuclear receptors, and he pointed out that downstream of the Hexad A + Hexad B VDRE/RARE (the ventricular repressor), there was a sequence with good potential as a nuclear receptor binding motif. This was exactly Hexad C. Then, we confirmed its potential for nuclear receptor binding by nuclear receptor profiling. With these two pieces of evidence, we thought that there was enough support to justify a mutant construct (Mut C). The experimental results we obtained in transgenic mice and zebrafish are consistent with the hypothesis that Hexad C does contain the long-sought atrial activator predicted by Wang et al., 1996 in atrial cultures. This seems to be the most important atrial activator (a seven-fold activator) predicted by a deletion approach to be located between -840 and -680 bp in Wang et al., 1996.

      Wang, G. F., Nikovits, W., Schleinitz, M., and Stockdale, F. E. (1996). Atrial chamber-specific expression of the slow myosin heavy chain 3 gene in the embryonic heart. J. Biol. Chem. 271, 19836-19845.

      Wang, G. F., Nikovits, W. Jr., Schleinitz, M., and Stockdale, F. E. (1998). A positive GATA element and a negative vitamin D receptor-like element control atrial chamber-specific expression of a slow myosin heavy-chain gene during cardiac morphogenesis. Mol. Cell Biol. 18, 6023-6034.

      Indeed, the zebrafish transgenic assays use the 32 bp cNRE, while in the mouse transgenic assays, a 72 bp region is used. This choice of sequence length is not justified.

      As stated above, our rationale was built as a continuation of the thorough work by Wang and colleagues in progressively narrowing down the location of relevant atrial stimulators and ventricular repressors. Throughout our work, we sought to obtain maximal coherence with previous studies (see references below) and to simultaneously probe cNRE function at an increased resolution. For that, we utilized previously described mutant SMyHC III promoter constructs (Wang et al., 1996) and introduced novel site-directed dinucleotide substitution mutants of individual Hexads in the SMyHC III promoter.

      Wang, G. F., Nikovits, W., Schleinitz, M., and Stockdale, F. E. (1996). Atrial chamber-specific expression of the slow myosin heavy chain 3 gene in the embryonic heart. J. Biol. Chem. 271, 19836-19845.

      Wang, G. F., Nikovits, W. Jr., Schleinitz, M., and Stockdale, F. E. (1998). A positive GATA element and a negative vitamin D receptor-like element control atrial chamber-specific expression of a slow myosin heavy-chain gene during cardiac morphogenesis. Mol. Cell Biol. 18, 6023-6034.

      Xavier-Neto, J., Neville, C. M., Shapiro, M. D., Houghton, L., Wang, G. F., Nikovits, W. Jr, Stockdale, F. E., and Rosenthal, N. (1999). A retinoic acid-inducible transgenic marker of sino-atrial development in the mouse heart. Development 126, 2677-2687.

      Bruneau, B. G., Bao, Z. Z., Fatkin, D., Xavier-Neto, J., Georgakopoulos, D., Maguire, C. T., Berul, C. I., Kass, D. A., Kuroski-de Bold, M. L., de Bold, A. J., Conner, D. A., Rosenthal, N., Cepko, C. L., Seidman, C. E., and Seidman, J. G. (2001). Cardiomyopathy in Irx4-deficient mice is preceded by abnormal ventricular gene expression. Mol. Cell Biol. 21, 1730-1736.

      Wang, G. F., Nikovits, W. Jr., Bao, Z.Z., and Stockdale, F.E. (2001). Irx4 forms an inhibitory complex with the vitamin D and retinoic X receptors to regulate cardiac chamber-specific slow MyHC3 expression. J Biol Chem. 276, 28835-28841.

      The decisions about which bases to mutate in the three hexads are also not clear. Why are the first two bases mutated in Hexad B and C and the whole region mutated in Hexad A? Is there a reason to believe these bases are particularly important?

      As for the reasons behind mutation of the first two bases in Hexad B and Hexad C, there were two:

      One reason is because these point mutations in Hexads B and C were planned after the publication of Wang et al., 1996, which defined the major role of Hexad A in ventricular repression. After this discovery, we decided that a higher level of resolution in our mutation approach would be a better way to search for additional regulators of SMyHC III expression, including the atrial regulator that was readily apparent from the results shown in Wang et al., 1996, but had not yet been described.

      The second reason is that the first two nucleotides (purines) in a nuclear receptor-binding hexad are critical for the interaction between target DNA and transcription factors of the nuclear receptor family. Substituting pyrimidines for purines in the first two positions of a hexad drastically reduces the affinity of a nuclear response element, and that is why we chose to use TT substitutions in our mutant constructs. Please refer to: Umesono et al., Cell, 1991 65: 1255-1266 for a review; Mader et al., J Biol Chem, 1993 268:591-600 for a mutation study; Rastinejad et al., EMBO J., 2000 19:1045-1054 for a crystallographic study (as well as additional references listed below).
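
      The dinucleotide substitution strategy described above can be sketched in a few lines. The example replaces the first two bases of a chosen hexad with TT while leaving spacing and all other positions intact. The input is an illustrative direct repeat of the canonical nuclear receptor half-site AGGTCA with an arbitrary 3-bp spacer; it is not the actual cNRE sequence, and the function name is hypothetical.

```python
def mutate_hexad(sequence: str, hexad_start: int, substitution: str = "TT") -> str:
    """Replace the leading bases of a 6-bp hexad, preserving the overall length and spacing."""
    seq = sequence.upper()
    if hexad_start < 0 or hexad_start + 6 > len(seq):
        raise ValueError("hexad does not fit inside the sequence")
    if len(substitution) > 6:
        raise ValueError("substitution is longer than a hexad")
    return seq[:hexad_start] + substitution + seq[hexad_start + len(substitution):]


# Illustrative element: two canonical half-sites separated by an arbitrary 3-bp spacer.
wild_type = "AGGTCACTGAGGTCA"
mutant = mutate_hexad(wild_type, hexad_start=9)  # mutate only the second hexad
print(wild_type)
print(mutant)
```

      Keeping the substitution to two bases leaves the spacing between hexads unchanged, so any change in reporter expression can be attributed to loss of the purine contacts rather than to altered half-site geometry.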

      Mader, S., Chen, J. Y., Chen, Z., White, J., Chambon, P., and Gronemeyer, H. (1993). The patterns of binding of RAR, RXR and TR homo- and heterodimers to direct repeats are dictated by the binding specificities of the DNA binding domains. EMBO J. 12, 5029-5041.

      Ribeiro, R. C., Apriletti, J. W., Yen, P.M., Chin, W. W., and Baxter, J. D. (1994). Heterodimerization and deoxyribonucleic acid-binding properties of a retinoid X receptor-related factor. Endocrinology.135, 2076-2085.

      Zhao, Q., Chasse, S. A., Devarakonda, S., Sierk, M. L., Ahvazi, B., and Rastinejad, F. (2000). Structural basis of RXR-DNA interactions. J. Mol. Biol. 296, 509-520.

      Shaffer, P. L. and Gewirth, D. T. (2002). Structural basis of VDR-DNA interactions on direct repeat response elements. EMBO J. 21, 2242-2252.

      The control mutant also has effects on the chamber distribution of GFP expression.

      We note that, in the mouse, MutS did not produce any major changes from the typical wild type phenotypes linked to SMyHC III-HAP transgenic hearts. We concluded, based on our data, that the spacing mutant worked reasonably well as a negative mutation control in mice. We agree that it would have been particularly elegant if a spacing mutant designed for the mouse context worked in the exact same way in the zebrafish. However, the fact that there are slight differences in behavior for the mutated “spacing” constructs in species separated by millions of years of independent evolution is not really surprising, given that the amino acid sequence of transcription factors can diverge and co-evolve with binding nucleotides and end up drifting quite substantially from an ancestral setup. As we reiterate below, we consider more fundamental the fact that the cNRE is actually able to bias cardiac expression towards a model of preferential atrial expression, even in the context of species separated by millions of years of independent evolution.

      Two claims in the paper have weak evidence. Firstly, the conclusion that the cNRE is necessary and sufficient for driving preferential expression in the atrium. Deleting the cNRE does reduce the amount of atrial reporter gene expression but there is not a "conversion" from atrial to ventricular expression as mentioned in line 205. Similarly, a fusion of 5 tandem repeats of the cNRE can induce expression of a ventricular gene in the atria (I'm assuming a single copy is insufficient), but does not abolish ventricular expression.

      We agree that our labelling of the cNRE is perhaps too strong, and we have toned it down accordingly to incorporate the much more equilibrated concept that the cNRE biases cardiac expression towards a model of preferential atrial expression.

      However, after the corrections suggested, we believe our assertion is now justified. We show that in the mouse, removal of the cNRE is followed by a major reduction of atrial expression coupled to the release of a low, but quite clear level of expression in the ventricles, when compared to the transgenic mouse harboring the wild type SMyHC III promoter. Note that, as expected, the relative power of the cNRE to establish preferential atrial expression is higher in the mouse (a mammal) than it is in the zebrafish (a teleost), which is biologically sound, as mammals and avians are closer, phylogenetically, than teleosts and avians. Yet, the direction of change of expression in atria and ventricles was exactly as expected, if a given motif responsible for preferential atrial expression was removed (the cNRE in our case), that is: marked reduction in atrial expression and small (albeit clearly evident) release of ventricular expression. We believe that these directional changes observed in species separated by millions of years of independent evolution constitute very good biological evidence for the role of the cNRE in driving preferential atrial expression.

      Concerning the 5x fusion of cNREs, we chose to produce this multimer for safety purposes only, because we did not want to risk performing incomplete experiments and having to repeat them. However, more to the point, we later compared the efficiency of one (1) versus five (5) cNRE copies in a cell culture context and the results were not different.

      Secondly, the authors claim that the cNRE regulatory sequence arose from viral integration into the genomes of galliform species. While this is an attractive mechanism for explaining novel regulatory sequences, the evidence for this is based purely on sequence homology to viral genomes. And this single observation is not robust as the significance of the sequence matches does not appear to be adjusted for sequence matches expected by chance. The "evolutionary pathway" leading to the direction of chamber-specific expression in the heart as highlighted in the abstract has therefore not been demonstrated.

      We agree with the reviewer. Because of space constraints, we decided to omit a substantial part of our work from the initial submission of the manuscript. We now include the relevant data in the revised version. We thus mapped the phylogenetic origins of the SMyHC III family of slow myosins and then established how and when the cNREs became topologically associated with the SMyHC III gene. To do that, we repeat masked all available sequences from avian SMyHC III orthologs. As it will become clear below, the cNRE is a rare sequence, rather than a low complexity repeat. Our search for cNREs outside of the quail context (Coturnix coturnix) followed two independent lines. First, we took a scaled, evolution-oriented approach. Initially, we looked for cNREs in species close to the quail (i.e., Galliformes) and then progressively farther, to include derived (i.e., Passeriformes) and basal avians (i.e., Paleognaths) as well as external groups such as crocodilians. While pursuing this line of investigation, it became clear that the cNRE was a rare form of repetitive element, which showed a conserved topological relationship with the SMyHC III gene (i.e., cNREs flanked the SMyHC III genes at 5’ and 3’ regions). Using this topological relationship as a character, we determined when it appeared during avian evolution and then set out to establish the likely origins of this rare repetitive motif. This search for the origins of the cNRE entailed comparisons to databases of repetitive genome elements, until the extreme telomeric nature of the SMyHC III gene became evident. This finding directed us to the fact that the hexad nature of the cNRE is reminiscent of the hexameric character of telomeric direct repeats. 
Because direct telomeric repeats are exactly featured in the genomes of avian DNA viruses that can infect the germline and integrate into the avian genome, we focused our search for the cNRE on the members of the subfamily Alphaherpesvirinae (Morissette & Flamand, 2010). In this search, we utilized the human herpes simplex virus 1 (HSV1) as a general model for herpes viruses, and a set of four (4) members of the Alphaherpesvirinae family that specifically infect Galliformes (i.e., GaHV1, the virus responsible for avian infectious laryngotracheitis in chicken, GaHV2, the Marek’s disease virus, GaHV3, a non-pathogenic virus, and MeHV1, the non-pathogenic Meleagrid herpesvirus 1 capable of infecting chicken and wild turkey) (Waidner et al., 2009). The search for cNREs in Alphaherpesvirinae was successful. We found six (6) cNRE hits in HSV1, one (1) in GaHV1, and none in MeHV1, GaHV2, and GaHV3. Our evolution-directed approach thus led to the direct recognition that cNREs can be found in the genomes of a family of viruses that contain members that infect avians and integrate their double-stranded DNA into the host germline (Morissette & Flamand, 2010). Therefore, as a second independent approach, as pointed out by the reviewer, we set out to further extend this proof of concept by broadening our search to all known sequenced viruses and perform an unbiased, internally consistent, and quantitative analysis of cNRE presence in viral genomes, as already reported in the initial submission of this manuscript.

      Reviewer #3 (Public Review):

      Summary:

      In this manuscript Nunes Santos et al. use a combination of computational and experimental methods to identify and characterize a cis-regulatory element that mediates expression of the quail Slow Myosin Heavy Chain III (SMyHC III) gene in the heart (specifically in the atria). Previous studies had identified a cis-regulatory element that can drive expression of SMyHC III in the heart, but not specifically (solely) in the atria, suggesting additional regulatory elements are responsible for the specific expression of SMyHC III in the atria as opposed to other elements of the heart. To identify these elements Nunes Santos et al. first used a bioinformatic approach to identify potentially functional nuclear receptor binding sites ("Hexads") in the SMyHC III promoter; previous studies had already shown that two of these Hexads are important for SMyHC III promoter function. They identified a previously unknown third Hexad within the promoter, and propose that the combination of these three (called the complex Nuclear Receptor Element or cNRE) is necessary and sufficient for specific atrial expression of SMyHC III. Next, they use experimental methods to functionally characterize the cNRE including showing that the quail SMyHC III promoter can drive green fluorescent protein (GFP) expression in the atrium of developing zebrafish embryos and that the cNRE is necessary to drive the expression of the human alkaline phosphatase reporter gene (HAP) in transgenic mouse atria. Additional experiments show that the cNRE is a portable regulatory element that can drive atrial expression and demonstrate the importance of the three Hexad parts. These data demonstrating that the cNRE mediates atrial-specific expression are well-done and convincing. The authors also note the possibility that the cNRE might be derived from an endogenous viral element but further data are needed to support the hypothesis that the cNRE is of viral origin.

      Strengths:

      1) The experimental work demonstrating that the cNRE is a regulatory element that can mediate the atrial-specific expression of SMyHC III.

      We thank reviewer 3 for this thorough appreciation of our work and are pleased with the evaluation of our manuscript’s potential.

      Weaknesses:

      1) Justification for use of different regulatory elements in the zebrafish (32 bp cNRE) and the mouse transgenic assays (72 bp cNRE), and discussion of the impact of this difference on the results/interpretation.

      In general, throughout our work, we sought to obtain maximal coherence with previous studies (see references below) and to simultaneously probe cNRE function at an increased resolution. For that, we utilized previously described mutant SMyHC III promoter constructs (Wang et al., 1996, 1998) and introduced novel site-directed dinucleotide substitution mutants of individual Hexads in the SMyHC III promoter. Actually, the 72-bp construct is not a 72-bp construct. It is a 5’ deletion construct that removed 72 bp from the 840 bp wild type SMyHC III construct, transforming it into a 768-bp SMyHC III promoter construct. Any directional changes in cardiac expression observed with the 768-bp construct, as compared to the wild type promoter, were interpreted as reflecting the loss of regulators present in this 5’ 72 bp.

      Wang et al., 1996 and 1998 had already shown that Hexads A and B contained a functional VDRE/RARE, which acted as a ventricular repressor. Using the 768-bp SMyHC III promoter in mouse transgenic lines was thus a natural investigation step for us to evaluate whether regulation of the SMyHC III promoter in mice was similar to that in quail cardiac cultures. As shown in the manuscript, deletion of the 72 bp resulted in the release of a low level of expression in ventricles, consistent with the removal of a ventricular repressor (already described by Wang et al., 1996). It also showed a marked reduction in atrial transgene stimulation, suggesting the elimination of a very important atrial activator.

      In 1996, Wang and colleagues mapped an atrial activator to the sequence interval of 160 bp, between -840 and -680 bp (Wang et al., 1996). In our mouse transgenics, we reduced this interval to a mere 72 bp, between -840 and -768 bp. This was very useful information. Wang et al., 1998 showed that HF-1a, M-CAT, and E-box sites located between -840 and -808 bp did not influence atrial expression, so now we had a potential interval of only 40 bp between -808 and -768 bp. Further, our transgenic mice indicated that the GATA site located 3’ from Hexads A, B, and C (GATA site changed to a Sal I site at positions -749 to -743 bp) did not work as a general activator, as in the quail. Thus, the only good candidate for the atrial activator in mice inside the 40-bp fragment between -808 and -768 bp was the cNRE, with its three Hexads, A, B and the novel Hexad C. Because Hexads A plus B composed a functional VDRE/RARE that played a role in ventricular repression in the quail, we hypothesized that the atrial activator would be present in Hexad C. We then mutated the first two purines in Hexad C (the most important ones for nuclear receptor binding, please refer to Umesono et al., Cell, 1991 65: 1255-1266 for a review; Mader et al., J Biol Chem, 1993 268:591-600 for a mutation study; Rastinejad et al., EMBO J., 2000 19:1045-1054 for a crystallographic study as well as additional references listed below) and performed the experiments that demonstrated a profound reduction in atrial expression in the mouse context, revealing the long-sought atrial activator.

      Mader, S., Chen, J. Y., Chen, Z., White, J., Chambon, P., and Gronemeyer, H. (1993). The patterns of binding of RAR, RXR and TR homo- and heterodimers to direct repeats are dictated by the binding specificites of the DNA binding domains. EMBO J. 12, 5029-5041.

      Ribeiro, R. C., Apriletti, J. W., Yen, P.M., Chin, W. W., and Baxter, J. D. (1994). Heterodimerization and deoxyribonucleic acid-binding properties of a retinoid X receptor-related factor. Endocrinology 135, 2076-2085.

      Wang, G. F., Nikovits, W., Schleinitz, M., and Stockdale, F. E. (1996). Atrial chamber-specific expression of the slow myosin heavy chain 3 gene in the embryonic heart. J. Biol. Chem. 271, 19836-19845.

      Wang, G. F., Nikovits, W. Jr., Schleinitz, M., and Stockdale, F. E. (1998). A positive GATA element and a negative vitamin D receptor-like element control atrial chamber-specific expression of a slow myosin heavy-chain gene during cardiac morphogenesis. Mol. Cell Biol. 18, 6023-6034.

      Zhao, Q., Chasse, S. A., Devarakonda, S., Sierk, M. L., Ahvazi, B., and Rastinejad, F. (2000). Structural basis of RXR-DNA interactions. J. Mol. Biol. 296, 509-520.

      Shaffer, P. L. and Gewirth, D. T. (2002). Structural basis of VDR-DNA interactions on direct repeat response elements. EMBO J. 21, 2242-2252.

      2) Is the cNRE really "necessary and sufficient"? I define necessary and sufficient in this context as a regulatory element that fully recapitulates the expression of the target gene, so if the cNRE was "necessary and sufficient" to direct the appropriate expression of SMyHC III it should be able to drive expression of a reporter gene solely in the atria. While deletion of the cNRE does reduce expression of the reporter gene in atria it is not completely lost nor converted from atrial to ventricular expression (as I understand the study design would suggest should be the effect), similarly fusion of 5 repeats of the cNRE induces expression of a ventricular gene in the atria but also does not convert expression from ventricle to atria. This doesn't seem to satisfy the requirements of a "necessary and sufficient" condition. Perhaps a discussion of why the expectations for "necessary and sufficient" are not met but are still consistent would be beneficial here.

      We agree with your reasoning. Our description of the cNRE was perhaps too strong, and we have toned it down accordingly in the revised manuscript to incorporate a much more equilibrated concept that the cNRE biases cardiac expression towards a model of preferential atrial expression. After these corrections, we believe our novel assertion is justified. We show that in the mouse, removal of the cNRE is followed by a major reduction of atrial expression coupled to the release of a low, but quite clear level of expression in the ventricles, when compared to the transgenic mouse harboring the wild type SMyHC III promoter. Note that, as expected, the relative power of the cNRE to establish preferential atrial expression is higher in the mouse (a mammal) than it is in the zebrafish (a teleost), which is biologically sound, as mammals and avians are closer, phylogenetically, than teleosts and avians. Yet, the direction of change of expression in atria and ventricles was exactly as expected, if a given motif responsible for preferential atrial expression was removed (the cNRE in our case), that is: marked reduction in atrial expression and small (albeit evident) release of ventricular expression. We believe that these directional changes observed in species separated by millions of years of independent evolution constitute very good biological evidence for the role of the cNRE in driving preferential atrial expression.

      3) The claim that the cNRE is derived from a viral integration is not supported by the data. Specifically, the cNRE has sequence similarity to some viral genomes, but this need not be because of homology and can also be because of chance or convergence. Indeed, the region of the chicken genome with the cNRE does have repetitive elements but these are simple sequence repeats, such as (CTCTATGGGG)n and (ACCCATAGAG)n, and a G-rich low complexity region, rather than viral elements; the same is true for the turkey genome. These data indicate that the cNRE is not derived from an endogenous virus but is a repetitive and low complexity region. Such regions are expected to occur more frequently than larger and more complex regions, which would cause the BLAST E value to decrease and appear "significant", but this is entirely expected because short alignments can have high E values by chance. (Also note that E values do not indicate statistical significance; rather, they are the number of hits one can "expect" to see by chance when searching a specific database.)

      We do understand the criticism, but we would like to advance another concept, based on a series of results that we obtained using bioinformatics-oriented and evolution-oriented analyses. We performed a cNRE scan in the Gallus gallus genome (galGal5), using varying numbers of nucleotide mismatches. When we searched the galGal5 genome for the coordinates of cNREs using matchPattern with up to 8 mismatches, only thirty-one (31) and thirty-four (34) hits were found in the 5’ and 3’ strands, respectively. This indicates that a cNRE match is a rather uncommon finding in the Gallus gallus genome.
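      The kind of scan described above (counting occurrences of a short motif on both strands while tolerating a fixed number of mismatches) can be illustrated with a minimal sketch. This is not the pipeline used for the analysis, which relied on matchPattern; the function names, the toy motif and the toy sequence below are all hypothetical, and only substitutions (no insertions or deletions) are counted as mismatches.

```python
# Illustrative sketch only: brute-force motif scan with a mismatch budget.
# Names, motif and sequence are hypothetical, not taken from the manuscript.

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def mismatch_hits(pattern, text, max_mismatch):
    """Return (start, n_mismatches) for every window of `text` that differs
    from `pattern` by at most `max_mismatch` substitutions (0-based starts)."""
    pattern, text = pattern.upper(), text.upper()
    width = len(pattern)
    hits = []
    for start in range(len(text) - width + 1):
        mismatches = 0
        for p, t in zip(pattern, text[start:start + width]):
            if p != t:
                mismatches += 1
                if mismatches > max_mismatch:
                    break  # budget exceeded, abandon this window
        else:
            hits.append((start, mismatches))
    return hits


def scan_both_strands(pattern, text, max_mismatch):
    """Scan the given strand with the motif and with its reverse complement,
    so that hits on the opposite strand are reported as well."""
    return {
        "+": mismatch_hits(pattern, text, max_mismatch),
        "-": mismatch_hits(reverse_complement(pattern), text, max_mismatch),
    }


if __name__ == "__main__":
    toy_motif = "ACGTAC"                       # hypothetical 6-mer
    toy_sequence = "TTACGTACTTACGAACTTGTACGTTT"  # hypothetical sequence
    for k in (0, 1):
        print(k, scan_both_strands(toy_motif, toy_sequence, k))
```

      Repeating such a scan for increasing mismatch budgets gives the hits-versus-mismatches profile discussed in the next paragraph. A brute-force loop like this is only practical for short sequences; genome-scale scans need an indexed or vectorized implementation.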

      A more systematic profiling of genome occurrence versus nucleotide mismatch indicated that a significant upward inflexion in the relationship between number of cNRE hits and divergence from the original cNRE version (Coturnix coturnix) is recorded only at 12 mismatches or greater. At 8 mismatches, the total number of cNREs on each DNA strand varied little among all avian species examined, remaining close to the average (31 +/- 2.2 cNREs for the 5’ strand, range 17-48; 34 +/- 3.3 for the 3’ strand, range 14-64). Consistent with the idea that the cNRE is a specific regulatory motif, rather than an abundant, low complexity sequence, there are only two cNRE occurrences in chromosome 19, which harbors AMHC1, the Gallus gallus ortholog of the Coturnix coturnix SMyHC III gene.
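      One way to put a hits-versus-mismatches profile in context is to compare it with the number of matches expected by chance. The sketch below is an assumption-laden illustration rather than an analysis from the manuscript: it treats the genome as a random sequence with independent, equally frequent bases (an assumption that real genomes violate), and the function names are hypothetical. Under that model the number of mismatches in a window follows a binomial distribution, so the probability that a window of length L matches with at most k mismatches is the sum over i = 0..k of C(L, i) (3/4)^i (1/4)^(L-i).

```python
# Illustrative sketch only: chance expectation for approximate motif matches
# under a uniform, independent-base model (a simplifying assumption).
from math import comb


def p_window_match(length, max_mismatch, p_base=0.25):
    """Probability that a random window matches a fixed motif of `length`
    bases with at most `max_mismatch` substitutions."""
    return sum(
        comb(length, i) * (1 - p_base) ** i * p_base ** (length - i)
        for i in range(max_mismatch + 1)
    )


def expected_chance_hits(sequence_length, length, max_mismatch, strands=1):
    """Expected number of chance hits: windows scanned times the per-window
    match probability, optionally summed over both strands."""
    windows = max(sequence_length - length + 1, 0)
    return strands * windows * p_window_match(length, max_mismatch)


if __name__ == "__main__":
    # Per-window probabilities for a 32-base motif at several mismatch budgets.
    for k in (0, 4, 8, 12):
        print(k, p_window_match(32, k))
```

      Because the per-window probability grows steeply with the mismatch budget, this kind of null model predicts a sharp rise in chance hits beyond some threshold. Whether that threshold coincides with the observed inflexion depends on the real base composition and repeat content, which this sketch does not capture.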

      Figure 1: Number of cNRE hits to galGal5 according to maximum mismatches allowed: the cNRE is not an abundant low complexity sequence, but rather a rare repetitive sequence with a clear cutoff level of mismatches allowed. Consistent with this, there are only two (2) cNRE sequences in chromosome 19, the chromosome that contains the AMHC1 gene (the chicken ortholog of the quail SMyHC III gene).

      ## [1] chr19 [16510, 16541] * | 5’-CAAGGACAAAGAGGGGACAAAGAGGCGGAGGT-3’
      ## [2] chr19 [32821, 32852] * | 5’-CAAGGACAAAGAGTGGACAAAGAGGCAGACGT-3’

      In the evolutionary strategy, which we now include, we first mapped the phylogenetic origins of the SMyHC III family of slow myosins and then established how and when the cNREs became topologically associated with the SMyHC III gene. To do that we repeat masked all available sequences from avian SMyHC III orthologs. As it will become clear below, the cNRE is a rare sequence, rather than a low complexity repeat. Our search for cNREs outside of the quail context (Coturnix coturnix) followed two independent lines. First, we took a scaled, evolution-oriented approach. Initially, we looked for cNREs in species close to the quail (i.e., Galliformes) and then progressively farther, to include derived (i.e., Passeriformes) and basal avians (i.e., Paleognaths) as well as external groups such as crocodilians. While pursuing this line of investigation, it became clear that the cNRE was a rare form of repetitive element, which showed a conserved topological relationship with the SMyHC III gene (i.e., cNREs flanked the SMyHC III genes at 5’ and 3’ regions). Using this topological relationship as a character, we determined when it appeared during avian evolution, and then set out to establish the likely origins of this rare repetitive motif. This search for the origins of the cNRE entailed comparisons to databases of repetitive genome elements, until the extreme telomeric nature of the SMyHC III gene became evident. This finding directed us to the fact that the hexad nature of the cNRE is reminiscent of the hexameric character of telomeric direct repeats. Because direct telomeric repeats are exactly featured in the genomes of avian DNA viruses that can infect the germline and integrate into the avian genome (Morissette & Flamand, 2010), we focused our search for the cNRE on the members of the subfamily Alphaherpesvirinae. 
In this search, we utilized the human herpes simplex virus 1 (HSV1) as a general model for herpes viruses and a set of four (4) members of the Alphaherpesvirinae family that specifically infect Galliformes (i.e., GaHV1, the virus responsible for avian infectious laryngotracheitis in chickens, GaHV2, the Marek’s disease virus, GaHV3, a non-pathogenic virus and MeHV1, the non-pathogenic Meleagrid herpesvirus 1 capable of infecting chicken and wild turkey) (Waidner et al., 2009). The search for cNREs in Alphaherpesvirinae was successful. We found six (6) cNRE hits in HSV1 and one (1) cNRE was detected in GaHV1, but none in MeHV1, GaHV2, and GaHV3.

      Our evolution-directed approach thus led to the direct recognition that cNREs up to a cutoff mismatch value of 11 can be found in the genomes of a family of viruses that contain members that infect avians and integrate their double-stranded DNA into the host germline. Therefore, as a second independent approach, we set out to extend this proof of concept by broadening our search to all known sequenced viruses to perform an unbiased, internally consistent, and quantitative analysis of cNRE presence in viral genomes, as already reported in the initial submission of this manuscript.

    1. Author Response:

      Reviewer #1 (Public Review):

      In this study, Kuppan, Mitrovich, and Vahey investigated the impact of antibody specificity and virus morphology on complement activation by human respiratory syncytial virus (RSV). By quantifying the deposition of components of the complement system on RSV particles using high-resolution fluorescence microscopy, they found that antibodies that bind towards the apex of the RSV F protein in either the pre- or post-fusion conformation activated complement most efficiently. Additionally, complement deposition was biased towards globular RSV particles, which were frequently enriched in F in the post-fusion conformation compared to filamentous particles on which F exists predominantly in the pre-fusion conformation.

      Strengths:

      1) While many previous studies have examined the properties of antibodies that impact Fc-mediated effector functions, this study offers a conceptual advance in its demonstration that heterogeneity in virus particle morphology impacts complement activation. This novel finding will motivate further research on this topic both in the context of RSV and other viral infections.

      2) The use of site-specific labeling of viral proteins and high-resolution fluorescence microscopy represents a technical advance in monitoring interactions among different components of antiviral immune responses at the level of single virus particles.

      3) The paper is well written, data are clearly presented and support key claims of the paper with caveats appropriately acknowledged.

      We appreciate the reviewer’s supportive comments. In our revised manuscript, we have focused on improving clarity regarding the minor weaknesses noted below.

      Minor weaknesses:

      Working models and their implications could be clarified and extended. Specifically:

      1) The finding that globular particles enriched in F proteins in the post-fusion conformation (Fig 3F) are dominant targets of complement activation as measured by C3 deposition by not only post-F- but also pre-F-specific antibodies (Fig 4B, left) is interesting. This is despite the fact that, as expected, pre-F antibodies bind less efficiently to globular particles (Fig 4B, right). How do the authors reconcile these observations, given that C3 deposition seems to be IgG-concentration-dependent (Fig 2E)?

      The reviewer raises an excellent point: globular particles, which accumulate as the virus ages, contain more post-F and less pre-F than particles that have recently been shed from infected cells. These ‘aged’ particles nonetheless accumulate more C3 when incubated with pre-F mAbs than ‘younger’ particles, where the proportion of pre-F is higher. We attribute this to the lower surface curvature of globular particles: they accumulate more C3 in the presence of pre-F mAbs in spite of the reduced availability of pre-F epitopes. Figure 1C and 1F help to support this point. This data shows C3 deposition driven by different antibodies bound to particles enriched in either pre-F (Figure 1C) or post-F (Figure 1F). Importantly, for this experiment the conversion to post-F was driven in such a way that virion morphology is preserved (Figure 1E). In this case, we see a clear reduction in C3 deposition by pre-F mAbs on post-F particles (e.g. for CR9501, the percentage of C3-positive particles drops from 24% on pre-F virus to 6% on post-F-enriched virus). This demonstrates that, in the absence of other changes, conversion of pre-F to post-F reduces complement deposition by pre-F specific mAbs.

      Similarly, the reviewer correctly points out that reduced levels of antibody binding lead to lower levels of C3 deposition (Figure 2E); however, as in Figure 1, this data is collected from particles with the same morphologies. Thus, in the absence of additional factors, reduction in mAbs bound to pre-F leads to a reduction in C3 deposition driven by these mAbs. The fact that we observe the opposite trend when changes in particle morphology accompany changes in post-F abundance points to an important role for particle shape in activation of the classical pathway.

      2) Based on data in Figure 5-figure supplement 2, the authors argue that "large viruses are poised to evade complement activation when they emerge from cells as highly-curved filaments, but become substantially more susceptible as they age or their morphology is physically disrupted." Could the increase in C3 deposition be alternatively explained by a higher density of F proteins on larger particles instead of / in addition to a larger potential decrease in membrane curvature?

      We agree that the density of F on a virus (the number of F trimers per unit surface area) likely contributes to the efficiency of C3 deposition. In Figure 6 – figure supplement 2 (Figure 5 – figure supplement 2 in the original submission), we control for this potential effect by comparing viruses that have the same amount of F (as measured by fluorescence intensities of SrtA-labeled F) that are either in filamentous form or globular form (induced through osmotic swelling). The total amount of F per virus is preserved during swelling, and the membrane surface area will remain constant due to the limited ability of lipid bilayers to stretch [7]. As a result, the input material for these comparisons is the same in terms of F trimers per unit area, yet the C3:F ratio differs substantially. This leads us to conclude that the differences must be attributable to factors other than the density of F. Importantly, this does not mean that the amount of F per unit surface area does not matter for C3 deposition – only that this is not the effect we are observing here. We have added text (Line 299) to help clarify this point: “This effect is unlikely to arise due to changes in the abundance or density of F in the viral membrane, both of which will remain constant following swelling. Similarly, it does not appear to be purely related to size, as larger viral filaments show similar C3:F ratios as smaller viral filaments.”

      3) In the discussion, the authors acknowledge that the implications based on the findings are speculative. However, more clarity on the basis of these speculative models would be useful. For example, it is not clear how the findings directly inform the presented model of immunodominance hierarchies in infants.

      We agree that this was unclear in the original manuscript. We have rewritten paragraph 4 of the Discussion to clarify how our results may contribute to the changes in immunodominance that have been observed in RSV between infants and adults.

      Reviewer #2 (Public Review):

      This is an intriguing study that investigates the role of virus particle morphology on the ability of the first few components in the complement pathway to bind and opsonize RSV virions. The authors use primarily fluorescence microscopy with fluorescently tagged F proteins and fluorescently labeled antibodies and complement proteins (C3 and C4). They observed that antibodies against different epitopes exhibited different abilities to induce C3 binding, with a trend reflecting positioning of IgG Fc more distal to the viral membrane resulting in better complement "activation". They also compared the ability of C3 to deposit on virus produced from cells +/- CD55, which inhibits opsonization, and showed knockout led to greater C3 binding, indicating a role for this complement "defense protein" in RSV opsonization. They also examined kinetics of complement protein deposition (probed by C4 binding) to globular vs filamentous particles, observing that deposition occurred more rapidly to non-filaments.

      A better understanding of complement activation in response to viruses can lead to a more comprehensive understanding of the immune response to antigen both beneficial and detrimental, when dysfunctional, during infection as well as mechanisms of combating the viral infection. The study provides new mechanistic information for understanding the properties of an enveloped virus that can influence complement activation, at least in an in vitro setting. It remains to be determined whether these effects manifest in the considerably more complex setting of natural infection or even in the presence of a polyclonal antibody mixture.

      The studies are elegantly designed and carefully executed with reasonable checks for reproducibility and controls, which is important especially in a relatively complex and heterogeneous experimental system.

      We thank the reviewer for the insightful comments. We have revised the manuscript to help to clarify points of confusion and to address some of the technical points raised here.

      Specific points:

      1) "Complement activation" involves much more than C3 or C4 binding. Better to use more specific terminology relating to the observable (i.e. fluorescently labeled complement component binding)

      We agree with the reviewer. We have revised the manuscript throughout to make our language more accurate and precise.

      2) What is the rationalization for concentrations of antibodies used? What range was tested, and how dependent on antibody concentration were the observed complement deposition trends? How do they relate to physiological concentrations, and how would the presence of a more complex polyclonal response that is typically present (e.g. as the authors noted, the serum prior to antibody depletion already mediates complement activation) affect the complement activation trends? The neat, uniform display of Fc for monoclonals that were tested is likely to be quite garbled in more natural antibody response situations. This should be discussed.

      We have added discussion of antibody concentrations and possible differences between monoclonal and polyclonal responses to the revised manuscript. Below, we address the specific questions raised here by the reviewer.

      We chose to use antibody concentrations that are comparable to the concentrations of dominant clonotypes in post-vaccination serum (ref. 1). Our goal in selecting relatively high antibody concentrations for our experiments was to focus on understanding the capacity of an antibody to drive complement deposition when it has reached maximum densities on RSV particles. This is discussed starting on Line 125 of Results, and in paragraph 2 of Discussion. Experiments testing a range of antibody concentrations would be valuable, but are likely to strongly reflect differences in the binding affinities of these antibodies, which have been characterized previously.

      Although we have not performed titrations for each of the antibodies tested due to the large number of conditions needed and the limited throughput of our experimental approach, the manuscript does present a dilution series for CR9501, the IgG1 mAb with the greatest potency in driving C3 deposition among those tested here. This data (shown in Figure 3E & F in the revised manuscript) shows that as the amount of antibody added in solution decreases over a 16-fold range, C3 deposition decreases as well. The decrease in C3 deposition is roughly commensurate with the reduction in antibody binding, reaching levels that are just above background at an antibody concentration of ~0.6μg/ml (1:800 dilution). We think it is likely that other activating antibodies would show similar trends, while antibodies that do not activate the classical pathway at saturating concentrations would be unlikely to do so across a range of lower concentrations.
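      As a back-of-the-envelope check on the numbers quoted above, the sketch below works backwards from the two values given in the response (~0.6 μg/ml at a 1:800 dilution, and a 16-fold range) to the rest of a dilution series. The two-fold step size and the function name `dilution_series` are assumptions made for illustration; the actual steps used for Figure 3E & F are not stated here.

```python
# Illustrative dilution arithmetic only; not taken from the manuscript's methods.
# From the text: a 1:800 dilution corresponds to ~0.6 ug/ml, and the series
# spans a 16-fold range. Two-fold steps are an ASSUMPTION.

def dilution_series(final_conc_ug_ml, final_dilution, fold_range, step=2):
    """Return (dilution_factor, concentration) pairs, most concentrated first."""
    # Concentration implied for the undiluted antibody solution.
    stock = final_conc_ug_ml * final_dilution
    series = []
    # Least dilute point of the series (e.g. 1:50 for 1:800 over a 16-fold range).
    dilution = final_dilution / fold_range
    while dilution <= final_dilution:
        series.append((dilution, stock / dilution))
        dilution *= step
    return series


series = dilution_series(0.6, 800, 16)
for dilution, conc in series:
    print(f"1:{dilution:g} -> {conc:.2f} ug/ml")
```

      Under these assumptions, the most concentrated point of the series would sit roughly 16-fold above the ~0.6 μg/ml endpoint.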

      We agree with the reviewer that complement deposition driven by polyclonal antibodies is more complex than the monoclonal responses studied here. As discussed in paragraph 2 of our revised Discussion, one effect that polyclonal serum might have is to increase the density of Fcs on the virus by providing antibody mixtures that bind to multiple non-overlapping antigenic sites. We speculate that this would generally increase complement deposition, provided that sufficient antibodies are present that bind to productive antigenic sites (e.g. sites Ø, II, and V).

      Finally, we note that we observe a similar phenomenon where globular particles are preferentially opsonized with C3 in our experiments with polyclonal serum where IgG and IgM have not been depleted (Figure R1). The major limitation of this data – which is resolved by using monoclonal antibodies – is the difficulty of determining to what extent this bias arises due to the epitopes targeted by the polyclonal serum versus the intrinsic sensitivity of the virus particles.

      Figure R1: RSV opsonized with polyclonal human serum. A similar bias towards globular particles (white dashed circles) is observed as in experiments with monoclonal antibodies.

      3) Are there artifacts or caveats resulting from immobilization of virus particles on the coverslips?

      As pointed out by the reviewer, a few possible artifacts or caveats could arise due to the immobilization of viruses on coverslips. These include (1) spurious binding of C1 or other complement components to the immobilizing antibody (3D3); (2) reduced access to viral antigens as a result of immobilization; and (3) inhibition of antibody-induced viral aggregation. We are able to rule out issues associated with (1), because we do not see attachment of C1 or C3 to the coverslip (i.e. outside regions occupied by virus particles). This is consistent with the fact that the antibodies are immobilized on the surface via a C-terminal biotin attached to the heavy chain, which would limit access for C1 binding and prevent the formation of Fc hexamers.

      Immobilization on coverslips could reduce the accessibility of a portion of the virus for binding by antibodies and complement proteins. This could effectively shield a portion of the viral surface from assembly of an activating complex, which we estimate requires ~35nm of clearance above the targeted epitope on F (ref. 8). Importantly, the fraction of the viral surface area that would be shielded would vary for filaments and spheres; to determine if this could influence our results, we calculated the expected magnitude of this effect (Figure R2). To do this, we modeled the virus as being tethered to the surface via a 25nm linkage. This accounts for the length of the biotinylated PEG (~5-15nm for PEG2K, depending on the degree of extension), streptavidin (~5nm), and the anti-G antibody (~10-15nm including the biotinylated C-terminal linker). Although limited structural information is available for RSV G, the ~100 residue, heavily glycosylated region between the viral membrane and the 3D3 epitope likely extends above the height of F (~12nm). Our model assumes that a shell of thickness d surrounding the virus is necessary for antibody-C1 complexes to fit without clashing with the surface (this shell is shaded in gray in the schematic from Figure R2). Tracing the angles at which this shell clashes with the coverslip allows us to calculate the fraction of total surface area that is inaccessible for activation of the classical pathway. The results are plotted on the right side of Figure R2. The relative surface area accessible to a 35nm activating antibody-C1 complex differs between a filament and a sphere of equivalent surface area by about 15%. We conclude that this difference is modest compared to the ~5-fold difference in deposition kinetics we observe between viral filaments and spheres (Figure 4), or the 3- to 10-fold difference in relative C3 deposition we observe on larger filamentous particles after conversion to spheres (Figure 6 – figure supplement 2C).

      Finally, by performing experiments on immobilized viruses, we eliminate the possibility of antibody-dependent particle aggregation. While this was necessary for us to get interpretable results, the formation of viral aggregates could affect the dynamics and extent of complement deposition. For example, activation of the classical pathway on one particle in an aggregate could spread to non-activating particles through a “bystander effect”, as has been reported in other contexts (ref. 9). We are interested in this question and have begun preliminary experiments in this direction; however, we believe that a definitive answer is outside the scope of this current work. To alert readers to this consideration, we have added this to paragraph 2 of the revised Discussion (Line 359).

      Figure R2: Estimating the surface accessibility of RSV particles bound to coverslips. Definition of variables: af: radius of cylindrical RSV filament; as: radius of spherical RSV particle of equivalent surface area (see Figure 6 – figure supplement 2A); d: distance needed above the viral surface to accommodate IgG-C1 activating complexes; h: height of viral surface above the coverslip; L: length of the viral filament.
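      The geometric argument behind Figure R2 can be sketched in a few lines. The code below is a minimal reconstruction under stated assumptions, not the calculation used for the figure: it treats the filament as a cylinder lying parallel to the coverslip, ignores the filament end caps, and matches surface areas via 4π·as² = 2π·af·L. The symbols af, as, d, h and L follow the caption; the function names and the example filament dimensions (af = 50 nm, L = 1000 nm) are hypothetical, while d = 35 nm and h = 25 nm are the values quoted in the response.

```python
import math


def blocked_half_angle(a, d, h):
    """Half-angle (radians), measured from the direction facing the coverslip,
    within which a shell of thickness d around a particle of radius a would
    clash with the coverslip. h is the gap between particle surface and coverslip."""
    c = (h + a) / (a + d)
    # If the shell is no thicker than the gap (d <= h), nothing clashes.
    return 0.0 if c >= 1.0 else math.acos(c)


def accessible_fraction_sphere(a_s, d, h):
    """Fraction of a sphere's surface outside the blocked spherical cap."""
    theta = blocked_half_angle(a_s, d, h)
    return 1.0 - (1.0 - math.cos(theta)) / 2.0


def accessible_fraction_filament(a_f, d, h):
    """Fraction of a cylinder's side surface outside the blocked strip
    (cylinder axis parallel to the coverslip; end caps ignored)."""
    theta = blocked_half_angle(a_f, d, h)
    return 1.0 - theta / math.pi


def equivalent_sphere_radius(a_f, L):
    """Radius of a sphere with the same area as the cylinder side: 4*pi*a_s^2 = 2*pi*a_f*L."""
    return math.sqrt(a_f * L / 2.0)


# d and h are quoted in the response; a_f and L are HYPOTHETICAL example values.
d, h = 35.0, 25.0
a_f, L = 50.0, 1000.0
a_s = equivalent_sphere_radius(a_f, L)
print(f"filament accessible fraction: {accessible_fraction_filament(a_f, d, h):.3f}")
print(f"sphere accessible fraction:   {accessible_fraction_sphere(a_s, d, h):.3f}")
```

      In this simplified picture the sphere always exposes a larger fraction of its surface than the equivalent filament, because its larger radius makes the blocked region a smaller share of the total area; the size of the gap depends on the particle dimensions chosen.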

      4) How is the "density of antigen" quantitated? What fraction of F or G is labeled? For fluorescence intensity measurements in general, how did the authors ensure their detection was in a linear sensitivity range for the detectors for the various fluorescent channels? Since quantitation of fluorescence intensities is important in this study, some discussion in methods would be valuable.

      We have performed this important additional characterization of our fluorescence system and our overall labeling and quantification strategy to address these concerns. The results of this characterization are now included in two new figure supplements in the revised manuscript (Figure 1 – figure supplements 2 & 3).

      5) The authors also show that the particle morphology, whether globular or filamentous, as well as relative size and resulting apparent curvature, correlate with ability of C3 to bind. Some link to the abundance of post-fusion F (post-F) is examined and discussed, but I found the back and forth discussion between morphology, C3 binding, and post-F abundance to be confusing and in need of clarification and streamlining. Is there a mechanistic link between morphology changes and post-F level increases? Are the two linked or coincidental (for example does pre-F interaction with matrix help stabilize that conformation, and if lost lead to spontaneous conversion to post-F?). Please clarify.

      We have separated the discussion of pre-F versus post-F abundance and particle morphology into two different sections in Results, and we have rearranged Figures 4 and 5 (Figures 3 and 4 in the original submission) to improve clarity.

      Regarding the question of whether changes in morphology and the pre-F to post-F conversion are coincidental or mechanistically linked: the answer is not entirely clear, although we have collected new data that suggests a connection. We first want to note that the two effects are at least partly separable: brief treatment with a low osmolarity solution causes particle shape to change while preserving pre-F (Figure 6A & B), whereas treating with an osmotically balanced solution with low ionic strength converts pre-F to post-F without affecting virus shape (Figure 1E). However, we were motivated by the reviewer’s questions to look into this further. To determine if the change in viral shape may serve to destabilize the pre-F conformation over time, we compared the relative amounts of pre-F and post-F present in particles that were osmotically swollen to those that were not at 0h and at 24h. In these experiments, particles were swollen using a brief (~1 minute) exposure to low osmolarity conditions before returning them to PBS (Figure R3, left). As expected, we observe no immediate change in pre-F abundance following the brief osmotic shock (Figure R3, right: 0h time point), consistent with Figure 6B. After incubating the particles an additional 24h at 37 °C, the post-F-to-pre-F ratio is ~3.5-fold higher in osmotically-swollen particles than in those where filamentous morphology was initially preserved (Figure R3, right: 24h time point). This supports the reviewer’s suggestion that interactions with the matrix may help to stabilize F in the prefusion conformation, since the conversion to post-F is faster when this interaction is disrupted. Whether or not this has any relevance for RSV entry into cells remains to be determined; however, it is worth noting that we observed no clear loss or gain of infectivity in RSV particles following osmotic swelling (Figure 6 – figure supplement 1A). Since this result may be of interest to readers, we have included this new data in Figure 6 – figure supplement 1B, and it is discussed briefly in Results (Line 250).

      Figure R3: Determining stability of pre-F following matrix detachment. Left: Experimental design. Right: Comparison of pre-F stability on untreated particles (gray) and particles subjected to brief osmotic swelling (magenta). Distributions show the ratio of post-F (ADI-14353) to pre-F (5C4) intensities per particle, combined for four biological replicates, sampled at 0h (immediately after swelling) and after an additional incubation at 37 °C for 24h. Black points show median values for each individual replicate. P-values are determined from a two-sample t-test.

      6) Since their conclusion is that curvature of the virus surface is a major influence on the ability of complement proteins to bind, I feel that some effort at modeling this effect based upon known structures is warranted. One might also anticipate then that there would be some epitope-dependent effect as a result of changes in curvature that may lead to an exaggeration of the epitope-specific effects for more highly curved particles perhaps than those with lower curvature? Is this true?

      The reviewer raises two excellent points: that it may be possible to gain insight into the mechanisms through which curvature dictates C1 binding and other aspects of complement activation through structural modeling, and that such a model may help to identify specific epitope effects that could contribute to curvature dependence.

      We developed simulations based on the geometry of RSV, F, and hexameric IgG to try to better understand how curvature may influence initiation of the classical pathway. This model is described in the Methods section (Modeling IgG hexamers on curved surfaces), and the results are discussed in the final two paragraphs of the Results section. In addition, we have included a new figure (Figure 7) to summarize the model’s predictions. This model corroborates the curvature sensitivity of IgG hexamer formation and suggests a possible intuitive explanation for our findings: high curvature effectively increases the distance between epitopes that sit high above the viral membrane, decreasing the likelihood of hexamer formation (Figure 7D). Regarding epitope specific effects, this model suggests that the further the epitope is above the viral membrane, the greater the effect that decreasing curvature will have. However, we find that epitopes closer to the membrane (e.g. those bound by 101F or ADI-19425) are overall very inefficient at activating the classical pathway, potentially due to steric obstruction of the formation of IgG hexamers. Thus, there may be an inherent tradeoff between overcoming steric obstruction (by binding to epitopes distal to the membrane) and sensitivity to surface curvature.
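      The intuition that curvature spreads apart epitopes sitting high above the membrane can be illustrated with elementary geometry. The sketch below is not the simulation described in the Methods; it only computes the straight-line distance between two epitopes whose membrane anchors are a fixed arc length apart on a surface with radius of curvature R. The function name `epitope_spacing`, the 10 nm anchor spacing and the two radii are hypothetical; the ~12 nm epitope height is taken from the height of F quoted earlier in this response.

```python
import math


def epitope_spacing(s, e, R):
    """Straight-line distance (nm) between two epitopes at height e above a
    membrane with radius of curvature R, whose anchors are an arc length s apart.
    R = math.inf corresponds to a flat membrane."""
    if math.isinf(R):
        return s
    angle = s / R  # angle subtended by the two anchors at the center of curvature
    return 2.0 * (R + e) * math.sin(angle / 2.0)


# HYPOTHETICAL anchor spacing and radii; e = 12 nm approximates the height of F.
s, e = 10.0, 12.0
for R in (math.inf, 500.0, 50.0):
    print(f"R = {R:>6} nm -> epitope spacing {epitope_spacing(s, e, R):.2f} nm")
```

      For a fixed anchor spacing, the separation grows as R shrinks and as the epitope sits further from the membrane, which is the qualitative trend described above.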

      It is important to note that this model is reductionist and does not include detailed structural information. Additional factors may be important for considering epitope-specific effects. For example, antibodies that bind equatorially on F (e.g. ADI-19425, which binds to antigenic site III), show minimal complement deposition in our experiments. However, particles whose radius of curvature approaches the diameter of hexameric IgG or IgM (~20nm) may display these epitopes in a manner that is more accessible. If the curvature necessary to observe such an effect falls outside of the biologically accessible range, it would not be observable in our experiments. Nonetheless, it is possible that a different set of antibodies may drive complement deposition on highly-curved nanoparticle vaccines that are in development (ref. 10). We have added this important point to the second paragraph of the Discussion.

      7) Line 265: it would be useful to confirm the increase C1 binding as a function of morphology as was done for antibody-angle of binding experiments.

      We believe that this data is shown in Figure 6B (Figure 5B in the original manuscript).

      Reviewer #3 (Public Review):

      Overall the manuscript is clearly written and the data are displayed well, with helpful diagrams in the figures to illustrate assays and RSV F epitopes. The engineering of the RSV strain to include a fluorescent reporter and tags on F and G that serve as substrates for fluorophore attachment is impressive and is a strength. The RSV literature is well cited and the interpretation of the results is consistent with structure/function data on RSV F and its interaction with antibodies. This reviewer is not an expert on the experiments performed in this manuscript, but they appear to be rigorously performed with appropriate controls. As such, the conclusions are justified by the data. One weakness is the extent to which the results regarding virion morphology are biologically relevant. Non-filamentous forms of the virion are generally obtained only in vitro as a result of virion purification or biochemical treatment. However, these results may be relevant for certain vaccine candidates, including the failed formalin-inactivated RSV vaccine that was evaluated in the late 1960s and caused vaccine-enhanced disease upon natural RSV infection.

      Thank you for these suggestions, which have helped us to better place our results regarding RSV morphology in the context of prior work. We agree with the reviewer that non-filamentous RSV particles are commonly obtained in vitro, and that this morphology does not reflect the structure of the virus as it is budding from infected cells. Our work has characterized the transition from filament to globular / amorphous form, with the finding that it can occur rapidly upon physical or chemical perturbations, as well as more gradually during natural aging: i.e. in the absence of handling or purification. We are also able to detect globular particles accumulating in cultured A549 cells, where no handling has occurred prior to observation (Figure 5 – figure supplement 1). While we do not currently know how well this reflects the tendency of RSV to undergo conversion from filament to sphere in vivo, we propose that it is plausible that such a transformation could occur. To distinguish between what we demonstrate and what we speculate, we write (Line 401): “Although more work is needed to understand the prevalence of globular particles during in vivo infection, our observations that these particles accumulate over time through the conversion of viral filaments – even under normal cell culture conditions - suggest that their presence in vivo is feasible, where the physical and chemical environment would be considerably harsher and more complex.”

      We agree with the reviewer that our results may have relevance towards understanding the failed formalin-inactivated vaccine trial. We have added this to paragraph 5 of the Discussion section.

    1. Author Response:

      Reviewer #2 (Public Review):

      1. The novelty of the current observation of two types of links is overstated, for example, in the abstract: "Our data reveal the existence of two molecular connectors/spacers which likely contribute to the nanometer scale precise stacking of the ROS disks" (Line 25). In fact, both of these links have been shown before (Usukura and Yamada, 1981; Roof and Heuser, 1982; Corless and Schneider, 1987; Corless et al., 1987; Kajimura et al., 2000). These previous studies deserve to be recognized. Of special note is the paper by Usukura and Yamada whose images of the disc rim connectors are by no means less convincing than shown in the current manuscript. On the other hand, the novelty and impact of the data related to peripherin appears to be understated, particularly in the abstract.

      We changed the abstract line 27 to: “Our data confirm the existence of two previously observed molecular connectors …”, cite the recommended references in the introduction (lines 54-55), the results (lines 131-132), and the discussion (lines 282/285). To highlight the previous reports, we rephrased the sentence in lines 132-133, “In agreement with these previous findings, we observed structures that connect membranes of two adjacent disks …”; the discussion is rephrased in lines 280-281, “Similar connectors have been observed previously ...” and “… and their statistical analysis confirmed the existence of two distinct connector species.”, and in lines 291-292, “Based on previous studies combined with our quantitative analysis, we put forward a hypothesis for the molecular identity of the disk rim connector which agrees in part with recent models”.

      2. Notably, ROM-1 has not been found in peripherin oligomers larger than octamers (e.g. Loewen and Molday, 2000 and subsequent studies by Naash and colleagues). This should be discussed in the context of the current model.

      We agree that this is an important aspect. We pick subvolumes along all disk rims, and on average we obtain the ordered scaffold as shown in the manuscript. We expect heterogeneity in the data because of the different degrees of oligomerization and the exclusion of ROM1 from higher oligomers. Our analysis required substantial classification to achieve convergence to a stable average, indeed indicating heterogeneity in the rim structure. However, we could not resolve additional structures to sufficient quality. It might be that this heterogeneity is what ultimately limits our achievable resolution. We added these thoughts in the discussion starting in lines 377-378, “PRPH2-Rom1 oligomers isolated from native sources exhibit varying degrees of polymerization (Loewen and Molday, 2000), and ROM1 is excluded from larger oligomers (Milstein et al., 2020). We could not resolve this heterogeneity as additional structures to sufficient quality by subvolume averaging, but in combination with the inherent flexibility of the disk rim, this heterogeneity might be the reason for the restricted resolution of our averages.”

      3. The following statement should be reconsidered given the established role of cysteine-150 in peripherin oligomerization: "We hypothesize that the necessary cysteine residues are located in the head domain of the tetramers (Figure 5B), ..." It has been firmly established that only one cysteine (C150) located in the intradiscal loop is not engaged in intramolecular interactions and is essential for peripherin oligomerization.

      Thank you for this advice. We agree and rephrased our discussion in lines 368-371, “The intermolecular disulfide bridges are exclusively formed by the PRPH2-C150 and ROM1-C153 cysteine residues, which are located in the luminal domain (Zulliger et al., 2018). We hypothesize that these disulfide bonds (Figure 5B) are responsible for the contacts across rows (Figure 3) ...”

      4. Line 340: "A model involving V-shaped tetramers for membrane curvature formation was proposed recently (Milstein et al., 2020), but it comprises two rows of tetramers which are linked in a head-to-head manner. Our analysis instead resolves three rows organized side-by-side in situ (Figure 5A)." I am confused by this statement: doesn't your model also show long rows connected head-to-head? The real difference is that Milstein and colleagues proposed four tetramers per rim whereas the current data reveal three.

      Thank you for pointing out this imprecise description. The model proposed by Milstein and the model in the old version of our manuscript, both propose linkage between tetramers via their disk luminal domains. In our manuscript, we refer to the luminal domain as the head domain. However, to our understanding, the Milstein model suggests two rows of tetramers, where one tetramer in the first row is rotated 180° with respect to a tetramer in the second row (therefore head-to-head), while our data indicate that the V-shaped repeats which we originally hypothesized to be tetramers are only rotated ~63° with respect to one another and are therefore rather oriented side-by-side:

      Fig. 2: Comparison of models for the organization of the ROS disk rim as proposed in Milstein et al., 2020 (top panel) and in our work (lower panel).

      We have now rephrased lines 383-385, “Instead, our analysis in situ resolves three rows of repeats which are also linked by the luminal domain but are rather organized side-by-side (Figure 5A).”

      5. Line 347: "Our data indicate that the luminal domains of tetramers hold the disk rim scaffold together (Figure 3C), which is supported by the fact that most pathological mutations of PRPH2 affect its luminal domain (Boon et al., 2008; Goldberg et al., 2001). It is possible that these mutations impair the formation of tetramers, rows of tetramers, and their disulfide bond-stabilized oligomerization. These alterations could impede or completely prevent disk morphogenesis which, in turn, would disrupt the structural integrity of ROS, compromise the viability of the retina and ultimately lead to blindness." This is not an original idea, as many studies showed that disruptions in peripherin oligomerization lead to anatomical defects in disc formation and subsequent photoreceptor cell death.

      Thank you for pointing this out. Our data are indeed in good agreement with the results made by many groups and further expand on them. We rephrased the manuscript in several places to clarify this relationship: in the abstract lines 32-34, “Our Cryo-ET data provide novel quantitative and structural information on the molecular architecture in ROS and substantiate previous results on proposed mechanisms underlying pathologies of certain PRPH2 mutations leading to blindness.”; in the introduction lines 78-79, “… allowed us to obtain 3D molecular-resolution images of vitrified ROS in a close-to-native state providing further evidence for previously suggested mechanisms leading to ROS dysfunction”; and in the discussion lines 393-397, “In good agreement with previous work, it is possible that these mutations impair the formation of complexes, and their disulfide bond-stabilized oligomerization (Chang et al., 2002; Conley et al., 2019; Zulliger et al., 2018). Hence, these alterations could impede or completely prevent disk morphogenesis …”. Also, additional relevant publications are cited in line 395.

      6. In regards to the distance between disc rims and plasma membrane, the authors cite the data obtained with frogs (10 nm) but not a more relevant, previously reported measurement in mice (Gilliam et al., 2012). The value of 18 nm reported in that study is much closer to the currently reported value.

      We appreciate the reference to this excellent paper. We added it in lines 335-337, “This value was derived from amphibians (Roof and Heuser, 1982) and deviates considerably from recent results (18 nm, (Gilliam et al., 2012)) and from our current measurements in mice (~25 nm).” Our aim was to point out that a model for ROS organization that is often cited and is otherwise well-founded (Batra-Safferling et al., 2006) makes a wrong assumption about distance in the context of the mammalian systems.

      7. The authors are (correctly) being very careful in assigning the molecular identity of disc interior connectors to PDE6. However, they are more confident in assigning the disc rim connectors to GARP2, which is reflected in the labeling of these links in figure 5. Their arguments are valid, but these links are not attached to peripherin (a protein considered to be the membrane binding partner for GARPs), which is not immediately consistent with this hypothesis. Perhaps it would be fair to re-label the corresponding links in figure 5 as "disc rim connectors".

      That is an excellent and fair suggestion. We changed Figure 5 accordingly.

      8. On a similar note, the disc rim connectors seem to be located where ABCA4 is presumed to be localized within the rim, which may not be just a coincidence. The authors already have tomograms obtained from ABCA4 knockout animals. Is it possible to analyze whether these links are preserved in these tomograms?

      We agree, this is an important question to address. Unfortunately, neither the biological preparation nor the tomograms of the ABCA4 knockout were as good in quality as for the WT. Still, we frequently see connectors at the disk rim, especially after denoising of the tomograms.

      Fig. 3: connectors at disk rims in WT (left) and ABCA4 knockout mice (right).

      Sometimes it appears that the connectors between adjacent disks are linked via intradisk densities, as was already observed in Corless et al., 1987. We thought that these densities could be ABCA4 and tried to find them with two approaches in our WT tomograms (data not shown). In the first approach, using a segmentation similar to what we did for the connectors between disks, we found an order of magnitude fewer intradisk connectors than (inter)disk rim connectors. In the second approach, we used the positions of segmented (inter)disk rim connectors and classified rotational averages which focused on the disk luminal space next to the contact point of a connector with the disk membrane. Again, fewer than 10% of the disk rim connector subvolumes were assigned to classes with an additional luminal density. Both experiments indicate that disk rim connectors sometimes occur with an additional luminal density. In total, we found fewer than 100 of these intradisk densities, an observation which seems to be preserved in WT and ABCA4 KO. Based on this small number of positions/locations, however, we cannot draw any conclusion. Therefore, we did not add this point to the manuscript.

    1. Author Response

      Public Evaluation Summary:

      The authors re-analyzed a previously published dataset and identified patterns suggesting that increased bacterial biodiversity in the gut may create new niches that lead to gene loss in a focal species and promote the generation of more diversity. Two limitations are (i) that sequencing depth may not be sufficient to analyze strain-level diversity and (ii) that the evidence is exclusively based on correlations, and the observed patterns could also be explained by other eco-evolutionary processes. The claims should be supported by a more detailed analysis, and alternative hypotheses that the results do not fully exclude should be discussed. Understanding drivers of diversity in natural microbial communities is an important question that is of central interest to biomedically oriented microbiome scientists, microbial ecologists and evolutionary biologists.

      We agree that understanding the drivers of diversity in natural communities is an important and challenging question to address. We believe that our analysis of metagenomes from the gut microbiomes is complementary to controlled laboratory experiments and modeling studies. While these other studies are better able to establish causal relationships, we rely on correlations – a caveat which we make clear, and offer different mechanistic explanations for the patterns we observe.

      We also mention the caveat that we are only able to measure sub-species genetic diversity in relatively abundant species with high sequencing depth in metagenomes. These relatively abundant species include dozens of species in two metagenomic datasets, and we see no reason why our results would not generalize to other members of the microbiome. Nonetheless, further work will be required to extend our results to rarer species.

      Our revised manuscript includes two major new analyses. First, we extend the analysis of within-species nucleotide diversity to non-synonymous sites, with generally similar results. This suggests that evolutionarily older, less selectively constrained synonymous mutations and more recent non-synonymous mutations that affect protein structure both track similarly with measures of community diversity – with some subtle differences described in the manuscript.

      Second, we extend our analysis of dense time series data from one individual stool donor and one deeply covered species (B. vulgatus) to four donors and 15 species. This allowed us to reinforce the pattern of gene loss in more diverse communities with greater statistical support. Our correlational results are broadly consistent with the predictions of DBD from modeling and experimental studies, and they open up new lines of inquiry for microbiome scientists, ecologists, and evolutionary biologists.

      Reviewer #1 (Public Review):

      This paper makes an important contribution to the current debate on whether the diversity of a microbial community has a positive or negative effect on its own diversity at a later time point. In my view, the main contribution is linking the diversity-begets-diversity patterns, already observed by the same authors and others, to genomic signatures of gene loss that would be expected from the Black Queen Hypothesis, establishing an eco-evolutionary link. In addition, they test this hypothesis at a more fine-grained scale (strain-level variation and SNP) and do so in human microbiome data, which adds relevance from the biomedical standpoint. The paper is a well-written and rigorous analysis using state-of-the-art methods, and the results suggest multiple new experiments and testable hypotheses (see below), which is a very valuable contribution.

      We thank the reviewer for their generous comments.

      That being said, I do have some concerns that I believe should be addressed. First of all, I am wondering whether gene loss could also occur because of environmental selection that is independent of other organisms or the diversity of the community. An alternative hypothesis to the Black Queen is that there might have been a migration of new species from outside and then loss of genes could have occurred because of the nature of the abiotic environment in the new host, without relationship to the community diversity. Telling the difference between these two hypotheses is hard and would require extensive additional experiments, which I don't think is necessary. But I do think the authors should acknowledge and discuss this alternative possibility and adjust the wording of their claims accordingly.

      We concur with the reviewer that the drivers of the correlation between community diversity and gene loss are unclear. Therefore, we have now added the following text to the Discussion:

      “Here we report that genome reduction in the gut is higher in more diverse gut communities. This could be due to de novo gene loss, preferential establishment of migrant strains encoding fewer genes, or a combination of the two. The mechanisms underlying this correlation remain unclear and could be due to biotic interactions – including metabolic cross-feeding as posited by some models (Estrela et al., 2022; San Roman and Wagner, 2021, 2018) but not others (Good and Rosenfeld, 2022) – or due to unknown abiotic drivers of both community diversity and gene loss.”

      Additionally, we have revised Figure 1 to show that strain invasions/replacements, in addition to evolutionary change, could be an important driver of changes in intra-species diversity in the microbiome.

      Another issue is that gene loss is happening in some of the most abundant species in the gut. Under Black Queen though, we would expect these species to be most likely "donors" in cross-feeding interactions. Authors should also discuss the implications, limitations, and possible alternative hypotheses of this result, which I think also stimulates future work and experiments.

      We thank the reviewer for raising this point. It is unclear to us whether the more abundant species would be donors in cross-feeding interactions. If we understand correctly, the reviewer is suggesting that more abundant donors will contribute more total biomass of shared metabolites to the community. This idea makes sense under the assumption that the abundant species are involved in cross-feeding interactions in the first place, which may or may not be the case. As our work heavily relies on a dataset that we previously analyzed (HMP), we wish to cite Figure S20 in Garud, Good et al. 2019 PLoS Biology in which we found there are comparable rates of gene changes across the ~30 most abundant species analyzed in the HMP. This suggests that among the most abundant species analyzed, there is no relationship between their abundance and gene change rate.

      That being said, we acknowledge that our study is limited to the relatively abundant focal species and state now in the Discussion: “Deeper or more targeted sequencing may permit us to determine whether the same patterns hold for rarer members of the microbiome.”

      Regarding Figure 5B, there is a couple of questions I believe the authors should clarify. First, How is it possible that many species have close to 0 pathways? Second, besides the overall negative correlation, the data shows some very conspicuous regularities, e.g. many different "lines" of points with identical linear negative slope but different intercept. My guess is that this is due to some constraints in the pathway detection methods, but I struggle to understand it. I think the authors should discuss these patterns more in detail.

      We sincerely thank the reviewer for raising this issue, as it prompted us to investigate more deeply the patterns observed at the pathway level. In short, we decided to remove this analysis from the paper because of a number of bioinformatics issues that we realized were contributing to the signal. However, in support of BQH-like mechanisms at play, we do find evidence for gene loss in more diverse communities across multiple species in both the HMP and Poyet datasets. Below we detail our investigation into Figure 5B and how we arrived at the conclusion that it should be removed:

      (1) Regarding data points in Figure 5B where many focal species have “zero pathways”, we first clarify how we compute pathway presence and richness. Pathway abundance data per species were downloaded from the HMP1-2 database, and these pathway abundances were computed using HUMAnN (HMP Unified Metabolic Analysis Network). According to HUMAnN documentation, pathway abundance is proportional to the number of complete copies of the pathway in the community; this means that if at least one component reaction in a certain pathway is missing coverage (for a sample-species pair), the pathway abundance may be zero (note that HUMAnN also employs “gap filling” to allow no more than one required reaction to have zero abundance). As such, it is likely that insufficient coverage, especially for low-abundance species, causes many pathways to report zero abundance in many species in many samples. Indeed, 556 of the 649 species considered had zero “present” pathways (i.e. having nonzero abundance) in at least 400 of the 469 samples (see figure below).

      (2) We thank the reviewer for pointing out the “conspicuous regularities” in Figure 5B, particularly “parallel lines” of data points that we discovered are an artifact of the flawed way in which we computed “community pathway richness [excluding the focal species].” Each diagonal line of points corresponds to different species in the same sample, and because community pathway richness is computed as the total number of pathways [across all species in the sample] minus the number of pathways in the focal species, the current Figure 5B is really plotting y against X - y for each sample (where X is a sample’s total community pathway richness, and y is the pathway richness of an individual species in that sample). This computation fails to account for the possibility that a pathway in an excluded focal species will still be present in the community due to redundancy, and indeed BQH tests for whether this redundancy is kept low in diverse communities due to mechanisms such as gene loss.
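
      The artifact described above can be reproduced with a few lines of arithmetic. The sketch below is illustrative only: the sample totals and per-species richness values are made up, and `naive_points` is a hypothetical helper, not code from the analysis. It shows that plotting y against X - y forces every species in a sample onto a line of slope -1 whose intercept is that sample's total X, which is why different samples appear as parallel lines.

```python
def naive_points(sample_total, focal_richness):
    """Return (community_richness, focal_richness) pairs under the flawed
    definition: community richness = sample total X minus focal richness y."""
    return [(sample_total - y, y) for y in focal_richness]


# Hypothetical samples: (total pathway richness X, richness y of each focal species)
samples = {
    "sample_1": (300, [120, 85, 40]),
    "sample_2": (250, [110, 60, 30]),
}

lines = {name: naive_points(X, ys) for name, (X, ys) in samples.items()}

for name, points in lines.items():
    X = samples[name][0]
    # Every point satisfies x + y = X, i.e. y = -x + X: slope -1, intercept X.
    assert all(x + y == X for x, y in points)
```

      Because the slope is fixed by construction, the negative trend within a sample carries no biological information under this definition.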

      We attempted to instead plot community pathway richness defined as the number of unique pathways covered by all species other than the focal species. This is equivalent to [number of unique pathways across all species in a sample] minus the [number of pathways that are ONLY present in the focal species and not any other species in the sample]. However, when we recomputed community pathway richness this way, it is rare that a pathway is present in only one species in a sample. Moreover, we find that with the exception of E. coli, focal species pathway richness tended to be very similar across the 469 samples, often reaching an upper limit of focal species pathway richness observed. (It is unclear to what extent lower pathway richnesses are due to low species abundance/low sample coverage versus gene loss). This new plot reveals even more regularities and is difficult to interpret with respect to BQH. (Note that points are colored by species; the cluster of black dots with outlying high focal pathway richness corresponds to the “unclassified” stratum which can be considered a group of many different species.)
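
      The corrected definition and its stated equivalence can be written as set operations. This is a minimal sketch on a toy sample; the species names, pathway labels, and both function names are hypothetical and do not come from the HMP1-2 data or the original analysis code.

```python
def richness_excluding(focal, pathways_by_species):
    """Number of unique pathways covered by all species other than `focal`."""
    others = set()
    for species, pathways in pathways_by_species.items():
        if species != focal:
            others |= pathways
    return len(others)


def richness_excluding_alt(focal, pathways_by_species):
    """Equivalent form: all unique pathways in the sample minus the pathways
    found only in the focal species."""
    all_unique = set().union(*pathways_by_species.values())
    others = set().union(
        *(p for s, p in pathways_by_species.items() if s != focal)
    )
    focal_only = pathways_by_species[focal] - others
    return len(all_unique) - len(focal_only)


# Toy sample with heavy redundancy between species (illustrative values only)
sample = {
    "sp_A": {"P1", "P2", "P3"},
    "sp_B": {"P2", "P3", "P4"},
    "sp_C": {"P2", "P3"},
}

result = {sp: richness_excluding(sp, sample) for sp in sample}
```

      In this toy case the community richness excluding each focal species is 3, 3 and 4: it barely changes with the choice of focal species because most pathways are shared, which mirrors the difficulty described above.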

      Overall, because community pathway richness (excluding a focal species) seems to primarily vary with sample rather than focal species in this dataset when using the most simple/strict definition of community pathway richness as described above, it is difficult to probe the Black Queen Hypothesis using a plot like Figure 5B. As pointed out by reviewers, lack of sequencing depth to analyze strain-level diversity and accurately quantify pathway abundance, irrespective of species abundance, seems to be a major barrier to this analysis. As such, we have decided to remove Figure 5B from the paper and rewrite some of our conclusions accordingly.

      Finally, I also have some conceptual concerns regarding the genomic analysis. Namely, genes can be used for biosynthesis of e.g. building blocks, but also for consumption of nutrients. Under the Black Queen Hypothesis, we would expect the adaptive loss of biosynthetic genes, as those nutrients become provided by the community. However, for catabolic genes or pathways, I would expect the opposite pattern, i.e. the gain of catabolic genes that would allow taking advantage of a more rich environment resulting from a more diverse community (or at least, the absence of pathway loss). These two opposing forces for catabolic and biosynthetic genes/pathways might obscure the trends if all genes are pooled together for the analysis. I believe this can be easily checked with the data the authors already have, and could allow the authors to discuss more in detail the functional implications of the trends they see and possibly even make a stronger case for their claims.

      We thank the reviewer for their suggestion. As explained above, we have removed the pathway analysis from the paper due to technical reasons. However, we did investigate catabolic and biosynthetic pathways separately as suggested by the reviewer as we describe below:

      We obtained subsets of biosynthetic pathways and catabolic pathways by searching for keywords (such as “degradation” for catabolic) in the MetaCyc pathway database. After excluding the “unclassified” species stratum, we observe a total of 279 biosynthetic and 167 catabolic pathways present in the HMP1-2 pathway abundance dataset. Using the corrected definition of community pathway richness excluding a focal species, for each pathway type—either biosynthetic or catabolic—we plotted focal species pathway richness against community pathway richness including all pathways regardless of type:

      We observe the same problem where, within a sample, community pathway richness excluding the focal species hardly varies no matter which focal species it is, due to nearly all of its detected pathways being present in at least one other species; this makes the plots difficult to interpret.

      Reviewer #2 (Public Review):

      The authors re-analysed two previously published metagenomic datasets to test how diversity at the community level is associated with diversity at the strain level in the human gut microbiota. The overall idea was to test if the observed patterns would be in agreement with the "diversity begets diversity" (DBD) model, which states that more diversity creates more niches and thereby promotes further increase of diversity (here measured at the strain-level). The authors have previously shown evidence for DBD in microbiomes using a similar approach but focusing on 16S rRNA level diversity (which does not provide strain-level insights) and on microbiomes from diverse environments.

      One of the datasets analysed here is a subset of a cross-sectional cohort from the Human Microbiome Project. The other dataset comes from a single individual sampled longitudinally over 18 months. This second dataset allowed the authors to not only assess the links between different levels of diversity at single timepoints, but test if high diversity at a given timepoint is associated with increased strain-level diversity at future timepoints.

      Understanding eco-evolutionary dynamics of diversity in natural microbial communities is an important question that remains challenging to address. The paper is well-written and the detailed description of the methodological approaches and statistical analyses is exemplary. Most of the analyses carried out in this study seem to be technically sound.

      We thank the reviewer for their kind words, comments, and suggestions.

      The major limitation of this study comes with the fact that only correlations are presented, some of which are rather weak, contrast each other, or are based on a small number of data points. In addition, finding that diversity at a given taxonomic rank is associated with diversity within a given taxon is a pattern that can be explained by many different underlying processes, e.g. species-area relationships, nutrient (diet) diversity, stressor diversity, immigration rate, and niche creation by other microbes (i.e. DBD). Without experiments, it remains vague if DBD is the underlying process that acts in these communities based on the observed patterns.

      We thank the reviewer for their comments. First, regarding the issue of this being a correlative study, we now more clearly acknowledge that mechanistic studies (perhaps in experimental settings) are required to fully elucidate DBD and BQH dynamics. However, we note that our correlational study from natural communities is complementary to experimental and modeling studies, to test the extent to which their predictions hold in more complex, realistic settings. This is now mentioned throughout the manuscript, most explicitly at the end of the Introduction:

      “Although such analyses of natural diversity cannot fully control for unmeasured confounding environmental factors, they are an important complement to controlled experimental and theoretical studies which lack real-world complexity.”

      Second, to increase the number of data points analyzed in the Poyet study, we now include 15 species and four different hosts (new Figure 5). The association between community diversity and gene loss is now much more statistically robust, and consistent across the Poyet and HMP time series.

      Third, we acknowledge more clearly in the Discussion that other processes, including diet and other environmental factors can generate the DBD pattern. We also now stress more prominently the possibility that strain migration across hosts may be responsible for the patterns observed. For example, in Figure 1, we illustrate the possibility of strain migration generating the patterns we observe.

      Below we quote a paragraph that we have now added in the Discussion:

      "Second, we cannot establish causal relationships without controlled experiments. We are therefore careful to conclude that positive diversity slopes are consistent with the predictions of DBD, and negative slopes with EC, but unmeasured environmental drivers could be at play. For example, increased dietary diversity could simultaneously select for higher community diversity and also higher intra-species diversity. In our previous study, we found that positive diversity slopes persisted even after controlling for potential abiotic drivers such as pH and temperature (Madi et al., 2020), but a similar analysis was not possible here due to a lack of metadata. Neutral processes can account for several ecological patterns such as species-area relationships (Hubbell, 2001), and must be rejected in favor of niche-centric models like DBD or EC. Using neutral models without DBD or EC, we found generally flat or negative diversity slopes due to sampling processes alone and that positive slopes were hard to explain with a neutral model (Madi et al., 2020). These models were intended mainly for 16S rRNA gene sequence data, but we expect the general conclusions to extend to metagenomic data. Nevertheless, further modeling and experimental work will be required to fully exclude a neutral explanation for the diversity slopes we report in the human gut microbiome.”

      Finally, we now put more emphasis on the importance of migration (strain invasion) as a non-exclusive alternative to de novo mutation and gene gain/loss. This is mentioned in the Abstract and is also illustrated in the revised Figure 1.

      Another limitation is that the total number of reads (5 mio for the longitudinal dataset and 20 mio for the cross-sectional dataset) is low for assessing strain-level diversity in complex communities such as the human gut microbiota. This is probably the reason why the authors only looked at one species with sufficient coverage in the longitudinal dataset.

      Indeed, this is a caveat which means we can only consider sub-species diversity in relatively abundant species. Nevertheless, this allows us to study dozens of species in the HMP and 15 in the more frequent Poyet time series. As more deeply sequenced metagenomes become available, future studies will be able to access the rarer species to test whether the same patterns hold or not. This is now mentioned prominently as a caveat of our study in the second Discussion paragraph:

      “First, using metagenomic data from human microbiomes allowed us to study genetic diversity, but limited us to considering only relatively abundant species with genomes that were well-covered by short sequence reads. Deeper or more targeted sequencing may permit us to determine whether the same patterns hold for rarer members of the microbiome. However, it is notable that the majority of the dozens of species across the two datasets analyzed support DBD, suggesting that the phenomenon may generalize.”

      We also note that rarefaction was only applied to calculate community richness, not to estimate sub-species diversity. We apologize for this confusion, which is now clarified in the Methods as follows:

      “SNV and gene content variation within a focal species were ascertained only from the full dataset and not the rarefied dataset.”

      Analyzing the effect of diversity at a given timepoint on strain-level diversity at a later timepoint adds an important new dimension to this study which was not assessed in the previous study about the DBD in microbiomes by some of the authors. However, only a single species was analysed in the longitudinal dataset and comparisons of diversity were only done between two consecutive timepoints. This dataset could be further exploited to provide more insights into the prevailing patterns of diversity.

      We thank the reviewer for raising this point. We have now considered all 15 species for which there was sufficient coverage from the Poyet dataset, which included four different stool donors. Additionally, in the HMP dataset, we analyze 54 species across 154 hosts, with both datasets showing the same correlation between community diversity and gene loss.

      Additionally, we followed the suggestion of the reviewer of examining additional time lags, and in Figure 5 we do observe a dependency on time. This is now described in the Results as follows:

      “Using the Poyet dataset, we asked whether community diversity in the gut microbiome at one time point could predict polymorphism change at a future time point by fitting GAMs with the change in polymorphism rate as a function of the interaction between community diversity at the first time point and the number of days between the two time points. Shannon diversity at the earlier time point was correlated with increases in polymorphism (consistent with DBD) up to ~150 days (~4.5 months) into the future (Figure S4), but this relationship became weaker and then inverted (consistent with EC) at longer time lags (Fig 5A, Table S8, GAM, P=0.023, Chi-square test). The diversity slope is approximately flat for time lags between four and six months, which could explain why no significant relationship was found in HMP, where samples were collected every ~6 months. No relationship was observed between community richness and changes in polymorphism (Table S8, GAM, P>0.05).”
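
      The model structure described in the quoted passage can be written schematically as follows. The notation is illustrative and not taken from the manuscript, and the fitted model may include random effects or other terms omitted here:

```latex
\Delta p_{t_1 \to t_2} \;=\; \beta_0 \;+\; f\!\left(H_{t_1},\, \Delta t\right) \;+\; \varepsilon ,
\qquad \Delta t = t_2 - t_1
```

      Here $\Delta p_{t_1 \to t_2}$ is the change in polymorphism rate of a focal species between two time points, $H_{t_1}$ is community diversity (Shannon diversity or richness) at the earlier time point, $\Delta t$ is the number of days between the two time points, and $f$ is a smooth interaction surface. A slope in $H_{t_1}$ that changes sign as $\Delta t$ grows corresponds to the shift from a DBD-like to an EC-like pattern reported in the passage.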

      Finally, the evidence that gene loss follows an increase in diversity is weak, as very few genes were found to be lost between two consecutive timepoints, and the analysis is based on only a single species. Moreover, while positive correlations were found between overall community diversity and gene family diversity in single species, the opposite trend was observed when focusing on pathway diversity. A more detailed analysis (of e.g. the functions of the genes and pathways lost/gained) to explain these seemingly contrasting results and a more critical discussion of the limitations of this study would be desirable.

      We agree that our previous analysis of one species in one host provided weak support for gene loss following increases in diversity. As described in the response above, we have now expanded this analysis to 15 focal species and 4 independent hosts with extensive time series. We now analyze this larger dataset and report the more statistically robust results as follows:

      “We found that community Shannon diversity predicted future gene loss in a focal species, and this effect became stronger with longer time lags (Fig 5B, Table S9, GLMM, P=0.006, LRT for the effect of the interaction between the initial Shannon diversity and time lag on the number of genes lost). The model predicts that increasing Shannon diversity from its minimum to its maximum would result in the loss of 0.075 genes from a focal species after 250 days. In other words, about one of the 15 focal species considered would be expected to lose a gene in this time frame.

      Higher Shannon diversity was also associated with fewer gene gains, and this relationship also became stronger over time (Fig 5C, Table S9, GLMM, P=1.11e-09, LRT). We found a similar relationship between community species richness and gene gains, although the relationship was slightly positive at shorter time lags (Fig 5D, Table S9, GLMM, P=3.41e-04, LRT). No significant relationship was observed between richness and gene loss (Table S9, GLMM, P>0.05). Taken together with the HMP results (Fig 4), these longer time series reveal how the sign of the diversity slope can vary over time and how community diversity is generally predictive of reduced focal species gene content.”
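
      The statement that "about one of the 15 focal species" would lose a gene follows from simple arithmetic on the quoted model prediction. A minimal check, assuming the 0.075 figure is an expected number of genes lost per focal species (an interpretation of the passage, not a value recomputed from the data):

```python
# Values taken from the quoted Results text
genes_lost_per_species = 0.075   # predicted loss per focal species after 250 days
n_focal_species = 15             # focal species in the Poyet time series

# Expected number of gene-loss events summed over all focal species
expected_total = genes_lost_per_species * n_focal_species
print(round(expected_total, 3))
```

      The product is roughly 1.1, which is consistent with the "about one" phrasing in the quoted text.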

      As described in detail in the response to Reviewer 1 above, we found that the HUMAnN2 pathway analyses previously described suffered from technical challenges and we deemed them inconclusive. We have therefore removed the pathway results from the manuscript.

      Reviewer #3 (Public Review):

      This work provides a series of tests of hypotheses, which are not mutually exclusive, on how genomic diversity is structured within human microbiomes and how community diversity may influence the evolution of a focal species.

      Strengths:

      The paper leverages existing metagenomic data to look at many focal species at the same time to test for the importance of broad eco-evolutionary hypotheses, which is a novelty in the field.

      Thank you for the succinct summary and recognition of the strengths of our work.

      Weaknesses:

      It is not very clear if the existing metagenomic data has sufficient power to test these models.

      It is not clear, either in the introduction or in the analysis, what precise mechanisms are expected to lead to DBD.

      The conclusion that the data support DBD appears to depend on which measures of community diversity are used. Also, performing a test to reject a null neutral model would have been welcome either in the results or in the discussion.

      In our revised manuscript, we emphasize several caveats – including that we only have power to test these hypotheses in focal species with sufficient metagenomic coverage to measure sub-species diversity. We also describe more in the Introduction how the processes of competition and niche construction can lead to DBD. We also acknowledge that unmeasured abiotic drivers of both community diversity and sub-species diversity could also lead to the observed patterns. Throughout the manuscript, we attempt to describe the results and acknowledge multiple possible interpretations, including DBD and EC acting with different strengths on different species and time scales. Our previous manuscript assessing the evidence for DBD using 16S rRNA gene amplicon data from the Earth Microbiome Project (Madi et al., eLife 2020) assessed null models based on neutral ecological theory, and found it difficult to explain the observation of generally positive diversity slopes without invoking a non-neutral mechanism like DBD. While a new null model tailored to metagenomic data might provide additional nuance, we think developing one is beyond the scope of the manuscript – which is in the format of a short ‘Research Advance’ to expand on our previous eLife paper, and we expect that the general results of our previously reported null model provide a reasonable intuition for our new metagenomic analysis. This is now mentioned in the Discussion as follows:

      “In our previous study, we found that positive diversity slopes persisted even after controlling for potential abiotic drivers such as pH and temperature (Madi et al., 2020), but a similar analysis was not possible here due to a lack of metadata. Neutral processes can account for several ecological patterns such as species-area relationships (Hubbell, 2001), and must be rejected in favor of niche-centric models like DBD or EC. Using neutral models without DBD or EC, we found generally flat or negative diversity slopes due to sampling processes alone and that positive slopes were hard to explain with a neutral model (Madi et al., 2020). These models were intended mainly for 16S rRNA gene sequence data, but we expect the general conclusions to extend to metagenomic data. Nevertheless, further modeling and experimental work will be required to fully exclude a neutral explanation for the diversity slopes we report in the human gut microbiome.”

    1. Author Response

      Reviewer #1 (Public Review):

      Although a bunch of studies have been carried out to see whether calcium supplementation is a prerequisite for the promotion of bone health or prevention of bone diseases, this is the first trial to see its effect on the population whose age is reaching peak bone mass. Outcomes are clear and justified by sound methodology. Also, the message from this systematic review could directly influence the clinical decision on who might gain benefit from calcium supplementation.

      We are very grateful for your considerate comments and your recognition of our work in this study. Your suggestions really helped us to improve the clarity of this manuscript.

      Strengths of this study are:

      1) This is the first systematic review by meta-analysis to focus on people at the age before achieving peak bone mass (PBM) and at the age around the PBM.

      2) Detailed subgroup and sensitivity analyses drew consistent and clear results.

      Thank you very much for your comments. We are very grateful for your recognition of our work in this study.

      Limitations of this study are:

      1) Substantial intertrial heterogeneity should be considered in terms of dose effect of calcium supplementation and differences between both sexes etc.

      Thank you very much for your kind comments. We performed subgroup analyses to explore whether different doses of calcium supplementation had different effects, and the results are shown in Table 4a and 4b at the end of this Author Response. The results showed that the intertrial heterogeneity in the subgroup with doses of calcium supplementation greater than or equal to 1000 mg/day was significantly smaller than that in the subgroup with doses less than 1000 mg/day, suggesting that different doses of calcium supplementation across trials may be a potential source of the substantial intertrial heterogeneity.

      Similarly, we also performed subgroup analyses by sex. Of all included trials, 23 trials focused on women only, and 20 trials involved both men and women participants; however, these 20 trials did not report the results for men or women separately. We therefore divided the included trials into two subgroups: trials with women only and trials with both men and women. The corresponding results of subgroup analyses are shown in Table 5a and 5b at the end of this Author Response. The results showed that the subgroup with both men and women seemed to be less heterogeneous than the subgroup with women only, suggesting that sex may be a possible source of the observed heterogeneity.

      In addition, we were also aware of the large heterogeneity between trials and explored the possible sources through several additional approaches. Firstly, instead of using fixed-effects models, we have chosen random-effects models to summarize the effect estimates. Secondly, we performed meta-regression analyses by age, population regions, calcium doses, baseline intake and sample sizes to explain the intertrial heterogeneity. The results of meta-regression are provided in Table 6 at the end of this Author Response. The results suggested that this heterogeneity could be explained partially by differences in regions of participants.
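
      For readers unfamiliar with the quantities involved, the sketch below shows how a random-effects pooled estimate and the I² heterogeneity statistic can be computed. It assumes the DerSimonian-Laird estimator, which is one common choice; the response does not state which estimator or software was used, and the input numbers are hypothetical rather than trial data from this review.

```python
import math


def random_effects_pool(effects, variances):
    """DerSimonian-Laird random-effects pooling with Cochran's Q and I^2."""
    k = len(effects)
    w = [1.0 / v for v in variances]  # inverse-variance (fixed-effect) weights
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)

    # Cochran's Q: weighted squared deviations from the fixed-effect estimate
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    df = k - 1

    # Method-of-moments estimate of the between-trial variance tau^2
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)

    # I^2: percentage of total variation attributed to heterogeneity
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0

    # Random-effects weights add tau^2 to each trial's variance
    w_re = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    return {"pooled": pooled, "se": se, "tau2": tau2, "Q": q, "I2": i2}


# Hypothetical example: two equally precise trials with very different effects
example = random_effects_pool([0.1, 0.9], [0.01, 0.01])
```

      When trials disagree, tau² grows, the random-effects weights become more equal across trials, and the pooled standard error widens. This is the sense in which a random-effects model takes heterogeneity into account without removing it.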

      We have updated the results and discussions about potential sources of heterogeneity in the revised manuscript, as follows:

      In general, the heterogeneity between trials was obvious in the analysis for BMD (P<.001, I2=86.28%) and slightly smaller for BMC (P<.001, I2=79.28%). The intertrial heterogeneity was significantly distinct across the sites measured. Subgroup analyses and meta-regression analyses suggested that this heterogeneity could be explained partially by differences in age, duration, calcium dosages, types of calcium supplement, supplementation with or without vitamin D, baseline calcium intake levels, sex and region of participants. (See Lines 293-298 on Page 20 in the Main Text)

      Several limitations need to be considered. First, there was substantial intertrial heterogeneity in the present analysis, which might be attributed to the differences in baseline calcium intake levels, regions, age, duration, calcium doses, types of calcium supplement, supplementation with or without vitamin D and sexes according to subgroup and meta-regression analyses. To take heterogeneity into account, we used random effect models to summarize the effect estimates, which could reduce the impact of heterogeneity on the results to some extent. (See Lines 394-399 on Page 24 in the Main Text)

      2) Rarity of RCTs focused on the 20-35-year age group.

      Thank you very much for raising this point. We have comprehensively searched databases for eligible studies and found only three RCTs (Islam et al; Barger-Lux et al; Winters-Stone et al) focused on the 20-35-year age group. We did notice this fact as well. Because of this, we intend to perform a randomised controlled trial to evaluate the effects of calcium supplementation in this age group. In fact, this trial has already been started and is currently ongoing (Registration number: ChiCTR2200057644, http://www.chictr.org.cn/showproj.aspx?proj=155587).

      In this open-label, randomized controlled trial, we will randomly assign (1:1) 116 subjects (age 18-22 years) either to receive or not to receive calcium supplementation with milk (500 mL/day, containing about 500 mg/day of calcium) for 6 months. The primary outcomes are bone mineral density and bone mineral content at the lumbar spine, femoral neck and total hip. The secondary outcomes are clinical indicators related to bone health, such as serum osteocalcin, bone-specific alkaline phosphatase, urinary deoxypyridinoline, etc. We will conduct the current trial with great care and diligence and look forward to the results of this trial.

      Reviewer #2 (Public Review):

      This systematic review and meta-analysis titled 'The effect of calcium supplementation in people under 35 years old: A systematic review and meta-analysis of randomized controlled trials' provides good evidence for the importance of calcium supplementation at the age around the plateau of PBM. The statistical analyses were good overall and the manuscript was generally well written.

      We are very grateful for your considerate comments and for your recognition of our work in this study. Your suggestions really helped us to improve the clarity of this manuscript.

      One concern in this study is that RCTs included were substantially heterogeneous in subjects, calcium types, duration, vitamin D supplements, etc. According to the inclusion criteria, RCTs with calcium or calcium plus vitamin D supplements with a placebo or no treatment were included in this study. However, no information about vitamin D supplementation was provided. Therefore, it seems unclear whether the effect of improving BMD or BMC is due to calcium alone or calcium plus vitamin D.

      We are extremely grateful for your great patience and for your kind suggestions. According to your suggestions, we have added the corresponding analyses regarding calcium supplementation with or without vitamin D supplementation. Among the included RCTs, 32 trials used calcium-only supplementation (without vitamin D supplementation) and 11 trials used calcium plus vitamin D supplementation. Detailed information is provided in Tables 1 and 2 at the end of this Author Response. We have added subgroup analyses by vitamin D supplementation as you suggested, and the corresponding results are provided in Tables 3a and 3b at the end of this Author Response.

      When we pooled the data from the two subgroups separately, we found that calcium supplementation with vitamin D had greater beneficial effects on both the femoral neck BMD (MD: 0.758, 95% CI: 0.350 to 1.166, P < 0.001 vs. MD: 0.477, 95% CI: 0.045 to 0.910, P = 0.031) and the femoral neck BMC (MD: 0.393, 95% CI: 0.067 to 0.719, P = 0.018 vs. MD: 0.269, 95% CI: -0.025 to 0.563, P = 0.073) than calcium supplementation without vitamin D. However, for both BMD and BMC at the other sites (including lumbar spine, total hip, and total body), the observed effects in the subgroup without vitamin D supplementation appeared to be slightly better than in the subgroup with vitamin D supplementation. Therefore, these results suggest that calcium supplementation alone could improve BMD or BMC, although additional vitamin D supplementation may be beneficial in improving BMD or BMC at the femoral neck.

      We have added relevant parts in the main text of the revised manuscript. (See Lines 258-263 on Pages 12-13 and Lines 367-374 on Page 23 in the Main Text)

      As you mentioned, there is large intertrial heterogeneity in this study. We therefore used a random-effects model, which is appropriate here and yields more conservative results. In addition, we performed subgroup analyses by calcium dose, sex, age, duration, region, baseline calcium intake, and type of calcium supplement in order to explore possible sources of heterogeneity.
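      The random-effects pooling and the I² statistic referred to above can be sketched as follows. This is a minimal illustration, not the authors' analysis: the response does not say which estimator was used, so the DerSimonian-Laird estimator is assumed here, and the function name and any input values are hypothetical.

```python
import math


def random_effects_pool(effects, variances):
    """Pool per-trial effects with a DerSimonian-Laird random-effects model.

    Assumption: DerSimonian-Laird is used for the between-trial variance;
    the source text only says "random effect model".
    """
    k = len(effects)
    # Inverse-variance (fixed-effect) weights and pooled estimate.
    w = [1.0 / v for v in variances]
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    # Cochran's Q measures the dispersion of trials around the fixed estimate.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    df = k - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    # Between-trial variance (tau^2), truncated at zero.
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    # I^2: share of total variation attributed to heterogeneity, in percent.
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    # Random-effects weights add tau^2 to every trial's variance.
    w_star = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    return {
        "pooled": pooled,
        "se": se,
        "se_fixed": math.sqrt(1.0 / sum(w)),
        "ci": (pooled - 1.96 * se, pooled + 1.96 * se),
        "tau2": tau2,
        "Q": q,
        "I2": i2,
    }
```

      Because tau² is added to every trial's variance, the random-effects standard error is never smaller than the fixed-effect one, which is the sense in which the random-effects result is the more conservative of the two.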

      The results of subgroup analyses by dose of calcium supplementation are shown in Tables 4a and 4b at the end of this Author Response. For both BMD and BMC at the lumbar spine and whole body, the intertrial heterogeneity was significantly smaller in the subgroup with a calcium supplementation dose greater than or equal to 1000 mg/day than in the subgroup with a calcium supplementation dose less than 1000 mg/day, suggesting that different doses of calcium supplementation may be a potential source of the heterogeneity.

      The results of subgroup analyses by sex are shown in Tables 5a and 5b at the end of this Author Response. The intertrial heterogeneity was significantly smaller in the subgroup with both men and women than in the subgroup with women only, also suggesting that sex could be a possible source of the heterogeneity.

      The results of subgroup analyses by age (pre-peak vs. peri-peak) are shown in Tables 7a and 7b at the end of this Author Response. The intertrial heterogeneity was significantly smaller in the peri-peak subgroup than in the pre-peak subgroup, also suggesting that age may be a potential source of the heterogeneity.

      The results of subgroup analyses by intervention duration (<18 months vs. ≥18 months) are shown in Tables 8a and 8b at the end of this Author Response. For both BMD and BMC at the lumbar spine and total hip, the intertrial heterogeneity was smaller in the subgroup with an intervention period less than 18 months than in the subgroup with an intervention period greater than or equal to 18 months, suggesting that intervention duration might be a potential source of the heterogeneity.

      Tables 9a and 9b at the end of this Author Response show the results of subgroup analyses by population region. The intertrial heterogeneity was significantly smaller in the Asian subgroup than in the Western subgroup, also suggesting that population region may be a source of the heterogeneity.

      Tables 10a and 10b at the end of this Author Response show the results of subgroup analyses by dietary calcium intake levels at baseline. The intertrial heterogeneity was smaller in the subgroup with a dietary calcium intake level greater than or equal to 714 mg/day than in the subgroup with a dietary calcium intake level lower than 714 mg/day, also suggesting that dietary calcium intake levels at baseline could be a potential source of the heterogeneity.

      Tables 11a and 11b at the end of this Author Response show the results of subgroup analyses by types of calcium supplements. For both BMD and BMC at the lumbar spine, the intertrial heterogeneity was smaller in the subgroup with calcium supplementation than in the subgroup with dietary calcium, also suggesting that types of calcium supplements might be a source of the heterogeneity.

      In conclusion, the observed heterogeneity might be due to differences in sex, age, region of subjects, dose, intervention duration, type of calcium supplementation, dietary calcium intake level at baseline, and supplementation with or without vitamin D. We have updated the discussion on heterogeneity in the revised manuscript. (See Lines 394-397 on Page 24 in the Main Text)

      Thanks again for your comments. We have tried to analyze and explain the large heterogeneity through a variety of approaches; however, some inadequacies may remain. If further corrections are needed, please let us know and we will do our best to revise this part on heterogeneity.

      Reviewer #3 (Public Review):

      This paper will be welcomed by clinicians and researchers in the field. The authors, applying a well-structured meta-analysis, showed that calcium supplementation or calcium intake at 20-35 years of age is more beneficial than at <20 years. The clinical impact is directly associated with improving the bone mass of the femoral neck, and thus proposes a window of intervention for osteoporosis treatment. The manuscript is very well prepared and represents a thorough analysis of available randomized controlled clinical trials, but a few issues require additional consideration.

      We are very grateful for your considerate comments and for your recognition of our work in this study. Your comments are invaluable and have been very helpful in revising and improving our manuscript.

      After a careful read of the literature, it is important to highlight that the paper is a statistically robust study with a well-delineated meta-analysis of youth-adult subjects. However, I would like to better understand why the authors didn't use other datasets such as WHO Global Index Medicus (Index Medicus for Africa, the Eastern Mediterranean Region, South-East Asia, and Western Pacific, and Latin America and the Caribbean Literature on Health Sciences, Index Medicus), ClinicalTrials.gov, and the WHO ICTRP.

      Thank you so much for your thoughtful advice and your generosity in recommending these databases to us. Based on your advice, we thoroughly searched these databases (the detailed search terms are provided in the Appendix File at the end of this Author Response) and identified 23 potentially related studies and registered trials. After careful screening and review, however, no new studies were included in this meta-analysis: some studies had not been completed or were still recruiting subjects, and others were duplicates of RCTs we had already included. The detailed screening process and the reasons for exclusion are shown in Figure 1. These three additional global databases will provide us with more comprehensive information for our future studies. Thank you very much for your suggestions and guidance.

      Figure 1. Flow chart of search and selection

      References:
      1. ID: emr-156089 (https://pesquisa.bvsalud.org/gim/resource/en/emr-156089)
      2. ID: wpr-270003 (https://pesquisa.bvsalud.org/gim/resource/en/wpr-270003)
      3. ID: lil-243754 (https://pesquisa.bvsalud.org/gim/resource/en/lil-243754)
      4. ID: sea-23757 (https://pesquisa.bvsalud.org/gim/resource/en/sea-23757)
      5. ID: NCT00067925 (https://clinicaltrials.gov/ct2/show/NCT00067925?term=NCT00067925&draw=2&rank=1)
      6. ID: NCT00979511 (https://clinicaltrials.gov/ct2/show/NCT00979511?term=NCT00979511&draw=2&rank=1)
      7. ID: NCT00065247 (https://clinicaltrials.gov/ct2/show/NCT00065247?term=NCT00065247&draw=2&rank=1)
      8. Matkovic V, Landoll JD, Badenhop-Stevens NE, et al. Nutrition influences skeletal development from childhood to adulthood: a study of hip, spine, and forearm in adolescent females. J Nutr. 2004;134(3):701S-705S. doi:10.1093/jn/134.3.701S
      9. Barger-Lux MJ, Davies KM, Heaney RP. Calcium supplementation does not augment bone gain in young women consuming diets moderately low in calcium. J Nutr. 2005;135(10):2362-2366. doi:10.1093/jn/135.10.2362
      10. Cornes R, Sintes C, Peña A, et al. Daily Intake of a Functional Synbiotic Yogurt Increases Calcium Absorption in Young Adult Women. J Nutr. 2022;152(7):1647-1654. doi:10.1093/jn/nxac088
      11. ID: NCT00063011 (https://clinicaltrials.gov/ct2/show/NCT00063011?term=NCT00063011&draw=2&rank=1)
      12. ID: NCT00063024 (https://clinicaltrials.gov/ct2/show/NCT00063024?term=NCT00063024&draw=2&rank=1)
      13. ID: NCT01857154 (https://clinicaltrials.gov/ct2/show/NCT01857154?term=NCT01857154&draw=2&rank=1)
      14. ID: NCT00067600 (https://clinicaltrials.gov/ct2/show/NCT00067600?term=NCT00067600&draw=2&rank=1)
      15. ID: NCT00063037 (https://clinicaltrials.gov/ct2/show/NCT00063037?term=NCT00063037&draw=2&rank=1)
      16. ID: NCT00063050 (https://clinicaltrials.gov/ct2/show/NCT00063050?term=NCT00063050&draw=2&rank=1)
      17. ID: TCTR20190624002 (https://trialsearch.who.int/Trial2.aspx?TrialID=TCTR20190624002)
      18. ID: JPRN-UMIN000024182 (https://trialsearch.who.int/Trial2.aspx?TrialID=JPRN-UMIN000024182)
      19. ID: NCT02636348 (https://trialsearch.who.int/Trial2.aspx?TrialID=NCT02636348)
      20. ID: ACTRN 12612000374864 (https://trialsearch.who.int/Trial2.aspx?TrialID=ACTRN12612000374864)
      21. ID: NCT01732328 (https://trialsearch.who.int/Trial2.aspx?TrialID=NCT01732328)
      22. ID: ISRCTN28836000 (https://trialsearch.who.int/Trial2.aspx?TrialID=ISRCTN28836000)
      23. ID: ISRCTN84437785 (https://trialsearch.who.int/Trial2.aspx?TrialID=ISRCTN84437785)

      We have also updated the literature search section and the flow chart in the main text of the revised manuscript, as follows:

      We applied search strategies to the following electronic bibliographic databases without language restrictions: PubMed, EMBASE, ProQuest, CENTRAL (Cochrane Central Register of Controlled Trials), WHO Global Index Medicus, ClinicalTrials.gov, WHO ICTRP, China National Knowledge Infrastructure and Wanfang Data in April 2021 and updated the search in July 2022 for eligible studies addressing the effect of calcium or calcium supplementation, milk or dairy products with BMD or BMC as endpoints. (see Lines 80-85 on Page 5 and Figure 1 in the Main Text)

      The manuscript compares two sources of participants (in line 233) evaluating the effect of improvements on the femoral neck being "obviously stronger in Western countries than in Asian countries". But, I didn't identify if the searches were conducted applying language restrictions. This is important because we can be considering the entire world or specific countries.

      We are extremely grateful for your great patience and for your kind suggestions. We did not apply any language restrictions during the search process, as documented in the protocol of PROSPERO (CRD42021251275, https://www.crd.york.ac.uk/prospero/display_record.php?RecordID=251275). Following your suggestion, we have added a description of this in the revised manuscript. (See Lines 80-81 on Page 5 in the Main Text)

      During the search process, we identified five eligible articles from the Chinese databases, including China National Knowledge Infrastructure (CNKI, https://www.cnki.net) and WanFang Data (https://www.wanfangdata.com.cn). However, we confirmed that these five studies were duplicates of articles from PubMed (PMID: 15230999; PMID: 17627404; PMID: 18296324; PMID: 20044757; PMID: 20460227). For possibly relevant studies published in languages other than Chinese or English, the full text was downloaded, translated using the DeepL translation website (https://www.deepl.com/translator) and then carefully reviewed. Ultimately, all included studies that met the inclusion and exclusion criteria were published in English. In view of this, after a systematic and comprehensive search, especially with the addition of your suggested databases, we can reasonably assume that our current study has incorporated all original research in this field worldwide, rather than only from specific countries or regions.

      To explore whether the effects of calcium supplementation differ across population regions, we performed subgroup analyses. Prior to the analysis, we hypothesized that the effect might be slightly better, or at least not worse, in populations with lower baseline dietary calcium intakes (lower baseline BMD/BMC levels) than in populations with higher baseline dietary calcium intakes (higher baseline BMD/BMC levels). However, the results showed that the improvement effects on BMD at the femoral neck and total body and BMC at the femoral neck and lumbar spine were obviously stronger in Western countries than in Asian countries. These findings run contrary to the common expectation that, under normal circumstances, the effects of calcium supplementation should be more obvious in people with lower calcium intakes than in those with higher calcium intakes. Therefore, this issue needs to be tested and confirmed in future trials.

      The manuscript does not describe which version was used with the RoB tool.

      Thank you for your suggestion. As you suggested, we have completed the description of the RoB tool in the Methods section, as follows:

      The quality of the included RCTs was assessed independently by two reviewers (SYL, HNJ) based on the Revised Cochrane Risk-of-Bias Tool for Randomized Trials (RoB 2 tool, version 22 August 2019), and each item was graded as low risk, some concerns or high risk. (See Lines 101-103 on Page 6 in the Main Text)

      Figures and Supplementary: No critique.

      Thanks for your kind comments and for your recognition of our work in this study.

      Appendix 1

      Search strategy

      • WHO Global Index Medicus

      (tw:(calcium)) OR (mj:(calcium)) OR (tw:(calcium carbonate)) OR (tw:(calcium citrate)) OR (tw:(calcium pills)) OR (tw:(calcium supplement)) OR (tw:(Ca2)) OR (tw:(dairy product)) OR (tw:(milk)) OR (tw:(yogurt)) OR (tw:(cheese)) OR (tw:(dietary supplement)) AND (tw:(bone density)) OR (tw:(bone mineral density)) OR (tw:(bone mineral content))

      • ClinicalTrials.gov

      (calcium) OR (calcium supplementation) OR (milk) OR (dairy product) OR (yogurt) OR (cheese) Applied Filters: Interventional (clinical trial); Child (birth–17); Adult (18–64)

      • WHO ICTRP

      (calcium) OR (milk) OR (dairy) OR (yogurt) OR (cheese) in the Intervention

    1. Author Response

      Reviewer #2 (Public Review):

      Zou et al. presented a comprehensive study where they generated single-cell RNA profiling of 138,982 cells from 13 samples of six patients including AK, squamous cell carcinoma in situ (SCCIS), cSCC, and their matched normal tissues, covering comprehensive clinical courses of cSCC. Using bioinformatics analysis, they identified keratinocytes, CAFs, immune cells, and their subpopulations. The authors further compared signatures within subpopulations of keratinocytes along with the clinical progression, especially basal cells, and identified many interesting genes. They also further validate some of the markers in an independent cohort using IHC, followed by some knockdown experiments using cSCC cell lines.

      The strength of this study is the unique data set they have created, providing the community with invaluable resources to study and validate their findings. However, a lot of analyses were not robust enough to support the claims and conclusions in the paper. More clarification and cross-comparison with polished data are needed to further strengthen the study and claims.

      1) Stemness markers were used. The authors used COL17A1, TP63, ITGB1, and ITGA3 to represent stemness markers. However, these were not common classic stemness markers used in cSCC. What is the source claiming these genes were stemness markers in cSCC? TP63 is a master regulator and early driver event in SCC, while COL17A1, ITGB1, and ITGA3 are all ECM genes. The authors need to use commonly well-known stem cell markers in cSCC, e.g., LGR5, to mark stem-like cells.

      Thanks for raising this good point. We may not have provided a clear description of the markers COL17A1, TP63, ITGB1, and ITGA3 in the previous version of the text. We would like to clarify that these genes were used as markers of epidermal stem cells in normal skin samples rather than of tumor stem cells in cSCC. To avoid any possible misunderstanding, we revised the main text accordingly and added the references [4-11].

      2) Cell proportion analysis. The authors used the mean proportions to compare different clinical groups for subpopulations of keratinocytes, e.g., Figure 2B, and Figure 5B. This is not robust, as no statistics can be derived from this. For example, from Fig 2A, it is clearly shown there is a high level of heterogeneity of cellular compositions for normal samples. One cannot say which group is higher or lower based on the mean alone, without also considering the variance.

      We replotted the proportion analysis with statistics and presented the new graphs in Figure 2-figure supplement 1 for Figure 2B and Figure 5-figure supplement 1 for Figure 5B.

      3) Basal tumour cells in SCCIS and SCC. To make the findings valid, authors need to compare these cells/populations with the keratinocyte cell populations defined by Ji et al. Cell 2020. Do basal-SCCIS-tumour cells, also in SCC samples, resemble any of the populations defined in Ji et al.? Ji et al. also had 10 matched normal samples, thus the authors need to validate their findings of SCC vs normal analysis using the Ji et al. dataset.

      Thanks for this valuable suggestion. We compared the basal tumor cells in our study with the cell populations defined in the Ji et al. Cell 2020 data using SingleCellNet [1]. The results showed that both the basal-SCCIS-tumor cells of SCCIS and the basal tumor cells of cSCC in our study closely resemble the Tumor_KC_Basal subcluster defined in Ji et al.’s paper (Figure 4-figure supplement 4, C and D). Tumor_KC_Basal highly expressed CCL2, CXCL14, FTH1, and MT2A, which is consistent with our findings in basal tumor cells.

      4) Copy number analysis. Authors used inferCNV to perform copy number analysis using scRNA-seq data and identified CNVs in subpopulations of keratinocytes in SCCIS and SCC. To ensure these CNVs were not artefacts, were some of the CNVs identified by inferCNV well-known copy number changes previously reported in cSCC?

      In the poorly differentiated cSCC sample, the significant gains on chromosomes 7 and 9 and the deletion on chromosome 10 have been reported in a previous study, indicating the reliability of the CNV analysis results (Figure 5-figure supplement 2) [12].

      5) Pseudotime analysis lines 308-313. Not sure the pseudotime analysis added much, as it is unclear whether two distinct subgroups were identified from this analysis. Suggest removing this to keep it neater.

      Thank you for this suggestion. We have deleted the result of pseudotime analysis.

      6) Selection of candidate genes for validation using IHC and cell line work. For example, lines 205-206, lines 352-356 and lines 437-441, authors selected several genes associated with AK and SCC to further validate using IHC and cell line knockdown work. What are the criteria for selecting those genes for validation? It is unclear to readers how these were selected. It reads like a fishing experiment, then followed by a knockdown. Clear rationale/criteria need to be elaborated.

      The first consideration in candidate gene selection was the fold change of expression. We have provided the statistical results of DEGs in Supplementary file 1b, 1h, 1j-1m. We then selected the most strongly changed genes and conducted an extensive literature search on them. We prioritized genes that, although not directly associated with cSCC development, have a close relationship with related pathways, as determined through functional enrichment analysis. These genes were taken forward to verification experiments. We have added more details in the main text and Methods section.

      7) TME. Compared to keratinocytes populations, the investigation of TME cells was weak. (a) can authors produce UMAP files just for T cells, DC cells, and fibroblasts separately? Figure 7B is not easy to see those subclusters. (b) similar to what was done for keratinocytes, can authors find differentially expressed clusters and genes among the different clinical groups, associated with disease progression? (c) where are the myeloid cell populations, also B cells?

      Thank you for your suggestions. (a) We have added the UMAP files for T cells, DC cells and stromal cells separately in the new Figure 7A. (b) We identified DEGs in TME cells among the different groups. Several key genes showed monotonic trends associated with disease progression. For example, with increasing malignancy, FOS is down-regulated while S100A8 and S100A9 increase monotonically in all three types of TME cells (Figure 7C). (c) We identified two types of myeloid cell populations, macrophages and monocyte-derived DCs (MoDC). We didn’t find other myeloid cells, such as neutrophils. For B cells, there were only 28 B cells in the poorly differentiated cSCC sample, which didn’t meet the threshold for further cell-cell communication analysis.

      8) Heat shock protein genes line 327-329. HSP signature was well-known to be induced via tissue dissociation and library prep during the scRNA experiment. How could the authors be sure these were not artefacts induced by the experiment? If authors regress their gene expression against HSP gene signatures, would this cluster still be identified?

      Thank you for this valuable suggestion. It is important to note that the Basal-SCCIS-tumor cluster was identified through CNV analysis, rather than the HSP signature. To address this concern and further validate this result, the “AddModuleScore” function in the Seurat package was used to regress gene expression against HSP gene signatures for the retrieved basal cells. Our result showed that the Basal-SCCIS-tumor population can still be identified after regression, even more clearly (Author response image 1).

      Author response image 1.

      The identity of Basal-SCCIS-tumor cluster considering regression against HSP signatures.

      9) Cell-cell communication analysis. The authors claimed that cell-to-cell interaction was significantly enhanced in poorly-differentiated cSCC, and multiple interaction pathways were significantly active. How was this kind of analysis carried out? How did the authors define significance? What statistical method was used? These were all unclear. Furthermore, it is difficult to judge the robustness of the cell-cell communication analysis. Were these findings also supported by another method, such as celltalker or cellphoneDB?

      To determine the significance of the increased overall cell-to-cell interaction strength between two groups, we utilized CellChat to obtain the communication strength in different samples. We combined the communication strength based on cell type pairs, where missing values were set to 0. We performed a paired Wilcoxon test to determine whether the enhancement of cell-to-cell interaction between samples was significant.

      For the comparison of outgoing or incoming interaction strength of the same cell types between two groups, we first extracted the communication strength of each signaling pathway contributing to outgoing or incoming strength, and then merged the strengths of signaling pathways among samples, where the strength of non-shared pathways with missing values was set to 0. Subsequently, we performed a paired Wilcoxon test to determine the significance.
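      The merge-then-test procedure described above can be sketched as follows. This is only an illustration under stated assumptions: the helper names are hypothetical, the inputs are plain dictionaries rather than CellChat objects, and the p-value uses the normal approximation without tie or continuity correction, which may differ from the exact test the authors ran.

```python
import math


def align_zero_fill(sample_a, sample_b):
    """Align two {key: strength} mappings on the union of keys.

    Keys (cell-type pairs or pathways) missing from one sample get 0.0,
    mirroring the "missing values were set to 0" rule in the response.
    """
    keys = sorted(set(sample_a) | set(sample_b))
    x = [sample_a.get(k, 0.0) for k in keys]
    y = [sample_b.get(k, 0.0) for k in keys]
    return keys, x, y


def wilcoxon_signed_rank(x, y):
    """Paired Wilcoxon signed-rank test (two-sided, normal approximation).

    Returns (W_plus, W_minus, p). Zero differences are dropped and tied
    absolute differences receive average ranks.
    """
    diffs = [yi - xi for xi, yi in zip(x, y) if yi != xi]
    n = len(diffs)
    if n == 0:
        return 0.0, 0.0, 1.0
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over the run of tied absolute differences.
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2.0 + 1.0
        for t in range(i, j + 1):
            ranks[order[t]] = avg_rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    mean = n * (n + 1) / 4.0
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24.0)
    z = (min(w_plus, w_minus) - mean) / sd
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return w_plus, w_minus, p
```

      Zero-filling keeps the two samples paired on the same set of keys, at the cost of treating an undetected interaction as a measured strength of zero.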

      For multiple-group comparisons, the Kruskal-Wallis rank sum test was performed first. If the p-value was less than 0.1, the pairwise Wilcoxon test was used for subsequent pairwise comparisons. Individual signaling pathways were compared between groups in the same way. We defined a p-value < 0.1 as the significance threshold. We have added the significance test method to the figure legends for Figure 7 and Figure 8, as well as detailed statistical data in the new Supplementary file 1q-1u.

      As suggested, we also used the CellPhoneDB approach, based on the CellChatDB database, to verify our cell-cell communication results. Of the ligand-receptor interactions predicted by CellChat, 55-58% were also predicted by CellPhoneDB (Author response image 2). The enhancement of cell interaction through the MHC-II, Laminin and TNF signaling pathways in the poorly differentiated cSCC sample compared with the normal sample was consistent in both CellChat and CellPhoneDB (Figure 8C and Figure 8-figure supplement 1B).

      Author response image 2.

      The overlap of the predicted ligand-receptor interactions between CellChat and CellPhoneDB.
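      An overlap percentage of the kind reported for CellChat and CellPhoneDB can be computed as the share of one tool's predictions that the other tool also makes. The sketch below assumes each prediction is reduced to a hashable identifier such as a (ligand, receptor) tuple; the function name is hypothetical and the response does not describe how the authors matched interactions.

```python
def overlap_fraction(reference, other):
    """Fraction of interactions in `reference` that also appear in `other`.

    Both arguments are iterables of hashable interaction identifiers,
    for example (ligand, receptor) tuples. Returns 0.0 for an empty
    reference set.
    """
    reference = set(reference)
    other = set(other)
    if not reference:
        return 0.0
    return len(reference & other) / len(reference)
```

      The measure is asymmetric: swapping the arguments gives the share of the second tool's predictions recovered by the first, which is generally a different number.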

      10) Statistics and significance. In general, the detail of statistics and significance was lacking throughout the paper. Authors need to specify what statistical tests were used, and the p-values. It is difficult to judge the correctness of the test, and robustness without seeing the stats.

      We have included all statistics and significance values in the figure legend and supplemental tables, and described the statistical tests in the methods section. In this revision, we have added the necessary details of statistics and significance in the main text and figures.

      11) Overall, this manuscript needs a lot of re-writing. A lot of discussion was also included in the results, making it really difficult to read overall. The authors should simplify the results sections, remove the discussion bits, and further highlight and streamline with the key results of this paper.

      Thanks a lot for this advice. We have revised the paper thoroughly and removed discussion from the Results section to make the manuscript easier to read.

    1. Author Response:

      Reviewer #1 (Public Review):

      5. The reported data point to an important role of the premotor and parietal regions of the left as compared to the right hemisphere in the control of ipsilateral and contralateral limb movements. These are also the regions where the electrodes were primarily located in both subgroups of patients. I have 2 concerns in this respect. The first concern refers to the specific locus of these electrodes. For premotor cortex, the authors suggest PMd as well as PMv as potential sites for these bilateral representations. The other principal site refers to parietal cortex but this covers a large territory. It would help if more specific subregions for the parietal cortex can be indicated, if possible. Do the focal regions where electrodes were positioned refer to the superior vs inferior parietal cortex (anterior or posterior), or intra-parietal sulcus? Second, the manuscript's focus on the premotor-parietal complex emerges from the constraints imposed by accessible anatomical locations in the participants but does not preclude the existence of other cortical sites as well as subcortical regions and cerebellum for such bilateral representations. It is meaningful to clarify this and/or list this as a limitation of the current approach.

      On the first issue, we have updated the manuscript to specify the subregion within the parietal cortex in which we see stronger across-arm generalization - namely, the superior parietal cortex. On the second issue, we have added text in the Discussion that references subcortical areas shown to exhibit laterality differences in bimanual coordination, providing a more holistic picture of bimanual representations across the brain. In addition, we acknowledge that with our current patient population we are limited to regions with substantial electrode coverage, which does not include all areas of the brain.

      6. The evidence for bilateral encoding during unilateral movement opens perspectives for a better understanding of the control of bimanual movements which are abundant during everyday life. In the discussion, the authors refer to some imaging studies on bimanual control in order to infer whether the obtained findings may be a consequence of left hemisphere specialization for bimanual movement control, leading to speculations about the information that is being processed for each of both limb movements. Another perspective to consider is the possibility that making a movement with one limb may require postural stabilization in the trunk and contralateral body side, including a contribution from the opposite limb that is supposedly resting on the start button. Have the authors considered whether this postural mechanism could (partly) account for this bilateral encoding mechanism, in particular, because it appears more prominent during movement execution as compared to preparation? Furthermore, could the prominence of bilateral encoding during movement execution be triggered by inflow of sensory information about both limbs from the visual as well as the somatosensory systems?

      Thank you for these comments. We have added a paragraph to the Discussion to address the hypothesis that some component of ipsilateral encoding may be related to postural stabilization.

      In response to the final point in this comment, we agree that bilateral information during execution could be reflective of afferent inputs (somatosensory and/or visual). However, the encoding model shows that activity in premotor and parietal regions is well predicted based on kinematics during the task. While visual and somatosensory system information are likely integrated in these areas, the kinematic encoding would point to a more movement-based representation.

      Reviewer #2 (Public Review):

      Weaknesses: 1. Although the current human ECoG data set is valuable, there is still large variability in electrode coverage across the patients (I fully acknowledge the difficulty). This makes statistical assessment a bit tricky. The potential factors of interest in the current study would be Electrode (=Region), Subject, Hemisphere, and their interactions. The tricky part is that Electrode is nested within Subject, and Subject is nested within Hemisphere. Permutation-based ANOVA used for the current paper requires proper treatment of these nested factors when making permutations (Anderson and Braak, 2003). In this regard, sufficient details about how the authors treated each factor, for instance, in each pbANOVA, are not provided in the current version of the manuscript. Similarly, the scope of statistical generalizability, whether the inference is within-sample or population-level, for the claims (e.g., statement about the hemispheric or regional difference) needs to be clarified.

      We discuss at length the issue of electrode variability and have addressed this in the revised manuscript. Graphically, we have added a Supplemental Figure (S2). Statistically, we appreciate the point about the need for the analysis to address the nested structure of the data. We have redone all of the statistics, now using a permutation-based linear mixed effects model with a random effect of patient. This approach did not change any of the findings.
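
      To illustrate why the nesting matters when permuting, the following is a minimal sketch and not the permutation-based linear mixed effects model described above. It assumes a simplified design in which electrode values are first averaged within each patient, and hemisphere labels are then shuffled across patients rather than across electrodes. The function name, the data layout, and the example values are all hypothetical.

```python
import random
from statistics import mean

def hemisphere_permutation_test(data, n_perm=5000, seed=0):
    """data maps patient_id -> (hemisphere, [per-electrode values]).

    Returns (observed left-minus-right difference, two-sided p-value).
    Simplified stand-in for a mixed model: the patient, not the electrode,
    is treated as the exchangeable unit.
    """
    rng = random.Random(seed)
    patients = sorted(data)
    labels = [data[p][0] for p in patients]
    # Collapse electrodes to one value per patient so that electrodes from
    # the same patient are never split across hemispheres by a shuffle.
    patient_means = [mean(data[p][1]) for p in patients]

    def stat(lbls):
        left = [m for m, l in zip(patient_means, lbls) if l == "L"]
        right = [m for m, l in zip(patient_means, lbls) if l == "R"]
        return mean(left) - mean(right)

    observed = stat(labels)
    shuffled = labels[:]
    count = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # keeps the number of L and R patients fixed
        if abs(stat(shuffled)) >= abs(observed) - 1e-12:
            count += 1
    # Add-one correction so the p-value is never exactly zero.
    return observed, (count + 1) / (n_perm + 1)

# Hypothetical encoding scores (e.g. held-out R2) for six patients.
example = {
    "P1": ("L", [0.30, 0.34]), "P2": ("L", [0.28]), "P3": ("L", [0.32, 0.30]),
    "P4": ("R", [0.10, 0.12]), "P5": ("R", [0.14]), "P6": ("R", [0.08, 0.12]),
}
observed_diff, p_value = hemisphere_permutation_test(example)
```

      Shuffling at the electrode level instead would treat electrodes from one patient as independent observations, which is the concern raised in the comment above.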

      As to the comment about hemispheric or regional differences, the data show that both are important factors. Our hemispheric effect is characterized by stronger ipsilateral encoding in the left hemisphere and subsequently better across-arm generalization (Figures 2-4). We then examined the spatial distribution of electrodes that generalized well or poorly and found clusters in both hemispheres of electrodes that generalize poorly. In contrast, only in the left hemisphere did we find clusters of electrodes that generalize well. These electrodes were localized to PMd, PMv and superior parietal cortex (Fig 5D). In summary, we argue that activity patterns in M1 are similar in the left and right hemispheres, but there is a marked asymmetry for activity patterns over premotor and parietal cortices.

      Additional context that would help readers interpret or understand the significance of the work: The greater amount of shared movement representation in the left hemisphere may imply a greater reliance of the left arm on the left hemisphere. This may, in turn, lead to a greater influence of the ongoing right arm motion on the left arm movement control during bimanual coordination. Indeed, this point is addressed by the authors in the Discussion (page 15, lines 26-41). One critical piece of literature missing in this context is the work done by Yokoi, Hirashima, and Nozaki (2014). In the experiments using the bimanual reaching task, they in fact found that the learning by the left arm is influenced to a greater degree by the concurrent motion of the right arm than vice versa (Yokoi et al., J Neurosci, 2014). Together with Diedrichsen et al. (2013), this study will strengthen the authors' discussion and help readers interpret the present result of left hemisphere dominance in the context of more skillful bimanual action.

      The Yokoi paper is a very important paper in revealing hemispheric asymmetries during skilled bimanual movements. However, we think it is problematic to link the hemispheric asymmetries we observe to the behavioral effects reported in the Yokoi paper (namely, that the nondominant, left arm was more strongly influenced by the kinematics of the right arm). One could hypothesize that the left hemisphere, given its representation of both arms, could be controlling both arms in some sort of direct way (and thus the action of the right arm will have an influence on left arm movement given the engagement of the same neural regions for both movements). It is also possible that the left hemisphere is receiving information about the state of both the right and left arms, and this underlies the behavioral asymmetry reported in Yokoi.

      Reviewer #3 (Public Review):

      In the present work, Merrick et al. analyzed ECoG recordings from patients performing out-and-back reaching movements. The authors trained a linear model to map kinematic features (e.g., hand speed, target position) to high frequency ECoG activity (HFA) of each electrode. The two primary findings were: 1) encoding strength (as assessed by held-out R2 values) of ipsilateral and contralateral movements was more bilateral in the left hemisphere than in the right and 2) across-arm generalization was stronger in the left hemisphere than in the right. As the authors point out in the Introduction, there are known 'asymmetries between the two hemispheres in terms of praxis', so it may not be surprising to find asymmetries in the kinematic encoding of the two hemispheres (i.e., the left hemisphere contributes 'more equally' to movements on either side of the body than the right hemisphere).

      There is one point that I feel must be addressed before the present conclusions can be reached and a second clarification that I feel will greatly improve the interpretability of the results.

      First, as is often the case when working with patients, the authors have no control over the recording sites. This led to some asymmetries in both the number of electrodes in each hemisphere (as the authors note in the Discussion) and (more importantly) in the location of the recording electrodes. Recording site within a hemisphere must be controlled for before any comparisons between the hemispheres can be made. For example, the authors note that 'the contralateral bias becomes weaker the further the electrodes are from putative motor cortex'. If there happen to be more electrodes placed further from M1 in the left hemisphere (as Supplementary Figure 1 seems to suggest), then we cannot know whether the results of Figures 2 and 3 are due to the left hemisphere having stronger bilateral encoding or simply more electrodes placed further from M1.

      The reviewer makes a very valid point and this comment has led to our inclusion of a new Supplementary Figure, S2, in which we quantify the percentage of electrodes in each subregion.

      Second, it would be useful if the authors provided a bit of clarification about what type of kinematic information the linear model is using to predict HFA. I believe the paragraph titled 'Target modulation and tuning similarity across arms' suggests that there is very little across-target variance in the HFA signal. Does this imply that the model is primarily ignoring Phi and Theta (as well as their lagged counterparts) and is instead relying on the position and speed terms? How likely is it that the majority of the HFA activity around movement onset reflects a condition-invariant 'trigger signal' (Kaufman et al., 2016)? This trigger signal accounts for the largest portion of neural variance around movement onset (by far), and the weights of individual neurons in trigger signal dimensions tend to be positive, which means that this signal will be strongly reflected in population activity (as measured by ECoG). This interpretation does not detract from the present results in any way, but it may serve to clarify them.

      To address this comment, we have added a new figure (Fig 6) which shows the relative contribution of each kinematic feature as well as their average weights across time for both contralateral and ipsilateral movements. This figure also addresses the reviewer’s question about the contribution of the target position to the model. As can be seen, features that reflect timing/movement initiation (position, speed) make a larger contribution compared to the two features which capture directional tuning (theta, phi). As the reviewer suggested, this result is in line with Kaufman et al. (2016), which reported that a condition-invariant ‘trigger signal’ comprises the largest component of neural activity. We note that the target-dependent features theta and phi still make a substantial contribution to the model (relative contribution: contra = 32%, ipsi = 37%). Previously, we have tested the contribution of the theta and phi features by comparing two models, one that only used position and speed (Movement Model) and one that also included the two angular components phi and theta (Target Model). For a subset of electrodes, the held-out predictions were significantly better using the Target Model, a result we take as further evidence of electrode tuning within our dataset.
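
      The comparison between the Movement Model and the Target Model can be sketched as a nested-model comparison on held-out data. The code below is a minimal illustration under stated assumptions, not the encoding pipeline used in the study: it uses ordinary least squares without lagged features or regularization, and the data are simulated with arbitrary weights chosen only so that theta and phi carry some signal. All function names and numbers are hypothetical.

```python
import random

def fit_least_squares(X, y):
    """Solve the normal equations (X^T X) w = X^T y by Gauss-Jordan elimination."""
    n_feat = len(X[0])
    # Augmented matrix [X^T X | X^T y].
    A = [[sum(r[i] * r[j] for r in X) for j in range(n_feat)]
         + [sum(r[i] * t for r, t in zip(X, y))] for i in range(n_feat)]
    for col in range(n_feat):
        pivot = max(range(col, n_feat), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(n_feat):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][n_feat] / A[i][i] for i in range(n_feat)]

def held_out_r2(X_train, y_train, X_test, y_test):
    """Fit on the training split, score R^2 on the held-out split."""
    w = fit_least_squares(X_train, y_train)
    pred = [sum(wi * xi for wi, xi in zip(w, row)) for row in X_test]
    mean_y = sum(y_test) / len(y_test)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_test, pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_test)
    return 1 - ss_res / ss_tot

rng = random.Random(1)

def simulate(n):
    """Simulated samples of (position, speed, theta, phi) and a noisy response."""
    rows, y = [], []
    for _ in range(n):
        position, speed, theta, phi = (rng.uniform(-1, 1) for _ in range(4))
        rows.append((position, speed, theta, phi))
        y.append(1.0 * position + 2.0 * speed + 0.8 * theta + 0.5 * phi
                 + rng.gauss(0, 0.3))
    return rows, y

def design(rows, use_target):
    """Intercept plus position and speed; optionally also theta and phi."""
    cols = 4 if use_target else 2
    return [[1.0] + list(r[:cols]) for r in rows]

train_rows, y_train = simulate(300)
test_rows, y_test = simulate(100)
r2_movement = held_out_r2(design(train_rows, False), y_train,
                          design(test_rows, False), y_test)
r2_target = held_out_r2(design(train_rows, True), y_train,
                        design(test_rows, True), y_test)
```

      Because the score is computed on held-out samples, the larger model is not guaranteed to win; an improvement for the Target Model therefore indicates that the angular features carry predictive information for that electrode.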

      The figure below shows an electrode located in M1 that is tuned to targets when the patient reached with their contralateral arm as an example. We believe that having an explicit depiction of how the four features contribute to the HFA predictions will help the reader evaluate the model. These points are now addressed in the text in the results section discussing Figure 6.

    1. Author Response

      Reviewer #3 (Public Review):

      This manuscript by Pendse et al aimed to identify the role of the complement component C1q in intestinal homeostasis, expecting to find a role in mucosal immunity. Instead, however, they discovered an unexpected role for C1qa in regulating gut motility. First, using RNA-Seq and qPCR of cell populations isolated either by mechanical separation or flow cytometry, the authors found that the genes encoding the subunits of C1q are expressed predominantly in a sub-epithelial population of cells in the gut that are CD11b+MHCII+F4/80high, presumably macrophages. They support this conclusion by analyzing mice in which intestinal macrophages are depleted with anti-CSF1R antibody treatment and show substantial loss of C1qa, b and c transcripts. Then, they generate Lyz2Cre-C1qaflx/flx mice to genetically deplete C1qa in macrophages and assess the consequences on the fecal microbiome, transcript levels of cytokines, macromolecular permeability of the epithelial barrier, and immune cell populations, finding no major effects. Furthermore, provoking intestinal injury with chemical colitis or infection (Citrobacter) did not reveal macrophage C1qa-dependent changes in body weight or pathogen burden.

      Then, they analyzed C1q expression by IHC of cross-sections of small and large intestine and found that C1q immunoreactivity is detectable adjacent to, but not colocalizing with, TUBB3+ nerve fibers and CD169+ cells in the submucosa. Interestingly, they find little C1q immunoreactivity in the muscularis externa. Nevertheless, they perform RNA-sequencing of LMMP preparations (longitudinal muscle with adherent myenteric plexus) and find a number of changes in gene ontology pathways associated with neuronal function. Finally, they perform GI motility testing on the conditional knockout mice and find that they have accelerated GI transit times manifesting with subtle changes in small intestinal transit and more profound changes in measures of colonic motility.

      Overall, the manuscript is very well-written and the observation that macrophages are the major source of C1q in the intestine is well supported by the data, derived from multiple approaches. The observations on C1q localization in tissue and the strength of the conclusions that can be drawn from their conditional genetic model of C1qa depletion, however, would benefit from more rigorous validation.

      1) Interpretation of the majority of the findings in the paper rest on the specificity of the Lyz2 Cre for macrophages. While the specificity of this Cre to macrophages and some dendritic cells has been characterized in the literature in circulating immune cells, it is not clear if this has been characterized at the tissue level in the gut. Evidence demonstrating the selectivity of Cre activity in the gut would strengthen the conclusions that can be drawn.

      As indicated by the reviewer, Cre expression driven by the Lyz2 promoter is restricted to macrophages and some myeloid cells in the circulation (Clausen et al., 1999). To better understand intestinal Lyz2 expression at a cellular level, we analyzed Lyz2 transcripts from a published single cell RNAseq analysis of intestinal cells (Xu et al., 2019; see Figure below). These data show that intestinal Lyz2 is also predominantly expressed in gut macrophages with limited expression in dendritic cells and neutrophils.

      Figure. Lyz2 expression from single cell RNAseq analysis of mouse intestinal cells. Data are from Xu et al., Immunity 51, 696-708 (2019). Analysis was done through the Single Cell Portal, a repository of scRNAseq data at the Broad Institute.

      Additionally, our study shows that intestinal C1q expression is restricted to macrophages (CD11b+MHCII+F4/80hi) and is absent from other gut myeloid cell lineages (Figure 1E-H). This conclusion is supported by our finding that macrophage depletion via anti-CSF1R treatment also depletes most intestinal C1q (Figure 2A-C). Importantly, we found that the C1qaΔMφ mice retain C1q expression in the central nervous system (Figure 2 – figure supplement 1). Thus, the C1qaΔMφ mice allow us to assess the function of macrophage C1q in the gut and uncouple the functions of macrophage C1q from those of C1q in the central nervous system.

      2) Infectious and inflammatory colitis models were used to suggest that C1qa depletion in Lyz2+ lineage cells does not alter gut mucosal inflammation or immune response. However, the phenotyping of the mice in these models was somewhat cursory. For example, in DSS only body weight was shown without other typical and informative read-outs including colon length, histological changes, and disease activity scoring. Similarly, in Citrobacter only fecal cfu were measured. Especially if GI motility is accelerated in the KO mice, pathogen burden may not reflect efficiency of immune-mediated clearance alone.

      We have added additional results which support our conclusion that C1qaΔMφ mice do not show a heightened sensitivity to acute chemically induced colitis. In Figure 3 – figure supplement 1 we now show a histological analysis of the small intestines of DSS-treated C1qafl/fl and C1qaΔMφ mice. This analysis shows that C1qaΔMφ mice have similar histopathology, colon lengths, and histopathology scores following DSS treatment. Likewise, our revised manuscript includes histological images of the colons of Citrobacter rodentium-infected C1qafl/fl and C1qaΔMφ mice showing similar pathology (Figure 3 – figure supplement 2).

      3) The evidence for C1q expression being restricted to nerve-associated macrophages in the submucosal plexus was insufficient. Localization was shown at low magnification on merged single-planar images taken from cross-sections. The data shown in Figure 4C is not of sufficient resolution to support the claims made - C1q immunoreactivity, for example, is very difficult to even see. Furthermore, nerve fibers closely approximate virtually every type of macrophage in the gut, from those in the lamina propria to those in the muscularis…. Finally, the resolution is too low to rule out C1q immunoreactivity in the muscularis externa.

      Similar points were raised by Reviewer 2. Our original manuscript claimed that C1q-expressing macrophages were mostly located near enteric neurons in the submucosal plexus but were largely absent from the myenteric plexus. However, as both Reviewers have pointed out, this conclusion was based solely on our immunofluorescence analysis of tissue cross-sections.

      To address this concern, we further characterized C1q+ macrophage localization by performing a flow cytometry analysis on macrophages isolated from the mucosa (encompassing both the lamina propria and submucosa) and the muscularis, finding similar levels of C1q expression in macrophages from both tissues (Figure 4 – figure supplement 1 in the revised manuscript). Although the mucosal macrophage fraction encompasses both lamina propria and submucosal macrophages, our immunofluorescence analysis (Figure 4 B and C) suggests that the mucosal C1q-expressing macrophages are mostly from the submucosal plexus. This observation is consistent with the immunofluorescence studies of CD169+ macrophages shown in Asano et al., which suggest that most CD169+ macrophages are located in or near the submucosal region, with fewer near the villus tips (Fig. 1e, Nat. Commun. 6, 7802).

      Most importantly, our flow cytometry analysis indicates that the muscularis/myenteric plexus harbors C1q-expressing macrophages. To further characterize C1q expression in the muscularis, we performed RNAscope analysis by confocal microscopy of the myenteric plexus from mouse small intestine and colon (Figure 4D). The results show numerous C1q-expressing macrophages positioned close to myenteric plexus neurons, thus supporting the flow cytometry analysis. We note that although the majority of C1q immunofluorescence in our tissue cross-sections was observed in the submucosal plexus, we did observe some C1q expression in the muscularis by immunofluorescence (Figure 4B and C). We have rewritten the Results section to take these new findings into account.

      Is the 5 µm average on the proximity analysis any different for other macrophage populations to support the idea of a special relationship between C1q-expressing macrophages and neurons?

      We agree that the proximity analysis lacks context and have therefore removed it from the figure. The other data in the figure better support the idea that C1q+ macrophages are found predominantly in the submucosal and myenteric plexuses and that they are closely associated with neurons at these tissue sites.

      There are many vessels in the submucosa and many associated perivascular nerve fibers - could the proximity simply reflect that both cell types are near vessels containing C1q in circulation?

      Our revised manuscript includes RNAscope analysis showing C1q transcript expression by macrophages that are closely associated with enteric neurons (Figure 4D). These findings support the idea that the C1q close to enteric neurons is derived from macrophages rather than from the circulation.

      4) A major disconnect was between the observation that C1q expression is in the submucosa and the performance of RNA-seq studies on LMMP preparations. This makes it challenging to draw conclusions from the RNA-Seq data, and makes it particularly important to clarify the specificity of Lyz2-Cre activity.

      Our revised manuscript provides flow cytometry data (Figure 4 – figure supplement 1) and RNAscope analysis (Figure 4D) showing that C1q is expressed in macrophages localized to the myenteric plexus. This accords with the results of our RNAseq analysis, which indicates altered LMMP neuronal function in C1qa∆Mφ mice (Figure 6A and B). Since neurons in the myenteric plexus are known to govern gut motility, it also helps to explain our finding that gut motility is accelerated in C1qa∆Mφ mice.

      Finally, the pathways identified could reflect a loss of neurons or nerve fibers. No assessment of ENS health in terms of neuronal number or nerve fiber density is provided in either plexus.

      Reviewers 1 and 2 also raised this point. Our revised manuscript includes a comparison of the numbers of enteric neurons in C1qafl/fl and C1qaΔMφ mice. There were no marked differences in neuron numbers in C1qaΔMφ mice when compared to C1qafl/fl controls (Figure 5A and B). There were also similar numbers of inhibitory (nitrergic) and excitatory (cholinergic) neuronal subsets and a similar enteric glial network (Figure 5C-E). Thus, our data suggest that the altered gut motility in the C1qaΔMφ mice arises from altered neuronal function rather than from an overt loss of neurons or nerve fibers. This conclusion is further supported by increased neurogenic activity of peristalsis (Figure 6H and I), and the expression of the C1q receptor BAI1 on enteric neurons (Figure 6 – figure supplement 4).

      5) To my knowledge, there is limited evidence that the submucosal plexus has an effect on GI motility. A recent publication suggests that even when mice lack 90% of their submucosal neurons, they are well-appearing without overt deficits (PMID: 29666241). Submucosal neurons, however, are well known to be involved in the secretomotor reflex and fluid flux across the epithelium. Assessment of these ENS functions in the knockout mice would be important and valuable.

      Our revised manuscript provides new data showing C1q expression by muscularis macrophages in the myenteric plexus. We analyzed muscularis macrophages by flow cytometry and found that they express C1q (Figure 4 – figure supplement 1). These findings are further supported by RNAscope analysis of C1q expression in wholemounts of LMMP from small intestine and colon (Figure 4D and E). These results are thus consistent with the increased CMMC activity and accelerated gut motility in the C1qaΔMφ mice. As suggested by the reviewer, our finding of C1q+ macrophages in the submucosal plexus indicates that C1q may also have a role controlling the function of submucosal plexus neurons. We are further exploring this idea through extensive additional experimentation. Given the expanded scope of these studies, we are planning to include them in a follow-up manuscript.

      6) Immune function and GI motility can be highly sex-dependent - in all experiments mice of both sexes were reportedly used but it is not clear if sex effects were assessed.

      This is a great point, and as suggested by the reviewer we did indeed encounter differences between male and female mice in our preliminary assays of gut motility. We therefore conducted our quantitative comparisons of gut motility between C1qafl/fl and C1qaΔMφ mice in male mice and now clearly indicate this point in the Materials and Methods.

    1. Author Response

      Reviewer #1 (Public Review):

      This is a very interesting paper describing membrane potential dynamics of hippocampal principal cells during UP/DOWN transitions and sharp-wave ripples. Using whole-cell in combination with linear LFP recordings in head-fixed awake mice, the authors show striking differences of membrane potential responses in principal cells from the dentate gyrus, CA3 and CA1 sectors. The authors propose that switches between a dominant inhibitory excitable state and a disinhibited non-excitable state control the intra-hippocampal dynamics during UP/DOWN transitions.

      Obtaining intracellular recordings in vivo is commendable. The authors provide valuable data and analysis. While data show clear trends and some of the conclusions are well supported, the authors may need to clarify the following potential confounds, which can actually impact their conclusions and interpretation:

      1- All the analysis is based on z-scored membrane potential responses but the mean resting membrane potential is never reported. For DG granule cells recorded in awake conditions, the membrane potential is usually hyperpolarized so that most of the effect may be due to reversed GABAa-mediated currents. Similarly, for those cells exhibiting the non-expected polarization during UP/DOWN states there may be drifts around reversal potentials explaining their behavior. Moreover, regional trends in passive and active membrane parameters and connectivity can actually explain part of the variability. A longitudinal comparison of state Vm and spikes in fig.5 suggests that some of the largest depolarized responses are not correlated with firing. Authors should evaluate this angle, ideally showing the distribution of membrane potential values across cells and regions and confronting this with the different membrane potential responses.

      We added Figure 1 - figure supplement 4, which now describes the mean resting membrane potential, input resistance, burst propensity, and spikes per burst for the recorded cells. These data are provided in Figure 1 - source data 1 together with a recording identifier that can be used to link each cell to all other figure panels and data files. We further added Figure 1 - figure supplement 1, which provides examples of morphological information for our recordings, Figure 1 - figure supplement 2 that shows examples of bursts from morphologically identified neurons, and Figure 1 - figure supplement 3 that shows the locations of recorded cells.

      In addition, we added Figure 5 - figure supplement 4 that includes the resting Vm and proximodistal location of cells in relation to their UP-DOWN modulation. We did not detect any significant trends with respect to brain state modulation. DG cells are more hyperpolarized compared to CA3 and CA1 cells and are closest to the reversal potential for GABAa (Figure 1 - figure supplement 4). The lack of any clear trends with respect to the resting Vm suggests that drifts around the GABAa reversal potential are unlikely to be a major factor driving variability in the observed UDS modulation.

      2- While there are some trends for each hippocampal region, there is also individual variability across cells during UP/DOWN transitions (fig.5) and near ripples (fig.6). What part of this variability can be explained by proximodistal and/or deep-superficial differences of cell location and identity? Can authors provide some morphological validation, even if in only a subset of cells? For CA3, proximodistal heterogeneity for intrinsic properties and entorhinal input responses are well documented in intracellular recordings both in vitro and in vivo. What is the location of the CA3 cells contributing to this study? For CA1 cells, deep-superficial trends of GABAergic perisomatic inhibition and connectivity with input pathways dominate firing responses. Regarding DG cells, are they all from the upper blade?

      We now provide morphological validation for a subset of cells (Figure 1 - figure supplement 1). Since we patch multiple cells in each experiment, it is not possible to unequivocally determine their depth within the cell layer, although it is possible to confirm that they are granule cells or pyramidal cells in experiments where all labeled cells are principal neurons (Figure 1 - figure supplement 1). In addition, we added Figure 1 - figure supplement 3 that shows the proximodistal locations of recorded cells. With respect to the DG cells, 20/22 are from the upper blade, with only two granule cells recorded in the lower blade (Figure 1 - figure supplement 3).

      We added Figure 5 - figure supplement 4 that includes the resting Vm and proximodistal location of each cell as a function of UP-DOWN modulation. We did not detect any significant trends with respect to UDS modulation.

      In addition, we added Figure 6 - figure supplement 1 that includes the resting Vm and proximodistal location of each cell as a function of ripple modulation. This figure shows that the most depolarized CA3 cells tend to hyperpolarize most during ripples, consistent with the fact that these cells are furthest away from the GABAa reversal potential and experience the highest driving force. No other significant trends were detected, although we would like to note that our recordings do not span the full proximodistal axis and may hence not be ideally suited to test the dependence of our results on proximodistal location.

      3- AC-coupled LFP recordings cannot provide unambiguous identification of the sign of phasic CSD signals, because fluctuations accompanying UP/DOWN states alter the baseline reference. This is actually the case, given changes of membrane potential accompanying UP/DOWN transitions. I recommend reading Brankack et al. 1993 doi: 10.1016/0006-8993(93)90043-m. The authors should acknowledge this limitation and discuss how it could influence their results. One potential solution to get rid of this effect is using principal/independent component analysis for blind source separation.

      We acknowledge the inherent limitations of AC-coupled recordings with regard to CSD analysis (Brankack et al., 1993). However, we do not believe these limitations affect our analysis or results for the reasons illustrated in Figure R1. Specifically, we do not attempt to measure the low frequency (< 1 Hz) CSD content directly. Instead, we extract the envelope of the rectified fast CSD transients. In the original submission we referred to this envelope signal as “DG CSD magnitude”, which may have been confusing. In the revised manuscript we use “DG CSD activity” instead to remove any suggestion that the low frequency CSD signal was directly measured. Notice that because of the rectification step the envelope signal is insensitive to the actual polarity of the fast transient CSD fluctuations. Using the envelope, we identify UP states as time periods when the rate and amplitude of EC input current transients, rather than the DC level, increase, in accordance with previous publications (Isomura et al., 2006). We further validated that the extracted UP/DOWN states reflect modulation of pupil diameter and ripple rate, quantities that are independently measured.

      Figure R1. Deriving slow envelope signal from AC coupled recordings. (A) In this example the true CSD signal contains both a slow component (8 Hz) and a fast component (80 Hz) that is amplitude modulated by the slow component. Such phase-amplitude coupling is well known between theta and gamma oscillations in the hippocampus. The true CSD shows a current sink with time-varying magnitude. (B) The power spectral density (PSD) estimate of the signal in (A) shows both the slow (8 Hz) and fast (three peaks near 80 Hz) components. (C) Assume LFP recordings are obtained with a high-pass filter that has eliminated the slow component. Consequently, the estimated CSD signal contains only fast fluctuations. Furthermore, instead of a time-varying current sink it shows quickly alternating sinks and sources (both negative and positive values). The slow component can be visualized as the amplitude envelope (interrupted red line) of the signal. (D) PSD estimate shows that the slow component is absent from the extracted CSD signal. (E) Rectifying the CSD estimate (black) and then filtering (red) approximately recovers the true slow component (red interrupted). This is how the DG CSD activity signal is obtained. (F) PSD estimate of the rectified and filtered CSD signal recovers the slow component (interrupted red vertical line).
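
      The rectify-then-filter step described in panel (E) can be sketched as follows. This is a minimal illustration assuming a synthetic signal with the frequencies named in the legend (an 80 Hz carrier whose amplitude is modulated at 8 Hz) and a simple moving average standing in for the low-pass filter; the sampling rate, window length, and function names are assumptions and not the settings used for the recordings.

```python
import math

def moving_average(x, window):
    """Centered moving average using prefix sums; windows shrink at the edges."""
    half = window // 2
    prefix = [0.0]
    for v in x:
        prefix.append(prefix[-1] + v)
    out = []
    for i in range(len(x)):
        lo = max(0, i - half)
        hi = min(len(x), i + half + 1)
        out.append((prefix[hi] - prefix[lo]) / (hi - lo))
    return out

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)

fs = 8000                      # assumed sampling rate in Hz
t = [i / fs for i in range(fs)]  # one second of samples

# Slow component (8 Hz) that modulates the amplitude of the fast transients.
true_slow = [1.0 + 0.5 * math.sin(2 * math.pi * 8 * ti) for ti in t]

# What survives AC coupling in this toy case: only the modulated 80 Hz part,
# which alternates between positive and negative values.
ac_coupled = [a * math.sin(2 * math.pi * 80 * ti) for a, ti in zip(true_slow, t)]

# Rectify (discard polarity), then smooth over about one 80 Hz cycle.
rectified = [abs(v) for v in ac_coupled]
envelope = moving_average(rectified, window=fs // 80)

# The envelope tracks the slow component up to a scale factor.
similarity = pearson(envelope, true_slow)
```

      Because the absolute value is taken before smoothing, flipping the sign of the fast signal leaves the envelope unchanged, which is the polarity insensitivity noted in the response above.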

      Reviewer #2 (Public Review):

      In this manuscript "Inhibition is the hallmark of CA3 intracellular dynamics around awake ripples" the authors obtained Vm recordings from CA1, CA3 and DG neurons while also obtaining local field potentials across the CA1 and DG layers. This enabled them to identify periods of up and down state transitions, and to detect sharp-wave ripples (SWRs). Using these data, they then came to the conclusion that compared to CA1 and DG, the Vm of more CA3 neurons is hyperpolarized at the approximate time of SWRs.

      Unfortunately, for the following reasons, the current manuscript does not necessarily support this conclusion:

      Recordings are obtained in mice who are recently (same day) recovering from craniotomy surgery/anesthesia and have no training on head fixation. This means that the behavioral state is abnormal, and the animal may have residual anesthesia effects.

      The main surgery for implanting the head-fixation apparatus and marking the coordinates for multisite and pipette insertion was carried out at least two days before the experiment. On the day of the experiment animals were briefly lightly anesthetized (<1 hr, at <1% isoflurane at 1 L/min) for the sole purpose of resecting the dura at the two sites for multisite probe and pipette insertion. This procedure was carried out on the same day as the experiment in order to minimize the time the brain was exposed and optimize the quality of the recordings. Experiments began at least six hours after this short procedure. Furthermore, animals were given time to get familiarized with the behavioral apparatus before recordings began and showed no signs of distress.

      Previous studies show that about 95% of isoflurane is eliminated within minutes by exhalation (Holaday et al., 1975). The further elimination of isoflurane proceeds with a fast phase with half-time of about 7-9 min and a slower phase with half-time of about 100-115 min (Chen et al., 1992), with the faster phase reflecting elimination from the brain (Litt et al., 1991). Given these considerations there should be negligible residual isoflurane from the short anesthesia six hours later when recordings are initiated.
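
      As a rough worked example of the half-time figures quoted above, the fraction remaining after six hours can be computed under the assumption of simple first-order (exponential) decay within each phase. This is only an arithmetic sketch: treating each phase as an independent single-compartment decay is an assumption, and the function name is hypothetical.

```python
def remaining_fraction(elapsed_min, half_time_min):
    """Fraction left after elapsed_min, assuming first-order decay."""
    return 0.5 ** (elapsed_min / half_time_min)

elapsed = 6 * 60  # six hours between anesthesia and recording, in minutes

# Fast phase (half-time about 7-9 min), which the cited work attributes
# to elimination from the brain.
fast_phase = [remaining_fraction(elapsed, h) for h in (7, 9)]

# Slow phase (half-time about 100-115 min).
slow_phase = [remaining_fraction(elapsed, h) for h in (100, 115)]
```

      Under these assumptions the fast phase leaves less than one part in 10^11 after six hours, while roughly 8 to 11 percent of whatever entered the slow-phase pool would remain.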

      In order to further investigate whether the short and light anesthesia during the day of recordings has any effect on the results reported in the paper, we carried out additional experiments in which we performed the surgery, including dura removal, 3 days before the recording session. The animals were habituated under head-fixation on the spherical treadmill for two-hour periods on each of the two days following the surgery. On the third day after surgery, we carried out recordings without any surgical procedures or anesthesia. The durations of UP and DOWN states without same-day anesthesia were similar to those obtained in our previous experiments (Figure 2 - figure supplement 4). The additional CA3 whole-cell recordings obtained in these new experiments have the same hyperpolarization features typical of our previous recordings. These additional experiments argue that the brief anesthesia on the day of recordings has no significant effect on the results.

      Most of the paper is dedicated to dynamics around up-down state transitions, not focused on ripples.

      We changed the title to “Up-Down states and ripples differentially modulate membrane potential dynamics across DG, CA3, and CA1 in awake mice” to reflect the analysis of both UP-DOWN state transitions and ripples. The two analyses are linked as the brain state modulation accounts for the slow Vm modulation around ripples.

      Vm should be examined raw first, then split into fast and slow -the cell lives with the raw Vm.

      The raw Vm can be obtained by adding the slow and fast Vm components. Hence the behavior of the Vm around ripples can be obtained by adding the panels of columns 1 and 3 in Figure 6. Decomposing into the slow and fast components illustrates how the slow modulation around ripples is due to brain state modulation of the slow component of the Vm (Figure 6).
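
      The additive relationship between the components can be illustrated with a minimal sketch. It assumes the slow component is a low-pass version of the raw Vm and the fast component is the residual; the moving-average filter, the window length, and the synthetic trace are hypothetical stand-ins, since the filter actually used is not specified in this response.

```python
import math


def moving_average(signal, window):
    """Centered moving average with truncated edges; a simple low-pass stand-in."""
    half = window // 2
    smoothed = []
    for i in range(len(signal)):
        lo, hi = max(0, i - half), min(len(signal), i + half + 1)
        smoothed.append(sum(signal[lo:hi]) / (hi - lo))
    return smoothed


# Synthetic Vm (mV): slow 0.5 Hz modulation plus a fast 20 Hz fluctuation, 1 kHz sampling.
fs = 1000
raw_vm = [-60 + 5 * math.sin(2 * math.pi * 0.5 * t / fs)
          + 1 * math.sin(2 * math.pi * 20 * t / fs) for t in range(2 * fs)]

slow_vm = moving_average(raw_vm, window=201)         # slow component
fast_vm = [r - s for r, s in zip(raw_vm, slow_vm)]   # fast component = residual

# Because the fast component is the residual, slow + fast recovers the raw trace.
reconstructed = [s + f for s, f in zip(slow_vm, fast_vm)]
max_error = max(abs(r - x) for r, x in zip(raw_vm, reconstructed))
print(f"Largest reconstruction error: {max_error:.2e} mV")
```

      Under this definition, any ripple-triggered average of the raw Vm is simply the sum of the ripple-triggered averages of the two components, because averaging is linear.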

      While some (assumed) CA3 principal cells were hyperpolarized around the time of ripples, saying inhibition is the hallmark of CA3 dynamics around ripples is an exaggeration, especially because it does not seem mechanistically tied to anything else.

      While a small fraction of CA3 cells is excited around ripples, the majority is inhibited. We suggest that the inhibition of the majority of CA3 neurons can account for the sparse and selective activation of CA3 around ripples.

      The use of ripple onset time is questionable, since the detected onset of the ripple depends on the detector settings, amplifier signal-to-noise ratio, etc. The best and most widely used (including by a subset of these authors) metric is the ripple peak time.

      We added Figure 6 - figure supplement 2, which shows that the Vm modulation around peak ripple power is the same as the modulation around ripple start, except for a small time shift due to the fact that the ripple power peaks shortly after ripple start. Our focus on ripple onset facilitates characterizing the timing of pre-ripple activity, such as the Vm depolarization observed before ripple onset for DG and CA1 neurons.

      There is not enough raw data (or quality metrics) shown to judge the quality of the data, especially for the whole cell recordings. For instance what was the input resistance of the neurons? Was the access resistance constant?

      We added Figure 1 - figure supplement 4, which now describes the mean resting membrane potential, input resistance, burst propensity, and spikes per burst for the recorded cells. These data are provided in Figure 1 - source data 1 together with a recording identifier that can be used to link each cell to all other figure panels and data files. We further added Figure 1 - figure supplement 1, which provides examples of morphological information for our recordings, Figure 1 - figure supplement 2, which shows examples of bursts from morphologically identified neurons, and Figure 1 - figure supplement 3, which shows the locations of recorded cells.

      There is not enough explanation regarding why the reported results on the spiking of CA1 and CA3 neurons in SWRs are so different from those previously published. In general, whole cell recording is not the most reliable way to record spike timing, and the presented whole cell data differ from previously published juxtacellular and extracellular recording methods, which better preserve physiological spiking activity.

      The CA1 neurons in this study depolarize and elevate their firing around ripples, consistent with previous intracellular and extracellular recordings. Our study reveals hyperpolarization of the majority of CA3 cells while only a small fraction is depolarized. This is consistent with the sparse activation of CA3 around ripples previously reported with extracellular studies. The overall firing rate change of CA3 neurons around ripples is a balance between the firing rate elevation of the small subset of activated cells and the net decrease in firing across the rest of the population. Since the baseline firing rate of CA3 pyramidal neurons in quiet wakefulness and sleep is low, the ripple-associated inhibition may not be readily observable in the spiking of individual CA3 neurons due to a “floor effect”. The overall rate of CA3 neurons we record increases before ripple onset, consistent with previous studies (Fig. 6D4). The subthreshold hyperpolarization of the majority of neurons provides novel insights into the mechanisms ensuring sparse and selective activation of the CA3 population around ripples.

      The number of neurons from each area is not reported.

      The number of cells was (indirectly) reported as the number of rows in Figs. 3-7. We now report the number of cells explicitly: 22 DG cells, 32 CA3 cells, and 32 CA1 cells.

      There is no verification of cell type so it is inappropriate to assume that all neurons are the principal neurons.

      We added Figure 1 - figure supplement 1, which shows morphological identification of recorded cells. We patch multiple cells in each experiment, but we can confirm the morphological identity of principal neurons when all stained cells have the morphology of dentate granule cells or CA3/CA1 pyramidal neurons. The properties of morphologically identified cells in Figure 1 - figure supplement 1 are typical of all recorded cells (morphologically identified neurons from Figure 1 - figure supplement 1 are shown as diamonds in Figure 1 - figure supplement 4, while the rest are shown as dots). There were no significant differences between the two groups (p > 0.05 t-test; p > 0.05 Wilcoxon rank sum test).

      Are the fluctuations in the CA3 Vm generally smaller than for CA1 and DG because of physiology or technical reasons?

      The recordings were done in exactly the same way across areas, arguing against technical reasons for any differences observed across the hippocampal subfields.

      Reviewer #3 (Public Review):

      During slow wave sleep and quiet immobility, communication between the hippocampus and the neocortex is thought to be important for memory formation notably during periods of hippocampal synchronous activity called sharp-wave ripple events. The cellular mechanisms of sharp-wave ripple initiation in the hippocampus are still largely unknown, notably during awake immobility. In this paper, the authors addressed this question using patch-clamp recordings of principal cells in different hippocampal subfields (CA3, CA1 and the dentate gyrus) combined with extracellular recordings in awake head-fixed mice as well as computer modeling. Using the current source density (CSD) profile of local field potential (LFP) recordings in the molecular layer of the dentate gyrus as a proxy of UP/DOWN state activity in the entorhinal cortex they report the preferential occurrence of sharp-wave ripple (recorded in area CA1) during UP states with a higher probability toward the end of the UP state (unlike eye blinks which preferentially occur during DOWN states). Patch-clamp recordings reveal that a majority of dentate granule cells get depolarized during UP state while a majority of CA3 pyramidal cells get hyperpolarized and CA1 pyramidal cells show a more mixed behavior. Closer examination of Vm behavior around state transitions revealed that CA3 pyramidal cells are depolarized and spike at the DOWN/UP transition (with some cells depolarizing even earlier) and then progressively hyperpolarize during the course of the UP state while DGCs and CA1 pyramidal cells tend to depolarize and fire throughout the UP state. Interestingly, CA3 pyramidal cells also tend to be hyperpolarized during ripples (except for a minority of cells that get depolarized and could be instrumental in ripple generation), while DGCs and CA1 pyramidal cells tend to be depolarized and fire. 
      The strong activation of dentate granule cells during ripples is particularly interesting and deserves further investigation. The observation that the probability of ripple occurrence increases toward the end of the UP state, when CA3 pyramidal cells are maximally hyperpolarized, suggests that the inhibitory state of the CA3 hippocampal network could be permissive for ripple generation, possibly by de-inactivation of voltage-gated channels, thus increasing their excitability (i.e. ability to get excited). Altogether, these results confirm previous work on the impact of slow oscillations on the membrane potential of hippocampal neurons in vivo under anesthesia but also point to specificities possibly linked to the awake state. They also invite a revisit of previous models derived from in vitro recordings that attribute synchronous activity in CA3 to a global build-up of excitatory activity in the network, by suggesting a role for Vm hyperpolarization in preserving the excitability of the CA3 network.

      1) In light of a recent report of heterogeneity within hippocampal cell types (and notably the description of a new CA3 pyramidal cell type instrumental for sharp-wave ripple generation) (Hunt et al., 2018), the small minority of CA3 pyramidal cells depolarized during ripples deserves more attention. These cells are indeed likely key in the generation of sharp-wave ripples. Several analyses could be performed in order to decipher whether they have specific intrinsic properties (baseline Vm, firing threshold, burst propensity), whether they are located in specific sub-areas of CA3 (a versus b, deep versus superficial) and whether they are distinctively modulated during UP/DOWN states.

      Following the reviewer’s suggestion, we now analyze the properties and UDS modulation of the CA3 neurons that are depolarized around ripples (Figure 6 - figure supplement 3). These neurons have resting Vm, spike thresholds, and burst propensity comparable to those of the rest of the CA3 population (p > 0.05, t-test). These CA3 cells had lower firing probability in the DOWN state. The locations of the depolarized cells are distributed across CA3c and CA3b and are not clustered compared to the rest of the cells (Figure R2).

      Figure R2. Proximodistal locations of CA3 cells that depolarize during ripples. Same as Figure 1 - figure supplement 3, but CA3 cells showing depolarization in their ripple-triggered average (RTA) response are marked with black dots. There was no significant difference in the proximodistal locations of these cells compared to the rest of the CA3 population (p > 0.05, t-test).

      The population of athorny cells described in Hunt et al. represents a small percentage of CA3 cells (10-20%) that are concentrated in the CA3a region, which we do not sample in our recordings. Hence, the depolarized cells are unlikely to correspond to the athorny cells reported in Hunt et al.

      2) The authors use CSD analysis in the DG as a proxy of synaptic inputs coming from the EC to define alternating periods of UP and DOWN states. I have a few questions concerning this procedure: 1- It is unclear if only periods when animals were still/immobile were analyzed. 2- How coherent were these periods with slow oscillations recorded in the cortex (which are also recorded with the linear probe)?

      The analysis was restricted to periods of immobility, which comprise the majority of the recording time as the animals are not performing any task. Cortical LFPs exhibit high coherence for low frequencies (<1 Hz) with the rectified DG CSD signal (Figure R3), although the contribution of volume conduction to this effect cannot be ruled out.

      Figure R3. Coherence between DG CSD power and cortical LFP. (Top) Population average magnitude squared coherence between DG CSD power (rectified CSD from the DG molecular layer) and cortical LFP across all recorded datasets. Notice the elevated coherence at low frequencies (< 1 Hz, vertical interrupted line) as well as the peak at theta (7-8 Hz). Volume conduction from other brain areas (i.e. the hippocampus) contributes to the cortical LFP and may be responsible for the coherence at theta, as well as at low frequencies. (Bottom) Each row in the pseudocolor image shows the coherence between DG CSD power and cortical LFP for a given dataset.

      3- How long did these periods last? Did they occur during classically described hippocampal states (LIA/SIA) or do they correspond to a different state (Wolansky et al., J Neurosci 2006).

      The distribution of UP and DOWN state durations is shown in Figure 2 - figure supplement 4.

      We also added Figure 2 - figure supplement 8, which shows the distribution of LIA and SIA transitions as a function of UDS phase. The LIA and SIA states were computed based on LFPs from CA1 stratum radiatum as described in (Hulse et al., 2017). The detected LIA→SIA transitions map very closely to UP→DOWN transitions. The SIA→LIA transitions are also concentrated around DOWN→UP transitions, but the distribution is broader compared to the LIA→SIA transitions. These observations are consistent with UP states broadly overlapping with LIA and DOWN states with SIA.

      3) To better characterize hippocampal CSD profiles around ripples and UP/Down states transitions, could you plot ripple and UDS transition-triggered average CSD profiles across hippocampal subfields?

      We added Figure 2 - figure supplement 7, which shows average CSD profiles around UP/DOWN state transitions and ripples.

      4) The duration of UP states appears longer than that reported in anesthetized animals. To ascertain this, could the authors quantify and report mean UP and DOWN state durations? Shorter DOWN states would decrease the probability of detecting ripples. Could the authors correct for this bias in their analysis of ripple occurrence during UP and DOWN states?

      We report the medians and means of the distributions of UP and DOWN durations in Figure 2 - figure supplement 4. Ripples occur almost exclusively during the UP states, with almost no ripples occurring in DOWN states. Furthermore, the duration of UP and DOWN states is comparable suggesting that the duration of DOWN states does not bias the probability of ripple detection. We also added Figure 2 - figure supplement 2B, showing the rate (in Hz) of ripple occurrence as a function of UDS phase, which explicitly controls for UDS phase occupancy.
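
      The idea of controlling for phase occupancy can be sketched as follows: the ripple count in each UDS phase bin is divided by the time spent in that bin, giving a rate in Hz rather than a raw count. The function name, the bin count, and the toy data are hypothetical, and this is only an illustration of the normalization, not the analysis code behind Figure 2 - figure supplement 2B.

```python
import math


def ripple_rate_by_phase(sample_phases, ripple_phases, dt, n_bins=8):
    """Ripple rate (Hz) per UDS phase bin = ripple count / time spent in the bin."""
    width = 2 * math.pi / n_bins

    def bin_of(phase):
        return int((phase % (2 * math.pi)) / width) % n_bins

    occupancy_s = [0.0] * n_bins
    for p in sample_phases:          # one phase value per time sample of duration dt
        occupancy_s[bin_of(p)] += dt
    counts = [0] * n_bins
    for p in ripple_phases:          # UDS phase at each detected ripple
        counts[bin_of(p)] += 1
    return [c / o if o > 0 else float("nan") for c, o in zip(counts, occupancy_s)]


# Toy data with two phase bins: 3 s spent in the first bin, 1 s in the second.
sample_phases = [1.0] * 300 + [4.0] * 100
ripple_phases = [1.0, 1.0, 1.0, 4.0]   # raw counts 3 vs 1
rates = ripple_rate_by_phase(sample_phases, ripple_phases, dt=0.01, n_bins=2)
print(rates)
```

      In this toy case the raw counts differ threefold, yet the occupancy-normalized rates are about 1 Hz in both bins, which is why a rate-based plot is not biased by unequal time spent in each phase.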

      The durations of UP and DOWN states in quiet wakefulness depend on the behavior of the animal, attentional state, and external stimuli and need not be the same as in anesthesia or sleep, when the animal is not behaving and is less responsive to external stimuli. To validate that the extracted UP and DOWN states in quiet wakefulness indeed correspond to genuine brain states, we show that the pupil diameter and ripple rates, which are independently extracted, are strongly modulated around the extracted UP and DOWN states.

      5) The authors report a high coherence between the Vm of an example CA3 pyramidal cell and UP/DOWN state in DG. Was it a general property of a majority of CA3 pyramidal cells? The coherence values should be reported for all CA3 pyramidal cells.

      We added Figure 2 - figure supplement 1, which reports the coherence of all cells across the subfields with the rectified DG CSD. The coherence values are similar across cells and subfields. We also report correlations between the slow component of the Vm and DG CSD activity for all cells in Figure 3. Neurons in CA3 exhibit negative correlations in contrast to DG and CA1, with the absolute values of the correlations similar across the subfields.

      6) Was the high coherence between DG CSD magnitude and CA3 Vm specific to these slow oscillatory periods or a more general feature of the DG/CA3 functional coupling? For example, was it also observed during theta/movement periods?

      Figure 2 - figure supplement 1 reports the coherence of all cells across the subfields with rectified DG CSD over the entire recording duration. Mice do not perform any tasks during the recordings, so periods of immobility and quiet wakefulness comprise the majority of the recording session and are the focus of our analysis. During some occasional theta periods there is increased coherence in the theta frequency band (Figure R4).

      7) Fig. 6 shows depolarization and increased firing in DGCs up to 150 ms prior to ripple onset. However, ripples sometimes occur in bursts with one ripple following others. Could such a phenomenon explain the firing prior to ripples (which would in fact correspond to firing during a previous ripple)? What is the behavior of firing rate and Vm of different cell types if analysis is restricted to isolated ripples? This analysis is notably important in CA3, where feedback inhibition following a first ripple could lead to hyperpolarization "during" the next ripple.

      We added a new figure (Figure 7 - figure supplement 2) that compares Vm aligned to the onset of isolated single ripples vs. ripple doublets. The pre-ripple depolarization in DG and CA1 is similar for isolated ripples and ripple doublets, arguing against the hypothesis that pre-ripple responses are a reflection of ripple bursts.
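
      One simple way to separate isolated ripples from those occurring in close succession is to threshold the interval to the nearest neighbouring ripple. The sketch below is illustrative only: the 200 ms gap criterion, the function name, and the "clustered" label (which covers doublets as well as longer bursts) are assumptions, not the criteria used for Figure 7 - figure supplement 2.

```python
def classify_ripples(onsets_s, max_gap_s=0.2):
    """Label each ripple 'isolated' or 'clustered' by its gap to the nearest neighbour."""
    onsets = sorted(onsets_s)
    labelled = []
    for i, t in enumerate(onsets):
        gap_prev = t - onsets[i - 1] if i > 0 else float("inf")
        gap_next = onsets[i + 1] - t if i < len(onsets) - 1 else float("inf")
        # Hypothetical criterion: a ripple is isolated if no other ripple
        # starts within max_gap_s before or after it.
        label = "isolated" if min(gap_prev, gap_next) > max_gap_s else "clustered"
        labelled.append((t, label))
    return labelled


# Toy ripple onset times in seconds; the ripples at 3.0 s and 3.1 s form a doublet.
labelled = classify_ripples([1.0, 3.0, 3.1, 6.0])
for onset, label in labelled:
    print(onset, label)
```

      Ripple-triggered averages can then be computed separately for each label and compared, which is the logic of the isolated-versus-doublet comparison described above.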

    1. Author Response

      Reviewer #1 (Public Review):

      The manuscript by Royall et al. builds on previous work in the mouse that indicates that neural progenitor cells (NPCs) undergo asymmetric inheritance of centrosomes and provides evidence that a similar process occurs in human NPCs, which was previously unknown.

      The authors use hESC-derived forebrain organoids and develop a novel recombination tag-induced genetic tool to birthdate and track the segregation of centrosomes in NPCs over multiple divisions. The thoughtful experiments yield data that are concise and well-controlled, and the data support the asymmetric segregation of centrosomes in NPCs. These data indicate that at least apical NPCs in humans undergo asymmetric centrosome inheritance. The authors attempt to disrupt the process and present some data that there may be differences in cell fate, but this conclusion would be better supported by a better assessment of the fate of these different NPCs (e.g. NPCs versus new neurons) and would support the conclusion that younger centriole is inherited by new neurons.

      We thank the reviewer for their supportive comments (“…thoughtful experiments yield data that are concise and well-controlled…”).

      Reviewer #2 (Public Review):

      Royall et al. examine the asymmetric inheritance of centrosomes during human brain development. In agreement with previous studies in mice, their data suggest that the older centrosome is inherited by the self-renewing daughter cell, whereas the younger centrosome is inherited by the differentiating daughter cell. The key importance of this study is to show that this phenomenon takes place during human brain development, which the authors achieved by utilizing forebrain organoids as a model system and applying the recombination-induced tag exchange (RITE) technology to birthdate and track the centrosomes.

      Overall, the study is well executed and brings new insights of general interest for cell and developmental biology with particular relevance to developmental neurobiology. The Discussion is excellent, it brings this study into the context of previous work and proposes very appealing suggestions on the evolutionary relevance and underlying mechanisms of the asymmetric inheritance of centrosomes. The main weakness of the study is that it tackles asymmetric inheritance only using fixed organoid samples. Although the authors developed a reasonable mode to assign the clonal relationships in their images, this study would be much stronger if the authors could apply time-lapse microscopy to show the asymmetric inheritance of centrosomes.

      We thank the reviewer for their constructive and supportive comments (“…the study is well executed and brings new insights of general interest for cell and developmental biology with particular relevance to developmental neurobiology….”). We understand the request for clonal data or dynamic analyses in organoids (e.g., using time-lapse microscopy). We also agree that such data would certainly strengthen our findings. However, as outlined above (please refer to point #1 of the editorial summary), this is unfortunately not currently feasible. We have explicitly discussed this shortcoming in our revised manuscript and explained why future experiments (with advanced methodology) will be needed to address it.

      Reviewer #3 (Public Review):

      In this manuscript, the authors report that human cortical radial glia asymmetrically segregate newly produced or old centrosomes after mitosis, depending on the fate of the daughter cell, similar to what was previously demonstrated for mouse neocortical radial glia (Wang et al. 2009). To do this, the authors develop a novel centrosome labelling strategy in human ESCs that allows recombination-dependent switching of tagged fluorescent reporters from old to newly produced centrosome protein, centriolin. The authors then generate human cortical organoids from these hESCs to show that radial glia in the ventricular zone retain older centrosomes whereas differentiated cells, i.e. neurons, inherit the newly produced centrosome after mitosis. The authors then knock down a critical regulator of asymmetric centrosome inheritance called Ninein, which leads to a randomization of this process, similar to what was observed in mouse cortical radial glia.

      A major strength of the study is the combined use of the centrosome labelling strategy with human cortical organoids to address an important biological question in human tissue. This study is similarly presented as the one performed in mice (Wang et al. 2009) and the existence of the asymmetric inheritance mechanism of centrosomes in another species grants strength to the main claim proposed by the authors. It is a well-written, concise article, and the experiments are well-designed. The authors achieve the aims they set out in the beginning, and this is one of the perfect examples of the right use of human cortical organoids to study an important phenomenon. However, there are some key controls that would elevate the main conclusions considerably.

      We thank the reviewer for their overall support of our findings (“..authors achieve the aims they set out in the beginning, and this is one of the perfect examples of the right use of human cortical organoids to study an important phenomenon…”). We also understand the reviewer’s request for additional experiments/controls that “…would elevate the main conclusions considerably.”

      1) The lack of clonal resolution or timelapse imaging makes it hard to assess whether the inheritance of centrosomes occurs as the authors claim. The authors show that there is an increase in newly made non-ventricular centrosomes at a population level, but labelling clones and demonstrating that a new or old centrosome is inherited asymmetrically in a dividing radial glia would grant additional credence to the central conclusion of the paper. These experiments would put away any doubt about the existence of this mechanism in human radial glia, especially if it is demonstrated using timelapse imaging. Additionally, knowing the proportions of symmetric vs asymmetrically dividing cells generating old/new centrosomes would provide important insights pertinent to the conclusions of the paper. Alternatively, the authors could soften their conclusions, especially for Fig 2.

      We understand the reviewer’s request. As outlined above (please refer to point #1 of the editorial summary), we had tried previously to add data using single cell timelapse imaging. However, due to the size and therefore weakness of the fluorescent signal we had failed despite extensive efforts. According to the reviewer’s suggestion we have now explicitly discussed this shortcoming and softened our conclusions.

      2) Some critical controls are missing. In Fig. 1B, there is a green dot that does not colocalize with Pericentrin. This is worrying and providing rigorous quantifications of the number of green and tdTom dots with Pericentrin would be very helpful to validate the labelling strategy. Quantifications would put these doubts to rest. Additionally, an example pericentrin staining with the GFP/TdTom signal in figure 4 would also give confidence to the reader. For figure 4, having a control for the retroviral infection is important. Although the authors show a convincing phenotype, the effect might be underestimated due to the incomplete infection of all the analyzed cells.

      We have included more rigorous quantifications in our revised manuscript.

      For Figure 1: There are indeed some green speckles that might be misinterpreted as a green centrosome. However, the speckles are usually smaller, and by applying a strict size requirement we exclude them. To check whether the classifier might interpret any speckles as centrosomes, we manually checked 60 green “dots” that were annotated as centrosomes. In these images, all green spots detected as centrosomes co-localized with Pericentrin signal (images shown in Author response image 1).

      For Figure 4: as we are comparing cells that were infected with a retrovirus expressing either scrambled or Ninein-targeting shRNA, we compare cells that experienced a similar treatment. In addition, only cells infected with the virus express Cre-ERT2, so only the centrosomes of targeted cells were analyzed. Accordingly, we only compare cells expressing scrambled or Ninein-targeting shRNA; all surrounding “wt” cells are not considered.

      Author response image 1.

      Pictures used to test the classifier. Each of the green “dots” recognized by the classifier as a Centriolin-NeonGreen-containing centrosome (green) co-localized with Pericentrin signal (white).

      3) It would be helpful if the authors expand on the presence of old centrosomes in apical radial glia vs outer radial glia. Currently, in figure 3, the authors only focus on Sox2+ cells but this could be complemented with the inclusion of markers for outer radial glia and whether older centrosomes are also inherited by oRGCs. This would have important implications on whether symmetric/asymmetric division influences the segregation of new/old centrosomes.

      That is an interesting question, and we agree that additional analyses stratified by ventricular RGCs vs. oRGCs would be interesting. However, at the time points analysed there are only very few oRGCs present (if any) in human ESC-derived organoids (Qian et al., Cell, 2016). We have now added this point for future experiments to our discussion.

    1. Author Response

      Reviewer #1 (Public Review):

      [...] Recently, pupil dilation was linked to cholinergic and noradrenergic neuromodulation as well as cortical state dynamics in animal research. This work adds substantially to this growing research field by revealing the temporal and spatial dynamics of pupil-linked changes in cortical state in a large sample of human participants.

      The analyses are thorough and well conducted, but some questions remain, especially concerning unbiased ways to account for the temporal lag between neural and pupil changes. Moreover, it should be stressed that the provided evidence is of indirect nature (i.e., resting state pupil dilation as proxy of neuromodulation, with multiple neuromodulatory systems influencing the measure), and the behavioral relevance of the findings cannot be shown in the current study.

      Thank you for your positive feedback and constructive suggestions. We are especially grateful for the numerous pointers to other work relevant to our study.

      1. Concerning the temporal lag: The authors uniformly shift pupil data (but not pupil derivative) in time for their source-space analyses (see above). However, the evidence for the chosen temporal lags (930 ms and 0 ms) is not that firm. For instance, in the cited study by Reimer and colleagues [1], cholinergic activation shows a temporal lag of ~ 0.5 s with regard to pupil dilation - and the authors would like to relate pupil time series primarily to acetylcholine. Moreover, Joshi and colleagues [2] demonstrated that locus coeruleus spikes precede changes in the first derivative of pupil dilation by about 300 ms (and not 0 ms). Finally, in a recent study recording intracranial EEG activity in humans [3], pupil dilation lagged behind neural events with a delay between ~0.5-1.7s. Together, this questions the chosen temporal lags.

      More importantly, Figures 3 and S3 demonstrate variable lags for different frequency bands (also evident for the pupil derivative), which are disregarded in the current source-space analyses. This biases the subsequent analyses. For instance, Figure S3 B shows the strongest correlation effect (Z~5), a negative association between pupil and the alpha-beta band. However, this effect is not evident in the corresponding source analyses (Figure S5), presumably due to the chosen zero-time-lag (the negative association peaked at ~900 ms).

      As the conducted cross-correlations provided direct evidence for the lags for each frequency band, using these for subsequent analyses seems less biased.

      This is an important point and we gladly take the opportunity to clarify this in detail. In essence, choosing one particular lag over others was a decision we took to address the multi-dimensional issue of presenting our results (spectral, spatial and time dimensions) and fix one parameter for the spatial description (see e.g. Figure 4). It is worth pointing out first that our analyses were all based on spectral decompositions that necessarily have limited temporal resolutions. Therefore, any given lag represents the center of a band that we can reasonably attribute to a time range. In fact, Figure 3C shows how spread out the effects are. It also shows that the peaks (troughs) of low and high frequency ranges align with our chosen lag quite well, while effects in the mid-frequency range are not “optimally” captured.

      As picking lags based on maximum effects may be seen as double dipping, we note that we chose 0.93 sec a priori based on the existing literature, and most prominently based on the canonical impulse response of the pupil to arousing stimuli that is known to peak at that latency on average (Hoeks & Levelt, 1993; Wierda et al., 2012; also see Burlingham et al., 2021). This lag further agrees with the results of reference [3] cited by the reviewer as it falls within that time range, and with Reimer et al.’s finding (cited as [1] above), as well as Breton-Provencher et al. (2019), who report a lag of ~900 ms (see their Supplementary Figure S8) between noradrenergic LC activation and pupil dilation. Finally, note that it was not our aim to relate pupil dilations to either ACh or NE in particular as we cannot make this distinction based on our data alone. Instead, we point out and discuss the similarities of our findings with time lags that have been reported for either neurotransmitter before.

      With respect to using different lags, changing the lag to 0 or 500 msec is unlikely to alter the reported effects qualitatively for low- and high-frequency ranges (see Figure 3C), as both the pupil time series and fluctuations in power are dominated by very slow fluctuations (<< 1 Hz). As a consequence, shifting the signal by 500 msec has very little impact. For comparison, below we provide the reviewer with the results presented in Figure 4 but computed based on zero (Figure R1) and 500-msec (Figure R2) lags. While there are small quantitative differences, qualitatively the results remain mostly identical irrespective of the chosen lag.

      Figure R1. Figure equivalent to main Figure 4, but without shifting the pupil.
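
      The claim that a 500-msec shift barely changes a signal dominated by very slow fluctuations can be illustrated with a toy calculation. The sketch below correlates a single slow sinusoid with a copy of itself delayed by 500 msec; the 0.05 Hz frequency and 100 Hz sampling rate are hypothetical values chosen for illustration and are not taken from the pupil or MEG data.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


fs = 100                 # Hz, hypothetical sampling rate
freq = 0.05              # Hz, a slow fluctuation (<< 1 Hz)
shift = int(0.5 * fs)    # 500 msec expressed in samples
signal = [math.sin(2 * math.pi * freq * t / fs) for t in range(200 * fs)]

# Correlate the signal with a copy of itself delayed by 500 msec.
r = pearson(signal[:-shift], signal[shift:])
print(f"Correlation between original and shifted signal: {r:.3f}")
```

      For a pure sinusoid the expected correlation is close to cos(2π · f · τ), which for f = 0.05 Hz and τ = 0.5 s is about 0.99, so the shifted and unshifted series are nearly interchangeable. Faster components would decorrelate more strongly under the same shift.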

      In sum, choosing one common lag a priori (as we did here) does not necessarily impose more of a bias on the presentation of the results than choosing lags post hoc based on the peaks in the cross-correlograms. However, we have taken this point as a motivation to revise the Results and Methods sections where applicable to strengthen the rationale behind our choice. Most importantly, we changed the first paragraph that mentions and justifies the shift as follows, because the original wording may have given the false impression that the cross-correlation results influenced lag choice:

      “Based on previous reports (Hoeks & Levelt, 1993; Joshi et al., 2016; Reimer et al., 2016), we shifted the pupil signal 930 ms forward (with respect to the MEG signal). We introduced this shift to compensate for the lag that had previously been observed between external manipulations of arousal (Hoeks & Levelt, 1993) as well as spontaneous noradrenergic activity (Reimer et al., 2016) and changes in pupil diameter. In our data, this shift also aligned with the lags for low- and high-frequency extrema in the cross-correlation analysis (Figure 3B).”

      Figure R2. Figure equivalent to main Figure 4, but with shifting the pupil with respect to the MEG by 500 ms.

      Related to this aspect: For some parts of the analyses, the pupil time series was shifted with regard to the MEG data (e.g., Figure 4). However, for subsequent analyses pupil and MEG data were analyzed in concurrent 2 s time windows (e.g., Figure 5 and 6), without a preceding shift in time. This complicates comparisons of the results across analyses and the reasoning behind this should be discussed.

      The signal has been shifted for all analyses that relate to pupil diameter (but not pupil derivative). We have added versions of the following statement in the respective Results and Methods section to clarify (example from Results section ‘Nonlinear relations between pupil-linked arousal and band-limited cortical activity’):

      “In keeping with previous analyses, we shifted the pupil time series forward by 930 msec, while applying no shift to the pupil derivative.”

      1. The authors refer to simultaneous fMRI-pupil studies in their background section. However, throughout the manuscript, they do not mention recent work linking (task-related) changes in pupil dilation and neural oscillations (e.g., [4-6]) which does seem relevant here, too. This seems especially warranted, as these findings in part appear to disagree with the here-reported observations. For instance, these studies consistently show negative pupil-alpha associations (while the authors mostly show positive associations). Moreover, one of these studies tested for links between pupil dilation and aperiodic EEG activity but did not find a reliable association (again conflicting with the here-reported data). Discussing potential differences between studies could strengthen the manuscript.

      We have added a discussion of the suggested works to our Discussion section. We point out however that a recent study (Podvalny et al., https://doi.org/10.7554/eLife.68265) corroborates our finding while measuring resting-state pupil and MEG simultaneously in a situation very similar to ours. Also, we note that Whitmarsh et al. (2021) (reference [6]) is actually in line with our findings as we find a similar negative relationship between alpha-range activity in somatomotor cortices and pupil size.

      Please also take into account that results from studies of task- or event-related changes in pupil diameter (phasic responses) cannot be straightforwardly compared with the findings reported here (focusing on fluctuations in tonic pupil size), due to the inverse relationship between tonic (or baseline) pupil size and phasic pupil responses (e.g. Knapen et al., 2016). This means that on trials with larger baseline pupil diameter, phasic pupil dilation will be smaller and vice versa. Hence, a negative relation between the evoked change in pupil diameter and alpha-band power can very well be consistent with the positive correlation between tonic pupil diameter and alpha-band activity that we report here for visual cortex.
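The sign argument can be made concrete with a small simulation. This is a sketch under stated assumptions, not an analysis of the data: the coefficients linking tonic pupil size to the phasic response (-0.6) and to alpha power (+0.5), the noise levels, the trial count, and the function names are all hypothetical values chosen for illustration.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equally long sequences."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def simulate(n_trials=2000, seed=0):
    """Simulate per-trial tonic pupil size, phasic response, and alpha power."""
    rng = random.Random(seed)
    tonic = [rng.gauss(0.0, 1.0) for _ in range(n_trials)]
    # Assumed inverse tonic-phasic relationship: larger baseline, smaller response.
    phasic = [-0.6 * t + rng.gauss(0.0, 0.8) for t in tonic]
    # Assumed positive relationship between tonic pupil size and alpha power.
    alpha = [0.5 * t + rng.gauss(0.0, 0.9) for t in tonic]
    return tonic, phasic, alpha


tonic, phasic, alpha = simulate()
print("tonic  vs alpha :", round(pearson(tonic, alpha), 2))
print("phasic vs alpha :", round(pearson(phasic, alpha), 2))
```

Under these assumptions the tonic measure correlates positively with alpha power while the phasic measure correlates negatively with it, although both derive from the same underlying relationship.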

      In section ‘Arousal modulates cortical activity across space, time and frequencies’ we have added:

      “Seemingly contradicting the present findings, previous work on task-related EEG and MEG dynamics reported a negative relationship between pupil-linked arousal and alpha-range activity in occipito-parietal sensors during visual processing (Meindertsma et al., 2017) and fear conditioning (Dahl et al., 2020). Note, however, that results from task-related experiments that focus on evoked changes in pupil diameter rather than fluctuations in tonic pupil size cannot be directly compared with our findings. Similar to noradrenergic neurons in locus coeruleus (Aston-Jones & Cohen, 2005), phasic pupil responses exhibit an inverse relationship with tonic pupil size (Knapen et al., 2016). This means that on trials with larger baseline pupil diameter (e.g. during a pre-stimulus period), the evoked (phasic) pupil response will be smaller and vice versa. As a consequence, a negative correlation between alpha-band activity in the visual cortex and task-related phasic pupil responses does not preclude a positive correlation with tonic pupil size during baseline or rest as reported here. In line with this, Whitmarsh et al. (2021) found a negative relationship between alpha-activity and pupil size in the somatosensory cortex that agrees with our finding. Although that study used an event-related design to study attention to tactile stimuli, this relationship occurred in the baseline, i.e. before observing any task-related phasic effects on pupil-linked arousal or cortical activity.”

      In section ‘Arousal modulation of cortical excitation-inhibition ratio’ we have added: “The absence of this effect in visual cortices may explain why Kosciessa et al. (2021) found no relationship between pupil-linked arousal and spectral slope when investigating phasic pupil dilation in response to a stimulus during visual task performance. However, this behavioral context, associated with different arousal levels, likely also changes E/I in the visual cortex when compared with the resting state (Pfeffer et al., 2018).”

      Finally, in the Conclusion we added (note: ‘they’ = the present results): “Further, they largely agree with similar findings of a recent independent report (Podvalny et al., 2021).”

      Related to this aspect: The authors frequently relate their findings to recent work in rodents. For this it would be good to consider species differences when comparing frequency bands across rodents and primates (cf. [7,8]).

      Throughout our Results section we have mainly remained agnostic with respect to labeling frequency ranges when drawing between-species comparisons, and have only reverted to it as a justification for a dimension reduction for some of the presented analysis. Following your comment however, we have phrased the following section in the Discussion, section ‘Arousal modulates cortical activity across space, time and frequencies’, more carefully:

      “The low-frequency regime referred to in rodent work (2-10 Hz; e.g., McGinley et al., 2015) includes activity that shares characteristics with human alpha rhythms (3-6 Hz; Nestogel and McCormick, 2021; Senzai et al., 2019). The human equivalent, however, clearly separates from activity in lower frequency bands and, here, showed idiosyncratic relationships with pupil-linked arousal.”

      1. Figure 1 highlights direct neuromodulatory effects in the cortex. However, seminal [9-11] and more recent work [12,13] demonstrates that noradrenaline and acetylcholine also act in the thalamus which seems relevant concerning the interpretation of low frequency effects observed here. Moreover, neural oscillations also influence neuromodulatory activity, thus the one-headed arrows do not seem warranted (panel C) [3,14].

      This is a very good point. First, we would like to note that we have extended on acknowledging thalamic contributions to low-frequency (specifically alpha) effects in response to the Reviewer’s point 11 (‘Recommendations for authors’ section below). Also, we have added a reference to the role of potential top-down (reverse) influences to our Discussion, section ‘Arousal modulates cortical activity across space, time and frequencies’, as follows:

      “Further, we note that our analyses and interpretations focus on arousal-related neuromodulatory influences on cortical activity, whereas recent work also supports a reverse “top-down” route, at least for frontal cortex high-frequency activity on LC spiking activity (Totah et al., 2021).”

      Ultimately, however, we decided to leave the arrows in Figure 1C uni-directional to keep in line with the rationale of our research that stems mostly from rodent work, which also emphasises the indicated directionality. Also, reference [3] is highly interesting for us because it actually aligns with our data: The authors show that a spontaneous peak of high-frequency band activity (>70 Hz) in insular cortex precedes a pupil dilation peak (or plateau) in two of three participants by ~500msec (which mimics a pattern found for task-evoked activity; see their Figure 5b/c). We find a maximum in our cross-correlation between pupil size and high frequency band activity (>64 Hz) that indicates a similar lag (see our Figure 3B). Importantly, both results do not rule out a common source of neuromodulation for the effects. We have added the following to the end of the section ‘An arousal-triggered cascade of activity in the resting human brain’:

      “In fact, Kucyi & Parvizi (2020) found spontaneous peaks of high-frequency band activity (>70 Hz) in the insular cortex of three resting surgically implanted patients that preceded pupil dilation by ~500msec - a time range that is consistent with the lag of our cross-correlation between pupil size and high frequency (>64Hz) activity (see Figure 3B). Importantly, they showed that this sequence mimicked a similar but more pronounced pattern during task performance. Given the purported role of the insula (Menon & Uddin, 2015), this finding lends support to the idea that spontaneous covariations of pupil size and cortical activity signal arousal events related to intermittent 'monitoring sweeps' for behaviourally relevant information.”

      1. In their discussion, the authors propose a pupil-linked temporal cascade of cognitive processes and accompanying power changes. This argument could be strengthened by showing that earlier events in the cascade can predict subsequent ones (e.g., are the earlier low- and high-frequency effects predictive of the subsequent alpha-beta synchronization?).

      We added this cascade angle as one possible interpretation of the observed effects. We fully agree that this is an interesting question but would argue that it would ideally be tested in follow-up research specifically designed for that purpose. The suggested analysis would add a post-hoc aspect to our exploratory investigation in the absence of a suitable contrast, while also potentially side-tracking the main aim of the study. We have revised the language in this section and added the following changes (bold) to the last paragraph to emphasise the speculative aspect, and to clarify what we think needs to be done to look into this further and with more explanatory power.

      “The three scenarios described here are not mutually exclusive and may explain one and the same phenomenon from different perspectives. Further, it remains possible that the sequence we observe comprises independent effects with specific timings. A pivotal manipulation to test these assumptions will be to contrast the observed sequence with other potential coupling patterns between pupil-linked arousal and cortical activity during different behavioural states.”

    1. Author Response

      Reviewer #1 (Public Review):

      The study by Akter et al demonstrates that astrocyte-derived L-lactate plays a key role in schema memory formation and promotes mitochondrial biogenesis in the Anterior Cingulate Cortex (ACC).

      The main tool used by the authors is the DREADD technology, which allows receptors to be pharmacologically activated in a cell-specific manner. In the study, the authors used the DREADD technique to activate, in appropriately transfected astrocytes, a subtype of muscarinic receptor that is not normally present in cells. Because this receptor is coupled to a Gi-mediated signal transduction pathway that inhibits cAMP formation, the authors could demonstrate cell-specific (astrocyte-specific) decreases in cAMP levels that result in decreased L-lactate production by astrocytes.

      Behaviorally this pharmacological manipulation results in impairments of schema memory formation and retrieval in the ACC in flavor-place paired associate paradigms. Such impairments are prevented by co-administration of L-lactate.

      The authors also show that activation of Gi signaling, which decreases L-lactate release by astrocytes, impairs mitochondrial biogenesis in neurons in an L-lactate-reversible manner.

      By using MCT 2 inhibitors and an NMDAR antagonist the authors conclude that the molecular mechanisms underlying the observed effects are mediated by L-lactate entering neurons through MCT2 transporters and involve NMDAR.

      Overall, the article's conclusions are warranted by the experimental evidence, but some weak points could be addressed which would make the conclusions even stronger.

      The number of animals in some of the experiments is on the low side (4 to 6).

      In the revised manuscript, we have increased the animal numbers in two key experimental groups (hM4Di-CNO and Control groups) of behavioral experiments. Now the animal numbers in different groups are as follows:

      • 15 rats in hM4Di-CNO group

      o Further divided into two subgroups for probe tests (PT1-4) conducted during flavor-place paired associate training; 8 rats in the hM4Di-CNO (saline) and 7 rats in the hM4Di-CNO (CNO) subgroups receiving I.P. saline or I.P. CNO, respectively, before these PTs.

      • 8 rats in the Control group

      • 7 rats in the Rescue group (hM4Di-CNO+L-lactate)

      • 4 rats in the Control-CNO group. Animal number in this group was not increased as it was apparent from these 4 rats that CNO alone was not impairing the PA learning and memory retrieval in these rats (AAV8-GFAP-mCherry injected). Their result was very similar to the control group. Additionally, in a previous study (Liu et al., 2022), we showed that CNO administration in the rats injected with AAV8-GFAP-mCherry into the hippocampus does not show any impairments in schema.

      Also, in the newly added open field test experiments to investigate the locomotor activity as suggested by the Reviewer #2, 8 rats were used in each group.

      The use of CIN to inhibit MCT2 is not optimal. Authors may want to decrease MCT2 expression by using antisense oligonucleotides.

      In the revised manuscript, we have conducted the experiment using MCT2 antisense oligodeoxynucleotide (ODN) as suggested.

      To test whether the L-lactate-induced neuronal mitochondrial biogenesis is dependent on MCT2, we bilaterally injected MCT2 antisense oligodeoxynucleotide (MCT2-ODN, n=8 rats, 2 nmol in 1 μl PBS per ACC) or scrambled ODN (SC-ODN, n=8 rats, 2 nmol in 1 μl PBS per ACC) into the ACC. After 11 hours, bilateral infusion of L-lactate (10 nmol, 1 μl) or ACSF (1 μl) was given into the ACC and the rats were kept in the PA event arena. After 60 mins (12 hours from MCT2-ODN or SC-ODN administration), the rats were sacrificed. As shown in Author response image 1B, SC-ODN+L-lactate group showed significantly increased relative mtDNA copy number compared to the SC-ODN+ACSF group (p<0.001, ANOVA followed by Tukey's multiple comparisons test). However, this effect was completely abolished in MCT2-ODN+L-lactate group, suggesting that MCT2 is required for the L-lactate-induced mitochondrial biogenesis in the ACC.

      We have integrated this new data and results in the revised manuscript.

      Author response image 1.

      Mitochondrial biogenesis by L-lactate is dependent on MCT2 and NMDAR. A. Experimental design to investigate whether MCT2 and NMDAR activity are required for L-lactate-induced mitochondrial biogenesis. B and C. mtDNA copy number abundance in the ACC of different rat groups relative to nDNA. Data shown as mean ± SD (n=4 rats in each group). ***p<0.001, ANOVA followed by Tukey's multiple comparisons test.

      The experiment using APV to block NMDAR only partially supports the conclusions. Indeed, blocking NMDAR will knock down any response that involves these receptors, whether L-lactate is necessary or not.

      In the current study we found that astrocytic Gi activation in the ACC reduced the L-lactate level in the ECF of the ACC, which was also associated with decreased PGC-1α/SIRT3/ATPB/mtDNA abundance, suggesting downregulation of the mitochondrial biogenesis pathway. We also found that exogenous administration of L-lactate into the ACC of astrocytic Gi-activated rats rescued this downregulation. In line with this, in a recently published study (Akter et al., 2023), we found upregulation of the mitochondrial biogenesis pathway in the hippocampal neurons of exogenous L-lactate-treated anesthetized rats. Another recent study has demonstrated that exercise-induced L-lactate release from skeletal muscle or I.P. injection of L-lactate can induce hippocampal PGC-1α (which is a master regulator of mitochondrial biogenesis) expression and mitochondrial biogenesis in mice (Park et al., 2021). Together, these results provide compelling evidence that L-lactate promotes mitochondrial biogenesis.

      L-lactate is known to promote expression of synaptic plasticity genes like Arc, c-Fos, and Zif268 in neurons (Yang et al., 2014). After entry into the neuronal cytoplasm, mainly through MCT2, it is converted into pyruvate by lactate dehydrogenase 1 (LDH1). This conversion also produces NADH, affecting the redox state of the neuron. NADH positively modulates the activity of NMDAR resulting in enhanced Ca2+ currents, the activation of intracellular signaling cascades, and the induction of the expression of plasticity-associated genes (Yang et al., 2014; Magistretti & Allaman, 2018). The study demonstrated that L-lactate–induced plasticity gene expression was abolished in the presence of NMDAR antagonists including D-APV (Yang et al., 2014). These results suggested that the MCT2 and NMDAR are key players in the regulation of L-lactate induced plasticity gene expression.

      In the current study, we investigated whether similar mechanisms might be involved in L-lactate-induced neuronal mitochondrial biogenesis. We now used MCT2 antisense oligodeoxynucleotide to decrease the expression of MCT2 (as mentioned in the previous response and Author response image 1B) and showed that MCT2 is necessary for L-lactate-induced mitochondrial biogenesis to manifest, indicating that L-lactate’s entry into the neuron is required. As mentioned before, after entry into the neuron, L-lactate is converted into pyruvate by LDH, which also produces NADH, which in turn potentiates NMDAR activity. Therefore, we investigated whether NMDAR activity is required for L-lactate-induced mitochondrial biogenesis. We used D-APV to inhibit NMDAR (Author response image 1C) and found that L-lactate does not increase mtDNA copy number abundance if D-APV is given, suggesting that NMDAR activity is required for L-lactate to promote mitochondrial biogenesis.

      NMDAR serves diverse functions. Therefore, as mentioned by the reviewer, blocking NMDAR may knock down many such functions. While our current data only suggests the involvement of MCT2 and NMDAR in the upregulation of mitochondrial biogenesis by L-lactate, we have not investigated other mechanisms and pathways modulating mitochondrial biogenesis that are either dependent or independent of MCT2 and NMDAR activity. Further studies are needed in future to dissect and better understand this interesting observation. We have now clarified this in the discussion section of the manuscript.

      Is inhibition of glycogenolysis involved in the observed effects mediated by Gi signaling? Indeed, L-lactate is formed both by glycolysis and glycogenolysis. The authors could test whether the glycogen metabolism-inhibiting drug DAB would mimic the effects of Gi activation.

      In this study we have shown that astrocytic Gi activation in the ACC leads to a decrease in the cAMP and L-lactate. L-lactate is produced by glycogenolysis and glycolysis. cAMP in astrocytes acts as a trigger for L-lactate production (Choi et al., 2012; Horvat, Muhič, et al., 2021; Horvat, Zorec, et al., 2021; Zhou et al., 2021) by promoting glycogenolysis and glycolysis (Vardjan et al., 2018; Horvat, Muhič, et al., 2021; Horvat, Zorec, et al., 2021). Therefore, one promising explanation of reduced L-lactate level observed in our study is the reduction of L-lactate production in the astrocyte due to decreased glycogen metabolism as a result of decreased cAMP. We have now mentioned this in the discussion.

      DAB is an inhibitor of glycogen phosphorylase that suppresses L-lactate production. It was shown to impair memory by decreasing L-lactate (Newman et al., 2011; Suzuki et al., 2011; Iqbal et al., 2023). As we found that the impairment in the schema memory and mitochondrial biogenesis was associated with decreased L-lactate level in the ACC and that the exogenous L-lactate administration can rescue the impairments, it is likely that DAB will mimic the effect of Gi activation in terms of schema memory and mitochondrial biogenesis. However, further study is needed to confirm this.  

      Reviewer #2 (Public Review):

      The manuscript of Akter et al. is an important study that investigates the role of astrocytic Gi signaling in the anterior cingulate cortex in the modulation of extracellular L-lactate level and, consequently, impairment in flavor-place paired associate (PA) learning. However, whereas some of the behavioral observations and signaling mechanism data are compelling, the conclusions about the effect on memory are inadequate, as they rely on an experimental design that does not allow acute or learning effects to be differentiated from effects outlasting the pharmacological treatments, i.e. effects on memory retention. With the addition of a few experiments, this paper would be of interest to the larger group of researchers interested in neuron-glia interactions during complex behavior.

      • Largely, I agree with the authors' conclusion that activating Gi signaling in astrocytes impairs PA learning, however, the effect on memory retrieval is not that obvious. All behavioral and molecular signaling effects described in this study are obtained with the continuous presence of CNO, therefore it is not possible to exclude the acute effect of Gi pathway activation in astrocytes. What will happen with memory on retrieval test when CNO is omitted selectively during early, middle, or late session blocks of PA learning?

      We have now added 8 more rats to the hM4Di-CNO group (i.e., the group with astrocytic Gi activation) to clarify the memory retrieval. These rats underwent flavor-place paired associate (PA) training similar to the previously described rats (n=7) of this group, that is, they received CNO 30 minutes before and 30 minutes after the PA training sessions (S1-2, S4-8, S10-17). However, in contrast to the previous rats of this group, which received CNO before the PTs (PT1, PT2, PT3), we omitted the CNO (instead administering I.P. saline) selectively on these PTs conducted at the early, middle, and late stages of PA training, as suggested by the reviewer. These newly added rats did not show memory retrieval in these PTs, suggesting that the rats were not learning the PAs from the PA training sessions. See Author response image 2C-E, where this subgroup is denoted as hM4Di-CNO (Saline).

      We then continued more PA training sessions (S21 onwards, Author response image 2B) for these rats without CNO. They gradually learned the PAs. PTs (PT5, PT6, PT7; Author response image 2G-I) were done during this continuation phase of PA training; once without CNO (i.e., with I.P. saline instead), and another one with CNO. As seen in the Author response image 2H and 2I, they retrieved the memory when PT6 and PT7 were done without CNO. However, if these PTs were done with CNO, they could not retrieve the memory. Together these results suggest that ACC astrocytic Gi activation by CNO during PT can impair memory retrieval in rats which have already learned the PAs.

      As shown in the Author response image 2B, we replaced two original PAs with two new PAs (NPA 9 and 10) at S34. This was followed by PT8 (S35). As seen in Author response image 2J, these rats retrieved the NPA memory if the PT is done without CNO. However, they could not retrieve the NPA memory if the PT was done with CNO. This result suggests that ACC astrocytic Gi activation by CNO during PT can impair NPA memory retrieval.

      In summary, these data show that astrocytic Gi activation in the ACC can impair PA memory retrieval. We have integrated this new data and results in the revised manuscript.

      Author response image 2.

      A. PI (mean ± SD) during the acquisition of the six original PAs (OPAs) (S1-2, 4-8, 10-17) and new PAs (NPAs) (S19) of the control (n=8), hM4Di-CNO (n=15), and rescue (hM4Di-CNO+L-lactate) (n=7) groups. From S6 onwards, hM4Di-CNO group consistently showed lower PI compared to control. However, concurrent L-lactate administration into the ACC (rescue group) can rescue this impairment. B. PI (mean ± SD) of hM4Di-CNO group (n=8) from S21 onwards showing gradual increase in PI when CNO was withdrawn. C, D, and E. Non-rewarded PTs (PT1, PT2, and PT3 conducted on S3, S9, and S18, respectively) to test memory retrieval of OPAs for the control, hM4Di-CNO, and rescue groups. The percentage of digging time at the cued location relative to that at the non-cued locations are shown (mean ± SD). In both PT2 and PT3, the control group spent significantly more time digging the cued sand well above the chance level, indicating that the rats learned OPAs and could retrieve it. Contrasting to this, hM4Di-CNO group did not spend more time digging the cued sand well above the chance level irrespective of CNO administration before the PTs. The rescue group showed results similar to the hM4Di-CNO group if CNO is given without L-lactate. On the other hand, they showed results similar to the control group if L-lactate is concurrently given with CNO, indicating that this group learned OPAs and could retrieve it. p < 0.05, p < 0.01, p < 0.001, one-sample t-test comparing the proportion of digging time at the cued sand well with the chance level of 16.67%. F. Non-rewarded PT4 (S20) which was conducted after replacing two OPAs with two NPAs (NPA 7 & 8) in S19 for the control, hM4Di-CNO, and rescue groups. Results show that the control group spent significantly more time digging the new cued sand well above the chance level indicating that the rats learned the NPAs from S19 and could retrieve it in this PT. 
In contrast, the hM4Di-CNO group did not spend more time digging the new cued sand well above the chance level irrespective of CNO administration before the PT. The rescue group showed results similar to the hM4Di-CNO group if CNO is given without L-lactate. On the other hand, they showed results similar to the control group if L-lactate is concurrently given with CNO, indicating that this group learned NPAs from S19 and could retrieve them. p < 0.001, one-sample t-test comparing the proportion of digging time at the new cued sand well with the chance level of 16.67%. G, H, and I. Non-rewarded PTs (PT5, PT6, and PT7 conducted on S23, S27, and S33, respectively) to test memory retrieval of OPAs for the hM4Di-CNO group. In both PT6 and PT7, the rats spent significantly more time digging the cued sand well above the chance level if the tests are done without CNO, indicating that the rats learned the OPAs and could retrieve them. However, CNO prevented memory retrieval during these PTs. p < 0.001, one-sample t-test comparing the proportion of digging time at the cued sand well with the chance level of 16.67%. J. Non-rewarded PT8 (S35), which was conducted after replacing two OPAs with two NPAs (NPA 9 & 10) in S34 for the hM4Di-CNO group. Results show that the rats spent significantly more time digging the new cued sand well above the chance level if CNO was not given before the PT, indicating that the rats learned the NPAs from S34 and could retrieve them in this PT. However, if CNO is given before the PT, the retrieval is impaired. *p < 0.001, one-sample t-test comparing the proportion of digging time at the new cued sand well with the chance level of 16.67%.
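The probe-test statistic named in the caption, a one-sample t-test against the 16.67% chance level, can be sketched as follows. The digging percentages below are made-up illustrative values, not data from the study, and the function and variable names are hypothetical. The chance level is taken as one in six, on the assumption that six sand wells are available, which matches the 16.67% stated in the caption. Only the t statistic and degrees of freedom are computed here; a p-value would additionally need the t distribution (for instance via SciPy's `scipy.stats.ttest_1samp`, if that library is available).

```python
import math
import statistics

CHANCE_LEVEL = 100.0 / 6.0   # percent; assumes six sand wells (1/6 = 16.67%)


def one_sample_t(values, mu0):
    """Return (t statistic, degrees of freedom) for H0: mean == mu0."""
    n = len(values)
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)          # sample SD (n - 1 denominator)
    t_stat = (mean - mu0) / (sd / math.sqrt(n))
    return t_stat, n - 1


# Hypothetical percentages of digging time at the cued well for 8 rats.
example_digging = [35.0, 42.0, 28.0, 39.0, 31.0, 45.0, 37.0, 33.0]

t_stat, dof = one_sample_t(example_digging, CHANCE_LEVEL)
print(f"mean = {statistics.fmean(example_digging):.2f}%, "
      f"t({dof}) = {t_stat:.2f}")
```

A large positive t statistic indicates digging at the cued well above chance; a value near zero would correspond to the pattern described for the hM4Di-CNO group.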

      • I found it truly exciting that the administration of exogenous L-lactate is capable of rescuing CNO-induced PA learning impairment when co-applied. Would it be possible that this treatment has a sensitivity to a particular stage of learning (acquisition, consolidation, or memory retrieval) when L-lactate administration would be the most efficacious?

      The hM4Di-CNO group, when continued with PA training without CNO (S21-S32) (Author response image 2B), was able to learn the six original PAs (OPAs). In PT7, done at S33 (Author response image 2I), this group of rats was able to retrieve the memory if the test was done without CNO but could not retrieve the memory if CNO was given. Similarly, the Rescue group (hM4Di-CNO+L-lactate) (Author response image 2A), which received both CNO and L-lactate during PA training sessions (S1-S17), was able to learn the OPAs. At PT3, done at S18 (Author response image 2E), these rats were able to retrieve the memory when the test was done with CNO+L-lactate but not when the test was done with only CNO. Together, these results clearly show that ACC astrocytic Gi activation with CNO impairs memory retrieval and that exogenous L-lactate can rescue the impairment. Therefore, it can be concluded that memory retrieval is sensitive to L-lactate.

      The PA learning is hippocampus-dependent. Over the course of repeated PA training, systems consolidation occurs in the ACC, after which the already learned PA memory (schema) becomes hippocampus-independent (Tse et al., 2007; Tse et al., 2011). A higher activation (indicated by expression of c-Fos) in the hippocampus relative to the ACC during the early period of schema development, and the reverse at the late stage was observed in our previous study (Liu et al., 2022). However, rapid assimilation of new PA into the ACC requires simultaneous activation/retrieval of previous schema from ACC and hippocampus dependent new PA learning (Tse et al., 2007; Tse et al., 2011). During new PA learning, increase of c-Fos neurons in both CA1 and ACC was detected (Liu et al., 2022).

      Our hM4Di-CNO group received CNO 30 mins before and after each PA training session in S1-S17 (Author response image 2A). Also, the Rescue group similarly received CNO+L-lactate before and after each PA training session in S1-S17. Therefore, while this study design allowed us to conclude that ACC astrocytic Gi activation impairs PA learning and that exogenous L-lactate can rescue the impairment, it does not allow clear differentiation of the effects of these treatments on memory acquisition and consolidation. Further studies are needed to investigate this.

      • The hypothesis that the observed learning impairments could be associated with diminished mitochondrial biogenesis caused by decreased L-lactate as a result of astrocytic Gi-DREADD stimulation is very appealing, but a few key pieces of evidence are missing. So far, the hypothesis is supported by experiments demonstrating reduced expression of several components of mitochondrial membrane ATP synthase and a decrease in relative mtDNA copy numbers in the ACC of rats injected with Gi-DREADDs. L-lactate injections into the ACC restored and even further increased the expression of the above-mentioned markers. Co-administration of the NMDAR antagonist D-APV or the MCT2 (mostly neuronal) blocker 4-CIN with L-lactate prevented the L-lactate-induced increase in relative mtDNA copy number. I am wondering how the interference with mitochondrial biogenesis affects neuronal physiology and whether it would result in impaired PA learning or schema memory.

      The observation of diminished mitochondrial biogenesis in the astrocytic Gi-activated rats that showed impaired PA learning is exciting. However, our study does not provide experimental data on how mitochondrial biogenesis could be associated with impaired PA learning and schema memory. Results from several previous studies linked mitochondrial biogenesis and its regulators such as PGC-1α and SIRT3 to diverse neuronal and cognitive functions, as described in the discussion section of the manuscript. In the revised manuscript, we have added the following discussion of potential mechanisms:

      “In this study, we have demonstrated that ACC astrocytic Gi activation impairs PA learning and schema formation, PA memory retrieval, and NPA learning and retrieval by decreasing L-lactate level in the ACC. Although we have shown that these impairments are associated with diminished expression of proteins of mitochondrial biogenesis, the precise mechanisms of how astrocytic Gi activation affects neuronal functions and schema memory remain to be elucidated. We previously demonstrated that neuronal inhibition in either the hippocampus or the ACC impairs PA learning and schema formation (Hasan et al., 2019). In another recent study (Liu et al., 2022), we showed that astrocytic Gi activation in the CA1 impaired PA training-associated CA1-ACC projecting neuronal activation. Yao et al. recently showed that reduction of astrocytic lactate dehydrogenase A (an enzyme that reversibly catalyzes L-lactate production from pyruvate) in the dorsomedial prefrontal cortex reduces L-lactate levels and neuronal firing frequencies, promoting depressive-like behaviors in mice (Yao et al., 2023). These impairments could be rescued by L-lactate infusion. It is possible that the impairment in PA learning and schema observed in our study might have involved a similar functional consequence of reduced neuronal activity in the ACC neurons upon astrocytic Gi activation.

      Schema consolidation is associated with synaptic plasticity-related gene expression (such as Zif268, Arc) in the ACC (Tse et al., 2011). L-lactate, after entry into neurons, can be converted to pyruvate during which NADH is also produced, promoting synaptic plasticity-related gene expression by potentiating NMDA signaling in neurons (Yang et al., 2014; Margineanu et al., 2018). Furthermore, L-lactate acts as an energy substrate to fuel learning-induced de novo neuronal translation critical for long-term memory (Descalzi et al., 2019). On the other hand, mitochondria play a crucial role in fueling local translation during synaptic plasticity (Rangaraju et al., 2019). Therefore, it could be hypothesized that the rescue of astrocytic Gi activation-mediated impairment of schema by exogenous L-lactate could have been mediated by facilitating synaptic plasticity-related gene expression by directly fueling the protein translation, potentiating NMDA signaling, as well as increasing mitochondrial capacity for ATP production by promoting mitochondrial biogenesis. Furthermore, the potential involvement of HCAR1, a receptor for L-lactate that may regulate neuronal activity (Bozzo et al., 2013; Tang et al., 2014; Herrera-López & Galván, 2018; Abrantes et al., 2019), cannot be excluded. Future research could explore these potential mechanisms, examining the interactions among them, and determining their relative contributions to schema. Our previous study also showed that ACC myelination is necessary for PA learning and schema formation, and that repeated PA training is associated with oligodendrogenesis in the ACC (Hasan et al., 2019). Oligodendrocytes facilitate fast, synchronized, and energy-efficient transfer of information by wrapping axons in a myelin sheath. Furthermore, they supply axons with glycolysis products, such as L-lactate, to offer metabolic support (Fünfschilling et al., 2012; Lee et al., 2012). The association of oligodendrogenesis and myelination with schema memory may suggest an adaptive response of oligodendrocytes to enhance metabolic support and neuronal energy efficiency during PA learning. Given the impairments in PA learning observed in the ACC astrocytic Gi-activated rats in the current study, it is reasonable to conclude that the direct metabolic support to axons provided by oligodendrocytes is not sufficient to rescue the schema impairments caused by decreased L-lactate levels upon astrocytic Gi activation. On the other hand, L-lactate was shown to be important for oligodendrogenesis and myelination (Sánchez-Abarca et al., 2001; Rinholm et al., 2011; Ichihara et al., 2017). Therefore, it is tempting to speculate that a decrease in L-lactate level may also impede oligodendrogenesis and myelination, consequently preventing the enhanced axonal support provided by oligodendrocytes and myelin during schema learning. Recently, a study has demonstrated that upon demyelination, mitochondria move from the neuronal cell body to the demyelinated axon (Licht-Mayer et al., 2020). Enhancement of this axonal response of mitochondria to demyelination, by targeting mitochondrial biogenesis and mitochondrial transport from the cell body to the axon, protects acutely demyelinated axons from degeneration. Given the connection between schema and increased myelination, it remains an open question whether L-lactate-induced mitochondrial biogenesis plays a beneficial role in schema through a similar mechanism. Nevertheless, our results contribute to the mounting evidence of the glial role in cognitive functions and underscore the new paradigm in which glial cells are considered as integral players in cognitive functions alongside neurons. Disruption of neurons, myelin, or astrocytes in the ACC can disrupt PA learning and schema memory.”

      Reviewer #3 (Public Review):

      Akter et al. investigated how the astroglial Gi signaling pathway in the rat anterior cingulate cortex (ACC) affects cognitive functions, in particular schema memory formation. Using a stereotactic approach, they intracranially introduced AAV8 vectors carrying mCherry-tagged hM4Di DREADD (Designer Receptor Exclusively Activated by Designer Drugs) under the astrocyte-selective GFAP promoter (AAV8-GFAP-hM4Di-mCherry) into the ACC region of the rat brain. hM4Di DREADD is a genetically modified form of the human M4 muscarinic (hM4) receptor that is insensitive to endogenous acetylcholine but is activated by the inert clozapine metabolite clozapine-N-oxide (CNO), triggering the Gi signaling pathway. The authors confirmed that hM4Di DREADD is selectively expressed in astrocytes after the application of the AAV8 vector by analysing the mCherry signals and immunolabeling of astrocytes and neurons in the ACC region of the rat brain. They activated hM4Di DREADD (Gi signalling) in astrocytes by intraperitoneal administration of CNO and measured cognitive functions in animals after CNO administration. Activation of Gi signaling in astrocytes by CNO application decreased paired-associate (PA) learning, schema formation, and memory retrieval in tested animals. This was associated with a decrease in cAMP in astrocytes and L-lactate in extracellular fluid as measured by immunohistochemistry in situ and in awake rats by microdialysis, respectively. Administration of exogenous L-lactate rescued the astroglial Gi-mediated deficits in PA learning, memory retrieval, and schema formation, suggesting that activation of astroglial Gi signalling downregulates L-lactate production in astrocytes and its transport to neurons, affecting memory formation. The authors also show that the expression level of proteins involved in mitochondrial biogenesis, which is associated with cognitive functions, is decreased in neurons when Gi signalling is activated in astrocytes, and rescued when exogenous L-lactate is applied, suggesting the implication of astrocyte-derived L-lactate in the maintenance of mitochondrial biogenesis in neurons. The latter depended on lactate MCT2 transporter activity and glutamate NMDA receptor activity.

      The paper is very well written and discussed. The conclusions of this paper are well supported by the data. Although this is a study that uses established and previously published methodologies, it provides new insights into L-lactate signalling in the brain, particularly in the ACC, and further confirms the role of astroglial L-lactate in learning and memory formation. It also raises new questions about the molecular mechanisms underlying astrocyte-derived L-lactate-mediated mitochondrial biogenesis in neurons and its contribution to schema memory formation.

      • The authors discuss astrocytic L-lactate signalling without considering the recently discovered L-lactate-sensitive Gs and Gi protein-coupled receptors in the brain, which are present in both astrocytes and neurons. The use of nonendogenous L-lactate receptor agonists (Compound 2, 3-chloro-5-hydroxybenzoic acid) would clarify the implication of L-lactate receptor signalling in schema memory formation.

      In the revised manuscript, we have included this point in the discussion section to mention the potential role of HCAR1 in schema memory as follows:

      “Schema consolidation is associated with synaptic plasticity-related gene expression (such as Zif268, Arc) in the ACC (Tse et al., 2011). L-lactate, after entry into neurons, can be converted to pyruvate during which NADH is also produced, promoting synaptic plasticity-related gene expression by potentiating NMDA signaling in neurons (Yang et al., 2014; Margineanu et al., 2018). Furthermore, L-lactate acts as an energy substrate to fuel learning-induced de novo neuronal translation critical for long-term memory (Descalzi et al., 2019). On the other hand, mitochondria play a crucial role in fueling local translation during synaptic plasticity (Rangaraju et al., 2019). Therefore, it could be hypothesized that the rescue of astrocytic Gi activation-mediated impairment of schema by exogenous L-lactate could have been mediated by facilitating synaptic plasticity-related gene expression by directly fueling the protein translation, potentiating NMDA signaling, as well as increasing mitochondrial capacity for ATP production by promoting mitochondrial biogenesis. Furthermore, the potential involvement of HCAR1, a receptor for L-lactate that may regulate neuronal activity (Bozzo et al., 2013; Tang et al., 2014; Herrera-López & Galván, 2018; Abrantes et al., 2019), cannot be excluded. Future research could explore these potential mechanisms, examining the interactions among them, and determining their relative contributions to schema.”

      • The use of control animals transduced with an "empty" AAV8 vector (AAV8-GFAP-mCherry) compared with animals transduced with AAV8-GFAP-hM4Di-mCherry throughout the study would strengthen the results of this study, since transfection itself, as well as overexpression of the mCherry protein, may affect cell function.

      We thank the reviewer for pointing this out. The schema experiment includes a control group (Control-CNO group) of rats injected with AAV8-GFAP-mCherry bilaterally into the ACC. As shown in Author response image 3, after habituation and pretraining, these rats were trained for PA learning similarly to the other groups. They received i.p. CNO 30 mins before and 30 mins after each PA training session. Their PA learning, schema formation, memory retrieval, NPA learning and retrieval, and latency (time needed to commence digging at the correct well) were similar to those of the control group of rats. This result is consistent with our previous study, in which rats bilaterally injected with AAV8-GFAP-mCherry into CA1 of the hippocampus did not show impairments in PA learning and schema formation upon CNO treatment (Liu et al., 2022).

      Author response image 3.

      A. PI (mean ± SD) during the acquisition of the original six PAs (OPAs) (S1-2, 4-8, 10-17) and new PAs (NPAs) (S19) of the control (n=6) and control-CNO (n=4) groups. B. Non-rewarded PTs (PT1, PT2, and PT3 done on S3, S9, and S18, respectively) to test memory retrieval of OPAs for the control-CNO group. C. Non-rewarded PT4 (S20) which was done after replacing two OPAs with two NPAs (NPA 7 & 8) in S19 for the control-CNO group. D. Latency (in seconds) before commencing digging at the correct well for control and control-CNO groups. Data shown as mean ± SD.

      References

      Abrantes, H. d. C., Briquet, M., Schmuziger, C., Restivo, L., Puyal, J., Rosenberg, N., Rocher, A.-B., Offermanns, S., & Chatton, J.-Y. (2019). The Lactate Receptor HCAR1 Modulates Neuronal Network Activity through the Activation of Gα and Gβγ Subunits. The Journal of Neuroscience, 39(23), 4422-4433. https://doi.org/10.1523/jneurosci.2092-18.2019

      Akter, M., Ma, H., Hasan, M., Karim, A., Zhu, X., Zhang, L., & Li, Y. (2023). Exogenous L-lactate administration in rat hippocampus increases expression of key regulators of mitochondrial biogenesis and antioxidant defense [Original Research]. Frontiers in Molecular Neuroscience, 16. https://doi.org/10.3389/fnmol.2023.1117146

      Bozzo, L., Puyal, J., & Chatton, J.-Y. (2013). Lactate Modulates the Activity of Primary Cortical Neurons through a Receptor-Mediated Pathway. PLoS One, 8(8), e71721. https://doi.org/10.1371/journal.pone.0071721

      Choi, H. B., Gordon, G. R., Zhou, N., Tai, C., Rungta, R. L., Martinez, J., Milner, T. A., Ryu, J. K., McLarnon, J. G., Tresguerres, M., Levin, L. R., Buck, J., & MacVicar, B. A. (2012). Metabolic communication between astrocytes and neurons via bicarbonate-responsive soluble adenylyl cyclase. Neuron, 75(6), 1094-1104. https://doi.org/10.1016/j.neuron.2012.08.032

      Covelo, A., Eraso-Pichot, A., Fernández-Moncada, I., Serrat, R., & Marsicano, G. (2021). CB1R-dependent regulation of astrocyte physiology and astrocyte-neuron interactions. Neuropharmacology, 195, 108678. https://doi.org/10.1016/j.neuropharm.2021.108678

      Descalzi, G., Gao, V., Steinman, M. Q., Suzuki, A., & Alberini, C. M. (2019). Lactate from astrocytes fuels learning-induced mRNA translation in excitatory and inhibitory neurons. Communications Biology, 2(1), 247. https://doi.org/10.1038/s42003-019-0495-2

      Endo, F., Kasai, A., Soto, J. S., Yu, X., Qu, Z., Hashimoto, H., Gradinaru, V., Kawaguchi, R., & Khakh, B. S. (2022). Molecular basis of astrocyte diversity and morphology across the CNS in health and disease. Science, 378(6619), eadc9020. https://doi.org/10.1126/science.adc9020

      Fünfschilling, U., Supplie, L. M., Mahad, D., Boretius, S., Saab, A. S., Edgar, J., Brinkmann, B. G., Kassmann, C. M., Tzvetanova, I. D., Möbius, W., Diaz, F., Meijer, D., Suter, U., Hamprecht, B., Sereda, M. W., Moraes, C. T., Frahm, J., Goebbels, S., & Nave, K.-A. (2012). Glycolytic oligodendrocytes maintain myelin and long-term axonal integrity. Nature, 485(7399), 517-521. https://doi.org/10.1038/nature11007

      Harris, R. A., Lone, A., Lim, H., Martinez, F., Frame, A. K., Scholl, T. J., & Cumming, R. C. (2019). Aerobic Glycolysis Is Required for Spatial Memory Acquisition But Not Memory Retrieval in Mice. eNeuro, 6(1). https://doi.org/10.1523/ENEURO.0389-18.2019

      Hasan, M., Kanna, M. S., Jun, W., Ramkrishnan, A. S., Iqbal, Z., Lee, Y., & Li, Y. (2019). Schema-like learning and memory consolidation acting through myelination. FASEB J, 33(11), 11758-11775. https://doi.org/10.1096/fj.201900910R

      Herrera-López, G., & Galván, E. J. (2018). Modulation of hippocampal excitability via the hydroxycarboxylic acid receptor 1. Hippocampus, 28(8), 557-567. https://doi.org/10.1002/hipo.22958

      Horvat, A., Muhič, M., Smolič, T., Begić, E., Zorec, R., Kreft, M., & Vardjan, N. (2021). Ca2+ as the prime trigger of aerobic glycolysis in astrocytes. Cell Calcium, 95, 102368. https://doi.org/10.1016/j.ceca.2021.102368

      Horvat, A., Zorec, R., & Vardjan, N. (2021). Lactate as an Astroglial Signal Augmenting Aerobic Glycolysis and Lipid Metabolism [Review]. Frontiers in Physiology, 12. https://doi.org/10.3389/fphys.2021.735532

      Ichihara, Y., Doi, T., Ryu, Y., Nagao, M., Sawada, Y., & Ogata, T. (2017). Oligodendrocyte Progenitor Cells Directly Utilize Lactate for Promoting Cell Cycling and Differentiation. J Cell Physiol, 232(5), 986-995. https://doi.org/10.1002/jcp.25690

      Iqbal, Z., Liu, S., Lei, Z., Ramkrishnan, A. S., Akter, M., & Li, Y. (2023). Astrocyte L-Lactate Signaling in the ACC Regulates Visceral Pain Aversive Memory in Rats. Cells, 12(1), 26. https://www.mdpi.com/2073-4409/12/1/26

      Jourdain, P., Rothenfusser, K., Ben-Adiba, C., Allaman, I., Marquet, P., & Magistretti, P. J. (2018). Dual action of L-Lactate on the activity of NR2B-containing NMDA receptors: from potentiation to neuroprotection. Sci Rep, 8(1), 13472. https://doi.org/10.1038/s41598-018-31534-y

      Kofuji, P., & Araque, A. (2021). G-Protein-Coupled Receptors in Astrocyte-Neuron Communication. Neuroscience, 456, 71-84. https://doi.org/10.1016/j.neuroscience.2020.03.025

      Lee, Y., Morrison, B. M., Li, Y., Lengacher, S., Farah, M. H., Hoffman, P. N., Liu, Y., Tsingalia, A., Jin, L., Zhang, P. W., Pellerin, L., Magistretti, P. J., & Rothstein, J. D. (2012). Oligodendroglia metabolically support axons and contribute to neurodegeneration. Nature, 487(7408), 443-448. https://doi.org/10.1038/nature11314

      Licht-Mayer, S., Campbell, G. R., Canizares, M., Mehta, A. R., Gane, A. B., McGill, K., Ghosh, A., Fullerton, A., Menezes, N., Dean, J., Dunham, J., Al-Azki, S., Pryce, G., Zandee, S., Zhao, C., Kipp, M., Smith, K. J., Baker, D., Altmann, D., Anderton, S. M., Kap, Y. S., Laman, J. D., Hart, B. A. t., Rodriguez, M., Watzlawick, R., Schwab, J. M., Carter, R., Morton, N., Zagnoni, M., Franklin, R. J. M., Mitchell, R., Fleetwood-Walker, S., Lyons, D. A., Chandran, S., Lassmann, H., Trapp, B. D., & Mahad, D. J. (2020). Enhanced axonal response of mitochondria to demyelination offers neuroprotection: implications for multiple sclerosis. Acta Neuropathologica, 140(2), 143-167. https://doi.org/10.1007/s00401-020-02179-x

      Liu, S., Wong, H. Y., Xie, L., Iqbal, Z., Lei, Z., Fu, Z., Lam, Y. Y., Ramkrishnan, A. S., & Li, Y. (2022). Astrocytes in CA1 modulate schema establishment in the hippocampal-cortical neuron network. BMC Biol, 20(1), 250. https://doi.org/10.1186/s12915-022-01445-6

      Magistretti, P. J., & Allaman, I. (2018). Lactate in the brain: from metabolic end-product to signalling molecule. Nat Rev Neurosci, 19(4), 235-249. https://doi.org/10.1038/nrn.2018.19

      Margineanu, M. B., Mahmood, H., Fiumelli, H., & Magistretti, P. J. (2018). L-Lactate Regulates the Expression of Synaptic Plasticity and Neuroprotection Genes in Cortical Neurons: A Transcriptome Analysis. Front Mol Neurosci, 11, 375. https://doi.org/10.3389/fnmol.2018.00375

      Netzahualcoyotzi, C., & Pellerin, L. (2020). Neuronal and astroglial monocarboxylate transporters play key but distinct roles in hippocampus-dependent learning and memory formation. Progress in Neurobiology, 194, 101888. https://doi.org/10.1016/j.pneurobio.2020.101888

      Newman, L. A., Korol, D. L., & Gold, P. E. (2011). Lactate produced by glycogenolysis in astrocytes regulates memory processing. PLoS One, 6(12), e28427. https://doi.org/10.1371/journal.pone.0028427

      Park, J., Kim, J., & Mikami, T. (2021). Exercise-Induced Lactate Release Mediates Mitochondrial Biogenesis in the Hippocampus of Mice via Monocarboxylate Transporters. Front Physiol, 12, 736905. https://doi.org/10.3389/fphys.2021.736905

      Peterson, S. M., Pack, T. F., & Caron, M. G. (2015). Receptor, Ligand and Transducer Contributions to Dopamine D2 Receptor Functional Selectivity. PLoS One, 10(10), e0141637. https://doi.org/10.1371/journal.pone.0141637

      Rangaraju, V., Lauterbach, M., & Schuman, E. M. (2019). Spatially Stable Mitochondrial Compartments Fuel Local Translation during Plasticity. Cell, 176(1), 73-84.e15. https://doi.org/10.1016/j.cell.2018.12.013

      Rinholm, J. E., Hamilton, N. B., Kessaris, N., Richardson, W. D., Bergersen, L. H., & Attwell, D. (2011). Regulation of oligodendrocyte development and myelination by glucose and lactate. J Neurosci, 31(2), 538-548. https://doi.org/10.1523/JNEUROSCI.3516-10.2011

      Sánchez-Abarca, L. I., Tabernero, A., & Medina, J. M. (2001). Oligodendrocytes use lactate as a source of energy and as a precursor of lipids. Glia, 36(3), 321-329. https://doi.org/10.1002/glia.1119

      Suzuki, A., Stern, S. A., Bozdagi, O., Huntley, G. W., Walker, R. H., Magistretti, P. J., & Alberini, C. M. (2011). Astrocyte-neuron lactate transport is required for long-term memory formation. Cell, 144(5), 810-823.

      Tang, F., Lane, S., Korsak, A., Paton, J. F. R., Gourine, A. V., Kasparov, S., & Teschemacher, A. G. (2014). Lactate-mediated glia-neuronal signalling in the mammalian brain. Nature Communications, 5(1), 3284. https://doi.org/10.1038/ncomms4284

      Tauffenberger, A., Fiumelli, H., Almustafa, S., & Magistretti, P. J. (2019). Lactate and pyruvate promote oxidative stress resistance through hormetic ROS signaling. Cell Death Dis, 10(9), 653. https://doi.org/10.1038/s41419-019-1877-6

      Tse, D., Langston, R. F., Kakeyama, M., Bethus, I., Spooner, P. A., Wood, E. R., Witter, M. P., & Morris, R. G. (2007). Schemas and memory consolidation. Science, 316(5821), 76-82. https://doi.org/10.1126/science.1135935

      Tse, D., Takeuchi, T., Kakeyama, M., Kajii, Y., Okuno, H., Tohyama, C., Bito, H., & Morris, R. G. (2011). Schema-dependent gene activation and memory encoding in neocortex. Science, 333(6044), 891-895. https://doi.org/10.1126/science.1205274

      Vardjan, N., Chowdhury, H. H., Horvat, A., Velebit, J., Malnar, M., Muhič, M., Kreft, M., Krivec, Š. G., Bobnar, S. T., Miš, K., Pirkmajer, S., Offermanns, S., Henriksen, G., Storm-Mathisen, J., Bergersen, L. H., & Zorec, R. (2018). Enhancement of Astroglial Aerobic Glycolysis by Extracellular Lactate-Mediated Increase in cAMP [Original Research]. Frontiers in Molecular Neuroscience, 11. https://doi.org/10.3389/fnmol.2018.00148

      Vezzoli, E., Cali, C., De Roo, M., Ponzoni, L., Sogne, E., Gagnon, N., Francolini, M., Braida, D., Sala, M., Muller, D., Falqui, A., & Magistretti, P. J. (2020). Ultrastructural Evidence for a Role of Astrocytes and Glycogen-Derived Lactate in Learning-Dependent Synaptic Stabilization. Cereb Cortex, 30(4), 2114-2127. https://doi.org/10.1093/cercor/bhz226

      Wang, J., Tu, J., Cao, B., Mu, L., Yang, X., Cong, M., Ramkrishnan, A. S., Chan, R. H. M., Wang, L., & Li, Y. (2017). Astrocytic l-Lactate Signaling Facilitates Amygdala-Anterior Cingulate Cortex Synchrony and Decision Making in Rats. Cell Rep, 21(9), 2407-2418. https://doi.org/10.1016/j.celrep.2017.11.012

      Yang, J., Ruchti, E., Petit, J. M., Jourdain, P., Grenningloh, G., Allaman, I., & Magistretti, P. J. (2014). Lactate promotes plasticity gene expression by potentiating NMDA signaling in neurons. Proc Natl Acad Sci U S A, 111(33), 12228-12233. https://doi.org/10.1073/pnas.1322912111

      Yao, S., Xu, M.-D., Wang, Y., Zhao, S.-T., Wang, J., Chen, G.-F., Chen, W.-B., Liu, J., Huang, G.-B., Sun, W.-J., Zhang, Y.-Y., Hou, H.-L., Li, L., & Sun, X.-D. (2023). Astrocytic lactate dehydrogenase A regulates neuronal excitability and depressive-like behaviors through lactate homeostasis in mice. Nature Communications, 14(1), 729. https://doi.org/10.1038/s41467-023-36209-5

      Yu, X., Zhang, R., Wei, C., Gao, Y., Yu, Y., Wang, L., Jiang, J., Zhang, X., Li, J., & Chen, X. (2021). MCT2 overexpression promotes recovery of cognitive function by increasing mitochondrial biogenesis in a rat model of stroke. Anim Cells Syst (Seoul), 25(2), 93-101. https://doi.org/10.1080/19768354.2021.1915379

      Zhou, Z., Okamoto, K., Onodera, J., Hiragi, T., Andoh, M., Ikawa, M., Tanaka, K. F., Ikegaya, Y., & Koyama, R. (2021). Astrocytic cAMP modulates memory via synaptic plasticity. Proc Natl Acad Sci U S A, 118(3), e2016584118. https://doi.org/10.1073/pnas.2016584118

      Zhu, J., Hu, Z., Han, X., Wang, D., Jiang, Q., Ding, J., Xiao, M., Wang, C., Lu, M., & Hu, G. (2018). Dopamine D2 receptor restricts astrocytic NLRP3 inflammasome activation via enhancing the interaction of β-arrestin2 and NLRP3. Cell Death Differ, 25(11), 2037-2049. https://doi.org/10.1038/s41418-018-0127-2

    1. Author Response:

      Reviewer #1 (Public Review):

      The physical principles underlying oligomerization of GPCRs are not well understood. Here, authors focused on oligomerization of A2AR. They found that oligomerization of A2AR is mediated by the intrinsically disordered, extramembraneous C-terminal tail. Using experiment and MD simulation, they mapped the regions that are responsible for oligomerization and dissected the driving forces in oligomerization.

      This is a nice piece of work that applies fundamental physical principles to the understanding of an important biological problem. It is a significant finding that oligomerization of A2AR is mediated by multiple weak interactions that are "tunable" by environmental factors. It is also interesting that solute-induced, solvent-mediated "depletion interactions" can be a key driving force in membrane protein-protein interactions.

      Although this work is potentially a significant contribution to the fields of GPCRs and molecular biophysics of membrane proteins in general, there are several concerns that would need to be addressed to strengthen the conclusions.

      1) How reasonably would the results obtained in the micellar environment be translated into the phenomenon in the cell membranes?

      1a) Here authors measured oligomerization of A2AR in detergent micelles, not in the bilayer or cellular context. Although the cell membranes would provide another layer of complexity, the hydrophobic properties and electrostatics of the negatively charged membrane surface may cooperate or compete with the interactions mediated by the C-terminal tail, especially if the oligomerization is mediated by multiple weak interactions.

      The translatability of properties of membrane proteins in detergent micelles to the cellular context is a valid concern. However, this shortcoming applies to all biophysical studies of membrane proteins in non-native environments. Even for membrane proteins reconstituted in liposomes, the question arises whether the artificial lipid composition that differs from that in the human plasma membrane would alter protein properties, especially as surface charges and cholesterol content can impact membrane protein dynamics, association, and stability. In that sense, this question cannot be answered satisfyingly, especially for GPCRs that are notoriously difficult to isolate. However, we can offer some perspectives. The propensity for membrane proteins to associate and oligomerize, if anything, is greater in lipid bilayers compared to that in detergent micelles, while detergent micelles can effectively solubilize membrane protein monomers (Popot and Engelman, Biochem 1990, 29 (17), 4031–4037). Hence, the findings that A2AR readily oligomerizes in detergent micelles and that the degree of oligomerization can be systematically tuned by the C-terminal length of A2AR in the same micellar system suggest that inter-A2AR interactions are modulating receptor oligomerization; we speculate that A2AR oligomers will be present or be enhanced in the lipid bilayer environment. In fact, in the cellular context, it has been shown that A2AR assembles into homodimers at the cell surface in transfected HEK293 cells (Canals et al, J Neurochem 2004, 88, 726–734) and into higher-order oligomers at the plasma membrane in Cath.A differentiated neuronal cells (Vidi et al, FEBS Lett 2008, 582, 3985–3990). Furthermore, C-terminally truncated A2AR has been demonstrated to show no protein aggregation or clustering on the cell surface, a process otherwise observed in the WT form (Burgueno et al, J Biol Chem 2003, 278 (39), 37545–37552). These results provide the research community with a valid starting point to discover factors that control oligomerization of A2AR in the cellular context.

      1b) Related to the point above (1a), I wonder if MD simulation could provide an insight into the role of the lipid bilayer in the inter- or intra-molecular interactions involving the tail. Although the neutral POPC bilayer was employed in the simulation, the tail-membrane interaction may affect oligomerization since the tail is intrinsically disordered and possesses a significant portion of nonpolar residues (Fig. S4).

      The reviewer brings up a valid point about the ability for MD simulations to provide insights into the role of membrane-protein interactions. In response to the reviewer, we performed additional analysis focusing on the interactions of the C-terminus with the lipid bilayer. Overall, as the C-terminus is extended, there is a decrease in its interaction with the cytoplasmic leaflet of the membrane (left figure below). More specifically, we find that the C-terminal segment associated with helix 8 (residues 291 to 314) interacts tightly with the membrane, while the rest of the C-terminus (an intrinsically disordered segment) more weakly interacts with the membrane, regardless of truncation (right figure below). As the C-terminus is extended, the inherent conformational flexibility leads to a decrease in the interactions between the protein and the bilayer. We also observe that shorter stretches of the disordered segment do have the ability to interact more closely with the membrane. While these portions include charged residues that can participate in formation of the dimer interface, no general trends are observed. We therefore cannot draw any conclusions regarding the role of C-terminal-membrane interactions on the dimerization of A2AR. What we do know is that the MD simulations presented here should be considered a model study that reveals that the charged and disordered C-terminus of A2AR can account for oligomerization via multiple and weak inter-protomer contacts.

      MD simulations showing (left) average distance of all C-terminal residues and (right) average per-residue distance from the cytoplasmic leaflet of the lipid bilayer.

      2) Ensuring that the oligomer distributions are thermodynamic products.

      Since authors interpret the SEC results on the basis of thermodynamic concepts (driving forces, depletion interactions, etc.), it would be important to verify that the distribution of different oligomeric states is the outcome of the thermodynamic control. There is a possibility that the distribution is the outcome of the "kinetic trapping" during detergent solubilization.

      This is an important question. As we have shown in the manuscript, the A2AR dimer level was found to be reduced in the presence of TCEP (Figure 2B), suggesting that disulfide linkages have a role in facilitating A2AR oligomerization. However, disulfide cross-linking cannot be the sole driving force of A2AR oligomerization because (1) a significant population of A2AR dimer remained resistant to TCEP (Figure 2B), (2) A2AR oligomer levels decreased progressively with the shortening of the C-terminus (Figure 3), and (3) A2AR oligomerization is driven by depletion interactions that are enhanced with increasing ionic strength (Figure 5).

      To answer whether A2AR oligomer is a thermodynamic or kinetic product, we tested the stability and reversibility of the A2AR monomer and dimer/oligomer population. We used SEC to separate these populations of both the A2AR-WT and A2AR-Q372ΔC variants, then performed a second round of SEC to observe their repopulation, if any. The results are summarized in the figure below, which we will include in the revised manuscript as Figure 5-figure supplement 1.

      We find that the SEC-separated monomers repopulate measurably into dimer/oligomer, with the total oligomer level after redistribution comparable with that of the initial samples for both A2AR-WT (initial: 2.87; redistributed: 1.60) and A2AR-Q372ΔC (initial: 1.49; redistributed: 1.40) (Figure 5-figure supplement 1A). This observation indicates that the A2AR oligomer is a thermodynamic product with a lower free energy compared with that of the monomer. This is consistent with the results we have shown in the manuscript that the oligomer levels of A2AR-WT are consistent (1.34–2.87; Table S1) and that A2AR oligomerization can be modulated with ionic strength via depletion interactions (Figure 5).

      Figure S5. The dimer/oligomerization of A2AR is a thermodynamic process where the dimer and HMW oligomer once formed are kinetically trapped. (A) SEC chromatograms of the consecutive rounds of SEC performed on A2AR-WT and Q372ΔC. The first rounds of SEC are to separate the dimer/oligomer population and the monomer population, while the second rounds of SEC are performed on these SEC-separated populations to assess their stability and reversibility. The total oligomer level is expressed relative to the monomeric population in arbitrary units. (B) Energy diagram depicting A2AR oligomerization progress. The monomer needs to overcome an activation barrier (EA), driven by depletion interactions, to form the dimer/oligomer. Once formed, the dimer/oligomer populations are kinetically trapped by disulfide linkages.

      Interestingly, the SEC-separated dimer/oligomer populations do not repopulate to form monomers (Figure 5-figure supplement 1). This observation is, again, consistent with a published study of ours on A2AR dimers (Schonenbach et al, FEBS Lett 2016, 590, 3295–3306). This observation furthermore indicates that once the oligomers are formed, some are kinetically trapped and thus cannot redistribute into monomers.

      We believe that disulfide linkages are likely candidates that kinetically stabilize A2AR oligomers, as demonstrated by their redistribution into monomers only in the presence of a reducing agent (Figure 2B). Taken together, we suggest that A2AR oligomerization is a thermodynamic process (Figure 5-figure supplement 1B), with the monomer overcoming the activation energy (EA) by depletion interactions to repopulate into dimer/oligomer with a slightly lower free energy (given that we see a distribution between the two). Once formed, the redistributed dimer/oligomer populations can be kinetically stabilized by disulfide linkages.
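
      As a rough illustration of what a "slightly lower free energy" could mean numerically, a two-state Boltzmann estimate can be sketched as follows. This sketch assumes that the SEC oligomer-to-monomer ratio can be read as an apparent equilibrium population ratio, which is a simplification: the levels are reported in arbitrary units, and the estimate ignores concentration dependence and kinetic trapping. The function name and the temperature of 298 K are illustrative assumptions, not values taken from the manuscript.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def apparent_delta_g(ratio, temperature_k=298.0):
    """Apparent free energy difference (kcal/mol) for a two-state population ratio.

    ratio = oligomer population / monomer population, assumed (hypothetically)
    to reflect an equilibrium distribution. Negative values favor the oligomer.
    """
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    return -R_KCAL * temperature_k * math.log(ratio)


# Redistributed oligomer levels quoted in the response above.
for label, ratio in [("A2AR-WT", 1.60), ("A2AR-Q372dC", 1.40)]:
    print(f"{label}: {apparent_delta_g(ratio):.2f} kcal/mol")
```

      Under these assumptions, ratios between 1 and 3 correspond to free energy differences smaller than RT ln 3 in magnitude, that is, well below 1 kcal/mol at room temperature.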

      3) The claim that the C-terminal tail is engaged in "cooperative" interactions is too qualitative (p. 11 line 274, p.12 line 279 and p.18 line 426).

      This claim seems derived from Fig. 3b and Figs. 4b-c. However, the gradual decrease in the dimer level and the number of interactions may indicate that different parts in the C-terminal tail contribute to dimerization additively rather than cooperatively. The large decrease in the number of interactions may stem from the large decrease in the length (395 to 354). Probably, a more quantitative measure would be the number of interactions (H-bonds/salt bridges) normalized to the tail length upon successive truncation. Even in that case, the polar/charged residues would not be uniformly distributed along the primary sequence, making the quantitative argument of cooperativity challenging.

      The request to clarify our basis to refer to a cooperative interaction is well taken. Figure 4B and 4C show that the truncation of one part of the C-terminus (segment 335–394) leads to a reduction in contacts of a different part (segment 291–334) of A2AR. Therefore, we conclude that the binding interactions that occur in segment 291–334 are altered by the interactions exerted by the segment 335–394. This characteristic is consistent with allosteric interactions. We believe that characterizing these interactions as “cooperative” is possible but is not fully justified in this work. We also agree with the comment that quantifying the role and segments involved in contacts would be challenging. The manuscript has been amended to use the term “allosteric” in place of “cooperative”.
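
      The length normalization suggested by the reviewer can be sketched as below. This is a minimal sketch, assuming that the C-terminus starts at residue 291 (as in our model) and that each variant is identified by its last residue number. The contact counts are hypothetical placeholders chosen for illustration only; they are not data from the manuscript.

```python
TAIL_START = 291  # first residue of the C-terminus in the model


def contacts_per_residue(contacts_by_variant, tail_start=TAIL_START):
    """Return contacts per C-terminal residue for each variant.

    contacts_by_variant maps the last residue number of a truncation
    variant to its total number of H-bond/salt-bridge contacts.
    """
    densities = {}
    for last_residue, n_contacts in contacts_by_variant.items():
        tail_length = last_residue - tail_start + 1
        if tail_length <= 0:
            raise ValueError(f"variant {last_residue} has no C-terminal tail")
        densities[last_residue] = n_contacts / tail_length
    return densities


# Hypothetical contact counts, for illustration only.
example_counts = {394: 52, 359: 20, 334: 11}
for last_residue, density in sorted(contacts_per_residue(example_counts).items()):
    print(f"A2AR 1-{last_residue}: {density:.2f} contacts per tail residue")
```

      A roughly constant density across truncations would point toward additive contributions, whereas a density that drops upon truncation would be consistent with one segment influencing contacts in another.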

      4) On the compactness and conformation of the C-terminal tail:

      Although the C-terminal tail is known as "intrinsically disordered", the results seem to indicate that its conformation is rather compact (or collapsed) with a number of intra- and intermolecular polar interactions (Fig. 4) and buried nonpolar residues (Fig. 6), which are subject to depletion interactions (Fig. 5). This raises a question if the tail indeed "intrinsically disordered" as is known. Recent folding studies on IDPs (Riback et al. Science 2017, 358, 238-; Best, Curr Opin Struct Biol 2020, 60, 27-) suggest that IDPs are partially expanded or expanded rather than collapsed.

      We agree that our results seem to suggest that the conformation of the C-terminus could be partially compact. However, by stating that the C-terminus on average is an intrinsically disordered region (IDR), we do not exclude the possibility of partially structured regions, or greater compactness than that of an excluded volume polymer. IDR or IDP should refer to all proteins or protein regions that do not adopt a unique structure. By that standard, we know that the C-terminus of A2AR falls into that category according to our experiments and MD simulation, as well as the literature. In isolation, the majority of the C-terminus is indeed an IDR, as has been demonstrated not only by simulations but also by experimental data. In fact, the C-terminus exhibits partial alpha-helical structure, and transiently populates beta-sheet conformations, depending on its state and buffer conditions (Piirainen et al, Biophys J 2015, 108 (4), 903–917). The literature studies suggest that A2AR’s C-terminus may adopt a greater level of compactness when interactions are formed between the C-terminus and the rest of the A2AR oligomer.

      Reviewer #2 (Public Review):

      The authors expressed A2A receptor as wild type and modified with truncations/mutations at the C-terminus. The receptor was solubilized in detergent solution, purified via a C-terminal deca-His tag and the fraction of ligand binding-competent receptor separated by an affinity column. Receptor oligomerization was studied by size exclusion chromatography on the purified receptor solubilized in a DDM/CHAPS/CHS detergent solution. It was observed that truncation greatly reduces the tendency of A2A to form dimers and oligomers. Mechanistic insights into interactions that facilitate oligomerization were obtained by molecular simulations and the study of aggregation behavior of peptide sequences representing the C-terminus of A2A. It is concluded that a multitude of interactions, including disulfide linkages, hydrogen bonds, and electrostatic and depletion interactions, contribute to aggregation of the receptor.

      The general conclusions appear to be correct and the paper is well written. This is a study of protein association in detergent solution. It is conceivable that observations are relevant for A2A receptors in cell membranes as well. However, extrapolation of mechanisms observed on receptor in detergent micelles to receptor in membranes should proceed with caution. In particular, the spatial arrangement of oligomerized receptor molecules in micelles may differ from arrangement in lipid bilayers. The lipid matrix may have a profound influence on oligomerization.

      The ultimate question to answer is how oligomerization alters receptor function. This will have to be addressed in a future study.

      We could not agree more. We address the concern regarding the translatability of properties of membrane proteins in detergent micelles to the cellular context in our response to Reviewer 1. In short, we believe the general propensity for A2AR to form dimers/oligomers and the role of the C-terminus will hold in the cellular context. However, even if it does not, given that biophysical structure-function studies of GPCRs are conducted in detergent micelles and other artificial environments, it is critical to understand the role of the C-terminus in the oligomerization of reconstituted A2AR in detergent micelles. How oligomerization alters receptor function is a question that is always on our mind and should be the focus of future studies. Indeed, it has been demonstrated that truncation of the A2AR C-terminus significantly reduces receptor association with Gαs and cAMP production in cellular assays (Koretz et al, Biophys J 2021, https://doi.org/10.1016/j.bpj.2021.02.032). The results presented in this manuscript, which have demonstrated the impact of C-terminal truncation on A2AR oligomerization, will offer critical understanding for such studies of the functional consequences of A2AR oligomerization.

      Reviewer #3 (Public Review):

      The work of Nguyen et al. demonstrates the relevant role of the C-terminus of A2AR for its homo-oligomerization. A previous work (Schonenbach et al. 2016) found that a point mutation of C394 in the C-terminus (C394S) reduces homo-oligomerization. Following this direction, more mutants were generated, the C-terminus was also truncated at different levels, and, using size-exclusion chromatography (SEC), the oligomerization levels of A2AR variants were assessed. Overall, these experiments support the role of the C-terminus in the oligomerization process. MD studies were performed and the non-covalent interactions were monitored. To 'identify the types of non-covalent interaction(s)', A2AR variants were also analysed modulating the ionic strength from 0.15 to 0.95 M. The C-terminus peptides were investigated to assess their interaction in absence of the TM domain.

      The SEC results on the A2AR variants strongly support the main conclusion of the paper, but some passages and methodologies are less convincing. The different results obtained for dimerization and oligomerization are little discussed. The MD simulations are performed on models that are not accurately described - structural information currently available may compromise the quality of the model and the validity of the results (i.e., applying MD simulations to low-resolution models may not be appropriate for the goal of this analysis; moreover, the formation of disulfide bonds cannot be simulated, but this can affect the conformation and consequently the interactions to be monitored). Although the C-terminus is suggested as 'a driving factor for the oligomerization', the TM domain is indeed involved in the process, and if and how it will be affected by modulating the solvent ionic strength should be discussed.

      We thank the reviewer for the overall positive assessment and critical input. We respond to the comments as follows.

      The qualitative trend for dimerization is consistent with that for oligomerization, as demonstrated in Figs. 2A, 3B, and 5. For example, a reduction in both dimerization and oligomerization was observed upon C394X mutations (Figure 2A), as well as upon systematic truncations (Figure 3B), while very similar trends were seen for the change in the dimer and oligomer levels of all four constructs upon variation of ionic strength (Figure 5).

      We agree that the experimental observations and MD simulations only incompletely describe the state of the A2AR dimer/oligomer. For example, we discovered the impact of ERR:AAA mutations of the C-terminus (Figure 3C) on oligomer formation, but do not know whether this segment interacts with the TM domain or the C-terminus of the neighboring A2AR. MD simulations suggest that the inter-protomer interface certainly involves inter-C-termini contact. We also mention that the A2AR oligomeric interfaces could be asymmetric, suggesting that the C-terminus can interact with other parts of the receptor, including the TM domain. However, we do not have evidence that the TM domains directly interact with each other to stabilize A2AR oligomers, and thus cannot discuss the effect of the solvent ionic strength on how the TM domain contributes to A2AR oligomerization. We minimize such discussion in our manuscript because we have incomplete insights. What we can say is that the multiple weak inter-protomer interactions that contribute to the formation of the dimer and oligomer interface prominently involve the C-terminus. Ultimately, the structure of the A2AR dimer/oligomer needs to be solved to answer the reviewer’s question fully.

      With respect to the validity of our model, we restricted ourselves to using the best-available X-ray crystal structure for A2AR. Since this structure (PDB 5G53) does not include the entire C-terminus, we resorted to using homology modeling software (i.e., MODELLER) to predict the structures of the C-terminus. In our model, the first segment of the C-terminus, consisting of residues 291 to 314, was modeled as a helical segment parallel to the cytoplasmic membrane surface, while the rest of the C-terminus was modeled as intrinsically disordered. MODELLER is much more accurate in structural predictions for segments shorter than 20 residues. This limitation necessitated that we run an equilibrium MD simulation for 2 µs to obtain a well-equilibrated structure that possesses a more viable starting conformation. We have included this detailed description of our model in lines 641–650. To validate our models of all potential variants of A2AR, we calculated the RMSD and RMSF for each truncated variant. Our results clearly show that the transmembrane helical bundle is very stable, as expected, and that the C-terminus is more flexible (see figure below). This flexibility is somewhat consistent for lengths up to 359 residues, with a more noticeable increase in flexibility for the 394-residue variant of A2AR.

      Root mean square fluctuation (RMSF) from sample trajectories of truncated variants modeled from the crystal structure of the adenosine A2AR bound to an engineered G protein (PDB ID 5G53), and the root mean square deviation (RMSD) of the C-terminus of each variant starting from residue 291.
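
      For readers less familiar with these two metrics, the calculation can be sketched as follows. This is a minimal, dependency-free sketch that assumes every frame has already been superposed onto a common reference; it is not the analysis code used for the figure, and the toy trajectory is a hypothetical example.

```python
import math


def rmsd(frame, reference):
    """RMSD between one frame and a reference structure.

    Both arguments are lists of (x, y, z) tuples with matching atom order.
    """
    if len(frame) != len(reference):
        raise ValueError("frame and reference must have the same number of atoms")
    squared = sum(
        (a - b) ** 2
        for atom, ref_atom in zip(frame, reference)
        for a, b in zip(atom, ref_atom)
    )
    return math.sqrt(squared / len(frame))


def rmsf(trajectory):
    """Per-atom RMSF about the time-averaged position over all frames."""
    n_frames = len(trajectory)
    n_atoms = len(trajectory[0])
    fluctuations = []
    for i in range(n_atoms):
        mean_pos = [sum(frame[i][k] for frame in trajectory) / n_frames for k in range(3)]
        mean_sq = sum(
            sum((frame[i][k] - mean_pos[k]) ** 2 for k in range(3))
            for frame in trajectory
        ) / n_frames
        fluctuations.append(math.sqrt(mean_sq))
    return fluctuations


# Toy two-frame trajectory: atom 0 is rigid, atom 1 moves along x.
toy = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0)],
]
print("RMSF per atom:", rmsf(toy))
print("RMSD of frame 1 vs frame 0:", rmsd(toy[1], toy[0]))
```

      In this picture, a stable helical bundle corresponds to low per-residue RMSF values, and a flexible C-terminus to high ones.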

    1. Author Response:

      Reviewer #1 (Public Review):

      Wang et al. investigated the role of RNA m6A modification in intestinal epithelial cells (IECs) in the context of rotavirus infection. The authors found that mice that specifically lack METTL3 in IECs show resistance to rotavirus infection. They attributed this effect to increased IFN and ISG expression, presumably via IRF7 upregulation. Further genetic IRF7 ablation in IECs led to sensitivity to rotavirus infection. They also found that ALKBH5 is suppressed by a rotaviral protein, although the knockout of ALKBH5 in IECs did not influence viral infection.

      Overall, although the resistance of IEC-specific METTL3-deficient mice upon rotavirus infection via the control of IRF7 is a novel and interesting finding, the proposed model is not fully supported by the findings here. Especially, the following points need to be addressed:

      We are grateful to the reviewer for the complimentary summary of our research. We also appreciate the valuable experiments suggested by the reviewer to improve our manuscript. We have added additional important controls and mechanistic data to further support our conclusions.

      1) The m6A dot blot used in Figure 1 is not a good measurement system of total m6A modification levels, because the antibody used here also detects another RNA modification, m6Am (PMID: 31676230). Therefore, it is unclear if the increase of m6A dot blot intensity is due to the increase of m6A in RNAs mediated by METTL3 in IECs. The authors should investigate the m6A levels in IECs, not BMDMs, under METTL3 deficiency. Ideally, this analysis should be done using mass spectrometry.

      We thank the reviewer for raising a critical point. We have tried several methods to avoid the potential non-specific detection by the antibody we used previously (Synaptic Systems, #202003), which was reported to detect m6Am as well.

      1. We have included dot blot data for m6A modification in Mettl3^△IEC and WT IECs during RV infection by using another m6A antibody (Anti-N6-methyladenosine (m6A), Sigma-Aldrich, Cat. No. ABE572-I) (see below and also Fig. 1d, 1e).

      2. We have included mass spectrometry data for m6A modification in IECs during development (see below and also Fig. 1c) or RV infection (see below and also Fig. s3a).

      These data suggest that m6A modifications in IECs are indeed regulated during development and RV infection. We have included the descriptions in the text.

      Figure 1. Rotavirus infection increases global m6A modifications, and Mettl3 deficiency in intestinal epithelial cells results in increased resistance to rotavirus infection. (c) MS analysis of m6A level in ileum tissue from mice with different ages. (mean ± SEM), Statistical significance was determined by Student’s t-test (*P < 0.05, NS., not significant). (d) WT and Mettl3^△IEC mice were infected by rotavirus EW strain at 8 days post birth. m6A dot blot analysis of total RNA in ileum IEC at 2 dpi. Methylene blue (MB) staining was the loading control. (e) Quantitative analysis of (d) (mean ± SEM). Statistical significance was determined by Student’s t-test (*P < 0.05, ***P<0.001, NS., not significant). The quantitative m6A signals were normalized to quantitative MB staining signals.

      Figure s3. MS analysis of total m6A level in mouse ileum. (a) WT and Mettl3^△IEC mice were infected by rotavirus EW strain at 8 days post birth. MS analysis of m6A level in ileum tissue from mice at 2 dpi (mean ± SEM). Statistical significance was determined by Student’s t-test (**P < 0.005).

      2) The authors show that Alkbh5 expression is increased when the mice grow up to 3 weeks old. However, the Alkbh5 protein expression changes are missing.

      We thank the reviewer for raising this point. We have included the protein expression of ALKBH5 in the intestine during development (see below and Fig. s1). The ALKBH5 protein levels increase in the intestine with age (Fig. s1a, s1b), which is consistent with the changes in ALKBH5 mRNA levels during development (Fig. 1d).

      Figure s1. ALKBH5 regulates total m6A level in intestine. (a) Immunoblotting with antibodies targeting ALKBH5 and TUBULIN in ileum tissues from mice of different ages. (b) Quantitative analysis of (a) (mean ± SEM). Statistical significance was determined by Student’s t-test (*P < 0.05, NS., not significant).

      3) The authors claim that m6A declined from 2 to 2 weeks post birth is caused by increased Alkbh5 (Line 110). However, it is not clear if the subtle increase in Alkbh5 mRNA leads to the change in global m6A levels. The author can use ALKBH5-deficient mouse cells to confirm this point.

      We thank the reviewer for raising an important point. We have included ALKBH5 over-expression and knockdown data in a mouse IEC cell line, MODE-K, to test whether the regulation of Alkbh5 mRNA in IECs leads to a change in global m6A levels.

      Over-expression of ALKBH5 in MODE-K cells largely reduced the global m6A level (see below and Fig. s1d). CRISPR-mediated knockdown of ALKBH5 in MODE-K cells augmented the global m6A level, while knockdown of another m6A eraser, FTO, in MODE-K cells did not affect the global m6A level (see below and Fig. s10b).

      Figure s1. ALKBH5 regulates total m6A level in intestine. (d) Immunoblotting with antibodies targeting ALKBH5 and TUBULIN in MODE-K cells transfected with pSIN-EV or pSIN-mAlkbh5-3xFlag for 24 h. m6A dot blot analysis of total RNA in indicated samples. Methylene blue (MB) staining was the loading control.

      Figure s10. Alkbh5 is the dominant m6A eraser in intestine. (b) m6A dot blot analysis of total RNA in different MODE-K cells. Methylene blue (MB) staining was the loading control.

      4) The authors should describe the overall phenotype of IEC-specific METTL3-deficient mice at the steady state. It is important to clarify if the augmented expression of ISG upon METTL3 deficiency is dependent on rotavirus infection. Also, the authors should describe any detectable abnormalities or changes without stimulation.

      We collaborated with another group and found a defect in intestinal stem cells in IEC-specific METTL3-deficient mice. However, because RV normally infects IECs in the villi but not in the crypts, and stem cells are not the major producers of IFN/ISGs (Sue E. Crawford et al. Nature reviews disease primers, 2017), the defect in intestinal stem cells is less likely to affect the RV infection phenotype. As this is a separate story that is under review, we prefer not to include this part of the data in our manuscript. Moreover, we have crossed Irf7^−/− mice to Mettl3^ΔIEC mice and verified that Irf7-mediated induction of ISGs is critical for the anti-viral phenotype in Mettl3^ΔIEC mice.

      Our bulk RNA-seq data in IECs showed augmented expression of ISGs upon METTL3 deficiency at steady state (Fig. 2a). We also found augmented ISG expression in the intestine of METTL3-deficient mice at steady state and during early RV infection (2 d) by qPCR. However, because the RV loads in METTL3-deficient mice during the late infection stage are significantly lower than those in WT mice, inducible ISG expression is consequently lower in the intestine of METTL3-deficient mice than in WT mice at day 4 post infection (Fig. 3f).

      5) The finding that IRF7 is targeted by METTL3 is not convincing. First, the authors performed MeRIP-seq and -qPCR experiments only using RNAs from wild-type IECs, not from METTL3-deficient cells. It is necessary to show that the modification levels on IRF7 mRNA are indeed reduced upon METTL3 deficiency. Second, it is unclear if MeRIP-seq is properly performed or not, because there is no quality checking figure shown. For instance, the authors can generate metagene plots or gene logos of m6A modified sites to see if there is any consistency with previous reports. Third, in Figure 2h, the authors should show that the change in luciferase activity between wild-type and mutant Irf7-3'UTR reporters is dependent on METTL3 activity by performing METTL3 knockdown or knockout. Also, the authors should describe how they mutagenize the sequences for clarification. Fourth, in Figures 2F and 3C, they showed that IRF7 is upregulated in METTL3-deficient IECs, while in Figure 3F, IRF7 is conversely downregulated in METTL3-deficient IECs. These results are apparently contradictory.

      We appreciate the valuable suggestion provided by the reviewer to improve our manuscript.

      1. We have performed RIP-qPCR in Mettl3-knockdown and WT MODE-K cells to verify the m6A modification on IRF7 mRNA. The modification levels on IRF7 mRNA are indeed reduced upon METTL3 deficiency (see below and Fig. s5c, s5d). We have added the description of the experiment to the manuscript.

      Figure s5. Characterization of m6A modifications on Irf7 mRNA. (c) m6A-RIP-qPCR confirms Irf7 as an m6A-modified gene in IECs. Fragmented RNA of sgEV and sgMettl3 MODE-K cells was incubated with an anti-m6A antibody (Sigma Aldrich ABE572-I). The eluted RNA and input were processed as described in the ‘RT-qPCR’ section, and the data were normalized to the input samples (n=3, mean ± SEM). Statistical significance was determined by Student’s t-test (*P < 0.05, **P < 0.005, NS., not significant). Tlr3 and Rps14 were measured with m6A site-specific qPCR primers as positive and negative controls, respectively; Irf7 was measured with qPCR primers specific to the predicted m6A sites. (d) Knockdown efficiency of METTL3 in MODE-K cells.
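
      One common way to normalize RIP-qPCR data to input is the percent-input calculation, sketched below. The legend above states only that the data were normalized to the input samples, so the exact formula, the assumption of 100% PCR efficiency, and the 10% input fraction used in the example are illustrative assumptions rather than a description of the actual analysis.

```python
import math


def percent_input(ct_ip, ct_input, input_fraction):
    """Percent-input value for an m6A-RIP sample.

    ct_ip          -- Ct of the immunoprecipitated (eluted) RNA
    ct_input       -- Ct of the saved input aliquot
    input_fraction -- fraction of the starting RNA saved as input (e.g. 0.1)

    Assumes a doubling of product per cycle (100% amplification efficiency).
    """
    if not 0 < input_fraction <= 1:
        raise ValueError("input_fraction must be in (0, 1]")
    # Shift the input Ct to what it would be if 100% of the RNA had been used.
    adjusted_input_ct = ct_input - math.log2(1.0 / input_fraction)
    return 100.0 * 2.0 ** (adjusted_input_ct - ct_ip)


# Hypothetical Ct values, for illustration only.
print(f"{percent_input(ct_ip=28.0, ct_input=25.0, input_fraction=0.1):.2f}% of input")
```

      Comparing percent-input values for Irf7 between sgEV and sgMettl3 cells, alongside the Tlr3 and Rps14 controls, then gives the enrichment comparison shown in the panel.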

      2. We have generated metagene plots as suggested. As shown in Figure s5b, the m6A peaks are enriched near the stop codon and in the 3’UTR region, which is consistent with previous studies (Xuan et al. 2018; Dominissini et al., 2012; Yang et al., 2019). We have added the description to the manuscript.

      Figure s5. Characterization of m6A modifications on Irf7 mRNA. (b) Metagene plots of m6A modified sites.

      3. We have performed the luciferase assay in WT and METTL3-knockdown HEK293T cells, and found that the increased luciferase activity of the mutant Irf7-3'UTR reporter is dependent on METTL3 activity (see below and Fig. 2h, s5e). We have added the description of the experiment to the manuscript.

      Figure 2. Mettl3 deficiency in intestinal epithelial cells results in decreased m6A deposition on Irf7, and increased interferon responses. (h) Relative luciferase activity of sgEV and sgMettl3 HEK293T cells transfected with pmirGLO-Irf7-3’UTR (Irf7-WT) or pmirGLO-Irf7-3’UTR containing mutated m6A modification sites (Irf7-MUT). The firefly luciferase activity was normalized to Renilla luciferase activity (n=3, mean ± SEM). Statistical significance was determined by Student’s t-tests between genotypes (*P < 0.05, NS., not significant).
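
      The dual-luciferase normalization described in the legend can be sketched as follows. This is a minimal sketch: the function names and the raw readings are hypothetical, and expressing each sample relative to the mean of a control group is an assumed convention, not a detail stated in the legend.

```python
from statistics import mean


def normalized_ratios(firefly, renilla):
    """Firefly signal divided by the Renilla signal of the same well."""
    if len(firefly) != len(renilla):
        raise ValueError("firefly and renilla must have the same length")
    return [f / r for f, r in zip(firefly, renilla)]


def relative_activity(sample_ratios, control_ratios):
    """Express sample ratios relative to the mean ratio of the control group."""
    control_mean = mean(control_ratios)
    return [ratio / control_mean for ratio in sample_ratios]


# Hypothetical raw readings (n=3 per group), for illustration only.
irf7_wt = normalized_ratios([1200.0, 1100.0, 1300.0], [600.0, 550.0, 650.0])
irf7_mut = normalized_ratios([2400.0, 2300.0, 2500.0], [600.0, 575.0, 625.0])
print("Irf7-MUT relative to Irf7-WT:", relative_activity(irf7_mut, irf7_wt))
```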

      Figure s5. Characterization of m6A modifications on Irf7 mRNA. (e) Knockdown efficiency of METTL3 in HEK293T cells used for the luciferase assay.

      4. IRF7 is an ISG. The expression of IRF7 is controlled both by PAMP (such as viral component)-induced transcription and by post-transcriptional regulation such as m6A modification-mediated mRNA decay. At steady state or in the early stage (2 d) of rotavirus infection, there is no virus or the viral loads are comparable in Mettl3^△IEC mice and WT mice; thus, IRF7 expression is mainly regulated by m6A and is higher in IECs from Mettl3^△IEC mice than in those from WT mice. However, because the RV loads in Mettl3^△IEC mice during the late infection stage are significantly lower than those in WT mice, IRF7 expression at that stage is mainly regulated by viral PAMPs, and inducible IRF7 expression is consequently lower in the intestine of Mettl3^△IEC mice than in WT mice at day 4 post infection (Fig. 3f).

      6) It is unclear if the augmented expression of IRF7 per se upregulates IFN and ISG expression. Since IRF7 exerts its transcriptional activity upon phosphorylation, the authors should examine IRF7 phosphorylation and total protein levels in METTL3-deficient IECs. Also, it is interesting to see if the phosphorylation of TBK1 is augmented or not.

      We have provided the phosphorylation and total protein levels of IRF7 and TBK1 in MODE-K cells treated with poly I:C. Both total IRF7 and phosphorylated IRF7 are upregulated in Mettl3-knockdown cells compared to control cells (see below and Fig s5f). However, both total TBK1 and phosphorylated TBK1 remain unchanged (Fig s5f), suggesting that the augmented ISGs are less likely due to activation of the upstream IFN signal.

      Figure s5. Characterization of m6A modifications on Irf7 mRNA. (f) Western blot analysis of sgEV and sgMettl3 MODE-K cells transfected by lipo3000 with 2ug/ml poly I:C at indicated hours post transfection, at least three replicate experiments were performed.

      7) In Figure 3, the authors utilized METTL3 and IRF7 deficient mice to show the contribution of METTL3-mediated IRF7 regulation in rotavirus infection. However, if IRF7 is totally abrogated, IFN production should be greatly impaired as shown in Figure 3A. Thus, it is not surprising to see that the IFN response is diminished. The authors can use heterozygous IRF7 deficient mice instead to check if upregulation of IRF7 under METTL3 deficiency is critical to control rotavirus infection.

      We thank the reviewer for pointing out an important issue. However, we checked the IRF7 expression levels in IECs from Irf7^+/+ , Irf7^+/- and Irf7^-/- mice and found that there is no difference between IRF7 levels in IECs from Irf7^+/- mice and that in IECs from Irf7^+/+ mice. Thus, it is not feasible to use heterozygous IRF7 deficient mice to test the idea (Supporting Figure 1).

      Supporting Figure 1. WT and Irf7 heterozygous mice show the same IRF7 expression level in IECs. (a) IECs from 2-week-old Irf7^+/+, Irf7^+/-, and Irf7^-/- mice were isolated. Western blot analysis shows the IRF7 expression level in the different mice. (b) Quantitative analysis of (a) (mean ± SEM). Statistical significance was determined by Student’s t-test (***P < 0.001, NS., not significant).

      8) Given no effect of ALKBH5 knockout on rotavirus infection as shown in Figure 4, it is questionable if ALKBH5 has a profound role in the regulation of m6A in IECs. The authors should determine if m6A modification levels are increased in IECs under ALKBH5 deficiency.

      We performed the m6A dot blot assay to detect m6A modification levels in ALKBH5-knockdown MODE-K cells and did find an increase in m6A modification levels under ALKBH5 deficiency (see above and Fig s10). The lack of an effect of ALKBH5 knockout on rotavirus infection (Fig.4c, 4d and 4e) initially puzzled us as well, until we found that RV infection down-regulated ALKBH5 expression in the intestine of WT mice (Fig.4a).

    1. Author Response

      Reviewer #1 (Public Review):

      This work raises the question of how in plane forces generated at the apical surface of an epithelial cell sheet cause out of plane motion, an important morphogenetic motif. To address this question, a new optogenetic dominant negative rho1 tool, based on the cry2-CIBN system is presented. The authors use this tool to analyze the well studied biophysical process of ventral furrow formation, and dissect the spatiotemporal requirement of rho1 signaling to modulate myosin accumulation. They separate the effect on morphogenesis into an early phase that becomes significantly slowed down by myosin inhibition, and a late phase where the kinetics is comparable to wild type despite treatment. For interpretation of the data, an older model of cell mechanics treating tissue as a purely elastic material is presented. It fails to reproduce the observations. As a modification, in analogy to buckling of a thin beam under load, a compressive stress exerted by the adjacent ectoderm is introduced. Further analysis of cell behaviors in response to various laser mediated tissue manipulations is presented as support of the proposed mechanism.

      Overall, the manuscript addresses an important aspect of morphogenesis. In particular the use of optogenetic tools promises new insights that might be more challenging to achieve with traditional mutant analysis. However, reservations remain with respect to (1) rigor of the analysis, and (2) interpretation and quality of the data in support of the proposed mechanism; this applies in particular to presentation of biophysical observations, including experiment and simulations.

      The manuscript adds valuable quantitative data, in particular the findings described in Fig 2ab. However, insufficient analyses are performed to fully support the claims of the manuscript by the data presented.

      (I) The manuscript proposes an elasticity based model of tissue mechanics, but provides no experimental evidence in support of this assumption. Many rheology studies performed in a wide range of specimens (including the Drosophila embryo) found a separation of time scales, which shows that elasticity is a good approximation of tissue mechanics only for time scales short compared to the process studied here.

      We agree with the reviewer that an elasticity-based model of tissue mechanics is a simplification of the actual tissue properties in real embryos. To provide justification for this simplification, in the revised manuscript, we have cited a previous biophysical study measuring tissue viscoelasticity in early Drosophila embryos (Doubrovinski et al., 2017). Using a magnetic tweezers-based approach, Doubrovinski et al. show that the lower bound of the decay time of the elastic response is four minutes (the lower limit on the timescales where tissue behaves elastically). In addition, when history dependence of the response is considered, the decay time increases to nine minutes, which is close to the duration of ventral furrow formation (~ 15 – 20 minutes). Therefore, we consider elasticity a reasonable approximation of tissue mechanics during ventral furrow formation. The elasticity assumption has been widely used in previously published modeling work to simulate ventral furrow formation (Allena et al., 2010; Conte et al., 2009; Gracia et al., 2019; Heer et al., 2017; Hocevar Brezavšček et al., 2012; Muñoz et al., 2007; Rauzi et al., 2015). The modeling framework used in our current study, which was initially described in Polyakov et al. 2014, successfully predicts the intermediate and final furrow morphologies with a minimal set of active and passive forces without prescribing individual cell shape changes. It is therefore advantageous to use this model to explore the main novel aspect of the folding mechanics underlying ventral furrow formation. We show that the model can recapitulate the binary tissue response to acute myosin inhibition. In addition, it accurately predicts the intermediate furrow morphology at the transitional state and several other morphological properties associated with myosin inhibition. We therefore believe that this minimalistic model captures the central aspect of the physical mechanism underlying mesoderm bistability observed in the experiments.

      (II) The manuscript uses a method of micro-dissection to soften cells, but does not provide a clear definition of the concept of softening, provides no rationale for the method's functioning, and does not provide independent validation. The described treatment might affect cells in many alternative ways to the offered interpretation. This data is the central experimental evidence given in support of the proposed ectoderm compression mechanism, and therefore it is essential to provide a precise physical explanation of the method, and validation of measurements that bolster the conclusion.

      We apologize for not explaining the meaning of “softening” clearly in our original manuscript and the rationale for using laser ablation to detect compression. By “softening”, we meant to describe the mechanical status of the cell when the subcellular structures that normally support the mechanical integrity (e.g., cortical actin) are disrupted. We reason that when such a change in mechanical properties happens in a specific region of a tissue that is under compression, the cells in this region should have an impaired ability to resist compression from outside of the region and thereby cause the region to shrink.

      Laser ablation has been widely used to measure tensile stresses in cells and tissues by disruption of cells or subcellular structures. The method we used is adapted from previously described protocols, where a femtosecond near infrared laser is used to disrupt subcellular structures for detection of tissue tension (Rauzi et al., 2015; Rauzi et al., 2008). It has been shown that when laser intensity is properly controlled, the treatment can leave the plasma membrane intact but disrupt subcellular structures associated with the plasma membrane, such as adherens junctions and the cortical actomyosin networks (Rauzi et al., 2015; Rauzi et al., 2008). Using a femtosecond near infrared laser, we were able to ablate embryonic tissues that are under tension and observe tissue recoil after laser ablation, suggesting that our approach has disrupted the cortical cytoskeleton in the laser treated region (e.g., Figure 3 and Authors’ Response Figure 1). In these experiments, the lack of damage on the plasma membrane is indicated by the ready recovery of the plasma membrane signal after laser treatment, as well as the lack of bright burn marks on the tissue.

      As we noted before, we reasoned that if tissue is under compression, similar laser treatment that generates tissue recoil in tissues under tension should result in tissue shrinking within the laser-treated region. The data presented in our original manuscript demonstrate that tissue shrinking is not a non-specific response to our laser treatment – we did not observe such a response when we treated the tissue during cellularization or within the first five minutes of gastrulation, although identical experimental conditions were used (Original Figure 4). We have also obtained additional evidence that supports the use of tissue shrinking as a readout of tissue compression. We tested our laser ablation approach in Stage 8 – 9 embryos at regions where cells are actively dividing/proliferating, which would be expected to generate compressive stresses in the tissue. When we performed laser ablation in this region, we observed shrinking of the treated region, which was distinct from the tensile tissue response (Authors’ Response Figure 1). While this preliminary evidence is encouraging, we agree with the reviewer that further independent validations are needed given that the methods for detecting tissue compression have not been well established in the field. Following the editor’s suggestion, we have removed this experiment from the current manuscript and focus on the characterization of the optogenetic tool and the binary tissue response after acute actomyosin inhibition.

      Authors’ Response Figure 1: Laser ablation in regions of tissues with active cell proliferation (a) or undergoing apical constriction (b). The movement of tissues is indicated by overlaying membrane signals (Ecadherin-GFP) at T = 0 sec and at T = 10 sec. T = 0 in the “After ablation” panels marks the time immediately after ablation. (a) Stage 8 – 9 embryos. Multiple cells are in the process of cell division, as indicated by mitotic rounding (yellow arrowheads) or the appearance of cleavage furrows (red arrowheads). Immediately after laser ablation, the surrounding cells moved towards the ablated region (cyan arrows). (b) An embryo undergoing ventral furrow formation. Ablation within the constriction domain results in recoil of the surrounding cells away from the ablated region (cyan arrows).

      (III) Mechanical isolation of the mesoderm is a very exciting approach to test the possible involvement of adjacent tissues in folding. Indeed, the authors report a delay of ventral furrow formation. However, there is no evidence provided that (a) the mesoderm is mechanically uncoupled, and (b) that the treatment did not have undesired side effects. For example, a similar procedure (so-called cauterization, see Rauzi 2015) has been used to immobilize cells in the Drosophila embryo. Such an effect could account for the observed delay in furrow formation.

      We agree with the reviewer that “mechanical uncoupling” is merely a prediction based on our observation and has not been directly demonstrated. On the other hand, since the purpose of this experiment is to ask whether the presence of the lateral ectoderm is important for the mesoderm to transition between apical constriction and invagination (and our result shows yes), whether the approach we used mechanically uncoupled the mesoderm and the ectoderm is no longer an immediately relevant question. We apologize for the imprecise use of the term “mechanically uncoupling” in our original manuscript and we thank the reviewer for pointing this out.

      As for the reviewer’s point (b), we have several pieces of evidence indicating that our approach did not cause anchoring of the tissue to the vitelline membrane. The major difference between the approach we used and that used by Rauzi et al. 2015 is the location of the tissue where the laser treatment was imposed. In order to anchor the tissue to the vitelline membrane, Rauzi et al. target the laser to the apical side of the tissue, adjacent to the vitelline membrane. The resulting cauterization of the tissue caused anchoring of the tissue to the vitelline membrane, presumably by fusion of the tissue with the vitelline membrane. In our approach, we used a similar type of laser (femtosecond near infrared laser) to perform tissue disruption, but instead of targeting the apical side of the tissue, we targeted the basal region of the invaginating cleavage furrows during cellularization, with the goal of blocking cell formation. While the laser intensity we used is high enough to cause cauterization of the tissue, as indicated by the appearance of bright autofluorescence in the laser treated region, these “burn marks” are not located at the apical side of the cells (Authors’ Response Figure 2a). The lack of “burn marks” on the vitelline membrane in our experiment is in sharp contrast to the result shown in Rauzi et al 2015 (see Authors’ Response Figure 2b for an example from Rauzi et al in comparison to our own data in 2a). Because of the difference in the location of cauterization, we do not expect that the tissue would be fused with the vitelline membrane after our treatment. This is further suggested by the observation that the burn marks can move before the onset of gastrulation, which again indicates that the tissue is not anchored to the vitelline membrane (Authors’ Response Figure 2c).

      That being said, we acknowledge that we do not fully understand the impact of the laser treatment on the embryo (e.g., what causes the reduced rate of apical constriction), and more control experiments are required in order to fully describe the tissue response we observed. As suggested by the editor, we decided to remove the ectoderm-ablation experiment from the revised manuscript and focus on the characterization of the optogenetic tool and the binary tissue response after acute actomyosin inhibition.

      Authors’ Response Figure 2: Laser disruption of cell formation in the lateral ectodermal region. (a) Cross-section and en face views showing the basal location of the “burn marks” after laser disruption in the lateral ectodermal region. No burn marks are observed at the level of the vitelline membrane. Blue and red curves in the cross-section views indicate the vitelline membrane and the position where the projections were made for the en face views. Magenta arrows: burn marks. (b) Figure 5a from Rauzi et al., 2015, clear bright burn marks can be seen from the apical surface view. (c) Overlay of the signal at T = -10 min and 0 min (onset of gastrulation) showing the movement of burn marks before gastrulation (yellow arrows).

      (IV) Some panels show two distinct molecules tagged with the same or spectrally overlapping fluorophores, that unfortunately localize in similar spatial patterns. This encumbers data validation.

      We agree with the reviewer that having two distinct proteins tagged with the same fluorophore is not ideal for understanding the behavior of the tagged proteins. However, it usually does not affect the evaluation of the cell or tissue morphology, as long as the cell membrane is explicitly labeled. For example, in our original Figure 2 (new Figure 4), although GFP is tagged on both CIBN and Sqh, and mCherry is tagged on both CRY2-Rho1DN and Sqh, the cell and tissue morphology is clearly discernible by these markers, which allowed us to evaluate the progression of ventral furrow formation. In the cases where there was a need to evaluate the behavior of a particular molecule (e.g. Sqh), we always repeated the experiments in a way such that the molecule of interest is tagged with a distinct fluorophore that does not spectrally overlap with other fluorophores – this often requires the use of a plasma membrane-anchored CIBN that is not fluorescently tagged (e.g. Figure 1, Figure 4 – figure supplement 3).

      (V) The physical model is a central part for data interpretation. In its current form it is very challenging to follow. It is also critical the system be studied with proper cell aspect ratio, as the elasticity of thin sheets has a well established non-linear thickness dependence.

      These are valid critiques of our thin layer physical model (original Figure 5). The original purpose of this model is not to recapitulate the actual furrow morphology or cell shape change observed in the actual embryo, but rather to test the possibility of recapitulating the acceleration in tissue flow during the folding process by combining local constriction and global compression in a spherical (circular in 2D) elastic shell. Developing a dynamic vertex model that contains a realistic cell aspect ratio comparable to the actual cells in the embryo while displaying realistic cellular dynamics during the folding process is nontrivial and needs substantial further development of the model. Since the manuscript is now focused on the bistable characteristics of the mesoderm during gastrulation rather than tissue dynamics during the folding process, we decided to leave the dynamic vertex model out of the revised manuscript, as suggested by the editor.

      Reviewer #2 (Public Review):

      Guo and colleagues aim to unravel the mechanisms driving the fast process of mesoderm invagination in the Drosophila early developing embryo. While cell apical constriction is known to drive ventral furrowing (1st phase), it is still not clear if apical constriction is necessary/sufficient to drive mesoderm internalization (2nd phase) and whether other mechanisms cooperate during this process. By using 1ph optogenetics, the authors cannot test specifically the role of apical constriction but can systematically affect the overall actomyosin network in ventral cells in a time specific fashion (1-minute resolution). In this way, they come to the conclusion that actomyosin contractility is necessary for the 1st phase but not for the 2nd phase of mesoderm invagination. Interestingly, they conclude that the system is bistable. In the second part of this study, the authors test the role of the coupling between mesoderm and ectoderm by using 2D computational modelling and infrared pulsed laser dissection. They propose that the ectoderm can generate compressive forces on the mesoderm facilitating mesoderm internalization (2nd phase).

      This project is of interest since it tackles a key morphogenetic process that is necessary for the development of the embryo. The conclusion of 'bistability' resulting from the RhoDN optogenetic experiments (1st part of this study) is well supported and quite interesting. The IR laser experiments used to tackle the coupling between ectoderm and mesoderm (2nd part of the study) are key to supporting the main conclusions; nevertheless, their experimental design and results are puzzling. It is not clear what the authors are actually doing to the tissues. The experiments performed in the 2nd part of this study need to be revisited and conclusions eventually softened.

      Major comments:

      1) The 920 nm laser ablation of ectoderm cells is a key experiment in this study to support the ectoderm compression hypothesis. Nevertheless, this experiment is puzzling: the rationale of the experimental design, the effect of the laser on cells and the interpretation of the results are unclear.

      The rationale for the laser ablation experiment designed to test tissue compression is analogous to the widely used laser ablation approach for detecting tissue tension (Rauzi et al., 2015; Rauzi et al., 2008). In typical experiments where laser ablation was used to measure tensile stresses in cells and tissues, ablation of cells or subcellular structures that are under tension results in recoil of surrounding cell/tissue structures. We reasoned that if the tissue is under compression, similar laser treatment should result in shrinking of the laser-treated region, as the cells in the laser-treated region are expected to have an impaired ability to resist compressive stresses from outside of the region.

      In our experiment, we used the reduction of the width of the laser treated region within the first 10 sec after laser treatment as the measure for tissue shrinking, which we considered as an indication of the presence of compressive stresses. This tissue response, albeit mild, is not a non-specific tissue response to our laser treatment – we did not observe tissue shrinking when we treated the tissue during cellularization or within the first five minutes of gastrulation, although identical experimental conditions were used. The rate and magnitude of tissue shrinking after laser treatment are determined by multiple factors, including the level of compressive stresses, the difference in cell rigidity before and after laser treatment, and the overall viscosity of the tissue. We acknowledge that knowledge of these factors is largely lacking, and therefore additional independent validations of our approach are needed to further strengthen our conclusion on the presence of tissue compression. Following the editor’s suggestion, we decided to remove the laser ablation experiment from the current manuscript and focus on the characterization of the optogenetic tool and the binary tissue response after acute actomyosin inhibition.

      2) The authors propose to use again 920 nm laser ablation but this time to "physically separate" the two ectoderms from the ventral tissue. This is again a key experiment, but it raises some concerns:

      a. "Physical separation" would need to be demonstrated (e.g., EM after laser ablation). From Fig. 6b it is clear that IR laser ablation results in prominent auto-fluorescent zones. This has been already reported in previous work (De Medeiros G. et al. Scientific Reports 2020) showing that high power and sustained IR fs laser targeting produces auto-fluorescence and highly electron-dense structures in the early developing Drosophila embryo. This process is referred to as laser cauterization, which does not induce separation between tissues. These structures eventually displace together with the lateral tissue (also shown in Fig. 6b).

      b. This strong laser "treatment", that should be ectoderm specific, results in perturbation of other non-ectoderm related processes (e.g., mesoderm apical constriction as shown by the authors). This can support the idea that many other processes are affected and that in general this laser heating "treatment" has global effects. These results might invalidate the conclusion proposed by the authors.

      These are both valid critiques. As for the reviewer’s point “a”, we agree with the reviewer that a “physical separation” of the mesoderm from the ectoderm has not been rigorously demonstrated in our original manuscript. As detailed in our response to reviewer #1 comment #3, since the purpose of this experiment is to ask whether the presence of the lateral ectoderm is important for the mesoderm to transition between apical constriction and invagination (and our result shows yes), whether the approach we used physically separated the mesoderm and the ectoderm is no longer an immediately relevant question. We apologize for the vague use of “physical separation” in our original manuscript and we thank the reviewer for pointing this out.

      To address the reviewer’s point “b” and to ask whether the laser treatment used in our experiment has a global effect, we performed a control experiment where we treated the yolk region of the embryo with the identical approach. Despite the appearance of burn marks in the treated yolk region, mesoderm invagination proceeded largely normally under this condition, with a mild reduction in the rate of furrow invagination (Authors’ Response Figure 3). Therefore, the prominent delay in the transitional state we observed after disruption of lateral ectoderm (Original Figure 6) is not likely caused by a non-specific laser heating effect. In addition, in both the yolk-ablation and the ectoderm-ablation experiments, cellularization occurred normally outside of the laser-treated regions, in further support of the lack of a strong non-specific effect from our laser treatment. That being said, we acknowledge that we do not fully understand the impact of the laser treatment on the embryo (e.g., what causes the reduced rate of apical constriction), and more control experiments are required in order to fully describe the tissue response we observed. As suggested by the editor, we decided to remove the ectoderm-ablation experiment from the revised manuscript and focus on the characterization of the optogenetic tool and the binary tissue response after acute actomyosin inhibition.

      Authors’ Response Figure 3. Laser treatment in the yolk region of the embryo. (a) Cartoon depicting the position of laser treatment. Similar laser condition was used as described in the original Figure 6. Laser ablation was performed during cellularization and the treated embryo was imaged during gastrulation. (b) An example control embryo without laser treatment. (d-e) Two examples showing ventral furrow formation after laser treatment in the yolk region. Only a mild delay in furrow invagination was observed. Red arrowheads indicate the invagination front. Scale bar: 25μm.

      Reviewer #3 (Public Review):

      The authors address how contractile forces near the apical surface of a cell sheet drive out-of-plane bending of the sheet. To determine whether actomyosin contractility is required throughout the folding process and to identify potential actomyosin independent contributions for invagination, they develop an optogenetic-mediated inhibition of myosin and show that myosin contractility is critical to prevent tissue relaxation during the early stage of folding but is dispensable for the deepening of the invagination. Their results support the idea that the mesoderm is mechanically bistable during gastrulation. They propose that this mechanical bistability arises from an in-plane compression from the surrounding ectoderm and that mesoderm invagination is achieved through the combination of apical constriction and tissue compression. Regarding the global message of the manuscript, I have two main criticisms. The authors consider their work as the first to prove that there is an additional mechanism to apical constriction leading to invagination. This is not true. First, the fact that the ectoderm could exert a compressive force on the invaginating mesoderm is not new and has been not only proposed, but tested previously (Rauzi and Leptin, 2015). Second, several recent publications demonstrated that on top of apical constriction, lateral forces were also required for the invagination and the authors ignore these data (Gracia et al, 2019 ; John et al, 2021).

      We thank the reviewer for this important comment. In the original Introduction, we have mentioned several previous studies that suggest the presence of additional mechanisms to apical constriction during ventral furrow formation. We stated: “The observation that the maximal rate of apical constriction and the maximal rate of tissue invagination occur at distinct times suggests that apical constriction does not directly cause tissue invagination (Polyakov et al., 2014; Rauzi et al., 2015). A number of computational models also predict that mesoderm invagination requires additional mechanical input, such as “pushing” forces from the surrounding ectodermal tissues, but experimental evidence for this additional mechanical input remains sparse (Munoz et al., 2007; Conte et al., 2009; Allena et al., 2010; Brodland et al., 2010).”

      To address the reviewer’s comment, in the revised manuscript, we expanded this paragraph to further elaborate the previous contributions: “However, accumulating evidence suggests that apical constriction does not directly drive invagination during the shortening phase. First, it has been observed that the maximal rate of apical constriction (or cell lengthening) and the maximal rate of tissue invagination occur at distinct times (Polyakov et al., 2014; Rauzi et al., 2015). Second, it has been previously proposed, and more recently experimentally demonstrated, that myosin accumulated at the lateral membranes of constricting cells (‘lateral myosin’) facilitates furrow invagination by exerting tension along the apical-basal axis of the cell (Brodland et al., 2010; Conte et al., 2012; Gracia et al., 2019; John and Rauzi, 2021). Finally, a number of computational models predict that mesoderm invagination requires additional mechanical input from outside of the mesoderm, such as “pushing” forces from the surrounding ectodermal tissue (Munoz et al., 2007; Conte et al., 2009; Allena et al., 2010; Brodland et al., 2010). These models are in line with the finding that blocking the movement of the lateral ectoderm by laser cauterization inhibits mesoderm invagination (Rauzi et al., 2015). A similar disruption of ventral furrow formation can also be achieved by increasing actomyosin contractility in the lateral ectoderm (Perez-Mockus et al., 2017). While these pioneer studies highlight the importance of cross-tissue coordination during mesoderm invagination, the actual mechanical mechanism that drives the folding of the mesodermal epithelium and the potential role of the surrounding ectodermal tissue remain to be elucidated.”

      One of the motivations for us to develop experimental approaches to detect compression in the ectoderm (original Figure 4) and to disrupt the ectoderm (original Figure 6) is the lack of direct evidence demonstrating the mechanical contribution of the ectoderm to mesoderm invagination. Several studies have shown that manipulations of the ectodermal tissue can impair ventral furrow formation. One study shows that preventing the movement of the lateral ectoderm, by anchoring ectodermal cell apices to the vitelline membrane, blocks ventral furrow invagination (Rauzi et al., 2015). Another study shows that upregulation of apical myosin contractility in the lateral ectodermal tissues can inhibit or even reverse the furrow invagination process (Perez-Mockus et al., 2017). These results indicate that an increase in the resistance to mesoderm movement can impair mesoderm invagination. However, this would be expected even if the ectoderm does not provide active mechanical input to facilitate mesoderm invagination. Therefore, these experiments, while very informative, did not provide direct evidence for a role of ectodermal compression in mesoderm invagination.

      Another motivation for us to examine potential mechanisms outside of the mesoderm is the observation that ventral furrow invagination continues even when both apical myosin and lateral myosin are disrupted after Ttrans (Late Group embryos). This result indicates that factors other than apical or lateral myosin must be responsible for the invagination of the furrow in Late Group embryos. In the revised manuscript, we used a modeling approach to demonstrate that lateral myosin and ectodermal compression may function in parallel to promote the invagination of the ventral furrow (Figure 7). In the revised Discussion, we propose that “ventral furrow formation is mediated through a joint action of multiple mechanical inputs. Apical constriction drives initial indentation of ventral furrow, which primes the tissue for folding, whereas the subsequent rapid folding of the furrow is promoted by bistable characteristic of the mesoderm and by lateral myosin contractions in the constricting cells.”

      They generated an optogenetic tool, "Opto-Rho1DN", to inhibit Rho1 through light-dependent plasma membrane recruitment of a dominant negative form of Rho1 (Rho1DN). The specificity of local inactivation of Myosin was tested on apical myosin before and during invagination. They observed a strong reduction of Myosin II recruitment and a phenotype that mimics Rok inhibition. They found that acute loss of myosin contractility during most of the lengthening phase results in immediate relaxation of the constricted tissue, but similar treatment near or after the lengthening-shortening transition does not impede invagination. They conclude that the second part of furrow invagination is not due to myosin activities at the apical or lateral cortices of the mesodermal cells and that actomyosin contractility is required in the early but not the late phase of furrow formation. This part regarding the temporal requirement of Myosin during invagination brings novelty to the field since it has never been tested before.

      We thank the reviewer for the comment on the novelty of our work.

      They observe that ectodermal cells shorten their apico-basal axis prior to Ttrans, and that compression from the ectoderm is independent of ventral furrow formation since it still occurs even if invagination is inhibited.

      They further develop two types of simulations to test theoretically the importance of compressive stress in the invagination process. The theoretical part would need to be further developed and discussed. They would need to integrate all the different components that have been shown to be essential for the invagination (not only apical constriction) and the dynamic aspect of the vertex model has to be clearly explained.

      We thank the reviewer for the suggestions on the modeling parts. In the energy-based vertex model (the Polyakov model, original Figure 3), two previously identified mechanisms, apical constriction and basal relaxation, have been implemented to drive the lengthening-shortening cell shape change and furrow invagination. Following the reviewer’s suggestions, we have modified the Polyakov model to include additional mechanisms that have been shown to facilitate ventral furrow invagination. In particular, we focused our analysis on the role of lateral myosin in the constricting cells in furrow invagination (Figure 7). Please refer to our response to the combined comments for details (in the section “Additional modeling analysis to test the known mechanisms for mesoderm invagination”).

      As for the dynamic vertex model presented in our original manuscript (original Figure 5), as detailed in our response to Reviewer #1’s comment #5, since the revised manuscript is focused on the bistable characteristics of the mesoderm during gastrulation rather than tissue dynamics during the folding process, we decided to leave this part out of our revised manuscript, as suggested by the editor.

    1. Author Response

      Reviewer #1 (Public Review):

      This thorough study expands our understanding of BMP signaling, a conserved developmental pathway, involved in processes as diverse as body patterning and neurogenesis. The authors applied multiple, state-of-the-art strategies to the anthozoan Nematostella vectensis in order to first identify the direct BMP signaling targets - bound by the activated pSMAD1/5 protein - and then dissect the role of a novel pSMAD1/5 gradient modulator, zswim4-6. The list of target genes features multiple developmental regulators, many of which are bilaterally expressed, and which are notably shared between Drosophila and Xenopus. The analysis identified, in particular, zswim4-6, a novel nuclear modulator of the BMP pathway conserved also in vertebrates. A combination of both loss-of-function (injection of antisense morpholino oligonucleotide, CRISPR/Cas9 knockout, expression of dominant negative) and gain-of-function assays, and of transcriptome sequencing identified that zswim4-6 acts as a transcriptional repressor of BMP signaling. Functional manipulation of zswim5 in zebrafish shows a conserved role in modulating BMP signaling in a vertebrate.

      The particular strength of the study lies in the careful and thorough analysis performed. This is solid developmental work, where one clear biological question is progressively dissected, with the most appropriate tools. The functional results are further validated by alternative approaches. Data is clearly presented and methods are detailed. I have a couple of comments.

      1) I was intrigued - as the authors were - by the fact that the ChIP-Seq did not identify any known BMP ligand bound by pSMAD1/5. Are these genes found in the published ChIP-Seq data of the other species used for the comparative analysis? One hypothesis could be that there is a change in the regulatory interactions and that the initial set-up of the gradient requires indeed a feedback loop, which is then turned off at later gastrula. In this case, immunoprecipitation at early gastrula, prior to the set-up of the pSMAD1/5 gradient, could reveal a different scenario. Alternately, the regulation could be indirect, for example, through RGM, an additional regulator of BMP signaling expressed on the side of lower BMP activity, which is among the targets of the ChIP-Seq. This aspect could be discussed. Additionally, even if this is perhaps outside the scope of this study, I think it would be informative to further assess the effect of ZSWIM manipulation on RGM (and vice versa).

      Indeed, BMP genes are direct BMP signaling targets in Drosophila (dpp) (Deignan et al., 2016, https://doi.org/10.1371/journal.pgen.1006164) and frog (bmp2, bmp4, bmp5, bmp7) (Stevens et al., 2021, https://doi.org/10.1242/dev.145789). Of all these ligands, only the dorsally expressed Xenopus bmp2 is repressed by BMP signaling, while another dorsally expressed Xenopus BMP gene admp is not among the direct targets. All other BMP genes listed here are expressed in the pMad/pSMAD1/5/8-positive domain and are activated by BMP signaling.

      In Nematostella, we do not find BMP genes among the ChIP-Seq targets, but this is not that surprising considering the dynamics of the bmp2/4, bmp5-8 and chordin expression, as well as the location of the pSMAD1/5-positive cells. In late gastrulae/early planulae, Chordin appears to be shuttling BMP2/4 and BMP5-8 away from their production source and over to the gdf5-like side of the directive axis (Genikhovich et al., 2015; Leclere and Rentsch, 2014). By 4 dpf, chordin expression stops, and BMP2/4 and BMP5-8 start to be both expressed AND signal in the mesenteries. If bmp2/4 and bmp5-8 expression were directly suppressed by pSMAD1/5 (as is the case for chordin or rgm expression), this mesenterial expression would not be possible. Therefore, in our opinion, it is most likely that at late gastrula and early planula the regulation of bmp2/4 and bmp5-8 expression by BMP signaling is indirect. We do not have an explanation for why gdf5-like (another BMP gene expressed on the “high pSMAD1/5” side) is not retrieved as a direct BMP target in our ChIP data. Since we do not understand well enough how BMP gene expression is regulated, we do not discuss this at length in the manuscript.

      As the Reviewer suggested, we analyzed the effect of ZSWIM4-6 KD on the expression of rgm. As expected for a gene expressed on the “low BMP side”, its expression was strongly expanded (Figure 6 - Figure Supplement 4).

      2) I do not fully understand the rationale behind the choice of performing the comparative assays in zebrafish: as the conservation was initially identified in Xenopus, I would have expected the experiment to be performed in frog. Furthermore, reading the phylogeny (Figure 4A), it is not obvious to me why ZSWIM5 was chosen for the assay (over the other paralog ZSWIM6). Could the Authors comment on this experiment further?

      The comparison was done in zebrafish because we were planning to generate zswim5 mutants, whose analysis is currently in progress. ZSWIM6 is not expressed at the developmental stages we were interested in, while ZSWIM5 was, based on available zebrafish expression data (White et al., 2017).

      Reviewer #2 (Public Review):

      The authors provide a nice resource of putative direct BMP target genes in Nematostella vectensis by performing ChIP-seq with an anti-pSmad1/5 antibody, while also performing bulk RNA-seq with BMP2/4 or GDF5 knockdown embryos. Genes that exhibit pSmad1/5 binding and have changes in transcription levels after BMP signaling loss were further annotated to identify those with conserved BMP response elements (BREs). Further characterization of one of the direct BMP target genes (zswim4-6) was performed by examining how expression changed following BMP receptor or ligand loss of function, as well as how loss or gain of function of zswim4-6 affected development and BMP signaling. The authors concluded that zswim4-6 modulates BMP signaling activity and likely acts as a pSMAD1/5 dependent co-repressor. However, the mechanism by which zswim4-6 affects the BMP gradient or interacts with pSMAD1/5 to repress target genes is not clear. The authors test the activity of a zswim4-6 homologue in zebrafish (zswim5) by over-expressing mRNA and find that pSMAD1/5/9 labeling is reduced and that embryos have a phenotype suggesting loss of BMP signaling, and conclude that zswim4-6 is a conserved regulator of BMP signaling. This conclusion needs further support to confirm BMP loss of function phenotypes in zswim5 over-expression embryos.

      Major comments

      1) The BMP direct target comparison was performed between Nematostella, Drosophila, and Xenopus, but not with existing data from zebrafish (Greenfeld 2021, Plos Biol). Given the functional analysis with zebrafish later in the paper, it would be nice to see if there are conserved direct target genes in zebrafish, and in particular, whether zswim5 (or other zswim genes) is a direct target. Since conservation of zswim4-6 as a direct BMP target between Nematostella and Xenopus seemed to be part of the rationale for further functional analysis, it would also be nice to know if this is a conserved target in zebrafish.

      Thank you for the suggestion. In the paper by Greenfeld et al., 2021, zebrafish zswim5 was downregulated approximately 2.4x in the bmp7 mutant at 6 hpf, while zswim6 was barely expressed and not affected at this stage. We added this information to the text of the manuscript. Expression of several other zebrafish zswim genes was also affected in the bmp7 mutant, but these genes do not appear relevant for our study since their corresponding orthologs are not identified as pSMAD1/5 ChIP-Seq targets in Nematostella. Notably, zebrafish zswim5 is not clearly differentially expressed in BMP or Chd overexpression conditions (see Supplementary file 1 in Rogers et al. 2020). Importantly, in the paper we wanted to compare ChIP-Seq data with ChIP-Seq data; unfortunately, no ChIP-Seq data for pSMAD1/5/8 is currently available for zebrafish, which precludes such comparisons.

      Related to this, in the discussion it is mentioned that zswim4/6 is also a direct BMP target in mouse hair follicle cells, but it wasn't obvious from looking at the supplemental data in that paper where this was drawn from.

      Please see Supplementary Table 1, second Excel sheet labeled “Mx ChIP_Seq” in Genander et al., 2014, https://doi.org/10.1016/j.stem.2014.09.009. Zswim4 has a single pSMAD1 peak associated with it, Zswim6 has two.

      2) The loss of zswim4-6 function via MO injection results in changes to pSmad1/5 staining, including a reduction in intensity in the endoderm and gain of intensity in the ectoderm, while over-expression results in a loss of intensity in the ectoderm and no apparent change in the endoderm. While this is interesting, it is not clear how zswim4-6 is functioning to modify BMP signaling, and how this might explain differential effects in ectoderm vs. endoderm. Is the assumption that the mechanism involves repression of chordin? If so, one could test the double knockdown of zswim4-6 and chordin and look for the rescue of pSmad1/5 levels or morphological phenotype.

      We do not think that the mechanism of the ZSWIM4-6 action is via repression of Chordin. As loss of chordin leads to the loss of pSMAD1/5 in Nematostella (Genikhovich et al., 2015), the proposed experiment is, unfortunately, not a feasible way to test this hypothesis. Currently, we see two distinct effects of the modulation of zswim4-6 expression. First, it affects the pSMAD1/5 gradient, possibly by destabilizing nuclear SMAD1/5, as has been proposed by Wang et al., 2022 for the vertebrate Zswim4. This is in line with our results shown in Fig. 6C-F’ and Fig. 6-Figure supplement 3. In our opinion, the reaction of the genes expressed on the “high BMP” side of the directive axis to the overexpression or KD of ZSWIM4-6 (Fig. 6I-K’, 6N-P’) can be explained by these changes in the pSMAD1/5 signaling intensity. Secondly, zswim4-6 appears to promote pSMAD1/5-mediated gene repression. This is in line with the reaction of the genes expressed on the “low BMP” side of the directive axis (Fig. 6G-H’, 6L-M’, Fig. 6-Figure Supplement 4). These genes are repressed by BMP signaling, but they expand their expression upon zswim4-6 KD in spite of the increased pSMAD1/5. Our ChIP experiment (Fig. 6Q) supports this view.

      3) Several experiments are done to determine how zswim4-6 expression responds to the loss of function of different BMP ligands and receptors, with the conclusion being that zswim4-6 is a BMP2/4 target but not a GDF5 target, with a lot of the discussion dedicated to this as well. However, the authors show a binary response to the loss of BMP2/4 function, where zswim4-6 is expressed normally until pSmad1/5 levels drop low enough, at which point expression is lost. Since the authors also show that GDF5 morphants do not have as strong a reduction in pSmad1/5 levels compared to BMP2/4 morphants, perhaps GDF5 plays a positive but redundant role in zswim4-6 expression. To test this possibility the authors could inject suboptimal doses of BMP2/4 MO with GDF5 MO and look for synergy in the loss of zswim4-6 expression.

      Thanks for this great suggestion! We performed this experiment (Fig. 5H’’-L) and indeed, a suboptimal dose of BMP2/4MO + GDF5lMO results in a complete radialization of the embryo and abolishes zswim4-6 expression, similar to the effect of a high dose of BMP2/4MO. This result suggests that, rather than reflecting a ligand-specific signaling function, GDF5-like signaling alone still provides sufficiently high pSmad1/5 levels to activate zswim4-6 expression to apparent wildtype levels, demonstrating the sensitivity of this gene to even very low amounts of BMP signaling.

      4) The zswim4-6 morphant embryos show increased expression of zswim4-6 mRNA, which is said to indicate that zswim4-6 negatively regulates its own expression. However in zebrafish translation blocking MOs can sometimes stabilize target transcripts, causing an artifact that can be mistakenly assumed to be increased transcription (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7162184/). Some additional controls here would be warranted for making this conclusion.

      Thanks for raising this important experimental consideration. To date, we do not have any evidence for MO-mediated transcript stabilization in Nematostella, and we have not found such data in the literature on models other than zebrafish. mRNA stabilization by the MO also seemed unlikely because we were unable to KD zswim4-6 using several independent shRNAs - an effect we frequently observe with genes whose activity negatively regulates their own expression. However, to test the possibility that zswim4-6MO binding stabilizes zswim4-6 mRNA, we injected mRNA containing the zswim4-6MO recognition sequence followed by the mCherry coding sequence (zswim4-6MO-mCherry) with either zswim4-6MO or control MO. We could clearly detect mCherry fluorescence at 1 dpf if control MO was co-injected with the mRNA, but not if zswim4-6MO was co-injected with the mRNA. At 2 dpf (the stage at which we showed upregulation of zswim4-6 upon zswim4-6MO injection in Fig. 6I-I’), zswim4-6MO-mCherry mRNA was undetectable by in situ hybridization with our standard FITC-labeled mCherry probe, independent of whether zswim4-6MO-mCherry mRNA was co-injected with the control MO or zswim4-6MO, while hybridization with the FITC-labeled FoxA probe worked perfectly.

      Author response image 1.

      We are currently offering two alternative hypotheses for the observed increase in zswim4-6 levels in the paper rather than stating explicitly that ZSWIM4-6 negatively regulates its own expression: “The KD of zswim4-6 translation resulted in a strong upregulation of zswim4-6 transcription, especially in the ectoderm, suggesting that ZSWIM4-6 might either act as its own transcriptional repressor or that zswim4-6 transcription reacts to the increased ectodermal pSMAD1/5 (Fig. 6I-I’).” Given the sensitivity of zswim4-6 to even the weakest pSMAD1/5 signal (zswim4-6 is expressed upon GDF5-like KD, which drastically reduces pSMAD1/5 signaling intensity; see Fig. 1 and 2 in Genikhovich et al., 2015, http://doi.org/10.1016/j.celrep.2015.02.035 and Fig. 6-Figure supplement 3 of this paper), the latter option (that it reacts to the increased ectodermal pSMAD1/5) is, in our opinion, clearly the more probable one.

      5) Zswim4-6 is proposed to be a co-repressor of pSmad1/5 targets based on the occupancy of zswim4-6 at the chordin BRE (which is normally repressed by BMP signaling) and lack of occupancy at the gremlin BRE (normally activated by BMP signaling). This is a promising preliminary result but is based only on the analysis of two genes. Since the authors identified BREs in other direct target genes, examining more genes would better support the model.

      We suggest that ZSWIM4-6 may be a co-repressor of pSMAD1/5 targets because it is a nuclear protein (Fig. 4G) whose knockdown results in the expansion of the ectodermal expression of several genes repressed by pSMAD1/5 in spite of the expansion of pSMAD1/5 itself (Fig. 6G-H’, 6L-M’, Fig. 6-Figure Supplement 4). Our limited ChIP analysis supports this idea by showing that ZSWIM4-6 is bound to the pSMAD1/5 site of chordin (repressed by pSMAD1/5) but not to that of gremlin (activated by pSMAD1/5). We agree that adding the analysis of more targets in order to challenge our hypothesis would be good. However, given technical limitations (having to inject many thousands of eggs with the EF1a::ZSWIM4-6-GFP plasmid in order to get enough nuclei to extract sufficient immunoprecipitated chromatin for qPCR on 3 genes (chordin, gremlin, GAPDH) for each biological replicate), it is currently unfortunately not feasible to test more genes. It will be of great interest for follow-up studies to generate a knock-in line with tagged zswim4-6 to analyze target binding on a genome-wide scale. We stress in the discussion that currently the power of our conclusion is low.

      6) The rationale for further examination of zswim4-6 function in Nematostella was based in part on it being a conserved direct BMP target in Nematostella and Xenopus. The analysis of zebrafish zswim5 function however does not examine whether zswim5 is a BMP target gene (direct or indirect). BMP inhibition followed by an in situ hybridization for zswim5 would establish whether its expression is activated downstream of BMP.

      In the paper by Greenfeld et al., 2021, zebrafish zswim5 was downregulated approximately 2.4x in the bmp7 mutant at 6 hpf. However, this gene was not among the 57 genes, which were considered to be direct BMP targets because their expression was affected by bmp7 mRNA injection into cycloheximide-treated bmp7 mutants (Greenfeld et al., 2021). We added this information to the text of the manuscript.

      7) Although there is a reduction in pSmad1/5/9 staining in zebrafish injected with zswim5 mRNA, it is difficult to tell whether the resulting morphological phenotypes closely resemble zebrafish with BMP pathway mutations (such as bmp2b). More analysis is warranted here to determine whether stereotypical BMP loss of function phenotypes are observed, such as dorsalization of the mesoderm and loss of ventral tail fin.

      We agree, and we have toned down all zebrafish arguments. Analyses of zswim5 mutants are currently ongoing.

    1. Author Response

      Reviewer #3 (Public Review):

      1) Validation of reagents: The authors generated a pY1230 Afadin antibody claiming that (page 6) "this new antibody is specific to tyrosine phosphorylated Afadin, and that pY1230 is targeted for dephosphorylation by PTPRK, in a D2-domain dependent manner". The WB in Fig 1B shows a lot of background; two main bands are visible, which both diminish in intensity in ICT WT pervanadate-treated MCF10A cell lysates. The claim that the developed peptide antibody is selective for pY1230 in Afadin would need to be substantiated, for instance by pull-down studies analysed by pY-MS. However, for the current study it would be sufficient to demonstrate that pY1230 is indeed the dephosphorylated site. I therefore suggest including a site-directed mutant (Y1230F) that would confirm dephosphorylation at this site and the ability of the antibody to recognize the phosphorylation state at this position.

      We would like this antibody to be a useful and freely accessible tool in the field and have taken on board the request for additional validation. To this end we have significantly expanded Supplementary Figure 2 (now Figure 1 - figure supplement 2) and included a dedicated section of the results as follows:

      1. We have now included information about all of the Afadin antibodies used in this study, since Afadin(BD) appears to be sensitive to phosphorylation (Figure 1 - figure supplement 2A).

      2. We have demonstrated that the Afadin pY1230 antibody detects an upregulated band in PTPRK KO MCF10A cells, consistent with our previous tyrosine phosphoproteomics (Figure 1 - figure supplement 2B). This indicates that the antibody can be used to detect endogenous Afadin phosphorylation.

      3. We have included two new knockdown experiments demonstrating the recognition of Afadin by our antibody (Figure 1 - figure supplement 2C). There appear to be two Afadin isoforms recognised in HEK293T cells by both the BD and pY1230 antibodies, consistent with previous reports (Umeda et al. MBoC, 2015). We have highlighted these in the figure.

      4. We have performed mutagenesis to demonstrate the specificity of the antibody. We tagged Afadin with a fluorescent protein tag, reasoning that it would cause a shift in molecular weight that could be resolved by SDS PAGE, as is the case. We noted that the phosphopeptide used spans an additional tyrosine, Y1226, which has been detected as phosphorylated (although to a much lower extent than Y1230) on Phosphosite plus. The data clearly show that Afadin cannot be phosphorylated when Y1230 is mutated to a phenylalanine (compared to CIP control), indicating that this is the predominant site recognised by the antibody. In addition, the endogenous pervanadate-stimulated signal is completely abolished by CIP treatment (Figure 1 - figure supplement 2D).

      5. We have included densitometric quantification of the dephosphorylation assay shown in Figure 1B, which was part of a time course and shows preferential dephosphorylation by the PTPRK ICD compared to the PTPRK D1. The signal stops declining with time, which could indicate antibody background or an inaccessible pool of Afadin-pY1230 (Figure 1 - figure supplement 2E).

      6. To further demonstrate that this site is modulated by PTPRK in post-confluent cells, we have used doxycycline (dox)-inducible cell lines generated in Fearnley et al, 2019. Upon treatment with 500 ng/ml dox for 48 hours, PTPRK is induced to lower levels than wildtype; however, normalized quantification of the Afadin pY1230 signal against the Afadin (CST) signal clearly indicates downregulation by PTPRK WT, but not by the catalytically inactive mutant (Figure 1 - figure supplement 2F and 2G).

      Together these data strengthen our assertion that this antibody recognises endogenously phosphorylated Afadin at site Y1230, which is modulated in vitro and in cells by PTPRK phosphatase activity. For clarity, we have highlighted and annotated the relevant bands in figures. We have also included identifiers indicating which total Afadin antibody was used in each experiment.
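
      The normalization mentioned in point 6 (phospho-signal divided by total-protein signal, then scaled to a reference lane) can be sketched as follows. This is a minimal illustration under our own assumptions, not the quantification pipeline used for the figure: the function name, the lane labels, and all band intensities are hypothetical placeholders.

```python
def normalized_phospho_signal(phospho, total, reference_lane=0):
    """Return phospho/total ratios scaled so the reference lane equals 1.0.

    phospho, total: band intensities per lane (arbitrary densitometry units).
    reference_lane: index of the lane used as the normalization reference.
    """
    if len(phospho) != len(total):
        raise ValueError("phospho and total must have the same number of lanes")
    if any(t <= 0 for t in total):
        raise ValueError("total-protein intensities must be positive")
    ratios = [p / t for p, t in zip(phospho, total)]
    reference = ratios[reference_lane]
    return [r / reference for r in ratios]


# Hypothetical intensities, invented purely for illustration.
lanes = ["no dox", "PTPRK WT + dox", "PTPRK inactive + dox"]
py1230 = [1200.0, 450.0, 1150.0]   # pY1230 antibody signal
afadin = [1000.0, 900.0, 1000.0]   # total Afadin signal

normalized = normalized_phospho_signal(py1230, afadin)
for lane, value in zip(lanes, normalized):
    print(f"{lane}: {value:.2f}")
```

      Dividing by the total Afadin signal first corrects for loading differences between lanes, so a drop in the normalized value reflects reduced phosphorylation rather than less protein.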

      2) The authors claim that a short, 63-residue predicted coiled coil (CC) region is both necessary and sufficient for binding to the PTPRK-ICD. The region is predicted to have alpha-helical structure and, as a consequence, a helical structure has been used in the docking model. Considering that the authors recombinantly expressed this region in bacteria, it would be experimentally simple to confirm the alpha-helical structure of the segment by CD or NMR spectroscopy.

      To clarify, the helical structure in the docking model was independently predicted by several sequence and structural analysis programmes, including AlphaFold2, RoseTTAFold and NetSurfP, and is annotated in UniProt (as a coiled coil). We did not stipulate prior to the AF2 prediction that it was helical. Isolated short peptides frequently adopt helical structure; therefore, prediction of a helix within the context of the full Afadin sequence is, in our opinion, stronger evidence than CD of an isolated fragment.

      3) Only two mutants have been introduced into PTPRK-ICD to map the Afadin interaction site. One of the mutations changes a possibly structurally important residue (glycine) into a histidine. Even though this residue is present in PTPRM, this does not exclude the possibility that the D2 domain no longer folds into a functional state. The second mutation also represents a large change in chemical properties, and the other 2 predicted residues have not been investigated.

      The residues that were selected for mutation are all localised to the protein surface and therefore are unlikely to be involved in stable folding of PTPRK. In support of the correct folding of the mutated PTPRK, we include in Figure 1 below SEC elution traces for wild-type and mutant D2 showing that they elute as single symmetric peaks at the same elution volume as the WT protein. This is consistent with them having a similar shape and size, and not being aggregated or unfolded.

      Figure 1. PTPRK-D2 wild-type and mutant preparative SEC elution profiles. A280nm has been normalised to help illustrate that the different proteins elute at the same volume. The main peak from these samples was used for binding assays in the main paper.

      Furthermore, the yield for the double mutant was very high (4 mg of pure protein from a 2 L culture, see A280 value in graph below), whereas poorly folded proteins tend to have significantly reduced yields. This protein was also very stable over time whereas unfolded proteins tend to degrade during or following purification.

      Figure 2. Analytical SEC elution profile for the PTPRK-D2 DM construct showing the very high yield consistent with a well-folded, stable protein.

      Finally, we have carried out thermal melt curves of the WT and mutant PTPRK D2 domains showing that they all possess melting temperatures between 39.3°C and 41.7°C, supporting that they are all equivalently folded. We include these data as an additional Supplementary Figure (Figure 4 - figure supplement 3) in the paper.
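
      For readers unfamiliar with how a melting temperature is read off a melt curve, one common approach is to take the midpoint of the unfolding transition. The sketch below illustrates that idea only; it is an assumption on our part, not necessarily the analysis used for the supplementary figure, and the function name and the synthetic curve are hypothetical.

```python
import math


def melting_temperature(temps, signal):
    """Estimate Tm as the temperature where the normalized signal crosses 0.5.

    Assumes the signal rises with unfolding (as in a dye-based assay) and that
    temps is sorted in ascending order. Uses linear interpolation between the
    two points that bracket the midpoint.
    """
    low, high = min(signal), max(signal)
    if high == low:
        raise ValueError("signal is flat; no transition to analyse")
    fraction = [(s - low) / (high - low) for s in signal]
    for i in range(1, len(fraction)):
        f0, f1 = fraction[i - 1], fraction[i]
        if f0 < 0.5 <= f1:
            t0, t1 = temps[i - 1], temps[i]
            return t0 + (0.5 - f0) * (t1 - t0) / (f1 - f0)
    raise ValueError("no midpoint crossing found")


# Synthetic two-state curve with a known midpoint of 40.5 degrees C
# (values invented for illustration, not measured data).
temps = [float(t) for t in range(25, 61)]
signal = [1.0 / (1.0 + math.exp((40.5 - t) / 1.5)) for t in temps]

tm = melting_temperature(temps, signal)
print(f"Estimated Tm: {tm:.1f} C")
```

      Comparing midpoints obtained this way for wild-type and mutant domains gives a quick check that a mutation has not grossly destabilized the fold.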

      4) The interface on the Afadin substrate has not been investigated apart from deleting the entire CC or a central charge cluster. Based on the docking model the authors must have identified key positions of this interaction that could be mutated to confirm the proposed interaction site.

      We have now made and tested several additional mutations within both the Afadin-CC and PTPRK-D2 domains to further validate the AF2 predicted model of the complex.

      For Afadin-CC we introduced several single and double mutations along the helix including residues predicted to be in the interface and residues distal from the interface. These mutations and the pulldown with PTPRK are described in the text and are included as additional panels to a modified Figure 3. All mutations have the expected effect on the interaction based on the predicted complex structure. To help illustrate the positions of these mutations we have also included a figure of the interface with the residues highlighted.

      For the PTPRK-D2 we have also introduced two new mutations, one buried in the interface (F1225A) and one on the edge of the interface encompassing a loop that is different in PTPRM (labelled the M-loop). GST-Afadin WT protein was bound to GSH beads and tested for its ability to pull down WT and mutated PTPRK. These new mutations (illustrated in the new Figure 4 - figure supplement 2) further support the model prediction. F1225A almost completely abolishes binding as predicted, while the M-loop mutant retains binding. These mutations and their effects are now described in the main text and the pull-down data, including controls and retesting of the original DM mutant, are included as panel H in a newly modified Figure 4 focussed solely on the PTPRK interface.

      5) A minor point is that ITC experiments have not been run long enough to determine the baseline of interaction heats. In addition, as large and polar proteins were used in this experiment, a blank titration would be required to rule out that dilution heats affect the determined affinities.

      All control experiments including buffer into buffer, Afadin into buffer and buffer into PTPRK were carried out at the same time as the main binding experiment and are shown below overlaid with the binding curve. These demonstrate the very small dilution heats consistent with excellent buffer matching of the samples.

      We were able to obtain excellent fits to the titration curves by fitting 1:1 binding with a calculated linear baseline (see Figure 2B,D). Very similar results were obtained by fitting to the sum (‘composite’) of fitted linear baselines obtained for the three control experiments for each titration.
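
      As an illustration of what a 1:1 binding model with a linear baseline looks like, the sketch below evaluates the standard single-site (Wiseman) isotherm and adds a baseline term. It is a sketch under stated assumptions: the fitting software and exact parameterization used for the figure are not given in this response, and the function names, parameter values, and units here are hypothetical.

```python
import math


def wiseman_heat(molar_ratio, delta_h, kd, cell_conc):
    """Heat per mole of injectant for 1:1 binding (Wiseman isotherm).

    molar_ratio: total titrant / total cell macromolecule concentration.
    delta_h:     binding enthalpy (e.g. kcal/mol).
    kd:          dissociation constant, same units as cell_conc.
    cell_conc:   macromolecule concentration in the cell.
    """
    r = kd / cell_conc
    x = molar_ratio
    root = math.sqrt((1.0 + x + r) ** 2 - 4.0 * x)
    return delta_h * (0.5 + (1.0 - x - r) / (2.0 * root))


def model_with_baseline(molar_ratio, delta_h, kd, cell_conc, b0, b1):
    """Binding isotherm plus a linear baseline b0 + b1 * molar_ratio."""
    return wiseman_heat(molar_ratio, delta_h, kd, cell_conc) + b0 + b1 * molar_ratio


# Hypothetical parameters chosen only to show the shape of the curve.
delta_h, kd, cell_conc = -10.0, 1e-9, 2e-5
for ratio in (0.5, 1.0, 2.0):
    heat = model_with_baseline(ratio, delta_h, kd, cell_conc, b0=0.1, b1=0.02)
    print(f"molar ratio {ratio:.1f}: {heat:.3f}")
```

      In a real analysis the parameters (enthalpy, affinity, stoichiometry and baseline coefficients) would be fitted to the integrated injection heats; the baseline term absorbs residual dilution heats of the kind measured in the control titrations.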

    1. Author Response:

      Reviewer #2 (Public Review):

      There is now a considerable body of knowledge about the genetic and cellular mechanisms driving the growth, morphogenesis and differentiation of organs in experimental organisms such as mouse and zebrafish. However, much less is known about the corresponding processes in developing human organ systems. One powerful strategy to achieve this important goal is to use organoids derived from self-renewing, bona fide progenitor cells present in the fetal organ. The Rawlins lab has pioneered the long-term culture of organoids derived from multipotent epithelial progenitors located in the distal tips of the early human lung. They have shown that clonal cell "lines" can be derived from the organoids and that they are capable of not only long-term self-renewal but also limited differentiation in vitro or after grafting under the kidney capsule of mice. Here, they now report a strategy to efficiently test the function of genes in the embryonic human lung, regardless of whether the genes are actively transcribed in the progenitor cells. The strengths of the paper are that the authors describe a number of different protocols (work-flows), based on CRISPR/Cas9 and homology directed repair, for making fluorescent reporter alleles (suitable for cell selection) and for inducible over-expression or knockout of specific genes. The so-called "Easytag" protocols and results are carefully described, with controls. The work will be of significant interest to scientists using organoids as models of many human organ systems, not just the lung. The weaknesses are that the authors do not show that their lines can undergo differentiation after genetic manipulation, and therefore do not provide proof of principle that they can determine the function in human lung development of genes known to control mouse lung epithelial differentiation. It would also be of general interest to know whether their methods based on homologous recombination are more accurate (fewer incorrect targeting events or off-target effects) than methods recently described for organoid gene targeting using non-homologous repair.

      We thank Reviewer #2 for capturing the key advances of our toolbox for understanding gene function using a tissue organoid system and the constructive suggestions for the manuscript.

      We agree with the Reviewer that it would strengthen the current manuscript if we could differentiate the genetically targeted organoids. Therefore, as a proof of concept, we have successfully differentiated the SOX9 reporter organoids into the alveolar lineage (New figure: Figure 2-figure supplement 1g, shown above). We have also tested the dual SMAD inhibition approach recently reported for basal cell differentiation (Miller et al., 2020). However, this has led to massive cell death even in WT organoids (data not shown). We reason that this might be because our organoids are ~8 pcw, whereas in the literature ~12 pcw organoids were used. We believe that efficient airway differentiation will take a long time to optimise for our organoids and is therefore beyond the scope of this manuscript.

      In regard to the Easytag workflow in comparison with the recent CRISPR-HOT method using non-homologous end joining (Artegiani et al., 2020), we consider our approach as a complement to the CRISPR-HOT approach. This can be reflected in the following points: (1) The Organoid Easytag workflow allows precise N-terminal tagging of endogenous genes, exemplified by N-terminal tagging of ACTB. This is not possible using CRISPR-HOT as large pieces of plasmid DNA would disrupt the targeted gene; (2) The Organoid Easytag workflow is based on HDR and the efficient insertion sites for exogenous genes are within a ~30-bp window of the gRNA cleavage sites (Kwart et al., 2017), which gives more flexibility for choosing gRNAs compared with CRISPR-HOT tagging; (3) The Organoid Easytag workflow gives researchers more control of where and how the targeted sites can be modified, and offers a minimal change to the targeted genomic region, whereas CRISPR-HOT introduces large pieces of backbone plasmids, which potentially increases the risk of gene dysregulation. However, HDR requires cells to be at the G2/M phase of the cell cycle, therefore heavily relying on fast cycling cells to gain the most efficient targeting. CRISPR-HOT has the great advantage of not depending on a specific cell cycle stage and therefore being more efficient in slow cycling cells. With this said, we do believe that the efficiency would very much rely on the context, including the cell type used and locus targeted, as a recent report suggested targeting efficiency is influenced also by genomic context (Schep et al., 2021).

      In summary, when N-terminal tagging, minimal changes and precise control of targeting are desired, Organoid Easytag is more favourable, whereas when targeting slowly cycling cells, CRISPR-HOT has its strength. Therefore, we consider these two methods as complementary approaches that will both be of benefit to organoid-based research. We have summarised this comparison in a simple table (New table: Figure 2-figure supplement 5f).

      Figure 2-figure supplement 5(f). A comparison of Organoid Easytag and CRISPR-HOT methods (Artegiani et al., 2020).

      Reviewer #3 (Public Review):

      Sun et al have assembled, modified, and applied a series of existing gene editing tools to tissue-derived human fetal lung organoids in a workflow they have termed "Organoid Easytag". Using approaches that have previously been applied in iPSCs and other cell models in some cases including organoids, the authors demonstrate: 1) endogenous loci can be targeted with fluorochromes to generate reporter lines; 2) the same approach can be applied to genes not expressed at baseline in combination with an excisable, constitutively active promoter to simplify identification of targeted clones; 3) that a gene of interest could be knocked-out by replacing the coding sequence with a fluorescent reporter; 4) that knockdown or overexpression can be achieved via inducible CRISPR interference (CRISPRi) or activation (CRISPRa). In the case of CRISPRi, the authors alter existing technology to lessen unwanted leaky expression of dCas9-KRAB. While these tools have previously been applied in other models, their assembly and demonstrated application to tissue-derived organoids here could facilitate their use in tissue-derived organoids by other groups.

      Limitations of the study include:

      1) demonstrated application of these technologies to only a limited set of gene targets;

      2) a lack of detail demonstrating the efficiency and/or kinetics of the approaches demonstrated.

      While access to human fetal lung organoids is likely not available to many or most researchers, it is probable that the principles applied here could carry over to other organoid models.

      We thank the Reviewer for accurately summarising the details of our manuscript and positive comments on its potential to facilitate tissue-derived organoid related research. We are very grateful for the Reviewer’s detailed and constructive comments to help strengthen our manuscript.

      In regard to the limitations pointed out by Reviewer #3, we have systematically tested the kinetics of the inducible CRISPRi knockdown effect and its reversibility using CD71 and SOX2 (New figure: Figure 3-figure supplement 2). At the same time, we have generated SOX9 reporter human foetal intestinal organoids using the Easytag workflow to further demonstrate that it can be applied to another organoid system. As suggested by Reviewer #3, we also attempted to implement the inducible CRISPRi system in HBECs. However, due to their sensitivity to lentiviral transduction, infected HBECs died shortly after transduction with gRNA lentivirus. We believe that further optimisation of the DNA delivery approach is required for implementation of the inducible CRISPRi/CRISPRa systems in HBECs (perhaps nucleofection and PiggyBac-based vectors).

    1. Author Response:

      Reviewer #1:

      After infection, new HIV-particles assemble at the host cell plasma membrane in a process that requires the viral protein Gag. Here, Inamdar et al. showed that a component of the host cell, the membrane curvature-inducing protein IRSp53, contributes to efficiently promote the formation of viral particles in synergy with the viral Gag protein.

      In cells depleted of IRSp53, the formation of HIV-1 Gag viral-like particles (VLPs) was compromised. The authors showed in compelling electron micrographs that the formation of VLPs was arrested at about the halfway stage of particle budding. Biochemical data (co-IPs and analysis of VLPs and HIV particle content), super-resolution nanoscopy (single molecule localization microscopy) data, and in vitro biophysics measurements (in GUVs) all seem to indicate a functional connection between Gag and the I-BAR-domain-containing protein IRSp53. The combination of the different techniques and approaches is a clear strength of this manuscript. However, in my opinion, the interpretation of some of the experimental data is somewhat limited by the lack of some appropriate controls (which are missing for different reasons, as the authors state in some parts of the text). These are:

      1) Specificity of the IRSp53 siRNA. Although the authors showed that the siRNA used can deplete the expression of the protein (both endogenous and ectopic), they did not present any rescue experiments of the phenotypes (or corroboration with different siRNA oligos).

      We have tried several different commercial and home-designed siRNAs targeting IRSp53 from different companies (provided as single siRNAs or as mixes of multiple siRNAs); we have summarized them all in Figure R1 (see below). One can see that only 2 siRNAs were effective in extinguishing the IRSp53 gene: one from Invitrogen, effective on endogenous IRSp53 and ectopic IRSp53-GFP, and one from Dharmacon, effective only on ectopic IRSp53-GFP, as revealed by Western blot (Fig. R1A). Furthermore, the specificity of the siRNA was challenged by testing siRNA IRSp53 on human IRSp53-GFP and on mouse I-BAR-GFP in transfected HEK293T cells, visualized by fluorescence microscopy. The results in Figure R1B show that only siIRSp53 is able to extinguish human IRSp53-GFP, and that it does not extinguish mouse I-BAR-GFP. siIRTKS and siCtrl do not extinguish either of these genes. Overall, these results confirm the specificity of the IRSp53 siRNA-mediated knockdowns.

      Figure R1: Specificity of siRNA-mediated knockdowns: (A) Western blots of HEK293T cell lysates probed with anti-IRSp53 antibody (and the housekeeping gene GAPDH) showing a series of different siRNA IRSp53 (and siRNA control, CTRL, from Invitrogen, Dharmacon or Sigma) tested on endogenous and ectopic IRSp53 genes in human HEK293T cells, and their efficacy in specifically down-regulating IRSp53. (B) siRNA IRSp53 from Invitrogen was tested for its specificity in extinguishing human IRSp53-GFP protein, but not mouse I-BAR-GFP, expressed in transfected HEK293T cells, as compared to siRNA control and siRNA IRTKS, revealed by fluorescence imaging (GFP).

      To further answer the reviewers’ comments, we also performed one rescue experiment of the phenotype, as shown in Figure R2 below. We observed that, upon co-transfection of pGag+pIRSp53-GFP+siRNA IRSp53 (lane 2), about 50% of the ectopic IRSp53-GFP was extinguished (since this construct is not siRNA resistant), leaving 50% of this ectopic protein expressed in the cells. In this context, one can observe that Gag-VLP release is ~50% (lane 2), similar to the condition pGag+siCTRL (lane 3). When we compare this to pGag+siIRSp53 (lane 4), in which release is reduced by 2-3 fold (data from Figure 1b of the manuscript), we can say that the remaining IRSp53-GFP in lane 2 seems to rescue the defect caused by extinction of the endogenous IRSp53. In the condition pGag+pIRSp53-GFP+siCTRL, VLP-Gag release was slightly reduced. This is an atypical rescue experiment since we do not have an IRSp53-GFP that is resistant to the siRNA IRSp53 used in this study (Figure R1B), but it suggests that if IRSp53-GFP is overexpressed in the presence of Gag and the siRNA IRSp53, VLP-Gag release is at a normal 50% level, in contrast to the condition without IRSp53-GFP (compare lane 2 with lane 4). Unfortunately, because of limited time, the siRNA IRSp53 being out of stock, and the delay in supply, we could only provide one experiment. We thus decided to show it in answer to the reviewers but not as part of a figure in the final manuscript.

      Figure R2: Rescue of siRNA IRSp53 knock-down with overexpression of IRSp53-GFP: 293T cells were transfected with pGag, pIRSp53 and siRNA control (siCTRL, lane 1) or siRNA IRSp53 (lane 2); cell lysates and VLPs were loaded on SDS-PAGE gels and immunoblots were revealed with anti-GFP (for IRSp53-GFP) and anti-CAp24 (for HIV-1 Gag). The graph on the left shows the percentage of IRSp53-GFP expression upon siRNA IRSp53 cell treatment (lane 2) as compared to the siRNA CTRL (lane 1). The graph on the right shows the resulting gel quantification for the % of Gag-VLP release upon siRNA IRSp53 cell treatment (lane 2) as compared to the siRNA CTRL (lane 1) in the presence of IRSp53-GFP over-expression, or without (lanes 3 and 4, as in Figure 1b). N=1 rescue experiment.

      2) In the co-IPs (IRSp53 IP + Gag co-IP) there is no assessment of the IRSp53 IP efficiency in the different conditions. The authors argued that IgG signal masking precluded them from doing that.

      See the new figure 2. In the new figure 2b, we have assessed the efficiency of the IRSp53-GFP IP and Gag co-IP by adding a complete experiment showing that an anti-GFP antibody is able to pull down IRSp53-GFP very efficiently (lanes 2 and 3) and co-IP Gag efficiently (lane 3), judging by the input and the remaining flowthrough. Using IRSp53-GFP and an anti-GFP antibody, we could bypass the IgG signal that masks the endogenous IRSp53 in IPs with the IRSp53 antibody.

      3) The authors observed an increase in the membrane-bound pool of IRSp53 when Gag is present (Fig. 2c). It is not clear whether this is specific for IRSp53 or other IBAR proteins can also be more membrane-bound as a result of Gag expression.

      See the new figure 2. In the new figure 2d, we have re-loaded all the gel fractions on new SDS-PAGE gels and probed the corresponding immunoblots for Gag, IRSp53, IRTKS, Tsg101 and the cellular markers Lamp2 (for membrane fractions) and ribosomal S6 protein (for cytosolic fractions). After quantification of the IRSp53 versus IRTKS bands in the HEK293T control cells and in the Gag-expressing cells, one can see that only IRSp53, and not IRTKS, increases at the cell membranes upon Gag expression.

      Reviewer #3:

      Inamdar et al. used biochemical and microscopy assays to investigate the role of I-BAR domain host proteins on HIV-1 assembly and release from HEK 293T and Jurkat cells. They show that siRNA knockdown of IRSp53, but not a similar I-BAR domain protein IRTKS, inhibits HIV-1 particle release from 293T cells after transfection of the HIV-1 provirus or HIV-1 Gag in cells. The authors then show that HIV-1 Gag associates with IRSp53 in the host cell membrane and cytoplasm, using biochemical assays and super resolution microscopy. In addition, IRSp53 is incorporated into HIV-1 particles along with other previously identified host proteins. Then using in vitro-derived membrane vesicles ("giant unilamellar vesicles" or GUVs), the authors indicate that HIV-1 Gag can associate with IRSp53, particularly on highly curved structures.

      The conclusions are largely supported by the data, with the virology and biochemical results being particularly strong, but the mechanistic studies in GUVs appear somewhat preliminary and are not entirely clear. The GUV experiments would benefit from better quantification of measurements and manipulation to simulate actual cellular scenarios. In addition, while it is appreciated that the HEK 293T cell line is convenient for biochemical and imaging studies, these cells are not biologically relevant HIV-1 target cells. While the authors present examples of reproducibility of their results in a CD4+ T cell line, these data are buried in the supplemental figures; it would have been better to highlight them and perhaps include primary CD4+ T cells.

      1) Immortalized cell lines do not always recapitulate primary cells. It is unclear what the role of IRSp53 is in the membrane curvature of CD4+ T cells and whether expression levels and localization are consistent with Jurkat T cells.

      Please see our general response to the Editors, which is:

      We have published that IRSp53 (using siRNA) is involved in HIV-1 particle release in primary T cells (PBMC-derived T cells) in Thomas et al., JVI 2015, so there is a high probability that it would be the same in different cell types: transfected HEK293T cells, transfected or infected Jurkat T cells, and infected primary T cells. However, we have not done extensive super-resolution microscopy on infected primary T cells because this would require an overly time-consuming study. We are currently setting up conditions with an infectious HIV-1 virus carrying the mEOS2 photoactivatable protein in order to infect primary T cells and pursue further research using an infectious, relevant system and super-resolution microscopy, but this is not ready for the current manuscript, as it would require months of extra work and experiments.

      We agree with Reviewer #3 that the localization of Gag in Jurkat T cells and in primary CD4+ T cells differs at the cell level (in primary T cells, HIV-1 Gag is more polarized at uropods, as reported in the literature; see, for example, Bedi et al./Ono’s lab). At the nanoscopic level of the budding sites, however, chances are that it would be similar, but this needs to be checked in future studies.

      2) Description of some of the microscopy measurements could be improved. In lines 204-206 of the text and Figure S5, it is unclear how the localization precision was determined to be approximately 16 nm for PALM-STORM.

      These lines have been changed in the main text as they were not essential for understanding how we determine the size of the VLP clusters. However, we have now detailed in Figure S5 how we measure localisation precision.

      The following text has been added to the legend of Figure S5:

      “Distribution of localisation precisions for PALM (in green) or STORM (in red) as given by Thunderstorm analysis in Fiji : Localisation precision distribution exhibit maxima at 16 nm and a mean±sd value of 20±5 nm for PALM, and a maxima of 26 nm, corresponding to a mean±sd value of 27±10 nm for STORM. The localization precision is obtained by eq 17 of (Thompson et al., 2002).”

      We also added the reference to the original paper (Thompson et al. 2002, Biophysical Journal).
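
      As an illustration of the kind of calculation referred to in the legend, the following is a minimal sketch of a per-localisation precision estimate, assuming the commonly quoted form of the Thompson et al. (2002) expression, ⟨(Δx)²⟩ = s²/N + a²/(12N) + 8πs⁴b²/(a²N²). The function name, argument names and example values are hypothetical and are not taken from the manuscript or from the Thunderstorm plugin.

```python
import math


def localisation_precision_nm(psf_sd_nm, pixel_nm, photons, background_sd):
    """Estimated localisation precision (nm) for one detected molecule.

    Assumed form (commonly quoted from Thompson et al., 2002):
        <(dx)^2> = s^2/N + a^2/(12 N) + 8*pi*s^4*b^2 / (a^2 N^2)
    s = PSF standard deviation (nm), a = pixel size (nm),
    N = collected photons, b = background noise (photons per pixel).
    """
    s, a, n, b = psf_sd_nm, pixel_nm, photons, background_sd
    shot_noise = s ** 2 / n                # photon-counting term
    pixelation = a ** 2 / (12 * n)         # finite pixel size term
    background = 8 * math.pi * s ** 4 * b ** 2 / (a ** 2 * n ** 2)
    return math.sqrt(shot_noise + pixelation + background)


# Hypothetical values, for illustration only.
example_nm = localisation_precision_nm(
    psf_sd_nm=130.0, pixel_nm=100.0, photons=500, background_sd=5.0
)
```

      Under this expression, precision improves with the number of collected photons and worsens with background noise, which is consistent with a broader distribution for a dimmer or noisier channel.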

      In Figure 4b, although it is understood from the text (lines 252-256) that the red bars denote the Manders' coefficient for colocalization of the GFP-tagged proteins with Gag-mCherry (presumably the average of multiple experiments with standard deviations or errors of the mean, although this is not stated in the figure legend), it is unclear what the green bars are showing.

      Yes, the red bars denote the Manders' coefficient for colocalization of Gag-mCherry with the GFP-tagged proteins, and the green bars denote the coefficient for colocalization of the GFP-tagged proteins with Gag-mCherry. For more than 300 green and red vesicles, they show that all the Gag-VLPs are indeed green in the case of IRSp53-GFP (red bar), but that not all the IRSp53-GFP “green” vesicles are positive for Gag: this indicates that transfected HEK cells produced Gag/IRSp53 VLPs but also IRSp53-GFP vesicles. We thank the reviewer for pointing this out. We added the explanation in the main text (page 12, lines 272-282) and in the figure legend of Figure 4b.
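
      To make the asymmetry between the two coefficients concrete, here is a minimal sketch of the two Manders' coefficients computed on paired intensities (one red and one green value per pixel or per vesicle). The function name, the thresholds and the toy numbers are hypothetical; the manuscript's own analysis pipeline is not described here and may differ.

```python
def manders_coefficients(red, green, red_threshold=0.0, green_threshold=0.0):
    """Return (m_red, m_green) for paired intensity lists.

    m_red   = fraction of total red signal found where green is present
    m_green = fraction of total green signal found where red is present
    """
    if len(red) != len(green):
        raise ValueError("red and green must have the same length")
    red_total = sum(red)
    green_total = sum(green)
    if red_total == 0 or green_total == 0:
        raise ValueError("both channels need some signal")
    m_red = sum(r for r, g in zip(red, green) if g > green_threshold) / red_total
    m_green = sum(g for r, g in zip(red, green) if r > red_threshold) / green_total
    return m_red, m_green


# Toy data: every red (Gag) spot is also green, but half of the
# green (IRSp53-GFP) spots carry no red signal.
toy_red = [10.0, 20.0, 0.0, 0.0]
toy_green = [5.0, 5.0, 5.0, 5.0]
m_red, m_green = manders_coefficients(toy_red, toy_green)
```

      In this toy case the red coefficient is high while the green coefficient is lower, which is the pattern described above for Gag-positive VLPs versus IRSp53-GFP-only vesicles.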

      Also, the histograms for IRSp53 and IRTKS colocalized with Gag look similar in Figure S10, suggesting that they are not different in Jurkat cells, but this is not addressed.

      Yes. We have now addressed this particular point in the global response to the reviewers. Indeed, figures 3 and 4 were remodelled: the new figure 3 shows the HEK and Jurkat cell results in the same figure, and figure 4 shows the simulation results. Overall, the PALM/STORM microscopy analysis results on Gag/IRSp53 colocalization are very similar in both cell types.

      3) GUVs are first referenced on page 7 after description of Figure 2, the significance of which is confusing to the reader. However, the actual experimental data are described on pages 12-13 and Figures 5 and S11. A better description of these structures would be warranted for an audience that is unfamiliar with them. In addition, the biologic concentrations of I-BAR proteins at cell membranes are not provided and it is unclear what conditions used in Figures 5 and S11 represent a "normal CD4+ T cell" situation. It appears that the advantage of this in vitro system is that different factors can be provided or removed to simulate different cellular scenarios. For example, relatively low IRSp53 concentrations may simulate siRNA knockdown experiments in Figure 1, which could recapitulate the result that fewer viral particles are released from the membrane. In addition, the authors state that HIV-1 Gag preferentially colocalizes with IRSp53 at the tips of the GUV tubular structures (Figure 5b,c), but this is not actually shown or quantified. Similar quantification as shown in Figure 1e could be performed to strengthen this argument.

      We thank the reviewer for pointing this out. We now describe all the GUV results in section 5.

      Concerning the biological concentrations of I-BAR proteins in cells, to the best of our knowledge they have not been measured. We thus could not relate the concentrations used in the GUV experiments to those in cells.

      We could not perform quantification as in Figure 1e because the majority of the tubes in GUVs were moving too rapidly, preventing us from acquiring images with higher spatial resolution (see Fig. S11, and Movies 2 and 3). However, we would like to point out that the Gag signals appeared punctate inside GUVs (see Fig. S11, and Movies 2 and 3), which is very different from the signals of I-BAR that are clearly along the tubes (see Fig. S10c). Moreover, for the tubes that were not moving too fast (17 tubes), we found that Gag signals are in every case located exclusively at the tips of the tubes (see new Fig. 6d). Also, the sorting maps shown in Fig. 6c and Fig. S10d indicate the relative accumulations of Gag at the tips of the tubes. To make it clearer that the Gag signals were located at the tips of the tubes, in the current manuscript, we have added the new Fig. S11 and Movies 1, 2 and 3, and included zoom-in images in Fig. 6b, 6c and a new Fig. 6d. Also, we have included the quantitation results (17 tubes) in the manuscript.

    1. Author Response:

      Reviewer #1 (Public Review):

      The lateral entorhinal cortex (LEC) receives direct inputs from the olfactory bulb (OB), but the odor response properties of its neurons have not been well characterized despite a recent increase in interest in the role of LEC in olfactory behaviors. In this study, Bitzenhofer and colleagues provide unprecedented details of odor response properties of layer 2 cells in LEC. The authors first show that LEC neurons respond to odors with a rapid burst of activity time-locked to inhalation onset, similarly to the piriform cortex (PCx), but distinct from the OB. Firing rates of LEC ensembles conveyed information about odor identity, whereas the timing of spikes conveyed odor intensity. The authors then examined the difference between two major cell types in LEC layer 2, fan cells and pyramidal neurons, and found that, on average, fan cells responded earlier than pyramidal neurons, and pyramidal neurons, but not fan cells, changed their peak timing in response to changes in concentrations, providing a basis for temporal coding of odor concentrations. Additionally, the authors show that inactivation of LEC impairs odor discrimination based on either identity or intensity, and demonstrate different cellular properties of fan cells and pyramidal neurons. Finally, the authors also examined the odor response properties of hippocampal CA1 neurons, and showed that odor identity can be decoded from firing rate responses, while decoding of odor concentration depended on spike timing.

      The authors performed a large number of experiments, and provide an impressive set of data regarding odor response properties of LEC layer 2 neurons in a cell type specific manner. The results reported are very interesting, and will be a point of reference for future studies on odor coding and processing in the LEC. The manuscript is clearly written, and data are well analyzed and presented clearly. I have only relatively minor concerns or suggestions.

      1. The authors infer the time at which "mice could discriminate odors" from the time at which d-prime becomes significantly different between baseline and odor stimulation conditions (line 111 and line 121). However, the statistical test applied to these data does not guarantee that an observer can accurately discriminate odors. For example, a small p-value can be obtained even when discrimination accuracy is only slightly above chance if there are many trials. A statement such as "mice could discriminate two odors by as early as 225 ms after inhalation onset" (line 111) can be misleading because it might sound as if mice can accurately discriminate odors at this timepoint, while this is not necessarily the case (as indicated by the d-prime value).

      We have added plots of performance accuracy over time under control conditions (LED off) to Figure 2-supplement 1. These plots of fraction of correct responses (binned every 50 ms) show that mice (n = 6) are making choices significantly different from chance within 200 ms of odor inhalation. We changed the wording in the Results to now say: “Moreover, by analyzing lick timing, we determined that the discriminability measure d’ became significantly different under control conditions as early as 225 ms after inhalation onset and performance accuracy increased within 200 ms of inhalation (Fig. 2b, Figure 2-supplement 1).”
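
      For readers unfamiliar with the discriminability measure discussed here, the following is a minimal sketch of a standard d' calculation, d' = z(hit rate) - z(false-alarm rate). The mapping of left/right licks onto hits and false alarms, the log-linear correction for extreme rates, and the function name are assumptions made for illustration; they are not taken from the manuscript's Methods.

```python
from statistics import NormalDist


def d_prime(hits, misses, false_alarms, correct_rejections):
    """Signal-detection d' with a log-linear correction.

    The +0.5 / +1 correction keeps the rates away from exactly 0 or 1,
    where the inverse normal CDF is undefined.
    """
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return z(hit_rate) - z(fa_rate)


# Hypothetical trial counts in one time bin.
chance_level = d_prime(hits=50, misses=50, false_alarms=50, correct_rejections=50)
good_performance = d_prime(hits=90, misses=10, false_alarms=10, correct_rejections=90)
```

      A d' near zero corresponds to chance performance, so the size of d' in a given time bin, and not only its statistical significance, indicates how accurately the odors are being discriminated.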

      1. Optogenetic identification can be a little tricky when identifying excitatory neurons as in this study. Please discuss the rationale for, or difficulty in, distinguishing cells that are activated directly by light from those activated indirectly (i.e. synaptically). Do the results hold if the authors use only those cells whose identification they are more confident about?

      We only used the cells that were confidently identified using a combination of two criteria. First, tagged cells had to show a significant increase in firing (p_Rate<0.01) during the 5 ms LED illumination period versus 100 randomly selected time windows before LED stimulation. Cells also had to respond with a fixed latency to reduce the chance of including cells recruited by polysynaptic excitation. Further, we used the stimulus associated spike latency test (SALT) as detailed in Kvitsiani et al., 2013. To be judged as tagged, units had to show significantly less spike jitter during the 5 ms LED illumination than in 100 randomly selected time windows before LED stimulation (p_SALT<0.01). Only those cells with BOTH p_Rate<0.01 and p_SALT<0.01 were considered as tagged (both methods typically agreed for most cells). Moreover, slice work testing synaptic connections between LEC layer 2 cells found extremely low levels of connectivity between fan and pyramidal cells (Nilssen et al., J. Neuroscience, 2018). This makes it unlikely that LED-induced firing of fan or pyramidal cells would recruit indirectly (synaptically) excited cells.

      1. The authors sort odor response profiles by peak timing, and indicate that odor responses peak at different times that tile the respiration cycle. However, this analysis does not indicate the reliability of peak timing. Sorting random activity by "peak timing" could generate a similar figure. One way to show the reliability or significance of peaks is to cross-validate. For instance, one can use half of the trials to sort, and plot the rest of the trials. If the peak timing is reliable, the original pattern will be replicated by the other half, and those neurons that are not reliable will lose their peaks. Please use such a method so that we can evaluate the reliability of peaks.

      We analyzed the data as suggested by this reviewer as shown below (Author response image 1). Plotting only the odd trials sorted by the odd trials in the dataset (top) looked identical to the data from all trials used in Figure 1g. More importantly, plotting only the even trials sorted by the odd trials (bottom), though noisier due to trial-by-trial variation, showed the same general structure of tiling throughout the respiration cycle for OB cells.

      Author response image 1
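
      The split-half check described above can be sketched as follows, assuming each cell's response is stored as a list of trials, each trial being a list of binned firing rates. The data layout and function names are hypothetical and are used only to illustrate the logic of sorting by one half of the trials and plotting the other half.

```python
def mean_trace(trials):
    """Average binned firing rate across a list of trials."""
    n = len(trials)
    return [sum(bin_values) / n for bin_values in zip(*trials)]


def peak_bin(trace):
    """Index of the time bin with the highest mean rate."""
    return max(range(len(trace)), key=lambda i: trace[i])


def split_half_sort(cells):
    """Sort cells by peak time in odd trials; return both halves in that order.

    cells: list (one entry per cell) of trial lists.
    Trials 1, 3, 5, ... form the "odd" half; trials 2, 4, 6, ... the "even" half.
    """
    odd = [mean_trace(trials[0::2]) for trials in cells]
    even = [mean_trace(trials[1::2]) for trials in cells]
    order = sorted(range(len(cells)), key=lambda c: peak_bin(odd[c]))
    return order, [odd[c] for c in order], [even[c] for c in order]


# Toy data: three cells, four trials each, three time bins per trial.
toy_cells = [
    [[0, 0, 5], [0, 1, 4], [0, 0, 6], [1, 0, 5]],  # peaks late
    [[5, 0, 0], [4, 1, 0], [6, 0, 0], [5, 0, 1]],  # peaks early
    [[0, 5, 0], [1, 4, 0], [0, 6, 0], [0, 5, 1]],  # peaks in the middle
]
order, odd_sorted, even_sorted = split_half_sort(toy_cells)
```

      If peak timing is reliable, the held-out (even) half keeps the same diagonal structure when plotted in the order derived from the odd half; for unreliable cells the diagonal is lost.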

      Reviewer #2 (Public Review):

      In this study, Bitzenhofer et al recorded odor-evoked activity in the LEC and examined the coding of odor identity and intensity using extracellular recordings in head-fixed mice, and used the standard suite of quantitative tools to interpret these data (decoding analyses, dimensionality reduction, etc). In addition, they performed behavioral experiments to show the necessity of LEC in odor identity and intensity discrimination, and deploy some elegant and straightforward 'circuit-busting' slice physiology experiments to characterize this circuit. Importantly, they performed some of their experiments in Ntng1-cre and Calb-cre mice, which allowed them to differentiate between the two major classes of LEC principal neurons, fan cells and pyramidal cells, respectively. Many of their results are contrasted with what has previously been observed in the piriform cortex (PCx), where odor coding has been studied much more extensively.

      Their major conclusions are:

      Cells in the LEC respond rapidly to odor stimuli. Within the first 300 ms after inhalation, odor identity is encoded by the ensemble of active neurons, while odor intensity (more specifically, responses to different concentrations) is encoded by the timing of the LEC response; specifically, the synchrony of the response. These coding strategies have been described in the PCx by Bolding & Franks. Bolding also found two populations of responses to different concentrations: one population of responses was rapid and barely changed with concentration and the second population of responses had onset latencies that decreased with increasing concentration. Roland et al also found two populations of responses using calcium imaging in anesthetized mice: one population of responses was concentration-dependent and another population was 'concentration-invariant'. However, neither Bolding nor Roland were able to determine whether these populations of responses emerged from distinct populations of cells. Here, the authors elegantly register these two response types in LEC to different cell types: fan cells respond early and stably, and pyramidal cells response latencies decrease with concentration. This is a novel and important finding. They also showed that, unlike PCx or LEC where concentration primarily affects timing rather than rate/number, odor concentration in CA1 is only reflected in the timing of responses.

      Using optogenetic suppression of LEC in a 2AFC task, the authors purport to show that LEC is required for both the discrimination of odor identity and odor intensity. If true, this is an important result, but see below.

      In slice experiments, the authors characterize the differential connectivity of fan and pyramidal cells to direct olfactory bulb input, input from PCx, and inhibitory inputs from SOM and PV cells. This work is elegant, novel, and important, although it is a little out of place in this manuscript. As such, their findings are irrelevant/orthogonal to the rest of the results in this study. But fine.

      The simultaneous recordings from three different stations along the olfactory pathway are impressive.

      Major concern

      My major concern with this manuscript regards the behavioral experiments. The authors show that blue light over the LEC in GAD2-Cre/Ai32 mice completely abolishes (i.e. to chance) the mouse's ability to perform a 2AFC task discriminating between either two different odorants or one odorant at different concentrations. Their interpretation is that LEC is required for rapid odor-driven behavior. The sensory component of the task is so easy, and the effect is so striking that I find this result surprising and almost too good to be true. The authors do control for a blue-light distraction effect by repeating the experiments in mice that don't express ChR2, but do not control for the effect of rapidly shutting down a large part of the sensory/limbic system. If they did this experiment in the bulb I would be impressed with how clean the result was but not conceptually surprised by the outcome. I think a different negative control is needed here to convince me that the LEC is necessary for this simple sensory discrimination task. For example, the authors could activate all the interneurons (i.e. use this protocol) in another part of the brain, ideally in the olfactory pathway not immediately upstream of the LEC, and show that the behavior is not affected.

      This reviewer suggests a negative control experiment for the effects we observe on behavior when optogenetically silencing LEC. However, we disagree that it would be informative to silence other olfactory pathways in search of those that do not affect behavior. Our strong effects on behavior are also in complete agreement with recent findings that muscimol inactivation of LEC abolishes discrimination of learned odor associations (Extended Data Figure 8, Lee et al., Nature, 2021).

      More specifically, both the presentation and the interpretation of the data are confusing. First, there is a lack of detail about the behavioral task. I was not sure exactly when the light comes on and goes off, when the cue was presented, and when the reward was presented. In the manuscript they say (line 108) "…used to suppress activity during odor delivery on a random subset…". There is nothing more about this in the figure legend or Methods. The only clue to this is the dotted line in the 'LED On' example at the bottom of Fig. 2a. The authors also say that (line 660) "Trials were initiated with a 50 ms tone." When exactly was the tone presented? In the absence of any other information, I assume it was presented at odor onset. When was the reward presented? Lines 106-7 say "Mice were free to report their choice (left or right lick) at any time within 2 s of odor onset." Presumably this means the reward was presented to one of the ports for 2 seconds, starting at odor onset.

      The LED is applied during odor delivery, the 50 ms tone immediately precedes odor delivery, and the water reward is dispensed after the first lick at the correct lick port during the choice period. The choice period begins with odor onset, and odor delivery is terminated by the first lick at either the correct or the incorrect port. If there is no lick at either port, odor delivery lasts 1 s and is followed by an extended choice period (terminated by a correct or incorrect lick) lasting 1 s. To clarify the behavior protocol, we have included a schematic of the trial structure in Figure 2-supplement 1.

      These details matter because the authors want to claim that "LEC is essential for rapid odor-driven behavior." The data presented in support of this claim are (1) that mice perform this task at chance levels in LED On trials, presumably based on which port the mouse licked first (this is the 'essential' part), and (2) that in control in LED Off trials, d' becomes statistically different from baseline after ~200 ms (this is the 'rapid' part).

      To further support the argument that LEC is required for rapid odor-driven behavior, we now show a plot of % correct responses over time from first odor inhalation.

      On first reading, these suggested that shutting off LEC makes odor discrimination worse and/or slower. However, the supplementary data clarifies several things. First, the mice never Miss (Fig.2S.2a & c), meaning that they always lick. Second, in LED Off trials (F2S2 & e), the mice make few mistakes, and these only occur immediately after inhalation, presumably meaning the mice occasionally guess, possibly in response to the auditory cue. Thus, the mean time to lick is much shorter for Error trials than Correct trials. To state the obvious, the mice often wait >300 ms before they lick, and when they do wait, they never make mistakes. Now, in the LED On trials, the mice almost always lick within the first 300 ms and perform at chance levels, with the distribution of lick times for Correct and Error trials almost overlapping. In fact, although the authors claim LEC is required for rapid odor discrimination, the mean time to lick on Correct trials appears to decrease in LED On trials. This makes me think that the mice are making ballistic guesses in response to the tone in LED On cases, which doesn't necessarily implicate a dependence on LEC for odor discrimination.

      We do not believe that mice are making ballistic guesses in response to the tone for LED on trials. First, although a 50 ms tone immediately precedes odor delivery, all data in Figure 2-supplement 1 show lick times aligned to the first inhalation of odor. Thus, time 0 ms is not the tone or subsequent odor onset but rather a variable time point coinciding with the first odor inhalation (the delay from odor onset to first inhalation is ~300 ms, the average respiration interval under our conditions). In fact, we excluded trials if mice made premature licks between the time of odor onset and first odor inhalation. We re-analyzed these trials to test the reviewer’s idea that mice were more likely to make fast ballistic guesses when the LEC was silenced. However, we saw no evidence that mice made more premature licks in trials with LED on (Author response image 2).

      Author response image 2

      The authors' interpretation of their data would be more solid if, for example, there were a delay between the auditory cue and odor delivery and/or if the reward was only available with some delay after the odor offset. Here, however, it seems just as likely as not that the mice are making ballistic guesses in response to the tone in LED On cases, which doesn't necessarily involve dependence on LEC for odor discrimination. Here, the divergence of d' from baseline in the control (i.e. LED Off) condition seems mostly because mice take longer to correctly discriminate under control conditions. While this is not formally contradictory to "LEC is essential for rapid odor-driven behavior", it is nevertheless a bit contrived and misleading. An interesting (thought) experiment is what would happen if the authors presented a tone but no odor. I would guess that the mice would continue licking randomly in Light On trials.

      While a delay between odor delivery and reward would have been useful for some aspects of interpreting the behavior, we would have lost the ability to examine the role of LEC in response timing. To address this reviewer’s concern, we have added a section to the Discussion mentioning caveats related to the interpretation of experiments using acute optogenetic silencing to understand behavior.

    1. Author Response

      Reviewer #1 (Public Review):

      We thank the reviewer for carefully reading the manuscript and for the insightful criticisms and comments. In the following we address them point by point.

      The community assembly process is modelled in a very specific way, and the manuscript would benefit from an expanded ecological motivation of the processes that are being mimicked, and thereby explain more clearly what taxonomic level of organization is being considered.

      We follow the more recent trait-based approach that shifts the focus from species (and the many traits by which they differ from one another) to groups of species that share the same values of selected functional traits. Since the general context is ecosystem response to drier climates, we choose the functional traits to include a response trait associated with stress tolerance and an effect trait associated with biomass production. We further assume a tradeoff between the two traits which is well supported by earlier studies (see e.g. Angert et al. 2009, https://doi.org/10.1073/pnas.0904512106). So, indeed, the choice we make in characterizing the community is quite specific, but it is highly relevant to the ecological context considered of dryland plant communities where plants compete primarily for water and light. The taxonomic level we consider is species except that we group them in a manner that is more transparent to questions of ecosystem function, ignoring differences between species that are not significant to these questions.

      We expanded considerably the text in the section “Modeling spatial assembly of dryland plant communities” to clarify the ecological motivation of the processes we model.

      In addition, it would be useful if the authors could provide further clarification as to what extent the community diversity dynamics can be separated from total biomass dynamics of patterned water-limited ecosystems given the current approach. These points are explained in further detail below.

      The model describes the dynamics of all functional groups, which provides the biomass distribution 𝐵 = 𝐵(𝜒) in trait space (in the case of patterned states we first integrate over space). That distribution contains information about various community-level properties, including functional diversity (richness, evenness) as figure 3 in the revised manuscript illustrates, and total biomass, which is the area below the distribution curve. The two types of dynamics are tightly connected and cannot be separated, but in principle the approach can be used to study the relationships between diversity and total biomass by calculating biomass distributions along the rainfall gradient and extracting the two properties from the distributions.

      We added in the section “Modeling spatial assembly of dryland plant communities” the information that the biomass distribution also contains information about the total biomass.
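
      The way community-level properties are read off a biomass distribution can be illustrated with a minimal Python sketch. It is not the code used in the study. It assumes the distribution is sampled as a density on N equal trait bins over 0 < 𝜒 < 1; the function name, the presence threshold, and the Pielou-style evenness are hypothetical choices made here for illustration only, and the manuscript's own definitions of functional richness and evenness may differ.

```python
import math

def community_metrics(density, presence_threshold=1e-3):
    """Summarize a biomass density b(chi) sampled on N equal trait bins over [0, 1].

    ASSUMPTIONS (not taken from the manuscript): a group counts as present
    when its density exceeds presence_threshold times the peak density, and
    evenness is measured with a Pielou-style index.
    """
    n = len(density)
    d_chi = 1.0 / n
    chi = [(i + 0.5) * d_chi for i in range(n)]        # bin centers
    group_biomass = [b * d_chi for b in density]        # B_i = b_i * d_chi
    total_biomass = sum(group_biomass)                  # area under the curve

    i_max = max(range(n), key=lambda i: density[i])     # most abundant group
    peak = density[i_max]

    present = [B for b, B in zip(density, group_biomass)
               if b > presence_threshold * peak]
    richness = len(present) / n                         # fraction of groups present

    evenness = 0.0
    if len(present) > 1:
        subtotal = sum(present)
        shannon = -sum((B / subtotal) * math.log(B / subtotal) for B in present)
        evenness = shannon / math.log(len(present))

    return {
        "chi_max": chi[i_max],
        "total_biomass": total_biomass,
        "richness": richness,
        "evenness": evenness,
    }
```

      Because all four quantities come from the same sampled curve, repeating the call for distributions computed at different precipitation values would give the diversity and total-biomass relationships mentioned above.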

      First, it was not entirely clear to this reviewer how the reaction parts of the model equations determine the optimal trait value χ, and how this value varies as a function of precipitation.

      The ‘optimal’ trait value 𝜒𝑚𝑎𝑥 is determined by the interspecific interactions that the model captures, which divide into ‘direct’ and ‘indirect’ interactions. The direct interactions are captured by the dependence of the growth rate Λ𝑖 of the ith functional group (see Eq. (1a)) on the aboveground biomass values of all functional groups, Λ𝑖 = Λ𝑖(𝐵1,𝐵2,… , 𝐵𝑁) (see Eq. (2)). This dependence represents competition for light (taller plants are better competitors) and includes the effect of self-shading. The indirect interactions are through the water uptake term in the soil-water equation (1b) (2nd term from right) and the water dependence of the biomass growth term in Eq. (1a). These terms represent competition for water. For a given precipitation value 𝑃, the net effect of these interspecific interactions results in a particular functional group 𝜒𝑚𝑎𝑥, which is the most abundant. For spatially uniform vegetation, as 𝑃 is increased 𝜒𝑚𝑎𝑥 moves to lower values. The precipitation increases surface water (Eq. (1c)) and consequently the amount of water 𝐼𝐻 infiltrating into the soil. The increased soil water gives a competitive advantage to species investing in growth, mainly because they compete better for light as they grow taller, and therefore 𝜒𝑚𝑎𝑥 decreases.

      … it is then not immediately clear why the most successful trait class is not outcompeting the other classes.

      With the current model and parameter set, the most successful trait does eventually outcompete all other traits when trait diffusion is set to zero, 𝐷𝜒 = 0. This is, however, a very long process because the most successful trait suffers from self-shading at late growth stages, which slows down its growth and allows nearby traits to survive for a long time. Choosing finite but very small 𝐷𝜒 values, which represent mutations occurring on evolutionarily long times, counteracts the exclusion process and results in a stationary asymptotic community, as Fig. 3 in the revised manuscript shows (this behavior is reminiscent of optical solitons, where self-focusing instability is balanced by dispersion). We note that modeling stronger growth-inhibiting factors, such as pathogens, by including a factor of the form (1 − 𝐵𝑖/𝐾) in the growth rate, results in an asymptotic stationary community also for 𝐷𝜒 = 0 (see also earlier studies Nathan et al. 2016, Yizhaq et al. 2020).

      We revised original Fig. 4 (now Fig. 3) by adding a new part (Fig. 3a) that shows the exclusion process for 𝐷𝜒 = 0, and the effect of the counter-acting process of trait diffusion, which results in an asymptotic distribution of finite width (Fig. 3b) from which community level properties such as functional diversity can be derived. We also extended the text in section “Modeling spatial assembly of dryland plant communities” (last paragraph) to clarify the two counter-acting processes of exclusion because of interspecific competition for water and light, and trait diffusion driven by mutations, which together culminate in an asymptotic biomass distribution along the 𝜒 axis of finite width.

      The authors model trait adaptation through a diffusion approximation between trait classes. That is, every timestep, a small amount of biomass flows from the class with higher biomass to the neighboring trait class with lower biomass. From an ecological point of view, it seems that this process is describing adaptation of vegetation that is already present, so this process seems to be limited to intraspecific phenotypic plasticity. From the text, however, it seems that the trait classes correspond to higher taxonomic levels of organization, when describing shifts from fast growing to stress-tolerant species, for example. It is not entirely clear, however, how biomass flows as assumed in the model could occur at these higher levels of organization.

      We do not study in this work adaptation through diffusion in trait space. That kind of adaptive dynamics can indeed be studied with the current model, but with different initial conditions, namely, initial conditions corresponding to a single resident trait where the biomass of all other traits is zero. The resulting dynamics of mutations and succession are then very slow, occurring on evolutionarily long time scales set by the small value of 𝐷𝜒 (e.g. 10⁻⁶). In this study the initial conditions represent the presence of all traits, even if at very low biomass values that may represent a pool of seeds that germinate once environmental conditions allow. For a given precipitation value 𝑃, the functional traits we consider determine which functional groups (of species) overcome environmental filtering and grow, and which of the growing traits survive the competition for water and light. These are relatively fast processes, occurring on ecological time scales, which determine the emerging community. At longer times this community is further shaped by slow processes of interspecific competition among species of similar traits and by trait diffusion (mutations). A final remark about phenotypic changes: although in general 𝜒 can be interpreted as representing different phenotypes, the choice of very small values for 𝐷𝜒 cannot represent relatively fast phenotypic changes and restricts the context to mutations at the taxonomic level of species.

      We added an explanation in the 3rd paragraph of the section “Modeling spatial assembly of dryland plant communities” of the need to consider mutations and the role they play in our study.
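
      For readers who want a concrete picture of trait diffusion, the following Python sketch applies one explicit time step of a discrete second difference along the 𝜒 axis. It is an illustration under stated assumptions, not the scheme used in the study: the function name, the explicit Euler update, the no-flux boundaries at the ends of the trait axis, and the time step are all hypothetical, and only the order of magnitude of 𝐷𝜒 (10⁻⁶) is taken from the reply above.

```python
def trait_diffusion_step(biomass, d_trait=1e-6, dt=1.0):
    """Apply one explicit diffusion step along the trait axis.

    ASSUMPTIONS (hypothetical): N equal trait bins on [0, 1], explicit Euler
    time stepping, and no-flux boundaries so that total biomass is conserved.
    """
    n = len(biomass)
    d_chi = 1.0 / n
    rate = d_trait * dt / d_chi ** 2      # must stay below 0.5 for stability
    updated = []
    for i in range(n):
        # Mirror the edge value so that no biomass leaves the trait domain.
        left = biomass[i - 1] if i > 0 else biomass[i]
        right = biomass[i + 1] if i < n - 1 else biomass[i]
        updated.append(biomass[i] + rate * (left - 2.0 * biomass[i] + right))
    return updated

# Single resident trait: only one functional group carries biomass initially.
resident = [0.0] * 128
resident[40] = 1.0
after_one_step = trait_diffusion_step(resident)
```

      Starting from a single resident trait, each step moves only a small fraction of biomass to the neighboring groups, which is why dynamics driven by this term alone are slow compared with the competition and filtering processes described above.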

      Combining the observations from the previous two points, there is a concern that for a given level of precipitation, there is a single trait class with optimal biomass/lowest soil water level that is dominant, with the neighboring trait classes being sustained by the diffusion of biomass from the optimal class to neighboring inferior classes. This would seem a bit problematic, as it would mean that most classes are not a true fit for the environment, and only persist due to the continuous inflow of biomass. Taking a clue from the previous papers of the authors, it seems this may not be the case, though. Specifically, in the paper by Nathan et al. (2016) it seems that all trait classes are started at low initial biomass density, and the resulting steady state (in the absence of biomass flows between classes) seems to show similar biomass profiles as shown in Figs. 4,5 and 7 of the current paper. While the current model formulation seems slightly different, similar results may apply here. Indeed, keeping all trait classes at non-zero (but low) density, and when the (abiotic and biotic) environment permits, let each class increase in biomass seems like the most straightforward approach to model community assembly dynamics. Given the above discussion about these trait classes competing for a single resource (soil water), and one trait class being able to drive this resource availability to the lowest level, it would then be useful to readers to explain why multiple trait classes can coexist here, and how (for spatially uniform solutions) the equilibrium soil water level with multiple trait classes present compares to the equilibrium soil water level when only the optimal trait class is present. Furthermore, if results as presented in Nathan et al. (2016) indeed hold in the current case, perhaps it means that the biomass profile responses as shown in e.g. Fig. 5 would also occur if there was no biomass flow between trait classes included, but that the time needed to adjust the profile would take much longer as compared to when the drift term/second trait derivative is included. In summary, further clarification of what the biomass flows between classes represent, and the role it plays in driving the presented results would be useful for readers.

      As explained in the reply to previous comments the asymptotic community is tuned by a balance between two slow counter-acting processes, interspecific competition among similar traits and mutations over evolutionarily long time scales. However, the community structure is largely determined by much faster processes of environmental filtering and interspecific competition among widely distinct traits, as all traits are initially present. Indeed, comparing the biomass distributions in new Fig. 3, with and without trait diffusion indicates that the community composition, as measured by 𝜒𝑚𝑎𝑥, is the same. Trait diffusion, however, does affect functional diversity, along with environmental factors. In that sense the emerging community is a true fit for the environment.

      We thank the reviewer for these thoughtful comments, which helped us realize that our presentation of these issues was too concise and unclear. We believe that the new extended section on modeling spatial assembly of dryland plant communities, and the new figure 3a clarify these issues.

      In addition, it would be useful for readers to understand to what extent the shifts in average trait values and functional diversity can be decoupled from the biomass and soil water responses to changes in precipitation that would occur in a model with only a single biomass variable. For example, early studies on self-organization in semi-arid ecosystems already showed that the shift toward a patterned state involved the formation of patches with higher biomass, and higher soil water availability, as compared to the preceding spatially uniform state, and that the biomass in these patches remains relatively stable under decreasing rainfall, while their geometry changes (e.g. Rietkerk et al. 2002). It has also been observed that for a given environmental condition, biomass in vegetation patches tends to increase with pattern wavelength (e.g. Bastiaansen and Doelman 2018; Bastiaansen et al. 2018). Given the model formulation, one wonders whether higher biomass in the single variable model is not automatically corresponding to higher abundance of faster growing species and a higher functional diversity (as the diffusion of biomass can cover a broader range when starting from higher mass in the optimal trait class). There are some indications in the current work that the linkage is more complicated, for example, the biomass peak in Fig. 7c is lower, but also broader as compared to the distribution of Fig. 7b, but it is currently not entirely clear how this result can be explained (for example, it might be the case that in the spatially patterned states, the biomass profiles also vary in space).

      We are not sure we understand what the reviewer means by “decoupled”, but much insight indeed can be gained from a study of a model for a single functional group (trait) and observing the behaviors described by the reviewer. In fact, these behaviors, which some of us are familiar with from numerical studies, motivated parts of the current study. Higher biomass in vegetation patches (compared to uniform vegetation) in the single trait model does not automatically imply a shift to faster growing species; in principle the stress-tolerant species that already reside in the system when uniform vegetation destabilizes to a periodic pattern can simply grow denser. To answer this and additional questions we need to take into account interspecific interactions by studying the full community model. As to Fig. 7b,c, the behavior appears to be opposite to that described by the reviewer: the biomass peak in Fig. 7c is higher and narrower than that in Fig. 7b, not lower and broader. This is because of the much larger domain of the patterned state as compared with that of the uniform state, which increases the abundance of low-𝜒 species, i.e. species investing in growth.

      The increase of biomass in vegetation patches with pattern wavelength for given environmental conditions, as observed by Bastiaansen et al. 2018, is actually another mechanism for increasing functional diversity. This is because the water stress at the patch center is higher than that in the outer patch areas and thus forms favorable conditions for stress tolerant species while the outer areas form favorable conditions for fast growing species.

      We added a new paragraph in the Discussion and Conclusion section (last paragraph in the subsection Insight III) where we discuss the effect of coexisting periodic patterns of different wavelengths on functional diversity and ecosystem management. We also added citations to the references the reviewer mentioned.

      The possibility of hybrid states, where part of the landscape is in a spatially uniform state, while the other part of the landscape is in a patterned state, is quite interesting. To better understand how such states could be leveraged in management strategies, it would be useful if a bit more information could be provided on how these hybrid states emerge, and whether one can anticipate whether a perturbation will grow until a fully patterned state, or whether the expansion will halt at some point, yielding the hybrid state. It seems that being able to distinguish this case would be necessary in the design of planning and management strategies.

      The hybrid states appear in the bistability range of the uniform and patterned vegetation states, and typically occupy most of this range. Their appearance is related to the behavior of ‘front pinning’ in bistability ranges of uniform and patterned states in general. Front pinning refers to fronts that separate a uniform domain and a periodic-pattern domain, which remain stationary in a range of a control parameter (precipitation in our case). This is unlike fronts that separate two uniform states, which always propagate in one direction or another and can be stationary only at a single parameter value – the Maxwell point. Thus, an indication that a given landscape may have the whole multitude of hybrid states is the presence of a front (ecotone) that separates uniform and patterned vegetation. If that front appears stationary over long periods of time (on average), this is a strong indication.

      We added a new paragraph in the subsection Insight III of the Discussion and conclusion section to clarify this point.

      Also, in Fig. 3a, the region of parameter space in which hybrid states occur is not very large; it is not entirely clear whether the full range of hybrid states is left out here for visual considerations, or whether these states only occur within this narrow range in the vicinity of the Turing instability point.

      As pointed out in the reply to the previous comment, the hybrid states are limited to the bistability range of uniform and patterned vegetation, which is not wide. However, this should not necessarily restrict management of ecosystem services by nonuniform biomass removal, as such management will have similar effects on community structure also outside the bistability range, where fronts propagate slowly.

      The new paragraph we added also addresses this point.

      Reviewer #2 (Public Review):

      We thank the reviewer for carefully reading the manuscript and for the constructive criticisms and comments. In the following we address them point by point.

      1) Model presentation.

      It would be better to explain the model in ecological terms first, clarifying parameter biological meaning and justifying their choice. In doing so, creating a specific 'Methods' section, which now is lacking, would be of help too. Authors should clarify whether and how the model follows the conservation of mass principle involving precipitation and evapotranspiration. Are root growth and seed dispersal included for this purpose? Why are they not referred to any further in the analysis and discussion? Why is a specific term for plant transpiration not included, or is it somehow phenomenologically incorporated into the growth-tolerance tradeoff? In doing so, authors should also pay attention to water balance, as above (H) and below (W) ground water are not independent from each other.

      We added a Methods section, which in eLife is placed at the end of the manuscript. The section includes the model equations and more detailed explanations in ecological terms of various parts of the model. We also added Table 1 with a list of all model parameters, their descriptions, units and numerical values used in the simulations. Presenting the model at the end of the manuscript suits more technical information about the model, but not essential information that is needed for understanding the results. We therefore kept the subsection “A model for spatial assembly of dryland plant communities” in the Results section, where we present that information.

      There is no conservation of mass in the model (and all other models of this kind) simply because the system that we consider is open. In particular, it does not include the atmosphere, which constitutes part of the system’s environment. Including the atmosphere as additional state variables in the model, capturing the feedback of evapotranspiration on the atmosphere, would make the model too complicated for the kind of analysis we perform. So, although the model contains parts that represent mass conservation such as the terms describing below- and above-ground water transport, water mass is not conserved. The biomass variables represent aboveground biomass of living plants or plant parts and are not conserved either, as biomass production involves biochemical reactions that convert inorganic substances coming from the system’s environment (atmosphere and the soil) into organic ones, while plant mortality involves organic matter that leaves the system.

      Roots in the model platform we consider are modeled indirectly through their relation to aboveground biomass. That relation constitutes one of the scale-dependent feedbacks that produce a Turing instability to vegetation patterns, the so-called root-augmentation feedback (see Meron 2019, Physics Today), but in this particular study we eliminate this feedback for simplicity. The scale-dependent feedback that we do consider is the so-called infiltration feedback, associated with biomass-dependent infiltration rate that produces overland water flow towards vegetation patches, as explained in the subsection “A model for spatial assembly of dryland plant communities”. It will be interesting indeed to extend the study in the future to include also the root-augmentation feedback.

      We assume short-range seed dispersal and take it into account through biomass “diffusion” terms (obtained as approximations of dispersal kernels assuming narrow kernels). These terms play important roles in the scale-dependent feedback that induces the Turing instability, as is explained in earlier papers which we cite. Plant transpiration is modeled through the water uptake term in the equation for the soil water 𝑊. Indeed, above-ground water 𝐻 and below-ground water 𝑊 are not independent; the infiltration term 𝐼𝐻 in the equations for both state variables accounts for this dependence in a unidirectional manner (loss of 𝐻 and gain of 𝑊). As we do not include the atmosphere in the model, the other direction, namely, evapotranspiration that increases air humidity and affects rainfall, is not accounted for. The neglect of this effect can be justified for sparse dryland vegetation.

      These good points have already been discussed in many earlier papers as well as in the book Nonlinear Physics of Ecosystems (Meron 2015), and we cannot address them all in this paper. We did however add several clarifications in the section Modeling spatial assembly of dryland plant communities and in the new Methods section, including the consideration of the atmosphere as the system’s environment quantified by the precipitation parameter 𝑃.

      Another unclear point is that growth rates for the same plant functional groups are assumed to be constant among different species within the same group and are confounded by biomass production. Why is that the case? Furthermore, how many different species are characterizing each functional group? How are interspecific interactions accounted for (more specifically, see comment below)?

      In the trait-based approach we focus on just two functional traits, related to growth rate and tolerance to water stress, ignoring differences in other traits that distinguish species. That is, a given functional group consists of species that share the same values of the two selected functional traits (to a given precision determined by 𝑁), taking all other traits represented in the model to be equal. In this approach we do not care about how many species belong to each functional group, only their total biomass. We wish to add that simplifying assumptions of this kind are necessary if we want the model to be mathematically tractable and capable of providing deep insights by mathematical analysis.

      We expanded the discussion of the trait-based approach in the section Modeling spatial assembly of dryland plant communities and added relevant references (second paragraph).

      Finally, stress tolerance is purely phenomenological. There is no actual mechanism/parameter describing it. Rather, it "simply" appears as low/high mortality, which in turn is said to be due to high/low tolerance. This leads to a sort of circularity between mortality and tolerance. Yet, mortality can occur due to other biophysical factors (e.g. disturbance, fire, herbivory, pathogens). A drawback of this assumption is that a mechanism of drought tolerance is often to invest in belowground organs, including roots. However, according to the proposed model, it turns out that fast growing species with low investment in tolerance also have high investment in roots; vice versa, tolerant species have low investment in roots. This is a bit counterintuitive and not well biologically supported.

      First, we agree with the reviewer that our approach is purely phenomenological, as we model tolerance to water stress by a single parameter that lumps together the effects of various physiological mechanisms. That parameter can be distinguished from other factors affecting mortality by regarding the constant 𝑀𝑚𝑎𝑥 in Eq. (3) as representing several contributions. Since we do not study the effects of these other factors we can absorb them in 𝑀𝑚𝑎𝑥 for mathematical simplicity. Tolerance to water stress is not necessarily associated with roots. Plants can better tolerate water stress by reducing transpiration through stomatal closure, regulating leaf water potential, or develop hydraulically independent multiple stems that lead to a redundancy of independent conduits and higher resistance to drought (see Schenk et al. 2008 - https://doi.org/10.1073/pnas.0804294105).

      We added a discussion in the Methods section (5th paragraph, “Tolerance to water stress …”) of the simple form by which we model tolerance to water stress through the mortality parameter.

      2) Parameter choice.

      N = 128 is an extremely high number for plant functional groups. It is even quite unrealistic to have 128 species per square meter, so this value is not very reasonable. Please run the model and report results with more realistic N (e.g from 4-64) as well as with different sets of N values keeping all other parameters constant.

      We wish to clarify two points: 1) N=128 does not imply 128 functional groups per square meter; the emerging community has much lower functional richness (FR) as the average FR is around 0.25, meaning only 128 × 0.25 = 32 functional groups. 2) The model results, as reflected by the key metrics 𝜒𝑚𝑎𝑥, 𝐹𝑅, and 𝐹𝐸, are independent of the particular value of N (for N values sufficiently large), as Figures IA and IB below show. The biomass 𝐵𝑖 of each functional group, however, does change (Figure IA) because by changing N we change the range of traits Δ𝜒 = 1/𝑁 that belong to a given functional group. But if we look at the biomass density in trait space 𝑏𝑖, related to 𝐵𝑖 through the relation 𝐵𝑖 = 𝑏𝑖Δ𝜒, then also the biomass density is independent of 𝑁 as Figure IB shows. So, even if in practice there are fewer functional groups, and thus species, than considered in the model studies, the results are not affected by that. On the other hand, choosing higher 𝑁 values provides smoother curves and nicer presentation of our results.

      Figure IA

      Figure IB

      We added a discussion of this issue in the Methods section after Eq. (2).
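
      The scaling argument above can be checked with a short Python sketch. It does not reproduce Figures IA and IB and does not solve the model; it only discretizes a fixed, hypothetical bell-shaped density in trait space (the function `example_density` and its parameters are assumptions made for illustration) to show how 𝐵𝑖 = 𝑏𝑖Δ𝜒 with Δ𝜒 = 1/𝑁 behaves as N changes.

```python
import math

def discretize(density_fn, n_groups):
    """Split the trait axis [0, 1] into n_groups equal bins.

    Returns bin centers, the density b_i at each center, and the
    group biomass B_i = b_i * d_chi.
    """
    d_chi = 1.0 / n_groups
    chi = [(i + 0.5) * d_chi for i in range(n_groups)]
    density = [density_fn(x) for x in chi]
    group_biomass = [b * d_chi for b in density]
    return chi, density, group_biomass

def example_density(chi):
    # HYPOTHETICAL profile used only for illustration, not model output.
    return math.exp(-0.5 * ((chi - 0.3) / 0.05) ** 2)

for n in (32, 64, 128):
    _, density, group_biomass = discretize(example_density, n)
    # Peak B_i shrinks roughly as 1/N; peak density and total biomass do not.
    print(n, max(group_biomass), max(density), sum(group_biomass))

# Arithmetic from the reply: FR of about 0.25 with N = 128 groups.
groups_present = round(128 * 0.25)  # 32
```

      In this sketch, halving N roughly doubles the biomass per functional group, while the peak of the density and the total biomass stay essentially unchanged, which is the sense in which the results do not depend on N.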

      Gamma (rate of water uptake by plants' roots): why is it in that unit of m^2/kg * y? Why are you now considering the area (and not the volume) per biomass unit?

      The vegetation pattern formation model we study, like most other models of this kind, does not explicitly capture the soil depth dimension. Accordingly, 𝑊 is interpreted as the soil-water content in the soil volume below a unit ground area within the reach of the plant roots. In practice 𝑊 has units kg/m², like 𝐵, and since Γ𝑊𝐵 should have the same units as 𝜕𝑊/𝜕𝑡 (see Eq. 1b), Γ must have the units of (𝐵𝑡)⁻¹.
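
      Written out, the dimensional argument reads as follows, assuming time is measured in years as in the unit quoted by the reviewer:

```latex
\begin{align*}
[\Gamma W B] &= \left[\frac{\partial W}{\partial t}\right]
  \;\Rightarrow\;
  [\Gamma]\,\frac{\mathrm{kg}}{\mathrm{m}^{2}}\cdot\frac{\mathrm{kg}}{\mathrm{m}^{2}}
  = \frac{\mathrm{kg}}{\mathrm{m}^{2}\,\mathrm{y}} \\
[\Gamma] &= \frac{\mathrm{m}^{2}}{\mathrm{kg}\,\mathrm{y}}
  = \bigl([B]\,[t]\bigr)^{-1}
\end{align*}
```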

      A is not defined in the text.

      We now define it in Table 1 (see Methods section).

      M min: why 0.5 mortality? Having M max set to 0.9, please consider a lower mortality value set to 0.1, and please report evidence(hopefully) demonstrating the robustness of results to such change.

      The results are robust to the particular values of 𝑀𝑚𝑖𝑛 and 𝑀𝑚𝑎𝑥, except that there are combinations of these two parameters for which the biomass distributions are pushed towards the edge of the 𝜒 domain, which makes the presentation of the results less clear. Figure II shows results of recalculations of the distribution 𝐵 = 𝐵(𝜒) for 𝑀𝑚𝑖𝑛 = 0.1, as requested (using 𝑀𝑚𝑎𝑥 = 0.15) for 3 different precipitation values. As the reviewer can see, there is no qualitative change in the results: lower precipitation pushes a uniform community to stress-tolerant species (higher 𝜒), while the formation of patterns at yet lower precipitation pushes the community back to fast-growing species (low 𝜒).

      Figure II

      K_min and K_max are in two different units, and should both be kg/m^2.

      Thanks, we fixed this typo in Table 1.

      Values of precipitation (P, mean annual precipitation) are not reported.

      The precipitation parameter is variable, as is now stated in Table 1, and therefore was not included in the list of parameter values used. Whenever a particular precipitation value has been used our intention was to state it in the caption of the corresponding figure. This was done in Figs. 5,6,7, but indeed not in Fig. 4 (Fig. 3 in revised ms.). The insets on the right side of Fig. 3 (Fig. 4 in revised ms.) were also calculated for particular precipitation values, but that information is not essential as the intention is to show typical forms of the various solution branches, which do not qualitatively change along the branches (i.e. at different P values).

      We added the precipitation value (P=180mm/y) at which all the biomass distributions shown in new Fig. 3 (Fig. 4 in original ms) were calculated.

      3) Results presentation and interpretation.

      Parameter range of precipitation in figure 3 is odd. Why in one case precipitation ranges from 0 to 160 while in another it is only 60-120? Furthermore, in paragraph 198-213 and associated results in fig. 5. the Choice of precipitation values is somehow discordant from the previous model. Please provide motivation for this choice, clarify and uniformize it.

      In Fig. 3b (Fig. 4b in revised ms) we restricted the precipitation range to 60-120 as the curves, which are limited to 0 < 𝜒 < 1 (by the definition of 𝜒), do not extend to 𝑃 < 60 and to 𝑃 > 120. Extending the range to 0 < 𝑃 < 160 would make the figure less compact and nice as it will contain blank parts with no information.

      We are not sure we understand what the reviewer means by “is somehow discordant from the previous model”. The motivation of the choices we made for the precipitation values P=150, 100 and 80 was to show the shift of a spatially uniform community to a higher 𝜒 value as the precipitation is decreased to a lower value (from 150 to 100), and the shift back to a lower 𝜒 value at yet lower precipitation (80) past the Turing instability.

      Finally, authors seem to create confusion around community composition, which is defined as the (taxonomic) identity of all different species inhabiting a community. Notably, it is remarkably different from the x_max parameter used in the model, which as a matter of fact is just the value of the most productive (notably, not necessarily the most abundant) functional group.

      We thank the reviewer for this comment. Since all the emerging communities in the model studies are pretty localized around the value of 𝜒𝑚𝑎𝑥, that value does contain information about the identity of other functional groups in the community when complemented by FR (functional richness) and FE (functional evenness). More significantly to our study, shifts in 𝜒𝑚𝑎𝑥 represent the shifts in community composition we focus on in this study, i.e. shifts towards fast growing species or towards stress-tolerant species.

      We modified the description of the community-level properties that can be derived from the biomass distribution in trait space (see modified text towards the end of the section “Modeling spatial assembly …” and also the caption of Fig. 3b), explaining that both functional diversity and community composition can be described by several metrics, and clarifying the significance of 𝜒𝑚𝑎𝑥 in describing community-composition shifts.