  1. Oct 2025
    1. Author Response

      We are grateful for the constructive comments of the reviewers. Here is a provisional response to major questions.

      To Question 1, we appreciate your pointing out that, in the pan-neuronal knockout of PDFR by unmodified Cas9 (Fig 2H-2I in the previous manuscript), morning anticipation still exists at some level (Fig a), and that the decrease in the morning anticipation index (Fig b) and the advance of evening activity were not as pronounced as those observed in han5304 (Fig 3C, Hyun et al., 2005). Our response is that the difference between the pan-neuronal knockout of PDFR by unmodified Cas9 and han5304 might be caused by the limited efficiency of unmodified Cas9 in our conditional system. We will adjust the relevant conclusions in the revised version; these findings underscore the need to enhance the efficiency of the original Cas9.

      Author response image 1.

      To Question 2, that some expression profiles of clock neurons are not consistent with previous reports, such as Dh31 and ChAT in s-LNvs, our response is that the differences can be attributed to the variation in expression patterns between 3’ terminal KI-LexA (used in this gene expression dissection) and KO-GAL4, KI-GAL4, or transgenic GAL4. We have indeed observed differences when identical sites were inserted in frame with Gal4 or LexA.

      To Question 3, that our description of advanced morning anticipation versus no morning anticipation with the term "opposite" is not accurate enough, our response is that we will modify that. Mutants of CNMa or CNMaR exhibit advanced morning activity, suggesting an inhibitory role of CNMa/CNMaR. Mutants of Pdf/Pdfr, on the other hand, showed no morning anticipation, indicating a promoting role in morning anticipation.

      To Question 4, whether we have generated transgenic UAS-sgRNA flies for all CCT genes or only a subset, our response is that we have indeed generated UAS-sgRNA flies for all CCT genes.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In their previous publication (Dong et al. Cell Reports 2024), the authors showed that citalopram treatment resulted in reduced tumor size by binding to the E380 site of GLUT1 and inhibiting the glycolytic metabolism of HCC cells, instead of the classical citalopram receptor. Given that C5aR1 was also identified as the potential receptor of citalopram in the previous report, the authors focused on exploring the potential of the immune-dependent anti-tumor effect of citalopram via C5aR1. C5aR1 was found to be expressed on tumor-associated macrophages (TAMs) and citalopram administration showed potential to improve the stability of C5aR1 in vitro. Through macrophage depletion and adoptive transfer approaches in HCC mouse models, the data demonstrated the potential importance of C5aR1-expressing macrophages in the anti-tumor effect of citalopram in vivo. Mechanistically, their in vitro data suggested that citalopram may regulate the phagocytosis potential and polarization of macrophages through C5aR1. Next, they tried to investigate the direct link between citalopram and CD8+T cells by including an additional MASH-associated HCC mouse model. Their data suggest that citalopram may upregulate the glycolytic metabolism of CD8+T cells, probably via GLUT3 but not GLUT1-mediated glucose uptake. Lastly, as the systemic 5-HT level is down-regulated by citalopram, the authors analyzed the association between a low 5-HT and a superior CD8+T cell function against a tumor. Although the data are informative, the rationale for working on additional mechanisms and the logical links among different parts are not clear. In addition, some of the conclusions are also not fully supported by the current data.

      Thank you very much for your insightful evaluation and constructive suggestions. We have thoroughly studied the comments, and a provisional point-by-point response follows.

      Strengths:

      The idea of repurposing drugs already in clinical use shows great potential for immediate clinical translation. The data here suggest that the antidepressant citalopram plays an immune regulatory role on TAMs via a new target, C5aR1, in HCC.

      Thank you for your constructive comments. We believe that further investigation into the mechanisms by which citalopram modulates TAM function could provide valuable insights into its potential role in HCC therapy.

      Weaknesses:

      (1) The authors concluded that citalopram had a 'potential immune-dependent effect' based on the tumor weight difference between Rag-/- and C57 mice in Figure 1. However, tumor weight differences may also be attributed to a non-immune regulatory pathway. In addition, how do the authors calculate relative tumor weight? What is the rationale for using relative one but not absolute tumor weight to reflect the anti-tumor effect?

      We appreciate your insights into the potential contributions of non-immune regulatory pathways to the observed tumor weight differences between Rag-/- and C57 mice, and we will further address this issue in our discussion. The relative tumor weight was calculated by assigning an arbitrary value of 1 to the Rag1<sup>-/-</sup> mice in the DMSO treatment group, with all other tumor weights expressed relative to this baseline. As suggested, we will include absolute tumor weight data in our revised manuscript.

      (2) The authors used shSlc6a4 tumor cell lines to demonstrate that citalopram's effects are independent of the conventional SERT receptor (Figure 1C-F). However, this does not entirely exclude the possibility that SERT may still play a role in this context, as it can be expressed in other cells within the tumor microenvironment. What is the expression profiling of Slc6a4 in the HCC tumor microenvironment? In addition, in Figure 1F, the tumor growth of shSlc6a4 in C57 mice displayed a decreased trend, suggesting a possible role of Slc6a4.

      To identify the expression patterns of Slc6a4 in different cellular contexts within the HCC tumor microenvironment, we will conduct a thorough screening of HCC datasets that include single-cell sequencing analysis. The possible role of Slc6a4 on tumor growth will be verified with in vitro loss-of-function experiments.

      (3) Why did the authors choose to study phagocytosis in Figures 3G-H? As an important player, TAM regulates tumor growth via various mechanisms.

      Thank you for your question. We focused on this aspect because citalopram targets C5aR1-expressing TAM. C5aR1 is a receptor for complement component C5a, and complement components play a significant role in mediating the phagocytosis process in macrophages. In the revised manuscript, we will emphasize this rationale clearly.

      (4) The information on unchanged deposition of C5a has been mentioned in this manuscript (Figures 3D and 3F), the authors should explain further in the manuscript, for example, C5a could bind to receptors other than C5aR1 and/or C5a bind to C5aR1 by different docking anchors compared with citalopram.

      Thank you for your insightful comment. First, we will investigate the docking anchors involved in the binding of C5a to C5aR1 and compare these interactions with those of C5aR1 and citalopram. Additionally, we will discuss the potential binding of C5a to other receptors, providing a broader perspective on the signaling mechanisms.

      (5) Figure 3I-M - the flow cytometry data suggested that citalopram treatment altered the proportions of total TAM, M1 and M2 subsets, CD4+ and CD8+T cells, DCs, and B cells. Why does the author conclude that the enhanced phagocytosis of TAM was one of the major mechanisms of citalopram? As the overall TAM number was regulated, the contribution of phagocytosis to tumor growth may be limited.

      As suggested, we will restate the conclusion to enhance clarity and better articulate the relationship between citalopram treatment, TAM populations, and their phagocytic activity. Thank you for your valuable input.

      (6) Figure 4 - what is the rationale for using the MASH-associated HCC mouse model to study metabolic regulation in CD8+T cells? The tumor microenvironment and tumor growth would be quite different. In addition, how does this part link up with the mechanisms related to C5aR1 and TAM? The authors also brought GLUT1 back in the last part and focused on CD8+T cell metabolism, which was totally separated from previous data.

      We chose the MASH-associated HCC mouse model because it closely mimics the etiology of metabolic-associated fatty liver disease (MAFLD), which is a significant contributor to the development of cirrhosis and HCC. The inclusion of CD8<sup>+</sup> T cells in our study is based on the understanding that citalopram targets GLUT1, which plays a crucial role in glucose uptake. CD8<sup>+</sup> T cell function is heavily reliant on glycolytic metabolism, making it essential to investigate how citalopram’s effects on GLUT1 influence the metabolic pathways and functionality of these immune cells. The data presented in this section primarily aim to demonstrate how citalopram influences peripheral 5-HT levels, which subsequently affects CD8<sup>+</sup> T cell functionality. By linking these findings, we will clarify how citalopram impacts both TAM and CD8<sup>+</sup> T cells. In the revised manuscript, we will enhance the background information and provide relevant data support to avoid any gaps.

      (7) Figure 5, the authors illustrated their mechanism that citalopram regulates CD8+T cell anti-tumor immunity through proinflammatory TAM with no experimental evidence. Using only CD206 and MHCII to represent TAM subsets obviously is not sufficient.

      As suggested, more relevant experimental data will be included in the revised manuscript to better characterize the TAM populations and their roles in mediating the effects of citalopram on CD8<sup>+</sup> T cells.
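
      The relative tumor weight described in the response to point (1) can be illustrated with a short sketch. It assumes that the baseline is the mean weight of the reference group (Rag1<sup>-/-</sup>, DMSO), which the response does not state explicitly; the function name, group labels, and weights below are hypothetical and are not data from the study.

```python
from statistics import mean


def relative_tumor_weights(weights_by_group, baseline_group):
    """Divide every tumor weight by the mean weight of the baseline group.

    Assumption: "assigning a value of 1" to the baseline group means
    normalizing to that group's mean, so its relative mean becomes 1.
    """
    baseline = mean(weights_by_group[baseline_group])
    if baseline <= 0:
        raise ValueError("baseline mean must be positive")
    return {
        group: [w / baseline for w in weights]
        for group, weights in weights_by_group.items()
    }


# Hypothetical absolute weights in grams, for illustration only.
example = {
    "Rag1-/- DMSO": [1.0, 2.0, 3.0],
    "Rag1-/- citalopram": [1.0, 0.5],
    "C57 DMSO": [1.5],
}
relative = relative_tumor_weights(example, "Rag1-/- DMSO")
```

      Because every value is divided by the same constant, group ratios are preserved, but between-experiment differences in absolute tumor burden are hidden, which is why reporting absolute weights alongside the relative ones is informative.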

      Reviewer #2 (Public review):

      Summary:

      Dong et al. present a thorough investigation into the potential of repurposing citalopram, an SSRI, for hepatocellular carcinoma (HCC) therapy. The study highlights the dual mechanisms by which citalopram exerts anti-tumor effects: reprogramming tumor-associated macrophages (TAMs) toward an anti-tumor phenotype via C5aR1 modulation and suppressing cancer cell metabolism through GLUT1 inhibition while enhancing CD8+ T cell activation. The findings emphasize the potential of drug repurposing strategies and position C5aR1 as a promising immunotherapeutic target. However, certain aspects of experimental design and clinical relevance could be further developed to strengthen the study's impact.

      Thank you for your thoughtful review and constructive feedback, and we look forward to improving our manuscript accordingly.

      Strength:

      It provides detailed evidence of citalopram's non-canonical action on C5aR1, demonstrating its ability to modulate macrophage behavior and enhance CD8+ T cell cytotoxicity. The use of DARTS assays, in silico docking, and gene signature network analyses offers robust validation of drug-target interactions. Additionally, the dual focus on immune cell reprogramming and metabolic suppression presents a thorough strategy for HCC therapy. By emphasizing the potential for existing drugs like citalopram to be repurposed, the study also underscores the feasibility of translational applications.

      Your insights reinforce the significance of our findings, and we will ensure that these points are clearly articulated in the revised manuscript to enhance its impact.

      Major weaknesses/suggestions:

      The dataset and signature database used for GSEA analyses are not clearly specified, limiting reproducibility. The manuscript does not fully explore the potential promiscuity of citalopram's interactions across GLUT1, C5aR1, and SERT1, which could provide a deeper understanding of binding selectivity. The absence of GLUT1 knockdown or knockout experiments in macrophages prevents a complete assessment of GLUT1's role in macrophage versus tumor cell metabolism. Furthermore, there is minimal discussion of clinical data on SSRI use in HCC patients. Incorporating survival outcomes based on SSRI treatment could strengthen the study's translational relevance.

      By addressing these limitations, the manuscript could make an even stronger contribution to the fields of cancer immunotherapy and drug repurposing.

      We appreciate your valuable suggestions. As suggested, we will take the following actions:

      (1) GSEA analysis: we will clearly specify the datasets and signature databases used for the GSEA in the revised manuscript.

      (2) Exploration of binding selectivity: we recognize the importance of exploring the potential promiscuity of citalopram’s interactions across GLUT1, C5aR1, and SERT1. As suggested, we will include a more detailed analysis of these interactions, which will help elucidate binding selectivity and its implications for therapeutic outcomes.

      (3) GLUT1 knockdown in macrophages: to address the gap in our assessment of GLUT1’s role in macrophages, we will incorporate GLUT1 knockdown or knockout experiments in macrophages upon citalopram treatment. Moreover, a DARTS assay for GLUT1 in THP-1 cells will be conducted.

      (4) Clinical data on SSRI use in HCC patients: Related data have been reported previously in PMID: 39388353 (Cell Rep. 2024 Oct 22;43(10):114818.). As detailed below:

      “SSRIs use is associated with reduced disease progression in HCC patients

      We determined whether SSRIs for alleviating HCC are supported by real-world data. A total of 3061 patients with liver cancer were extracted from the Swedish Cancer Register. Among them, 695 patients had been administrated with post-diagnostic SSRIs. The Kaplan-Meier survival analysis suggested that patients who utilized SSRIs exhibited a significantly improved metastasis-free survival compared to those who did not use SSRIs, with a P value of log-rank test at 0.0002. Cox regression analysis showed that SSRI use was associated with a lower risk of metastasis (HR = 0.78; 95% CI, 0.62-0.99).”

      Author response image 1.

    1. Author Response

      eLife assessment

      The authors' finding that PARG hydrolase removal of polyADP-ribose (PAR) protein adducts generated in response to the presence of unligated Okazaki fragments is important for S-phase progression is potentially valuable, but the evidence is incomplete, and identification of relevant PARylated PARG substrates in S-phase is needed to understand the role of PARylation and dePARylation in S-phase progression. Their observation that human ovarian cancer cells with low levels of PARG are more sensitive to a PARG inhibitor, presumably due to the accumulation of high levels of protein PARylation, suggests that low PARG protein levels could serve as a criterion to select ovarian cancer patients for treatment with a PARG inhibitor drug.

      Thank you for the assessment and summary. Please see below for details as we have now addressed the deficiencies pointed out by the reviewers.

      We believe that PARP1 is one of the major relevant PARG substrates in S phase cells. Previous studies reported that PARP1 recognizes unligated Okazaki fragments and induces S phase PARylation, which recruits single-strand break repair proteins such as XRCC1 and LIG3 that act as a backup pathway for Okazaki fragment maturation (Hanzlikova et al., 2018; Kumamoto et al., 2021). In this study, we revealed that accumulation of PARP1/2-dependent S phase PARylation eventually led to cell death (Fig. 2). Furthermore, we found that chromatin-bound PARP1 as well as PARylated PARP1 increased in PARG KO cells (Fig. S4A and Fig. 4A), suggesting that PARP1 is one of the key substrates of PARG in S phase cells. Of course, PARG may have additional substrates besides PARP1 which are required for its roles in S phase progression, as PARG is known to be recruited to DNA damage sites through pADPr- and PCNA-dependent mechanisms (Mortusewicz et al., 2011). Precisely how PARG regulates S phase progression warrants further investigation.

      Reviewer #1 (Public Review):

      I have a major conceptual problem with this manuscript: How can the full deletion of a gene (PARG) sensitize a cell to further inhibition by its chemical inhibitor (PARGi) since the target protein is fully absent?

      Please see below for details about this point. Briefly, we found that PARG is an essential gene (Fig. 7). There was residual PARG activity in our PARG KO cells, although the loss of full-length PARG was confirmed by Western blotting and DNA sequencing (Fig. S9). The residual PARG activity in these cells can be further inhibited by the PARG inhibitor, which eventually leads to cell death.

      The authors state in the discussion section: "The residual PARG dePARylation activity observed in PARG KO cells likely supports cell growth, which can be further inhibited by PARGi". What does this statement mean? Is the authors' conclusion that their PARG KOs are not true KOs but partial hypomorphic knockdowns? Were the authors working with KO clones or CRISPR deletion in populations of cells?

      The reviewer is correct that our PARG KOs are not true KOs. We were working with CRISPR edited KO clones. As shown in this manuscript, we validated our KO clones by Western blotting, DNA sequencing and MMS-induced PARylation. Despite these efforts and our inability to detect full-length PARG in our KO clones, we suspect that our PARG KO cells may still express one or more active fragments of PARG due to alternative splicing and/or alternative ATG usage.

      As shown in Fig. 7, we believe that PARG is essential for proliferation. Our initial KO cell lines are not complete PARG KO cells and residual PARG activity in these cells could support cell proliferation. Unfortunately, due to lack of appropriate reagents we could not draw solid conclusions regarding the isoforms or the truncated PARG expressed in these cells (Please see Western blots below).

      Are there splice variants of PARG that were not knocked down? Are there PARP paralogues that can complement the biochemical activity of PARG in the PARG KOs? The authors do not discuss these critical issues nor engage with this problem.

      There are five reviewed or potential PARG isoforms identified in the Uniprot database. The sgRNAs used to generate initial PARG KO cells in this manuscript target all three catalytically active isoforms (isoforms 1, 2 and 3), while isoforms 4 and 5 are considered catalytically inactive according to the Uniprot database. However, it is likely that sgRNA-mediated genome editing may lead to the creation of new alternatively spliced PARG mRNAs or the use of an alternative ATG, which can produce catalytically active forms of PARG. Instead of searching for these putative spliced PARG RNAs, we used two independent antibodies that recognize the C-terminus of PARG for WB as shown in Author response image 1. Unfortunately, besides full-length PARG, these antibodies also recognized several other bands, some of which were reduced or absent in PARG KO cells, while others were not. Thus, we could not draw a clear conclusion as to which functional isoform was expressed in our PARG KO cells. Nevertheless, we directly measured PARG activity in PARG KO cells (Fig. S9) and showed that we were still able to detect residual PARG activity in these PARG KO cells. These data clearly indicate that residual PARG activity is present in our KO cells, but the precise nature of these truncated forms of PARG remains elusive.

      Author response image 1.

      These issues have to be dealt with upfront in the manuscript for the reader to make sense of their work.

      We thank this reviewer for his/her constructive comments and suggestions. We will include the data above and additional discussion upfront in our revised manuscript to avoid any further confusion by our readers.

      Reviewer #2 (Public Review):

      Summary:

      In this manuscript, Nie et al investigate the effect of PARG KO and PARG inhibition (PARGi) on pADPR, DNA damage, cell viability, and synthetic lethal interactions in HEK293A and Hela cells. Surprisingly, the authors report that PARG KO cells are sensitive to PARGi and show higher pADPR levels than PARG KO cells, which are abrogated upon deletion or inhibition of PARP1/PARP2. The authors explain the sensitivity of PARG KO to PARGi through incomplete PARG depletion and demonstrate complete loss of PARG activity when incomplete PARG KO cells are transfected with additional gRNAs in the presence of PARPi. Furthermore, the authors show that the sensitivity of PARG KO cells to PARGi is not caused by NAD depletion but by S-phase accumulation of pADPR on chromatin coming from unligated Okazaki fragments, which are recognized and bound by PARP1. Consistently, PARG KO or PARG inhibition shows synthetic lethality with Pol beta, which is required for Okazaki fragment maturation. PARG expression levels in ovarian cancer cell lines correlate negatively with their sensitivity to PARGi.

      Thank you for your nice comments. The complete loss of PARG activity was observed in PARG complete/conditional KO (cKO) cells. These cKO clones were generated using wild-type cells transfected with sgRNAs targeting the catalytic domain of PARG in the presence of PARP inhibitor.

      Strengths:

      The authors show that PARG is essential for removing ADP-ribosylation in S-phase.

      Thanks!

      Weaknesses:

      1) This begs the question as to the relevant substrates of PARG in S-phase, which could be addressed, for example, by analysing PARylated proteins associated with replication forks in PARG-depleted cells (EdU pulldown and Af1521 enrichment followed by mass spectrometry).

      We believe that PARP1 is one of the major relevant PARG substrates in S phase cells. Previous studies reported that PARP1 recognizes unligated Okazaki fragments and induces S phase PARylation, which recruits single-strand break repair proteins such as XRCC1 and LIG3 that act as a backup pathway for Okazaki fragment maturation (Hanzlikova et al., 2018; Kumamoto et al., 2021). In this study, we revealed that accumulation of PARP1/2-dependent S phase PARylation eventually led to cell death (Fig. 2). Furthermore, we found that chromatin-bound PARP1 as well as PARylated PARP1 increased in PARG KO cells (Fig. S4A and Fig. 4A), suggesting that PARP1 is one of the key substrates of PARG in S phase cells. Of course, PARG may have additional substrates besides PARP1 which are required for its roles in S phase progression, as PARG is known to be recruited to DNA damage sites through pADPr- and PCNA-dependent mechanisms (Mortusewicz et al., 2011). Precisely how PARG regulates S phase progression warrants further investigation.

      2) The results showing the generation of a full PARG KO should be moved to the beginning of the Results section, right after the first Results chapter (PARG depletion leads to drastic sensitivity to PARGi), otherwise, the reader is left to wonder how PARG KO cells can be sensitive to PARGi when there should be presumably no PARG present.

      Thank you for your suggestion! However, we would like to keep the complete PARG KO result at the end of the Results section, since this was how this project evolved. Initially, we did not know that PARG is an essential gene. Thus, we speculated that PARGi may target not only PARG but also a second target, which only becomes essential in the absence of PARG. To test this possibility, we performed FACS-based and cell survival-based whole-genome CRISPR screens (Fig. 5). However, this putative second target was not revealed by our CRISPR screening data (Fig. 5). We then tested the possibility that these cells may have residual PARG expression or activity and only cells with very low PARG expression are sensitive to PARGi, which turned out to be the case for ovarian cancer cells. Equipped with PARP inhibitor and sgRNAs targeting the catalytic domain of PARG, we finally generated cells with complete loss of PARG activity to prove that PARG is an essential gene (Fig. 7). This series of experiments underscores the challenge of validating any KO cell line, i.e. the identification of frame-shift mutations, absence of full-length proteins, and phenotypic changes may still not be sufficient to validate KO clones. This is an important lesson we learned and we would like to share it with the scientific community.

      To avoid further misunderstanding, we will include additional statements/comments at the end of the “PARG depletion leads to drastic sensitivity to PARGi” section and at the beginning of “CRISPR screens reveal genes responsible for regulating pADPr signaling and/or cell lethality in WT and PARG KO cells”. We hope that our revised manuscript will make this clear.

      3) Please indicate in the first figure which isoforms were targeted with gRNAs, given that there are 5 PARG isoforms. You should also highlight that the PARG antibody only recognizes the largest isoform, which is clearly absent in your PARG KO, but other isoforms may still be produced, depending on where the cleavage sites were located.

      The sgRNAs used to generate PARG KO cells in this manuscript target all three catalytically active isoforms (isoforms 1, 2 and 3), while isoforms 4 and 5 are considered catalytically inactive according to the Uniprot database. As suggested, we will modify Fig. S1D and the figure legends.

      Although the manufacturer's instructions state that the anti-PARG antibody (66564S) recognizes only isoform 1, this antibody could also recognize isoforms 2 and 3, albeit weakly, based on Western blot results with lysates prepared from PARG cKO cells reconstituted with different PARG isoforms, as shown below. As suggested, we will add a statement in the revised manuscript and provide the Western blotting data in Author response image 2.

      Author response image 2.

      To test whether other isoforms were expressed in 293A and/or HeLa cells, we used two independent antibodies that recognize the C-terminus of PARG for WB as shown in Author response image 3. Unfortunately, besides full-length PARG, these antibodies also recognized several other bands, some of which were reduced or absent in PARG KO cells, while others were not. Thus, we could not draw a clear conclusion as to which functional isoforms or truncated forms were expressed in our PARG KO cells.

      Author response image 3.

      4) FACS data need to be quantified. Scatter plots can be moved to Supplementary while quantification histograms with statistical analysis should be placed in the main figures.

      We agree with this reviewer that quantification of FACS data may provide straightforward results for some of our data. However, it is challenging to quantify positive S phase pADPr signaling in some panels, for example in Fig. 3A and Fig. 4C. In both panels, pADPr signaling was detected throughout the cell cycle and therefore it is difficult to know the percentage of S phase pADPr signaling in these samples. Thus, we decided to keep the scatter plots to demonstrate the dramatic and S phase-specific pADPr signaling in PARG KO cells treated with PARGi. We hope that these data are clear and convincing even without any quantification.

      5) All colony formation assays should be quantified and sensitivity plots should be shown next to example plates.

      As suggested, we will include the sensitivity plot next to Fig. 3D. However, other colony formation assays in this study were performed with a single concentration of inhibitor and therefore we will not provide sensitivity plots for these experiments. Nevertheless, the results of these experiments are straightforward and easy to interpret.

      6) Please indicate how many times each experiment was performed independently and include statistical analysis.

      As suggested, we will add this information in the revised manuscript.

      Reviewer #3 (Public Review):

      Here the authors carried out a CRISPR/sgRNA screen with a DDR gene-targeted mini-library in HEK293A cells looking for genes whose loss increased sensitivity to treatment with the PARG inhibitor, PDD00017273 (PARGi). Surprisingly they found that PARG itself, which encodes the cellular poly(ADP-ribose) glycohydrolase (dePARylation) enzyme, was a major hit. Targeted PARG KO in 293A and HeLa cells also caused high sensitivity to PARGi. When PARG KO cells were reconstituted with catalytically-dead PARG, MMS treatment caused an increase in PARylation, not observed when cells were reconstituted with WT PARG or when the PARG KO was combined with PARP1/2 DKO, suggesting that loss of PARG leads to a strong PARP1/2-dependent increase in protein PARylation. The decrease in intracellular NAD+, the substrate for PARP-driven PARylation, observed in PARG KO cells was reversed by treatment with NMN or NAM, and this treatment partially rescued the PARG KO cell lethality. However, since NAD+ depletion with the FK866 nicotinamide phosphoribosyltransferase (NAMPT) inhibitor did not induce a similar lethality the authors concluded that NAD+ depletion/reduction was only partially responsible for the PARGi toxicity. Interestingly, PARylation was also observed in untreated PARG KO cells, specifically in S phase, without a significant rise in γH2AX signals. Using cells synchronized at G1/S by double thymidine blockade and release, they showed that entry into S phase was necessary for PARGi to induce PARylation in PARG KO cells. They found an increased association of PARP1 with a chromatin fraction in PARG KO cells independent of PARGi treatment, and suggested that PARP1 trapping on chromatin might account in part for the increased PARGi sensitivity. They also showed that prolonged PARGi treatment of PARG KO cells caused S phase accumulation of pADPr eventually leading to DNA damage, as evidenced by increased anti-γH2AX antibody signals and alkaline comet assays. Based on the use of emetine, they deduced that this response could be caused by unligated Okazaki fragments. Next, they carried out FACS-based CRISPR screens to identify genes that might be involved in cell lethality in WT and PARG KO cells, finding that loss of base excision repair (BER) and DNA repair genes led to increased PARylation and PARGi sensitivity, whereas loss of PARP1 had the opposite effects. They also found that BER pathway disruption exhibited synthetic lethality with PARGi treatment in both PARG KO cells and WT cells, and that loss of genes involved in Okazaki fragment ligation induced S phase pADPr signaling. In a panel of human ovarian cancer cell lines, PARGi sensitivity was found to correlate with low levels of PARG mRNA, and they showed that the PARGi sensitivity of cells could be reduced by PARPi treatment. Finally, they addressed the conundrum of why PARG KO cells should be sensitive to a specific PARG inhibitor if there is no PARG to inhibit and found that the PARG KO cells had significant residual PARG activity when measured in a lysate activity assay, which could be inhibited by PARGi, although the inhibited PARG activity levels remained higher than those of PARG cKO cells (see below). This led them to generate new, more complete PARG KO cells they called complete/conditional KO (cKO), whose survival required the inclusion of the olaparib PARPi in the growth medium. These PARG cKO cells exhibited extremely low levels of PARG activity in vitro, consistent with a true PARG KO phenotype.

      We thank this reviewer for his/her constructive comments and suggestions.

      The finding that human ovarian cancer cells with low levels of PARG are more sensitive to inhibition with a small molecule PARG inhibitor, presumably due to the accumulation of high levels of protein PARylation (pADPr) that are toxic to cells is quite interesting, and this could be useful in the future as a diagnostic marker for preselection of ovarian cancer patients for treatment with a PARG inhibitor drug. The finding that loss of base excision repair (BER) and DNA repair genes led to increased PARylation and PARGi sensitivity is in keeping with the conclusion that PARG activity is essential for cell fitness, because it prevents excessive protein PARylation. The observation that increased PARylation can be detected in an unperturbed S phase in PARG KO cells is also of interest. However, the functional importance of protein PARylation at the replication fork in the normal cell cycle was not fully investigated, and none of the key PARylation targets for PARG required for S phase progression were identified. Overall, there are some interesting findings in the paper, but their impact is significantly lessened by the confusing way in which the paper has been organized and written, and this needs to be rectified.

      We believe that PARP1 is one of the major relevant PARG substrates in S phase cells. Previous studies reported that PARP1 recognizes unligated Okazaki fragments and induces S phase PARylation, which recruits single-strand break repair proteins such as XRCC1 and LIG3, which act as a backup pathway for Okazaki fragment maturation (Hanzlikova et al., 2018; Kumamoto et al., 2021). In this study, we revealed that accumulation of PARP1/2-dependent S phase PARylation eventually led to cell death (Fig. 2). Furthermore, we found that chromatin-bound PARP1 as well as PARylated PARP1 increased in PARG KO cells (Fig. S4A and Fig. 4A), suggesting that PARP1 is one of the key substrates of PARG in S phase cells. Of course, PARG may have additional substrates besides PARP1 that are required for its roles in S phase progression, as PARG is known to be recruited to DNA damage sites through pADPr- and PCNA-dependent mechanisms (Mortusewicz et al., 2011). Precisely how PARG regulates S phase progression warrants further investigation.

      As suggested, we will revise our manuscript accordingly and provide additional explanation/statement upfront to avoid any misunderstandings.

    1. Author Response:

      Reviewer #1 (Public review):

      The authors of this study use electron microscopy and 3D reconstruction techniques to study the morphology of distinct classes of Drosophila sensory neurons *across many neurons of the same class.* This is a comprehensive study attempting to look at nearly all the sensory neurons across multiple sensilla to determine a) how much morphological variability exists between and within neurons of different and similar sensory classes, and b) which dendritic features may have evolved to support particular sensory functions. This study builds upon the authors' previous work, which allowed them to identify and distinguish sensory neuron subtypes in the EM volumes without additional staining so that reconstructed neurons could reliably be placed in the appropriate class. This work is unique in looking at a large number of individual neurons of the same class to determine what is consistent and what is variable about their class-specific morphologies.

      This means that in addition to providing specific structural information about these particular cells, the authors explore broader questions of how much morphological diversity exists between sensory neurons of the same class and how different dendritic morphologies might affect sensory and physiological properties of neurons.

      The authors found that CO2-sensing neurons have an unusual, sheet-like morphology in contrast to the thin branches of odor-sensing neurons. They show that this morphology greatly increases the surface area to volume ratio above what could be achieved by modest branching of thin dendrites, and posit that this might be important for their sensory function, though this was not directly tested in their study. The study is mainly descriptive in nature, but thorough, and provides a nice jumping-off point for future functional studies. One interesting future analysis could be to examine all four cell types within a single sensillum together to see if there are any general correlations that could reveal insights about how morphology is determined and the relative contributions of intrinsic mechanisms vs interactions with neighboring cells. For example, if higher than average branching in one cell type correlated with higher than average branching in another type in the same sensillum, this might suggest higher extracellular growth or branching cues within that sensillum. Conversely, if higher branching in one cell type consistently leads to reduced length or branching in another, this might point to dendrite-dendrite interactions between cells undergoing competitive or repulsive interactions to define territories within each sensillum as a major determinant of the variability.

      We thank the reviewer for the insightful comments and appreciation for our study.

      Reviewer #2 (Public review):

      Summary:

      The manuscript employs serial block‐face electron microscopy (SBEM) and cryofixation to obtain high‐resolution, three‐dimensional reconstructions of Drosophila antennal sensilla containing olfactory receptor neurons (ORNs) that detect CO2. This method has been used previously by the same lab in Gonzales et al., 2021 (https://elifesciences.org/articles/69896), which had provided an exemplary model by integrating high-resolution EM with electrophysiology and cell-type-specific labeling.

      We thank the reviewer for expressing appreciation for our published study.

      The previous study ended up correlating morphology with activity for multiple olfactory sensillar types. Compared to the 2021 study, this current manuscript appears somewhat incomplete and lacks integration with activity.

      We thank the reviewer for their feedback. However, we would like to clarify that our previous study did not correlate morphology with activity to a greater extent than the current study. Both employed the same cryofixation, SBEM-based approach without recording odor-induced activity, but the focus of the current work is fundamentally different. While the previous study examined multiple sensillum types, the current study concentrates on a single sensillum type to address a distinct biological question regarding morphological heterogeneity. We appreciate the opportunity to clarify this distinction, and we hope that the revised manuscript more clearly conveys the unique scope and contributions of this study.

      In fact, older studies have also reported two-dimensional TEM images of the putative CO2 neuron in Drosophila (Shanbhag et al., 1999) and in mosquitoes (McIver and Siemicki, 1975; Lu et al., 2007), and in these instances reported that the dendritic architecture of the CO2 neuron was somewhat different (circular and flattened, lamellated) from other olfactory neurons.

      We thank the reviewer for pointing this out. As noted in both the Introduction and Discussion sections, previous studies—including those cited by the reviewer—suggested that CO2-sensing neurons may have a distinct dendritic morphology. However, those earlier studies lacked the means to definitively link the observed morphology to CO2 neuron identity.

      In contrast, our study assigns neuronal identity based on quantitative morphometric measurements, allowing us to confidently associate the unique dendritic architecture with CO2 neurons. Furthermore, we extend previous observations by providing full 3D reconstructions and nanoscale morphometric analyses, offering a much more comprehensive and definitive characterization of these neurons. We believe this represents a significant advancement over earlier work.

      The authors claim that this approach offers an artifact‐minimized ultrastructural dataset compared to earlier work. In this study, not only do they confirm this different morphology but also classify it into distinct subtypes (loosely curled, fully curled, split, and mixed). This detailed morphological categorization was not provided in prior studies (e.g., Shanbhag et al., 1999).

      We thank the reviewer for acknowledging the significance of our study.

      The authors would benefit from providing quantitative thresholds or objective metrics to improve reproducibility and to clarify whether these structural distinctions correlate with distinct functional roles.

      We thank the reviewer for raising this point. However, we would like to clarify that assigning neurons to strict morphological subtypes was not the primary aim of our study. In practice, dendritic architectures can be highly complex, with individual neurons often displaying features characteristic of multiple subtypes. This is precisely why we included a “mixed” subtype category—to acknowledge and capture this morphological heterogeneity rather than impose rigid classification boundaries.

      Our intent in defining subtypes was not to imply discrete functional classes, but rather to highlight the range of morphological variation observed across ab1C neurons. While we agree that exploring potential correlations between structure and function is an important future direction, the current study focuses on characterizing this diversity using 3D reconstruction and morphometric analysis. We hope this clarifies the purpose and scope of our morphological categorization.

      Strengths:

      The study makes a convincing case that ab1C neurons exhibit a unique, flattened dendritic morphology unlike the cylindrical dendrites found in ab1D neurons. This observation extends previous qualitative TEM findings by not only confirming the presence of flattened lamellae in CO₂ neurons but also quantifying key morphometrics such as dendritic length, surface area, and volume, and calculating surface area-to-volume ratios. The enhanced ratios observed in the flattened segments are speculated to be linked to potential advantages in receptor distribution (e.g., Gr21a/Gr63a) and efficient signal propagation.

      We thank the reviewer for appreciating the significance of our current study.
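
      The geometric intuition behind the surface area-to-volume comparison discussed above can be illustrated with a minimal sketch. It is not the morphometric pipeline used in the study: it idealizes a cylindrical dendrite and a flattened sheet as simple solids, ignores end caps, and uses hypothetical dimensions (a 0.1 µm radius and a 0.05 µm sheet thickness) chosen only for illustration. The two function names are likewise assumptions introduced here.

```python
import math


def cylinder_sa_to_v(radius_um: float, length_um: float) -> float:
    """Lateral surface area / volume of an idealized cylindrical dendrite.

    End caps are ignored, so the ratio simplifies to 2 / radius.
    """
    area = 2 * math.pi * radius_um * length_um
    volume = math.pi * radius_um ** 2 * length_um
    return area / volume


def sheet_sa_to_v(thickness_um: float, width_um: float, length_um: float) -> float:
    """Surface area / volume of an idealized flat rectangular sheet.

    Counts the two broad faces and the two long edges (ends ignored),
    so the ratio simplifies to 2 / thickness + 2 / width.
    """
    area = 2 * width_um * length_um + 2 * thickness_um * length_um
    volume = thickness_um * width_um * length_um
    return area / volume


# Hypothetical dimensions, not measurements from the study.
radius = 0.1       # cylinder radius in micrometres
thickness = 0.05   # sheet thickness in micrometres
length = 10.0      # shared segment length in micrometres

# Give the sheet the same cross-sectional area (hence the same volume)
# as the cylinder, so that only the shape differs.
width = math.pi * radius ** 2 / thickness

cyl_ratio = cylinder_sa_to_v(radius, length)
sheet_ratio = sheet_sa_to_v(thickness, width, length)
print(f"cylinder SA:V = {cyl_ratio:.1f} per um, sheet SA:V = {sheet_ratio:.1f} per um")
```

      Under these assumptions, the sheet exposes more membrane per unit volume than a cylinder of equal volume whenever it is thin relative to the cylinder radius, which is the qualitative point made about the flattened ab1C segments.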

      Weaknesses:

      While the manuscript offers valuable ultrastructural insights and reveals previously unappreciated heterogeneity among CO₂-sensing neurons, several issues warrant further investigation in addition to the points made above.

      (1) Although this quantitative approach is robust compared to earlier descriptive reports, its impact is somewhat limited by the absence of direct electrophysiological data to confirm that ultrastructural differences translate into altered neuronal function. A direct comparison or discussion of how the present findings align with the functional data obtained from electrophysiology would strengthen the overall argument.

      We thank the reviewer for this comment. We would like to clarify, however, that our study does not claim that the observed morphological heterogeneity necessarily leads to functional diversity. Rather, we consider this a possible implication and discuss it as a potential question for future research. This idea is raised only in the Discussion section, and we are careful not to present functional diversity as a conclusion of our study. Nonetheless, we have reviewed the relevant paragraph to ensure the language remains cautious and does not overstate our interpretation.

      We also acknowledge the significance of directly linking ultrastructural features to neuronal function through electrophysiological recordings. However, at present, it is technically challenging to correlate the nanoscale morphology of individual ORNs with their functional activity, as this would require volume EM imaging of the very same neurons that were recorded via electrophysiology. Currently, there is no dye-labeling method compatible with single-sensillum recording and SBEM sample preparation that allows for unambiguous identification and segmentation of recorded ORNs at the necessary ultrastructural resolution.

      To acknowledge this important limitation, we have added a paragraph in the Discussion section, as suggested, to clarify the current technical barriers and to highlight this as a promising direction for future methodological advances.

      (2) Clarifying the criteria for dendritic subtype classification with quantitative parameters would enhance reproducibility and interpretability. Moreover, incorporating electrophysiological recordings from ab1C neurons would provide compelling evidence linking structure and function, and mapping key receptor proteins through immunolabeling could directly correlate receptor distribution with the observed morphological diversity.

      Please see our response to the comment regarding the technical limitations of directly correlating ultrastructure with electrophysiological data.

      In addition, we would like to address the suggestion of using immunolabeling to map receptor distribution in relation to the 3D EM models. Currently, antibodies against Gr21a or Gr63a (the receptors expressed in ab1C neurons) are not available. Even if such antibodies were available, immunogold labeling for electron microscopy requires harsh detergent treatment to increase antibody permeability, damaging morphological integrity. These treatments would compromise the very morphological detail that our study aims to capture and quantify.

      (3) Even though cryofixation is claimed to be superior to chemical fixation for generating fewer artifacts, the authors need to confirm independently the variation observed in the CO2 neuron morphologies across populations. All types of fixation in TEMs cause some artifacts, as does serial sectioning. Without understanding the error rates or without independent validation with another method, it is hard to have confidence in the conclusions drawn by the authors of the paper.

      We thank the reviewer for raising concerns regarding potential artifacts in morphological analyses. However, we would like to clarify that cryofixation is widely regarded as a gold standard for ultrastructural preservation and minimizing fixation-induced artifacts, as supported by extensive literature. This is why we adopted high-pressure freezing and freeze substitution in our study.

      We have also published a separate methods paper (Tsang et al., eLife, 2018) directly comparing our cryofixation-based protocol with conventional chemical fixation, demonstrating substantial improvements in morphological preservation (see the image below, adapted from Figure 2 of our 2018 eLife paper). This provides strong empirical support for the reliability of our approach.

      Author response image 1.

      Regarding the suggestion to validate observed morphological variation across populations: we note that determining the presence of artifacts requires a known ground truth, which is inherently unavailable as we could not measure the morphometrics of fly olfactory receptor neurons in their native state. In the absence of such a benchmark, we have instead prioritized using the best-available preparation methods and high-resolution imaging to ensure structural integrity.

      Addressing these concerns and integrating additional experiments would significantly bolster the manuscript's completeness and advancement.

      We appreciate the reviewer’s feedback. As discussed in our responses to the specific comments above, certain suggested experiments are currently limited by technical constraints, particularly in the context of high-resolution volume EM for insect tissues enclosed in cuticles.

      Nevertheless, we have carefully addressed the reviewer’s concerns to the fullest extent possible within the scope of this study. We have revised the manuscript to clarify methodological limitations, added new explanatory content where appropriate, and ensured that our interpretations remain well grounded in the data. We hope these revisions strengthen the clarity and completeness of the manuscript.

      Reviewer #3 (Public review):

      Summary:

      In the current manuscript entitled "Population-level morphological analysis of paired CO2- and odor-sensing olfactory neurons in D. melanogaster via volume electron microscopy", Choy, Charara et al. use volume electron microscopy and neuron reconstruction to compare the dendritic morphology of ab1C and ab1D neurons of the Drosophila basiconic ab1 sensillum. They aim to investigate the degree of dendritic heterogeneity within a functional class of neurons using ab1C and ab1D, which they can identify due to the unique feature of ab1 sensilla to house four neurons and the stereotypic location on the third antennal segment. This is a great use of volumetric electron imaging and neuron reconstruction to sample a population of neurons of the same type. Their data convincingly show that there is dendritic heterogeneity in both investigated populations, and their sample size is sufficient to strongly support this observation. These data suggest that the phenomenon of dendritic heterogeneity is common in the Drosophila olfactory system and will stimulate future investigations into the developmental origin, functional implications, and potential adaptive advantage of this feature.

      Moreover, the authors discovered that there is a difference between CO2- and odour-sensing neurons, the former of which show a characteristic flattened and sheet-like structure not observed in other sensory neurons sampled in this and previous studies. They hypothesize that this unique dendritic organization, which increases the surface area to volume ratio, might allow more efficient CO2 sensing by housing higher numbers of CO2 receptors. This is supported by previous attempts to express CO2 sensors in olfactory sensory neurons, which lack this dendritic morphology, resulting in lower CO2 sensitivity compared to endogenous neurons.

      Overall, this detailed morphological description of olfactory sensory neurons' dendrites convincingly shows heterogeneity in two neuron classes with potential functional impacts for odour sensing.

      Strength:

      The volumetric EM imaging and reconstruction approach offers unprecedented details in single cell morphology and compares dendrite heterogeneity across a great fraction of ab1 sensilla.

      The authors identify specific shapes for ab1C sensilla potentially linked to their unique function in CO2 sensing.

      We thank the reviewer for the insightful comments and appreciation for our study.

      Weaknesses:

      While the morphological description is highly detailed, no attempts are made to link this to odour sensitivity or other properties of the neurons. It would have been exciting to see how altered morphology impacts physiology in these olfactory sensory cells.

      We agree that linking morphological variation to physiological properties, such as odor sensitivity, would be a highly valuable direction for future research. However, the aim of the current study is to provide an in-depth nanoscale characterization based on a substantial proportion of ab1 sensilla, highlighting morphological heterogeneity among homotypic ORNs.

      At present, it is technically challenging to correlate the nanoscale morphology of individual ORNs with their physiological responses, as this would require volume EM imaging of the exact neurons recorded via single-sensillum electrophysiology. Currently, no dye-labeling method exists that is compatible with both single-sensillum recording and the stringent requirements of SBEM sample preparation to allow for unambiguous identification and segmentation of recorded ORNs.

      To acknowledge this important limitation, we have added a paragraph in the Discussion section clarifying the current technical barriers and highlighting this as a promising area for future methodological development. Please also see our responses to the reviewer’s 4th comment below, where we present preliminary experiments examining whether odor sensitivity varies among homotypic ORNs.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this study, the authors showed that enalapril was able to reduce cellular senescence and improve health status in aged mice. The authors further showed that phosphorylated Smad1/5/9 was significantly elevated and blocking this pathway attenuated the protection of cells from senescence. When middle-aged mice were treated with enalapril, the physiological performance in several tissues, including memory capacity, renal function, and muscle strength, exhibited significant improvement.

      Strengths:

      The strength of the study lies in the identification of the pSMAD1/5/9 pathway as the underlying mechanism mediating the anti-senescence effects of enalapril with comprehensive evaluation both in vitro and in vivo.

      Thank you very much for your insightful evaluation and constructive suggestions. We have carefully studied the comments, and our provisional point-by-point response follows.

      Weaknesses:

      The major weakness of the study is the in vivo data. Despite the evidence shown in the in vitro study, there are no data to show that blocking the pSmad1/5/9 pathway is able to attenuate the anti-aging effects of enalapril in the mice. In addition, the mitigation of aging phenotypes by enalapril is not evidenced by the extension of lifespan.

      Thanks for your comment. As suggested, we will feed enalapril to mice while using LDN193189 to block pSmad1/5/9, and will assess age-related phenotypes in the mice to demonstrate that the anti-aging effect of enalapril in mice is mediated through pSmad1/5/9.

      We assessed only the improvement in the health status of the aging mice, which indicates that enalapril can extend their healthy lifespan. We did so because we believe that lifespan is controlled by genetics. Therefore, this study focuses solely on the improvement of health phenotypes in aging mice by enalapril.

      It is necessary to show that NAC is able to attenuate enalapril effects in the aging mice. In addition, it would be beneficial to test if enalapril is able to achieve similar rescue in a premature aging mouse model.

      Thanks for your suggestion. To our knowledge, NAC is an inhibitor of ROS, which is consistent with the antioxidant effect of enalapril. Therefore, we believe that NAC will not diminish the effect of enalapril.

      For the premature aging mouse models, we examined the effect of enalapril on Lmna<sup>G609G</sup> mice and other premature aging models and found that the effect was relatively modest. This may be due to differences in the genetic background of premature aging mice, leading to a less pronounced effect of enalapril compared to its impact on naturally aged mice.

      Reviewer #2 (Public review):

      This manuscript presents an interesting study of enalapril for its potential impact on senescence through the activation of Smad1/5/9 signaling with a focus on antioxidative gene expression. Repurposing enalapril in this context provides a fresh perspective on its effects beyond blood pressure regulation. The authors make a strong case for the importance of Smad1/5/9 in this process, and the inclusion of both in vitro and in vivo models adds value to the findings. Below, I have a few comments and suggestions which may help improve the manuscript.

      Thank you very much for your insightful evaluation and constructive suggestions. We have carefully studied the comments, and our provisional point-by-point response follows.

      A major finding in the study is that phosphorylated Smad1/5/9 mediates the effects of enalapril. However, the manuscript focused on the Smad pathway relatively abruptly, and the rationale behind targeting this specific pathway is not fully explained. What makes Smad1/5/9 particularly relevant to the context of this study?

      Thanks for your comment. As stated in the manuscript, after we found that enalapril could improve the cellular senescence phenotype, we screened and examined key targets in important aging-related signaling pathways, such as AKT, mTOR, ERK (Fig. S2A), Smad2/3 and Smad1/5/9 (Fig. 2A). We found that only the phosphorylation levels of Smad1/5/9 significantly increased after enalapril treatment. Therefore, the subsequent focus of this study is on pSmad1/5/9.

      Furthermore, their finding that activation of Smad1/5/9 leads to a reduction of senescence appears somewhat contradictory to the established literature on Smad1/5/9 in senescence. For instance, studies have shown that BMP4-induced senescence involves the activation of Smad1/5/8 (Smad1/5/9), leading to the upregulation of senescence markers like p16 and p21 (JBC, 2009, 284, 12153). Similarly, phosphorylated Smad1/5/8 has been shown to promote and maintain senescence in Ras-activated cells (PLOS Genetics, 2011, 7, e1002359). Could the authors provide more detailed mechanistic insights into why enalapril seems to reverse the typical pro-senescent role of Smad1/5/9 in their study?

      Thanks for your comment. The downstream regulatory network of BMP-pSmad1/5/9 is highly complex. The BMP-SMAD-ID axis has been mentioned in many studies, and its downstream signaling inhibits the expression of p16 and p21 (PNAS, 2016, 113(46), 13057-13062; Cell, 2003, 115(3), 281-292). Additionally, studies have also found that the Smad1-Stat1-P21 axis inhibits osteoblast senescence (Cell Death Discovery, 2022, 8:254). In our study, enalapril was found to increase the expression of ID1, which is a classic downstream target of pSmad1/5/9 (Cell Stem Cell, 2014, 15(5), 619-633). Therefore, pSmad1/5/9 inhibits cellular senescence markers such as p16, p21 and SASP through ID1, thereby promoting cell proliferation (Fig. 3). Furthermore, we also found that pSmad1/5/9 increases the expression of antioxidant genes and reduces ROS levels, exerting antioxidant effects (Fig. 4). Together, ID1 and antioxidant genes enable pSmad1/5/9 to exert its anti-aging effects.

      While the authors showed that enalapril increases pSmad1/5/9 phosphorylation, what are the expression levels of other key and related factors like Smad4, pSmad2, pSmad3, BMP2, and BMP4 in both senescent and non-senescent cells? These data will help clarify the broader signaling effects.

      Thanks for your suggestion. We observed an increase in Smad4 expression, while the levels of pSmad2 and pSmad3 remained unchanged after enalapril treatment (Fig. 2A). We will supplement data on the expression changes of these key factors in both senescent and non-senescent cells.

      They used BMP receptor inhibitor LDN193189 to pharmacologically inhibit BMP signaling, but it would be more convincing to also include genetic validation (e.g., knockdown or knockout of BMP2 or BMP4). This will help confirm that the observed effects are truly due to BMP-Smad signaling and not off-target effects of the pharmacological inhibitor LDN.

      Thanks for your suggestion. We will use shRNA or siRNA to knock down BMP and examine the related changes to clarify the role of BMP-Smad signaling.

      I don't see the results on the changes in senescence markers p16 and p21 in the mouse models treated with enalapril. Similarly, the effects of enalapril treatment on some key SASP factors, such as TNF-α, MCP-1, IL-1β, and IL-1α, are missing, particularly in serum and tissues. These are important data to evaluate the effect of enalapril on senescence.

      Thanks for your comment. As for the markers p16 and p21, we observed no change in p16, while the changes in p21 varied across different organs and tissues (Author response image 1). Nevertheless, behavioral experiments and physiological and biochemical indicators at the individual level consistently demonstrated the significant anti-aging effects of enalapril (Fig. 6).

      Author response image 1.

      p21 (Cdkn1a) expression levels in organs of mice after enalapril feeding.

      We also examined the changes in SASP factors in the serum of mice after enalapril treatment. Notably, SASP factors such as CCL (MCP), CXCL and TNFRS11B showed significant decreases (Fig. 5C). The expression changes of SASP factors varied across different organs. In the liver, kidneys and spleen, the expression of IL1a and IL1b decreased, while TNFRS11B expression decreased in both the liver and muscles (Fig. 5B). Additionally, CCL (MCP) levels decreased in all organs (Fig. 5B).

      Given that enalapril is primarily known as an antihypertensive, it would be helpful to include data on how it affects blood pressure in the aged mouse models, such as systolic and diastolic blood pressure. This will clarify whether the observed effects are independent of or influenced by changes in blood pressure.

      Thanks for your comment. We measured the blood pressure in mice, and found no significant change in blood pressure after enalapril treatment, which has also been validated in other studies (J Gerontol A Biol Sci Med Sci, 2019, 74(8), 1149–1157). Therefore, our results are independent of changes in blood pressure.

    1. Author response:

      Reviewer #1 (Public review):

      Summary:

      Through a series of CRISPR-Cas9 screens, the GPX4 antioxidant pathway was identified as a critical suppressor of cold-induced cell death in hibernator-derived cells. Hamster BHK-21 cells exposed to repeated cold and rewarming cycles revealed five genes (Gpx4, Eefsec, Pstk, Secisbp2, and Sepsecs) as critical components of the GPX4 pathway, which protects against cold-induced ferroptosis. A second screen with continuous cold exposure confirmed the essential role of GPX4 in prolonged cold tolerance. GPX4 knockout lines exhibited complete cell death within four days of cold exposure, and pharmacological inhibition of GPX4 further increased cell death, underscoring the necessity of GPX4's catalytic activity in cold conditions.

      An additional CRISPR screen in human cold-sensitive K562 cells identified 176 genes for cold survival. The GPX4 pathway was found to confer significant resistance to cold in hibernators and human cells, with GPX4 loss significantly increasing cold-induced cell death.

      When hamster and human GPX4 were compared, overexpression of either ortholog in human K562 cells dramatically improved cold tolerance, while catalytically dead mutants showed no such effect. These findings suggest that GPX4 abundance is a key limiting factor for cold tolerance in human cells, and primary cell types show strong sensitivity to GPX4 loss, highlighting that differences in cold tolerance across species may be due to varying GPX4-mediated protection.

      Strengths:

      (1) Innovative Approach: The study employs a series of unbiased genome-wide CRISPR-Cas9 screens in both hibernator- and non-hibernator-derived cells to investigate the mechanisms controlling cellular cold tolerance. Notably, this is the first genome-scale CRISPR-Cas9 screen conducted in cells derived from a hibernator, the Syrian hamster.

      (2) Identification of the GPX4 Pathway: Identifying glutathione peroxidase 4 (GPX4) as a critical suppressor of cold-induced cell death significantly contributes to the field. Recently, GPX4 was also reported as a potent regulator of cold tolerance through overexpression screening (Sone et al.) in hamsters, which further supports this finding.

      (3) Improved Cold Viability Assessment: The study identifies an important technical artifact in using trypan blue to assess cell viability following cold exposure. It reveals that cells stained immediately after cold exposure retain the dye, inaccurately indicating cell death. By introducing a brief rewarming period before viability assessment, the authors significantly improve the accuracy of detecting cold-induced cell death. This refinement in methodology ensures more reliable results and sets a new standard for future research on cold stress in cells.

      Weaknesses:

      (1) Mechanisms Regulating GPX4 Levels: While the study highlights GPX4 levels as a major determinant of cellular cold tolerance, it does not discuss how these levels are regulated or why they differ between hibernators and non-hibernators. This omission leaves an important aspect of GPX4's role in cold tolerance unexplored.

      (2) Generalizability Across Species: Although the study demonstrates the role of GPX4 in several mammalian species, it does not investigate whether this mechanism extends to other vertebrates (e.g., fish and amphibians) that also face cold challenges. This limitation could restrict the broader evolutionary claims made by the study.

      (3) Variability in Cold Sensitivity Across Human Cell Lines: The study observes significant variability in cold tolerance among different human cell lines but does not explain these differences clearly. This leaves a key aspect of human cell cold sensitivity insufficiently addressed.

      We thank the reviewer for the positive evaluation and thoughtful comments on the manuscript. We acknowledge that our study does not delve into the mechanisms regulating GPX4 levels, including differences between hibernators and non-hibernators, differences between cell types, or the possibility that GPX4 levels are dynamically regulated by environmental conditions. We consider these as interesting open questions that could be addressed in future studies.

      While our study focused entirely on mammalian species, we agree that examining cold tolerance mechanisms across a broader range of vertebrates, including fish and amphibians, could enhance our evolutionary perspective. Interestingly, previous work has indicated that C. elegans adapts to cold temperatures through ferritin-mediated Fe2+ detoxification. This suggests that cold induces Fe2+-mediated toxicity in C. elegans as well as in mammalian cells, but that the mechanisms through which distantly related species counteract cold-mediated cell death may vary.

      Finally, we agree that the variability in cold sensitivity across human cell lines could be further explored, and we will strongly consider conducting follow-up experiments to examine the extent to which this variability is driven by GPX4 levels.

      We are grateful for these insightful comments, as they highlight important avenues for future research. Addressing these questions will enable a more comprehensive understanding of GPX4's role in cold tolerance and its evolutionary significance across diverse organisms.

      Reviewer #2 (Public review):

      Summary:

      Lam et al. present a very intriguing whole-genome CRISPR screen in Syrian hamster cells as well as K562 cells to identify key genes involved in hypothermia-rewarming tolerance. Survival screens were performed by exposing cells to 4°C in a cooled CO2 incubator followed by a rewarming period of 30 minutes prior to survival analysis. In this paradigm, Syrian hamster-derived cell lines exhibit more robust survival than human cell lines (BHK-21 and HaK vs HT1080, HeLa, RPE1, and K562). A genome-wide Syrian hamster CRISPR library was created targeting all annotated genes with 10 guides/gene. LV transduction of the library was performed in BHK-21 cells and the survival screen procedures involved 3 cycles of 4°C cold exposure for 4 days followed by 2 days of rewarming.

      When compared to controls maintained at 37°C, 9 genes were required for BHK-21 survival of cold cycling conditions and 5 of these 9 are known components of the GPX4 antioxidant pathway. GPX4 KO BHK-21 cells had reduced cell growth at 37°C and profoundly worse cold tolerance which could be reduced by GPX4 expression. GPX4 inhibitors also reduced survival in cold. CRISPR KO screens and GPX4 KO in K562 cells revealed comparable results (though intriguingly glutathione biosynthesis genes were more critical to K562 cells than BHK-21 cells). Human or Syrian hamster GPX4 overexpression improved cold tolerance.

      Strengths:

      This is a very nicely written paper that clearly communicates in figures and text complicated experimental manipulations and in vitro genetic screening and cell survival data. The focus on GPX4 is interesting and relatively novel. The converging pharmacologic, loss-of-function, and gain-of-function experiments are also a strength.

      Weaknesses:

      A recently published article (Reference 43, Sone et al.) also independently explored the role of GPX4 in Syrian hamster cold tolerance through gain-of-function screening. Further exploration of the GPX4 species-specific mechanisms would be of great interest, but this is considered a minor weakness given the already very comprehensive and compelling data presented.

      We greatly appreciate the reviewer’s compliments and thoughtful comments on our manuscript. We agree with the reviewer that our approach (dual unbiased genome-scale screens in human and hamster cells) and the recent investigation by Sone et al (gain-of-function screening involving the insertion of hamster cDNA into human cells) mutually strengthen the importance of GPX4 in cold tolerance across cell types and species.

      Reviewer #3 (Public review):

      Summary:

      This work aims to address a fundamental biological question: how do mammalian cells achieve/lose tolerance to cold exposure? The authors first tried to establish an experimental system for cell cold exposure and evaluation of cell death and then performed genome-scale CRISPR-Cas9 screening on immortalized cell lines from Syrian hamster (BHK-21) and human (K562) for key genes that are associated with cell survival during prolonged cold exposure. From these screenings, they focused on glutathione peroxidase 4 (GPX4). Using genetic modifications or pharmacological interventions, and multiple cell models including primary cells from various mammalian species, they showed that GPX4 proteins are likely to retain their activities at 4°C, functioning to prevent cold-induced cell ferroptosis.

      Strengths:

      (1) This paper is neatly written and hence easy to follow.

      (2) Experiments are well designed.

      (3) The data showing the overall good cell survival after a prolonged cold exposure or repeated cold-warm cycles are helpful to show the advantages of the experimental instruments and methods the authors used, and hence the validity of their results.

      (4) The CRISPR-Cas9 screening is a great attempt.

      (5) Multiple cell types from hibernating mammals (cold tolerant) and cold-intolerant species are used to test their findings.

      (6) Although some may argue that other labs have published works with different approaches that have pointed out the importance of GPX4 and ferroptosis in hamster cell survival from anoxia-reoxygenation or cold exposure models, hence hurting the novelty of this work, this reviewer thinks that it is highly valuable to have independent research groups and different methods/systems to validate an important concept.

      Weaknesses:

      (1) Only cell death was robustly surveyed; though cell proliferation was evaluated too in some experiments, other cellular functions, such as mitochondrial ATP production vs. glycolysis, and the extent of lipid peroxidation, could have been measured to reflect cellular physiology.

      Validations on complex tissues or in vivo systems would have further strengthened the work and its impact.

      CRISPR-Cas9 screening may have technical limitations, as knock-out of some essential genes/pathways may lead to cell lethality during screening, and hence the relevance of these genes/pathways to cell cold tolerance may not be noted. From the data presented in this study, this reviewer thinks that the GPX4 pathway is likely a conserved mechanism for long-term cold survival, but not for cold sensitivity or acute cell death from cold exposure. In line with this speculation, their CRISPR-Cas9 screening revealed genes in the GPX4 pathway from a relatively cold-sensitive human cell line, but the endogenous GPX4 pathway is seemingly operational in this cold-sensitive cell line. Also, these cells are viable after GPX4 knock-out. Dead cells from the acute cold exposure phase may have detached, or their genomic DNA may have been severely damaged by the time of sample collection, hence not giving any meaningful sequencing reads. Crippling other factors/pathways such as FOXO1 (PMID: 38570500) or 5-aminolevulinic acid (ALA) metabolism (PMID: 35401816) has been shown to severely aggravate cold-induced cell death, including TUNEL-revealed DNA damage, within a much shorter time scale, whilst loss-of-function knockouts of FOXO1 or ALA Synthase 1 (ALAS1) are usually cell lethal. Thus, they and other possible essential genes may not be screenable with the current experimental protocol. These important points need to be taken into consideration by the authors.

      We thank the reviewer for highlighting the novelty of using genome-scale CRISPR-Cas9 screens and the validation of GPX4 function across cell types and mammalian species. 

      We acknowledge that our study primarily focused on measuring cell death using Trypan Blue dye exclusion. To validate the Trypan Blue assay, cell survival was orthogonally measured using an LDH release assay (Fig. 1g). The proliferation potential of putatively live cells was assessed by counting the increase in live cells following 24 h at 37°C (Fig. 1b). Prompted by your question, we will add additional data to the final version of the manuscript in which we show that following 1 day at 4°C, K562 cells rapidly restarted their cell cycle and doubled in number every 21 hours (Author response image 1). This rate is indistinguishable from the replication rate of cells that were not previously exposed to 4°C, suggesting that the cells following cold exposure are both alive and functionally capable of replicating.

      Author response image 1.

      Population doubling time of K562 cells cultured at 37°C (pink) and cells that are rewarmed to 37°C following 1 day of 4°C exposure
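
A doubling time such as the one quoted above is typically derived from two cell counts under the assumption of exponential growth. The following is a minimal sketch of that arithmetic only; the function name and the example counts are illustrative assumptions and are not data or code from the study.

```python
import math


def doubling_time_hours(n_start, n_end, elapsed_hours):
    """Doubling time under exponential growth: t_d = t * ln(2) / ln(N_end / N_start)."""
    if n_start <= 0 or n_end <= n_start or elapsed_hours <= 0:
        raise ValueError("need positive counts, net growth, and positive elapsed time")
    return elapsed_hours * math.log(2) / math.log(n_end / n_start)


# Hypothetical counts: a culture that quadruples over 42 h has doubled twice,
# so its doubling time is 42 / 2 hours.
example = doubling_time_hours(1.0e5, 4.0e5, 42.0)
```

Comparing this quantity between rewarmed and never-cooled cultures is one simple way to express the replication-rate comparison described in the response.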

      We agree that assessing additional cellular functions, such as mitochondrial ATP production, glycolysis, lipid metabolism and peroxidation could provide a more comprehensive understanding of cellular physiology under cold stress and would be valuable future studies. Similarly, we appreciate the suggestion to validate our findings in complex tissues or in vivo models. We recognize that such validation could strengthen the implications of our study and enhance its translational potential; however, due to their complexity, we believe that these additional studies are beyond the scope of our current study.

      We agree with the reviewer that CRISPR-Cas9 screens have limitations. For example, our screen was designed to identify genes that are preferentially required for cellular fitness at 4°C versus 37°C. There are many genes that are required for cellular survival at 4°C as well as 37°C that are not discussed (Table S2, S5). Also, given that the screen is designed to disrupt a single gene per cell, genes that have redundant functions in cold tolerance will likely be missed. Given the reviewer’s questions, we will expand the discussion of the paper to highlight limitations of the screen.

      We apologize for any lack of clarity about the methods we employed during the screen and will expand the methods section to provide further details. For example, for the BHK-21 screen we eliminated dead cells by sequencing cells that reattached after rewarming to 37°C for either 30 minutes (15 day cold exposure screen) or 24 hours (4°C cycling screen). Indeed, at the point of cell collection for both BHK-21 and K562 screens, the fraction of live cells was greater than 92% and 95%, respectively. We respectfully disagree with the reviewer that our screens would miss genes that affect acute cold tolerance. Any cells that died either early or late during cold exposure would not have been sequenced, and thus the sgRNAs targeting a specific gene in those cells would appear depleted, regardless of whether these cells died early/acutely or later during cold exposure.
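
The depletion argument can be illustrated with a small calculation: when only surviving cells are sequenced, an sgRNA's read count in the cold sample falls relative to the 37°C sample whenever the cells carrying it die, whatever the timing of death. This is a minimal sketch under stated assumptions (reads-per-million scaling, a pseudocount of 1, invented guide names and counts); it is not the analysis pipeline used in the study.

```python
import math


def log2_fold_changes(cold_counts, warm_counts, pseudocount=1.0):
    """Per-sgRNA log2(cold / warm) after scaling each sample to reads per million."""
    cold_total = sum(cold_counts.values())
    warm_total = sum(warm_counts.values())
    lfc = {}
    for guide in warm_counts:
        cold_rpm = cold_counts.get(guide, 0) / cold_total * 1e6
        warm_rpm = warm_counts[guide] / warm_total * 1e6
        lfc[guide] = math.log2((cold_rpm + pseudocount) / (warm_rpm + pseudocount))
    return lfc


# Hypothetical read counts obtained from surviving cells only.
warm = {"sgGPX4_1": 500, "sgCTRL_1": 500}
cold = {"sgGPX4_1": 50, "sgCTRL_1": 950}
lfc = log2_fold_changes(cold, warm)
# A negative value marks a guide that is depleted in the cold sample;
# the calculation carries no information about when the cells died.
```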

      We thank the reviewer for pointing out two additional highly relevant studies. Interestingly, the genes implicated in cold tolerance in these studies, FOXO1 and ALAS1, did not appear essential for survival at 37°C or 4°C in BHK-21 or K562 cells. There are several possibilities that could explain this finding: 1) our screen may not have successfully knocked out these genes, 2) other proteins may have compensated for their loss, or 3) these pathways may regulate cold tolerance in some but not all cell types. We apologize that the current version of the manuscript does not reflect on these recent studies. We will expand our discussion to include their findings.

      Once again, we are grateful for the reviewer’s insights, which have highlighted key areas for further exploration as well as pointed to specific ways to improve our manuscript.

    1. Author Response

      Joint Public Review

      The molecular composition of synaptic vesicles (SVs) has been defined in substantial detail, but the functions of many SV-resident proteins are still unknown. The present study focused on one such protein, the 'orphan' SV-resident transporter SLC6A17. By utilizing sophisticated and extensive mouse genetics and behavioral experiments, the authors provide convincing support for the notion that certain SLC6A17 variants cause intellectual disability (ID) in humans carrying such genetic variations. This is an important and novel finding. Furthermore, the authors propose, based on LCMS analyses of isolated SVs, that SLC6A17 is responsible for glutamine (Gln) transport into SVs, leading to the provocative idea that Gln functions as a neurotransmitter and that deficits in Gln transport into SVs by SLC6A17 represent a key pathogenetic mechanism in human ID patients carrying variants of the SLC6A17 gene.

      This latter aspect of the present paper is not adequately supported by the experimental evidence so that the main conceptual claims of the study appear insufficiently justified at this juncture. Key weaknesses are as follows:

      A) Detection of Gln, along with classical neurotransmitters such as glutamate, GABA, or ACh, in isolated SV fractions does not prove that Gln is transported into SVs by active transport. Gln is quite abundant in extracellular compartments. Its appearance in SV samples can therefore also be explained by trapping in SVs during endocytosis, presence in other - contaminating - organelles, binding to membrane surfaces, and other processes. Direct assays of Gln uptake into SVs, which have the potential to stringently test key postulates of the authors, are lacking.

      We have conducted multiple control experiments to exclude the possibility of contamination.

      1). Western blot analysis of SLC6A17-HA immunoisolation (Figure 4D and Figure 4—figure supplement 1) has shown that this fraction contained few other organelles and membranes. These results strongly argue that contamination in our isolated fraction was at a very low level.

      2). We then examined the proportion of SLC6A17-localized SVs by quantifying the co-localization of Syp and SLC6A17 with anti-Syp immunoisolation in Slc6a17-2A-HA-iCre mice. We found that SLC6A17 is predominantly localized on SVs (98.7% compared with the classical SV marker, Author response image 1A). This further showed that the immunoisolated SLC6A17 fraction was mainly composed of SVs.

      3). We also analyzed other SV marker proteins, such as Syt1 and Syb2, by IP-LC-MS. All results supported Gln enrichment (Author response image 1B).

      4). Importantly, immunoisolation of the SLC6A17P633R-HA protein, which caused SLC6A17 mislocalization away from the SVs (Figure 3B and Figure 3—figure supplement 1C, D), showed no Gln enrichment (Author response image 1C).

      5). Moreover, immunoisolation of AAV-PHP.eb overexpressed cytoplasmic membrane Gln transporter SLC38A1-HA did not show Gln enrichment (Author response image 1D).

      6). We also tested whether trafficking organelles such as the lysosome could enrich Gln. As shown in Author response image 1E, immunoisolation of AAV-PHP.eb overexpressed TMEM192-HA did not show Gln enrichment. For active transport, we tested the effects of the proton dissipator FCCP, the v-ATPase inhibitor NEM and the ΔpH dissipator nigericin. As shown in Author response image 1F and 1G, the Gln level was reduced by these inhibitors, supporting active transport of Gln.

      Author response image 1.

      Control experiments to test for contamination. A. Anti-Syp immunoisolation in Slc6a17-2A-HA-iCre mice. B. Quantification of Gln level in anti-Syt1 and anti-Syb2 immunoisolated fractions. C. Anti-HA immunoisolation in SLC6A17-2A-HA and anti-Slc6a17P633R mice. D. Anti-HA immunoisolation in AAV-PHP.eb-hSyn-SLC38A1-HA overexpression mice. E. Anti-HA immunoisolation in AAV-PHP.eb-hSyn-TMEM192-HA overexpression mice. F. Anti-HA immunoisolation in SLC6A17-2A-HA mice under FCCP (50 μM) and NEM (200 μM). G. Anti-Syp immunoisolation in wild type mice under FCCP (50 μM) and nigericin (20 μM).

      B) The authors generated multiple potentially very useful genetic tools and models. However, the validation of these models is incomplete. Most importantly, it remains unclear whether the different mutations affect SLC6A17 expression levels, subcellular localization, or the expression and trafficking of other SV and synapse components.

      The verification of the transgenic mouse lines is described in the Materials and Methods section of our manuscript. There is an extensive published literature on CRISPR-mediated gene editing in animals, and the off-target effects of the CRISPR-Cas9 system have been widely studied, with optimized design tools developed by many groups (Platt et al., 2014; Chu et al., 2015, 2016; Liu et al., 2017; Gemberling et al., 2021; Singh et al., 2022). The gRNAs used for animal generation were chosen carefully based on publicly available tools. Apart from basic genomic PCR sequencing of target regions of all gene-edited mouse models, Southern blots were performed by Biocytogen for Slc6a17-HA-2A-iCre and Slc6a17P633R mice to rule out random insertions. Expression levels in Slc6a17-KO and Slc6a17P633R mice were not affected, as shown in Figure R2. HA-tagged protein in Slc6a17-HA-2A-iCre and Slc6a17P633R mice was detected by immunoisolation, immunofluorescence, and fractionation (Figure 3, 4, Figure 3—figure supplement 1, Figure 4—figure supplement 1). Both showed localizations expected from previous reports ().

      C) Apart from the caveats mentioned above regarding Gln uptake into SVs, the data interpretation provided by the authors lacks stringency with respect to the biophysics of plasma membrane and SV transporters.

      The biophysics of SLC6A17 has been carefully studied (Parra et al., 2008; Zaia and Reimer, 2009). Our work focused on in vivo biochemical results, not biophysics.

      Author response image 2.

      Verification of genetic mouse models. A. q-PCR verification of Slc6a17-KO mice; B. q-PCR verification of Slc6a17P633R mice; C. Example of genomic primer design for Slc6a17-HA-2A-iCre mice founder mice screen; D. Example of genomic PCR for Slc6a17-HA-2A-iCre mice founder mice screen; E. Southern blot performed for Slc6a17-HA-2A-iCre mice.

      Reference

      Chu, Van Trung et al. “Increasing the efficiency of homology-directed repair for CRISPR-Cas9-induced precise gene editing in mammalian cells.” Nature biotechnology vol. 33,5 (2015): 543-8. doi:10.1038/nbt.3198

      Chu, Van Trung, et al. "Efficient generation of Rosa26 knock-in mice using CRISPR/Cas9 in C57BL/6 zygotes." BMC biotechnology 16.1 (2016): 1-15.

      Gemberling, Matthew P et al. “Transgenic mice for in vivo epigenome editing with CRISPR-based systems.” Nature methods vol. 18,8 (2021): 965-974. doi:10.1038/s41592-021-01207-2

      Liu, Edison T., et al. "Of mice and CRISPR: The post‐CRISPR future of the mouse as a model system for the human condition." EMBO reports 18.2 (2017): 187-193.

      Madisen, Linda, et al. "A robust and high-throughput Cre reporting and characterization system for the whole mouse brain." Nature neuroscience 13.1 (2010): 133-140.

      Parra, Leonardo A., et al. "The orphan transporter Rxt1/NTT4 (SLC6A17) functions as a synaptic vesicle amino acid transporter selective for proline, glycine, leucine, and alanine." Molecular pharmacology 74.6 (2008): 1521-1532.

      Platt, R.J., Chen, S., Zhou, Y., Yim, M.J., Swiech, L., Kempton, H.R., Dahlman, J.E., Parnas, O., Eisenhaure, T.M., Jovanovic, M., et al. (2014). CRISPR-Cas9 knockin mice for genome editing and cancer modeling. Cell 159, 440-455.

      Singh, Surender et al. “Opportunities and challenges with CRISPR-Cas mediated homologous recombination based precise editing in plants and animals.” Plant molecular biology, 31 Oct. 2022, doi:10.1007/s11103-022-01321-5

      Yang, Hui, Haoyi Wang, and Rudolf Jaenisch. "Generating genetically modified mice using CRISPR/Cas-mediated genome engineering." Nature protocols 9.8 (2014): 1956-1968.

      Zaia, K.A., and Reimer, R.J. (2009). Synaptic vesicle protein NTT4/XT1 (SLC6A17) catalyzes Na+-coupled neutral amino acid transport. J Biol Chem 284, 8439-8448.

    1. Author response:

      We would like to thank the editors and the reviewers for constructive feedback on our first version of the manuscript. Before submitting a fully revised version with detailed response to each point, we would like to provide a brief clarification on some of the key issues.

      Reviewer 2 raised a concern about the precision and specificity of holographic stimulation, regarding its potential effect on out-of-focus stimulation points and planes. We further verified whether the laser power at the targeted z-plane influences cells’ activity at nearby z-planes. As the Reviewer pointed out, the previous x- and y-axis shifts were tested by single-cell stimulation. This time, we stimulated five cells simultaneously, to match the actual experimental setup and assess potential artifacts in other planes. We observed no stimulation-driven activity increase in cells at a z-plane shifted by 20 µm (Author response image 1). This confirms that the holographic stimulation accurately manipulates the pre-selected target cells and that the effects we observe are unlikely to be due to out-of-focus stimulation artifacts. It is true that not all of the pre-selected cells that showed significant response changes prior to the main experiment are effectively activated at every trial during the experiments. While further analyses will be included in the revised manuscript, we varied the target cell distances across FOVs, from nearby cells to those farther apart within the FOV. We have not observed a significant relationship between the target cell distances and the stimulation effect. Lastly, cells within 15 µm of the target were excluded to prevent potential excitation due to the holographic stimulation power. Given the spontaneous movements of the FOV during imaging sessions due to the animal’s movement, despite our efforts to minimize them, we believe that any excitation from these neighboring neurons would be directly from the stimulation rather than the light pattern artifact itself.

      Author response image 1.

      Stimulation effect on five pre-selected cells at the target z-plane (left) and 20 µm off-target z-plane (right). No stimulation-driven effect was observed on the off-target cells.

      Reviewers also raised concerns regarding the interpretation of homeostatic balance. While we are working on further analyses to strengthen our findings based on the reviewers’ suggestions, the observed response changes in co-tuned neuronal ensembles, specifically during the processing of their preferred frequency information, highlight an interaction between sensory processing and network dynamics. We believe this specificity indicates a functional mechanism beyond broad suppression or simple inhibitory effects, possibly aligning with homeostatic principles in cortical circuits. Regarding the post-stimulation effect, it is true that neither the stimulation nor the control condition showed further response changes during the post-stimulation session. For the control condition, this is likely because the repetitive tone presentation had already driven neural adaptation to a plateau by the first two imaging sessions (baseline and stimulation sessions), preventing further changes in the last session. However, as the stimulation condition induced a greater amplitude decrease during the stimulation session compared to the control condition, if this extra suppression had not persisted during the post-stimulation session, we would have expected response amplitudes to rebound, increasing between the stimulation and post-stimulation sessions, which was not the case. Therefore, we propose that the persistence of this rebalanced network state is more indicative of a potential homeostatic mechanism in response to the activity manipulation within the network.

    1. Author response:

      Point-by-point description of the revisions

      Reviewer #1 (Evidence, reproducibility, and clarity):

      The work by Pinon et al. describes the generation of a microvascular model to study Neisseria meningitidis interactions with blood vessels. The model uses a novel and relatively high-throughput fabrication method that allows full control over the geometry of the vessels. The model is well characterized. The authors then study different aspects of Neisseria-endothelial interactions and benchmark the bacterial infection model against the best disease model available, a human skin xenograft mouse model, which is one of the great strengths of the paper. The authors show that Neisseria binds to the 3D model in a similar geometry as in the animal xenograft model, induces an increase in permeability shortly after bacterial perfusion, and induces endothelial cytoskeleton rearrangements. Finally, the authors show neutrophil recruitment to bacterial microcolonies and phagocytosis of Neisseria. The article is overall well written, and it is a great advancement in the bioengineering and sepsis infection field. I have only a few major comments and some minor ones.

      Major comments:

      Infection-on-chip. I would recommend that the authors change the terminology of "infection on chip" to better reflect their work. The term is vague and it decreases novelty, as there are multiple infection-on-chip models that recapitulate other infections (recently reviewed in https://doi.org/10.1038/s41564-024-01645-6) including Ebola, SARS-CoV-2, Plasmodium and Candida. Maybe the term "sepsis on chip" would be more specific and better exemplify the work and its novelty. Also, I would suggest that the authors carefully look through the text and consider when they use VoC or the current term IoC, as at present they are sometimes used interchangeably, with VoC being used occasionally in bacteria-perfused experiments.

      We thank Reviewer #1 for this suggestion. Indeed, we have chosen to replace the term "Infection-on-Chip" by "infected Vessel-on-chip" to avoid any confusion in the title and the text. Also, we have removed all the terms "IoC" which referred to "Infection-on-Chip" and replaced with "VoC" for "Vessel-on-Chip". We think these terms will improve the clarity of the main text.

      Author response image 1.

      F-actin (red) and ezrin (yellow) staining after 3h of infection with N. meningitidis (green) in 2D (top) and 3D (bottom) vessel-on-chip models.

      Fig 3 and Supplementary 3: Permeability. The authors suggest that early (3h) infection with Neisseria does not increase vascular permeability in the animal model, contrary to their findings in the 3D in vitro model. However, they show a non-significant increase in permeability of 70 kDa Dextran in the animal xenograft early infection. This seems to indicate that if the experiment had been done with a lower molecular weight tracer, significant increases in permeability could have been detected. I would suggest doing this experiment, which could capture early events in vascular disruption.

      Comparing permeability under healthy and infected conditions using Dextran smaller than 70 kDa is challenging. Previous research (1) has shown that molecules below 70 kDa already diffuse freely in healthy tissue. Given this high baseline diffusion, we believe that no significant difference would be observed before and after N. meningitidis infection, and these experiments were not carried out. As discussed in the manuscript, bacteria-induced permeability in the mouse occurs at later time points, 16h post-infection, as shown previously (2). This difference between the xenograft model and the chip likely reflects the absence in the chip of various cell types present in the tissue parenchyma.

      The authors show the formation of actin of a honeycomb structure beneath the bacterial microcolonies. This only occurred in 65% of the microcolonies. Is this result similar to in vitro 2D endothelial cultures in static and under flow? Also, the group has shown in the past positive staining of other cytoskeletal proteins, such as ezrin in the ERM complex. Does this also occur in the 3D system?

      We thank the Reviewer #1 for this suggestion.

      • Following this recommendation, we imaged monolayers of endothelial cells in the flat regions of the chip (the two lateral channels) using the same microscopy conditions (i.e., Obj. 40X N.A. 1.05) that were used to detect honeycomb structures in the 3D vessels in vitro. We found that more than 56% of infected cells present these honeycomb structures in 2D, which is 13% less than in 3D, a difference that is not significant given the distributions of both populations. Thus, we conclude that under both in vitro conditions, 2D and 3D, the proportion of infected cells exhibiting cortical plaques is similar. We have added the graph and the confocal images in Figure S4B and lines 418-419 of the revised manuscript.

      • We recently performed staining of ezrin in the chip and imaged both the 3D and 2D regions. Although ezrin staining was visible in 3D (Fig. 1 of this response), it was not as obvious as other markers under these infected conditions and we did not include it in the main text. Interpretation of this result is not straightforward because, for instance, the substrate of the cells is different, and it would require further studies on the behaviour of ERM proteins in these different contexts.

      One of the most novel things of the manuscript is the use of a relatively quick photoablation system. I would suggest that the authors add a more extensive description of the protocol in methods. Could this technique be applied in other laboratories? If this is a major limitation, it should be listed in the discussion.

      Following the Reviewer’s comment, we introduced more detailed explanations regarding the photoablation:

      • L157-163 (Results): "Briefly, the chosen design is digitalized into a list of positions to ablate. A pulsed UV-LASER beam is injected into the microscope and shaped to cover the back aperture of the objective. The laser is then focused on each position that needs ablation. After introducing endothelial cells (HUVEC) in the carved regions,…"

      • L512-516 (Discussion): "The speed capabilities drastically improve with the pulsing repetition rate. Given that our laser source emits pulses at 10kHz, as compared to other photoablation lasers with repetitions around 100 Hz, our solution could potentially gain a factor of 100."

      • L1082-1087 (Materials and Methods): "…, and imported in a Python code. The control of the various elements is embedded and checked for this specific set of hardware. The code is available upon request." Adding these three paragraphs gives more details on how photoablation works, thus improving the manuscript.
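
The two ideas quoted above, converting a design into a list of positions and the scaling of ablation speed with pulse repetition rate, can be sketched as follows. This is an illustration only: the function names, the grid pitch, the number of pulses per position, and the toy design are all assumptions, and the estimate ignores stage travel and settling time. It is not the authors' control code.

```python
def mask_to_positions(mask, pitch_um):
    """Turn a binary design (rows of 0/1) into (x, y) ablation positions in micrometres."""
    return [
        (col * pitch_um, row * pitch_um)
        for row, line in enumerate(mask)
        for col, value in enumerate(line)
        if value
    ]


def ablation_time_s(n_positions, pulses_per_position, repetition_rate_hz):
    """Pulse-limited ablation time; stage motion is deliberately not modelled."""
    return n_positions * pulses_per_position / repetition_rate_hz


# Hypothetical 3 x 4 design in which 1 marks a position to ablate.
design = [
    [0, 1, 1, 0],
    [1, 1, 1, 1],
    [0, 1, 1, 0],
]
positions = mask_to_positions(design, pitch_um=2.0)

# Same design and pulses per position, 100 Hz versus 10 kHz repetition rate.
speedup = ablation_time_s(len(positions), 10, 100.0) / ablation_time_s(len(positions), 10, 10_000.0)
```

Under these assumptions the pulse-limited time is inversely proportional to the repetition rate, which is where the potential factor of 100 between a 100 Hz and a 10 kHz source comes from.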

      Minor comments:

      Supplementary Fig 2. The reference to subpanels H and I is swapped.

      The references to subpanels H and I have been corrected in the revised version.

      Line 203: I would suggest to delete this sentence. Although a strength of the submitted paper is the direct comparison of the VoC model with the animal model to better replicate Neisseria infection, a direct comparison with animal permeability is not needed in all vascular engineering papers, as vascular permeability measurements in animals have been well established in the past.

      The sentence "While previously developed VoC platforms aimed at replicating physiological permeability properties, they often lack direct comparisons with in vivo values." has been removed from the revised text.

      Fig 3: Bacteria binding experiments. I would suggest the addition of more methodological information in the main results text to guarantee a good interpretation of the experiment. First, it would be better that wall shear stress rather than flow rate is described in the main text, as flow rate is dependent on the geometry of the vessel being used. Second, how long was the perfusion of Neisseria in the binding experiment performed to quantify colony doubling or elongation? As per figure 1C, I would guess that it was 100 min, but it would be better if this information is directly given to the readers.

      We thank Reviewer #1 for these two suggestions that will improve the text clarity (e.g., L316). (i) Indeed, we now express the flow rate in terms of shear stress. (ii) Also, we have normalized the quantification of the colony doubling time according to the first time-point where a single bacterium is attached to the vessel wall. Thus, early-adhering bacteria will be defined by a longer curve and late-adhering bacteria by a shorter curve. In total, the experiment lasted for 3 hours (modifications appear in L318 and L321-326).
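
      The reviewer's point that flow rate depends on vessel geometry can be illustrated with the standard Poiseuille relation for wall shear stress, τ = 4μQ / (πr³). The sketch below assumes a straight cylindrical vessel and a Newtonian fluid, which the response does not state; the viscosity, radius, and flow rates are illustrative values, not the authors' parameters.

```python
import math


def wall_shear_stress_pa(flow_ul_per_min, radius_um, viscosity_pa_s):
    """Poiseuille wall shear stress, tau = 4*mu*Q / (pi * r**3), in pascals."""
    flow_m3_per_s = flow_ul_per_min * 1e-9 / 60.0  # 1 µl = 1e-9 m^3
    radius_m = radius_um * 1e-6
    return 4.0 * viscosity_pa_s * flow_m3_per_s / (math.pi * radius_m ** 3)


# Assumed values for illustration only.
MEDIUM_VISCOSITY_PA_S = 0.7e-3  # roughly water-like culture medium (assumption)
RADIUS_UM = 25.0                # hypothetical vessel radius

tau_low = wall_shear_stress_pa(0.7, RADIUS_UM, MEDIUM_VISCOSITY_PA_S)
tau_high = wall_shear_stress_pa(1.0, RADIUS_UM, MEDIUM_VISCOSITY_PA_S)

# Same flow rate in a vessel of half the radius: shear stress is 8x higher.
tau_narrow = wall_shear_stress_pa(1.0, RADIUS_UM / 2, MEDIUM_VISCOSITY_PA_S)

print(f"{tau_low:.2f} to {tau_high:.2f} Pa at r = {RADIUS_UM} µm")
print(f"{tau_narrow:.2f} Pa at r = {RADIUS_UM / 2} µm for the same flow rate")
```

      Because shear stress scales with 1/r³, the same flow rate produces very different wall shear stresses in vessels of different calibre, which is why shear stress is the more portable quantity to report.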

      Fig 4: The honeycomb structure is not visible in the 3D rendering of panel D. I would recommend to show the actin staining in the absence of Neisseria staining as well.

      Following this suggestion, a zoom of the 3D rendering of the cortical plaque without a colony has been added to Figure 4 of the revised manuscript.

      Line 421: E-selectin is referred to as CD62E in this sentence. I would suggest using the same terminology everywhere.

      We have replaced the "CD62E" term with "E-selectin" to improve clarity.

      Line 508: "This difference is most likely associated with the presence of other cell types in the in vivo tissues and the onset of intravascular coagulation". Do the authors refer to the presence of perivascular cells, pericytes or fibroblasts? If so, it would be good to mention them, and to note that future iterations of the model could include these cell types.

      By "other cell types", we refer to pericytes (3), fibroblasts (4), and perivascular macrophages (5), which surround endothelial cells and contribute to vessel stability. The main text was modified to include this information (Lines 548 and 555-570), and their potential roles during infection are discussed.

      Discussion: The discussion covers very well the advantages of the model over in vitro 2D endothelial models and the animal xenograft but fails to include limitations. This would include the choice of HUVEC cells, an umbilical vein cell line to study microcirculation, the lack of perivascular cells or limitations on the fabrication technique regarding application in other labs (if any).

      We thank Reviewer #1 for this suggestion. Indeed, our manuscript may lack explaining limitations, and adding them to the text will help improve it:

      • The perspectives of our model include introducing perivascular cells surrounding the vessel and fibroblasts into the collagen gel as discussed previously and added in the discussion part (L555-570).

      • Our choice for HUVEC cells focused on recapitulating the characteristics of venules that respect key features such as the overexpression of CD62E and adhesion of neutrophils during inflammation. Using microvascular endothelial cells originating from different tissues would be very interesting. This possibility is now mentioned in the discussion lines 567-568.

      • Photoablation is a homemade fabrication technique that can be implemented in any lab harboring an epifluorescence microscope. This method is now described in more detail in the revised manuscript (L1085-1087).

      Line 576: The authors state that the model could be applied to other systemic infections but failed to mention that some infections have already been modelled in 3D bioengineered vascular models (examples found in https://doi.org/10.1038/s41564-024-01645-6). This includes a capillary photoablated vascular model to study malaria (DOI: 10.1126/sciadv.aay724).

      These two important references have been introduced in the main text (L84, 647, 648).

      Line 1213: Is the 6M neutrophil solution in 10 µl under flow? Also, I would suggest rewriting the sentence in the following line: "After, the flow has been then added to the system at 0.7-1 µl/min."

      We have now specified that neutrophils are circulated in the chip under flow conditions (lines 1321-1322).

      Significance

      The manuscript is comprehensive, complete and represents the first bioengineered model of sepsis. One of the major strengths is the careful characterization and benchmarking against the animal xenograft model. Its main limitation is the brief description of the photoablation methodology, and more clarity is needed in the description of bacteria perfusion experiments, given their complexity. The manuscript will be of interest for the general infection community and to the tissue engineering community if more details on fabrication methods are included. My expertise is on infection bioengineered models.

      Reviewer #2 (Evidence, reproducibility, and clarity):

      Summary:

      The authors develop a Vessel-on-Chip model, which has geometrical and physical properties similar to the murine vessels used in the study of systemic infections. The vessel was created via highly controllable laser photoablation in a collagen matrix, subsequent seeding of human endothelial cells and flow perfusion to induce mechanical cues. This vessel could be infected with Neisseria meningitidis, as a model of systemic infection. In this model, microcolony formation and dynamics, and effects on the host were very similar to those described for the human skin xenograft mouse, which is the current gold standard for these studies, and were consistent with observations made in patients. The model could also recapitulate the neutrophil response upon N. meningitidis systemic infection.

      Major comments:

      I have no major comments. The claims and the conclusions are supported by the data, the methods are properly presented and the data is analyzed adequately. Furthermore, I would like to propose an optional experiment that could improve the manuscript. In the discussion it is stated that the vascular geometry might contribute to bacterial colonization in areas of lower velocity. It would be interesting to recapitulate this experimentally. It is of course optional but it would be of great interest, since this is something that can only be proven in the organ-on-chip (where flow speed can be tuned) and not as much in animal models. Besides, it would increase impact, demonstrating the superiority of the chip in this area rather than proving to be equal to current models.

      We have conducted additional experiments on infection in different vascular geometries and have now added these results to Figure 3/S3 and lines 288-305. We compared shear stress levels, as determined by Comsol simulation, with experimentally determined bacterial adhesion sites. In the conditions used, the range of shear generated by the tested geometries does not appear to change the efficiency of bacterial adhesion. These results are consistent with a previous study from our group, which shows that in this range of shear stresses the effect on adhesion is limited (6). Furthermore, qualitative observations in the animal model indicate that bacteria do not have an obvious preference in terms of binding site.

      Minor comments:

      I have a series of suggestions which, in my opinion, would improve the discussion. They are further elaborated in the following section, in the context of the limitations.

      • How to recapitulate the vessels in the context of a specific organ or tissue? If the pathogen is often found in the luminal space of other organs after disseminating from the blood, how can this process be recapitulated with this model, if at all?

      For reasons that are not fully understood, postmortem histological studies reveal bacteria only inside blood vessels but rarely, if ever, in the organ parenchyma. The presence of intravascular bacteria could nevertheless alter cells in the tissue parenchyma. The notable exception is the brain, where bacteria exit the vascular lumen to access the cerebrospinal fluid. The chip we describe is fully adapted to develop a blood-brain barrier model and more specific organ environments. This implies the addition of more cell types in the hydrogel. A paragraph on this topic has been added (Lines 548 and 552-570).

      • Similarly, could other immune responses related to systemic infection be recapitulated? The authors could discuss the potential of including other immune cells that might be found in the interstitial space, for example.

      This important discussion point has been added to the manuscript (L623-636). As suggested by Reviewer #2, other immune cells respond to N. meningitidis and can be explored using our model. For instance, macrophages and dendritic cells are activated upon N. meningitidis infection, eliminate the bacteria through phagocytosis, and produce pro-inflammatory cytokines and chemokines, potentially activating lymphocytes (7). Such an immune response, although complex, would be interesting to study in our model, as skin-xenograft mice are deprived of B and T lymphocytes to ensure acceptance of human skin grafts.

      • A minor correction: in line 467 it should probably be "aspects" instead of "aspect", and the authors could consider rephrasing that sentence slightly for increased clarity.

      We have corrected the sentence with "we demonstrated that our VoC strongly replicates key aspects of the in vivo human skin xenograft mouse model, the gold standard for studying meningococcal disease under physiological conditions." in lines 499-503.

      Strengths and limitations

      The most important strength of this manuscript is the technology they developed to build this model, which is impressive and very innovative. The Vessel-on-Chip can be tuned to acquire complex shapes and, according to the authors, the process has been optimized to produce models very quickly. This is a great advancement compared with the technologies used to produce other equivalent models. This model proves to be equivalent to the most advanced model used to date, but allows to perform microscopy with higher resolution and ease, which can in turn allow more complex and precise image-based analysis. However, the authors do not seem to present any new mechanistic insights obtained using this model. All the findings obtained in the infection-on-chip demonstrate that the model is equivalent to the human skin xenograft mouse model, and can offer superior resolution for microscopy. However, the advantages of the model do not seem to be exploited to obtain more insights on the pathogenicity mechanisms of N. meningitidis, host-pathogen interactions or potential applications in the discovery of potential treatments. For example, experiments to elucidate the role of certain N. meningitidis genes on infection could enrich the manuscript and prove the superiority of the model. However, I understand these experiments are time-consuming and out of the scope of the current manuscript. In addition, the model lacks the multicellularity that characterizes other similar models. The authors mention that the pathogen can be found in the luminal space of several organs, however, this luminal space has not been recapitulated in the model. Even though this would be a new project, it would be interesting that the authors hypothesize about the possibilities of combining this model with other organ models. The inclusion of circulating neutrophils is a great asset; however it would also be interesting to hypothesize about how to recapitulate other immune responses related to systemic infection.

      We thank Reviewer #2 for his/her comment on the strengths and limitations of our work. The difficulty is that our study opens many future research directions and applications, and we hope that the work serves as the basis for many future studies, but one can only address a limited set of experiments in a single manuscript.

      • Experiments investigating the role of N. meningitidis genes require significant optimization of the system. Multiplexing is a potential avenue for future development, which would allow the testing of many mutants. The fast photoablation approach is particularly amenable to such adaptation.

      • Cells and bacteria inside the chambers could be isolated and analyzed at the transcriptomic level or by flow cytometry. This would imply optimizing a protocol for collecting cells from the device via collagenase digestion, for instance. This type of approach would also benefit from multiplexing to enhance the number of cells.

      • As mentioned above, the revised manuscript discusses the multicellular capabilities of our model, including the integration of additional immune cells and potential connections to other organ systems. We believe that these approaches are feasible and valuable for studying various aspects of N. meningitidis infection.

      Advance

      The most important advance of this manuscript is technical: the development of a model that proves to be equivalent to the most complex model used to date to study meningococcal systemic infections. The human skin xenograft mouse model requires complex surgical techniques and has the practical and ethical limitations associated with the use of animals. However, the Infection-on-chip model is completely in vitro, can be produced quickly, and allows to precisely tune the vessel’s geometry and to perform higher resolution microscopy. Both models were comparable in terms of the hallmarks defining the disease, suggesting that the presented model can be an effective replacement of the animal use in this area.

      Other vessel-on-chip models can recapitulate an endothelial barrier in a tube-like morphology, but do not recapitulate other complex geometries, that are more physiologically relevant and could impact infection (in addition to other non-infectious diseases). However, in the manuscript it is not clear whether the different morphologies are necessary to study or recapitulate N. meningitidis infection, or if the tubular morphologies achieved in other similar models would suffice.

      Audience

      This manuscript might be of interest for a specialized audience focusing on the development of microphysiological models. The technology presented here can be of great interest to researchers whose main area of interest is the endothelium and the blood vessels, for example, researchers on the study of systemic infections, atherosclerosis, angiogenesis, etc. Thus, the tool presented (vessel-on-chip) can have great applications for a broad audience. However, even when the method might be faster and easier to use than other equivalent methods, it could still be difficult to implement in another laboratory, especially if it lacks expertise in bioengineering. Therefore, the method could be more of interest for laboratories with expertise in bioengineering looking to expand or optimize their toolbox. Alternatively, this paper presents itself as an opportunity to begin collaborations, since the model could be used to test other pathogens or conditions.

      Field of expertise:

      Infection biology, organ-on-chip, fungal pathogens.

      I lack the expertise to evaluate the image-based analysis.

      References

      (1) Gyohei Egawa, Satoshi Nakamizo, Yohei Natsuaki, Hiromi Doi, Yoshiki Miyachi, and Kenji Kabashima. Intravital analysis of vascular permeability in mice using two-photon microscopy. Scientific Reports, 3(1):1932, Jun 2013. ISSN 2045-2322. doi: 10.1038/srep01932.

      (2) Valeria Manriquez, Pierre Nivoit, Tomas Urbina, Hebert Echenique-Rivera, Keira Melican, Marie-Paule Fernandez-Gerlinger, Patricia Flamant, Taliah Schmitt, Patrick Bruneval, Dorian Obino, and Guillaume Duménil. Colonization of dermal arterioles by Neisseria meningitidis provides a safe haven from neutrophils. Nature Communications, 12(1):4547, Jul 2021. ISSN 2041-1723. doi: 10.1038/s41467-021-24797-z.

      (3) Mats Hellström, Holger Gerhardt, Mattias Kalén, Xuri Li, Ulf Eriksson, Hartwig Wolburg, and Christer Betsholtz. Lack of pericytes leads to endothelial hyperplasia and abnormal vascular morphogenesis. Journal of Cell Biology, 153(3):543–554, Apr 2001. ISSN 0021-9525. doi: 10.1083/jcb.153.3.543.

      (4) Arsheen M. Rajan, Roger C. Ma, Katrinka M. Kocha, Dan J. Zhang, and Peng Huang. Dual function of perivascular fibroblasts in vascular stabilization in zebrafish. PLOS Genetics, 16(10):1–31, 10 2020. doi: 10.1371/journal.pgen.1008800.

      (5) Huanhuan He, Julia J. Mack, Esra Güç, Carmen M. Warren, Mario Leonardo Squadrito, Witold W. Kilarski, Caroline Baer, Ryan D. Freshman, Austin I. McDonald, Safiyyah Ziyad, Melody A. Swartz, Michele De Palma, and M. Luisa Iruela-Arispe. Perivascular macrophages limit permeability. Arteriosclerosis, Thrombosis, and Vascular Biology, 36(11):2203–2212, 2016. doi: 10.1161/ATVBAHA.116.307592.

      (6) Emilie Mairey, Auguste Genovesio, Emmanuel Donnadieu, Christine Bernard, Francis Jaubert, Elisabeth Pinard, Jacques Seylaz, Jean-Christophe Olivo-Marin, Xavier Nassif, and Guillaume Dumenil. Cerebral microcirculation shear stress levels determine Neisseria meningitidis attachment sites along the blood–brain barrier. Journal of Experimental Medicine, 203(8):1939–1950, 07 2006. ISSN 0022-1007. doi: 10.1084/jem.20060482.

      (7) Riya Joshi and Sunil D. Saroj. Survival and evasion of Neisseria meningitidis from macrophages. Medicine in Microecology, 17:100087, 2023. ISSN 2590-0978. doi: 10.1016/j.medmic.2023.100087.

    1. Author Response:

      Assessment note: “Whereas the results and interpretations are generally solid, the mechanistic aspect of the work and conclusions put forth rely heavily on in vitro studies performed in cultured L6 myocytes, which are highly glycolytic and generally not viewed as a good model for studying muscle metabolism and insulin action.”

      While we acknowledge that in vitro models may not fully recapitulate the complexity of in vivo systems, we believe that our use of L6 myotubes is appropriate for studying the mechanisms underlying muscle metabolism and insulin action. As mentioned below (reviewer 2, point 1), L6 myotubes possess many important characteristics relevant to our research, including high insulin sensitivity and a similar mitochondrial respiration sensitivity to primary muscle fibres. Furthermore, several studies have demonstrated the utility of L6 myotubes as a model for studying insulin sensitivity and metabolism, including our own previous work (PMID: 19805130, 31693893, 19915010).

      In addition, we have provided evidence of the similarities between L6 cells overexpressing SMPD5 and human muscle biopsies at the protein level, and of the reproducibility in vivo of the negative correlation between ceramide and Coenzyme Q observed in L6 cells, specifically in the skeletal muscle of mice on a chow diet. These findings support the relevance of our in vitro results to in vivo muscle metabolism.

      Finally, we will supplement our findings by demonstrating a comparable relationship between ceramide and Coenzyme Q in mice exposed to a high-fat diet, to be shown in Supplementary Figure 4 H-I. Further animal experiments will be performed to validate our cell-line based conclusions. We hope that these additional results address the concerns raised by the reviewer and further support the relevance of our in vitro findings to in vivo muscle metabolism and insulin action.

      Points from reviewer 1:

      1. Although the authors' results suggest that higher mitochondrial ceramide levels suppress cellular insulin sensitivity, they rely solely on a partial inhibition (i.e., 30%) of insulin-stimulated GLUT4-HA translocation in L6 myocytes. It would be critical to examine how much the increased mitochondrial ceramide would inhibit insulin-induced glucose uptake in myocytes using radiolabel deoxy-glucose.

      Response: The primary impact of insulin is to facilitate the translocation of glucose transporter type 4 (GLUT4) to the cell surface, which effectively enhances the maximum rate of glucose uptake into cells. Therefore, assessing the quantity of GLUT4 present at the cell surface in non-permeabilized cells is widely regarded as the most reliable measure of insulin sensitivity (PMID: 36283703, 35594055, 34285405). Additionally, plasma membrane GLUT4 and glucose uptake are highly correlated. Whilst we have routinely measured glucose uptake with radiolabelled glucose in the past, we do not believe that evaluating glucose uptake provides a better assessment of insulin sensitivity than GLUT4.

      We will clarify the use of GLUT4 translocation in the Results section:

      “...For this reason, several in vitro models have been employed involving incubation of insulin sensitive cell types with lipids such as palmitate to mimic lipotoxicity in vivo. In this study we will use cell surface GLUT4-HA abundance as the main readout of insulin response...”

      2. Another important question to be addressed is whether glycogen synthesis is affected in myocytes under these experimental conditions. Results demonstrating reductions in insulin-stimulated glucose transport and glycogen synthesis in myocytes with dysfunctional mitochondria due to ceramide accumulation would further support the authors' claim.

      Response: We have carried out supplementary experiments to investigate glycogen synthesis in our insulin-resistant models. Our approach involved L6-myotubes overexpressing the mitochondrial-targeted construct ASAH1 (as described in Fig. 3). We then challenged them with palmitate and measured glycogen synthesis using 14C radiolabeled glucose. Our observations indicated that palmitate suppressed insulin-induced glycogen synthesis, which was effectively prevented by the overexpression of ASAH1 (N = 5, * p<0.05). These results provide additional evidence highlighting the role of dysfunctional mitochondria in muscle cell glucose metabolism.

      These data will be added to Supplementary Figure 4K and the results modified as follows:

      “Notably, mtASAH1 overexpression protected cells from palmitate-induced insulin resistance without affecting basal insulin sensitivity (Fig. 3E). Similar results were observed using insulin-induced glycogen synthesis as an orthogonal technique for Glut4 translocation. These results provide additional evidence highlighting the role of dysfunctional mitochondria in muscle cell glucose metabolism (Sup. Fig. 5K). Importantly, mtASAH1 overexpression did not rescue insulin sensitivity in cells depleted…”

      We will add to the method section:

      “L6 myotubes overexpressing ASAH were grown and differentiated in 12-well plates, as described in the Cell lines section, and stimulated for 16 h with palmitate-BSA or EtOH-BSA, as detailed in the Induction of insulin resistance section.

      On day seven of differentiation, myotubes were serum starved in plain DMEM for 3 and a half hours. After incubation for 1 hour at 37 °C with 2 µCi/ml D-[U-14C]-glucose in the presence or absence of 100 nM insulin, a glycogen synthesis assay was performed, as previously described (Zarini S. et al., J Lipid Res, 63(10): 100270, 2022).”

      3. In addition, it would be critical to assess whether the increased mitochondrial ceramide and consequent lowering of energy levels affect all exocytic pathways in L6 myoblasts or just the GLUT4 trafficking. Is the secretory pathway also disrupted under these conditions?

      Response: As the secretory pathway primarily involves the synthesis and transportation of soluble proteins that are secreted into the extracellular space, and given that the majority of cellular transmembrane proteins (excluding those of the mitochondria) use this pathway to arrive at their ultimate destination, we believe that the question posed by the reviewer is highly challenging and beyond the scope of our research. We will add this to the discussion:

      “...the abundance of mPTP associated proteins suggesting a role of this pore in ceramide induced insulin resistance (Sup. Fig. 6E). In addition, it is yet to be determined whether the trafficking defect is specific to Glut4 or if it affects the exocytic-secretory pathway more broadly…”

      Points from reviewer 2:

      1. The mechanistic aspect of the work and conclusions put forth rely heavily on studies performed in cultured myocytes, which are highly glycolytic and generally viewed as a poor model for studying muscle metabolism and insulin action. Nonetheless, the findings provide a strong rationale for moving this line of investigation into mouse gain/loss of function models.

      Response: The relative contributions of anaerobic (glycolysis) and aerobic (mitochondria) metabolism to muscle metabolism can change in L6 cells depending on the differentiation stage. For instance, Serrage et al (PMID: 30701682) demonstrated that L6-myotubes have a higher mitochondrial abundance and aerobic metabolism than L6-myoblasts. Others have used elegant transcriptomic analysis and metabolic characterisation comparing different skeletal muscle models for studying insulin sensitivity. For instance, Abdelmoez et al in 2020 (PMID: 31825657) reported that L6 myotubes exhibit greater insulin-stimulated glucose uptake and oxidative capacity compared with C2C12 and human skeletal muscle cells (HSMC). Overall, L6 cells exhibit higher metabolic rates and primarily rely on aerobic metabolism, while C2C12 and HSMC cells rely on anaerobic glycolysis. It is worth noting that L6 myotubes are the cell line most closely related to adult human muscle when compared with other muscle cell lines (PMID: 31825657). Our presented results in Figure 6 H and I provide evidence for the similarities between L6 cells overexpressing SMPD5 and human muscle biopsies. Additionally, in Figure 3J-K, we demonstrate the reproducibility in vivo of the negative correlation between ceramide and Coenzyme Q observed in L6 cells, specifically in the skeletal muscle of mice on a chow diet. Furthermore, we have supplemented these findings by demonstrating a comparable relationship in mice exposed to a high-fat diet, as shown in Supplementary Figure 4 H-I (refer to point 4). We will clarify these points in the Discussion:

      “In this study, we mainly utilised L6-myotubes, which share many important characteristics with primary muscle fibres relevant to our research. Both types of cells exhibit high sensitivity to insulin and respond similarly to maximal doses of insulin, with Glut4 translocation stimulated 2 to 4 times over basal levels in response to 100 nM insulin (as shown in Fig. 1-4 and (46,47)). Additionally, mitochondrial respiration in L6-myotubes has a similar sensitivity to mitochondrial poisons, as observed in primary muscle fibres (as shown in Fig. 5 (48)). Finally, inhibiting ceramide production increases CoQ levels in both L6-myotubes and adult muscle tissue (as shown in Fig. 2-3). Therefore, L6-myotubes possess the necessary metabolic features to investigate the role of mitochondria in insulin resistance, and this relationship is likely applicable to primary muscle fibres”.

      We will also add additional data - in point 2 - from differentiated human myocytes that are consistent with our observations from the L6 models. Additional experiments are in progress to further extend these findings.

      2. One caveat of the approach taken is that exposure of cells to palmitate alone is not reflective of in vivo physiology. It would be interesting to know if similar effects on CoQ are observed when cells are exposed to a more physiological mixture of fatty acids that includes a high ratio of palmitate, but better mimics in vivo nutrition.

      Response: Palmitate is widely recognized as a trigger for insulin resistance and ceramide accumulation, which mimics the diet-induced insulin resistance seen in rodents and humans. Previous studies have compared the effects of a lipid mixture versus palmitate on inducing insulin resistance in skeletal muscle, and have found that the strong disruption in insulin sensitivity caused by palmitate exposure was lessened with physiologic mixtures of fatty acids, even with a high proportion of saturated fatty acids. This was associated, in part, with the selective partitioning of fatty acids into neutral lipids (such as TAG) when muscle cells are exposed to physiologic lipid mixtures (Newsom et al, PMID: 25793412). Hence, we think that using palmitate is a better strategy to study lipid-induced insulin resistance in vitro. We will add to results:

      “In vitro, palmitate conjugated with BSA is the preferred strategy for inducing insulin resistance, as lipid mixtures tend to partition into triacylglycerides (33)”.

      We are also performing additional in vivo experiments to add to the physiological relevance of the findings.

      3. While the utility of targeting SMPD5 to the mitochondria is appreciated, the results in Figure 5 suggest that this manoeuvre caused a rather severe form of mitochondrial dysfunction. This could be more representative of toxicity rather than pathophysiology. It would be helpful to know if these same effects are observed with other manipulations that lower CoQ to a similar degree. If not, the discrepancies should be discussed.

      Response: We conducted a staining procedure using the mitochondrial marker mitoDsRED to observe the effect of SMPD5 overexpression on cell toxicity. The resulting images, displayed in the figure below (Author response image 1), demonstrate that the overexpression of SMPD5 did not result in any significant changes in cell morphology or impact the differentiation potential of our myoblasts into myotubes.

      Author response image 1.

      In addition, we evaluated cell viability in HeLa cells following exposure to SACLAC (2 µM) to induce CoQ depletion (left panel). Specifically, we measured cell death by monitoring the uptake of propidium iodide (PI), as shown in the right panel. Our results demonstrated that SACLAC-induced CoQ depletion did not lead to cell death at the doses used for CoQ depletion (Author response image 2).

      Author response image 2.

      Therefore, we deem it improbable that the observed effect is caused by cellular toxicity; rather, it represents a pathological condition induced by elevated levels of ceramides. We will add to discussion:

      “...downregulation of the respirasome induced by ceramides may lead to CoQ depletion. Despite the significant impact of ceramide on mitochondrial respiration, we did not observe any indications of cell damage in any of the treatments, suggesting that our models are not explained by toxic/cell death events.”

      4. The conclusions could be strengthened by more extensive studies in mice to assess the interplay between mitochondrial ceramides, CoQ depletion and ETC/mitochondrial dysfunction in the context of a standard diet versus HF diet-induced insulin resistance. Does P053 affect mitochondrial ceramide, ETC protein abundance, mitochondrial function, and muscle insulin sensitivity in the predicted directions?

      Response: We would like to note that the metabolic characterization and assessment of ETC/mitochondrial function in these mice (fed either a high-fat (HF) or a chow diet, with or without P053) were previously published (Turner N, PMID: 30131496). In addition to this, we have conducted targeted metabolomic and lipidomic analyses to investigate the impact of P053 on ceramide and CoQ levels in HF-fed mice. As illustrated in the figures below (Author response image 3), the administration of P053 led to a reduction in ceramide levels (left panel) and an increase in CoQ levels (right panel) in HF-fed mice, which is consistent with our in vitro findings.

      Author response image 3.

      We will add to results:

      “…similar effect was observed in mice exposed to a high-fat diet for 5 wks (Supp. Fig. 4H-I; further phenotypic and metabolic characterization of these animals can be found in (41))”

      We will further perform more in-vivo studies to corroborate these findings.

    1. Author Response

      Reviewer #1 (Public Review):

      Summary:

      Alonso-Calleja and colleagues explore the role of TGR5 in adult hematopoiesis at both steady state and post-transplantation. The authors utilize two different mouse models including a TGR5-GFP reporter mouse to analyze the expression of TGR5 in various hematopoietic cell subsets. Using germline Tgr5-/- mice it's reported that loss of Tgr5 has no significant impact on steady-state hematopoiesis, with a small decrease in trabecular bone fraction, associated with a reduction in proximal tibia adipose tissue, and an increase in marrow phenotypic adipocytic precursors. The authors further explored the role of stroma TGR5 expression in the hematopoietic recovery upon bone marrow transplantation of wild-type cells, although the studies supporting this claim are weak. Overall, while most of the hematopoietic phenotypes have negative results or small effects, the role of TGR5 in adipose tissue regulation is interesting to the field.

      We thank Reviewer 1 for having identified some strengths and weaknesses of our study. As summarized below, we will work to address the weaknesses of our study.

      Strengths:

      • This is the first time the role of TGR5 has been examined in the bone marrow.

      • This paper supports further exploration of the role of bile acids in bone marrow transplantation and possible therapeutic strategies.

      Weaknesses:

      • The authors fail to describe whether niche stroma cells or adipocyte progenitor cells (APCs) express TGR5.

      We are currently working to address this question using our reporter model and expect to be able to provide the data in the next version of the reviewed preprint.

      • Although the authors note a significant reduction in bone marrow adipose tissue in Tgr5-/- mice, they do not address whether this is white or brown adipose tissue especially since BA-TGR5 signaling has been shown to play a role in beiging.

      The nature of BMAT and how it relates to white, brown or beige adipose tissue has been a persistent question in the field. Our understanding is that BMAT is currently considered a distinct adipose depot that is neither white nor brown/beige. BMAT does not express UCP1 to an appreciable extent, and reports of its expression possibly reflect contamination by tissues surrounding bone (Craft et al., 2019). Beyond this consideration, as the regulated BMAT in TGR5-/- mice is almost absent, determination of the brown/beige vs white nature of the regulated BMAT remains technically challenging.

      • In Figure 1, the authors explore different progenitor subsets but stop short of describing whether TGR5 is expressed in hematopoietic stem cells (HSCs).

      Figure 1 of the originally submitted manuscript described TGR5 expression in committed myeloid progenitors (CMP, GMP and MEP). Below we provide the requested data (expression in MPPs and HSCs in Author response image 1) and we have further expanded our data with the expression in megakaryocyte progenitors (MkProg - Lin-cKit+Sca1-CD41+CD150+) as shown in Author response image 2.

      Author response image 1.

      Frequencies of GFP+ cells in MPPs and HSCs in the BM of 8-12-week-old male TGR5:GFP mice and their controls (n=9 for Wild-type control mice, n=11 for TGR5:GFP mice). Results represent the mean ± s.e.m., n represents biologically independent replicates. Two-tailed Student’s t-test was used for statistical analysis. p-values (exact value) are indicated.

      Author response image 2.

      A, representative flow cytometry gating strategy used to identify megakaryocyte progenitors (MkProg) and GFP positivity in TGR5:GFP mice and their wild-type controls. B, frequencies of GFP+ cells in MkProg population in the BM of 8-12-week-old male TGR5:GFP mice and their controls (n=3 for Wild-type control mice, n=4 for TGR5:GFP mice). Results represent the mean ± s.e.m., n represents biologically independent replicates. Two-tailed Student’s t-test (B) was used for statistical analysis. p-values (exact value) are indicated.

      • Are there more CD45+ cells in the BM because hematopoietic cells are proliferating more due to a direct effect of the loss of Tgr5 or is it because there is just more space due to less trabecular bone?

      While we do not have direct evidence to address this question, we see approximately an average 20% increase in CD45+ cell counts in the baseline Tgr5-/- mice. The absolute volume of bone and BMAT lost in these animals does not account for 20% of the total volume of the medullary cavity, so we speculate that the increase in CD45+ counts is not due exclusively to an increase in available volume.

      • In Figure 4 no absolute cell counts are provided to support the increase in immunophenotypic APCs (CD45-Ter119-CD31-Sca1+CD24-) in the stroma of Tgr5-/- mice. Accordingly, the absolute number of total stromal cells and other stroma niche cells such as MSCs, ECs are missing.

      We initially chose not to report the total number of cells per leg, as the processing of the bones for stroma isolation is less homogeneous than that of the HSPC populations (which we do by crushing whole bones with a mortar and pestle). Regardless of these considerations, the data for absolute counts of APCs (left panel), the stroma-enriched fraction (CD45-Ter119-CD31- - middle panel) and endothelial cells (CD45-Ter119-CD31+ - right panel) are provided in Author response image 3. Note that the number of cells plated for CFU-F and BMSC in vitro differentiation is constant between the genotypes, thus confirming the importance of the relative abundance data shown in the submitted version of the manuscript. In conclusion, we have prioritized the data showing the relative overrepresentation of APC progenitors in the BM stroma as measured by flow cytometry on a per-cell basis, which is in line with the functional in vitro data. Further studies could address the specific question through 3D wholemount studies once APC in situ markers are firmly characterized.

      Author response image 3.

      Left panel: absolute number of adipocyte progenitor cells (APCs) in the CD45-Ter119-CD31- BM stromal gate for both Tgr5+/+ and Tgr5−/− (n=5). Middle panel: absolute number of cells isolated from the stroma-enriched BM fraction (CD45-Ter119-CD31-) in the same mice. Right panel: absolute number of endothelial cells, defined as CD45-Ter119-CD31+, in the same BM isolates.

      • There are issues with the reciprocal transplantation design in Fig 4. Why did the authors choose such a low dose (250 000) of BM cells to transplant? If the effect is true and relevant, the early recovery would be observed independently of the setup and a more robust engraftment dataset would be observed without having lethality post-transplant. On the same note, it's surprising that the authors report ~70% lethality post-transplant from wild-type control mice (Fig 4E), according to the literature 200 000 BM cells should ensure the survival of the recipient post-TBI. Overall, the results even in such a stringent setup still show minimal differences and the study lacks further in-depth analyses to support the main claim.

      We thank the reviewer for this comment. On the one hand, we disagree on the relevance of the effect size, as Tgr5-/- mice recover from low levels of platelets significantly faster than the Tgr5+/+ controls. Underlining the relevance, in a clinical setting, G-CSF is administered to patients routinely even if the acceleration of recovery is of 1-2 days (Trivedi et al., 2009).

      Regarding the mortality, we agree that it is higher than expected. We have suffered from cases of swollen muzzles syndrome in our facilities that have greatly hampered our ability to perform myeloablation experiments (Garrett et al., 2019), as even sublethal doses have resulted in the appearance of severe side effects that are reasons for euthanasia under Swiss legislation. For example, a strong reduction in mobility requires immediate euthanasia. All experiments were performed blinded to genotype allocation, so we can reasonably exclude experimenter bias. Finally, it could be argued that mice with more marked symptomatology leading to euthanasia are more likely to have hematopoietic deficits, which in our case was mostly seen for Tgr5+/+ animals. We have therefore chosen to report mortality together with the longitudinal assessment of peripheral blood counts.

      • Mechanistically, how does the loss of Tgr5 impact hematopoietic regeneration following sublethal irradiation?

      The question of a non-lethal hematopoietic stress is a very relevant one. Unfortunately, and as delineated in the previous point, we have been seriously constrained by cases of swollen muzzles syndrome (Garrett et al., 2019) that have stopped us from proceeding with more irradiation studies. We will take advantage of the change of animal facility, which will be consolidated during the upcoming year (Laboratory of Regenerative Hematopoiesis), to address this point in follow-up studies.

      • Only male mice were used throughout this study. It would be beneficial to know whether female mice show similar results.

      We agree with this comment, and we expect to include the characterization of BM microenvironment (Figure 3 of the current manuscript) in females in the reviewed version of the manuscript when a suitable cohort becomes available.

      Reviewer #2 (Public Review):

      Summary: In this manuscript, the authors examined the role of the bile acid receptor TGR5 in the bone marrow under steady-state and stress hematopoiesis. They initially showed the expression of TGR5 in hematopoietic compartments and that loss of TGR5 doesn't impair steady-state hematopoiesis. They further demonstrated that TGR5 knockout significantly decreases BMAT, increases the APC population, and accelerates the recovery upon bone marrow transplantation.

      Strengths: The manuscript is well-structured and well-written.

      We thank Reviewer #2 for this comment.

      Weaknesses: The mechanism is not clear, and additional studies need to be performed to support the authors' conclusion.

      We agree with Reviewer #2 that more studies are needed to understand what the role of TGR5 in the hematopoietic system is. We have been hampered in our studies of stress hematopoiesis because of frequent cases of swollen muzzles syndrome (Garrett et al., 2019), which has made it difficult to continue with experiments involving myelosuppression (see response to Reviewer #1 as well). Further studies are planned or ongoing, including determining the role of the microbiome in the observed TGR5 bone and hematopoiesis stress phenotypes, but these will be the focus of a separate study.

      References

      Craft, C.S., Robles, H., Lorenz, M.R., Hilker, E.D., Magee, K.L., Andersen, T.L., Cawthorn, W.P., MacDougald, O.A., Harris, C.A., Scheller, E.L., 2019. Bone marrow adipose tissue does not express UCP1 during development or adrenergic-induced remodeling. Sci Rep 9, 17427. https://doi.org/10.1038/s41598-019-54036-x

      Garrett, J., Sampson, C.H., Plett, P.A., Crisler, R., Parker, J., Venezia, R., Chua, H.L., Hickman, D.L., Booth, C., MacVittie, T., Orschell, C.M., Dynlacht, J.R., 2019. Characterization and Etiology of Swollen Muzzles in Irradiated Mice. Radiat Res 191, 31–42. https://doi.org/10.1667/RR14724.1

      Trivedi, M., Martinez, S., Corringham, S., Medley, K., Ball, E.D., 2009. Optimal use of G-CSF administration after hematopoietic SCT. Bone Marrow Transplant 43, 895–908. https://doi.org/10.1038/bmt.2009.75

    1. Author Response

      eLife assessment

      In this valuable study, the authors investigate the mechanism of amyloid nucleation in a cellular system using their novel ratiometric measurements and uncover interesting insights regarding the role of polyglutamine length and the sequence features of glutamine-rich regions on amyloid formation. Overall, the problem is significant and being able to assess nucleation in cells is of considerable relevance. The data, as presented and analyzed, are currently still incomplete. The specific claims would be stronger if based on in vitro measurements that avoid the intricacies of specific cellular systems and that are more suitable for assessing sequence-intrinsic properties.

      We are pleased that the editors find our study valuable. We find that the reviewers’ criticisms largely arise from misunderstandings inherent to the conceptually challenging nature of the topic, rather than fundamental flaws, as we will elaborate here. We are grateful for the opportunity afforded by eLife to engage reviewers in a constructive public dialogue.

      Reviewer #1 (Public Review):

      The authors take on the challenge of defining the core nucleus for amyloid formation by polyglutamine tracts. This rests on the assertion that polyQ forms amyloid structures to the exclusion of all other forms of solids. Using their unique assay, deployed in yeast, the authors attempt to infer the size of the nucleus that templates amyloid formation by polyQ. Further, through a series of sequence titrations, all studied using a single type of assay, the authors converge on an assertion stating that a single polyQ molecule is the nucleus for amyloid formation, that 12-residues make up the core of the nucleus, that it takes ca. 60 Qs in a row to unmask this nucleation potential, and that polyQ amyloid formation belongs to the same universality class as self-poisoned crystallization, which is the hallmark of crystallization from polymer melts formed by large, high molecular weight synthetic polymers. Unfortunately, the authors have decided to lean in hard on their assertions without a critical assessment of whether their findings stand up to scrutiny. If their findings are truly an intrinsic property of polyQ molecules, then their findings should be reconstituted in vitro. Unfortunately, careful and rigorous experiments in vitro show that there is a threshold concentration for forming fibrillar solids. This threshold concentration depends on the flanking sequence context on temperature and on solution conditions. The existence of a threshold concentration defies the expectation of a monomer nucleus. The findings disagree with in vitro data presented by Crick et al., and ignored by the authors. Please see: https://doi.org/10.1073/pnas.1320626110. These reports present data from very different assays, the importance of which was underscored first by Regina Murphy and colleagues. The work of Crick et al., provides a detailed thermodynamic framework - see the SI Appendix. 
This framework dovetails with theory and simulations of Zhang and Muthukumar, which explain exactly how a system like polyQ might work (https://doi.org/10.1063/1.3050295). The picture one paints is radically different from what the authors converge upon. One is inclined to lean toward data that are gleaned using multiple methods in vitro because the test tube does not have all the confounding effects of a cellular milieu, especially when it comes to focusing on sequence-intrinsic conformational transitions of a protein. In addition to concerns about the limitations of the DAmFRET method, which, based on the work of the authors in their collaborative paper by Posey et al., are being stretched to the limit, there is the real possibility that the cellular milieu, unique to the system being studied, is enabling transitions that are not necessarily intrinsic to the sequence alone. A nod in this direction is the work of Marc Diamond, which showed that having stabilized the amyloid form of Tau through coacervation, there is a large barrier that limits the loss of amyloid-like structure for Tau. There may well be something similar going on with the polyQ system. If the authors could show that their data are achievable in vitro without anything but physiological buffers one would have more confidence in a model that appears to contradict basic physical principles of how homopolymers self-assemble. Absent such additional evidence, numerous statements seem to be too strong. There are also several claims that are difficult to understand or appreciate.

      Rebuttal to the perceived necessity of in vitro experiments

      The overarching concern of this reviewer and reviewing editor is whether in-cell assays can inform on sequence-intrinsic properties. We understand this concern. We believe however that the relative merit of in-cell assays is largely a matter of perspective. The truly sequence-intrinsic behavior of polyQ, i.e. in a vacuum, is less informative than the “sequence-intrinsic” behaviors of interest that emerge in the presence of extraneous molecules from the appropriate biological context. In vitro experiments typically include a tiny number of these -- water, ions, and sometimes a crowding agent meant to approximate everything else. Obviously missing are the myriad quinary interactions with other proteins that collectively round out the physiological solvent. The question is what experimental context best approximates that of a living human neuron under which the pathological sequence-dependent properties of polyQ manifest. We submit that a living yeast cell comes closer to that ideal than does buffer in a test tube.

      The reviewer’s statement that our findings must be validated in vitro ignores the fact -- stressed in our introduction -- that decades of in vitro work have not yet generated definitive evidence for or against any specific nucleus model. In addition to the above, one major problem concerns the large sizes of in vitro systems that obscure the effects of primary nucleation. For example, a typical in vitro experimental volume of e.g. 1.5 ml is over one billion-fold larger than the femtoliter volume of a cell. This means that any nucleation-limited kinetics of relevant amyloid formation are lost, and any alternative amyloid polymorphs that have a kinetic growth advantage -- even if they nucleate at only a fraction of the rate of relevant amyloid -- will tend to dominate the system (Buell, 2017). Novel approaches are clearly needed to address these problems. We present such an approach, stretch it to the limit (as the reviewer notes) across multiple complementary experiments, and arrive at a novel finding that is fully and uniquely consistent with all of our own data as well as the collective prior literature.
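
      The volume comparison above can be checked with simple unit arithmetic. The sketch below is illustrative only: the function name and the round cell volumes of 1 fL and 50 fL are assumptions chosen for the example, not measurements from the study.

```python
# Illustrative unit arithmetic for the test-tube versus single-cell comparison.
# The cell volumes used below are assumed round numbers, not measured values.
LITERS_PER_ML = 1e-3
LITERS_PER_FL = 1e-15


def volume_ratio(tube_ml: float, cell_fl: float) -> float:
    """Return how many cell volumes fit into the test-tube volume."""
    return (tube_ml * LITERS_PER_ML) / (cell_fl * LITERS_PER_FL)


# A 1.5 ml reaction compared with an assumed 1 fL and an assumed 50 fL cell.
ratio_small_cell = volume_ratio(tube_ml=1.5, cell_fl=1.0)
ratio_large_cell = volume_ratio(tube_ml=1.5, cell_fl=50.0)
print(f"{ratio_small_cell:.1e} {ratio_large_cell:.1e}")
```

      Under either assumed cell volume the ratio exceeds one billion, so the "over one billion-fold" statement holds across a range of plausible cell sizes.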

      That the preceding considerations are collectively essential to understand relevant amyloid behavior is evident from recent cryoEM studies showing that in vitro-generated amyloid structures generally differ from those in patients (Arseni et al., 2022; Bansal et al., 2021; Radamaker et al., 2021; Schmidt et al., 2019; Schweighauser et al., 2020; Yang et al., 2022). This is highly relevant to the present discourse because each amyloid structure is thought to emanate from a different nucleating structure. This means that in vitro experiments have broadly missed the mark in terms of the relevant thermodynamic parameters that govern disease onset and progression. Note that the rules laid out via our studies are not only consistent with structural features of polyQ amyloid in cells, but also (as described in the discussion) explain why the endogenous structure of a physiologically relevant Q zipper amyloid differs from that of polyQ.

      A recent collaboration between the Morimoto and Knowles groups (Sinnige et al.) investigated the kinetics of aggregation by Q40-YFP expressed in C. elegans body wall muscle cells, using quantitative approaches that have been well established for in vitro amyloid-forming systems of the type favored by the reviewer. They calculate a reaction order of just 1.6, slightly higher than what would be expected for a monomeric nucleus but nevertheless fully consistent with our own conclusions when one accounts for the following two aspects of their approach. First, the polyQ tract in their construct is flanked by short poly-Histidine tracts on both sides. These charges very likely disfavor monomeric nucleation because all possible configurations of a four-stranded bundle position the beginning and end of the Q tract in close proximity, and Q40 is only just long enough to achieve monomeric nucleation in the absence of such destabilization. Second, the protein is fused to YFP, a weak homodimer (Landgraf et al., 2012; Snapp et al., 2003). With these two considerations, our model -- which was generated from polyQ tracts lacking flanking charges or an oligomeric fusion -- predicts that amyloid nucleation by their construct will occur more frequently as a dimer than a monomer. Indeed, their observed reaction order of 1.6 supports a predominantly dimeric nucleus. Like us and others, Sinnige et al. did not observe phase separation prior to amyloid formation. This is important because it not only argues against nucleation occurring in a condensate, it also suggests that the reaction order they calculated has not been limited by the concentration-buffering effect of phase separation.
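
      For readers unfamiliar with reaction orders, the sketch below shows one generic way such a number can be estimated: if the nucleation rate is assumed to follow a power law in concentration, rate = k·c^n, then n is the slope of log(rate) against log(c). This is not the fitting procedure used by Sinnige et al.; the function name and the data are hypothetical and synthetic, generated with an assumed order of 1.6 purely to show that the slope recovers it.

```python
import math


def fit_reaction_order(concentrations, rates):
    """Least-squares slope of log(rate) against log(concentration)."""
    xs = [math.log(c) for c in concentrations]
    ys = [math.log(r) for r in rates]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


# Synthetic, noise-free data in arbitrary units (assumed order 1.6, assumed k = 0.01).
concs = [1.0, 2.0, 4.0, 8.0, 16.0]
rates = [0.01 * c ** 1.6 for c in concs]
order = fit_reaction_order(concs, rates)
print(round(order, 3))
```

      An order between 1 and 2, as in this toy example, is what one would expect from a mixture of monomeric and dimeric nucleation events, which is the interpretation offered above.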

      While we agree that our conclusions rest heavily on DAmFRET data (for good reason), we do provide supporting evidence from molecular dynamics simulations, SDD-AGE, and microscopy.

      To summarize, given the extreme limitations of in vitro experiments in this field, the breadth of our current study, and supporting findings from another lab using rigorous quantitative approaches, we feel that our claims are justified without in vitro data.

      Rebuttal to the perceived incompatibility of monomeric nucleation with the existence of a critical concentration for amyloid

      We appreciate that the concept of a monomeric nucleus can superficially appear inconsistent with the fact that crystalline solids such as polyQ amyloid have a saturating concentration, but this is only true if one neglects that polyQ amyloids are polymer crystals with intramolecular ordering. The perceived discrepancy is perhaps most easily dispelled by protein crystallography. Folded proteins form crystals. These crystals have critical concentrations, and the protein subunits within them each have intramolecular crystalline order (in the form of secondary structure). To extrapolate these familiar examples to our present finding with polyQ, one need only appreciate the now well-established phenomenon of secondary nucleation, whereby transient interactions of soluble species with the ordered species lead to their own ordering (Törnquist et al., 2018). Transience is important here because it implies that intramolecular ordering can in principle propagate even in solutions that are subsaturated with respect to bulk crystallization. This is possible in the present case because the pairing of sufficiently short beta strands (equivalent to “stems” in the polymer crystal literature) will be more stable intramolecularly than intermolecularly, due to the reduced entropic penalty of the former. Our elucidation that Q zipper ordering can occur with shorter strands intramolecularly than intermolecularly (Fig. S4C-D) demonstrates this fact. It is also evident from published descriptions of single molecule “crystals” formed in sufficiently dilute solutions of sufficiently long polymers (Hong et al., 2015; Keller, 1957; Lauritzen and Hoffman, 1960).

      In suggesting that a saturating concentration for amyloid rules out monomeric nucleation, the reviewer assumes that the Q zipper-containing monomer must be stable relative to the disordered ensemble. This is not inherent to our claim and in fact opposes the definition of a nucleus. The monomeric nucleating structure need not be more stable than the disordered state, and monomers may very well be disordered at equilibrium at low concentrations. To be clear, our claim requires that the Q zipper-containing monomer is both on pathway to amyloid and less stable than all subsequent species that are on pathway to amyloid. The former requirement is supported by our extensive mutational analysis. The latter requirement is supported by our atomistic simulations showing the Q zipper-containing monomer is stabilized by dimerization (see our 2021 preprint). Hence, requisite ordering in the nucleating monomer is stabilized by intermolecular interactions. We provide in Author response image 1 an illustration to clarify what we believe to be the discrepancy between our claim and the reviewer’s interpretation.

      Author response image 1.

      That the rate-limiting fluctuation for a crystalline phase can occur in a monomer can also be understood as a consequence of Ostwald’s rule of stages, which describes the general tendency of supersaturated solutes, including amyloid forming proteins (Chakraborty et al., 2023), to populate metastable phases en route to more stable phases (De Yoreo, 2022; Schmelzer and Abyzov, 2017). Our findings with polyQ are consistent with a general mechanism for Ostwald’s rule wherein the relative stabilities of competing polymorphs differ with the number of subunits (De Yoreo, 2022; Navrotsky, 2004). As illustrated in Fig. 6 of Navrotsky, a polymorph that is relatively stable at small particle sizes tends to give way to a polymorph that -- while initially unstable -- becomes more stable as the particles grow. The former is analogous to our early stage Q zipper composed of two short sheets with an intramolecular interface, while the latter is analogous to the later stage Q zipper composed of longer sheets with an intermolecular interface. Subunit addition stabilizes the latter more than the former, hence the initial Q zipper that is stabilized more by intra- than intermolecular interactions will mature with growth to one that is stabilized more by intermolecular interactions.

      We apologize to the Pappu group for neglecting to cite Crick et al. 2013 in the current preprint. Contrary to the reviewer’s assessment, however, we find that the conclusions of this valuable study do more to support than to refute our findings. Briefly, Crick et al. investigated the aggregation of synthetic Q30 and Q40 peptides in vitro, wherein fibrils assembled from high concentrations of peptide were demonstrated to have saturating concentrations in the low micromolar range. As explained above, this finding of a saturating concentration does not refute our results. More relevant to the present work are their findings that “oligomers” accumulated over an hours-long timespan in solutions that are subsaturated with respect to fibrils, and these oligomers themselves have (nanomolar) critical concentrations. The authors postulated that the oligomers result from liquid–liquid demixing of intrinsically disordered polyglutamine. However, phase separation by a peptide is expected to fix its concentration in both the solute and condensed phases, and, because disordered phase separation is inherently faster than amyloid formation, the postulated explanation removes the driving force for any amyloid phase with a critical solubility greater than that of the oligomers. In place of this interpretation that truly does appear to -- in the reviewer’s words -- “contradict basic physical principles of how homopolymers self-assemble”, we interpret these oligomers as evidence of our Q zipper-containing self-poisoned multimers, rounded as an inherent consequence of self-poisoning (Ungar et al., 2005), and likely akin to semicrystalline spherulites that have been observed in other polymer crystal and amyloid-forming systems (Crist and Schultz, 2016; Vetri and Foderà, 2015). That Crick et al. 
also observed the formation of a relatively labile amyloid phase when the reactions were started with 50 µM peptide is unsurprising in light of the aforementioned kinetic advantage that large reaction volumes can confer to labile polymorphs, and that high concentrations (in this case, orders of magnitude higher than the likely physiological concentration of polyQ (Wild et al., 2015)) can favor the formation of labile amyloid polymorphs (Ohhashi et al., 2010). Indeed, a contemporaneous study by the Wetzel group using very similar peptide constructs and polyQ lengths -- but beginning with lower concentrations -- found that the relevant saturating concentrations for amyloid lie below their limit of detection of 100 nM (Sahoo et al., 2014).

      Rebuttals to other critiques

      The reviewer states that we found nucleation potential to require 60 Qs in a row. Our data are collectively consistent with nucleation occurring at and above approximately 36 Qs, a point repeated in the paper. The reviewer may be referring to our statement, “Sixty residues proved to be the optimum length to observe both the pre- and post-nucleated states of polyQ in single experiments”. The purpose of this statement is simply to describe the practical consideration that led us to use 60 Qs for the bulk of our assays. We do appreciate that the fraction of AmFRET-positive cells is very low for lengths just above the threshold, especially Q40. These fractions are nevertheless highly significant (p = 0.004 in [PIN+] cells, one-tailed t-test), and we will modify the figure and text to clarify this.

      The reviewer characterizes self-poisoning as the hallmark of crystallization from polymer melts, which would be problematic for our conclusions if self-poisoning were limited to this non-physiological context. In fact the term was first used to describe crystallization from solution (Organ et al., 1989), wherein the phenomenon is more pronounced (Ungar et al., 2005).

      Reviewer #2 (Public Review):

      Numerous neurodegenerative diseases are thought to be driven by the aggregation of proteins into insoluble filaments known as "amyloids". Despite decades of research, the mechanism by which proteins convert from the soluble to insoluble state is poorly understood. In particular, the initial nucleation step has proven especially elusive to both experiments and simulation. This is because the critical nucleus is thermodynamically unstable, and therefore, occurs too infrequently to directly observe. Furthermore, after nucleation much faster processes like growth and secondary nucleation dominate the kinetics, which makes it difficult to isolate the effects of the initial nucleation event. In this work Kandola et al. attempt to surmount these obstacles using individual yeast cells as microscopic reaction vessels. The large number of cells, and their small size, provides the statistics to separate the cells into pre- and post-nucleation populations, allowing them to obtain nucleation rates under physiological conditions. By systematically introducing mutations into the amyloid-forming polyglutamine core of huntingtin protein, they deduce the probable structure of the amyloid nucleus. This work shows that, despite the complexity of the cellular environment, the seemingly random effects of mutations can be understood with a relatively simple physical model. Furthermore, their model shows how amyloid nucleation and growth differ in significant ways, which provides testable hypotheses for probing how different steps in the aggregation pathway may lead to neurotoxicity.

      In this study Kandola et al. probe the nucleation barrier by observing a bimodal distribution of cells that contain aggregates; the cells containing aggregates have had a stochastic fluctuation allowing the proteins to surmount the barrier, while those without aggregates have yet to have a fluctuation of suitable size. The authors confirm this interpretation with the selective manipulation of the PIN gene, which provides an amyloid template that allows the system to skip the nucleation event.
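
      The link between a stochastic barrier-crossing event and a two-population (bimodal) readout can be sketched with a minimal model. The sketch assumes that nucleation in each cell is an independent Poisson process with a constant per-cell rate, in which case the expected fraction of cells that have nucleated by time t is 1 − exp(−rate·t). The function name and the rate values are hypothetical and chosen only for illustration; they are not taken from the manuscript.

```python
import math


def fraction_nucleated(rate_per_cell_per_hour: float, hours: float) -> float:
    """Expected fraction of cells with at least one nucleation event.

    Assumes independent Poisson nucleation with a constant rate in every cell.
    """
    return 1.0 - math.exp(-rate_per_cell_per_hour * hours)


# Hypothetical rates: a slow spontaneous process versus a templated (seeded) one.
slow = fraction_nucleated(rate_per_cell_per_hour=0.01, hours=16.0)
fast = fraction_nucleated(rate_per_cell_per_hour=1.0, hours=16.0)
print(f"{slow:.3f} {fast:.3f}")
```

      In this picture a low rate leaves most cells in the pre-nucleation population at the time of measurement, whereas a pre-existing template that removes the barrier moves nearly all cells into the post-nucleation population.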

      In simple systems lacking internal degrees of freedom (e.g., colloids or rigid molecules) the nucleation barrier comes from the significant entropic cost of bringing molecules together. In large aggregates this entropic cost is balanced by attractive interactions between the particles, but small clusters are unable to form the extensive network of stabilizing contacts present in the larger aggregates. Therefore, the initial steps in nucleation incur an entropic cost without compensating attractive interactions (this imbalance can be described as a surface tension). When internal degrees of freedom are present, such as the conformational states of a polypeptide chain, there is an additional contribution to the barrier coming from the loss of conformational entropy required to adopt the aggregation-prone state(s). In such systems the clustering and conformational processes do not necessarily coincide, and a major challenge in studying nucleation is to separate out these two contributions to the free energy barrier. Surprisingly, Kandola et al. find that the critical nucleus occurs within a single molecule. This means that the largest contribution to the barrier comes from the conformational entropy cost of adopting the beta-sheet state. Once this state is attained, additional molecules can be recruited with a much lower free energy barrier.
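
      For orientation, the surface-tension picture for simple systems can be written in the textbook form of classical nucleation theory. The expression below is a generic sketch, not taken from the manuscript: it assumes a compact cluster of n molecules with a bulk driving force Δμ per molecule, a surface free energy σ per unit area, and a surface area a n^{2/3}, where a is a geometric prefactor. It does not include the conformational-entropy term discussed above, and the n^{2/3} scaling would not apply to a one-dimensional fibril.

```latex
\[
\Delta G(n) = -n\,\Delta\mu + a\,\sigma\,n^{2/3},
\qquad
n^{*} = \left(\frac{2a\sigma}{3\,\Delta\mu}\right)^{3},
\qquad
\Delta G^{*} = \Delta G(n^{*}) = \frac{4\,a^{3}\sigma^{3}}{27\,\Delta\mu^{2}}
\]
```

      Here n* is the critical cluster size obtained from dΔG/dn = 0, and ΔG* is the corresponding barrier height; clusters smaller than n* pay the surface cost without enough bulk stabilization.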

      There are several caveats that come with this result. First, the height of the nucleation barrier(s) comes from the relative strength of the entropic costs compared to the binding affinities. This balance determines how large a nascent nucleus must grow before it can form interactions comparable to a mature aggregate. In amyloid nuclei the first three beta strands form immature contacts consisting of either side chain or backbone contacts, whereas the fourth strand is the first that is able to form both kinds of contacts (as in a mature fibril). This study used relatively long polypeptides of 60 amino acids. This is greater than the 20-40 amino acids found in amyloid-forming molecules like ABeta or IAPP. As a result, Kandola et al.'s molecules are able to fold enough times to create four beta strands and generate mature contacts intramolecularly. The authors make the plausible claim that these intramolecular folds explain the well-known length threshold (L~35) observed in polyQ diseases. The intramolecular folds reduce the importance of clustering multiple molecules together and increase the importance of the conformational states. Similarly, manipulating the sequence or molecular concentrations will be expected to manipulate the relative magnitude of the binding affinities and the clustering entropy, which will shift the relative heights of the entropic barriers.

      The reviewer correctly notes that the majority of our manipulations were conducted with 60-residue long tracts (which corresponds to disease onset in early adulthood), and this length facilitates intramolecular nucleation. However, we also analyzed a length series of polyQ spanning the pathological threshold, as well as a synthetic sequence designed explicitly to test the model nucleus structure with a tract shorter than the pathological threshold, and both experiments corroborate our findings.

      The authors make an important point that the structure of the nucleus does not necessarily resemble that of the mature fibril. They find that the critical nucleus has a serpentine structure that is required by the need to form four beta strands to get the first mature contacts. However, this structure comes at a cost because residues in the hairpins cannot form strong backbone or zipper interactions. Mature fibrils offer a beta sheet template that allows incoming molecules to form mature contacts immediately. Thus, it is expected that the role of the serpentine nucleus is to template a more extended beta sheet structure that is found in mature fibrils.

      A second caveat of this work is the striking homogeneity of the nucleus structure they describe. This homogeneity is likely to be somewhat illusory. Homopolymers, like polyglutamine, have a discrete translational symmetry, which implies that the hairpins needed to form multiple beta sheets can occur at many places along the sequence. The asparagine residues introduced by the authors place limitations on where the hairpins can occur, and should be expected to increase structural homogeneity. Furthermore, the authors demonstrate that polyglutamine chains close to the minimum length of ~35 will have strict limitations on where the folds must occur in order to attain the required four beta strands.

      We are unsure how to interpret the above statements as a caveat. We agree that increasing sequence complexity will tend to increase homogeneity, but this is exactly the motivation of our approach. We explicitly set out to determine the minimal complexity sequence sufficient to specify the nucleating conformation, which we ultimately identified in terms of secondary and tertiary structure. We do not specify which parts of a long polyQ tract correspond to which parts of the structure, because, as the reviewer points out, they can occur at many places. Hence, depending on the length of the polyQ tract, the nucleus we describe may have any length of sequence connecting the strand elements. We do not think that the effects of N-residue placement can be interpreted as a confounding influence on hairpin position because the striking even-odd pattern we observe implicates the sides of beta strands rather than the lengths. Moreover, we observe this pattern regardless of the residue used (Gly, Ser, Ala, and His in addition to Asn).

      A novel result of this work is the observation of multiple concentration regimes in the nucleation rate. Specifically, they report a plateau-like regime at intermediate concentrations in which the nucleation rate is insensitive to protein concentration. The authors attribute this effect to the "self-poisoning" phenomenon observed in growth of some crystals. This is a valid comparison because the homogeneity observed in NMR and crystallography structures of mature fibrils resembles that of a one-dimensional crystal. Furthermore, the typical elongation rate of amyloid fibrils (on the order of one molecule per second) is many orders of magnitude slower than the molecular collision rate (by factors of 10^6 or more), implying that the search for the beta-sheet state is very slow. This slow conformational search implies the presence of deep kinetic traps that would be prone to poisoning phenomena. However, the observation of poisoning during nucleation is striking, particularly in consideration of the expected disorder and concentration sensitivity of the nucleus. Kandola et al.'s structural model of an ordered, intramolecular nucleus explains why the internal states responsible for poisoning are relevant in nucleation.

      We thank the reviewer for noting the novelty and plausibility of the self-poisoning connection. We would like to elaborate on our finding that self-poisoning inhibits nucleation (in addition to elongation), as this could prove confusing to some readers. While self-poisoning is claimed to inhibit primary nucleation in the polymer crystal literature (Ungar et al., 2005; Zhang et al., 2018), the semantics of “nucleation” in this context warrants clarification. Technically, the same structure can be considered a nucleus in one context but not in another. The Q zipper monomer, even if it is rate-limiting for amyloid formation at low concentrations (and is therefore the “nucleus”), is not necessarily rate-limiting when self-poisoned at high concentrations. Whether it comprises the nucleus in this case depends on the rates of Q zipper formation relative to subunit addition to the poisoned state. If the latter happens slower than Q zipper formation de novo, it can be said that self-poisoning inhibits nucleation, regardless of whether the Q zipper formed. We suspect this to be the mechanism by which preemptive oligomerization blocks nucleation in the case of polyQ, though other mechanisms may be possible.

      To achieve these results the authors used a novel approach involving a systematic series of simple sequences. This is significant because, while individual experiments showed seemingly random behavior, the randomness resolved into clear trends with the systematic approach. These trends provided clues to build a model and guide further experiments.

      Reviewer #3 (Public Review):

      Kandola et al. explore the important and difficult question regarding the initiating event that triggers (nucleates) amyloid fibril growth in glutamine-rich domains. The researchers use a fluorescence technique that they developed, DAmFRET, in a yeast system where they can manipulate the expression level over several orders of magnitude, and they can control the length of the polyglutamine domain as well as the insertion of interfering non-glutamine residues. Using flow cytometry, they can interrogate each of these yeast 'reactors' to test for self-assembly, as detected by FRET.

      In the introduction, the authors provide a fairly thorough yet succinct review of the relevant literature into the mechanisms of polyglutamine-mediated aggregation over the last two decades. The presentation as well as the illustrations in Figure 1A and 1B are difficult to understand, and unfortunately, there is no clear description of the experimental technique that would allow the reader to connect the hypothetical illustrations to the measurement outcomes. The authors do not explain what the FRET signal specifically indicates or what its intensity is correlated to. FRET measures distance between donor and acceptor, but can it be reliably taken as an indicator of a specific beta-sheet conformation and of amyloid? Does the signal increase with both nucleation and with elongation, and is the signal intensity the same if, e.g., there were 5 aggregates of 10 monomers each versus 50 monomeric nuclei? Is there a reason why the AmFRET signal intensity decreases at longer Q even though the number of cells with positive signal increases? Does the number of positive cells increase with time? The authors state later that 'non-amyloid containing cells lacked AmFRET altogether', but this seems to be a tautology - isn't the lack of AmFRET taken as a proof of lack of amyloid? Overall, a clearer description of the experimental method and what is actually measured (and validation of the quantitative interpretation of the FRET signal) would greatly assist the reader in understanding and interpreting the data.

      We believe the difficulty in understanding the illustrations in Figure 1A and 1B is inherent to the subject. We agree that elaborating how DAmFRET works would help the reader, and will add a few sentences to this end. Beyond this, we refer the reviewer and readers to our cited prior work describing the theory and interpretation of DAmFRET. Note that the y-axes of DAmFRET plots are not raw FRET but rather “AmFRET”, a ratio of FRET to total expression level. As explained thoroughly in our cited prior work, the discontinuity of AmFRET with expression level indicates that the high AmFRET-population formed via a disorder-to-order transition. When the query protein is predicted to be intrinsically disordered, the discontinuous transition to high AmFRET invariably (among hundreds of proteins tested in prior published and unpublished work) signifies amyloid formation as corroborated by SDD-AGE and tinctorial assays.

      When performed using standard flow cytometry as in the present study, every AmFRET measurement corresponds to a cell-wide average, and hence does not directly inform on the distribution of the protein between different stoichiometric species. As there is only one fluorophore per protein molecule, monomeric nuclei have no signal. DAmFRET can distinguish cells expressing monomers from stable dimers from higher order oligomers (see e.g. Venkatesan et al. 2019), and we are therefore quite confident that AmFRET values of zero correspond to cells in which a vast majority of the respective protein is not in homo-oligomeric species (i.e. is monomeric or in hetero-complexes with endogenous proteins). The exact value of AmFRET, even for species with the same stoichiometry, will depend both on the effect of their respective geometries on the proximity of mEos3.1 fluorophores, and on the fraction of protein molecules in the species. Hence, we only attempt to interpret the plateau values of AmFRET (where the fraction of protein in an assembled state approaches unity) as directly informing on structure, as we did in Fig. S3A.
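
      To make the quantities in this response concrete, the following minimal sketch computes a per-cell AmFRET-like ratio (FRET signal divided by total expression level) and the fraction of cells above a cutoff. The function names, the example values, and the cutoff are hypothetical and chosen only for illustration; this is not the analysis pipeline used in the study.

```python
from typing import Sequence


def amfret(fret: Sequence[float], expression: Sequence[float]) -> list[float]:
    """Per-cell ratio of FRET signal to total expression level."""
    if len(fret) != len(expression):
        raise ValueError("fret and expression must have the same length")
    ratios = []
    for f, e in zip(fret, expression):
        if e <= 0.0:
            raise ValueError("expression must be positive for every cell")
        ratios.append(f / e)
    return ratios


def fraction_positive(values: Sequence[float], threshold: float) -> float:
    """Fraction of cells whose ratio exceeds a (hypothetical) cutoff."""
    if not values:
        raise ValueError("values must not be empty")
    return sum(v > threshold for v in values) / len(values)


# Illustrative, made-up numbers: two low-ratio cells and two high-ratio cells.
ratios = amfret([0.0, 1.0, 8.0, 18.0], [10.0, 10.0, 10.0, 20.0])
positive = fraction_positive(ratios, threshold=0.5)
```

      Because each value is a cell-wide average, a ratio computed this way cannot by itself separate, for example, a few large assemblies from many small ones, which matches the caveat stated above.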

      We believe that AmFRET decreases with longer polyQ because the mass fraction of fluorophore decreases in the aggregate, simply because the extra polypeptide takes up volume in the aggregate.

      Yes, the fraction of positive cells in a discontinuous DAmFRET plot does increase with time. However, given the more laborious data collection and derivation of nucleation kinetics in a system with ongoing translation, especially across hundreds of experiments with other variables, ours is a snapshot measurement to approximately derive the relative contributions of intra- and intermolecular fluctuations to the nucleation barrier, rather than the barrier’s magnitude.

      We will revise the tautological statement by removing “non-amyloid containing”.

      The authors demonstrate that their assay shows that the fraction of cells with AmFRET signal increases strongly with an increase in polyQ length, with a threshold around 50-60 glutamines. This roughly correlates with the Q-length dependence of disease. The experiments in which asparagine or other amino acids are inserted at variable positions in the glutamine repeat are creative and thorough, and the data along with the simulations provide compelling support for the proposed Q zipper model. The experiments shown in Figure 5 are strongly supportive of a model where formation of the beta-sheet nucleus is within a monomer. This is a potentially important result, as there are conflicting data in the literature as to whether the nucleus in polyQ is a monomer.

      We thank the reviewer for these comments. We wish to clarify one important point, however, concerning the correlation of our data with the pathological length threshold. As we state in the first results section, “Our data recapitulated the pathologic threshold -- Q lengths 35 and shorter lacked AmFRET, indicating a failure to aggregate or even appreciably oligomerize, while Q lengths 40 and longer did acquire AmFRET in a length and concentration-dependent manner”. Hence, most of our experiments were conducted with 60Q not because it resembles the pathological threshold, but rather because it was most convenient for DAmFRET experiments.

      I did not find the argument, that their data shows the Q zipper grows in two dimensions, compelling; there are more direct experimental methods to answer this question. I was also confused by the section that Q zippers poison themselves. It would be easier for the reader to follow if the authors first presented their results without interpretation. The data seem more consistent with an argument that, at high concentrations, non-structured polyQ oligomers form which interfere with elongation into structured amyloid assemblies - but such oligomers would not be zippers.

      Self-poisoning is a widely observed and heavily studied phenomenon in polymer crystal physics, though it seems not yet to have entered the lexicon of amyloid biologists. We were new to this concept before it emerged as an extremely parsimonious explanation for our results. As described in the text, two pieces of evidence exclude the alternative mechanism suggested by the reviewer -- that non-structured oligomers form and subsequently engage and inhibit the template. Specifically, 1) inhibition occurs without any detectable FRET, even at high total protein concentration, indicating the species do not form in a concentration-dependent manner that would be expected of disordered oligomers; and 2) inhibition itself has strict sequence requirements that match those of Q zippers. Hence our data collectively suggest that inhibition is a consequence of the deposition of partially ordered molecules onto the templating surface.

      Although some speculation or hypothesizing is perfectly appropriate in the discussion, overall the authors stretch this beyond what can be supported by the results. A couple of examples: The conclusion that toxicity arises from 'self-poisoned polymer crystals' is not warranted, as there is no relevant data presented in this manuscript. The authors refer to findings 'that kinetically arrested aggregates emerge from the same nucleating event responsible for amyloid formation', but I cannot recall any evidence for this statement in the results section.

      We restricted any mention of toxicity to the introduction and a section in the discussion that is not worded as conclusive. Nevertheless, we will soften the subheading and text of the relevant section in the discussion to more clearly indicate the speculative nature of the statements.

      We stand by our statement 'that kinetically arrested aggregates emerge from the same nucleating event responsible for amyloid formation', as this follows directly from self-poisoning.

      Bibliography

      Arseni D, Hasegawa M, Murzin AG, Kametani F, Arai M, Yoshida M, Ryskeldi-Falcon B. 2022. Structure of pathological TDP-43 filaments from ALS with FTLD. Nature 601:139–143. doi:10.1038/s41586-021-04199-3

      Bansal A, Schmidt M, Rennegarbe M, Haupt C, Liberta F, Stecher S, Puscalau-Girtu I, Biedermann A, Fändrich M. 2021. AA amyloid fibrils from diseased tissue are structurally different from in vitro formed SAA fibrils. Nat Commun 12:1013. doi:10.1038/s41467-021-21129-z

      Buell AK. 2017. The Nucleation of Protein Aggregates - From Crystals to Amyloid Fibrils. Int Rev Cell Mol Biol 329:187–226. doi:10.1016/bs.ircmb.2016.08.014

      Chakraborty D, Straub JE, Thirumalai D. 2023. Energy landscapes of Aβ monomers are sculpted in accordance with Ostwald’s rule of stages. Sci Adv 9:eadd6921. doi:10.1126/sciadv.add6921

      Crist B, Schultz JM. 2016. Polymer spherulites: A critical review. Prog Polym Sci 56:1–63. doi:10.1016/j.progpolymsci.2015.11.006

      De Yoreo JJ. 2022. Casting a bright light on Ostwald’s rule of stages. Proc Natl Acad Sci USA 119. doi:10.1073/pnas.2121661119

      Hong Y, Yuan S, Li Z, Ke Y, Nozaki K, Miyoshi T. 2015. Three-Dimensional Conformation of Folded Polymers in Single Crystals. Phys Rev Lett 115:168301. doi:10.1103/PhysRevLett.115.168301

      Keller A. 1957. A note on single crystals in polymers: Evidence for a folded chain configuration. Philosophical Magazine 2:1171–1175. doi:10.1080/14786435708242746

      Landgraf D, Okumus B, Chien P, Baker TA, Paulsson J. 2012. Segregation of molecules at cell division reveals native protein localization. Nat Methods 9:480–482. doi:10.1038/nmeth.1955

      Lauritzen JI, Hoffman JD. 1960. Theory of Formation of Polymer Crystals with Folded Chains in Dilute Solution. J Res Natl Bur Stand A Phys Chem 64A:73–102. doi:10.6028/jres.064A.007

      Navrotsky A. 2004. Energetic clues to pathways to biomineralization: precursors, clusters, and nanoparticles. Proc Natl Acad Sci USA 101:12096–12101. doi:10.1073/pnas.0404778101

      Ohhashi Y, Ito K, Toyama BH, Weissman JS, Tanaka M. 2010. Differences in prion strain conformations result from non-native interactions in a nucleus. Nat Chem Biol 6:225–230. doi:10.1038/nchembio.306

      Organ SJ, Ungar G, Keller A. 1989. Rate minimum in solution crystallization of long paraffins. Macromolecules 22:1995–2000. doi:10.1021/ma00194a078

      Radamaker L, Baur J, Huhn S, Haupt C, Hegenbart U, Schönland S, Bansal A, Schmidt M, Fändrich M. 2021. Cryo-EM reveals structural breaks in a patient-derived amyloid fibril from systemic AL amyloidosis. Nat Commun 12:875. doi:10.1038/s41467-021-21126-2

      Sahoo B, Singer D, Kodali R, Zuchner T, Wetzel R. 2014. Aggregation behavior of chemically synthesized, full-length huntingtin exon1. Biochemistry 53:3897–3907. doi:10.1021/bi500300c

      Schmelzer JWP, Abyzov AS. 2017. How do crystals nucleate and grow: ostwald’s rule of stages and beyond In: Šesták J, Hubík P, Mareš JJ, editors. Thermal Physics and Thermal Analysis, Hot Topics in Thermal Analysis and Calorimetry. Cham: Springer International Publishing. pp. 195–211. doi:10.1007/978-3-319-45899-1_9

      Schmidt M, Wiese S, Adak V, Engler J, Agarwal S, Fritz G, Westermark P, Zacharias M, Fändrich M. 2019. Cryo-EM structure of a transthyretin-derived amyloid fibril from a patient with hereditary ATTR amyloidosis. Nat Commun 10:5008. doi:10.1038/s41467-019-13038-z

      Schweighauser M, Shi Y, Tarutani A, Kametani F, Murzin AG, Ghetti B, Matsubara T, Tomita T, Ando T, Hasegawa K, Murayama S, Yoshida M, Hasegawa M, Scheres SHW, Goedert M. 2020. Structures of α-synuclein filaments from multiple system atrophy. Nature 585:464–469. doi:10.1038/s41586-020-2317-6

      Snapp EL, Hegde RS, Francolini M, Lombardo F, Colombo S, Pedrazzini E, Borgese N, Lippincott-Schwartz J. 2003. Formation of stacked ER cisternae by low affinity protein interactions. J Cell Biol 163:257–269. doi:10.1083/jcb.200306020

      Törnquist M, Michaels TCT, Sanagavarapu K, Yang X, Meisl G, Cohen SIA, Knowles TPJ, Linse S. 2018. Secondary nucleation in amyloid formation. Chem Commun 54:8667–8684. doi:10.1039/c8cc02204f

      Ungar G, Putra EGR, de Silva DSM, Shcherbina MA, Waddon AJ. 2005. The Effect of Self-Poisoning on Crystal Morphology and Growth Rates In: Allegra G, editor. Interphases and Mesophases in Polymer Crystallization I, Advances in Polymer Science. Berlin, Heidelberg: Springer Berlin Heidelberg. pp. 45–87. doi:10.1007/b107232

      Vetri V, Foderà V. 2015. The route to protein aggregate superstructures: Particulates and amyloid-like spherulites. FEBS Lett 589:2448–2463. doi:10.1016/j.febslet.2015.07.006

      Wild EJ, Boggio R, Langbehn D, Robertson N, Haider S, Miller JRC, Zetterberg H, Leavitt BR, Kuhn R, Tabrizi SJ, Macdonald D, Weiss A. 2015. Quantification of mutant huntingtin protein in cerebrospinal fluid from Huntington’s disease patients. The Journal of Clinical Investigation.

      Yang Y, Arseni D, Zhang W, Huang M, Lövestam S, Schweighauser M, Kotecha A, Murzin AG, Peak-Chew SY, Macdonald J, Lavenir I, Garringer HJ, Gelpi E, Newell KL, Kovacs GG, Vidal R, Ghetti B, Ryskeldi-Falcon B, Scheres SHW, Goedert M. 2022. Cryo-EM structures of amyloid-β 42 filaments from human brains. Science 375:167–172. doi:10.1126/science.abm7285

      Zhang X, Zhang W, Wagener KB, Boz E, Alamo RG. 2018. Effect of Self-Poisoning on Crystallization Kinetics of Dimorphic Precision Polyethylenes with Bromine. Macromolecules 51:1386–1397. doi:10.1021/acs.macromol.7b02745

    1. Author response:

      Reviewer #1 (Public Review):

      Summary:

      In this manuscript, the authors investigated the effect of chronic activation of dopamine neurons using chemogenetics. Using Gq-DREADDs, the authors chronically activated midbrain dopamine neurons and observed that these neurons, particularly their axons, exhibit increased vulnerability and degeneration, resembling the pathological symptoms of Parkinson's disease. Baseline calcium levels in midbrain dopamine neurons were also significantly elevated following the chronic activation. Lastly, to identify cellular and circuit-level changes in response to dopaminergic neuronal degeneration caused by chronic activation, the authors employed spatial genomics (Visium) and revealed comprehensive changes in gene expression in the mouse model subjected to chronic activation. In conclusion, this study presents novel data on the consequences of chronic hyperactivation of midbrain dopamine neurons.

      Strengths:

      This study provides direct evidence that the chronic activation of dopamine neurons is toxic and gives rise to neurodegeneration. In addition, the authors achieved the chronic activation of dopamine neurons using water application of clozapine-N-oxide (CNO), a method not commonly employed by researchers. This approach may offer new insights into pathophysiological alterations of dopamine neurons in Parkinson's disease. The authors also utilized state-of-the-art spatial gene expression analysis, which can provide valuable information for other researchers studying dopamine neurons. Although the authors did not elucidate the mechanisms underlying dopaminergic neuronal and axonal death, they presented a substantial number of intriguing ideas in their discussion, which are worth further investigation.

      We thank the reviewer for these positive comments.

      Weaknesses:

      Many claims raised in this paper are only partially supported by the experimental results. So, additional data are necessary to strengthen the claims. The effects of chronic activation of dopamine neurons are intriguing; however, this paper does not go beyond reporting phenomena. It lacks a comprehensive explanation for the degeneration of dopamine neurons and their axons. While the authors proposed possible mechanisms for the degeneration in their discussion, such as differentially expressed genes, these remain experimentally unexplored.

      We thank the reviewer for this review. We do believe that the manuscript has a mechanistic component, as the central experiments involve direct manipulation of neuronal activity, and we show an increase in calcium levels and gene expression changes in dopamine neurons that coincide with the degeneration. However, we agree that deeper mechanistic investigation would strengthen the conclusions of the paper. We have planned several important revisions, including the addition of CNO behavioral controls, manipulation of intracellular calcium using isradipine, additional transcriptomics experiments and further validation of findings. We anticipate that these additions will significantly bolster the conclusions of the paper.

      Reviewer #2 (Public Review):

      Summary:

      Rademacher et al. present a paper showing that chronic chemogenetic excitation of dopaminergic neurons in the mouse midbrain results in differential degeneration of axons and somas across distinct regions (SNc vs VTA). These findings are important. This mouse model also has the advantage of showing an axon-first degeneration over an experimentally useful time course (2-4 weeks). The finding that direct excitation of dopaminergic neurons causes differential degeneration sheds light on the mechanisms of dopaminergic neuron selective vulnerability. The evidence that activation of dopaminergic neurons causes degeneration and alters mRNA expression is convincing, as the authors use both vehicle and CNO control groups, but the evidence that chronic dopaminergic activation alters circadian rhythm and motor behavior is incomplete as the authors did not run a CNO-control condition in these experiments.

      Strengths:

      This is an exciting and important paper.

      The paper compares mouse transcriptomics with human patient data.

      It shows that selective degeneration can occur across the midbrain dopaminergic neurons even in the absence of a genetic, prion, or toxin neurodegeneration mechanism.

      We thank the reviewer for these insightful comments.

      Weaknesses:

      Major concerns:

      (1) The lack of a CNO-positive, DREADD-negative control group in the behavioral experiments is the main limitation in interpreting the behavioral data. Without knowing whether CNO on its own has an impact on circadian rhythm or motor activity, the certainty that dopaminergic hyperactivity is causing these effects is lacking.

      This is an important point. Although we show that CNO does not produce degeneration of DA neuron terminals, we do not exclude a contribution to the behavioral changes. We agree that this behavioral control is necessary, and will address it in revision with a CNO-only running wheel cohort.

      (2) One of the most exciting things about this paper is that the SNc degenerates more strongly than the VTA when both regions are, in theory, excited to the same extent. However, it is not perfectly clear that both regions respond to CNO to the same extent. The electrophysiological data showing CNO responsiveness were collected only in the SNc. If the VTA response is significantly reduced vs the SNc response, then the selectivity of the SNc degeneration could just be because the SNc was more hyperactive than the VTA. Electrophysiology experiments comparing the VTA and SNc response to CNO could support the idea that the SNc has substantial intrinsic vulnerability factors compared to the VTA.

      We agree that additional electrophysiology conducted in the VTA dopamine neurons would meaningfully add to our understanding of the selective vulnerability in this model, and will complete these experiments in revision.

      (3) The mice have access to a running wheel for the circadian rhythm experiments. Running has been shown to alter the dopaminergic system (Bastioli et al., 2022) and so the authors should clarify whether the histology, electrophysiology, fiber photometry, and transcriptomics data are conducted on mice that have been running or sedentary.

      We will explicitly clarify which mice had access to a running wheel in our revision. Briefly, mice for histology, electrophysiology, and transcriptomics all had access to a running wheel during their treatment. The mice used for photometry underwent about 7 days of running wheel access approximately 3 weeks prior to the beginning of the experiment. The photometry headcaps sterically prevented mice from having access to a running wheel in their home cage.

      Reviewer #3 (Public Review):

      Summary:

      In this manuscript, Rademacher and colleagues examined the effect on the integrity of the dopamine system in mice of chronically stimulating dopamine neurons using a chemogenetic approach. They find that one to two weeks of constant exposure to the chemogenetic activator CNO leads to a decrease in the density of tyrosine hydroxylase staining in striatal brain sections and to a small reduction of the global population of tyrosine hydroxylase positive neurons in the ventral midbrain. They also report alterations in gene expression in both regions using a spatial transcriptomics approach. Globally, the work is well done and valuable and some of the conclusions are interesting. However, the conceptual advance is perhaps a bit limited in the sense that there is extensive previous work in the literature showing that excessive depolarization of multiple types of neurons associated with intracellular calcium elevations promotes neuronal degeneration. The present work adds to this by showing evidence of a similar phenomenon in dopamine neurons.

      We thank the reviewer for the careful and thoughtful review of our manuscript.

      While extensive depolarization and associated intracellular calcium elevations promote degeneration generally, we emphasize that the process we describe is novel. Indeed, prior studies delivering chronic DREADDs to vulnerable neurons in models of Alzheimer’s disease did not report an increase in neurodegeneration, despite seeing changes in protein aggregation (e.g. Yuan and Grutzendler, J Neurosci 2016, PMID: 26758850; Hussaini et al., PLOS Bio 2020, PMID: 32822389). Further, a critical finding from our study is that in our paradigm, this stressor does not impact all dopamine neurons equally, as the SNc DA neurons are more vulnerable than the VTA, mirroring selective vulnerability characteristic of Parkinson’s disease. This is consistent with a large body of literature showing that SNc dopamine neurons are less capable of handling large energetic and calcium loads compared to neighboring VTA neurons, and the finding that chronically altered activity is sufficient to drive this preferential loss is novel.

      In addition, we are not aware of prior studies that have chronically activated DREADDs to produce neurodegeneration. Other studies have shown that acute excitotoxic stressors can produce neuronal degeneration, but the chronic increase in activity is central to our approach.

      In terms of the mechanisms explaining the neuronal loss observed after 2 to 4 weeks of chemogenetic activation, it would be important to consider that dopamine neurons are known from a lot of previous literature to undergo a decrease in firing through a depolarization-block mechanism when chronically depolarized. Is it possible that such a phenomenon explains much of the results observed in the present study? It would be important to consider this in the manuscript.

      As discussed in greater detail in the results section below, our data suggest this may not be a prominent feature in our model. However, we cannot rule out a contribution of depolarization block, and will expand on the discussion of this possibility in the revised manuscript.

      The relevance to Parkinson's disease (PD) is also not totally clear because there is not a lot of previous solid evidence showing that the firing of dopamine neurons is increased in PD, either in human subjects or in mouse models of the disease. As such, it is not clear if the present work is really modelling something that could happen in PD in humans.

      We completely agree that evidence of increased dopamine neuron activity from human PD patients is lacking and the existing data are difficult to interpret without human controls. However, as we outline in the manuscript, multiple lines of evidence suggest that the activity level of dopamine neurons almost certainly does change in PD. Therefore, it is very important that we understand how changes in the level of neural activity influence the degeneration of DA neurons. In this paper we examine the impact of increased activity. Increased activity may be compensatory after initial dopamine neuron loss, or may be an initial driver of death (Rademacher & Nakamura, Exp Neurol 2024, PMID: 38092187). Beyond what is already discussed in the manuscript, additional support for increased activity in PD models includes:

      - Elevated firing rates in asymptomatic MitoPark mice (Good et al., FASEB J 2011, PMID: 21233488)

      - Increased frequency of spontaneous firing in patient-derived iPSC dopamine neurons and primary mouse dopamine neurons that overexpress synuclein (Lin et al., Acta Neuropath Comm 2021, PMID: 34099060)

      - Increased spontaneous firing in dopamine neurons of rats injected with synuclein preformed fibrils compared to sham (Tozzi et al., Brain 2021, PMID: 34297092)

      We will include and further discuss these important examples in our revision.

      Similarly, in future studies, it will also be important to study the impact of decreasing DA neuron activity. There will be additional levels of complexity to accurately model changes in PD, which may differ between subtypes of the disease, the disease stage, and the subtype of dopamine neuron. Our study models the possibility of chronically increased pacemaking, and interpretation of our results will be informed as we learn more about how the activity of DA neurons changes in humans in PD. We will discuss and elaborate on these important points in the revision.

      Comments on the introduction:

      The introduction cites a 1990 paper from the lab of Anthony Grace as support of the fact that DA neurons increase their firing rate in PD models. However, in this 1990 paper, the authors stated that: "With respect to DA cell activity, depletions of up to 96% of striatal DA did not result in substantial alterations in the proportion of DA neurons active, their mean firing rate, or their firing pattern. Increases in these parameters only occurred when striatal DA depletions exceeded 96%." Such results argue that an increase in firing rate is most likely to be a consequence of the almost complete loss of dopamine neurons rather than an initial driver of neuronal loss. The present introduction would thus benefit from being revised to clarify the overriding hypothesis and rationale in relation to PD and better represent the findings of the paper by Hollerman and Grace.

      We agree that the findings of Hollerman and Grace support compensatory changes in dopamine neuron activity in response to loss of dopamine neurons, rather than informing whether dopamine neuron loss can also be an initial driver of activity. We will clarify this point in our revision. In addition, the results of other studies on this point are mixed: a 50% reduction in dopamine neurons didn’t alter firing rate or bursting (Harden and Grace, J Neurosci 1995, PMID: 7666198; Bilbao et al., Brain Res 2006, PMID: 16574080), while a 40% loss was found to increase firing rate and bursting (Chen et al., Brain Res 2009, PMID: 19545547) and larger reductions alter burst firing (Hollerman & Grace, Brain Res 1990, PMID: 2126975; Stachowiak et al., J Neurosci 1987, PMID: 3110381). Importantly, even if compensatory, such late-stage increases in dopamine neuron activity may contribute to disease progression and drive a vicious cycle of degeneration in surviving neurons. In addition, we also don’t know how the threshold of dopamine neuron loss and altered activity may differ between mice and humans, and PD patients do not present with clinical symptoms until ~30-60% of nigral neurons are lost (Burke & O’Malley, Exp Neurol 2013, PMID: 22285449; Shulman et al., Annu Rev Pathol 2011, PMID: 21034221).

      Other lines of evidence support the potential role of hyperactivity in disease initiation, including increased activity before dopamine neuron loss in MitoPark mice (Good et al., FASEB J 2011, PMID: 21233488), increased spontaneous firing in patient-derived iPSC dopamine neurons (Lin et al., Acta Neuropath Comm 2021, PMID: 34099060), and increased activity observed in genetic models of PD (Bishop et al., J Neurophysiol 2010, PMID: 20926611; Regoni et al., Cell Death Dis 2020, PMID: 33173027).

      It would be good that the introduction refers to some of the literature on the links between excessive neuronal activity, calcium, and neurodegeneration. There is a large literature on this and referring to it would help frame the work and its novelty in a broader context.

      We agree that a discussion of hyperactivity, calcium, and neurodegeneration would benefit the introduction. While we briefly discuss calcium and neurodegeneration in the discussion, we will expand on this literature in both the introduction and discussion sections. We will carefully review and contextualize our work within existing frameworks of calcium and neurodegeneration (e.g. Surmeier & Schumacker, J Biol Chem 2013, PMID: 23086948; Verma et al., Transl Neurodegener 2022, PMID: 35078537). We believe that the novelty of our study lies in 1) a chronic chemogenetic activation paradigm via drinking water, 2) demonstrating selective vulnerability of dopamine neurons as a result of altering their activity/excitability alone, and 3) comparing mouse and human spatial transcriptomics.

      Comments on the results section:

      The running wheel results of Figure 1 suggest that the CNO treatment caused a brief increase in running on the first day after which there was a strong decrease during the subsequent days in the active phase. This observation is also in line with the appearance of a depolarization block.

      The authors examined many basic electrophysiological parameters of recorded dopamine neurons in acute brain slices. However, it is surprising that they did not report the resting membrane potential, or the input resistance. It would be important that this be added because these two parameters provide key information on the basal excitability of the recorded neurons. They would also allow us to obtain insight into the possibility that the neurons are chronically depolarized and thus in depolarization block.

      We do report the input resistance in Supplemental Figure 1C, which was unchanged in CNO-treated animals compared to controls. We did not report the resting membrane potential because many of the DA neurons were spontaneously firing. However, in the revision we will report the initial membrane potential measured on first breaking into the cell for the whole cell recordings, which did not vary between groups. This is still influenced by action potential activity, but is the timepoint in the recording least impacted by dialysis of the neuron by the internal solution. We observed increased spontaneous action potential activity ex vivo in slices from CNO-treated mice (Figure 1D); thus, at least under these conditions, these dopamine neurons are not in depolarization block. We also did not see strong evidence of changes in other intrinsic properties of the neurons with whole cell recordings (e.g. Figure S1C). Overall, our electrophysiology experiments are not consistent with the depolarization block model, at least not due to changes in the intrinsic properties of the neurons. Although our ex vivo findings cannot exclude a contribution of depolarization block in vivo, we do show that CNO-treated mice removed from their cages for open field testing continue to have a strong trend for increased activity for approximately 10 days (Figure S1E). This finding is also consistent with increased activity of the DA neurons. We will add discussion of these important considerations in the revision.

      It is great that the authors quantified not only TH levels but also the levels of mCherry, co-expressed with the chemogenetic receptor. This could in principle help to distinguish between TH downregulation and true loss of dopamine neuron cell bodies. However, the approach used here has a major caveat in that the number of mCherry-positive dopamine neurons depends on the proportion of dopamine neurons that were infected and expressed the DREADD and this could very well vary between different mice. It is very unlikely that the virus injection infected 100% of the neurons in the VTA and SNc. This could for example explain in part the mismatch between the number of VTA dopamine neurons counted in panel 2G when comparing TH and mCherry counts. Also, I see that the mCherry counts were not provided at the 2-week time point. If the mCherry had been expressed genetically by crossing the DAT-Cre mice with floxed fluorescent reporter mice, the interpretation would have been simpler. In this context, I am not convinced of the benefit of the mCherry quantifications. The authors should consider either removing these results from the final manuscript or discussing this important limitation.

      We thank the reviewer for this insightful comment, and we agree that this is a caveat of our mCherry quantification. Quantitation of the number of mCherry+ DA neurons specifically informs the impact on transduced DA neurons, and mCherry appears to be less susceptible to downregulation versus TH. As the reviewer points out, it carries the caveat that there is some variability between injections. Nonetheless, we believe that it conveys useful complementary data. As suggested, we will discuss this caveat in our revision. Note that mCherry was not quantified at the two-week timepoint because there is no loss of TH+ cells at that time.

      Although the authors conclude that there is a global decrease in the number of dopamine neurons after 4 weeks of CNO treatment, the post-hoc tests failed to confirm that the decrease in dopamine number was significant in the SNc, the region most relevant to Parkinson's. This could be due to the fact that only a small number of mice were tested. A "n" of just 4 or 5 mice is very small for a stereological counting experiment. As such, this experiment was clearly underpowered at the statistical level. Also, the choice of the image used to illustrate this in panel 2G should be reconsidered: the image suggests that a very large loss of dopamine neurons occurred in the SNc and this is not what the numbers show. A more representative image should be used.

      We agree that the stereology experiments were performed on relatively small numbers of animals. Combined with the small effect size, this may have contributed to the post-hoc tests showing a trend of p=0.1 for both the TH and mCherry dopamine cell counts in the SN at 4 weeks. As part of the planned experiments for our revision, we will perform an additional stereologic analysis to further assess the loss of SNc dopamine neurons. We will also review and ensure the images are representative.
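      The sample-size concern can be made concrete with a rough power estimate. The sketch below uses a normal approximation for a two-sided comparison of two equally sized groups. The standardized effect size, the significance level, and the function names are illustrative assumptions, not values taken from the study, and the approximation is optimistic for groups as small as 4 or 5 animals.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(effect_size_d, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-group comparison.

    Uses a normal approximation (not an exact t-test calculation), so it
    overestimates power somewhat when n_per_group is very small.
    effect_size_d is a hypothetical standardized difference (Cohen's d).
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    noncentrality = effect_size_d * sqrt(n_per_group / 2)
    # Probability of landing in either rejection tail.
    return z.cdf(noncentrality - z_crit) + z.cdf(-noncentrality - z_crit)


def n_for_power(effect_size_d, target_power=0.8, alpha=0.05, max_n=10000):
    """Smallest group size reaching target_power, or None if not found."""
    for n in range(2, max_n + 1):
        if approx_power(effect_size_d, n, alpha) >= target_power:
            return n
    return None
```

      For example, `approx_power(1.0, 5)` estimates the power for a hypothetical effect of one standard deviation with five animals per group, and `n_for_power(1.0)` searches for the group size needed to reach 80% power under the same assumption.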

      In Figure 3, the authors attempt to compare intracellular calcium levels in dopamine neurons using GCaMP6 fluorescence. Because this calcium indicator is not quantitative (unlike ratiometric sensors such as Fura2), it is usually used to quantify relative changes in intracellular calcium. The present use of this probe to compare absolute values is unusual and the validity of this approach is unclear. This limitation needs to be discussed. The authors also need to refer in the text to the difference between panels D and E of this figure. It is surprising that the fluctuations in calcium levels were not quantified. I guess the hypothesis was that there should be more or larger fluctuations in the mice treated with CNO if the CNO treatment led to increased firing. This needs to be clarified.

      We thank the reviewer for this comment. We understand that this method of comparing absolute values is unconventional. However, these animals were tested concurrently on the same system, and a clear effect on the absolute baseline was observed. We will note this caveat in our discussion. Panel D of this figure shows the raw, uncorrected photometry traces, whereas panel E shows the isosbestic-corrected traces for the same recording. In panel E, the traces follow time in ascending order. We will also include frequency and amplitude data for these recordings.
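      For readers unfamiliar with the correction that distinguishes panels D and E, one common way to compute an isosbestic-corrected trace is to fit the isosbestic (control) channel to the calcium-dependent channel by least squares and express the signal relative to that fit. The sketch below is a generic illustration under that assumption. It is not the pipeline used for Figure 3, and the function names are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def isosbestic_dff(signal, control):
    """Relative fluorescence change after removing the fitted control channel.

    signal:  samples from the calcium-dependent excitation channel
    control: samples from the isosbestic channel, same length
    Returns (signal - fitted_control) / fitted_control for each sample.
    """
    slope, intercept = fit_line(control, signal)
    fitted = [slope * c + intercept for c in control]
    return [(s - f) / f for s, f in zip(signal, fitted)]
```

      Because the fitted control is both subtracted and divided out, a corrected trace of this kind reports relative changes, which is why comparisons of absolute baseline rely on the uncorrected recordings.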

      Although the spatial transcriptomic results are intriguing and certainly a great way to start thinking about how the CNO treatment could lead to the loss of dopamine neurons, the presented results, with their focus on some broad classes of differentially expressed genes and on some specific examples, do not really suggest any clear mechanism of neurodegeneration. It would perhaps be useful for the authors to use the obtained data to validate that a state of chronic depolarization was indeed induced by the chronic CNO treatment. Were genes classically linked to increased activity like cfos or bdnf elevated in the SNc or VTA dopamine neurons? In the striatum, the authors report that the levels of DARPP-32, a gene whose levels are linked to dopamine levels, are unchanged. Does this mean that there were no major changes in dopamine levels in the striatum of these mice?

      We will review the expression of activity-related genes in our dataset, although we must keep in mind that these genes may behave differently in the context of chronic activation as opposed to acutely increased activity. We will also include experiments assessing striatal dopamine levels by HPLC in the revision.

      The usefulness of comparing the transcriptome of human PD SNc or VTA sections to that of the present mouse model should be better explained. In the human tissues, the transcriptome reflects the state of the tissue many years after extensive loss of dopamine neurons. It is expected that there will be few if any SNc neurons left in such sections. In comparison, the mice after 7 days of CNO treatment do not appear to have lost any dopamine neurons. As such, how can the two extremely different conditions be reasonably compared?

      Our mouse model and human PD progress over distinct timescales, as is the case with essentially all mouse models of neurodegenerative diseases. Nonetheless, in our view there is still great value in comparing gene expression changes in mouse models with those in human disease. It seems very likely that the same pathologic processes that drive degeneration early in the disease continue to drive degeneration later in the disease. Note that we have tried to address the discrepancy in time scales in part by comparing to early PD samples when there is more limited SNc DA neuron loss. Please note the numbers of DA neurons within the areas we have selected for sampling (Author response image 1). Therefore, we can indeed use spatial transcriptomics to compare dopamine neurons from mice with initial degeneration and patients where degeneration is ongoing during their disease.

      Author response image 1.

      Violin plot of DA neuron proportions sampled within the vulnerable SNV (deconvoluted RCTD method used in unmasked tissue sections of the SNV). Control and early PD subjects.

      Comments on the discussion:

      In the discussion, the authors state that their calcium photometry results support a central role of calcium in activity-induced neurodegeneration. This conclusion, although plausible because of the very broad pre-existing literature linking calcium elevation (such as in excitotoxicity) to neuronal loss, should be toned down a bit as no causal relationship was established in the experiments that were carried out in the present study.

      Our model utilizes hM3Dq-DREADDs that function by increasing intracellular calcium to increase neuronal excitability, and our results show increased Ca2+ by fiber photometry and changes to Ca2+-related genes, strongly suggesting a causal relation and crucial role of calcium in the mechanism of degeneration. However, we agree that we have not experimentally proven this point, as we acknowledged in the text. Additionally, we have planned revision experiments involving chronic isradipine treatment to further test the role of calcium in the mechanism of degeneration in this model.

      In the discussion, the authors discuss some of the parallel changes in gene expression detected in the mouse model and in the human tissues. Because few if any dopamine neurons are expected to remain in the SNc of the human tissues used, this sort of comparison has important conceptual limitations and these need to be clearly addressed.

      As discussed, we can sample SN DA neurons in early PD (see figure above), and in our view there is great value for such comparisons. We agree that discussion of appropriate caveats is warranted and this will be clearly addressed in the revision.

      A major limitation of the present discussion is that it does not discuss the possibility that the observed phenotypes are caused by the induction of a chronic state of depolarization block by the chronic CNO treatment. I encourage the authors to consider and discuss this hypothesis.

      As discussed above, our analyses of DA neuron firing in slices and open field testing to date do not support a prominent contribution of depolarization block with chronic CNO treatment. However, we cannot rule out this hypothesis; we will therefore include additional electrophysiology experiments and add discussion of this important consideration.

      Also, the authors need to discuss the fact that previous work was only able to detect an increase in the firing rate of dopamine neurons after more than 95% loss of dopamine neurons. As such, the authors need to clearly discuss the relevance of the present model to PD. Are changes in firing rate a driver of neuronal loss in PD, as the authors try to make the case here, or are such changes only a secondary consequence of extensive neuronal loss (for example because a major loss of dopamine would lead to reduced D2 autoreceptor activation in the remaining neurons, and to reduced autoreceptor-mediated negative feedback on firing). This needs to be discussed.

      As discussed above, while increases in dopamine neuron activity may be compensatory after loss of neurons, the precise percentage required to induce such compensatory changes is not defined in mice and varies between paradigms, and the threshold level is not known in humans. We also reiterate that a compensatory increase in activity could still promote the degeneration of critical surviving DA neurons, whose loss underlies the substantial decline in motor function that typically occurs over the course of PD. Moreover, there are also multiple lines of evidence to suggest that changes in activity can initiate and drive dopamine neuron degeneration (Rademacher & Nakamura, Exp Neurol 2024). For example, overexpression of synuclein can increase firing in cultured dopamine neurons (Dagra et al., NPJ Parkinsons Dis 2021, PMID: 34408150) while mice expressing mutant Parkin have higher mean firing rates (Regoni et al., Cell Death Dis 2020, PMID: 33173027). Similarly, an increased firing rate has been reported in the MitoPark mouse model of PD at a time preceding DA neuron degeneration (Good et al., FASEB J 2011, PMID: 21233488). We also acknowledge that alterations to dopamine neuron activity are likely complex in PD, and that dopamine neuron health and function can be impacted not just by simple increases in activity, but also by changes in activity patterns and regularity. We will amend our discussion to include the important caveat of changes in activity occurring as compensation, as well as further evidence of changes in activity preceding dopamine neuron death.

      There is a very large, multi-decade literature on calcium elevation and its effects on neuronal loss in many different types of neurons. The authors should discuss their findings in this context and refer to some of this previous work. In a nutshell, the observations of the present manuscript could be summarized by stating that the chronic membrane depolarization induced by the CNO treatment is likely to induce a chronic elevation of intracellular calcium and this is then likely to activate some of the well-known calcium-dependent cell death mechanisms. Whether such cell death is linked in any way to PD is not really demonstrated by the present results. The authors are encouraged to perform a thorough revision of the discussion to address all of these issues, discuss the major limitations of the present model, and refer to the broad pre-existing literature linking membrane depolarization, calcium, and neuronal loss in many neuronal cell types.

      While our model demonstrates classic excitotoxic cell death pathways, we would like to emphasize both the chronic nature of our manipulation and the progressive changes observed, with increasing degeneration seen at 1, 2, and 4 weeks of hyperactivity in an axon-first manner. This is a unique aspect of our study, in contrast to much of the previous literature, which has focused on shorter timescales. Thus, while we will revise the discussion to more comprehensively acknowledge previous studies of calcium-dependent neuron cell death, we believe we have made several new contributions that are not predicted by existing literature. We have shown that this chronic manipulation is specifically toxic to nigral dopamine neurons, and the finding that VTA dopamine neurons continue to be resilient even at 4 weeks is interesting and disease-relevant. We therefore do not want to use findings from other neuron types to draw assumptions about DA neurons, which are a unique and very diverse population. We acknowledge that, as with all preclinical models of PD, we cannot draw definitive conclusions about PD from these data. However, we reiterate that we strongly believe that drawing connections to human disease is important, as dopamine neuron activity is very likely altered in PD and a clearer understanding of how dopamine neuron survival is impacted by activity will provide insight into the mechanisms of PD.

    1. Author response:

      Reviewer #1 (Public review):

      From the Reviewing Editor:

      Four reviewers have assessed your manuscript on valence and salience signaling in the central amygdala. There was universal agreement that the question being asked by the experiment is important. There was consensus that the neural population being examined (GABA neurons) was important and the circular shift method for identifying task-responsive neurons was rigorous. Indeed, observing valenced outcome signaling in GABA neurons would considerably increase the role of the central amygdala in valence. However, each reviewer brought up significant concerns about the design, analysis and interpretation of the results. Overall, these concerns limit the conclusions that can be drawn from the results. Addressing the concerns (described below) would work towards better answering the question at the outset of the experiment: how does the central amygdala represent salience vs valence.

      A weakness noted by all reviewers was the use of the terms 'valence' and 'salience' as well as the experimental design used to reveal these signals. The two outcomes used emphasized non-overlapping sensory modalities and produced unrelated behavioral responses. Within each modality there are no manipulations that would scale either the value of the valenced outcomes or the intensity of the salient outcomes. While the food outcomes were presented many times (20 times per session over 10 sessions of appetitive conditioning) the shock outcomes were presented many fewer times (10 times in a single session). The large difference in presentations is likely to further distinguish the two outcomes. Collectively, these experimental design decisions meant that any observed differences in central amygdala GABA neuron responding are unlikely to reflect valence, but likely to reflect one or more of the above features.

      We appreciate the reviewers’ comments regarding the experimental design. When assessing fear versus reward, we chose stimuli that elicit known behavioral responses, freezing versus consumption. The use of stimuli of the same modality is unlikely to elicit easily definable fear or reward responses or to be precisely matched for sensory intensity. For example, sweet or bitter tastes can be used, but even these activate different taste receptors and vary in the duration of the activation of taste-specific signaling (e.g. how long the taste lingers in the mouth). The approach we employed is similar to that of Yang et al., 2023 (doi: 10.1038/s41586-023-05910-2), who used water reward and shock to characterize the response profiles of somatostatin neurons of the central amygdala. Similar to what was reported by Yang and colleagues, we observed that the majority of CeA GABA neurons responded selectively to one unconditioned stimulus (~52%). We observed that 15% of neurons responded in the same direction, either activated or inhibited, to the food and shock US. These were defined as salience encoding based on the definitions of Lin and Nicolelis, 2008 (doi: 10.1016/j.neuron.2008.04.031), in which basal forebrain neurons responded similarly to reward or punishment irrespective of valence. The designation of valence encoding based on opposite responses to the food or shock is straightforward (~10% of cells); however, we agree that the designation of modality-specific encoding neurons as valence encoding is less straightforward.
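      The labelling scheme described in this response can be written out as a small decision rule. The sketch below is only an illustration of those categories: the function name, the +1/-1/0 coding of responses, and the label strings are assumptions introduced here, and the rule says nothing about how significance of each response was determined.

```python
def classify_neuron(food_response, shock_response):
    """Label a neuron from the sign of its response to each US.

    Each argument is +1 (activated), -1 (inhibited), or 0 (no significant
    response). Labels follow the categories described in the response above.
    """
    allowed = (-1, 0, 1)
    if food_response not in allowed or shock_response not in allowed:
        raise ValueError("responses must be -1, 0, or +1")
    if food_response == 0 and shock_response == 0:
        return "non-responsive"
    if food_response == 0 or shock_response == 0:
        # Responds to only one unconditioned stimulus.
        return "selective"
    if food_response == shock_response:
        # Same direction for food and shock.
        return "salience"
    # Opposite directions for food and shock.
    return "valence"
```

      Under the reviewers' stricter definitions, the "salience" and "valence" labels would additionally require responses that scale with stimulus intensity or value and no response to a neutral cue, which this rule does not test.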

      A second weakness noted by a majority of reviewers was a lack of cue-responsive units and a lack of exploration of the diversity of response types and of the relationship between cue and outcome firing. The lack of large numbers of neurons increasing firing to one or both cues is particularly surprising given the critical contribution of central amygdala GABA neurons to the acquisition of conditioned fear (which the authors measured) as well as to conditioned orienting (which the authors did not measure). Regression-like analyses would be a straightforward means of identifying neurons varying their firing in accordance with these or other behaviors. It was also noted that appetitive behavior was not measured in a rigorous way. Instead of measuring time near hopper, measures of licking would have been better. Further, measures of orienting behaviors such as startle were missing.

      The authors also missed an opportunity for clustering-like analyses which could have been used to reveal neurons uniquely signaling cues, outcomes or combinations of cues and outcomes. If the authors' calcium imaging approach is not able to detect expected central amygdala cue responding, might it be missing other critical aspects of responding?

      As stated in the manuscript, we were surprised by the relatively low number of cue-responsive cells; however, when using a less stringent statistical method (Figure 5 - Supplement 2), we observed that 13% of neurons responded to the food-associated cue and 23% responded to the shock-associated cue. The differences are therefore likely a reflection of the rigor of the statistical measure used to define the responsive units. The number of CS-responsive units is less than reported in the CeAl by Ciocchi et al., 2010 (doi: 10.1038/nature09559), who observed 30% activated by the CS and 25% inhibited, but is not that dissimilar from the results of Duvarci et al., 2011 (doi: 10.1523/JNEUROSCI.4985-10.2011), who observed 11% activated in the CeAl and 25% inhibited by the CS. These numbers are also consistent with previous single cell calcium imaging of cell types in the CeA. For example, Yang et al., 2023 (doi: 10.1038/s41586-023-05910-2) observed that 13% of somatostatin neurons responded to a reward CS and 8% responded to a shock CS. Yu et al., 2017 (doi: 10.1038/s41593-017-0009-9) observed that 26.5% of PKCdelta neurons responded to the shock CS. It should also be noted that our analysis was not restricted to the CeAl. Finally, food learning was assessed in an operant chamber in freely moving mice with reward pellet delivery. Because liquids were not used for the reward US, licking is not a metric that can be used.

      All reviewers point out that the evidence for salience encoding is even more limited than the evidence for valence. Although the specific concern for each reviewer varied, they all centered on an oversimplistic definition of salience. Salience ought to scale with the absolute value and intensity of the stimulus. Salience cannot simply be responding in the same direction. Further, even though the authors observed subsets of central amygdala neurons increasing or decreasing activity to both outcomes - the outcomes can readily be distinguished based on the temporal profile of responding.

      We thank the reviewers for their comments relating to the definition of salience and valence encoding by central amygdala neurons. We have addressed each of the concerns below.

      Additional concerns are raised by each reviewer. Our consensus is that this study sought to answer an important question - whether central amygdala neurons signal salience or valence in cue-outcome learning. However, the experimental design, analyses, and interpretations do not permit a rigorous and definitive answer to that question. Such an answer would require additional experiments whose designs would address the significant concerns described here. Fully addressing the concerns of each reviewer would result in a re-evaluation of the findings. For example, an experimental design better revealing valence and salience, and analyses describing the diversity of neuronal responding and its relationship to behavior, would likely make the results Important or even Fundamental.

      We appreciate the reviewers’ comments and have addressed each concern below.

      Reviewer #2 (Public review):

      In this article, Kong and colleagues sought to determine the encoding properties of central amygdala (CeA) neurons in response to oppositely valenced stimuli and cues predicting those stimuli. The amygdala and its subregional components have historically been understood to be regions that encode associative information, including valenced stimuli. The authors performed calcium imaging of GABA-ergic CeA neurons in freely-moving mice conditioned in Pavlovian appetitive and fear paradigms, and showed that CeA neurons are responsive to both appetitive and aversive unconditioned and conditioned stimuli. They used a variant of a previously published 'circular shifting' technique (Harris, 2021), which allowed them to delineate between excited/non-responsive/inhibited neurons. While there is considerable overlap of CeA neurons responding to both unconditioned stimuli (in this case, food and shock, deemed "salience-encoding" neurons), there are considerably fewer CeA neurons that respond to both conditioned stimuli that predict the food and shock. The authors finally demonstrated that there are no differences attributable to the order of Pavlovian paradigms (food - shock vs. shock - food), which is an interesting result, and convincingly presented given their counterbalanced experimental design.
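      As background on the 'circular shifting' idea, a generic version of such a test compares a neuron's event-aligned response with a null distribution obtained by rotating the activity trace relative to the event times. The sketch below illustrates that general technique only. It is not the variant used in the manuscript or the published method, and the response measure, window length, number of shifts, and threshold are all assumptions.

```python
import random


def event_response(trace, event_idx, window):
    """Mean post-event minus pre-event activity, averaged over events."""
    diffs = []
    for t in event_idx:
        if t - window < 0 or t + window > len(trace):
            continue  # skip events too close to the edges
        pre = trace[t - window:t]
        post = trace[t:t + window]
        diffs.append(sum(post) / window - sum(pre) / window)
    if not diffs:
        raise ValueError("no events with a full pre/post window")
    return sum(diffs) / len(diffs)


def circular_shift_test(trace, event_idx, window,
                        n_shifts=1000, alpha=0.05, seed=0):
    """Label a neuron as excited, inhibited, or non-responsive.

    The null distribution is built by rotating the trace by random offsets,
    which preserves its autocorrelation but breaks alignment to the events.
    """
    rng = random.Random(seed)
    observed = event_response(trace, event_idx, window)
    n = len(trace)
    null = []
    for _ in range(n_shifts):
        k = rng.randrange(window, n - window)  # avoid near-zero shifts
        shifted = trace[k:] + trace[:k]
        null.append(event_response(shifted, event_idx, window))
    frac_ge = sum(v >= observed for v in null) / n_shifts
    frac_le = sum(v <= observed for v in null) / n_shifts
    if frac_ge < alpha / 2:
        return "excited"
    if frac_le < alpha / 2:
        return "inhibited"
    return "non-responsive"
```

      A stricter `alpha` or a different response measure changes how many units are labelled responsive, which is one reason counts of responsive neurons can differ between statistical methods.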

      In total, I find the presented study useful in understanding the dynamics of CeA neurons during a Pavlovian learning paradigm. There are many strengths of this study, including the important question and clear presentation, the circular shifting analysis was convincing to me, and the manuscript was well written. We hope the authors will find our comments constructive if they choose to revise their manuscript.

      While the experiments and data are of value, I do not agree with the authors' interpretation of their data, and take issue with the way they used the terms "salience" and "valence"; their operational definitions differ from my reading of the literature (I would encourage them to check out Namburi et al., NPP, 2016). To be fair, a recent study from another group that reports experiments/findings which are very similar to the ones in the present study (Yang et al., 2023, describing valence coding in the CeA using a similar approach) also uses the terms valence and salience in a rather liberal way that I would also have issues with (see below). Either new experiments or revised claims would be needed here, and a more balanced discussion on this topic would be nice to see. I also felt that there were some aspects of novelty in this study that could be better highlighted (see below).

      One noteworthy point of alarm is that it seems as if two data panels including heatmaps are duplicated (perhaps that panel G of Figure 5-figure supplement 2 is a cut and paste error? It is duplicated from panel E and does not match the associated histogram).

      We thank the reviewer for their insightful comments and assessment of the manuscript.

      Major concerns:

      (1) The authors wish to make claims about salience and valence. This is my biggest gripe, so I will start here.

      (1a) Valence scales for positive and negative stimuli. As stated in Namburi et al., NPP, 2016, we operationalize "valence" as having different responses for positive and negative values and no response for stimuli that are not motivationally significant (neutral cues that do not predict an outcome). The threshold for claiming salience, which we define as scaling with the absolute value of the stimulus and not responding to a neutral stimulus (Namburi et al., NPP, 2016; Tye, Neuron, 2018; Li et al., Nature, 2022), would require the lack of response to a neutral cue.

      We appreciate the reviewer’s comment on the definitions of salience and valence and agree that there is not a consistent classification of these response types in the field. As stated above, we used the designation of salience encoding if the cells respond in the same direction to different stimuli regardless of the valence of the stimulus, similar to what was described previously (Lin and Nicolelis, 2008, doi: 10.1016/j.neuron.2008.04.031). Similar definitions of salience have also been reported elsewhere (for examples see: Stephenson-Jones et al., 2020, doi: 10.1016/j.neuron.2019.12.006, Zhu et al., 2018, doi: 10.1126/science.aat0481, and Comoli et al., 2003, doi: 10.1038/nn1113P). Per the suggestion of the reviewer, we longitudinally tracked cells on the first day of Pavlovian reward conditioning and the fear conditioning day. Although there were considerably fewer head entries on the first day of reward conditioning, we were able to identify 10 cells that were activated by both the food US and shock US. We compared the responses to the first five head entries and last five head entries, and to the first five shocks and last five shocks. Consistent with what has been reported for salience encoding neurons in the basal forebrain (Lin and Nicolelis, 2008, doi: 10.1016/j.neuron.2008.04.031), we observed that the responses were highest when the US was most unexpected and decreased in later trials.

      Author response image 1.

      (1b) The other major issue is that the authors choose to make claims about the neural responses to the USs rather than the CSs. However, being shocked and receiving sucrose also would have very different sensorimotor representations, and any differences in responses could be attributed to those confounds rather than valence or salience. They could make claims regarding salience or valence with respect to the differences in the CSs but they should restrict analysis to the period prior to the US delivery.

      Perhaps the reviewer missed this, but analysis of valence and salience encoding to the different CSs is presented in Figure 5G, Figure 5 – Supplement 1C-D, and Figure 5 – Supplement 2N-O. Responsiveness to CSFood and CSShock was analyzed during the conditioning sessions (Figure 3E-F, Figure 4B-C, Figure 5 – Supplement 2J-O, and Figure 5 – Supplement 3K-L) and during recall probe tests for both CSFood and CSShock (Figure 5 – Supplement 1C-J).

      (1c) The third obstacle to using the terms "salience" or "valence" is the lack of scaling, which is perhaps a bigger ask. At minimum either the scaling or the neutral cue would be needed to make claims about valence or salience encoding. Perhaps the authors disagree - that is fine. But they should at least acknowledge that there is literature that would say otherwise.

      (1d) In order to make claims about valence, the authors must take into account the sensory confound of the modality of the US (also mentioned in Namburi et al., 2016). The claim that these CeA neurons are indeed valence-encoding (based on their responses to the unconditioned stimuli) is confounded by the fact that the appetitive US (food) is a gustatory stimulus while the aversive US (shock) is a tactile stimulus.

      We provided the same analysis for the US and CS. The US responses were larger and more prevalent, but similar types of encoding were observed for the CS. We agree that the food reward and the shock are very different sensory modalities. As stated above, the use of stimuli of the same modality is unlikely to elicit easily definable fear or reward responses or to be precisely matched for sensory intensity. We agree that cells that respond to only one stimulus are difficult to classify as valence encoding, as opposed to being specific for the sensory modality, and that without scaling of the stimulus it is difficult to fully address this issue. It should be noted, however, that if the cells in the CeA were exclusively tuned to stimuli of different sensory modalities, we would expect to see a similar number of cells responding to the CS tones (auditory) as respond to the food (taste) and shock (somatosensory), but we do not. Of the cells tracked longitudinally, 80% responded to the USs, with 65% of cells responding to food (activated or inhibited) and 44% responding to shock (activated or inhibited).

      (2) Much of the central findings in this manuscript have been previously described in the literature. Yang et al., 2023 for instance shows that the CeA encodes salience (as demonstrated by the scaled responses to the increased value of unconditioned stimuli, Figure 1 j-m), and that learning amplifies responsiveness to unconditioned stimuli (Figure 2). It is nice to see a reproduction of the finding that learning amplifies CeA responses, though one study is in SST::Cre and this one in VGAT::cre - perhaps highlighting this difference could maximize the collective utility for the scientific community?

      We agree that the analysis performed here is similar to what was conducted by Yang et al., 2023, with the major difference being the types of neurons sampled. Yang et al. imaged only somatostatin neurons, whereas we recorded all GABAergic cell types within the CeA. Moreover, because we imaged from 10 mice, we sampled neurons that ostensibly covered the entire dorsal to ventral extent of the CeA (Figure 1 – Supplement 1). Remarkably, we found that the vast majority of CeA neurons (80%) are responsive to food or shock. Within this 80% there are 8 distinct response profiles, consistent with the heterogeneity of cell types within the CeA based on connectivity, electrophysiological properties, and gene expression. Moreover, we did not find any spatial distinction between food or shock responsive cells, with the responsive cell types being intermingled throughout the dorsal to ventral axis (Figure 5 – Supplement 3).
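
      One way to see where a count of eight response profiles can come from is to cross three possible outcomes (excited, inhibited, no response) for the food US with the same three outcomes for the shock US and drop the fully non-responsive case. The sketch below enumerates these combinations. The category names and the grouping labels are illustrative assumptions, not the classification code used in the study.

```python
from itertools import product

# Assumed outcome categories for each US (illustrative names).
CATEGORIES = ("excited", "inhibited", "none")


def label_profile(food: str, shock: str) -> str:
    """Give a hypothetical grouping label to a (food, shock) response pair."""
    if food == "none" and shock == "none":
        return "non-responsive"
    if food == "none" or shock == "none":
        return "single-US selective"      # responds to only one US
    if food == shock:
        return "same direction"           # both excited or both inhibited
    return "opposite direction"           # excited by one, inhibited by the other


# All (food, shock) pairs except the fully non-responsive one.
profiles = [
    (food, shock)
    for food, shock in product(CATEGORIES, repeat=2)
    if (food, shock) != ("none", "none")
]

for food, shock in profiles:
    print(f"food={food:9s} shock={shock:9s} -> {label_profile(food, shock)}")
```

      Under this assumption there are 3 × 3 − 1 = 8 responsive profiles: four that are selective for a single US, two same-direction profiles, and two opposite-direction profiles.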

      (3) There is at least one instance of copy-paste error in the figures that raised alarm. In the supplementary information (Figure 5- figure supplement 2 E;G), the heat maps for food-responsive neurons and shock-responsive neurons are identical. While this almost certainly is a clerical error, the authors would benefit from carefully reviewing each figure to ensure that no data is incorrectly duplicated.

      We thank the reviewer for catching this error. It has been corrected.

      (4) The authors describe experiments to compare shock and reward learning; however, there are temporal differences in what they compare in Figure 5. The authors compare the 10th day of reward learning with the 1st day of fear conditioning, which effectively represent different points of learning and retrieval. At the end of reward conditioning, animals are utilizing a learned association to the cue, which demonstrates retrieval. On the day of fear conditioning, animals are still learning the cue at the beginning of the session, but they are not necessarily retrieving an association to a learned cue. The authors would benefit from recording at a later timepoint (to be consistent with reward learning- 10 days after fear conditioning), to more accurately compare these two timepoints. Or perhaps, it might be easier to just make the comparison between Day 1 of reward learning and Day 1 of fear learning, since they must already have these data.

      We agree that there are temporal differences between the food and shock US deliveries. This is likely a reflection of the fact that the shock delivery is passive and easily resolved based on the time of the US delivery, whereas the food responses are variable because they are dependent upon the consumption of the sucrose pellet. Because of these differences, the kinetics of the responses cannot be accurately compared. This is why we restricted our analysis to whether the cells were food or shock responsive. Aside from reporting the temporal differences in the signals, we did not draw major conclusions about the differences in kinetics. In our experimental design, we counterbalanced the animals that received fear conditioning first then food conditioning, or food conditioning then fear conditioning, to ensure that order effects did not influence the outcome of the study. It is widely known that Pavlovian fear conditioning can facilitate the acquisition of conditioned stimulus responses with just a single day of conditioning. In contrast, Pavlovian reward conditioning generally progresses more slowly. Because of this, we restricted our comparison to the last day of reward conditioning and the first and only day of fear conditioning. However, as stated above, we compared the responses of neurons defined as salience encoding during day 1 of reward conditioning and fear conditioning. As would be predicted based on previous definitions of salience encoding (Lin and Nicolelis, 2008, doi: 10.1016/j.neuron.2008.04.031), we observed that the responses were highest when the US was most unexpected.

      (5) The authors make a claim of valence encoding in their title and throughout the paper, which is not possible to make given their experimental design. However, they would greatly benefit from actually using a decoder to demonstrate their encoding claim (decoding performance for shock-food versus shuffled labels) and simply make claims about decoding food-predictive cues and shock-predictive cues. Interestingly, it seems like relatively few CeA neurons actually show differential responses to the food and shock CSs, and that is interesting in itself.

      As stated above, valence and salience encoding were defined similar to what has been previously reported (Li et al., 2019, doi: 10.7554/eLife.41223; Yang et al., 2023, doi: 10.1038/s41586-023-05910-2; Huang et al., 2024, doi: 10.1038/s41586-024-07819; Lin and Nicolelis, 2008, doi: 10.1016/j.neuron.2008.04.031; Stephenson-Jones et al., 2020, doi: 10.1016/j.neuron.2019.12.006; Zhu et al., 2018, doi: 10.1126/science.aat0481; and Comoli et al., 2003, doi: 10.1038/nn1113P). Interestingly, many of these studies did not vary the US intensity.
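
      The decoding analysis suggested in point (5), comparing decoding performance for food versus shock trials against shuffled labels, could be sketched as follows. This is a minimal illustration on made-up data, assuming a leave-one-out nearest-centroid classifier; the function names, the toy population vectors, and the choice of classifier are all assumptions and do not come from the manuscript.

```python
import random


def centroid(rows):
    """Column-wise mean of a list of equal-length vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def loo_accuracy(X, y):
    """Leave-one-out accuracy of a nearest-centroid classifier."""
    correct = 0
    for i in range(len(X)):
        train = [(x, lab) for j, (x, lab) in enumerate(zip(X, y)) if j != i]
        cents = {
            lab: centroid([x for x, l in train if l == lab])
            for lab in {l for _, l in train}
        }
        pred = min(cents, key=lambda lab: sq_dist(X[i], cents[lab]))
        correct += pred == y[i]
    return correct / len(X)


def shuffle_test(X, y, n_perm=200, seed=0):
    """Compare the observed accuracy with a shuffled-label null distribution."""
    rng = random.Random(seed)
    observed = loo_accuracy(X, y)
    null = []
    for _ in range(n_perm):
        perm = y[:]
        rng.shuffle(perm)            # break the trial-label relationship
        null.append(loo_accuracy(X, perm))
    p = (1 + sum(a >= observed for a in null)) / (n_perm + 1)
    return observed, p


# Toy population vectors (trials x neurons); values are invented for illustration.
food = [[1.0 + 0.01 * i, 0.0, 0.5] for i in range(10)]
shock = [[0.0, 1.0 + 0.01 * i, 0.5] for i in range(10)]
X = food + shock
y = ["food"] * 10 + ["shock"] * 10

accuracy, p_value = shuffle_test(X, y)
print(accuracy, p_value)
```

      The same comparison could be restricted to the CS period before US delivery, which is the window the reviewer recommends for claims about cue decoding.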

      Reviewer #3 (Public review):

      Summary:

      In their manuscript entitled Kong and colleagues investigate the role of distinct populations of neurons in the central amygdala (CeA) in encoding valence and salience during both appetitive and aversive conditioning. The study expands on the work of Yang et al. (2023), which specifically focused on somatostatin (SST) neurons of the CeA. Thus, this study broadens the scope to other neuronal subtypes, demonstrating that CeA neurons in general are predominantly tuned to valence representations rather than salience.

      We thank the reviewer for their insightful comments and assessment of the manuscript.

      Strengths:

      One of the key strengths of the study is its rigorous quantitative approach based on the "circular-shift method", which carefully assesses correlations between neural activity and behavior-related variables. The authors' findings that neuronal responses to the unconditioned stimulus (US) change with learning are consistent with previous studies (Yang et al., 2023). They also show that the encoding of positive and negative valence is not influenced by prior training order, indicating that prior experience does not affect how these neurons process valence.

      Weaknesses:

      However, there are limitations to the analysis, including the lack of population-based analyses, such as clustering approaches. The authors do not employ hierarchical clustering or other methods to extract meaning from the diversity of neuronal responses they recorded. Clustering-based approaches could provide deeper insights into how different subpopulations of neurons contribute to emotional processing. Without these methods, the study may miss patterns of functional specialization within the neuronal populations that could be crucial for understanding how valence and salience are encoded at the population level.

      We appreciate the reviewer’s comments regarding clustering-based approaches. In order to classify cells as responsive to the US or CS we chose to develop a statistically rigorous method for classifying cell response types. Using this approach, we were able to define cell responses to the US and CS. Importantly, we identified 8 distinct response types to the USs. It is not clear how additional clustering analysis would improve cell classifications.
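
      For readers unfamiliar with circular-shift statistics, the general idea can be sketched as follows. The activity trace is rotated in time by random offsets, which preserves its autocorrelation while breaking its alignment with the events, and the observed event-locked response is compared with the resulting null distribution. This is a generic sketch under stated assumptions (window length, number of shifts, two-sided threshold, and all function names are hypothetical); it is not the specific variant used in the manuscript.

```python
import random


def event_response(trace, event_idx, window):
    """Mean signal over `window` samples after each event, pooled over events."""
    n = len(trace)
    total = 0.0
    count = 0
    for i in event_idx:
        for k in range(window):
            total += trace[(i + k) % n]
            count += 1
    return total / count


def circular_shift_test(trace, event_idx, window=5, n_shifts=1000,
                        alpha=0.05, seed=0):
    """Classify a cell as excited, inhibited, or non-responsive."""
    rng = random.Random(seed)
    n = len(trace)
    observed = event_response(trace, event_idx, window)
    null = []
    for _ in range(n_shifts):
        shift = rng.randrange(window, n - window)   # skip near-zero shifts
        shifted = trace[shift:] + trace[:shift]     # rotate the trace in time
        null.append(event_response(shifted, event_idx, window))
    # One-sided p-values in each direction, with the usual +1 correction.
    p_high = (1 + sum(v >= observed for v in null)) / (n_shifts + 1)
    p_low = (1 + sum(v <= observed for v in null)) / (n_shifts + 1)
    if p_high < alpha / 2:
        return "excited"
    if p_low < alpha / 2:
        return "inhibited"
    return "non-responsive"


# Toy trace: flat baseline with a brief step after each event (invented data).
n_samples = 1000
events = [100, 250, 430, 700, 880]
trace = [0.0] * n_samples
for e in events:
    for k in range(5):
        trace[e + k] = 1.0

label = circular_shift_test(trace, events)
print(label)
```

      Because each shifted trace keeps the temporal structure of the original signal, the null distribution reflects what event-locked averages look like when timing is unrelated to the events, which is the property that makes the method attractive for slow calcium signals.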

      Furthermore, while salience encoding is inferred based on responses to stimuli of opposite valence, the study does not test whether these neuronal responses scale with stimulus intensity-a hallmark of classical salience encoding. This limits the conclusions that can be drawn about salience encoding specifically.

      As stated above, we used salience classifications similar to those previously described (Lin and Nicolelis, 2008, doi: 10.1016/j.neuron.2008.04.031; Stephenson-Jones et al., 2020, doi: 10.1016/j.neuron.2019.12.006; Zhu et al., 2018, doi: 10.1126/science.aat0481; and Comoli et al., 2003, doi: 10.1038/nn1113P). We agree that varying the stimulus intensity would provide a more rigorous assessment of salience encoding; however, several of the studies mentioned above classify cells as salience encoding without varying stimulus intensity. Additionally, the inclusion of recordings with varying US intensities on top of the Pavlovian reward and fear conditioning would further decrease the number of cells that can be longitudinally tracked and would likely decrease the number of cells that could be classified.

      In sum, while the study makes valuable contributions to our understanding of CeA function, the lack of clustering-based population analyses and the absence of intensity scaling in the assessment of salience encoding are notable limitations.

      Reviewer #4 (Public review):

      Summary:

      The authors have performed endoscopic calcium recordings of individual CeA neuron responses to food and shock, as well as to cues predicting food and shock. They claim that a majority of neurons encode valence, with a substantial minority encoding salience.

      Strengths:

      The use of endoscopic imaging is valuable, as it provides the ability to resolve signals from single cells, while also being able to track these cells across time. The recordings appear well-executed, and employ a sophisticated circular shifting analysis to avoid statistical errors caused by correlations between neighboring image pixels.

      Weaknesses:

      My main critique is that the authors didn't fully test whether neurons encode valence. While it is true that they found CeA neurons responding to stimuli that have positive or negative value, this by itself doesn't indicate that valence is the primary driver of neural activity. For example, they report that a majority of CeA neurons respond selectively to either the positive or negative US, and that this is evidence for "type I" valence encoding. However, it could also be the case that these neurons simply discriminate between motivationally relevant stimuli in a manner unrelated to valence per se. A simple test of this would be to check if neural responses generalize across more than one type of appetitive or aversive stimulus, but this was not done. The closest the authors came was to note that a small number of neurons respond to CS cues, of which some respond to the corresponding US in the same direction. This is relegated to the supplemental figures (3 and 4), and it is not noted whether the same-direction CS-US neurons are also valence-encoding with respect to different USs. For example, are the neurons excited by CS-food and US-food also inhibited by shock? If so, that would go a long way toward classifying at least a few neurons as truly encoding valence in a generalizable way.

      As stated above, valence and salience encoding were defined similar to what has been previously reported (Li et al., 2019, doi: 10.7554/eLife.41223; Yang et al., 2023, doi: 10.1038/s41586-023-05910-2; Huang et al., 2024, doi: 10.1038/s41586-024-07819; Lin and Nicolelis, 2008, doi: 10.1016/j.neuron.2008.04.031; Stephenson-Jones et al., 2020, doi: 10.1016/j.neuron.2019.12.006; Zhu et al., 2018, doi: 10.1126/science.aat0481; and Comoli et al., 2003, doi: 10.1038/nn1113P). As reported in Figure 5 and Figure 5 – Supplement 3, ~29% of CeA neurons responded to both food and shock USs (15% in the same direction and 13.5% in the opposite direction). In contrast, only 6 of 303 cells responded to both the CSfood and CSshock, all in the same direction.

      A second and related critique is that, although the authors correctly point out that definitions of salience and valence are sometimes confused in the existing literature, they then go on themselves to use the terms very loosely. For example, the authors define these terms in such a way that every neuron that responds to at least one stimulus is either salience- or valence-encoding. This seems far too broad, as it makes essentially unfalsifiable their assertion that the CeA encodes some mixture of salience and valence. I already noted above that simply having different responses to food and shock does not qualify as valence-encoding. It also seems to me that having same-direction responses to these two stimuli similarly does not qualify a neuron as encoding salience. Many authors define salience as being related to the ability of a stimulus to attract attention (which is itself a complex topic). However, the current paper does not acknowledge whether they are using this, or any other definition of salience, nor is this explicitly tested, e.g. by comparing neural response magnitudes to any measure of attention.

      As stated in response to reviewer 2, we longitudinally tracked cells on the first day of Pavlovian reward conditioning and the fear conditioning day. Although there were considerably fewer head entries on the first day of reward conditioning, we were able to identify 10 cells that were activated by both the food US and shock US. We compared the responses to the first five head entries and last five head entries, and to the first five shocks and last five shocks. Consistent with what has been reported for salience encoding neurons in the basal forebrain (Lin and Nicolelis, 2008, doi: 10.1016/j.neuron.2008.04.031), we observed that the responses were highest when the US was most unexpected and decreased in later trials.

      The impression I get from the authors' data is that CeA neurons respond to motivationally relevant stimuli, but in a way that is possibly more complex than what the authors currently imply. At the same time, they appear to have collected a large and high-quality dataset that could profitably be made available for additional analyses by themselves and/or others.

      Lastly, the use of 10 daily sessions of training with 20 trials each seems rather low to me. In our hands, Pavlovian training in mice requires considerably more trials in order to effectively elicit responses to the CS. I wonder if the relatively sparse training might explain the relative lack of CS responses?

      It is possible that learning would have occurred more quickly if we had used greater than 20 trials per session. However, we routinely used 20-25 trials for Pavlovian reward conditioning (doi: 10.1073/pnas.1007827107; doi: 10.1523/JNEUROSCI.5532-12.2013; doi: 10.1016/j.neuron.2013.07.044; and doi: 10.1016/j.neuron.2019.11.024).

    1. Author Response

      Response to Reviewer 1:

      Summary of what the author was trying to achieve: In this study, the author aimed to develop a method for estimating neuronal-type connectivity from transcriptomic gene expression data, specifically from mouse retinal neurons. They sought to develop an interpretable model that could be used to characterize the underlying genetic mechanisms of circuit assembly and connectivity.

      Strengths: The proposed bilinear model draws inspiration from commonly implemented recommendation systems in the field of machine learning. The author presents the model clearly and addresses critical statistical limitations that may weaken the validity of the model such as multicollinearity and outliers. The author presents two formulations of the model for separate scenarios in which varying levels of data resolution are available. The author effectively references key work in the field when establishing assumptions that affect the underlying model and subsequent results. For example, correspondence between gene expression cell types and connectivity cell types from different references are clearly outlined in Tables 1-3. The model training and validation are sufficient and yield a relatively high correlation with the ground truth connectivity matrix. Seemingly valid biological assumptions are made throughout, however, some assumptions may reduce resolution (such as averaging over cell types), thus missing potentially important single-cell gene expression interactions.

      Thank you for acknowledging the strengths of this work. The assumption to average gene expression data across individual cells within a given cell type was made in response to the inherent limitations of, for example, the mouse retina dataset, where individual cell-level connectivity and gene expression data are not profiled jointly (the second scenario in our paper). This approach was a necessary compromise to facilitate the analysis at the cell type level. However, in datasets where individual cell-level connectivity and gene expression data are matched, such as the C. elegans dataset referenced below, our model can be applied to achieve single-cell resolution (the first scenario in our paper), offering a more detailed understanding of genetic underpinnings in neuronal connectivity.

      Weaknesses: The main results of the study could benefit from replication in another dataset beyond mouse retinal neurons, to validate the proposed method. Dimensionality reduction significantly reduces the resolution of the model and the PCA methodology employed is largely non-deterministic. This may reduce the resolution and reproducibility of the model. It may be worth exploring how the PCA methodology of the model may affect results when replicating. Figure 5, ’Gene signatures associated with the two latent dimensions’, lacks some readability and related results could be outlined more clearly in the results section. There should be more discussion on weaknesses of the results e.g. quantification of what connectivity motifs were not captured and what gene signatures might have been missed.

      I value the suggestion of validating the proposed method in another dataset. In response, I found the C. elegans dataset in the references the reviewer suggested below to be a good candidate for this purpose, and I plan to explore this dataset and incorporate findings in the revised manuscript. I understand the concerns regarding the PCA methodology and its potential impact on the model’s resolution and reproducibility. In response, alternative methods, such as regularization techniques, will be explored to address these issues. Additionally, I agree that enhancing the clarity and readability of Figure 5, as well as including a more comprehensive discussion of the model’s limitations, would significantly strengthen the manuscript.

      The main weakness is the lack of comparison against other similar methods, e.g. methods presented in Barabási, Dániel L., and Albert-László Barabási. "A genetic model of the connectome." Neuron 105.3 (2020): 435-445. Kovács, István A., Dániel L. Barabási, and Albert-László Barabási. "Uncovering the genetic blueprint of the C. elegans nervous system." Proceedings of the National Academy of Sciences 117.52 (2020): 33570-33577. Taylor, Seth R., et al. "Molecular topography of an entire nervous system." Cell 184.16 (2021): 4329-4347.

      Thank you for highlighting the importance of comparing our model with others, particularly those mentioned in your comments. After reviewing these papers, I find that our bilinear model aligns closely with the methods described, especially in [1, 2]. To see this, let’s start with Equation 1 in Kovács et al. [2]:

      In this equation, B represents the connectivity matrix, while X denotes the gene expression patterns of individual neurons in C. elegans. The operator O is the genetic rule operator governing synapse formation, linking connectivity with individual neuronal expression patterns. It’s noteworthy that the work of Barabási and Barabási [1] explores a specific application of this framework, focusing on O for B that represents biclique motifs in the C. elegans neural network.

      To identify the operator O, the authors sought to minimize the squared residual error:

      with regularization on O.

      Adopting the notation from our bilinear model paper and using Z to represent the connectivity matrix, the above becomes

      Coming back to the bilinear model formulation, the optimization problem, as formulated for the C. elegans dataset where individual neuron connectivity and gene expression are accessible, takes the form:

      where we consider each neuron as a distinct neuronal type. In addition, we extend the dimensions of X and Y to encompass the entire set of neurons in C. elegans, with X = Y ∈ R^{n×p}, where n signifies the total number of neurons and p the number of genes. Accordingly, our optimization challenge evolves into:
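
      For reference, the chain of equations that this comparison walks through can be written out as below. This is a reconstruction from the definitions given in the surrounding text, assuming a Frobenius norm for the squared residual error and leaving the regularization terms unspecified; the exact form in the cited papers and in the manuscript may differ in these details.

```latex
\begin{align}
  % Genetic rule model (Equation 1 of Kovács et al. [2]):
  % B = connectivity, X = gene expression, O = genetic rule operator
  B &= X O X^{T} \\
  % Squared residual error minimized over O (regularization on O not shown)
  \min_{O}\; & \lVert B - X O X^{T} \rVert_F^{2} \\
  % Same objective in the bilinear-model notation, with Z as the connectivity matrix
  \min_{O}\; & \lVert Z - X O X^{T} \rVert_F^{2} \\
  % Bilinear model: X (pre-synaptic) and Y (post-synaptic) expression,
  % A and B the two transformation matrices
  \min_{A,B}\; & \lVert Z - (X A)(Y B)^{T} \rVert_F^{2}
      = \min_{A,B}\; \lVert Z - X A B^{T} Y^{T} \rVert_F^{2} \\
  % Setting X = Y \in \mathbb{R}^{n \times p}
  \min_{A,B}\; & \lVert Z - X A B^{T} X^{T} \rVert_F^{2}
\end{align}
```

      Note that B denotes the connectivity matrix in the first two lines and a transformation matrix in the last two, which is why the connectivity matrix is renamed Z before the comparison is made.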

      Upon comparison with the earlier stated equation, it becomes clear that our approach aligns consistently with the notion of O = AB^T. This effectively results in a decomposition of the genetic rule operator O. This decomposition extends beyond mere mathematical convenience, offering several substantial benefits reminiscent of those seen in the collaborative filtering of recommendation systems:

      • Computational Efficiency: The primary advantage of this approach is its improvement in computational efficiency. For instance, solving for O ∈ R^{p×p} necessitates determining p^2 entries. In contrast, solving for A ∈ R^{p×d} and B ∈ R^{p×d} involves determining only 2pd entries, where p is the number of genes, and d is the number of latent dimensions. Assuming the existence of a lower-dimensional latent space (d << p) that captures the essential variability in connectivity, resolving A and B becomes markedly more efficient than resolving O. Additionally, from a computational system design perspective, inferring the connectivity of a neuron allows for caching the latent embeddings of presynaptic neurons XA or postsynaptic neurons XB with a space complexity of O(nd). This is significantly more space-efficient than caching XO or OX^T, which has a space complexity of O(np). This difference is particularly notable when dealing with large numbers of neurons, such as those in the entire mouse brain. The bilinear modeling approach thus enables effective handling of large datasets, simplifying the optimization problem and reducing computational load, thereby making the model more scalable and faster to execute.

      • Interpretability: The separation into A for presynaptic features and B for postsynaptic features provides a clearer understanding of the distinct roles of pre- and post- synaptic neurons in forming the connection. By projecting the pre- and post- synaptic neurons into a shared latent space through XA and YB, one can identify meaningful representations within each axis, as exemplified in different motifs from the mouse retina dataset. The linear characteristics of A and B facilitate direct evaluation of each gene’s contribution to a latent dimension. This interpretability, offering insights into the genetic factors influencing synaptic connections, is beyond what O could provide itself.

      • Flexibility and Adaptability: The bilinear model’s adaptability is another strength. Much like collaborative filtering, which can manage very different user and item features, our bilinear model can be tailored to synaptic partners with genetic data from varied sources. A potential application of this model is in deciphering the genetic correlates of long-range projectomic rules, where pre- and post-synaptic neurons are processed and sequenced separately, or even involving post-synaptic targets being brain regions with genetic information acquired through bulk sequencing. This level of flexibility also allows for model adjustments or extensions to incorporate other biological factors, such as proteomics, thereby broadening its utility across various research inquiries into the determinants of neuronal connectivity.
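
      The relationship between the factored and unfactored forms described in the points above can be checked numerically with a small sketch. It uses plain Python lists and arbitrary integer toy values (the sizes n, p, d and all matrix entries are illustrative assumptions, not data from the study) to show that projecting neurons into the latent space with A and B gives the same predicted connectivity as applying the full operator O = AB^T, while using fewer free parameters and smaller cached embeddings.

```python
def matmul(M, N):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*N)] for row in M]


def transpose(M):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*M)]


# Toy sizes: n neurons, p genes, d latent dimensions (d << p in practice).
n, p, d = 4, 10, 2

# Arbitrary integer entries so that both computations agree exactly.
X = [[((i + 1) * (j + 2)) % 5 for j in range(p)] for i in range(n)]
A = [[(i + j) % 3 - 1 for j in range(d)] for i in range(p)]
B = [[(2 * i + j) % 3 - 1 for j in range(d)] for i in range(p)]

# Factored form: cache the n x d latent embeddings, then take inner products.
pre_embedding = matmul(X, A)     # presynaptic embeddings XA
post_embedding = matmul(X, B)    # postsynaptic embeddings XB
Z_factored = matmul(pre_embedding, transpose(post_embedding))

# Unfactored form: build the full p x p operator O = A B^T first.
O = matmul(A, transpose(B))
Z_full = matmul(matmul(X, O), transpose(X))

params_full = p * p              # entries in O
params_factored = 2 * p * d      # entries in A and B together

print(Z_factored == Z_full, params_full, params_factored)
```

      With these toy sizes the full operator has 100 entries against 40 for the two factors, and each cached embedding has n × d = 8 entries rather than the n × p = 40 entries of XO.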

      In the study by Taylor et al. [3], the authors introduced a generalization of differential gene expression (DGE) analysis called network DGE (nDGE) to identify genetic determinants of synaptic connections. It focuses on genes co-expressed across pairs of connected neurons, compared with pairs without a connection.

      As the authors acknowledged in the method part of the paper, nDGE can only examine single genes co-expressed at synaptic terminals: "While the nDGE technique introduced here is a generalization of standard DGE, interrogating the contribution of pairs of genes in the formation and maintenance of synapses between pairs of neurons, nDGE can only account for a single co-expressed gene in either of the two synaptic terminals (pre/post)."

      In contrast, the bilinear model offers a more comprehensive analysis by seeking a linear combination of gene expressions in both pre- and post-synaptic neurons. This model goes beyond the scope of examining individual co-expressed genes, as it incorporates different weights for the gene expressions of pre- and post-synaptic neurons. This feature of the bilinear model enables it to capture not only homogeneous but also complex and heterogeneous genetic interactions that are pivotal in synaptic connectivity. This highlights the bilinear model’s capability to delve into the intricate interactions of synaptic gene expression.

      Appraisal of whether the author achieved their aims, and whether results support their conclusions: The author achieved their aims by recapitulating key connectivity motifs from single-cell gene expression data in the mouse retina. Furthermore, the model setup allowed for insight into gene signatures and interactions, but it could have benefited from a deeper evaluation of the accuracy of these signatures. The author claims the method sets a new benchmark for single-cell transcriptomic analysis of synaptic connections. This should be more rigorously proven. (I’m not sure I can speak on the novelty of the method)

      I value your appraisal. In response, additional validation of the bilinear model on a second dataset will be undertaken.

      Discussion of the likely impact of the work on the field, and the utility of methods and data to the community: This study provides an understandable bilinear model for decoding the genetic programming of neuronal type connectivity. The proposed model leaves the door open for further testing and comparison with alternative linear and/or non-linear models, such as neural network-based models. In addition to more complex models, this model can be built on to include higher resolution data such as more gene expression dimensions, different types of connectivity measures, and additional omics data.

      Thank you for your positive assessment of the potential impact of the study.

      Response to Reviewer 2:

      Summary: In this study, Mu Qiao employs a bilinear modeling approach, commonly utilized in recommendation systems, to explore the intricate neural connections between different pre- and post-synaptic neuronal types. This approach involves projecting single-cell transcriptomic datasets of pre- and post-synaptic neuronal types into a latent space through transformation matrices. Subsequently, the cross-correlation between these projected latent spaces is employed to estimate neuronal connectivity. To facilitate the model training, connectomic data is used to estimate the ground-truth connectivity map. This work introduces a promising model for the exploration of neuronal connectivity and its associated molecular determinants. However, it is important to note that the current model has only been tested with Bipolar Cell and Retinal Ganglion Cell data, and its applicability in more general neuronal connectivity scenarios remains to be demonstrated.

      Strengths: This study introduces a succinct yet promising computational model for investigating connections between neuronal types. The model, while straightforward, effectively integrates single-cell transcriptomic and connectomic data to produce a reasonably accurate connectivity map, particularly within the context of retinal connectivity. Furthermore, it successfully recapitulates connectivity patterns and helps uncover the genetic factors that underlie these connections.

      Thank you for your positive assessment of the paper.

      Weaknesses:

      1. The study lacks experimental validation of the model’s prediction results.

      Thank you for pointing out the importance of experimental validation. I acknowledge that the current version of the study is focused on the development and validation of the computational model, using the datasets presently available to us. Moving forward, I plan to collaborate with experimental neurobiologists. These collaborations are aimed at validating our model’s predictions, including the delta-protocadherins mentioned in the paper. However, considering the extensive time and resources required for conducting and interpreting experimental results, I believe it is more pragmatic to present a comprehensive experimental study, including the design and execution of experiments informed by the model’s predictions, in a separate follow-up paper. I intend to include a paragraph in the discussion of this paper outlining the future direction for experimental validation.

      2. The model’s applicability in other neuronal connectivity settings has not been thoroughly explored.

      I recognize the importance of assessing the model across different neuronal systems. In response to similar feedback from Reviewer 1, I am keen to extend the study to include the C. elegans dataset mentioned earlier. The results from applying our bilinear model to the second dataset will be incorporated into the revised manuscript.

      3. The proposed method relies on the availability of neuronal connectomic data for model training, which may be limited or absent in certain brain connectivity settings.

      The concern regarding the dependency of our model on the availability of connectomic data is valid. While complete connectomes are available for organisms like C. elegans and Drosophila, and efforts are underway to map the connectome of the entire mouse brain, such data may not always be accessible for all research contexts. Recognizing this limitation, part of the ongoing research is to explore ways to adapt our model to the available data, such as projectomic data. Furthermore, our bilinear model is compatible with trans-synaptic virus-based sequencing techniques [4, 5], allowing us to leverage data from these experimental approaches to uncover the genetic underpinnings of neuronal connectivity. These initiatives are crucial steps towards broadening the applicability of our model, ensuring its relevance and usefulness in diverse brain connectivity studies where detailed connectomic data may not be readily available.

      References

      [1] Dániel L. Barabási and Albert-László Barabási. A genetic model of the connectome. Neuron, 105(3):435–445, 2020.

      [2] István A. Kovács, Dániel L. Barabási, and Albert-László Barabási. Uncovering the genetic blueprint of the C. elegans nervous system. Proceedings of the National Academy of Sciences, 117(52):33570–33577, 2020.

      [3] Seth R. Taylor, Gabriel Santpere, Alexis Weinreb, Alec Barrett, Molly B. Reilly, Chuan Xu, Erdem Varol, Panos Oikonomou, Lori Glenwinkel, Rebecca McWhirter, Abigail Poff, Manasa Basavaraju, Ibnul Rafi, Eviatar Yemini, Steven J. Cook, Alexander Abrams, Berta Vidal, Cyril Cros, Saeed Tavazoie, Nenad Sestan, Marc Hammarlund, Oliver Hobert, and David M. Miller III. Molecular topography of an entire nervous system. Cell, 184(16):4329–4347, 2021.

      [4] Nicole Y. Tsai, Fei Wang, Kenichi Toma, Chen Yin, Jun Takatoh, Emily L. Pai, Kongyan Wu, Angela C. Matcham, Luping Yin, Eric J. Dang, Denise K. Marciano, John L. Rubenstein, Fan Wang, Erik M. Ullian, and Xin Duan. Trans-seq maps a selective mammalian retinotectal synapse instructed by nephronectin. Nat Neurosci, 25(5):659–674, May 2022.

      [5] Aixin Zhang, Lei Jin, Shenqin Yao, Makoto Matsuyama, Cindy van Velthoven, Heather Sullivan, Na Sun, Manolis Kellis, Bosiljka Tasic, Ian R. Wickersham, and Xiaoyin Chen. Rabies virus-based barcoded neuroanatomy resolved by single-cell RNA and in situ sequencing. bioRxiv, 2023.

    1. Author response:

      Reviewer #1:

      The only minor weakness that I found is the assumption of independence of bacterial species, which is expressed as the well-stirred approximation. One could imagine that bacterial species might cooperate, leading to non-uniform distributions that are real. How to distinguish such situations? I believe that this method can be extended to determine if this is the case or not before the application. For example, if the bacteria species are independent of each other and one can use the binomial distributions, then the Fano factor would be proportional to the overall relative fraction of bacterial species. Maybe a simple test can be added to test it before the application of REPOP. However, I believe that this is a minor issue.

      This is an interesting point raised by the reviewer.

      First, we need to clarify an important point: we do not make a well-stirred assumption. Samples can be drawn and plated from any region of space however small and that region’s population can be quantified using our method. The stirring only occurs after we collect a sample in order to dilute the contents and pour the solution homogeneously over the plate.

      As such, learning multiple independent species is possible and not impacted by the dilution (“well-stirred” assumption). In the revised manuscript we will make it clear that this assumption concerns the dilution process. Any correlation between species arises in the initial sample and should be retained in the plating. Once given the sample, the dilution itself produces independent binomial draws from that point in space from which cultures were harvested. REPOP is designed to recover the true underlying heterogeneity in species abundance (even from limited data) by leveraging a Bayesian framework that remains valid regardless of whether species are independent or correlated.

      If one applies the method to multiple species as is, REPOP can recover the marginal distribution of each species in each plate if they are selectively cultured, or of many species at once if the colonies are sufficiently distinct. To demonstrate this, we will add to the manuscript a synthetic example with two species whose populations in a sample are correlated.
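
      The point that dilution acts as independent binomial draws on whatever populations the sample contains can be illustrated with a small simulation. This is a hedged toy sketch and not the synthetic example planned for the manuscript: the shared gamma-distributed factor, the Poisson abundances, the dilution factor, and all names are assumptions chosen for illustration.

```python
import numpy as np


def simulate_plating(n_samples=5000, dilution=100, seed=0):
    """Two species with correlated abundances, each diluted independently.

    All distributions and parameter values here are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    # A shared factor per sample makes the two populations co-vary.
    shared = rng.gamma(shape=2.0, scale=1.0, size=n_samples)
    n1 = rng.poisson(1000.0 * shared)   # species 1 population in the sample
    n2 = rng.poisson(500.0 * shared)    # species 2 population in the sample
    # Dilution: each cell independently reaches the plate with probability 1/dilution.
    p = 1.0 / dilution
    k1 = rng.binomial(n1, p)
    k2 = rng.binomial(n2, p)
    return n1, n2, k1, k2


n1, n2, k1, k2 = simulate_plating()
corr_sample = np.corrcoef(n1, n2)[0, 1]   # correlation in the original samples
corr_plate = np.corrcoef(k1, k2)[0, 1]    # correlation among plated colony counts
```

      In this toy setting the plated counts remain clearly correlated, although the binomial sampling noise attenuates the correlation relative to that of the underlying populations.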

      However, in order to learn the joint distribution and capture correlations between species within samples, the method would need to be extended. At present, in Eq. 5 we sum the likelihood over all values of n, using a data-driven cutoff (twice the naïvely estimated count times the dilution factor). Extending this sum to pairs of populations (n1, n2) for two species, while retaining the generality of the method, would require memory that scales quadratically with this cutoff in the population number. For this reason, while we will comment on this in the next version of the manuscript, it will not be implemented as part of REPOP.
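
      The truncated sum over n can be sketched as follows. This is a minimal illustration under stated assumptions and not REPOP's implementation: the Poisson form of the population distribution, the reading of the cutoff as 2 × count × dilution factor, and all function names are hypothetical.

```python
import math


def log_binom_pmf(k, n, p):
    """Log-probability of k successes in n trials (0 < p < 1)."""
    if k < 0 or k > n:
        return -math.inf
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log1p(-p))


def log_poisson_pmf(n, mu):
    """Log-probability of n under a Poisson with mean mu > 0 (assumed prior)."""
    return n * math.log(mu) - mu - math.lgamma(n + 1)


def marginal_likelihood(k, dilution, mu):
    """Probability of observing k colonies, summing over the population n.

    Assumes dilution > 1 and an illustrative Poisson(mu) population model.
    The cutoff is one reading of the rule quoted above: twice the naive
    estimate k * dilution. A count of k = 0 would need separate handling.
    """
    if dilution <= 1:
        raise ValueError("dilution factor must be greater than 1")
    p = 1.0 / dilution
    n_max = int(2 * k * dilution)
    total = 0.0
    for n in range(k, n_max + 1):
        total += math.exp(log_poisson_pmf(n, mu) + log_binom_pmf(k, n, p))
    return total
```

      For a single species the sum has on the order of n_max terms. A joint version over pairs (n1, n2) would need a grid of roughly (n_max + 1)² entries, which is the quadratic memory growth described above.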

      Reviewer #2:

      A more thorough discussion of when and by how much estimated microbial population abundance distributions differ from the ground truth would be helpful in determining the best practices for applying this method. Not only would this allow researchers to understand the sampling effort necessary to achieve the results presented here, but it would also contextualize the experimental results presented in the paper. Particularly, there is a disconnect between the discussion of the large sample sizes necessary to achieve accurate multimodal distribution estimates and the small sample sizes used in both experiments.

      That is a great suggestion from the reviewer. To address it, we will expand Appendix B, which currently presents the relative error between the means for the experimental results in Fig. 3, to also include a comparable evaluation for the synthetic data example in Fig. 2.

      Specifically, for each example, we will report (1) the relative error in the estimated means (as already done for Fig. 3), and (2) the Kullback-Leibler (KL) divergence between the reconstructed and ground truth distributions. These metrics will be shown as a function of the size of the dataset, enabling a direct assessment of how the sampling effort affects the precision of the inference.
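
      For concreteness, the two metrics can be computed for discrete distributions on a common support as sketched here. The function names are hypothetical, and the direction of the divergence (ground truth relative to reconstruction) is an assumption made for this illustration.

```python
import numpy as np


def relative_error_of_mean(support, p_true, p_est):
    """|mean_est - mean_true| / |mean_true| for distributions on `support`."""
    support = np.asarray(support, dtype=float)
    m_true = float(np.sum(support * np.asarray(p_true, dtype=float)))
    m_est = float(np.sum(support * np.asarray(p_est, dtype=float)))
    return abs(m_est - m_true) / abs(m_true)


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) in nats; p is taken as ground truth, q as reconstruction."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    p = p / p.sum()            # normalize defensively
    q = q / q.sum()
    mask = p > 0               # terms with p = 0 contribute nothing
    # eps guards against division by zero where q vanishes but p does not.
    return float(np.sum(p[mask] * np.log(p[mask] / np.maximum(q[mask], eps))))
```

      Evaluating both functions on reconstructions obtained from datasets of increasing size would give the curves described above.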

      That said, we highlight that by explicitly modeling the dilution process within a Bayesian framework, REPOP extracts the mathematically optimal amount of information from each individual sample no matter the sample size. Our strategy therefore leads to better inference with fewer measurements, which is particularly important in applications such as plate counting, where data acquisition is labor-intensive.

      Reviewer #3:

      While the study is promising, there are a few areas where the paper could be strengthened to increase its impact and usability. First, the extent to which dilution and plating introduce noise is not fully explored. Could this noise significantly affect experimental conclusions? And under what conditions does it matter most? Does it depend on experimental design or specific parameter values? Clarifying this would help readers appreciate when and why REPOP should be used.

      We agree with the reviewer that this is an important point, and we will expand Appendix B to include a quantitative analysis using simulated data (Fig. 2), reporting both relative error and KL divergence as a function of dataset size. This complements our response to Reviewer #2 clarifying when REPOP offers the greatest benefit.

      In addition, we will expand the discussion on how modeling dilution noise becomes essential when learning population dynamics. In particular, we will emphasize the role of Model 3, especially relevant when working with multiple plates and approaching the asymptotic regime—an aspect that was alluded to in Fig. 3 but not fully explored.

      Second, more practical details about the tool itself would be very helpful. Simply stating that it is available on GitHub may not be enough. Readers will want to know what programming language it uses, what the input data should look like, and ideally, see a step-by-step diagram of the workflow. Packaging the tool as an easy-to-use resource, perhaps even submitting it to CRAN or including example scripts, would go a long way, especially since microbiologists tend to favor user-friendly, recipe-like solutions.

      We will update the introduction to reinforce that REPOP is written in Python (PyTorch), installable via pip, and designed for ease of use. We are also expanding the tutorials to include clearer guidance on data formatting and common workflows. Author response image 1 will be added in the revised manuscript to better illustrate the full application process.

      Author response image 1.

      Third, it would be great to see the method tested on existing datasets, such as those from Nic Vega and Jeff Gore (2017), which explore how colonization frequency impacts abundance fluctuation distributions. Even if the general conclusions remain unchanged, showing that REPOP can better match observed patterns would strengthen the paper’s real-world relevance.

      That is a great suggestion from the reviewer. We will demonstrate the application of REPOP to datasets such as that of Vega and Gore (Ref. 27 in the manuscript), as well as other publicly available datasets, in the revised version.

      Lastly, it would be helpful for the authors to briefly discuss the limitations of their method, as no approach is without its constraints. Acknowledging these would provide a more balanced and transparent perspective.

      We agree with the reviewer on that. A new subsection will explicitly address the assumptions of our method, and therefore its limitations, including assumptions about species classification, computational cost of joint inference, and dependence on accurate dilution modeling. This discussion will synthesize points raised throughout our response to all reviewers.

    1. Author response:

      Reviewer #1 (Public review):

      Strengths:

      The genetic approaches here for visualizing the recombination status of an endogenous allele are very clever, and by comparing the turnover of wildtype and mutant cells in the same animal the authors can make very convincing arguments about the effect of chronic loss of pu.1. Likely this phenotype would be either very subtle or nonexistent without the point of comparison and competition with the wildtype cells.

      Using multiple species allows for more generalizable results, and shows conservation of the phenomena at play.

      The demonstration of changes to proliferation and cell death in concert with higher expression of tp53 is compelling evidence for the authors' argument.

      Weaknesses:

      This paper is very strong. It would benefit from further investigating the specific relationship between pu.1 and tp53 specifically. Does pu.1 interact with the tp53 locus? Specific molecular analysis of this interaction would strengthen the mechanistic findings.

      We agree with the reviewer’s assessment regarding the significance of the relationship between PU.1 and TP53. A previous study by Tschan et al. (1) has shown that PU.1 attenuates the transcriptional activity of the p53 tumor suppressor family through direct binding to the DNA-binding and/or the oligomerization domains of p53/p73 proteins. We will discuss this point in the revised manuscript and cite this paper accordingly. Moreover, to further investigate the interaction between Pu.1 and Tp53 in zebrafish, we intend to perform a comprehensive analysis of the tp53 promoter region utilizing bioinformatic prediction tools. This approach aims to identify potential Pu.1 binding sites, thereby providing insights into the direct regulatory interactions between Pu.1 and the tp53 promoter in zebrafish.

      Reviewer #2 (Public review):

      Strengths:

      Generation of an elegantly designed conditional pu.1 allele in zebrafish that allows for the visual detection of expression of the knockout allele.

      The combination of analysis of pu.1 function in two model systems, zebrafish and mouse, strengthens the conclusions of the paper.

      Confirmation of the functional significance of the observed upregulation of tp53 in mutant microglia through double mutant analysis provides some mechanistic insight.

      Weaknesses:

      (1) The presented RNA-Seq analysis of mutant microglia is underpowered and details on how the data was analyzed are missing. Only 9-15 cells were analyzed in total (3 pools of 3-5 cells each). Further, the variability in relative gene expression of ccl34b.1, which was used as a quality control and inclusion criterion to define pools consisting of microglia, is extremely high (between ~4 and ~1600, Figure S7A).

      In the revised manuscript, we will elaborate on the methodological details of the RNA analysis. Owing to the technical challenge of unambiguously distinguishing microglia from dendritic cells (DCs) in brain cell suspensions, we employed a strategy of isolating 3-5 cells per pool and quantifying the relative expression of the microglia-specific marker ccl34b.1 normalized to the DC-specific marker ccl19a.1. This approach aimed to reduce DC contamination in downstream analyses. Across all experimental groups subjected to RNA-seq analysis, the ccl34b.1/ccl19a.1 expression ratios exceeded 5, confirming microglia as the dominant cell population. Nonetheless, residual DC contamination in the RNA-seq data cannot be entirely ruled out. We will explicitly acknowledge this technical constraint in the revised manuscript to ensure methodological transparency.

      (2) The authors conclude that the reduction of microglia observed in the adult brain after cKO of pu.1 in the spi-b mutant background is due to apoptosis (Lines 213-215). However, they only provide evidence of apoptosis in 3-5 dpf embryos, a stage at which loss of pu.1 alone does lead to a complete loss of microglia (Figure 2E). A control of pu.1 KI/d839 mutants treated with 4OHT should be added to show that this effect is indeed dependent on the loss of spi-b. In addition, experiments should be performed to show apoptosis in the adult brain after cKO of pu.1 in spi-b mutants as there seems to be a difference in the requirement of pu.1 in embryonic and adult stages.

      We apologize for the omission of data regarding conditional pu.1 knockout alone in embryos, which may have led to ambiguity. We would like to clarify that conditional pu.1 knockout alone at the embryonic stage does not induce microglial death (Author response image 1). Microglial death occurs only when Pu.1 is disrupted in the spi-b mutant background, in both embryonic and adult brains. The blebbing morphology of some microglia after conditional pu.1 knockout in adult spi-b mutants indicates that microglia undergo apoptosis at both embryonic (Figure S4) and adult stages (Author response image 2). The reviewer’s concern likely arises from the distinct outcomes of global pu.1 knockout (Figure 2) versus conditional pu.1 ablation. Global knockout eliminates microglia during early development due to Pu.1’s essential role in myeloid lineage specification. We plan to include this clarification in the revised manuscript.

      Author response image 1.

      Conditional depletion of Pu.1 in embryonic microglia had no effect on their short-term survival. (A) Schematics of 4-OHT treatment for pu.1<sup>KI/WT</sup> Tg(coro1a:CreER) and pu.1<sup>KI/Δ839</sup> Tg(coro1a:CreER) at embryonic stage. (B) Representative images of DsRed<sup>+</sup> microglia in pu.1<sup>KI/WT</sup> and pu.1<sup>KI/Δ839</sup> at 5 dpf. (C) Quantification of DsRed<sup>+</sup> microglia in pu.1<sup>KI/WT</sup> and pu.1<sup>KI/Δ839</sup> at 3 dpf and 5 dpf. Values represent means ± SD; n.s., P > 0.05.

      Author response image 2. Simultaneous inactivation of Pu.1 and Spi-b leads to microglial death in adult zebrafish. (A) The experimental setup for pu.1 conditional knockout in adult spi-b<sup>Δ232/Δ232</sup> mutants. (B) Representative images of the midbrain cross section of adult pu.1<sup>KI/+</sup>;spi-b<sup>Δ232/Δ232</sup>;Tg(coro1a:CreER) and pu.1<sup>KI/WT</sup>;spi-b<sup>Δ232/Δ232</sup>;Tg(coro1a:CreER) fish at 2 dpi. The white arrow indicates microglia with blebbing morphology.

      (3) The number of microglia after pu.1 knockout in zebrafish did only show a significant decrease 3 months after 4-OHT injection, whereas microglia were almost completely depleted already 7 days after injection in mice. This major difference is not discussed in the paper.

      We propose that zebrafish Pu.1 and Spi-b function cooperatively to regulate microglial maintenance, analogous to the role of PU.1 alone in mice. This cooperative mechanism likely explains the observed difference in microglial depletion kinetics between zebrafish and mice following pu.1 conditional knockout. Specifically, the compensatory activity of Spi-b in zebrafish may buffer the immediate loss of Pu.1, whereas in mice, the absence of SPI-B expression in microglia eliminates this redundancy, resulting in rapid microglial depletion. Furthermore, during evolution, SPI-B appears to have acquired lineage-specific roles, becoming absent in microglia. We will expand on this evolutionary divergence and its implications for microglial regulation in the revised manuscript.

      (4) Data are represented as mean ± SEM. Instead of SEM, standard deviation should be shown in all graphs to show the variability of the data. This is especially important for all graphs where individual data points are not shown. It should also be stated in the figure legend whether SEM or SD is shown.

      We plan to represent our data as mean ± SD in the revised manuscript.

      Reference:

      (1) Tschan MP, Reddy VA, Ress A, Arvidsson G, Fey MF, Torbett BE. PU.1 binding to the p53 family of tumor suppressors impairs their transcriptional activity. Oncogene. 2008 May 29;27(24):3489-93.

    1. Author response:

      eLife assessment

      This useful study reports how neuronal activity in the prefrontal cortex maps time intervals during which animals have to wait until reaching a reward and how this mapping is preserved across days. However, the evidence supporting the claims is incomplete as these sequential neuronal patterns do not necessarily represent time but instead may be correlated with stereotypical behavior and restraint from impulsive decision, which would require further controls (e.g. behavioral analysis) to clarify the main message. The study will be of interest to neuroscientists interested in decision making and motor control. 

      We thank the editors and reviewers for the constructive comments. In light of the questions mentioned by the reviewers, we plan to perform additional analyses in our revision, particularly aiming to address issues related to single-cell scalability, and effects of motivation and movement. We believe these additional data will greatly improve the rigor and clarity of our study. We are grateful for the review process of eLife.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This paper investigates the neural population activity patterns of the medial frontal cortex in rats performing a nose poking timing task using in vivo calcium imaging. The results showed neurons that were active at the beginning and end of the nose poking and neurons that formed sequential patterns of activation that covaried with the timed interval during nose poking on a trial-by-trial basis. The former were not stable across sessions, while the latter tended to remain stable over weeks. The analysis on incorrect trials suggests the shorter non-rewarded intervals were due to errors in the scaling of the sequential pattern of activity. 

      Strengths:

      This study measured stable signals using in vivo calcium imaging during experimental sessions that were separated by many days in animals performing a nose poking timing task. The correlation analysis on the activation profile to separate the cells in the three groups was effective and the functional dissociation between beginning and end, and duration cells was revealing. The analysis on the stability of decoding of both the nose poking state and poking time was very informative. Hence, this study dissected a neural population that formed sequential patterns of activation that encoded timed intervals. 

      We thank the reviewer for the positive comments.

      Weaknesses: 

      It is not clear whether animals had enough simultaneously recorded cells to perform the analyses of Figures 2-4. In fact, rat 3 had 18 responsive neurons, which is probably not enough to get robust neural sequences for the trial-by-trial analysis and the correct and incorrect trial analysis.

      We thank the reviewer for the comment. We would like to mention that the 18 cells plotted in Supplementary figure 1 were only from the duration cell category. To improve the clarity of our results, we are going to provide information regarding the number of cells from each rat in our revision. In general, we imaged more than 50 cells from each rat. We would also like to point to the data from individual trials in Supplementary figure 1B showing robust sequentiality.

      In addition, the analysis of behavioral errors could be improved. The analysis in Figure 4A could be replaced by a detailed analysis on the speed, and the geometry of neural population trajectories for correct and incorrect trials.

      We thank the reviewer for the suggestions. We are going to conduct the analysis as the reviewer recommended. We agree with the reviewer that better presentation of the neural activity will be helpful for the readers.

      In the case of Figure 4G, it is not clear why the density of errors formed two clusters instead of having a linear relation with the produced duration. It would be advisable to compute the scaling factor on neuronal population trajectories and single-cell activity, or to compute the center of mass, to test the type III errors.

      We would like to mention that the prediction errors plotted in this graph were calculated from two types of trials. The correct trials tended to show positive time estimation errors while the incorrect trials showed negative time estimation errors. We believe that the polarity switch between these two types suggested a possible use of this neural mechanism to time the action of the rats.

      In addition, we are going to perform the analysis suggested by the reviewer in our revision. We agree that different ways of analyzing the data would provide better characterization of the scaling effect.

      Due to the slow time resolution of calcium imaging, it is difficult to perform robust analysis on ramping activity. Therefore, I recommend downplaying the conclusion that: "Together, our data suggest that sequential activity might be a more relevant coding regime than the ramping activity in representing time under physiological conditions." 

      We agree with the reviewer and we have mentioned this caveat in our original manuscript. We are going to rephrase the sentence as the reviewer suggested during our revision.

      Reviewer #2 (Public Review):

      In this manuscript, Li and collaborators set out to investigate the neuronal mechanisms underlying "subjective time estimation" in rats. For this purpose, they conducted calcium imaging in the prefrontal cortex of water-restricted rats that were required to perform an action (nosepoking) for a short duration to obtain drops of water. The authors provided evidence that animals progressively improved in performing their task. They subsequently analyzed the calcium imaging activity of neurons and identified start, duration, and stop cells associated with the nose poke. Specifically, they focused on duration cells and demonstrated that these cells served as a good proxy for timing on a trial-by-trial basis, scaling their pattern of activity in accordance with changes in behavioral performance. In summary, as stated in the title, the authors claim to provide mechanistic insights into subjective time estimation in rats, a function they deem important for various cognitive conditions.

      This study aligns with a wide range of studies in systems neuroscience that presume that rodents solve timing tasks through an explicit internal estimation of duration, underpinned by neuronal representations of time. Within this framework, the authors performed complex and challenging experiments, along with advanced data analysis, which undoubtedly merits acknowledgement. However, the question of time perception is a challenging one, and caution should be exercised when applying abstract ideas derived from human cognition to animals. Studying so-called time perception in rats has significant shortcomings because, whether acknowledged or not, rats do not passively estimate time in their heads. They are constantly in motion. Moreover, rats do not perform the task for the sake of estimating time but to obtain their rewards, as they are water restricted. Their behavior will therefore reflect their motivation and urgency to obtain rewards. Unfortunately, it appears that the authors are not aware of these shortcomings. These alternative processes (motivation, sensorimotor dynamics) that occur during task performance are likely to influence neuronal activity. Consequently, my review will be rather critical. It is not, however, intended to be dismissive. I acknowledge that the authors may have been influenced by numerous published studies that already draw similar conclusions. Unfortunately, all the data presented in this study can be explained without invoking the concept of time estimation. Therefore, I hope the authors will find my comments constructive and understand that as scientists, we cannot ignore alternative interpretations, even if they conflict with our a priori philosophical stance (e.g., duration can be explicitly estimated by reading neuronal representation of time) and anthropomorphic assumptions (e.g., rats estimate time as humans do). While space is limited in a review, if the authors are interested, they can refer to a lengthy review I recently published on this topic, which demonstrates that my criticism is supported by a wide range of timing experiments across species (Robbe, 2023). In addition to this major conceptual issue that casts doubt on most of the conclusions of the study, there are also several major statistical issues.

      Main Concerns 

      (1) The authors used a task in which rats must poke for a minimal amount of time (300 ms and then 1500 ms) to be able to obtain a drop of water delivered a few centimeters right below the nosepoke. They claim that their task is a time estimation task. However, they forget that they work with thirsty rats that are eager to get water sooner rather than later (there is a reason why they start with a short duration!). This task is mainly probing the animals’ ability to wait (that is, impulse control) rather than time estimation per se. Second, the task does not require precise time estimation because there appear to be no penalties when the nosepokes are too short or when they exceed the required duration. So it will be unclear if the variation in nosepoke reflects motivational changes rather than time estimation changes. The fact that this behavioral task is a poor assay for time estimation and rather reflects impulse control is shown by the tendency of animals to perform nose-pokes that are too short, the very slow improvement in their performance (Figure 1, with most of the mice making short responses), and the huge variability. Not only do the behavioral data not support the claim of the authors in terms of what the animals are actually doing (estimating time), but this also completely annihilates the interpretation of the Ca++ imaging data, which can be explained by motivational factors (changes in neuronal activity occurring while the animals nose poke may reflect a growing sense of urgency to check if water is available).

      We would like to respond to the reviewer’s comments 1, 2 and 4 together since they all focus on the same issue. We thank the reviewer for the very thoughtful comments and for sharing his detailed reasoning from a recently published review (Robbe, 2023). A lot of the discussion goes beyond the scope of this study and we agree that whether there is an explicit representation of time (an internal clock) in the brain is a difficult question to answer, particularly by using animal behaviors. In fact, even with fully conscious humans and elaborated task design, we think it is still questionable to clearly dissociate the neural substrate of “timing” from “motor”. In the end, it may as well be that as the reviewer cited from Bergson’s article, the experience of time cannot be measured.

      Studying the neural representation of any internal state may suffer from the same ambiguity. With all due respect, however, we would like to limit our response to the scope of our results. According to the reviewer, two alternative interpretations of the task-related sequential activity exist: 1, duration cells may represent fidgeting or orofacial movements and 2, duration cells may represent motivation or motion plan of the rats. To test the first alternative interpretation, we will perform a more comprehensive analysis of the behavior data at all the limbs and visible body parts of the rat during nose poke and analyze its periodicity among different trials, although the orofacial movements may not be visible to us.

      Regarding the second alternative interpretation, we think our data in the original Figure 4G argue against it. In this graph, we plotted the decoding error of time using the duration cells’ activity against the actual duration of the trials. If the sequential activity of duration cells only represents motivation, then the errors should distribute evenly across different trial times, or be linearly modulated by trial durations. The unimodal distribution we observed (Figure 4G and see Author response image 1 below for a re-plot without signs) suggests that the scaling factor of the sequential activity represents information related to time. And the fact that this unimodal distribution is centered at the time threshold of the task provides strong evidence for the active use of the scaling factor for time estimation. In order to further test the relationship to motivation, we will measure the time interval between exiting the nose poke and the start of licking the water reward as an independent measurement of motivation for each trial. We will analyze and report whether this measurement correlates with the nose poking durations in our data in the revision.

      Author response image 1.
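
      The proposed check of whether a per-trial motivation proxy (latency from nose-poke exit to first lick) correlates with nose-poke duration could be sketched as follows. This is only an illustration: the choice of a Spearman rank correlation, the function names, and the example numbers are assumptions made here and are not taken from the response.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # average of the 1-based ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical per-trial values in seconds (illustrative only, not real data).
poke_durations = [0.9, 1.2, 1.6, 1.4, 2.0]
lick_latencies = [0.50, 0.45, 0.30, 0.40, 0.25]
rho = spearman(poke_durations, lick_latencies)
```

      A rank-based measure is used in this sketch because it does not assume a linear relationship between the two quantities; a significance test and a per-animal breakdown would still be needed on top of it.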

      Furthermore, whether the scaling sequential activity we report represents behavioral timing or true time estimation, the reviewer would agree that these activities correlate with the animal’s nose poking durations, and a previous study has shown that PFC silencing led to disruption of the mouse’s timing behavior (PMID: 24367075). The main surprising finding of the paper is that these duration cells are different from the start and end cells in terms of their coding stability. Thus, future studies dissecting the anatomical microcircuit of these duration cells may provide further clues regarding whether they receive inputs from thirst or reward-related brain regions. This may help partially resolve the “time” vs. “motor” debate the reviewer mentioned.

      (2) A second issue is that the authors seem to assume that rats are perfectly immobile and perform like some kind of robots that would initiate nose pokes, maintain them, and remove them in a very discretized manner. However, in this kind of task, rats are constantly moving from the reward magazine to the nose poke. They also move while nose-poking (either their body or their mouth), and when they come out of the nose poke, they immediately move toward the reward spout. Thus, there is a continuous stream of movements, including fidgeting, that will covary with timing. Numerous studies have shown that sensorimotor dynamics influence neural activity, even in the prefrontal cortex. Therefore, the authors cannot rule out that what the records reflect are movements (and the scaling of movement) rather than underlying processes of time estimation (some kind of timer). Concretely, start cells could represent the ending of the movement going from the water spout to the nosepoke, and end cells could be neurons that initiate (if one can really isolate any initiation, which I doubt) the movement from the nosepoke to the water spout. Duration cells could reflect fidgeting or orofacial movements combined with an increasing urgency to leave the nose pokes.

      (3) The statistics should be rethought for both the behavioral and neuronal data. They should be conducted separately for all the rats, as there is likely interindividual variability in the impulsivity of the animals.

      We thank the reviewer for the comment, yet we are not quite sure what specifically was asked by the reviewer. There is undoubtedly variance among individual animals. One of the core reasons for statistical comparison is to compare the group difference with the variance due to sampling. It appears that the reviewer would like us to conduct our analysis using each rat individually. We will conduct and report analyses for individual rats in Figure 1C, Figure 2C, G, K, Figure 4F in our revised manuscript.

      (4) The fact that neuronal activity reflects an integration of movement and motivational factors rather than some abstract timing appears to be well compatible with the analysis conducted on the error trials (Figure 4), considering that the sensorimotor and motivational dynamics will rescale with the durations of the nose poke. 

      (5) The authors should mention upfront in the main text (result section) the temporal resolution allowed by their Ca++ probe and discuss whether it is fast enough with regard to the behavioral dynamics occurring in the task.

      We thank the reviewer for the suggestion. We originally mentioned the caveat of calcium imaging in the interpretation of our results. We will incorporate more text for this purpose during our revision. In terms of behavioral dynamics (start and end of nose poke in this case), we think calcium imaging could provide sufficient kinetics. However, the more refined dynamics related to the reproducibility of the sequential activity or the precise representation of individual cells on the scaled duration may benefit from improved time resolution.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) Please refer explicitly to the three types of cells in the abstract. 

      We will modify the abstract as suggested during revision.

      (2) Please refer to the work of Betancourt et al., 2023 Cell Reports, where a trial-by-trial analysis on the correlation between neural trajectory dynamics in MPC and timing behavior is reported. In that same paper the stability of neural sequences across task parameters is reported.

      We will cite and discuss this study in our revised paper.

      (3) Please state the number of studied animals at the beginning of the results section. 

      We will provide this information as requested. The number of animals was also plotted in Figure 1D for each analysis.

      (4) Why do the middle and right panels of Figure 2E show duration cells?

      Figure 2E was intended to show examples of duration cells’ activity. We included different examples of cells that peak at different points in the scaled duration. We believe these multiple examples would give the readers a straightforward impression of these cells’ activity patterns.

      (5) Which behavioral sessions of Figure 1B were analyzed further?

      We will label the analyzed sessions in Figure 1B during our revision.

      (6) In Figure 3A-C please increase the time before the beginning of the trial in order to visualize properly the activation patterns of the start cells. 

      We thank the reviewer for the suggestion and will modify the figure accordingly during revision.

      (7) Please state what could be the behavioral and functional effect of the ablation of the cortical tissue on top of mPFC. 

      We thank the reviewer for the question. In our experience, mice with a lens implanted in mPFC did not show observable differences from mice without surgery regarding the acquisition of the task and the distribution of the nose-poke durations. Although we could not rule out effects on other cognitive processes, the mice appeared to be intact in the scope of our task. We will provide these behavior data during our revision.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      SUFU modulates Sonic hedgehog (SHH) signaling and is frequently mutated in the B-subtype of SHH-driven medulloblastoma. The B-subtype occurs mostly in infants, is often metastatic, and lacks specific treatment. Yabut et al. found that Fgf5 was highly expressed in the B-subtype of SHH-driven medulloblastoma by examining a published microarray expression dataset. They then investigated how Fgf5 functions in the cerebellum of mice that have embryonic Sufu loss of function. This loss was induced using the hGFAP-cre transgene, which is expressed in multiple cell types in the developing cerebellum, including granule neuron precursors (GNPs) derived from the rhombic lip. By measuring the area of Pax6+ cells in the external granule cell layer (EGL) of Sufu-cKO mice at postnatal day 0, they find Pax6+ cells occupy a larger area in the posterior lobe adjacent to the secondary fissure, which is poorly defined. They show that Fgf5 RNA and phosphoErk1/2 immunostaining are also higher in the same disrupted region. Some of the phosphoErk1/2+ cells are proliferative in the Sufu-cKO. Western blot analysis of Gli proteins that modulate SHH signaling found reduced expression and absence of Gli1 activity in the region of cerebellar dysgenesis in Sufu-cKO mice. This suggests the GNP expansion in this region is independent of SHH signaling. Amazingly, intraventricular injection of the FGFR1-2 antagonist AZD4547 from P0-4, followed by histological examination at P7, showed that the treatment restored cytoarchitecture in the cerebella of Sufu-cKO mice. This is further supported by NeuN immunostaining in the internal granule cell layer, which labels mature, non-dividing neurons, and by KI67 immunostaining, which indicates dividing cells and is primarily found in the EGL. The mice were treated beginning at a timepoint when cerebellar cytoarchitecture was shown to be disrupted, and it is indistinguishable from control following treatment. Figure 3 presents the most convincing and exciting data in this manuscript.

      Sufu-cKO mice do not readily develop cerebellar tumors. The authors detected phosphorylated H2AX immunostaining, which labels double-strand breaks, in some cells in the EGL in regions of cerebellar dysgenesis in the Sufu-cKO, as well as cleaved Caspase 3, a marker of apoptosis. The P53 protein, which acts downstream of the double-strand break pathway, was reduced in Sufu-cKO cerebellum. Genetically removing p53 from the Sufu-cKO cerebellum resulted in cerebellar tumors in 2-month old mice. The Sufu;p53-dKO cerebella at P0 lacked clear foliation and the secondary fissure, even more so than the Sufu-cKO. Fgf5 RNA and signaling (pERK1/2) were also expressed ectopically.

      The conclusions of the paper are largely supported by the data, but some data analyses need to be clarified and extended.

      (1) The rationale for examining Fgf5 in medulloblastoma is not sufficiently convincing. The authors previously reported that Fgf15 was upregulated in neocortical progenitors of mice with conditional loss of Sufu (PMID: 32737167). In Figure 1, the authors report FGF5 expression is higher in SHH-type medulloblastoma, especially the beta and gamma subtypes mostly found in infants. These data were derived from a genome-wide dataset and are shown without correction for multiple testing, including other Fgfs. Showing the expression of other Fgfs with FDR correction would better substantiate their choice or moving this figure to later in the manuscript as support for their mouse investigations would be more convincing.

      To assess FGF5 (ENSG00000138675) expression in MB tissues, we used GEO2R (Barrett et al., 2013) to analyze published human MB subtype expression arrays from accession no. GSE85217 (Cavalli et al., 2017). GEO2R is an interactive web tool that compares expression levels of genes of interest (GOI) between sample groups in the GEO series using original submitter-supplied processed data tables. We entered the GOI Ensembl ID and organized data sets according to age and MB subgroup or MBSHH subtype classifications. GEO2R results presented gene expression levels as a table ordered by FDR-adjusted (Benjamini & Hochberg) p-values, with significance level cut-off at 0.05, processed by GEO2R’s built-in limma statistical test. Resulting data were subsequently exported into Prism (GraphPad). We generated scatter plots presenting FGF5 expression levels across all MB subgroups (Figure 1A) and MBSHH subtypes (Figure 1D). We performed additional statistical analyses to compare FGF5 expression levels between MB subgroups and MBSHH subtypes and graphed these data as violin plots (Figure 1B, 1C, and 1E). For these analyses, we used one-way ANOVA with Holm-Sidak’s multiple comparisons test, single pooled variance. P value ≤0.05 was considered statistically significant. Graphs display the mean ± standard error of the mean (SEM).
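
      For readers unfamiliar with the FDR adjustment mentioned above, the Benjamini & Hochberg procedure can be sketched in a few lines. This is a generic illustration of the standard step-up adjustment, not GEO2R's or limma's actual code; the function name and the example p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values, in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        candidate = p_values[idx] * m / rank
        running_min = min(running_min, candidate)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw p-values for four genes (illustrative only, not real data).
raw = [0.01, 0.04, 0.03, 0.20]
adj = benjamini_hochberg(raw)
# Apply the 0.05 cut-off to the adjusted values rather than the raw ones.
significant = [q <= 0.05 for q in adj]
```

      In this toy example two raw p-values below 0.05 no longer pass the cut-off after adjustment, which is the situation the reviewer's request for FDR correction is meant to guard against.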

      Author response image 1.

      Comparative expression of FGF ligands, FGF5, FGF10, FGF12, and FGF19, across all MB subgroups. FGF12 expression is not significantly different, while FGF5, FGF10, and FGF19 show distinct upregulation in MBSHH subgroup (MBWNT n=70, MBSHH n=224, MBGR3 n=143, MBGR4 n=326).

      Expression of the 21 known FGF ligands was also analyzed. Many FGFs did not exhibit differential expression levels in MBSHH compared to other MB subgroups, such as with FGF12 in Figure 1. FGF5, FGF10, and FGF19 (the human orthologue of mouse FGF15) all showed specific upregulation in MBSHH compared to other MB subgroups (Author response image 1), supporting our previous observations that FGF15 is a downstream target of SHH signaling (Yabut et al., 2020), as the reviewer pointed out. However, further stratification of MBSHH patient data revealed that only FGF5 specifically showed upregulation in infants with MBSHH (MBSHHb and MBSHHg; Author response image 2), indicating a more prominent role for FGF5 in the developing cerebellum and as a driver of MBSHH tumorigenesis in this dynamic environment.

      Author response image 2.

      Comparative expression of FGF5, FGF10, and FGF19 in different MBSHH subtypes. FGF5 specifically shows mRNA relative levels above 6 in 81% of MBSHH infant patient tumors (n=80 MBSHHb and MBSHHg tumors) unlike 35% of MBSHHa (n=65) or 0% of MBSHHd (n=75) tumors.

      (2) The Sufu-cKO cerebellum lacks a clear anchor point at the secondary fissure and foliation is disrupted in the central and posterior lobes. It would be helpful for the authors to review Sudarov & Joyner (PMID: 18053187) for nomenclature specific to the developing cerebellum.

      The reviewers are correct that the cerebellar foliation is severely disrupted in central and posterior lobes, as per Sudarov and Joyner (Neural Development 2007). This nomenclature can be used to describe the regions referred to in this manuscript.

      (3) The metrics used to quantify cerebellar perimeter and immunostaining are not sufficiently described. It is unclear whether the individual points in the bar graph represent a single section from independent mice, or multiple sections from the same mice. For example, in Figures 2B-D. This also applies to Figure 3C-D.

      All quantifications were performed from 2-3 20 μm cerebellar sections of 3-6 independent mice per genotype analyzed. Individual points in the bar graphs represent the average cell number (quantified from 2-3 sections) from each mouse. Figure 2B shows data points from n=4 mice per genotype. Figure 2C shows data from n=3 mice per genotype. Figure 2D shows data from n=6 mice per genotype. Figure 3C-D show data from n=3 mice per genotype.
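
      The distinction the reviewer asks about (one point per mouse versus one point per section) can be made concrete with a small sketch in which section-level counts are first averaged within each mouse, so that the mouse is the unit of analysis. The data layout, identifiers, and numbers below are hypothetical and only illustrate the aggregation step described above.

```python
from collections import defaultdict
from statistics import mean


def per_mouse_means(section_counts):
    """Average section-level counts within each mouse.

    section_counts: iterable of (mouse_id, genotype, count), one per section.
    Returns {genotype: [mean count for each mouse]}.
    """
    by_mouse = defaultdict(list)
    for mouse_id, genotype, count in section_counts:
        by_mouse[(genotype, mouse_id)].append(count)
    by_genotype = defaultdict(list)
    # Sort keys so the per-genotype lists have a reproducible order.
    for (genotype, _mouse_id), counts in sorted(by_mouse.items()):
        by_genotype[genotype].append(mean(counts))
    return dict(by_genotype)


# Hypothetical counts: 2-3 sections per mouse (illustrative only, not real data).
sections = [
    ("m1", "control", 100), ("m1", "control", 110),
    ("m2", "control", 90), ("m2", "control", 95), ("m2", "control", 100),
    ("m3", "Sufu-cKO", 150), ("m3", "Sufu-cKO", 170),
]
points = per_mouse_means(sections)
# Each value in `points` corresponds to one dot in a bar graph.
```

      Averaging within animals first keeps the sample size equal to the number of mice rather than the number of sections, which avoids treating sections from the same animal as independent replicates.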

      (4) The data on Fgf5 RNA expression presented in Figure 2E are not sufficiently convincing. The perimeter and cytoarchitecture of the cerebellum are difficult to see and the higher magnification shown in 2F should be indicated in 2E.

      The lack of foliation in Sufu-cKO cerebellum is clear particularly when visualizing the perimeter via DAPI labeling (Figure 2E). The expression area of FGF5 is also visibly larger, given that all images in Figure 2E are presented in the same scale (scale bars = 500 μm).

      (5) The data presented in Figure 3 are not sufficiently convincing. The number of cells double positive for pErk and KI67 (Figure 3B) are difficult to see and appear to be few, suggesting the quantification may be unreliable.

      We used KI67+ expression to provide a molecular marker of regions to be quantified in both WT and Sufu-cKO sections. Quantification of labeled cells was performed in images obtained by confocal microscopy, enabling imaging of 1-2 μm optical slices since Ki67 or pERK expression might not localize within the same cellular compartments. We relied on continuous DAPI nuclear staining to distinguish individual cells in each optical slice and the colocalization of Ki67 and pERK. All quantifications were performed from 2-3 20 μm cerebellar sections of 3-6 independent mice per genotype analyzed. Individual points in the bar graphs represent the average cell number (quantified from 2-3 sections) from each mouse.

      (6) The data presented in Figure 4F-J would be more convincing with quantification. The Sufu;p53-dKO appears to have a thickened EGL across the entire vermis perimeter, and very little foliation, relative to control and single cKO cerebella. This is a more widespread effect than the more localized foliation disruption in the Sufu-cKO. 

      We agree with the reviewers that quantification of these phenotypes provides a solid measure of the defects. The phenotypes of the Sufu;p53-dKO cerebellum are so profound that they require in-depth characterization, which will be the focus of future studies.

      (7) Figure 5 does not convincingly summarize the results. Blue and purple cells in sagittal cartoon are not defined. Which cells express Fgf5 (or other Fgfs) has not been determined. The yellow cells are not defined in relation to the initial cartoon on the left.

      The revised manuscript will address this confusion by clearly labeling the cells and their roles in the schematic diagram.

      Reviewer #2 (Public Review):

      Summary:

      Mutations in SUFU are implicated in SHH medulloblastoma (MB). SUFU modulates Shh signaling in a context-dependent manner, making its role in MB pathology complex and not fully understood. This study reports that elevated FGF5 levels are associated with a specific subtype of SHH MB, particularly in pediatric cases. The authors demonstrate that Sufu deletion in a mouse model leads to abnormal proliferation of granule cell precursors (GCPs) at the secondary fissure (region B), correlating with increased Fgf5 expression. Notably, pharmacological inhibition of FGFR restores normal cerebellar development in Sufu mutant mice.

      Strengths:

      The identification of increased FGF5 in subsets of MB is novel and a key strength of the paper.

      Weaknesses:

      The study appears incomplete despite the potential significance of these findings. The current paper does not fully establish the causal relationship between Fgf5 and abnormal cerebellar development, nor does it clarify its connection to SUFU-related MB. Some conclusions seem overstated, and the central question of whether FGFR inhibition can prevent tumor formation remains untested.

      Reviewer #3 (Public Review):

      Summary:

      The interaction between FGF signaling and SHH-mediated GNP expansion in MB, particularly in the context of Sufu LoF, has just begun to be understood. The manuscript by Yabut et al. establishes a connection between ectopic FGF5 expression and GNP over-expansion in a late-stage embryonic Sufu LoF model. The data provided links region-specific interaction between aberrant FGF5 signaling with the SHH subtype of medulloblastoma. New data from Yabut et al. suggest that ectopic FGF5 expression correlates with GNP expansion near the secondary fissure in Sufu LoF cerebella. Furthermore, pharmacological blockade of FGF signaling inhibits GNP proliferation. Interestingly, the data indicate that the timing of conditional Sufu deletion (E13.5 using the hGFAP-Cre line) results in different outcomes compared to later deletion (using Math1-cre line, Jiwani et al., 2020). This study provides significant insights into the molecular mechanisms driving GNP expansion in SHH subgroup MB, particularly in the context of Sufu LoF. It highlights the potential of targeting FGF5 signaling as a therapeutic strategy. Additionally, the research offers a model for better understanding MB subtypes and developing targeted treatments.

      Strengths:

      One notable strength of this study is the extraction and analysis of ectopic FGF5 expression from a subset of MB patient tumor samples. This translational aspect of the study enhances its relevance to human disease. By correlating findings from mouse models with patient data, the authors strengthen the validity of their conclusions and highlight the potential clinical implications of targeting FGF5 in MB therapy.

      The data convincingly show that FGFR signaling activation drives GNP proliferation in Sufu conditional knockout models. This finding is supported by robust experimental evidence, including pharmacological blockade of FGF signaling, which effectively inhibits GNP proliferation. The clear demonstration of a functional link between FGFR signaling and GNP expansion underscores the potential of FGFR as a therapeutic target in SHH subgroup medulloblastoma.

      Previous studies have demonstrated the inhibitory effect of FGF2 on tumor cell proliferation in certain MB types, such as the ptc mutant (Fogarty et al., 2006)(Yaguchi et al., 2009). Findings in this manuscript provide additional support suggesting multiple roles for FGF signaling in cerebellar patterning and development.

      Weaknesses:

      In the GEO dataset analysis, where FGF5 expression is extracted, the reporting of the P-value lacks detail on the statistical methods used, such as whether an ANOVA or t-test was employed. Providing comprehensive statistical methodologies is crucial for assessing the rigor and reproducibility of the results. The absence of this information raises concerns about the robustness of the statistical analysis.

      The revised manuscript will include the following detailed explanation of the statistical analyses of the GEO dataset:

      For the analysis of expression values of FGF5 (ENSG00000138675), we obtained these values using GEO2R (Barrett et al., 2013), which directly analyzes published human MB subtype expression arrays from accession no. GSE85217 (Cavalli et al., 2017). GEO2R is an interactive web tool that compares expression levels of genes of interest (GOI) between sample groups in the GEO series using original submitter-supplied processed data tables. We simply entered the GOI Ensembl ID and organized data sets according to age and MB subgroup or MBSHH subtype classifications. GEO2R results presented gene expression levels as a table ordered by FDR-adjusted (Benjamini & Hochberg) p-values, with significance level cut-off at 0.05, processed by GEO2R’s built-in limma statistical test. Resulting data were subsequently exported into Prism (GraphPad). We generated scatter plots presenting FGF5 expression levels across all MB subgroups (Figure 1A) and MBSHH subtypes (Figure 1D). We performed additional statistical analyses to compare FGF5 expression levels between MB subgroups and MBSHH subtypes and graphed these data as violin plots (Figure 1B, 1C, and 1E). For these analyses, we used one-way ANOVA with Holm-Sidak’s multiple comparisons test, single pooled variance. P value ≤0.05 was considered statistically significant. Graphs display the mean ± standard error of the mean (SEM). Sample sizes were:

      Author response table 1.
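
      As an aside on the Holm-Sidak multiple comparisons test named in the statistical description above, the step-down p-value adjustment at its core can be sketched as follows. This shows only the generic adjustment of a family of raw p-values; it is not Prism's implementation, it omits the ANOVA and pooled-variance steps that produce those p-values, and the function name and example values are hypothetical.

```python
def holm_sidak(p_values):
    """Return Holm-Sidak step-down adjusted p-values, in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        remaining = m - rank  # number of hypotheses not yet stepped past
        candidate = 1.0 - (1.0 - p_values[idx]) ** remaining
        # Adjusted values may never decrease along the sorted order.
        running_max = max(running_max, candidate)
        adjusted[idx] = min(1.0, running_max)
    return adjusted


# Hypothetical raw p-values for three pairwise comparisons (illustrative only).
raw_p = [0.01, 0.04, 0.03]
adj_p = holm_sidak(raw_p)
```

      A comparison would then be reported as significant when its adjusted p-value is at or below the stated 0.05 threshold.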

      Another concern is related to the controls used in the study. Cre recombinase induces double-strand DNA breaks within the loxP sites, and the control mice did not carry the Cre transgene (as stated in the Method section), while Sufu-cKO mice did. This discrepancy necessitates an additional control group to evaluate the effects of Cre-induced double-strand breaks on phosphorylated H2AX-DSB signaling. Including this control would strengthen the validity of the findings by ensuring that observed effects are not artifacts of Cre recombinase activity.

      The breeding scheme we used to generate homozygous SUFU conditional mutants will not generate pups carrying only hGFAP-Cre. Thus, we are unable to compare gH2AX expression in littermates that do not carry loxP sites. The reviewer is correct in pointing out the possibility of Cre recombinase activity inducing double-strand breaks on its own. However, it is likely that any hGFAP-Cre induced double-strand breaks are not sufficient to cause the phenotypes we observed in homozygous mutant (Sufu-cKO) mice because the cerebella of mice carrying heterozygous SUFU mutations (hGFAP-Cre;Sufu-fl/+) do not display the profound cerebellar phenotypes observed in Sufu-cKO mice. We cannot rule out, however, any undetectable abnormalities that could be present, which may require further analyses.

      Although the use of the hGFAP-Cre line allows genetic access to the late embryonic stage, this also targets multiple cell types, including both GNPs and cerebellar glial cells. However, the authors focus primarily on GNPs without fully addressing the potential contributions of neuron-glial interaction. This oversight could limit the understanding of the broader cellular context in which FGF signaling influences tumor development.

      The reviewer is correct in that hGFAP-Cre also targets other cell types, such as cerebellar glial cells, which are generated when Cre-expression has begun. It is possible that cerebellar glial cell development is also compromised in Sufu-cKO mice and may disrupt neuron-glial interaction, due to or independently of FGF signaling. In-depth studies are required to interrogate how loss of SUFU specifically affects development of cerebellar glial cells and influences their cellular interactions in the developing cerebellum.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The study by McKim et al seeks to provide a comprehensive description of the connectivity of neurosecretory cells (NSCs) using a high-resolution electron microscopy dataset of the fly brain and several single-cell RNA seq transcriptomic datasets from the brain and peripheral tissues of the fly. They use connectomic analyses to identify discrete functional subgroups of NSCs and describe both the broad architecture of the synaptic inputs to these subgroups as well as some of the specific inputs including from chemosensory pathways. They then demonstrate that NSCs have very few traditional presynapses consistent with their known function as providing paracrine release of neuropeptides. Acknowledging that EM datasets can't account for paracrine release, the authors use several scRNAseq datasets to explore signaling between NSCs and characterize widespread patterns of neuropeptide receptor expression across the brain and several body tissues. The thoroughness of this study allows it to largely achieve its goal and provides a useful resource for anyone studying neurohormonal signaling.

      Strengths:

      The strengths of this study are the thorough nature of the approach and the integration of several large-scale datasets to address shortcomings of individual datasets. The study also acknowledges the limitations that are inherent to studying hormonal signaling and provides interpretations within the context of these limitations.

      Weaknesses:

      Overall, the framing of this paper needs to be shifted from statements of what was done to what was found. Each subsection, and the narrative within each, is framed on topics such as "synaptic output pathways from NSC" when there are clear and impactful findings such as "NSCs have sparse synaptic output". Framing the manuscript in this way allows the reader to identify broad takeaways that are applicable to other model systems. Otherwise, the manuscript risks being encyclopedic in nature. An overall synthesis of the results would help provide the larger context within which this study falls.

      We agree with the reviewer and will replace all the subsection titles as suggested.

      The cartoon schematic in Figure 5A (which is adapted from a 2020 review) has an error. This schematic depicts uniglomerular projection neurons of the antennal lobe projecting directly to the lateral horn (without synapsing in the mushroom bodies) and multiglomerular projection neurons projecting to the mushroom bodies and then lateral horn. This should be reversed (uniglomerular PNs synapse in the calyx and then further project to the LH and multiglomerular PNs project along the mlACT directly to the LH) and is nicely depicted in a Strutz et al 2014 publication in eLife.

      We thank the reviewer for spotting this error. We will modify the schematic as suggested.

      Reviewer #2 (Public review):

      Summary:

      The authors aim to provide a comprehensive description of the neurosecretory network in the adult Drosophila brain. They sought to assign and verify the types of 80 neurosecretory cells (NSCs) found in the publicly available FlyWire female brain connectome. They then describe the organization of synaptic inputs and outputs across NSC types and outline circuits by which olfaction may regulate NSCs, and by which Corazonin-producing NSCs may regulate flight behavior. Leveraging existing transcriptomic data, they also describe the hormone and receptor expressions in the NSCs and suggest putative paracrine signaling between NSCs. Taken together, these analyses provide a framework for future experiments, which may demonstrate whether and how NSCs, and the circuits to which they belong, may shape physiological function or animal behavior.

      Strengths:

      This study uses the FlyWire female brain connectome (Dorkenwald et al. 2023) to assign putative cell types to the 80 neurosecretory cells (NSCs) based on clustering of synaptic connectivity and morphological features. The authors then verify type assignments for selected populations by matching cluster sizes to anatomical localization and cell counts using immunohistochemistry of neuropeptide expression and markers with known co-expression.

      The authors compare their findings to previous work describing the synaptic connectivity of the neurosecretory network in larval Drosophila (Huckesfeld et al., 2021), finding that there are some differences between these developmental stages. Direct comparisons between adults and larvae are made possible through direct comparison in Table 1, as well as the authors' choice to adopt similar (or equivalent) analyses and data visualizations in the present paper's figures.

      The authors extract core themes in NSC synaptic connectivity that speak to their function: different NSC types are downstream of shared presynaptic outputs, suggesting the possibility of joint or coordinated activation, depending on upstream activity. NSCs receive some but not all modalities of sensory input. NSCs have more synaptic inputs than outputs, suggesting they predominantly influence neuronal and whole-body physiology through paracrine and endocrine signaling.

      The authors outline synaptic pathways by which olfactory inputs may influence NSC activity and by which Corazonin-releasing NSCs may regulate flight. These analyses provide a basis for future experiments, which may demonstrate whether and how such circuits shape physiological function or animal behavior.

      The authors extract expression patterns of neuropeptides and receptors across NSC cell types from existing transcriptomic data (Davie et al., 2018) and present the hypothesis that NSCs could be interconnected via paracrine signaling. The authors also catalog hormone receptor expression across tissues, drawing from the Fly Cell Atlas (Li et al., 2022).

      Weaknesses:

      The clustering of NSCs by their presynaptic inputs and morphological features, along with corroboration with their anatomical locations, distinguished some, but not all cell types. The authors attempt to distinguish cell types using additional methodologies: immunohistochemistry (Figure 2), retrograde trans-synaptic labeling, and characterization of dense core vesicle characteristics in the FlyWire dataset (Figure 1, Supplement 1). However, these corroborating experiments often lacked experimental replicates, were not rigorously quantified, and/or were presented as singular images from individual animals or even individual cells of interest. The assignments of DH44 and DMS types remain particularly unconvincing.

      We thank the reviewer for this comment. We would like to clarify that the images presented in Figure 2 and Figure 1 Supplement 1 are representative images based on at least 5 independent samples. We will clarify this in the figure caption and methods. The electron micrographs showing dense core vesicle (DCV) characteristics (Figure 1 Supplement E-G) are also representative images based on examination of multiple neurons. However, we agree with the reviewer that a rigorous quantification would be useful to showcase the differences between DCVs from NSC subtypes. Therefore, we have now performed a quantitative analysis of the DCVs in putative m-NSC<sup>DH44</sup> (n=6), putative m-NSC<sup>DMS</sup> (n=6) and descending neurons (n=4) known to express DMS. For consistency, we examined the cross section of each cell in which the nuclear diameter was largest. We quantified the mean gray value of at least 50 DCVs per cell. Our analysis shows that mean gray values of putative m-NSC<sup>DMS</sup> and DMS descending neurons are not significantly different, whereas the mean gray values of m-NSC<sup>DH44</sup> are significantly larger. This analysis is in agreement with our initial conclusion.

      Author response image 1.

      The authors present connectivity diagrams for visualization of putative paracrine signaling between NSCs based on their peptide and receptor expression patterns. These transcriptomic data alone are inadequate for drawing these conclusions, and these connectivity diagrams are untested hypotheses rather than results. The authors do discuss this in the Discussion section.

      We fully agree with the reviewer and will further elaborate on the limitations of our approach in the revised manuscript. However, there is a very high likelihood that a given NSC subtype can signal to another NSC subtype using a neuropeptide if its receptor is expressed in the target NSC. This is because all NSC axons are part of the same nerve bundle (nervi corpora cardiaca) which exits the brain. The axons of different NSCs form release sites that are extremely close to each other. Neuropeptides from these release sites can easily diffuse via the hemolymph to peripheral tissues (e.g. fat body and ovaries) that are much further away from the release sites on neighboring NSCs. We believe that neuropeptide receptors are expressed in NSCs near these release sites, where they can receive inputs not just from the adjacent NSCs but also from other sources such as the gut enteroendocrine cells. Hence, neuropeptide diffusion is not a limiting factor preventing paracrine signaling between NSCs, and receptor expression is a good indicator for putative paracrine signaling.

      Reviewer #3 (Public review):

      Summary:

      The manuscript presents an ambitious and comprehensive synaptic connectome of neurosecretory cells (NSC) in the Drosophila brain, which highlights the neural circuits underlying hormonal regulation of physiology and behaviour. The authors use EM-based connectomics, retrograde tracing, and previously characterised single-cell transcriptomic data. The goal was to map the inputs to and outputs from NSCs, revealing novel interactions between sensory, motor, and neurosecretory systems. The results are of great value for the field of neuroendocrinology, with implications for understanding how hormonal signals integrate with brain function to coordinate physiology.

      The manuscript is well-written and provides novel insights into the neurosecretory connectome in the adult Drosophila brain. Some additional behavioural experiments would significantly strengthen the conclusions.

      Strengths:

      (1) Rigorous anatomical analysis

      (2) Novel insights on the wiring logic of the neurosecretory cells.

      Weaknesses:

      (1) Functional validation of findings would greatly improve the manuscript.

      We agree with this reviewer that assessing the functional output from NSCs would improve the manuscript. Given that we currently lack genetic tools to measure hormone levels and that behaviors and physiology are modulated by NSCs on slow timescales, it is difficult to assess the immediate functional impact of the sensory inputs to NSCs using approaches such as optogenetics. However, since l-NSC<sup>CRZ</sup> are the only known cell type that provides output to descending neurons, we will functionally test this output pathway using different behavioral assays recommended by this reviewer.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      The authors present exciting new experimental data on the antigenic recognition of 78 H3N2 strains (from the beginning of the 2023 Northern Hemisphere season) against a set of 150 serum samples. The authors compare protection profiles of individual sera and find that the antigenic effect of amino acid substitutions at specific sites depends on the immune class of the sera, differentiating between children and adults. Person-to-person heterogeneity in the measured titers is strong, specifically in the group of children's sera. The authors find that the fraction of sera with low titers correlates with the inferred growth rate using maximum likelihood regression (MLR), a correlation that does not hold for pooled sera. The authors then measure the protection profile of the sera against historical vaccine strains and find that it can be explained by birth cohort for children. Finally, the authors present data comparing pre- and post- vaccination protection profiles for 39 (USA) and 8 (Australia) adults. The data shows a cohort-specific vaccination effect as measured by the average titer increase, and also a virus-specific vaccination effect for the historical vaccine strains. The generated data is shared by the authors and they also note that these methods can be applied to inform the bi-annual vaccine composition meetings, which could be highly valuable.

      Thanks for this nice summary of our paper.

      The following points could be addressed in a revision:

      (1) The authors conclude that much of the person-to-person and strain-to-strain variation seems idiosyncratic to individual sera rather than age groups. This point is not yet fully convincing. While the mean titer of an individual may be idiosyncratic to the individual sera, the strain-to-strain variation still reveals some patterns that are consistent across individuals (the authors note the effects of substitutions at sites 145 and 275/276). A more detailed analysis, removing the individual-specific mean titer, could still show shared patterns in groups of individuals that are not necessarily defined by the birth cohort.

      As the reviewer suggests, we normalized the titers for all sera to the geometric mean titer for each individual in the US-based pre-vaccination adults and children. This is only for the 2023-circulating viral strains. We then faceted these normalized titers by the same age groups we used in Figure 6, and the resulting plot is shown below. Although there are differences among virus strains (some are better neutralized than others), there are no obvious age group-specific patterns (e.g., the trends in the two facets are similar). To us this suggests that, at least for these relatively closely related recent H3N2 strains, the strain-to-strain variation does not obviously segregate by age group. It is possible (we think likely) that there would be clearer age group-specific trends if we looked at a larger swath of viral strains covering a longer time range (e.g., over decades of influenza evolution). We plan to add the new plots shown below to a supplemental figure in the revised manuscript.
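
      The normalization described above can be illustrated with a minimal sketch. This is not the code used for the analysis; the function name, serum and strain labels, and titer values are all hypothetical and chosen only to show the arithmetic. Each titer is divided by the geometric mean of that serum's titers across strains, so that sera with very different overall potency can be compared on their strain-to-strain pattern alone.

```python
import math


def normalize_to_geometric_mean(titers_by_serum):
    """Divide each titer by that serum's geometric mean titer across strains."""
    normalized = {}
    for serum, titers in titers_by_serum.items():
        # Geometric mean = exp(mean of log titers); titers must be positive.
        log_mean = sum(math.log(t) for t in titers.values()) / len(titers)
        geometric_mean = math.exp(log_mean)
        normalized[serum] = {
            strain: titer / geometric_mean for strain, titer in titers.items()
        }
    return normalized


# Hypothetical titers for illustration only (not data from the study).
example = {
    "serum_A": {"strain_1": 100.0, "strain_2": 400.0},
    "serum_B": {"strain_1": 1000.0, "strain_2": 4000.0},
}
normalized = normalize_to_geometric_mean(example)
```

      In this toy example the two sera differ tenfold in overall titer, yet after normalization they show the same relative pattern across the two strains, which is the kind of shared pattern the reviewer asked about.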

      Author response image 1.

      Author response image 2.

      (2) The authors show that the fraction of sera with a titer below 138 correlates strongly with the inferred growth rate using MLR. However, the authors also note that there exists a strong correlation between the MLR growth rate and the number of HA1 mutations. This analysis does not yet show that the titers provide substantially more information about the evolutionary success. The actual relation between the measured titers and fitness is certainly more subtle than suggested by the correlation plot in Figure 5. For example, the clades A/Massachusetts and A/Sydney both have a positive fitness at the beginning of 2023, but A/Massachusetts has substantially higher relative fitness than A/Sydney. The growth inference in Figure 5b does not appear to map that difference, and the antigenic data would give the opposite ranking. Similarly, the clades A/Massachusetts and A/Ontario have both positive relative fitness, as correctly identified by the antigenic ranking, but at quite different times (i.e., in different contexts of competing clades). Other clades, like A/St. Petersburg are assigned high growth and high escape but remain at low frequency throughout. Some mention of these effects not mapped by the analysis may be appropriate.

      Thanks for the nice summary of our findings in Figure 5. However, the reviewer is misreading the growth charts when they say that A/Massachusetts/18/2022 has a substantially higher fitness than A/Sydney/332/2023. Figure 5a shows the frequency trajectory of different variants over time. While A/Massachusetts/18/2022 reaches a higher frequency than A/Sydney/332/2023, the trajectory is similar and the reason that A/Massachusetts/18/2022 reached a higher max frequency is that it started at a higher frequency at the beginning of 2023. The MLR growth rate estimates differ from the maximum absolute frequency reached: instead, they reflect how rapidly each strain grows relative to others. In fact, A/Massachusetts/18/2022 and A/Sydney/332/2023 have similar growth rates, as shown in Supplementary Figure 6b. Similarly, A/Saint-Petersburg/RII-166/2023 starts at a low initial frequency but then grows even as A/Massachusetts/18/2022 and A/Sydney/332/2023 are declining, and so has a higher growth rate than both of those. In the revised manuscript, we will clarify how viral growth rates are estimated from frequency trajectories, and how growth rate differs from max frequency.
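
      The distinction between growth rate and maximum frequency can be sketched with a toy multinomial logistic model. The example below is only an illustration under stated assumptions, not the fitting procedure used in the paper: the variant names, intercepts, and slopes are hypothetical, and the growth rate is recovered here as the least-squares slope of the log frequency ratio against a reference variant.

```python
import math


def variant_frequencies(intercepts, slopes, t):
    """Multinomial-logistic frequencies: softmax of intercept + slope * t."""
    scores = {v: intercepts[v] + slopes[v] * t for v in intercepts}
    total = sum(math.exp(s) for s in scores.values())
    return {v: math.exp(s) / total for v, s in scores.items()}


def relative_growth_rate(freqs_over_time, times, variant, reference):
    """Least-squares slope of log(f_variant / f_reference) against time."""
    y = [math.log(f[variant] / f[reference]) for f in freqs_over_time]
    t_mean = sum(times) / len(times)
    y_mean = sum(y) / len(y)
    numerator = sum((t - t_mean) * (yi - y_mean) for t, yi in zip(times, y))
    denominator = sum((t - t_mean) ** 2 for t in times)
    return numerator / denominator


# Hypothetical parameters: "A" and "B" share a growth rate,
# but "A" starts at a higher frequency than "B".
intercepts = {"ref": 0.0, "A": -1.0, "B": -3.0}
slopes = {"ref": 0.0, "A": 0.5, "B": 0.5}
times = list(range(0, 11))
trajectory = [variant_frequencies(intercepts, slopes, t) for t in times]

rate_a = relative_growth_rate(trajectory, times, "A", "ref")
rate_b = relative_growth_rate(trajectory, times, "B", "ref")
max_a = max(f["A"] for f in trajectory)
max_b = max(f["B"] for f in trajectory)
```

      Under these assumptions the two variants have the same estimated growth rate, while the one that started at a higher frequency reaches a higher maximum frequency over the same window.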

      (3) For the protection profile against the vaccine strains, the authors find for the adult cohort that the highest titer is always against the oldest vaccine strain tested, which is A/Texas/50/2012. However, the adult sera do not show an increase in titer towards older strains, but only a peak at A/Texas. Therefore, it could be that this is a virus-specific effect, rather than a property of the protection profile. Could the authors test with one older vaccine virus (A/Perth/16/2009?) whether this really can be a general property?

      We are interested in studying immune imprinting more thoroughly using sequencing-based neutralization assays, but we note that the adults in the cohorts we studied would have been imprinted with much older strains than included in this library. As this paper focuses on the relative fitness of contemporary strains with minor secondary points regarding imprinting, these experiments are beyond the scope of this study. We’re excited for future work (from our group or others) to explore these points by making a new virus library with strains from multiple decades of influenza evolution.

      Reviewer #2 (Public review):

      This is an excellent paper. The ability to measure the immune response to multiple viruses in parallel is a major advancement for the field, which will be relevant across pathogens (assuming the assay can be appropriately adapted). I only have a few comments, focused on maximising the information provided by the sera.

      Thanks very much!

      Firstly, one of the major findings is that there is wide heterogeneity in responses across individuals. However, we could expect that individuals' responses should be at least correlated across the viruses considered, especially when individuals are of a similar age. It would be interesting to quantify the correlation in responses as a function of the difference in ages between pairs of individuals. I am also left wondering what the potential drivers of the differences in responses are, with age being presumably key. It would be interesting to explore individual factors associated with responses to specific viruses (beyond simply comparing adults versus children).

      We’re excited by this idea! We plan to include these analyses in our revised pre-print.

      Relatedly, is the phylogenetic distance between pairs of viruses associated with similarity in responses?

      As above, we like this idea and our revised pre-print will include this analysis.

      Figure 5C is also a really interesting result. To be able to predict growth rates based on titers in the sera is fascinating. As touched upon in the discussion, I suspect it is really dependent on the representativeness of the sera of the population (so, e.g., if only elderly individuals provided sera, it would be a different result than if only children provided samples). It may be interesting to compare different hypotheses - so e.g., see if a population-weighted titer is even better correlated with fitness - so the contribution from each individual's titer is linked to a number of individuals of that age in the population. Alternatively, maybe only the titers in younger individuals are most relevant to fitness, etc.

      We’re very interested in these analyses, but suggest they may be better explored in subsequent works that could sample more children, teenagers and adults across age groups. Our sera set, as the reviewer suggests, may be under-powered to perform the proposed analysis on subsetted age groups of our larger age cohorts.

      In Figure 6, the authors lump together individuals within 10-year age categories - however, this is potentially throwing away the nuances of what is happening at individual ages, especially for the children, where the measured viruses cross different groups. I realise the numbers are small and the viruses only come from a small number of years; however, it may be preferable to order all the individuals by age (y-axis) and the viral responses in ascending order (x-axis) and plot the response as a heatmap. As currently plotted, it is difficult to compare across panels.

      This is a good suggestion, and a revised pre-print will include heatmaps of the different cohorts, ordered by ages of individuals.

      Reviewer #3 (Public review):

      The authors use high-throughput neutralisation data to explore how different summary statistics for population immune responses relate to strain success, as measured by growth rate during the 2023 season. The question of how serological measurements relate to epidemic growth is an important one, and I thought the authors present a thoughtful analysis tackling this question, with some clear figures. In particular, they found that stratifying the population based on the magnitude of their antibody titres correlates more with strain growth than using measurements derived from pooled serum data. However, there are some areas where I thought the work could be more strongly motivated and linked together. In particular, it is not clear how the vaccine responses in the US and Australia in Figures 6-7 relate to the earlier analysis around growth rates, or what we would expect the relationship between growth rate and population immunity to be based on epidemic theory.

      Thank you for this nice summary. This reviewer also notes that the text related to figures 6 and 7 is secondary to the main story presented in figures 3-5. The main motivation for including figures 6 and 7 was to demonstrate the wide-ranging applications of sequencing-based neutralization data, and this can certainly be clarified in minor text revisions.

    1. Author Response

      Public Reviews

      We thank both reviewers for taking the time and effort to think critically about our paper and point out areas where it can be improved. In this document, we do our best to clarify any misunderstandings with the hope that further consideration about the strengths and weaknesses of our approach will be possible. Our responses are in bold.

      Reviewer #1 (Public Review):

      Summary:

      In their manuscript, Schmidlin, Apodaca, et al try to answer fundamental questions about the evolution of new phenotypes and the trade-offs associated with this process. As a model, they use yeast resistance to two drugs, fluconazole and radicicol. They use barcoded libraries of isogenic yeasts to evolve thousands of strains in 12 different environments. They then measure the fitness of evolved strains in all environments and use these measurements to examine patterns in fitness trade-offs. They identify only six major clusters corresponding to different trade-off profiles, suggesting the vast genotypic landscape of evolved mutants translates to a highly constrained phenotypic space. They sequence over a hundred evolved strains and find that mutations in the same gene can result in different phenotypic profiles.

      Overall, the authors deploy innovative methods to scale up experimental evolution experiments, and in many aspects of their approach tried to minimize experimental variation.

      We thank the reviewer for this positive assessment of our work. We are happy that the reviewer noted what we feel is a unique strength of our approach: we scaled up experimental evolution by using DNA barcodes and by exploring 12 related selection pressures. Despite this scaling up, we still see phenotypic convergence among the 774 adaptive mutants we study.

      The environments we study represent 12 different concentrations or combinations of two drugs, radicicol and fluconazole. Our hope is that this large dataset (774 mutants x 12 environments) will be useful, both to scientists who are generally interested in the genetic and phenotypic underpinnings of adaptation, and to scientists specifically interested in the evolution of drug resistance.

      Weaknesses:

      (1) One of the objectives of the authors is to characterize the extent of phenotypic diversity in terms of resistance trade-offs between fluconazole and radicicol. To minimize noise in the measurement of relative fitness, the authors only included strains with at least 500 barcode counts across all time points in all 12 experimental conditions, resulting in a set of 774 lineages passing this threshold. This corresponds to a very small fraction of the starting set of ~21 000 lineages that were combined after experimental evolution for fitness measurements.

      This is a misunderstanding that we will work to clarify in the revision. Our starting set did not include 21,000 adaptive lineages. The total number of unique adaptive lineages in this starting set is much lower than 21,000 for two reasons.

      First, ~21,000 represents the number of single colonies we isolated in total from our evolution experiments. Many of these isolates possess the same barcode, meaning they are duplicates. Second, and more importantly, most evolved lineages do not acquire adaptive mutations, meaning that many of the 21,000 isolates are genetically identical to their ancestor. In our revised manuscript, we will explicitly state that these 21,000 isolated lineages do not all represent unique, adaptive lineages. In figure 2 and all associated text, we will change the word “lineages” to “isolates,” where relevant.

      More broadly speaking, several previous studies have demonstrated that diverse genetic mutations converge at the level of phenotype, and have suggested that this convergence makes adaptation more predictable (PMID33263280, PMID37437111, PMID22282810, PMID25806684). Our study captures mutants that are overlooked in previous studies, such as those that emerge across subtly different selection pressures (e.g., 4 μg/ml vs. 8 μg/ml flu) and those that are undetectable in evolutions lacking DNA barcodes. Thus, while our experimental design misses some mutants (see next comment), it captures many others. Note that 774 adaptive lineages is more than most previous studies. Thus, we feel that “our work – showing that 774 mutants fall into a much smaller number of groups” is important because it “contributes to growing literature suggesting that the phenotypic basis of adaptation is not as diverse as the genetic basis (lines 161 - 162).”

      As the authors briefly remark, this will bias their datasets for lineages with high fitness in all 12 environments, as all these strains must be fit enough to maintain a high abundance.

      The word “briefly” feels a bit unfair because we discuss this bias on 3 separate occasions (on lines 146 - 147, 260 - 264, and in more detail on 706 - 714). We even walk through an example of a class of mutants that our study misses. We say, “our study is underpowered to detect adaptive lineages that have low fitness in any of the 12 environments. This is bound to exclude large numbers of adaptive mutants. For example, previous work has shown some FLU resistant mutants have strong tradeoffs in RAD (Cowen and Lindquist 2005). Perhaps we are unable to detect these mutants because their barcodes are at too low a frequency in RAD environments, thus they are excluded from our collection of 774.”

      In our revised version, we will add more text to the first mention of these missing mutants (lines 146 - 147) so that the implications are more immediately made apparent.

      While we “miss” some classes of mutants, we “catch” other classes that may have been missed in previous studies of convergence. For example, we observe a unique class of FLU-resistant mutants that primarily emerged in evolution experiments that lack FLU (Figure 3). Thus, we think that the unique design of our study, surveying 12 environments, allows us to make a novel contribution to the study of phenotypic convergence.

      One of the main observations of the authors is that phenotypic space is constrained to a few clusters of roughly similar relative fitness patterns, giving hope that such clusters could be enumerated and considered to design antimicrobial treatment strategies. However, by excluding all lineages that are fit in only one or a few environments, they conceal much of the diversity that might exist in terms of trade-offs and set up an inclusion threshold that might present only a small fraction of phenotypic space with characteristics consistent with generalist resistance mechanisms or broadly increased fitness. This has important implications regarding the general conclusions of the authors regarding the evolution of trade-offs.

      We discussed these implications in some detail in the 16 lines mentioned above (146 - 147, 260 - 264, 706 - 714). To add to this discussion, we will also add the following sentence to the end of the paragraph on lines 697 - 714: “This could complicate (or even make impossible) endeavors to design antimicrobial treatment strategies that thwart resistance”.

      We will also add a new paragraph that discusses these implications earlier in our manuscript. This paragraph will highlight the strengths of our method (e.g., that we “catch” classes of mutants that are often overlooked) while being transparent about the weaknesses of our approach (e.g., that we “miss” mutants with strong tradeoffs).

      (2) Most large-scale pooled competition assays using barcodes are usually stopped after ~25 generations to avoid noise due to the emergence of secondary mutations.

      The rate at which new mutations enter a population is driven by various factors such as the mutation rate and population size, so choosing an arbitrary threshold like 25 generations is difficult.

      We conducted our fitness competition following previous work using the Levy/Blundell yeast barcode system, in which the number of generations reported varies from 32 to 40 (PMID33263280, PMID27594428, PMID37861305, see PMID27594428 for detailed calculation of the fraction of lineages biased by secondary mutations in this system).

      The authors measure fitness across ~40 generations, which is almost the same number of generations as in the evolution experiment. This raises the possibility of secondary mutations biasing abundance values, which would not have been detected by the whole genome sequencing as it was performed before the competition assay.

      We understand how the reviewer came to this misunderstanding and will adjust our revised manuscript accordingly. Previous work has demonstrated that, in this particular evolution platform, most of the mutations actually occur during the transformation that introduces the DNA barcodes (PMID25731169). In other words, these mutations do not accumulate during the 40 generations of evolution; they are already there. So the observation that we collect a genetically diverse pool of adaptive mutants after 40 generations of evolution is not evidence that 40 generations is enough time for secondary mutations to bias abundance values.

      (3) The approach used by the authors to identify and visualize clusters of phenotypes among lineages does not seem to consider the uncertainty in the measurement of their relative fitness. As can be seen from Figure S4, the inter-replicate difference in measured fitness can often be quite large. From these graphs, it is also possible to see that some of the fitness measurements do not correlate linearly (ex.: Med Flu, Hi Rad Low Flu), meaning that taking the average of both replicates might not be the best approach.

      This concern, and all subsequent concerns, seem to be driven by either (a) general concerns about the noisiness of fitness measurements obtained from large-scale barcode fitness assays or (b) general concerns about whether the clusters obtained from our dimensional reduction approach capture this noise as opposed to biologically meaningful differences.

      We will respond to each concern point-by-point, but want to start by generally stating that (a) our particular large-scale barcode fitness assay has several features that diminish noise, and (b) we devote 4 figures and 200 lines of text to demonstrating that these clusters capture biologically meaningful differences between mutants (and not noise).

      In terms of this specific concern, we performed an analysis of noise in the submitted manuscript: Our noisiest fitness measurements correspond to barcodes that are the least abundant and thus suffer the most from stochastic sampling noise. These are also the barcodes that introduce the nonlinearity the reviewer mentions. We removed these from our dataset by increasing our coverage threshold from 500 reads to 5,000 reads. The clusters did not collapse, which suggests that they were not capturing noise (Figure S7 panel B). But we agree with the reviewer that this analysis alone is not sufficient to conclude that the clusters distinguish groups of mutants with unique fitness tradeoffs.
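
      The coverage-threshold check described above can be sketched as follows. This is a minimal illustration, not the pipeline used in the manuscript: the function names, barcode labels, condition names, and read counts are hypothetical, and the sketch assumes the threshold is applied to reads summed across time points within each condition, a detail the text does not spell out.

```python
def passes_coverage(counts_by_condition, min_reads):
    """True if reads summed over time points reach min_reads in every condition."""
    return all(
        sum(timepoint_counts) >= min_reads
        for timepoint_counts in counts_by_condition.values()
    )


def filter_barcodes(counts, min_reads):
    """Keep only barcodes that pass the coverage threshold in all conditions."""
    return {
        barcode: by_condition
        for barcode, by_condition in counts.items()
        if passes_coverage(by_condition, min_reads)
    }


# Hypothetical read counts per barcode, per condition, per time point.
counts = {
    "bc_high": {"no_drug": [3000, 2500, 2000], "low_flu": [2800, 2600, 2400]},
    "bc_mid": {"no_drug": [400, 300, 200], "low_flu": [350, 250, 150]},
    "bc_low": {"no_drug": [400, 300, 200], "low_flu": [20, 10, 5]},
}

lenient = filter_barcodes(counts, 500)   # threshold used for the main dataset
strict = filter_barcodes(counts, 5000)   # stricter threshold used as a noise check
```

      Reclustering only the barcodes kept under the stricter threshold, and checking whether the same groups reappear, is the kind of comparison the robustness analysis relies on.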

      Because the clustering approach used does not seem to take this variability into account, it becomes difficult to evaluate the strength of the clustering, especially because the UMAP projection does not include any representation of uncertainty around the position of lineages.

      To evaluate the strength of the clustering, we performed numerous analyses including whole genome sequencing, growth experiments, reclustering, and tracing the evolutionary origins of each cluster (Figures 5 - 8). All of these analyses suggested that our clusters capture groups of mutants that have different fitness tradeoffs. We will adjust our revised manuscript to make clear that we do not rely on the results of a clustering algorithm alone to draw conclusions about phenotypic convergence.

      We are also grateful to the reviewer for helping us realize that, as written, our manuscript is not clear with regard to how we perform clustering. We are not using UMAP to decide which mutant belongs to which cluster. Recent work highlights the importance of using an independent clustering method (PMID37590228). Although this recent work addresses the challenge of clustering much higher dimensional data than we survey here, we did indeed use an independent clustering method (Gaussian mixture model). In other words, we use UMAP for visualization but not clustering. We also confirm our clustering results using a second independent method (hierarchical clustering; Figure S8). And in our revised manuscript, we will confirm them with a third method (PCA, see below). We will adjust the main text and the methods section to make these choices clearer.

      This might paint a misleading picture where clusters appear well separate and well defined but are in fact much fuzzier, which would impact the conclusion that the phenotypic space is constricted.

      The salient question is whether the clusters are so “fuzzy” that they are not meaningful. That interpretation seems unreasonable. Our clusters group mutants with similar genotypes, evolutionary histories, and fitness tradeoffs (Figures 5 - 8). Clustering mutants with similar behaviors is important and useful. It improves phenotypic prediction by revealing which mutants are likely to have at least some phenotypic effects in common. And it also suggests that the phenotypic space is constrained, at least to some degree, which previous work suggests is helpful in predicting evolution (PMID33263280, PMID37437111, PMID22282810, PMID25806684).

      (4) The authors make the decision to use UMAP and a Gaussian mixture model to cluster and represent the different fitness landscapes of their lineages of interest. Their approach has many caveats. First, compared to PCA, the UMAP axes do not provide any information about the actual dissimilarities between clusters. Using PCA would have allowed a better understanding of the amount of variance explained by components that separate clusters, as well as more interpretable components.

      The components derived from PCA are often not interpretable. It’s not obvious that each one, or even the first one, will represent some intuitive phenotype, like resistance to fluconazole.

      Moreover, we see many nonlinearities in our data. For example, fitness in a double drug environment is not predicted by adding up fitness in the relevant single drug environments. Also, there are mutants that have high fitness when fluconazole is absent or abundant, but low fitness when mild concentrations are present. These types of nonlinearities can make the axes in PCA very difficult to interpret, and they can also be missed by PCA altogether. For these reasons, we prefer other clustering methods.

      We will adjust our revised manuscript to explain these reasons why we chose UMAP and GMM over PCA.

      Also, we will include PCA in the supplement of our revised manuscript. Please find below PC1 vs PC2, with points colored according to the cluster assignment in figure 4 (i.e. using a Gaussian mixture model). It appears the clusters are largely preserved.

      Author response image 1.

      Second, the advantages of dimensional reduction are not clear. In the competition experiment, 11/12 conditions (all but the no drug, no DMSO conditions) can be mapped to only three dimensions: concentration of fluconazole, concentration of radicicol, and relative fitness. Each lineage would have its own fitness landscape as defined by the plane formed by relative fitness values in this space, which can then be examined and compared between lineages.

      We worry that the idea stems from a priori notions of what the important dimensions should be. It also seems like this would miss important nonlinearities such as our observation that low fluconazole behaves more like a novel selection pressure than a dialed down version of high fluconazole.

      Also, we believe the reviewer meant “fitness profile” and not “fitness landscape”. A fitness landscape imagines a walk where every “step” is a mutation. Most lineages in barcoded evolution experiments possess only a single adaptive mutation. A single-step walk is not enough to build a landscape, though others are expanding barcoded evolution experiments beyond the first step (PMID34465770, PMID31723263), so maybe one day this will be possible.

      Third, the choice of 7 clusters as the cutoff for the multiple Gaussian model is not well explained. Based on Figure S6A, BIC starts leveling off at 6 clusters, not 7, and going to 8 clusters would provide the same reduction as going from 6 to 7. This choice also appears arbitrary in Figure S6B, where BIC levels off at 9 clusters when only highly abundant lineages are considered.

      We agree. We did not rely on the results of BIC alone to make final decisions about how many clusters to include. We thank the reviewer for pointing out this gap in our writing. We will adjust our revised manuscript to explain that we ultimately chose to describe 6 clusters that we were able to validate with follow-up experiments. In figures 5, 6, 7, and 8, we use external information to validate the clusters that we report in figure 4. And in lines 697 – 714, we explain that there may be additional clusters beyond those we tease apart in this study.
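
      For readers less familiar with the criterion under discussion, the standard definition of BIC can be sketched as follows. This is a generic illustration, not the code used for Figure S6. The parameter count assumes full covariance matrices, which is an assumption made here for the example, and the BIC values passed to `successive_drops` are hypothetical.

```python
import math


def gmm_n_params(n_components: int, n_dims: int) -> int:
    """Free parameters of a Gaussian mixture with full covariance matrices."""
    weights = n_components - 1                                 # mixing weights sum to 1
    means = n_components * n_dims
    covariances = n_components * n_dims * (n_dims + 1) // 2   # symmetric matrices
    return weights + means + covariances


def bic(log_likelihood: float, n_params: int, n_samples: int) -> float:
    """Bayesian information criterion; lower values indicate a preferred model."""
    return n_params * math.log(n_samples) - 2.0 * log_likelihood


def successive_drops(bic_by_k: dict) -> dict:
    """Reduction in BIC gained by moving from k-1 to k clusters."""
    return {k: bic_by_k[k - 1] - bic_by_k[k]
            for k in sorted(bic_by_k) if k - 1 in bic_by_k}


# Hypothetical BIC values, only to show how "leveling off" can be read.
print(successive_drops({5: 100.0, 6: 90.0, 7: 88.0}))
```

      Because the reduction in BIC can shrink gradually rather than stop at a sharp elbow, the criterion by itself may not single out one cluster number, which is why external validation is useful.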

      This directly contradicts the statement in the main text that clusters are robust to noise, as a more stringent inclusion threshold appears to increase and not decrease the optimal number of clusters. Additional criteria to BIC could have been used to help choose the optimal number of clusters or even if mixed Gaussian modeling is appropriate for this dataset.

      We are under the following impression: If our clustering method was overfitting, i.e. capturing noise, the optimal number of clusters should decrease when we eliminate noise. It increased. In other words, the observation that our clusters did not collapse (i.e. merge) when we removed noise suggests these clusters were not capturing noise.

      More generally, our validation experiments, described below, provide additional evidence that our clusters capture meaningful differences between mutants (and not noise).

      (5) Large-scale barcode sequencing assays can often be noisy and are generally validated using growth curves or competition assays.

      Some types of bar-seq methods, in particular those that look at fold change across two time points, are noisier than others that look at how frequency changes across multiple timepoints (PMID30391162). Here, we use the less noisy method. We also reduce noise by using a stricter coverage threshold than previous work (e.g., PMID33263280), and by excluding batch effects by performing all experiments simultaneously (PMID37237236).
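
      The difference between a two-timepoint and a multi-timepoint estimate can be sketched as follows. This is a simplified illustration (a least-squares slope of the log frequency ratio against time), not the inference procedure of the validated assay cited below, which is more elaborate. The frequencies, sampling times, and function names are invented for the example.

```python
import math


def log_ratios(lineage_freqs, reference_freqs):
    """Natural log of lineage frequency relative to a reference at each timepoint."""
    return [math.log(f / r) for f, r in zip(lineage_freqs, reference_freqs)]


def two_point_estimate(generations, ys):
    """Fitness from the first and last timepoints only."""
    return (ys[-1] - ys[0]) / (generations[-1] - generations[0])


def multi_point_estimate(generations, ys):
    """Least-squares slope across all timepoints."""
    n = len(generations)
    mean_x = sum(generations) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in generations)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(generations, ys))
    return sxy / sxx


# Hypothetical, noise-free data: the lineage doubles relative to the
# reference every 8 generations.
generations = [0, 8, 16, 24]
lineage = [0.001, 0.002, 0.004, 0.008]
reference = [0.1, 0.1, 0.1, 0.1]

ys = log_ratios(lineage, reference)
slope_all = multi_point_estimate(generations, ys)
slope_two = two_point_estimate(generations, ys)
print(slope_all, slope_two)
```

      With noise-free data the two estimates agree. With noisy counts, an error at either endpoint shifts the two-point estimate directly, whereas the slope fitted through all timepoints averages over such errors.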

      The main assay we use to measure fitness has been previously validated (PMID27594428). No subsequent study using this assay validates using the methods suggested by the reviewer (see PMID37861305, PMID33263280, PMID31611676, PMID29429618, PMID37192196, PMID34465770, PMID33493203).

      More to the point, bar-seq has been used, without the reviewer’s suggested validation, to demonstrate that the way some mutant’s fitness changes across environments is different from other mutants (PMID33263280, PMID37861305, PMID31611676, PMID33493203, PMID34596043). This is the same thing that we use bar-seq to demonstrate.

      For all of these reasons, we are hesitant to spend effort re-confirming that bar-seq itself is a valid way to infer fitness. It seems this is already accepted as a standard in our field.

      Having these types of results would help support the accuracy of the main assay in the manuscript and thus better support the claims of the authors.

      We don’t agree that fitness measurements obtained from this bar-seq assay generally require validation. But we do agree that it is important to validate whether the mutants in each of our 6 clusters indeed are different from one another in meaningful ways, in particular, in that they have different fitness tradeoffs. We have four figures (5 - 8) and 200 lines of text dedicated to validating whether our clusters capture reproducible and biologically meaningful differences between mutants. Happily, one of these figures (Fig 7) includes growth curves, which are exactly the type of validation experiment asked for by the reviewer.

      Below, we walk through the different types of validation experiments that are present in our original manuscript, and additional validation experiments that we plan to include in the revised version. We are hopeful that these validation experiments are sufficient, or at the very least, that this list empowers reviewers to point out where more work is needed.

      (1) Mutants from different clusters have different growth curves: In our original manuscript, we measured growth curves corresponding to a fitness tradeoff that we thought was surprising. Mutants in clusters 4 and 5 both have fitness advantages in single drug conditions. While mutants from cluster 4 also are advantageous in the double drug conditions, mutants from cluster 5 are not! We validated these different behaviors by studying growth curves for a mutant from each cluster (Figures 7 and S10).

      (2) Mutants from different clusters have different evolutionary origins: In our original manuscript, we came up with a novel way to ask whether the clusters capture different types of adaptive mutants. We asked whether the mutants in each cluster originate from different evolution experiments. Indeed they often do (see pie charts in Figures 6, 7, 8). This method also provides evidence supporting each cluster’s differing fitness tradeoffs.

      For example, mutants in cluster 5 appear to have a tradeoff in a double drug condition (described above). They rarely originate from that evolution condition, unlike mutants in nearby cluster 4 (see Figure 7).

      (3) Mutants from each cluster often fall into different genes: In our original manuscript, we sequenced many of these mutants and show that mutants in the same gene are often found in the same cluster. For example, all 3 IRA1 mutants are in cluster 6 (Fig 8), both GPB2 mutants are in cluster 4 (Figs 7 & 8), and 35/36 PDR mutants are in either cluster 2 or 3 (Figs 5 & 6).

      (4) Mutants from each cluster have behaviors previously observed in the literature: In our original manuscript, we compared our sequencing results to the literature and found congruence. For example, PDR mutants are known to provide a fitness benefit in fluconazole and are found in clusters that have high fitness in fluconazole (lines 457 - 462). Previous work suggests that some mutations to PDR have different tradeoffs than others, which is what we see (lines 540 - 542). IRA1 mutants were previously observed to have high fitness in our “no drug” condition, and are found in the cluster that has the highest fitness in the “no drug” condition (lines 642 - 646). Previous work even confirms the unusual fitness tradeoff we observe where IRA1 and other cluster 6 mutants have low fitness only in low concentrations of fluconazole (lines 652 - 657).

      (5) Mutants largely remain in their clusters when we use alternate clustering methods: In our original manuscript, we performed various different reclustering and/or normalization approaches on our data (Fig 6, S5, S7, S8, S9). The clusters of mutants that we observe in figure 4 do not change substantially when we recluster the data. We will add PCA (see above) to these analyses in our revised manuscript.

      (6) We will include additional data showing that mutants in different clusters have different evolutionary origins: Cluster 1 is defined by high fitness in low fluconazole that declines with increasing fluconazole (see Fig 4E and Fig 5C). In our revised manuscript, we will show that cluster 1 lineages were overwhelmingly sampled from evolutions conducted in our lowest concentration of fluconazole (see figure panel A below). No other cluster’s evolutionary history shows this pattern (figures 6, 7, and 8).

      (7) We will include additional data showing that mutants in different clusters have different growth curves: Cluster 1 lineages are unique in that their fitness advantage is specific to low fluconazole and trades off in higher concentrations of fluconazole. We obtained growth curves for three cluster 1 mutants (2 SUR1 mutants and 1 UPC2 mutant). We compared them to growth curves for three PDR mutants (from clusters 2 and 3). Cluster 1 mutants appear to have the highest growth rates and reach the highest carrying capacity in low fluconazole (see red and green lines in Author response image 2 panel B below). But the cluster 1 mutants are negatively affected by higher concentrations of fluconazole, much more so than the mutants from clusters 2 and 3 (see Author response image 2 panel C below). This is consistent with the different fitness tradeoffs we observe for each cluster (figures 4 and 5). We will include a more detailed version of this analysis and the figures below in our revised manuscript.

      Author response image 2.

      Validation experiments demonstrate that cluster 1 mutants have uniquely high fitness in only the lowest concentration of fluconazole. (A) The mutant lineages in cluster 1 were largely sampled from evolution experiments performed in low flu. This is not true of other clusters (see pie charts in main manuscript). (B) In low flu (4 µg/ml), Cluster 1 lineages (red/UPC2 and green/SUR1) grow faster and achieve higher density than lineages from clusters 2 and 3 (blue/PDR). This is consistent with bar-seq measurements demonstrating that cluster 1 mutants have the highest fitness in low flu. (C) Cluster 1 lineages are sensitive to increasing flu concentrations (SUR1 and UPC2 mutants, middle and rightmost graphs). This is apparent in that the gray (8 µg/ml flu) and light blue (32 µg/ml flu) growth curves rise more slowly and reach lower density than the dark blue curves (4 µg/ml flu). But this is not the case for the PDR mutants from clusters 2 and 3 (leftmost graph). These observations are consistent with the bar-seq fitness data presented in the main manuscript (Fig 4E).

      With all of these validation efforts combined, we are hopeful that the reviewer is now more convinced that our clusters capture groups of mutants with different fitness tradeoffs (as opposed to noise). We want to conclude by saying that we are grateful to the reviewer for making us think deeply about areas where we can include additional validation efforts as well as areas where we can make our manuscript clearer.

      Reviewer #2 (Public Review):

      Summary:

      Schmidlin & Apodaca et al. aim to distinguish mutants that resist drugs via different mechanisms by examining fitness tradeoffs across hundreds of fluconazole-resistant yeast strains. They barcoded a collection of fluconazole-resistant isolates and evolved them in different environments with a view to having relevance for evolutionary theory, medicine, and genotype-phenotype mapping.

      Strengths:

      There are multiple strengths to this paper, the first of which is pointing out how much work has gone into it; the quality of the experiments (the thought process, the data, the figures) is excellent. Here, the authors seek to induce mutations in multiple environments, which is a really large-scale task. I particularly like the attention paid to isolates which are resistant to low concentrations of FLU. So often these are overlooked in favour of those conferring MIC values >64/128 etc. What was seen is different genotype and fitness profiles. I think there's a wealth of information here that will actually be of interest to more than just the fields mentioned (evolutionary medicine/theory).

      We are very grateful for this positive review. This was indeed a lot of work! We are happy that the reviewer noted what we feel is a unique strength of our manuscript: that we survey adaptive isolates across multiple environments, including low drug concentrations.

      Weaknesses:

      Not picking up low fitness lineages - which the authors discuss and provide a rationale as to why. I can completely see how this has occurred during this research, and whilst it is a shame I do not think this takes away from the findings of this paper. Maybe in the next one!

      We thank the reviewer for these words of encouragement and will work towards catching more low fitness lineages in our next project.

      In the abstract the authors focus on 'tradeoffs' yet in the discussion they say the purpose of the study is to see how many different mechanisms of FLU resistance may exist (lines 679-680), followed up by "We distinguish mutants that likely act via different mechanisms by identifying those with different fitness tradeoffs across 12 environments". Whilst I do see their point, and this is entirely feasible, I would like a bit more explanation around this (perhaps in the intro) to help lay-readers make this jump. The remainder of my comments on 'weaknesses' are relatively fixable, I think:

      We think that phrasing the “jump” as a question might help lay readers get from point A to point B. So, in the introduction of our revised manuscript, we will add a paragraph roughly similar to this one: “If two groups of drug-resistant mutants have different fitness tradeoffs, does it mean that they provide resistance through different underlying mechanisms? Alternatively, it could mean that both provide drug resistance via the same mechanism, but some mutations come with a cost that others don’t pay. However, another way to phrase this alternative is to say that both groups of mutants affect fitness through different suites of mechanisms that are only partially overlapping. And so, by identifying groups of mutants with different fitness tradeoffs, we argue that we will be uncovering sets of mutations that impact fitness through different underlying mechanisms. The ability to do so would be useful for genotype-phenotype mapping endeavors.”

      In the introduction I struggle to see how this body of research fits in with the current literature, as the literature cited is a hodge-podge of bacterial and fungal evolution studies, which are very different! For example, the authors state "previous work suggests that mutants with different fitness tradeoffs may affect fitness through different molecular mechanisms" (lines 129-131) and then cite three papers, only one of which is a fungal research output. However, the next sentence focuses solely on literature from fungal research. Citing bacterial work as a foundation is fine, but as you're using yeast for this I think tailoring the introduction more to what is and isn't known in fungi would be more appropriate. It would also be great to then circle back around and mention monotherapy vs combination drug therapy for fungal infections as a rationale for this study. The study seems to be focused on FLU-resistant mutants, which is the first-line drug of choice, but many (yeast) infections have acquired resistance to this and combination therapy is the norm.

      In our revised manuscript, we will carefully review all citations. The issue may stem from our attempt to reach two different groups of scientists. We ourselves are broadly interested in the structure of the genotype-phenotype-fitness map (PMID33263280, PMID32804946). Though the 3 papers the reviewer mentions on lines 132 - 133 all pertain to yeast, we cite them because they are studies about the complexity of this map. Their conclusions, in theory, should apply broadly, beyond yeast. Similarly, the reason we cite papers from yeast, as well as bacteria and cancer, is that we believe general conclusions about the genotype-phenotype-fitness map should apply broadly. For example, the sentence the reviewer highlights, “previous work suggests that mutants with different fitness tradeoffs may affect fitness through different molecular mechanisms” is a general observation about the way genotype maps to fitness. So we cited papers from across the tree of life to support this sentence.

      On the other hand, because we study drug resistant mutations, we also hope that our work is of use to scientists studying the evolution of resistance. We agree with the reviewer that in this regard, some of our findings may be especially pertinent to the evolution of resistance to antifungal drugs. We will consider this when reviewing the citations in our revised manuscript and add some text to clarify these points.

      Methods: Line 769 - which yeast? I haven't even seen mention of which species is being used in this study; different yeast employ different mechanisms of adaptation for resistance, so could greatly impact the results seen. This could help with some background context if the species is mentioned (although I assume S. cerevisiae).

      In the revised manuscript, we will make clear that we study S. cerevisiae.

      In which case, should aneuploidy be considered as a mechanism? This is mentioned briefly on line 556, but with all the sequencing data acquired this could be checked quickly?

      We like this idea and we are working on it, but it is not straightforward. The reviewer is correct in that we can use the sequencing data that we already have. But calling aneuploidy with certainty is tough because its signal can be masked by noise. In other words, some regions of the genome may be sequenced more than others by chance. Given this is not straightforward, at least not for us, this analysis will likely have to wait for a subsequent paper.
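
      The basic idea behind calling aneuploidy from existing sequencing data can be sketched as a comparison of per-chromosome read depth against the genome-wide typical depth. This is only a rough illustration of the heuristic, not an analysis performed in this study. The depth values, chromosome names, function names, and the 0.25 tolerance are all hypothetical, and real data would need the noise handling described above.

```python
from statistics import median


def chromosome_depth_ratios(depth_by_chrom: dict) -> dict:
    """Median depth of each chromosome divided by the median across chromosomes."""
    chrom_medians = {c: median(d) for c, d in depth_by_chrom.items()}
    genome_median = median(chrom_medians.values())
    return {c: m / genome_median for c, m in chrom_medians.items()}


def flag_candidates(ratios: dict, tolerance: float = 0.25) -> list:
    """Chromosomes whose depth ratio departs from 1 by more than the tolerance."""
    return sorted(c for c, r in ratios.items() if abs(r - 1.0) > tolerance)


# Hypothetical read depths in a few windows per chromosome.
hypothetical_depths = {
    "chrI": [40, 42, 38, 41],
    "chrII": [39, 40, 41, 40],
    "chrIII": [80, 82, 78, 81],  # roughly double the typical depth
}

ratios = chromosome_depth_ratios(hypothetical_depths)
candidates = flag_candidates(ratios)
print(candidates)
```

      Uneven coverage across the genome can push ratios away from 1 by chance, so a fixed tolerance like the one assumed here could yield false calls. That is the difficulty noted in the response.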

      I think the authors could be bolder and try and link this to other (pathogenic) yeasts. What are the implications of this work on say, Candida infections?

      Perhaps because our background lies in general study of the genotype-phenotype map, we did not want to make bold assertions about how our work might apply to pathogenic yeasts. But we see how this could be helpful and will add some discussion points about this. Specifically, we will discuss which of the genes and mutants we observe are also found in Candida. We will also investigate whether our observation that low fluconazole represents a seemingly unique challenge, not just a milder version of high fluconazole, has any corollary in the Candida literature.

    1. Author Response

      Reviewer 1 (Public Review):

      1. With respect to the predictions, the authors propose that the subjects, depending on their linguistic background and the length of the tone in a trial, can put forward one or two predictions. The first is a short-term prediction based on the statistics of the previous stimuli and identical for both groups (i.e. short tones are expected after long tones and vice versa). The second is a long-term prediction based on their linguistic background. According to the authors, after a short tone, Basque speakers will predict the beginning of a new phrasal chunk, and Spanish speakers will predict it after a long tone.

      In this way, when a short tone is omitted, Basque speakers would experience the violation of only one prediction (i.e. the short-term prediction), but Spanish speakers will experience the violation of two predictions (i.e. the short-term and long-term predictions), resulting in a higher amplitude MMN. The opposite would occur when a long tone is omitted. So, to recap, the authors propose that subjects will predict the alternation of tone durations (short-term predictions) and the beginning of new phrasal chunks (long-term predictions).

      The problem with this is that subjects are also likely to predict the completion of the current phrasal chunk. In speech, phrases are seldom left incomplete. In Spanish it is very unlikely to hear a function-word that is not followed by a content-word (and the opposite happens in Basque). On the contrary, after the completion of a phrasal chunk, a speaker might stop talking and a silence might follow, instead of the beginning of a new phrasal chunk.

      Considering that the completion of a phrasal chunk is more likely than the beginning of a new one, the prior endowed to the participants by their linguistic background should make us expect a pattern of results actually opposite to the one reported here.

      Response: We acknowledge the plausibility of the hypothesis advanced by Reviewer #1. We would like to further clarify the rationale that led us to predict that the hypothesized long-term predictions should manifest at the onset of (and not within) a “phrasal chunk”. The hypothesis does not directly concern the probability of a short event to follow a long one (or the other way around), which to our knowledge has not been systematically quantified in previous cross-linguistic studies. Rather, it concerns how the auditory system forms higher-level auditory chunks based on the rhythmic properties of the native language, which is what the previous behavioral studies on perceptual grouping have addressed (e.g., Iversen 2008; Molnar et al. 2014; Molnar et al. 2016). When presented with sequences of two tones alternating in duration, Spanish speakers typically report perceiving the auditory stream as a repetition of short-long chunks separated by a pause, while speakers of Basque usually report the opposite long-short grouping bias. These results suggest that the auditory system performs a chunking operation by grouping pairs of tones into compressed, higher-level auditory units (often perceived as a single event). The way two constituent tones are combined depends on linguistic experience. Based on this background, we hypothesized the presence of (i) a short-term system that merely encodes a repetition of alternations rule and predicts transitions from one constituent tone to the other (a → b → a → b, etc.); (ii) a long-term system that encodes a repetition of concatenated alternations rule and predicts transitions from one high-level unit to the other (ab → ab, etc.). Under this view, we expect predictions based on the long-term system to be stronger at the onset of (rather than within) high-level units and therefore omissions of the first constituent tone to elicit larger responses than omissions of the second constituent tone.

      In other words, the omission of the onset tone would reflect the omission of the whole chunk. On the other hand, the omission of the internal tone would be better handled by the short-term system, involved in processing the low-level structure of our sequences.

      A similar concern was also raised by Reviewer #2. We will include the view proposed by Reviewer #1 and Reviewer #2 in the updated version of the manuscript.

      1. The authors report an interaction effect that modulates the amplitude of the omission response, but caveats make the interpretation of this effect somewhat uncertain. The authors report a widespread omission response, which resembles the classical mismatch response (in MEG) with strong activations in sensors over temporal regions. Instead, the interaction found is circumscribed to four sensors that do not overlap with the peaks of activation of the omission response.

      Response: We appreciate that all three reviewers agreed on the robustness of the data analysis pipeline. The approach employed to identify the presence of an interaction effect was indeed conservative, using a non-parametric test on combined gradiometers data, no a priori assumptions regarding the location of the effect, and small cluster thresholds (cfg.clusteralpha = 0.05) to enhance the likelihood of detecting highly localized clusters with large effect sizes. This approach led to the identification of the cluster illustrated in Figure 2c, where the interaction effect is evident. The fact that this interaction effect arises in a relatively small cluster of sensors does not alter its statistical robustness. The only partial overlap of the cluster with the activation peaks might simply reflect the fact that distinct sources contribute to the generation of the omission-MMN, which has been demonstrated in numerous prior studies (e.g., Zhang et al., 2018; Ross & Hamm, 2020).
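
      The logic of a non-parametric test for a group-by-condition interaction can be sketched in a much simplified form. The example below is not the cluster-based procedure used in the analysis pipeline: it tests a single value per participant (for example, the long-minus-short omission difference) and omits the clustering over sensors and time entirely. The numbers and the function name are invented for illustration.

```python
from itertools import combinations


def exact_permutation_p(group_a, group_b):
    """Two-sided exact permutation p-value for a difference in group means.

    Each value is one participant's within-subject difference, so a group
    difference in these values corresponds to an interaction.
    """
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    n_b = len(pooled) - n_a
    total = sum(pooled)

    def abs_mean_diff(indices):
        sum_a = sum(pooled[i] for i in indices)
        return abs(sum_a / n_a - (total - sum_a) / n_b)

    observed = abs_mean_diff(range(n_a))
    n_extreme = 0
    n_perm = 0
    # Enumerate every way of relabeling participants into the two groups.
    for idx in combinations(range(len(pooled)), n_a):
        n_perm += 1
        if abs_mean_diff(idx) >= observed - 1e-12:
            n_extreme += 1
    return n_extreme / n_perm


# Hypothetical per-participant differences for two small groups.
p_value = exact_permutation_p([1, 2, 3, 4], [11, 12, 13, 14])
print(p_value)
```

      With 4 participants per group there are 70 possible relabelings. Only the observed labeling and its mirror image reach the observed difference, so the p-value is 2/70. Larger groups would normally use random relabelings rather than full enumeration.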

      Furthermore, the boxplot in Figure 2E suggests that part of the interaction effect might be due to the presence of two outliers (if removed, the effect is no longer significant). Overall, it is possible that the reported interaction is driven by a main effect of omission type which the authors report, and find consistently only in the Basque group (showing a higher amplitude omission response for long tones than for short tones). Because of these points, it is difficult to interpret this interaction as a modulation of the omission response.

      Response: The two participants mentioned by Reviewer #1, despite being somewhat distant from the rest of the group, are not outliers according to the standard Tukey’s rule. As shown in Author response image 1 below, no participant fell outside the upper (Q3+1.5xIQR) and lower whiskers (Q1-1.5xIQR) of the boxplot.

      Author response image 1.
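
      Tukey's rule as referred to above can be sketched in a few lines. The values below are placeholders, not participant data, and the function names are made up for the example. The quartile convention (the `inclusive` method of `statistics.quantiles`) is an assumption: software packages differ slightly in how quartiles are computed, so borderline points can be classified differently.

```python
from statistics import quantiles


def tukey_fences(values, k: float = 1.5):
    """Lower and upper fences: Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def find_outliers(values, k: float = 1.5):
    """Values lying strictly outside the Tukey fences."""
    lower, upper = tukey_fences(values, k)
    return [v for v in values if v < lower or v > upper]


# Placeholder data: no point lies outside the fences.
no_outliers = [1, 2, 3, 4, 5, 6, 7, 8, 9]
print(tukey_fences(no_outliers), find_outliers(no_outliers))

# Same data with one extreme value added.
with_outlier = no_outliers + [30]
print(tukey_fences(with_outlier), find_outliers(with_outlier))
```

      A point that sits apart from the rest of the group visually is only an outlier under this rule if it falls beyond one of the two fences.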

      The presence of a main effect of omission type does not impact the interpretation of the interaction, especially considering that these effects emerge over distinct clusters of channels.

      The code to generate Author response image 1 and the corresponding statistics have been added to the script “analysis_interaction_data.R” in the OSF folder (https://osf.io/6jep8/).

      It should also be noted that in the source analysis, the interaction only showed a trend in the left auditory cortex, but in its current version the manuscript does not report the statistics of such a trend.

      Response: Our interpretation of the results for the present study is mainly driven by the effect observed on sensor-level data, which is statistically robust. The source modeling analyses (in non-invasive electrophysiology) provide a possible model of the candidate brain sources driving the effect observed at the sensor level. The source showing the interactive effect in our study is the left auditory cortex. More details and statistics will be provided in the revised version of the manuscript.

      Reviewer #2 (Public Review):

      1. Despite the evidence provided on neural responses, the main conclusion of the study reflects a known behavioral effect on rhythmic sequence perceptual organization driven by linguistic background (Molnar et al. 2016, particularly). Also, the authors themselves provide a good review of the literature that evidences the influence of long-term priors in neural responses related to predictive activity. Thus, in my opinion, the strength of the statements the authors make on the novelty of the findings may be a bit far-fetched in some instances.

      Response: We will consider the suggestion of reviewer #2 for the new version of the manuscript. Overall, we believe that the novelty of the current study lies in bridging together findings from two research fields - basic auditory neuroscience and cross-linguistic research - to provide evidence for a predictive coding model in the auditory system that uses long-term priors to make perceptual inferences.

      1. Albeit the paradigm is well designed, I fail to see the grounding of the hypotheses laid by the authors as framed under the predictive coding perspective. The study assumes that responses to an omission at the beginning of a perceptual rhythmic pattern will be stronger than at the end. I feel this is unjustified. If anything, omission responses should be larger when the gap occurs at the end of the pattern, as that would be where stronger expectations are placed: if in my language a short sound occurs after a long one, and I perceptually group tone sequences of alternating tone duration accordingly, when I hear a short sound I will expect a long one following; but after a long one, I don't necessarily need to expect a short one, as something else might occur.

      Response: A similar point was advanced by Reviewer #1. We tried to clarify our hypothesis (see above). We will consider including this interpretation in the updated version of the manuscript.

      1. In this regard, it is my opinion that what is reflected in the data may be better accounted for (or at least, additionally) by a different neural response to an omission depending on the phase of an underlying attentional rhythm (in terms of Large and Jones rhythmic attention theory, for instance) and putative underlying entrained oscillatory neural activity (in terms of Lakatos' studies, for instance). Certainly, the fact that the aligned phase may differ depending on linguistic background is very interesting and would reflect the known behavioral effect.

      Response: We thank the reviewer for this comment, which is indeed very pertinent. Below are some comments highlighting our thoughts on this.

      1) We will explore in more detail the possibility that the aligned phase may differ depending on linguistic background, which is indeed very interesting. However, we believe that even if a phase modulation by language experience is found, it would not negate the possibility that the group differences in the MMN are driven by different long-term predictions. Rather, since the hypothesized phase differences would be driven by long-term linguistic experience, phase entrainment may reflect a mechanism through which long-term predictions are carried. On this point, we agree with the Reviewer, who says that “this view would not change the impact of the results but add depth to their interpretation”.

      2) Related to the point above: Although evoked responses and oscillations are often considered distinct electrophysiological phenomena, current evidence suggests that these phenomena are interconnected (e.g., Studenova et al., 2023). In our view, the hypotheses that the MMN reflects differences in phase alignment and long-term prediction errors are not mutually exclusive.

      3) Despite the plausibility of the view proposed by reviewer #2, many studies in the auditory neuroscience literature putatively consider the MMN as an index of prediction error (e.g., Bendixen et al., 2012; Heilbron and Chait, 2018). There are good reasons to believe that also in our study the MMN reflects, at least in part, an error response.

      In the updated version of the manuscript, we will include a paragraph discussing the possibility that the reported group differences in the omission MMN might be partially accounted for by differences in neural entrainment to the rhythmic sound sequences.

      Reviewer #3 (Public Review):

      The main weaknesses are the strength of the effects and generalisability. The sample size is also relatively small by today's standards, with N=20 in each group. Furthermore, the crucial effects are all mostly in the .01<P<.05 range, such as the crucial interaction P=.03. It would be nice to see it replicated in the future, with more participants and other languages. It would also have been nice to see behavioural data that could be correlated with neural data to better understand the real-world consequences of the effect.

      Response: We appreciate the positive feedback from Reviewer #3. Concerning the weaknesses highlighted: we agree with Reviewer #3 that it would be nice to see this study replicated in the future with larger sample sizes and a behavioral counterpart. Overall, we hope this work will lead to more studies using cross-linguistic/cultural comparisons to assess the effect of experience on neural processing. In the context of the present study, we believe that the lack of behavioral data does not undermine the main findings, given the careful selection of the participants and the well-known robustness of the perceptual grouping effect (e.g., Iversen et al., 2008; Yoshida et al., 2010; Molnar et al., 2014; Molnar et al., 2016). As highlighted by Reviewer #2, having Spanish and Basque dominant “speakers as a sample equates that in Molnar et al. (2016), and thus overcomes the lack of direct behavioral evidence for a difference in rhythmic grouping across linguistic groups. Molnar et al. (2016)'s evidence on the behavioral effect is compelling, and the evidence on neural signatures provided by the present study aligns with it.”

      References

      1. Bendixen, A., SanMiguel, I., & Schröger, E. (2012). Early electrophysiological indicators for predictive processing in audition: a review. International Journal of Psychophysiology, 83(2), 120-131.

      2. Heilbron, M., & Chait, M. (2018). Great expectations: is there evidence for predictive coding in auditory cortex?. Neuroscience, 389, 54-73.

      3. Iversen, J. R., Patel, A. D., & Ohgushi, K. (2008). Perception of rhythmic grouping depends on auditory experience. The Journal of the Acoustical Society of America, 124(4), 2263-2271.

      4. Molnar, M., Lallier, M., & Carreiras, M. (2014). The amount of language exposure determines nonlinguistic tone grouping biases in infants from a bilingual environment. Language Learning, 64(s2), 45-64.

      5. Molnar, M., Carreiras, M., & Gervain, J. (2016). Language dominance shapes non-linguistic rhythmic grouping in bilinguals. Cognition, 152, 150-159.

      6. Ross, J. M., & Hamm, J. P. (2020). Cortical microcircuit mechanisms of mismatch negativity and its underlying subcomponents. Frontiers in Neural Circuits, 14, 13.

      7. Simon, J., Balla, V., & Winkler, I. (2019). Temporal boundary of auditory event formation: An electrophysiological marker. International Journal of Psychophysiology, 140, 53-61.

      8. Studenova, A. A., Forster, C., Engemann, D. A., Hensch, T., Sander, C., Mauche, N., ... & Nikulin, V. V. (2023). Event-related modulation of alpha rhythm explains the auditory P300 evoked response in EEG. bioRxiv, 2023-02.

      9. Yoshida, K. A., Iversen, J. R., Patel, A. D., Mazuka, R., Nito, H., Gervain, J., & Werker, J. F. (2010). The development of perceptual grouping biases in infancy: A Japanese-English cross-linguistic study. Cognition, 115(2), 356-361.

      10. Zhang, Y., Yan, F., Wang, L., Wang, Y., Wang, C., Wang, Q., & Huang, L. (2018). Cortical areas associated with mismatch negativity: A connectivity study using propofol anesthesia. Frontiers in Human Neuroscience, 12, 392.

    1. Author response:

      Reviewer #1 (Public review):

      Summary:

      This study presents a new Bayesian approach to estimate importation probabilities of malaria, combining epidemiological data, travel history, and genetic data through pairwise IBD estimates. Importation is an important factor challenging malaria elimination, especially in low-transmission settings. This paper focuses on Magude and Matutuine, two districts in southern Mozambique with very low malaria transmission. The results show isolation-by-distance in Mozambique, with genetic relatedness decreasing with distances larger than 100 km, and no spatial correlation for distances between 10 and 100 km. But again, strong spatial correlation in distances smaller than 10 km. They report high genetic relatedness between Matutuine and Inhambane, higher than between Matutuine and Magude. Inhambane is the main source of importation in Matutuine, accounting for 63.5% of imported cases. Magude, on the other hand, shows smaller importation and travel rates than Matutuine, as it is a rural area with less mobility. Additionally, they report higher levels of importation and travel in the dry season, when transmission is lower. Also, no association with importation was found for occupation, sex, and other factors. These data have practical implications for public health strategies aiming for malaria elimination, for example, testing and treating travelers from Matutuine in the dry season.

      Strengths:

      The strength of this study lies in the combination of different sources of data - epidemiological, travel, and genetic data - to estimate importation probabilities, and the statistical analyses.

      Weaknesses:

      The authors recognize the limitations related to sample size and the biases of travel reports.

      Thank you for your review and consideration. As mentioned, we state in the manuscript the limitations related to sample sizes and travel reports. We aim to continue this study with new prospective data, aiming to address these limitations.

      Reviewer #2 (Public review):

      Summary:

      Based on a detailed dataset, the authors present a novel Bayesian approach to classify malaria cases as either imported or locally acquired.

      Strengths:

      The proposed Bayesian approach for case classification is simple, well justified, and allows the integration of parasite genomics, travel history, and epidemiological data. The work is well-written, very organized, and brings important contributions both to malaria control efforts in Mozambique and to the scientific community. Understanding the origin of cases is essential for designing more effective control measures and elimination strategies.

      Weakness:

      While the authors aim to classify cases as imported or locally acquired, the work lacks a quantification of the contribution of each case type to overall transmission.

      The Bayesian rationale is sound and well justified; however, the formulation appears to present an inconsistency that is replicated in both the main text and the Supplementary Material.

      In fact, one of the questions that remains unanswered is the overall contribution of importation events to transmission in the areas. While the Bayesian classifier does not quantify this, our future analysis will focus on combining outbreak detection, genetic clustering and importation classification to quantify the contribution of imported cases to outbreak resurgence and to the overall transmission.

      Thank you for pointing out the inconsistency in the final formula. In fact, the final formula corresponds to P(I<sub>A</sub> | G) rather than to P(I<sub>A</sub>), so:

      instead of

      We will correct this error in a new version of the manuscript.

      Reviewer #3 (Public review):

      The authors present an important approach to identify imported P. falciparum malaria cases, combining genetic and epidemiological/travel data. This tool has the potential to be expanded to other contexts. The data was analyzed using convincing methods, including a novel statistical model; although some recognized limitations can be improved. This study will be of interest to researchers in public health and infectious diseases.

      Strengths:

      The study has several strengths, mainly the development of a novel Bayesian model that integrates genomic, epidemiological, and travel data to estimate importation probabilities. The results showed insights into malaria transmission dynamics, particularly identifying importation sources and differences in importation rates in Mozambique. Finally, the relevance of the findings is to suggest interventions focusing on the traveler population to help efforts for malaria elimination.

      Weaknesses:

      The study also has some limitations. The sample collection was not representative of some provinces, and not all samples had sufficient metadata for risk factor analysis, which can also be affected by travel recall bias. Additionally, the authors used a proxy for transmission intensity and assumed some conditions for the genetic variable when calculating the importation probability for specific scenarios. The weaknesses were assessed by the authors.

      We acknowledge the limitations noted by the reviewer and plan to address them as follows. We will repeat the study with our data collected in 2023, which this time contain a good representation of all the provinces of Mozambique; completeness of the metadata collection was ensured by implementing a new protocol in January 2023. Regarding the proxy for transmission intensity, we will refine the model by integrating monthly estimates of malaria incidence (previously calibrated to address testing and reporting rates) from the DHIS2 data, also taking into account the date of the reported cases in the analysis.

    1. Author Response

      We are grateful to the editors for considering our manuscript and facilitating the peer review process. Importantly, we would like to express our gratitude to reviewers for their constructive comments. Given eLife’s publishing format, we provide an initial author response now, which will be followed by a revised manuscript in the near future. Please find our responses below.

      eLife Assessment

      This study presents a valuable insight into a computational mechanism of pain perception. The evidence supporting the authors’ claims is solid, although the inclusion of 1) more diverse candidate computational models, 2) more systematic analysis of the temporal regularity effects on the model fit, and 3) tests on clinical samples would have strengthened the study. The work will be of interest to pain researchers working on computational models and cognitive mechanisms of pain in a Bayesian framework.

      Thank you very much again for considering the manuscript and judging it as a valuable contribution to understanding mechanisms of pain perception. We recognise the above-mentioned points of improvement and elaborate on them in the initial response to the reviewers.

      Reviewer 1

      Reviewer Comment 1.1 — Selection of candidate computational models: While the paper juxtaposes the simple model-free RL model against a Kalman Filter model in the context of pain perception, the rationale behind this choice remains ambiguous. It prompts the question: could other RL-based models, such as model-based RL or hierarchical RL, offer additional insights? A more detailed explanation of their computational model selection would provide greater clarity and depth to the study.

      Thank you for this point. Our models were selected a-priori, following the modelling strategy from Jepma et al. (2018) and hence considered the same set of core models for clear extension of the analysis to our non-cue paradigm. The key question for us was whether expectations were used to weight the behavioural estimates, so our main interest was to compare expectation vs non-expectation weighted models.

      Model-based and hierarchical RL are very broad terms that can be used to refer to many different models, and we are not clear about which specific models the reviewer is referring to. Our Bayesian models are generative models, i.e. they learn the generative statistics of the environment (which is characterised by inherent stochasticity and volatility) and hence operate model-based analyses of the stimulus dynamics. In our case, this happened hierarchically and it was combined with a simple RL rule.

      Reviewer Comment 1.2 — Effects of varying levels of volatility and stochasticity: The study commendably integrates varying levels of volatility and stochasticity into its experimental design. However, the depth of analysis concerning the effects of these variables on model fit appears shallow. A looming concern is whether the superior performance of the expectation-weighted Kalman Filter model might be a natural outcome of the experimental design. While the non-significant difference between eKF and eRL for the high stochasticity condition somewhat alleviates this concern, it raises another query: Would a more granular analysis of volatility and stochasticity effects reveal fine-grained model fit patterns?

      We are sorry that the reviewer finds “the depth of analysis concerning the effects of these variables on model fit” shallow. We are not sure which analysis the reviewer has in mind when suggesting a “more granular analysis of volatility and stochasticity effects” to “reveal fine-grained model fit patterns”. Therefore, we find it difficult to improve our manuscript in this regard. We are happy to add analyses to our paper, but we would be grateful for some specific pointers. We have already provided:

      • Analysis of model-naive performance across different levels of stochasticity and volatility (section 2.3, figure 3, supplementary information section 1.1 and tables S1-2)

      • Model fitting for each stochasticity/volatility condition (section 2.4.1, figure 4, supplementary table S5)

      • Group-level and individual-level differences of each model parameter across stochasticity/volatility conditions (supplementary information section 7, figures S4-S5).

      • Effect of confidence on scaling factor for each stochasticity/volatility condition (figure 5)

      Reviewer Comment 1.3 — Rating instruction: According to Fig. 1A, participants were prompted to rate their responses to the question, “How much pain DID you just feel?” and to specify their confidence level regarding their pain. It is difficult for me to understand the meaning of confidence in this context, given that they were asked to report their subjective feelings. It might have been better to query participants about perceived stimulus intensity levels. This perspective is seemingly echoed in lines 100-101, “the primary aim of the experiment was to determine whether the expectations participants hold about the sequence inform their perceptual beliefs about the intensity of the stimuli.”

      Thank you for raising this question, which allows us to clarify our paradigm. On half of the trials, participants were asked to report the perceived intensity of the previous stimulus; on the remaining trials, participants were requested to predict the intensity of the next stimulus. Therefore, we did query “participants about perceived stimulus intensity levels”, as described at lines 49-55, 296-303, and depicted in figure 1.

      The confidence rating refers to how sure participants are about the rating they have just given. It is collected in addition to the perceived stimulus intensity, and this approach has been used in a large body of previous studies across sensory modalities.

      Reviewer Comment 1.4 — Relevance to clinical pain: While the authors underscore the relevance of their findings to chronic pain, they did not include data pertaining to clinical pain. Notably, their initial preprint seemed to encompass data from a clinical sample (https://www.medrxiv.org/content/10.1101/2023.03.23.23287656v1), which, for reasons unexplained, has been omitted in the current version. Clarification on this discrepancy would be instrumental in discerning the true relevance of the study’s findings to clinical pain scenarios.

      The preprint that the Reviewer is referring to was an older version of the manuscript in which we combined two different experiments, which were initially conceived as separate studies: the one that we submitted to eLife (done in the lab, with noxious stimuli in healthy participants) and an online study with a different statistical learning paradigm (without noxious stimuli, in chronic back pain participants). Unfortunately, the paradigms were different and not directly comparable. Indeed, following submission to a different journal, the manuscript was criticised for this reason. We therefore split the paper in two, and submitted the first study to eLife. We are now planning to perform the same lab-based experiment with noxious stimuli on chronic back pain participants. Progress on this front has been slowed down by the fact that I (Flavia Mancini) am on maternity leave, but it remains a top priority once I am back at work.

      Reviewer Comment 1.5 — Paper organization: The paper’s organization appears a little bit weird, possibly due to the removal of significant content from their initial preprint. Sections 2.1- 2.2 and 2.4 seem more suitable for the Methods section, while 2.3 and 2.4.1 are the only parts that present results. In addition, enhancing clarity through graphical diagrams, especially for the experimental design and computational models, would be quite beneficial. A reference point could be Fig. 1 and Fig. 5 from Jepma et al. (2018), which similarly explored RL and KF models.

      Thank you for these suggestions. We will consider restructuring the paper in the revised version.

      Reviewer 2

      Reviewer Comment 2.1 — This is a highly interesting and novel finding with potential implications for the understanding and treatment of chronic pain where pain regulation is deficient. The paradigm is clear, the analysis is state-of-the-art, the results are convincing, and the interpretation is adequate.

      Thank you very much for these positive comments.

      Reviewer 3

      We are really grateful for the reviewer’s insightful comments and for the useful guidance regarding our methodology. We are also thankful to the reviewer for highlighting the strengths of our manuscript. Below we respond to each weakness mentioned in the reviewer’s report.

      Reviewer Comment 3.1 — In Figure 1, panel C, the authors illustrate the stimulation intensity, perceived intensity, and prediction intensity on the same scale, facilitating a more direct comparison. It appears that the stimulation intensity has been mathematically transformed to fit a scale from 0 to 100, aligning it with the intensity ratings corresponding to either past or future stimuli. Given that the pain threshold is specifically marked at 50 on this scale, one could logically infer that all ratings falling below this value should be deemed non-painful. However, I find myself uncertain about this interpretation, especially in relation to the term “arbitrary units” used in the figure. I would greatly appreciate clarification on how to accurately interpret these units, as well as an explanation of the relationship between these values and the definition of pain threshold in this experiment.

      Indeed, as detailed in the Methods section 4.3, the stimulation intensity was originally transformed from the 1-13 scale to the 0-100 scale to match the scales in the participant response screens. Following the method used to establish the pain threshold, we set the stimulus intensity of 7 as the threshold on the original 1-13 scale. However, during the rating part of the experiment, several of the participants never or very rarely selected a value above 50 (their individually defined pain threshold), despite having previously indicated, during the pain threshold procedure, the point at which a stimulus becomes painful. As a result, for these participants neither the re-scaled intensity values nor the perception ratings, both on the same 0-100 scale of arbitrary units, go above the pain threshold. Please see all participant ratings and inputs in the Figure below. We see that it would be more illustrative to re-plot Figure 1 with a different exemplary participant, whose ratings go above the pain threshold, perhaps with the input intensity on the 1-13 scale shown on an additional right-hand-side y-axis. We will add this in the revised version and highlight the point above.

      Importantly, while values below 50 are deemed non-painful by participants, the thermal stimulation still activates C-fibres involved in nociception, and we would argue that the modelling framework and analysis still applies in this case.
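
      The rescaling described above can be illustrated with a minimal sketch. It assumes a simple linear map from the 1-13 stimulus scale to the 0-100 response scale, which is consistent with level 7 landing on the threshold value of 50; the function name `rescale_intensity` is hypothetical and the exact transformation used in the study is the one given in Methods section 4.3.

```python
def rescale_intensity(level, lo=1, hi=13):
    """Linearly map a stimulus level on the lo..hi scale onto 0-100.

    Assumption: the map is linear; this is an illustration, not the
    study's own code.
    """
    if not lo <= level <= hi:
        raise ValueError(f"level must lie between {lo} and {hi}")
    return 100.0 * (level - lo) / (hi - lo)


# Level 7 (the threshold on the 1-13 scale) maps to the midpoint of 0-100.
threshold_rescaled = rescale_intensity(7)
endpoints = (rescale_intensity(1), rescale_intensity(13))
```

      Under this assumption, any stimulus level below 7 falls below 50 on the shared axis, which is why ratings and inputs can be compared directly in arbitrary units.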

      Reviewer Comment 3.2 — The method of generating fluctuations in stimulation temperatures, along with the handling of perceptual uncertainty in modelling, requires further elucidation. The current models appear to presume that participants perceive each stimulus accurately, introducing noise only at the response stage. This assumption may fail to capture the inherent uncertainty in the perception of each stimulus intensity, especially when differences in consecutive temperatures are as minimal as 1°C.

      We agree with the reviewer that there are multiple sources of uncertainty involved in the process of rating the intensity of thermal stimuli, including perceptual uncertainty. To account for inaccurate perception, one would have to consider the different sources that contribute to it, of which there may be many. In our approach, we consider one, which is captured in the expectation-weighted models and most clearly exemplified in the expectation-weighted Kalman Filter model (eKF). The model treats participants’ perception of the input as an imperfect indicator of the true level of pain. In this case, perception is biased by the expectation participants hold about the upcoming stimuli. The extent of this effect is partly governed by a subjective level of noise ϵ, which may also subsume other sources of uncertainty beyond the expectation effect. Moreover, the response noise ξ could also subsume any other unexplained sources of noise.
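
      As a rough illustration of how a noise term can govern the pull of expectation on perception, the sketch below implements one step of a textbook one-dimensional Kalman filter. It is not the eKF as specified in the manuscript: the function `kalman_step` and all parameter names are hypothetical, and the noise variance here only plays a role analogous to ϵ.

```python
def kalman_step(prior_mean, prior_var, observation, obs_noise_var, process_var):
    """One update of a standard 1-D Kalman filter (generic textbook form).

    prior_mean / prior_var : current expectation and its uncertainty
    observation            : incoming stimulus intensity
    obs_noise_var          : assumed variance of sensory noise
    process_var            : assumed variance added between trials
    """
    # The gain is the share of the prediction error that is trusted.
    gain = prior_var / (prior_var + obs_noise_var)
    # The percept lies between the expectation and the observation.
    percept = prior_mean + gain * (observation - prior_mean)
    # Uncertainty shrinks after observing, then grows again before the next trial.
    post_var = (1.0 - gain) * prior_var
    next_var = post_var + process_var
    return percept, post_var, next_var


# With equal prior and sensory variance the percept sits halfway.
balanced = kalman_step(50.0, 100.0, 70.0, 100.0, 10.0)
# With noisier sensation the percept is pulled closer to the expectation.
noisy = kalman_step(50.0, 100.0, 70.0, 300.0, 10.0)
```

      In this generic form, a larger sensory noise variance lowers the gain, so the same stimulus produces a percept nearer to the prior expectation.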

      Author response image 1.

      Stimulus intensity transformation

      Reviewer Comment 3.3 — A key conclusion drawn is that eKF is a better model than eRL. However, a closer examination of the results reveals that the two models behave very similarly, and it is not clear that they can be readily distinguished based on model recovery and model comparison results.

      While the eKF appears to rank higher than the eRL in terms of LOOIC and sigma effects, we do not wish to make sweeping statements regarding the significance of differences between eRL and eKF, but merely to point to the trend in the data. We shall make this clearer in the revised version of the manuscript. However, the most important result is that the models involving expectation-weighting arguably capture the data better.

      Reviewer Comment 3.4 — Regarding model recovery, the distinction between the eKF and eRL models seems blurred. When the simulation is based on the eKF, there is no ability to distinguish whether either eKF or eRL is better. When the simulation is based on the eRL, the eRL appears to be the best model, but the difference with eKF is small. This raises a few more questions. What is the range of the parameters used for the simulations?

      We agree that the distinction between eKF and eRL in the model recovery is not that clean-cut, which may in turn point to the similarity between the two models. To simulate the data for the model and parameter recovery analysis, we used the group means and variances estimated on the participant data to sample individual parameter values.

      Reviewer Comment 3.5 — Is it possible that either eRL or eKF are best when different parameters are simulated? Additionally, increasing the number of simulations to at least 100 could provide more convincing model recovery results.

      This is a possibility, but it would require further investigation and a comparison of fits for different bins/ranges of parameters to see whether there is any consistent advantage of one model over the other in each bin. We will consider adding this analysis, and will provide an additional 50 simulations to paint a more convincing picture.

      Reviewer Comment 3.6 — Regarding model comparison, the authors reported that “the expectation-weighted KF model offered a better fit than the eRL, although in conditions of high stochasticity, this difference was short of significance against the eRL model.” This interpretation is based on a significance test that hinges on the ratio between the ELPD and the surrounding standard error (SE). Unfortunately, there’s no agreed-upon threshold of SEs that determines significance, but a general guideline is to consider “several SEs,” with a higher number typically viewed as more robust. However, the text lacks clarity regarding the specific number of SEs applied in this test. At a cursory glance, it appears that the authors may have employed 2 SEs in their interpretation, while only depicting 1 SE in Figure 4.

      Indeed, we used a 2 sigma effect as the threshold. However, we recognise that there is no agreed-upon threshold, and in the revision we shall state this explicitly and make clearer that our interpretation concerns the trend in the data.
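
      To make the thresholding concrete, here is a minimal sketch of how an ELPD difference and its standard error can be obtained from pointwise values and compared against a chosen number of SEs. It assumes the common paired-difference convention (SE equal to the square root of the number of observations times the sample standard deviation of the pointwise differences); the function names and the toy numbers are hypothetical and are not taken from the manuscript.

```python
import math
import statistics


def elpd_difference(pointwise_a, pointwise_b):
    """Return (elpd_diff, se_diff) for model A minus model B."""
    if len(pointwise_a) != len(pointwise_b):
        raise ValueError("both models need one value per observation")
    diffs = [a - b for a, b in zip(pointwise_a, pointwise_b)]
    elpd_diff = sum(diffs)
    # Paired-difference standard error: sqrt(n) * sample SD of the differences.
    se_diff = math.sqrt(len(diffs)) * statistics.stdev(diffs)
    return elpd_diff, se_diff


def exceeds_threshold(elpd_diff, se_diff, n_se=2.0):
    """True if the difference is larger than n_se standard errors."""
    return abs(elpd_diff) > n_se * se_diff


# Toy pointwise log predictive densities (illustrative values only).
model_a = [-1.0, -1.2, -0.8, -1.1]
model_b = [-1.5, -1.4, -1.3, -1.6]
diff, se = elpd_difference(model_a, model_b)
passes_2se = exceeds_threshold(diff, se, n_se=2.0)
passes_6se = exceeds_threshold(diff, se, n_se=6.0)
```

      The same difference can pass a 2 SE criterion and fail a stricter one, which is why reporting the ratio itself, rather than only a pass/fail label, makes the interpretation easier to follow.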

      Reviewer Comment 3.7 — With respect to parameter recovery, a few additional details could be included for completeness. Specifically, while the range of the learning rate is understandably confined between 0 and 1, the range of other simulated parameters, particularly those without clear boundaries, remains ambiguous. Including scatter plots with the simulated parameters on the x-axis and the recovered parameters on the y-axis would effectively convey this missing information. Furthermore, it would be beneficial for the authors to clarify whether the same priors were used for both the modelling results presented in the main paper and the parameter recovery presented in the supplementary material.

      Thank you for this comment and for the suggestions. To simulate the data for the model and parameter recovery analysis, we used the group means and variances estimated on the participant data to sample individual parameter values. The priors on the group and individual-level parameters in the recovery analysis were the same as in the fitting procedure. We will include the requested scatter plots in the next iteration of the manuscript.
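
      The general shape of such a recovery analysis can be sketched as follows. This is a deliberately simplified toy, not the pipeline used in the manuscript: it assumes a plain delta-rule learner with a single learning rate, samples individual learning rates from a group-level normal distribution on the logit scale, and refits by grid search rather than hierarchical Bayesian estimation. All names, group-level values, and noise settings are hypothetical.

```python
import math
import random


def simulate_ratings(alpha, inputs, noise_sd, rng, e0=50.0):
    """Simulate noisy prediction ratings from a delta-rule learner."""
    expectation, ratings = e0, []
    for x in inputs:
        ratings.append(expectation + rng.gauss(0.0, noise_sd))
        expectation += alpha * (x - expectation)
    return ratings


def fit_alpha(ratings, inputs, e0=50.0):
    """Recover the learning rate by least squares over a grid of 0.01..0.99."""
    best_alpha, best_sse = None, math.inf
    for step in range(1, 100):
        alpha = step / 100
        expectation, sse = e0, 0.0
        for x, r in zip(inputs, ratings):
            sse += (r - expectation) ** 2
            expectation += alpha * (x - expectation)
        if sse < best_sse:
            best_alpha, best_sse = alpha, sse
    return best_alpha


def recovery_run(n_subjects=20, n_trials=200, group_mean=0.0, group_sd=1.0, seed=1):
    """Return (simulated, recovered) learning-rate pairs, one per subject."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n_subjects):
        # Sample on the logit scale so the learning rate stays within (0, 1).
        alpha_true = 1.0 / (1.0 + math.exp(-rng.gauss(group_mean, group_sd)))
        inputs = [rng.uniform(0.0, 100.0) for _ in range(n_trials)]
        ratings = simulate_ratings(alpha_true, inputs, 2.0, rng)
        pairs.append((alpha_true, fit_alpha(ratings, inputs)))
    return pairs


pairs = recovery_run()
```

      Plotting the first element of each pair on the x-axis against the second on the y-axis gives the kind of simulated-versus-recovered scatter plot the reviewer requests; points close to the identity line indicate good recovery.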

      Reviewer Comment 3.8 — While the reliance on R-hat values for convergence in model fitting is standard, a more comprehensive assessment could include estimates of the effective sample size (bulk ESS and/or tail ESS) and the Estimated Bayesian Fraction of Missing Information (EBFMI), to show efficient sampling across the distribution. Consideration of divergences, if any, would further enhance the reliability of the results.

      Thank you very much for this suggestion, we will aim to include these measures in the revised version.

      Reviewer Comment 3.9 — The authors write: “Going beyond conditioning paradigms based in cuing of pain outcomes, our findings offer a more accurate description of endogenous pain regulation.” Unfortunately, this statement isn’t substantiated by the results. The authors did not engage in a direct comparison between conditioning and sequence-based paradigms. Moreover, even if such a comparison had been made, it remains unclear what would constitute the gold standard for quantifying “endogenous pain regulation.”

      This is a valid point. Indeed, we do not compare paradigms in our study, and we will remove this statement in the revised version.

    1. Author response:

      Reviewer #1 (Public Review):  

      Weaknesses:  

      The weakness of this study lies in the fact that many of the genomic datasets originated from novel methods that were not validated with orthogonal approaches, such as DNA-FISH. Therefore, the detailed correlations described in this work are based on methodologies whose efficacy is not clearly established. Specifically, the authors utilized two modified protocols of TSA-seq for the detection of NADs (MKI67IP TSA-seq) and LADs (LMNB1-TSA-seq). Although these methods have been described in a bioRxiv manuscript by Kumar et al., they have not yet been published. Moreover, and surprisingly, Kumar et al., work is not cited in the current manuscript, despite its use of all TSA-seq data for NADs and LADs across the four cell lines. Moreover, Kumar et al. did not provide any DNA-FISH validation for their methods. Therefore, the interesting correlations described in this work are not based on robust technologies.    

      An attempt to validate the data was made for SON-TSA-seq of human foreskin fibroblasts (HFF) using multiplexed FISH data from IMR90 fibroblasts (from the lung) by the Zhuang lab (Su et al., 2020). However, the comparability of these datasets is questionable. It might have been more reasonable for the authors to conduct their analyses in IMR90 cells, thereby allowing them to utilize MERFISH data for validating the TSA-seq method and also for mapping NADs and LADs. 

      We disagree with the statement that the TSA-seq approach and data have not been validated by orthogonal approaches, and with the conclusion that the TSA-seq approach is not robust, as summarized here and detailed below in “Specific Comments”. TSA-seq is robust because it is based only on the original immunostaining specificity provided by the primary and secondary antibodies plus the diffusion properties of the tyramide free radical. TSA-seq has been extensively validated by microscopy and by the orthogonal genomic measurements provided by LMNB1 DamID and NAD-seq. This includes: a) the initial validation by FISH of both nuclear speckle (to an accuracy of ~50 nm) and nuclear lamina TSA-seq and the cross-validation of nuclear lamina TSA-seq with lamin B1 DamID in a first publication (Chen et al, JCB 2018, doi: 10.1083/jcb.201807108); b) the further validation of SON TSA-seq by FISH in a second publication (Zhang et al, Genome Research 2021, doi:10.1101/gr.266239.120); c) the cross-validation of nucleolar TSA-seq using NAD-seq and the validation by light microscopy of the predictions of differences in the relative distributions of centromeres, nuclear speckles, and nucleoli made from nuclear speckle, nucleolar, and pericentric heterochromatin TSA-seq in the Kumar et al, bioRxiv preprint (which is in a last revision stage involving additional formatting for the journal requirements) doi:https://doi.org/10.1101/2023.10.29.564613; d) the extensive validation of nuclear speckle, LMNB1, and nucleolar TSA-seq generated in HFF human fibroblasts using published light microscopy distance measurements of hundreds of probes generated by multiplexed immuno-FISH MERFISH data (Su et al, Cell 2020, https://doi.org/10.1016/j.cell.2020.07.032), as we described for nucleolar TSA-seq in the Kumar et al, bioRxiv preprint and to some extent for LMNB1 and SON TSA-seq in the current manuscript version (see Specific Comments with attached Author response image 2).

      Reviewer 1 raised concerns regarding this FISH validation given that the HFF TSA-seq and DamID data was compared to IMR90 MERFISH measurements.  The Su et al, Cell 2020 MERFISH paper came out well after the 4D Nucleome Consortium settled on HFF as one of the two main “Tier 1” cell lines.  We reasoned that the nuclear genome organization in a second fibroblast cell line would be sufficiently similar to justify using IMR90 FISH data as a proxy for our analysis of our HFF data. Indeed, there is a high correlation between the HFF TSA-seq and distances measured by MERFISH to nuclear lamina, nucleoli, and nuclear speckles (Author response image 1).  Comparing HFF SON-TSA-seq data with published IMR90 SON TSA-seq data (Alexander et al, Mol Cell 2021, doi.org/10.1016/j.molcel.2021.03.006), the HFF SON TSA-seq versus MERFISH scatterplot is very similar to the IMR90 SON TSA-seq versus MERFISH scatterplot.  We acknowledge the validation provided by the IMR90 MERFISH is limited by the degree to which genome organization relative to nuclear locales is similar in IMR90 and HFF fibroblasts. However, the correlation between measured microscopic distances from nuclear lamina, nucleoli, and nuclear speckles and TSA-seq scores is already quite high. We anticipate the conclusions drawn from such comparisons are solid and will only become that much stronger with future comparisons within the same cell line.

      Author response image 1.

      Scatterplots showing the correlation between TSA-seq and MERFISH microscopic distances. Top: IMR90 SON TSA-seq (from Alexander et al, Mol Cell 2021) (left) and HFF SON TSA-seq (right) (x-axis) versus distance to nuclear speckles (y-axis). Bottom: HFF Lamin B1 TSA-seq (x-axis) versus distance to nuclear lamina (y-axis) (left) and HFF MKI67IP (nucleolar) TSA-seq (x-axis) versus distance to nucleolus (y-axis) (right).

      In our revision, we will add justification of the use of IMR90 fibroblasts as a proxy for HFF fibroblasts through comparison of available data sets. 

      Reviewer #2 (Public Review):  

      Weaknesses:  

      The experiments are largely descriptive, and it is difficult to draw many cause-and-effect relationships. Similarly, the paper would be very much strengthened if the authors provided additional summary statements and interpretation of their results (especially for those not as familiar with 3D genome organization). The study would benefit from a clear and specific hypothesis.

      We acknowledge that this study was hypothesis-generating rather than hypothesis-testing in its goal. This research was funded through the NIH 4D-Nucleome Consortium, which had as its initial goal the development, benchmarking, and validation of new genomic technologies.  Our Center focused on the mapping of the genome relative to different nuclear locales and the correlation of this intranuclear positioning of the genome with functions, specifically gene expression and DNA replication timing. By its very nature, this project has taken a discovery-driven rather than hypothesis-driven scientific approach.  Our question fundamentally was whether we could gain new insights into nuclear genome organization through the integration of genomic and microscopic measurements of chromosome positioning relative to multiple different nuclear compartments/bodies and their correlation with functional assays such as RNA-seq and Repli-seq.

      Indeed, as described in this manuscript, this study resulted in multiple new insights into nuclear genome organization as summarized in our last main figure.  We believe our work and conclusions will be of general interest to scientists working in the fields of 3D genome organization and nuclear cell biology.  We anticipate that each of these new insights will prompt future hypothesis-driven science focused on specific questions and the testing of cause-and-effect relationships. 

      Given the extensive scope of this manuscript, we were limited in the extent that we could describe and summarize the background, data, analysis, and significance for every new insight. In our editing to reach the eLife recommended word count, we removed some of the explanations and summaries that we had originally included. 

      As suggested by Reviewer 2, in our revision we will add back additional summary and interpretation statements to help readers unfamiliar with 3D genome organization.

      Specific Comments in response to Reviewer 1:

      (1)  We disagree with the comment that TSA-seq has not been cross-validated by other orthogonal genomic methods.  In the first TSA-seq paper (Chen et al, JCB 2018, doi: 10.1083/jcb.201807108), we showed a good correlation between the identification of iLADs and LADs by nuclear lamin and nuclear speckle TSA-seq and the orthogonal genomic method of lamin B1 DamID, which is reproduced using our new TSA-seq 2.0 protocol in this manuscript.  Similarly, in the Kumar et al, bioRxiv preprint (doi:https://doi.org/10.1101/2023.10.29.564613), we showed a general agreement between the identification of NADs by nucleolar TSA-seq and the orthogonal genomic method of NAD-seq.  (We expect this preprint to be in press soon; it is now undergoing a last revision involving only reformatting for journal requirements.) Additionally, we also showed a high correlation between Hi-C compartments and subcompartments and TSA-seq in the Chen et al, JCB 2018 paper. Specifically, there is an excellent correlation between the A1 Hi-C subcompartment and Speckle Associated Domains as detected by nuclear speckle TSA-seq.  Additionally, the A2 Hi-C subcompartment correlated well with iLAD regions with intermediate nuclear speckle TSA-seq scores, and the B2 and B3 Hi-C subcompartments with LADs detected by both LMNB TSA-seq and LMNB1 DamID.  More generally, Hi-C A and B compartment identity correlated well with predictions of iLADs versus LADs from nuclear speckle and nuclear lamina TSA-seq.

      (2)  In the Chen et al, JCB 2018 paper we also qualitatively and quantitatively validated TSA-seq using FISH.  Qualitatively, we showed that both nuclear speckle and nuclear lamin TSA-seq correlated well with distances to nuclear speckles versus the nuclear lamina, respectively, measured by immuno-FISH.

      Quantitatively, we showed that SON TSA-seq could be used to estimate the microscopic mean distance to nuclear speckles with mean and median residuals of ~50 nm.  First, we used light microscopy to show that the spreading of tyramide-biotin signal from a point-source of TSA staining fits well with the exponential decay predicted theoretically by reaction-diffusion equations assuming a steady rate of tyramide-biotin free radical generation by the HRP enzyme and a constant probability throughout the nucleus of free-radical quenching (through reaction with protein tyrosine residues and nucleic acids).  Second, we used the exponential decay constant measured by light microscopy together with FISH measurements of mean speckle distance for several genomic regions to fit an exponential function and to predict distance to nuclear speckles genome-wide directly from SON TSA-seq sequencing reads.  Third, we used this approach to test the predictions against a new set of FISH measurements, demonstrating an accuracy of these predictions of ~50 nm.
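      The calibration logic described above can be sketched in a few lines. This is only a minimal illustration under stated assumptions, not the published TSA-seq pipeline: it assumes the enrichment follows E = A × exp(-d / d0), where d0 is taken here to be the e-folding decay length measured by microscopy, and the function names `fit_amplitude` and `predict_distance` are hypothetical. With d0 fixed, the amplitude A is fitted by least squares in log space from a few FISH-calibrated regions, and the fitted curve is then inverted to estimate distance from enrichment.

```python
import numpy as np


def fit_amplitude(enrichment, distance_um, decay_length_um):
    """Least-squares estimate (in log space) of A in E = A * exp(-d / d0).

    d0 (decay_length_um) is held fixed, as if taken from microscopy.
    Minimising sum((log E_i - log A + d_i / d0)^2) gives
    log A = mean(log E_i + d_i / d0).
    """
    enrichment = np.asarray(enrichment, dtype=float)
    distance_um = np.asarray(distance_um, dtype=float)
    log_a = np.mean(np.log(enrichment) + distance_um / decay_length_um)
    return float(np.exp(log_a))


def predict_distance(enrichment, amplitude, decay_length_um):
    """Invert E = A * exp(-d / d0) to estimate distance from enrichment."""
    enrichment = np.asarray(enrichment, dtype=float)
    return -decay_length_um * np.log(enrichment / amplitude)


# Hypothetical calibration values, for illustration only (not measured data).
calib_distance = np.array([0.1, 0.4, 0.8])        # mean FISH distance, in um
calib_enrichment = 4.0 * np.exp(-calib_distance / 0.3)

amplitude = fit_amplitude(calib_enrichment, calib_distance, decay_length_um=0.3)
estimated = predict_distance(calib_enrichment, amplitude, decay_length_um=0.3)
```

      In practice the predictions would be checked against an independent set of FISH measurements, as described in the third step above, rather than against the calibration regions themselves.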

      (3)  The importance of the quantitative validation by immuno-FISH of using TSA-seq to estimate mean distance to nuclear speckles is that it demonstrates the robustness of the TSA-seq approach.  Specifically, it shows how the TSA-seq signal is predicted to depend only on the specificity of the primary and secondary antibody staining and the diffusion properties of the tyramide-biotin free radicals produced by the HRP peroxidase.  This is fundamentally different from the significant dependence on antibodies and choice of marker proteins for molecular proximity assays such as DamID, ChIP-seq, and Cut and Run/Tag which depend on molecular proximity for labeling and/or pulldown of DNA.

      This robustness leads to specific predictions.  First, it predicts that similar TSA-seq signals will be produced using antibodies against different marker proteins of the same nuclear compartment.  This is because the exponential decay constant (distance at which the signal drops by one half) for the spreading of the TSA is in the range of several hundred nm, as measured by light microscopy for several TSA staining conditions.  Indeed, we showed in the Chen et al, JCB 2018 paper that antibodies against two different nuclear speckle proteins produced very similar TSA-seq signals while antibodies against LMNB versus LMNA also produced very similar TSA-seq signals.  Similarly, we showed in the Kumar et al preprint that antibodies against four different nucleolar proteins showed similar TSA-seq signals, with the highest correlation coefficients for the TSA-seq signals produced by the antibodies against two GC nucleolar marker proteins and the TSA-seq signals produced by the antibodies against two FC/DFC nucleolar marker proteins.

      Author response image 2.

      Comparison of TSA-seq data from different cell lines versus IMR90 MERFISH.  The observed correlation between SON (nuclear speckle) TSA-seq versus MERFISH is nearly as high for TSA-seq data from HFF as it is for TSA-seq data from the IMR90 cell line (Alexander et al, Mol Cell 2021) in which the MERFISH was performed. The correlations for SON, LMNB1 (nuclear lamina) and MKI67IP (nucleolus) versus MERFISH are highest for HFF TSA-seq data as compared to TSA-seq data from other cell lines (H1, K562, HCT116).  Comparison of measured distances to nuclear locale (y-axis) versus TSA-seq scores (x-axis) from different cell lines labeled in red. Left to right: SON, LMNB1, and MKI67IP.  Top to bottom: SON TSA-seq versus MERFISH for two TSA-seq replicates; TSA-seq from HFF, H1, K562, and HCT116 versus MERFISH.

      Second, it predicts that the quantitative relationship between TSA-seq signal and mean distance from a nuclear compartment will depend on the convolution of the predicted exponential decay of spreading of the TSA signal produced by a point source with the more complicated staining distribution of nuclear compartments such as the nuclear lamina or nucleoli.  We successfully used this concept to explain the differences emerging between LMNB1 DamID and TSA-seq signals for flat nuclei and to recognize the polarized distribution of different LADs over the nuclear periphery.

      (4)  After our genomic data production and during our data analysis, a valuable resource from the Zhuang lab was published, using MERFISH to visualize hundreds of genomic loci in IMR90 cells. We acknowledge that the much more extensive validation of TSA-seq by the multiplexed immuno-FISH MERFISH data is dependent on the degree to which the nuclear genome organization is similar between IMR90 and HFF fibroblasts.  However, the correlation between distances to nuclear speckles, nucleoli, and the nuclear lamina measured in IMR90 fibroblasts and the nuclear speckle, nucleolar, and nuclear lamina TSA-seq measured in HFF fibroblasts is already striking (see Author response image 1).  With regard to SON TSA-seq, the MERFISH versus HFF TSA-seq correlation is close to what we observe using published IMR90 SON TSA-seq data (correlation coefficients of 0.89 for IMR90 TSA-seq versus 0.86 for HFF TSA-seq).  Moreover, this correlation is highest using TSA-seq data from HFF cells as compared to the three other cell lines (see Author response image 2).  We believe these correlations can be considered a lower bound on the actual correlations between the FISH distances and TSA-seq that we would have observed if we had performed both assays on the same cell line. 

      (5)  Currently, we still require tens of millions of cells to perform each TSA-seq assay.  This requires significant expansion of cells and a resulting increase in passage numbers of the IMR90 cells before we can perform the TSA-seq. During this expansion we observe a noticeable slowing of the IMR90 cell growth, as expected for secondary cell lines as we approach the Hayflick limit.  We still do not know to what degree nuclear organization relative to nuclear locales may change as a function of cell cycle composition (i.e., percentage of cycling versus quiescent cells) and cell age.  Thus, even if we performed TSA-seq on IMR90 cells, we would be comparing MERFISH from lower passages with a higher percentage of actively proliferating cells with TSA-seq from higher passages with a higher percentage of quiescent cells. 

      We are currently working on a new TSA-seq protocol that will work with thousands of cells.  We believe it is a better investment of time and resources to wait until this new protocol is optimized before we repeat TSA-seq in IMR90 cells for a better comparison with multiplexed FISH data. 

      Specific Comments in response to Reviewer 2:

      (1)  As we acknowledge in our Response summary, we were limited in the degree to which we could actually follow up on our findings with experiments designed to test specific hypotheses generated by our data.  However, we do want to point out that our comparison of wild-type K562 cells with the LMNA/LBR double knockout was designed to test the long-standing model that nuclear lamina association of genomic loci contributes to gene silencing.  This experiment was motivated by our surprising result that gene expression differences between cell lines correlated strongly with differences in positioning relative to nuclear speckles rather than the nuclear lamina.  Despite documenting in these double knockout cells a decreased nuclear lamina association of most LADs, and an increased nuclear lamina association of the “p-w-v” fiLADs identified in this manuscript, we saw no significant change in gene expression in any of these regions as compared to wild-type K562 cells.  Meanwhile, distances to nuclear speckles as measured by TSA-seq remained nearly constant.

      We would argue that this represents a specific example in which new insights generated by our genomics comparison of cell lines led to a clear and specific hypothesis and the experimental testing of this hypothesis.

      In response to Reviewer 2, we are modifying the text to make this clearer. We will explicitly describe the hypothesis we tested, namely that distance to the nuclear lamina is correlated with, but not causally linked to, gene expression, and explain that we used a DKO of LMNA and LBR to change distances relative to the nuclear lamina and then measured the effect on gene expression.

    1. Author response:

      We thank the reviewers for their thorough reading and thoughtful feedback. Below, we provisionally address each of the concerns raised in the public reviews, and outline our planned revision that aims to further clarify and strengthen the manuscript.

      In our response, we clarify our conceptualization of elasticity as a dimension of controllability, formalizing it within an information-theoretic framework, and demonstrating that controllability and its elasticity are partially dissociable. Furthermore, we provide clarifications and additional modeling results showing that our experimental design and modeling approach are well-suited to dissociating elasticity inference from more general learning processes, and are not inherently biased to find overestimates of elasticity. Finally, we clarify the advantages and disadvantages of our canonical correlation analysis (CCA) approach for identifying latent relationships between multidimensional data sets, and provide additional analyses that strengthen the link between elasticity estimation biases and a specific psychopathology profile.

      Reviewer 1:

      This research takes a novel theoretical and methodological approach to understanding how people estimate the level of control they have over their environment, and how they adjust their actions accordingly. The task is innovative and both it and the findings are well-described (with excellent visuals). They also offer thorough validation for the particular model they develop. The research has the potential to theoretically inform the understanding of control across domains, which is a topic of great importance.

      We thank the reviewer for their favorable appraisal and valuable suggestions, which have helped clarify and strengthen the study’s conclusion. 

      An overarching concern is that this paper is framed as addressing resource investments across domains that include time, money, and effort, and the introductory examples focus heavily on effort-based resources (e.g., exercising, studying, practicing). The experiments, though, focus entirely on the equivalent of monetary resources - participants make discrete actions based on the number of points they want to use on a given turn. While the same ideas might generalize to decisions about other kinds of resources (e.g., if participants were having to invest the effort to reach a goal), this seems like the kind of speculation that would be better reserved for the Discussion section rather than using effort investment as a means of introducing a new concept (elasticity of control) that the paper will go on to test.

      We thank the reviewer for pointing out a lack of clarity regarding the kinds of resources tested in the present experiment. Investing additional resources in the form of extra tickets did not only require participants to pay more money. It also required them to invest additional time – since each additional ticket meant making another attempt to board the vehicle, extending the duration of the trial, and attentional effort – since every attempt required precisely timing a spacebar press as the vehicle crossed the screen. Given this involvement of money, time, and effort resources, we believe it would be imprecise to present the study as concerning monetary resources in particular. That said, we agree with the Reviewer that results might differ depending on the resource type that the experiment or the participant considers most. Thus, in our revision of the manuscript, we will make sure to clarify the kinds of resources the experiment involved, and highlight the open question of whether inferences concerning the elasticity of control generalize across different resource domains.

      Setting aside the framing of the core concepts, my understanding of the task is that it effectively captures people's estimates of the likelihood of achieving their goal (Pr(success)) conditional on a given investment of resources. The ground truth across the different environments varies such that this function is sometimes flat (low controllability), sometimes increases linearly (elastic controllability), and sometimes increases as a step function (inelastic controllability). If this is accurate, then it raises two questions.

      First, on the modeling front, I wonder if a suitable alternative to the current model would be to assume that the participants are simply considering different continuous functions like these and, within a Bayesian framework, evaluating the probabilistic evidence for each function based on each trial's outcome. This would give participants an estimate of the marginal increase in Pr(success) for each ticket, and they could then weigh the expected value of that ticket choice (Pr(success)*150 points) against the marginal increase in point cost for each ticket. This should yield similar predictions for optimal performance (e.g., opt-out for lower controllability environments, i.e., flatter functions), and the continuous nature of this form of function approximation also has the benefit of enabling tests of generalization to predict changes in behavior if there was, for instance, changes in available tickets for purchase (e.g., up to 4 or 5) or changes in ticket prices. Such a model would of course also maintain a critical role for priors based on one's experience within the task as well as over longer timescales, and could be meaningfully interpreted as such (e.g., priors related to the likelihood of success/failure and whether one's actions influence these). It could also potentially reduce the complexity of the model by replacing controllability-specific parameters with multiple candidate functions (presumably learned through past experience, and/or tuned by experience in this task environment), each of which is being updated simultaneously.

      Second, if the reframing above is apt (regardless of the best model for implementing it), it seems like the taxonomy being offered by the authors risks a form of "jangle fallacy," in particular by positing distinct constructs (controllability and elasticity) for processes that ultimately comprise aspects of the same process (estimation of the relationship between investment and outcome likelihood). Which of these two frames is used doesn't bear on the rigor of the approach or the strength of the findings, but it does bear on how readers will digest and draw inferences from this work. It is ultimately up to the authors which of these they choose to favor, but I think the paper would benefit from some discussion of a common-process alternative, at least to prevent too strong of inferences about separate processes/modes that may not exist. I personally think the approach and findings in this paper would also be easier to digest under a common-construct approach rather than forcing new terminology but, again, I defer to the authors on this.

      We thank the reviewer for suggesting this interesting alternative modeling approach. We agree that a Bayesian framework evaluating different continuous functions could offer advantages, particularly in its ability to generalize to other ticket quantities and prices. We will attempt to implement this as an alternative model and compare it with the current model.  
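      As a rough illustration of the kind of alternative model the Reviewer describes, the sketch below maintains a posterior over a small set of candidate functions relating ticket purchases to the probability of success, and uses the posterior to compute the expected value of each ticket choice. It is not the model used in the manuscript, nor the alternative we plan to fit. The three candidate functions, the clipping constant `eps`, and the ticket cost of 20 points are hypothetical choices made only for the example; the 150-point reward follows the Reviewer's comment.

```python
import numpy as np

# Hypothetical candidate functions: P(goal | tickets = 0, 1, 2, 3) when the
# correct vehicle is chosen. Illustrative assumptions, not fitted quantities.
CANDIDATES = {
    "flat": [0.2, 0.2, 0.2, 0.2],    # low controllability
    "linear": [0.2, 0.2, 0.6, 1.0],  # success grows with investment (elastic)
    "step": [0.2, 1.0, 1.0, 1.0],    # one ticket suffices (inelastic)
}


def posterior_over_functions(trials, eps=0.01):
    """Bayesian update of a uniform prior over the candidate functions.

    trials is a sequence of (tickets, success) pairs. Probabilities are
    clipped to [eps, 1 - eps] so that no candidate is ruled out by one trial.
    """
    names = list(CANDIDATES)
    post = np.full(len(names), 1.0 / len(names))
    for tickets, success in trials:
        p = np.clip([CANDIDATES[n][tickets] for n in names], eps, 1 - eps)
        post = post * (p if success else 1 - p)
        post = post / post.sum()
    return dict(zip(names, post))


def expected_values(post, reward=150.0, ticket_cost=20.0):
    """Posterior-weighted expected points for buying 0..3 tickets."""
    p_success = sum(w * np.asarray(CANDIDATES[n]) for n, w in post.items())
    return reward * p_success - ticket_cost * np.arange(4)


# Three successful trials with one or two tickets favour the step function.
post = posterior_over_functions([(1, True), (1, True), (2, True)])
values = expected_values(post)
best_tickets = int(np.argmax(values))
```

      Because the candidates are explicit functions of the number of tickets, the same machinery could in principle be evaluated for other ticket quantities or prices, which is the generalization advantage the Reviewer points to.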

      We also acknowledge the importance of avoiding a potential "jangle fallacy". We entirely agree with the Reviewer that elasticity and controllability inferences are not distinct processes. Specifically, we view resource elasticity as a dimension of controllability, hence the name of our ‘elastic controllability’ model. In response to this and other Reviewers’ comments, we now offer a formal definition of elasticity as the reduction in uncertainty about controllability due to knowing the amount of resources the agent is able and willing to invest (see further details in response to Reviewer 3 below).  

      With respect to how this conceptualization is expressed in the modelling, we note that the representation in our model of maximum controllability and its elasticity via different variables is analogous to how a distribution may be represented by separate mean and variance parameters. Ultimately, even in the model suggested by the Reviewer, there would need to be a dedicated variable representing elasticity, such as the probability of sloped controllability functions. A single-process account thus allows that different aspects of this process would be differently biased (e.g., one can have an accurate estimate of the mean of a distribution but overestimate its variance). Therefore, our characterization of distinct elasticity and controllability biases (or to put it more accurately, ‘elasticity of controllability bias’ and ‘maximum controllability bias’) is consistent with a common construct account. 

      That said, given the Reviewer’s comments, we believe that some of the terminology we used may have been misleading. In our planned revision, we will modify the text to clarify that we view elasticity as a dimension of controllability that can only be estimated in conjunction with controllability. 

      Reviewer 2:

      This research investigates how people might value different factors that contribute to controllability in a creative and thorough way. The authors use computational modeling to try to dissociate "elasticity" from "overall controllability," and find some differential associations with psychopathology. This was a convincing justification for using modeling above and beyond behavioral output and yielded interesting results. Interestingly, the authors conclude that these findings suggest that biased elasticity could distort agency beliefs via maladaptive resource allocation. Overall, this paper reveals some important findings about how people consider components of controllability.

      We appreciate the Reviewer's positive assessment of our findings and computational approach to dissociating elasticity and overall controllability.

      The primary weakness of this research is that it is not entirely clear what is meant by "elastic" and "inelastic" and how these constructs differ from existing considerations of various factors/calculations that contribute to perceptions of and decisions about controllability. I think this weakness is primarily an issue of framing, where it's not clear whether elasticity is, in fact, theoretically dissociable from controllability. Instead, it seems that the elements that make up "elasticity" are simply some of the many calculations that contribute to controllability. In other words, an "elastic" environment is inherently more controllable than an "inelastic" one, since both environments might have the same level of predictability, but in an "elastic" environment, one can also partake in additional actions to have additional control over achieving the goal (i.e., expend effort, money, time).

      We thank the reviewer for highlighting the lack of clarity in our concept of elasticity. We first clarify that elasticity cannot be entirely dissociated from controllability because it is a dimension of controllability. If no controllability is afforded, then there cannot be elasticity or inelasticity. This is why in describing the experimental environments, we only label high-controllability, but not low-controllability, environments as ‘elastic’ or ‘inelastic’. For further details on this conceptualization of elasticity, and a planned revision of the text, see our response above to Reviewer 1. 

      Second, we now clarify that controllability can also be computed without knowing the amount of resources the agent is able and willing to invest, for instance by assuming infinite resources available or a particular distribution of resource availabilities. However, knowing the agent’s available resources often reduces uncertainty concerning controllability. This reduction in uncertainty is what we define as elasticity. Since any action requires some resources, this means that no controllable environment is entirely inelastic if we also consider agents that do not have enough resources to commit any action. However, even in this case environments can differ in the degree to which they are elastic. For further details on this formal definition, see our response to Reviewer 3 below. We will make these necessary clarifications in the revised manuscript. 

      Importantly, whether an environment is more or less elastic does not determine whether it is more or less controllable. In particular, environments can be more controllable yet less elastic. This is true even if we allow that investing different levels of resources (i.e., purchasing 0, 1, 2, or 3 tickets) constitutes different actions, in conjunction with participants’ vehicle choices. Below, we show this using two existing definitions of controllability. 

      Definition 1, reward-based controllability<sup>1</sup>: If control is defined as the fraction of available reward that is controllably achievable, and we assume all participants are in principle willing and able to invest 3 tickets, controllability can be computed in the present task as:

      where P(S' = goal | S, A, C) is the probability of reaching the treasure from present state S when taking action A and investing C resources in executing the action. In any of the task environments, the probability of reaching the goal is maximized by purchasing 3 tickets (C = 3) and choosing the vehicle that leads to the goal (A = correct vehicle). Conversely, the probability of reaching the goal is minimized by purchasing 3 tickets (C = 3) and choosing the vehicle that does not lead to the goal (A = wrong vehicle). This calculation is thus entirely independent of elasticity, since it only considers what would be achieved by maximal resource investment, whereas elasticity consists of the reduction in controllability that would arise if the maximal available C is reduced. Consequently, any environment where the maximum available control is higher yet varies less with resource investment would be more controllable and less elastic. 

      Note that if we also account for ticket costs in calculating reward, this will only reduce the fraction of achievable reward and thus the calculated control in elastic environments.   

      Definition 2, information-theoretic controllability<sup>2</sup>: Here controllability is defined as the reduction in outcome entropy due to knowing which action is taken:

      I(S'; A, C | S) = H(S'|S) - H(S'|S, A, C)

      where H(S'|S) is the conditional entropy of the distribution of outcomes S' given the present state 𝑆, and H(S'|S, A, C) is the conditional entropy of the outcome given the present state, action, and resource investment. 

      To compare controllability, we consider two environments with the same maximum control:

      • Inelastic environment: If the correct vehicle is chosen, there is a 100% chance of reaching the goal state with 1, 2, or 3 tickets. Thus, out of 7 possible action-resource investment combinations, three deterministically lead to the goal state (≥1 tickets and correct vehicle choice), three never lead to it (≥1 tickets and wrong vehicle choice), and one (0 tickets) leads to it 20% of the time (since walking leads to the treasure on 20% of trials).

      • Elastic environment: If the correct vehicle is chosen, the probability of boarding it is 0% with 1 ticket, 50% with 2 tickets, and 100% with 3 tickets. Thus, out of 7 possible action-resource investment combinations, one deterministically leads to the goal state (3 tickets and correct vehicle choice), one never leads to it (3 tickets and wrong vehicle choice), one leads to it 60% of the time (2 tickets and correct vehicle choice: 50% boarding + 50% × 20% when failing to board), one leads to it 10% of the time (2 tickets and wrong vehicle choice), and three lead to it 20% of the time (0-1 tickets).

      Here we assume a uniform prior over actions, which renders the information-theoretic definition of controllability equal to another definition termed ‘instrumental divergence’<sup>3,4</sup>. We note that changing the uniform prior assumption would change the results for the two environments, but that would not change the general conclusion that there can be environments that are more controllable yet less elastic. 

      Step 1: Calculating H(S'|S)

      For the inelastic environment:

      P(goal) = (3 × 100% + 3 × 0% + 1 × 20%)/7 = .46, P(non-goal) = .54

      H(S'|S) = – [.46 × log<sub>2</sub>(.46) + .54 × log<sub>2</sub>(.54)] = 1 bit

      For the elastic environment:

      P(goal) = (1 × 100% + 1 × 0% + 1 × 60% + 1 × 10% + 3 × 20%)/7 = .33, P(non-goal) = .67

      H(S'|S) = – [.33 × log<sub>2</sub>(.33) + .67 × log<sub>2</sub>(.67)] = .91 bits

      Step 2: Calculating H(S'|S, A, C)

      Inelastic environment: Six action-resource investment combinations have deterministic outcomes entailing zero entropy, whereas investing 0 tickets has a probabilistic outcome (20%). The entropy for 0 tickets is: H(S'|C = 0) = – [.2 × log<sub>2</sub>(.2) + .8 × log<sub>2</sub>(.8)] = .72 bits. Since this action-resource investment combination is chosen with probability 1/7, the total conditional entropy is approximately .10 bits.

      Elastic environment: 2 actions have deterministic outcomes (3 tickets with correct/wrong vehicle), whereas the other 5 actions have probabilistic outcomes:

      2 tickets and correct vehicle (60% success):

      H(S'|A = correct, C = 2) = – [.6 × log<sub>2</sub>(.6) + .4 × log<sub>2</sub>(.4)] = .97 bits

      2 tickets and wrong vehicle (10% success):

      H(S'|A = wrong, C = 2) = – [.1 × log<sub>2</sub>(.1) + .9 × log<sub>2</sub>(.9)] = .47 bits

      0-1 tickets (20% success):

      H(S'|C = 0-1) = – [.2 × log<sub>2</sub>(.2) + .8 × log<sub>2</sub>(.8)] = .72 bits

      Thus the total conditional entropy of the elastic environment is: H(S'|S, A, C) = (1/7) × .97 + (1/7) × .47 + (3/7) × .72 = .52 bits

      Step 3: Calculating I(S'; A, C | S)

      Inelastic environment: I(S'; A, C | S) = H(S'|S) – H(S'|S, A, C) = 1 – 0.1 = .9 bits 

      Elastic environment: I(S'; A, C | S) = H(S'|S) – H(S'|S, A, C) = .91 – .52 = .39 bits

      Thus, the inelastic environment offers higher information-theoretic controllability (.9 bits) compared to the elastic environment (.39 bits). 
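      The three steps above can be reproduced with a short script. This is a minimal sketch rather than code from our analysis pipeline: the helper names `binary_entropy` and `controllability` are hypothetical, the outcome is treated as binary (goal versus non-goal), and a uniform prior over the 7 action-resource investment combinations is assumed, as in the text. Because the script does not round intermediate quantities, its values may differ from the hand calculation in the second decimal place.

```python
import math


def binary_entropy(p):
    """Entropy in bits of a binary outcome with success probability p."""
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def controllability(goal_probs):
    """I(S'; A, C | S) under a uniform prior over the listed combinations.

    goal_probs holds P(S' = goal | S, A, C), one entry per action-resource
    investment combination.
    """
    n = len(goal_probs)
    marginal = binary_entropy(sum(goal_probs) / n)                 # H(S'|S)
    conditional = sum(binary_entropy(p) for p in goal_probs) / n   # H(S'|S, A, C)
    return marginal - conditional


# Goal probabilities for the 7 combinations described in the text.
INELASTIC = [1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.2]
ELASTIC = [1.0, 0.0, 0.6, 0.1, 0.2, 0.2, 0.2]

inelastic_bits = controllability(INELASTIC)
elastic_bits = controllability(ELASTIC)
```

      Replacing the uniform prior with a weighted average over combinations would change the two values, but the comparison can be rerun in the same way for any prior.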

      Of note, even if each combination of cost and goal reaching is defined as a distinct outcome, then information-theoretic controllability is higher for the inelastic (2.81 bits) than for the elastic (2.30 bits) environment. 

      In sum, for both definitions of controllability, we see that environments can be more elastic yet less controllable. We will amend the manuscript to clarify this distinction between controllability and its elasticity.

      Reviewer 3:

      A bias in how people infer the amount of control they have over their environment is widely believed to be a key component of several mental illnesses including depression, anxiety, and addiction. Accordingly, this bias has been a major focus in computational models of those disorders. However, all of these models treat control as a unidimensional property, roughly, how strongly outcomes depend on action. This paper proposes---correctly, I think---that the intuitive notion of "control" captures multiple dimensions in the relationship between action and outcome. In particular, the authors single out the degree to which outcome depends on how much *effort* we exert, calling this dimension the "elasticity of control". They additionally propose that this dimension (rather than the more holistic notion of controllability) may be specifically impaired in certain types of psychopathology. This idea thus has the potential to change how we think about mental disorders in a substantial way, and could even help us better understand how healthy people navigate challenging decision-making problems.

      Unfortunately, my view is that neither the theoretical nor empirical aspects of the paper really deliver on that promise. In particular, most (perhaps all) of the interesting claims in the paper have weak empirical support.

      We appreciate the Reviewer's thoughtful engagement with our research and recognition of the potential significance of distinguishing between different dimensions of control in understanding psychopathology. We believe that all the Reviewer’s comments can be addressed with clarifications or additional analyses, as detailed below.  

      Starting with theory, the elasticity idea does not truly "extend" the standard control model in the way the authors suggest. The reason is that effort is simply one dimension of action. Thus, the proposed model ultimately grounds out in how strongly our outcomes depend on our actions (as in the standard model). Contrary to the authors' claims, the elasticity of control is still a fixed property of the environment. Consistent with this, the computational model proposed here is a learning model of this fixed environmental property. The idea is still valuable, however, because it identifies a key dimension of action (namely, effort) that is particularly relevant to the notion of perceived control. Expressing the elasticity idea in this way might support a more general theoretical formulation of the idea that could be applied in other contexts. See Huys & Dayan (2009), Zorowitz, Momennejad, & Daw (2018), and Gagne & Dayan (2022) for examples of generalizable formulations of perceived control.

      We thank the Reviewer for the suggestion that we formalize our concept of elasticity to resource investment, which we agree is a dimension of action. We first note that we have not argued against the claim that elasticity is a fixed property of the environment. We surmise the Reviewer might have misread our statement that “controllability is not a fixed property of the environment”. The latter statement is motivated by the observation that controllability is often higher for agents that can invest more resources (e.g., a richer person can buy more things). We will clarify this in our revision of the manuscript.

      To formalize elasticity, we build on Huys & Dayan’s definition of controllability(1) as the fraction of reward that is controllably achievable, 𝜒 (though using information-theoretic definitions(2,3) would work as well). To the extent that this fraction depends on the amount of resources the agent is able and willing to invest (max 𝐶), this formulation can be probabilistically computed without information about the particular agent involved, specifically, by assuming a certain distribution of agents with different amounts of available resources. This would result in a probability distribution over 𝜒. Elasticity can thus be defined as the amount of information obtained about controllability due to knowing the amount of resources available to the agent: I(𝜒; max 𝐶). We will add this formal definition to the manuscript.  
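
      To make the proposed quantity I(𝜒; max 𝐶) concrete, the following toy calculation may help. It is a sketch under stated assumptions only: the population of agents, the controllability values and the helper `mutual_information` are all hypothetical and are not taken from the manuscript.

```python
import math
from collections import defaultdict


def mutual_information(joint):
    """I(X; Y) in bits, given a dict {(x, y): probability} that sums to 1."""
    px, py = defaultdict(float), defaultdict(float)
    for (x, y), p in joint.items():
        px[x] += p
        py[y] += p
    return sum(
        p * math.log2(p / (px[x] * py[y]))
        for (x, y), p in joint.items()
        if p > 0
    )


# Hypothetical population: agents can invest at most 1, 2 or 3 units of
# resources (max C), each with probability 1/3.
# Keys are (controllability chi, max C).

# Hypothetical elastic environment: chi grows with available resources.
elastic_joint = {(0.2, 1): 1 / 3, (0.6, 2): 1 / 3, (1.0, 3): 1 / 3}

# Hypothetical inelastic environment: chi is the same for every agent.
inelastic_joint = {(1.0, 1): 1 / 3, (1.0, 2): 1 / 3, (1.0, 3): 1 / 3}

elasticity_elastic = mutual_information(elastic_joint)
elasticity_inelastic = mutual_information(inelastic_joint)

print(elasticity_elastic, elasticity_inelastic)
```

      In this toy case, knowing max 𝐶 fully determines 𝜒 in the elastic environment, so elasticity equals the entropy of max 𝐶 (log<sub>2</sub> 3 bits), whereas in the inelastic environment knowing max 𝐶 tells us nothing about 𝜒 and elasticity is zero.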

      Turning to experiment, the authors make two key claims: (1) people infer the elasticity of control, and (2) individual differences in how people make this inference are importantly related to psychopathology. Starting with claim 1, there are three sub-claims here; implicitly, the authors make all three. (1A) People's behavior is sensitive to differences in elasticity, (1B) people actually represent/track something like elasticity, and (1C) people do so naturally as they go about their daily lives. The results clearly support 1A. However, 1B and 1C are not supported. Starting with 1B, the experiment cannot support the claim that people represent or track elasticity because the effort is the only dimension over which participants can engage in any meaningful decision-making (the other dimension, selecting which destination to visit, simply amounts to selecting the location where you were just told the treasure lies). Thus, any adaptive behavior will necessarily come out in a sensitivity to how outcomes depend on effort. More concretely, any model that captures the fact that you are more likely to succeed in two attempts than one will produce the observed behavior. The null models do not make this basic assumption and thus do not provide a useful comparison.

      We appreciate the reviewer's critical analysis of our claims regarding elasticity inference, which as detailed below, has led to an important new analysis that strengthens the study’s conclusions. However, we respectfully disagree with two of the Reviewer’s arguments. First, resource investment was not the only meaningful decision dimension in our task, since participants also needed to choose the correct vehicle to get to the right destination. That this was not trivial is evidenced by our exclusion of over 8% of participants who made incorrect vehicle choices more than 10% of the time. Included participants also occasionally erred in this choice (mean error rate = 3%, range [0-10%]). 

      Second, the experimental task cannot be solved well by a model that simply tracks how outcomes depend on effort because 20% of the time participants reached the treasure despite failing to board their vehicle of choice. In such cases, reward outcomes and control were decoupled. Participants could identify when this was the case by observing the starting location, which was revealed together with the outcome (since depending on the starting location, the treasure location was automatically reached by walking). To determine whether participants distinguished between control-related and non-control-related reward, we have now fitted a variant of our model to the data that allows learning from each of these kinds of outcomes by means of a different free parameter. The results show that participants learned considerably more from control-related outcomes. They were thus not merely tracking outcomes, but specifically inferred when outcomes can be attributed to control. We will include this new analysis in the revised manuscript.

      Controllability inference by itself, however, still does not suffice to explain the observed behavior. This is shown by our ‘controllability’ model, which learns to invest more resources to improve control, yet still fails to capture key features of participants’ behavior, as detailed in the manuscript. This means that explaining participants’ behavior requires a model that not only infers controllability—beyond merely outcome probability—but also assumes a priori that increased effort could enhance control. Building this a priori assumption into the model amounts to embedding within it an understanding of elasticity – the idea that control over the environment may be increased by greater resource investment. 

      That being said, we acknowledge the value in considering alternative computational formulations of adaptation to elasticity. Thus, in our revision of the manuscript, we will add a discussion concerning possible alternative models.  

      For 1C, the claim that people infer elasticity outside of the experimental task cannot be supported because the authors explicitly tell people about the two notions of control as part of the training phase: "To reinforce participants' understanding of how elasticity and controllability were manifested in each planet, [participants] were informed of the planet type they had visited after every 15 trips." (line 384).

      We thank the reviewer for highlighting this point. We agree that our experimental design does not test whether people infer elasticity spontaneously. Our research question was whether people can distinguish between elastic and inelastic controllability. The results strongly support that they can, and this does have potential implications for behavior outside of the experimental task. Specifically, to the extent that people are aware that in some contexts additional resource investment improves control, whereas in other contexts it does not, our results indicate that they would be able to distinguish between these two kinds of contexts through trial-and-error learning. That said, we agree that investigating whether and how people spontaneously infer elasticity is an interesting direction for future work. We will clarify the scope of the present conclusions in the revised manuscript.

      Finally, I turn to claim 2, that individual differences in how people infer elasticity are importantly related to psychopathology. There is much to say about the decision to treat psychopathology as a unidimensional construct. However, I will keep it concrete and simply note that CCA (by design) obscures the relationship between any two variables. Thus, as suggestive as Figure 6B is, we cannot conclude that there is a strong relationship between Sense of Agency and the elasticity bias---this result is consistent with any possible relationship (even a negative one). The fact that the direct relationship between these two variables is not shown or reported leads me to infer that they do not have a significant or strong relationship in the data.

      We agree that CCA is not designed to reveal the relationship between any two variables. However, the advantage of this analysis is that it pulls together information from multiple variables. Doing so does not treat psychopathology as unidimensional. Rather, it seeks a particular dimension that most strongly correlates with different aspects of task performance. This is especially useful for multidimensional psychopathology data because such data are often dominated by strong correlations between dimensions, whereas the research seeks to explain the distinctions between the dimensions. Similar considerations hold for the multidimensional task parameters, which although less correlated, may still jointly predict the relevant psychopathological profile better than each parameter does in isolation. Thus, the CCA enabled us to identify a general relationship between task performance and psychopathology that accounts for different symptom measures and aspects of controllability inference. 

      Using CCA can thus reveal relationships that do not readily show up in two-variable analyses. Indeed, the direct correlation between Sense of Agency (SOA) and elasticity bias was not significant – a result that, for completeness, we will now report in the supplementary materials along with all other direct correlations. We note, however, that the CCA analysis was preregistered and its results were replicated. Furthermore, an auxiliary analysis specifically confirmed the contributions of both elasticity bias (Figure 6D, bottom plot) and, although not reported in the original paper, of the Sense of Agency score (SOA; p = .03, permutation test) to the observed canonical correlation. Participants scoring higher on the psychopathology profile also overinvested resources in inelastic environments but did not futilely invest in uncontrollable environments (Figure 6A), providing external validation to the conclusion that the CCA captured meaningful variance specific to elasticity inference. The results thus enable us to safely conclude that differences in elasticity inferences are significantly associated with a profile of control-related psychopathology to which SOA contributed significantly.  

      Finally, whereas interpretation of individual CCA loadings that were not specifically tested remains speculative, we note that the pattern of loadings largely replicated across the initial and replication studies (see Figure 6B), and aligns with prior findings. For instance, the positive loadings of SOA and OCD match prior suggestions that a lower sense of control leads to greater compensatory effort(7), whereas the negative loading for depression scores matches prior work showing reduced resource investment in depression(5-6).

      We will revise the text to better clarify the advantages and disadvantages of our analytical approach, and the conclusions that can and cannot be drawn from it.

      There is also a feature of the task that limits our ability to draw strong conclusions about individual differences in elasticity inference. As the authors clearly acknowledge, the task was designed "to be especially sensitive to overestimation of elasticity" (line 287). A straightforward consequence of this is that the resulting *empirical* estimate of estimation bias (i.e., the gamma_elasticity parameter) is itself biased. This immediately undermines any claim that references the directionality of the elasticity bias (e.g. in the abstract). Concretely, an undirected deficit such as slower learning of elasticity would appear as a directed overestimation bias. When we further consider that elasticity inference is the only meaningful learning/decision-making problem in the task (argued above), the situation becomes much worse. Many general deficits in learning or decision-making would be captured by the elasticity bias parameter. Thus, a conservative interpretation of the results is simply that psychopathology is associated with impaired learning and decision-making.

      We apologize for our imprecise statement that the task was ‘especially sensitive to overestimation of elasticity’, which justifiably led to the Reviewer’s concern that slower elasticity learning can be mistaken for elasticity bias. To make sure this was not the case, we made use of the fact that our computational model explicitly separates bias direction (λ) from the rate of learning through two distinct parameters, which initialize the prior concentration and mean of the model’s initial beliefs concerning elasticity (see Methods pg. 22). The higher the concentration of the initial beliefs (𝜖), the slower the learning. Parameter recovery tests confirmed that our task enables acceptable recovery of both the bias λ<sub>elasticity</sub> (r=.81) and the concentration 𝝐<sub>elasticity</sub> (r=.59) parameters. Importantly, the level of confusion between the parameters was low (confusion of 0.15 for 𝝐<sub>elasticity</sub>→ λ<sub>elasticity</sub> and 0.04 for λ<sub>elasticity</sub>→ 𝝐<sub>elasticity</sub>). This result confirms that our task enables dissociating elasticity biases from the rate of elasticity learning. 
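
      The distinction between a prior's mean (bias) and its concentration (which governs learning speed) can be illustrated with a generic Beta belief over a binary quantity. This is an illustrative sketch only, assuming a standard Beta-Bernoulli update; it is not the model described in the Methods, and the function name and all numerical values are hypothetical.

```python
def posterior_mean(prior_mean, concentration, successes, trials):
    """Mean of a Beta belief after observing `successes` out of `trials`.

    The prior is Beta(a, b) with a = prior_mean * concentration and
    b = (1 - prior_mean) * concentration, so `prior_mean` sets the direction
    of the initial bias and `concentration` sets how strongly it is held.
    """
    a = prior_mean * concentration + successes
    b = (1 - prior_mean) * concentration + (trials - successes)
    return a / (a + b)


# Same optimistic prior mean (0.8) and same evidence (0 of 10 observations
# support elasticity), but different prior concentrations.
weakly_held = posterior_mean(0.8, concentration=2, successes=0, trials=10)
strongly_held = posterior_mean(0.8, concentration=20, successes=0, trials=10)

# Same concentration and evidence, but different prior means (biases).
optimistic = posterior_mean(0.8, concentration=2, successes=0, trials=10)
pessimistic = posterior_mean(0.2, concentration=2, successes=0, trials=10)

print(weakly_held, strongly_held, optimistic, pessimistic)
```

      In this sketch a higher concentration leaves the belief closer to its starting point after the same evidence (slower learning), whereas a different prior mean shifts the belief in a fixed direction; this is the sense in which the two parameters can in principle be dissociated.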

      Moreover, to validate that the minimal level of confusion existing between bias and the rate of learning did not drive our psychopathology results, we re-ran the CCA while separating concentration from bias parameters. The results (Author response image 1) demonstrate that differences in learning rate (𝜖) had virtually no contribution to our CCA results, whereas the contribution of the pure bias (𝜆) was preserved. 

      We will incorporate these clarifications and additional analysis in our revised manuscript.

      Author response image 1.

      Showing that a model parameter correlates with the data it was fit to does not provide any new information, and cannot support claims like "a prior assumption that control is likely available was reflected in a futile investment of resources in uncontrollable environments." To make that claim, one must collect independent measures of the assumption and the investment.

      We apologize if this and related statements seemed to be describing independent findings. They were merely meant to describe the relationship between model parameters and model-independent measures of task performance. It is inaccurate, though, to say that they provide no new information, since results could have been otherwise. For instance, instead of a higher controllability bias primarily associating with futile investment of resources in uncontrollable environments, it could have been primarily associated with more proper investment of resources in high-controllability environments. Additionally, we believe these analyses are of value to readers who seek to understand the role of different parameters in the model. In our planned revision, we will clarify that the relevant analyses are merely descriptive. 

      Did participants always make two attempts when purchasing tickets? This seems to violate the intuitive model, in which you would sometimes succeed on the first jump. If so, why was this choice made? Relatedly, it is not clear to me after a close reading how the outcome of each trial was actually determined.

      We thank the reviewer for highlighting the need to clarify these aspects of the task in the revised manuscript. 

      When participants purchased two extra tickets, they attempted both jumps, and were never informed about whether either of them succeeded. Instead, after choosing a vehicle and attempting both jumps, participants were notified of where they had arrived. This outcome was determined based on the cumulative probability of either of the two jumps succeeding. Success meant that participants arrived at their chosen vehicle's destination, whereas failure meant they walked to the nearest location (as determined by where they started from). 
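
      One way to read "the cumulative probability of either of the two jumps succeeding" is as the probability that at least one of several attempts succeeds. The sketch below illustrates that reading under the assumption that attempts are independent; the per-jump probabilities, location labels and function names are hypothetical and are not the task's actual parameters.

```python
def prob_any_success(jump_probs):
    """Probability that at least one of several independent attempts succeeds."""
    p_all_fail = 1.0
    for p in jump_probs:
        p_all_fail *= 1.0 - p
    return 1.0 - p_all_fail


def arrival(boarded, vehicle_destination, nearest_location):
    """Where the participant ends up: the vehicle's destination if boarding
    succeeded, otherwise the location reached by walking."""
    return vehicle_destination if boarded else nearest_location


# Hypothetical example: two jump attempts, each succeeding with probability 0.5.
p_board = prob_any_success([0.5, 0.5])

print(p_board)
print(arrival(True, "destination A", "destination B"))
print(arrival(False, "destination A", "destination B"))
```

      Because only the combined outcome is revealed, participants in such a scheme learn about the joint success probability of their investment rather than about individual jumps.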

      Though it is unintuitive to attempt a second jump before seeing whether the first succeeded, this design choice served two key objectives. First, that participants would consistently need to invest not only more money but also more effort and time in planets with high elastic controllability. Second, that the task could potentially generalize to the many real-world situations where the amount of invested effort has to be determined prior to seeing any outcome, for instance, preparing for an exam or a job interview. 

      It should be noted that the model is heuristically defined and does not reflect Bayesian updating. In particular, it overestimates control by not using losses with less than 3 tickets (intuitively, the inference here depends on your beliefs about elasticity). I wonder if the forced three-ticket trials in the task might be historically related to this modeling choice.

      We apologize for not making this clear, but in fact losing with less than 3 tickets does reduce the model’s estimate of available control. It does so by increasing the elasticity estimates (a<sub>elastic≥1</sub>, a<sub>elastic2</sub> parameters), signifying that more tickets are needed to obtain the maximum available level of control, thereby reducing the average controllability estimate across ticket investment options. 

      It would be interesting to further develop the model such that losing with less than 3 tickets would also impact inferences concerning the maximum available control, depending on present beliefs concerning elasticity, but the forced three-ticket purchases already expose participants to the maximum available control, and thus, the present data may not be best suited to test such a model. These trials were implemented to minimize individual differences concerning inferences of maximum available control, thereby focusing differences on elasticity inferences. We will discuss the Reviewer’s suggestion for a potentially more accurate model in the revised manuscript. 

      References

      (1) Huys, Q. J. M., & Dayan, P. (2009). A Bayesian formulation of behavioral control. Cognition, 113(3), 314–328.

      (2) Ligneul, R. (2021). Prediction or causation? Towards a redefinition of task controllability. Trends in Cognitive Sciences, 25(6), 431–433.

      (3) Mistry, P., & Liljeholm, M. (2016). Instrumental divergence and the value of control. Scientific Reports, 6, 36295.

      (4) Lin, J. (1991). Divergence measures based on the Shannon entropy. IEEE Transactions on Information Theory, 37(1), 145–151.

      (5) Cohen, R. M., Weingartner, H., Smallberg, S. A., Pickar, D., & Murphy, D. L. (1982). Effort and cognition in depression. Archives of General Psychiatry, 39(5), 593–597. doi: 10.1001/archpsyc.1982.04290050061012

      (6) Bi, R., Dong, W., Zheng, Z., Li, S., & Zhang, D. (2022). Altered motivation of effortful decision-making for self and others in subthreshold depression. Depression and Anxiety, 39(8-9), 633–645. doi: 10.1002/da.23267

      (7) Tapal, A., Oren, E., Dar, R., & Eitam, B. (2017). The Sense of Agency Scale: A measure of consciously perceived control over one's mind, body, and the immediate environment. Frontiers in Psychology, 8, 1552.

    1. Author response: 

      We thank the reviewers for their feedback on our paper. We have taken all their comments into account in revising the manuscript. We provide a point-by-point response to their comments, below.

      Reviewer #1:

      Major comments:

      The manuscript is clearly written with a level of detail that allows others to reproduce the imaging and cell-tracking pipeline. Of the 22 movies recorded one was used for cell tracking. One movie seems sufficient for the second part of the manuscript, as this manuscript presents a proof-of-principle pipeline for an imaging experiment followed by cell tracking and molecular characterisation of the cells by HCR. In addition, cell tracking in a 5-10 day time-lapse movie is an enormous time commitment.

      My only major comment is regarding "Suppl_data_5_spineless_tracking". The image file does not load.

      It looks like the wrong file is linked to the mastodon dataset. The "Current BDV dataset path" is set to "Beryl_data_files/BLB mosaic cut movie-02.xml", but this file does not exist in the folder. Please link it to the correct file.

      We have corrected the file path in the updated version of Suppl. Data 5.

      Minor comments:

      The authors state that their imaging settings aim to reduce photo damage. Do they see cell death in the regenerating legs? Is the cell death induced by the light exposure or can they tell if the same cells die between the movies? That is, do they observe cell death in the same phases of regeneration and/or in the same regions of the regenerating legs?

      Yes, we observe cell death during Parhyale leg regeneration. We have added the following sentence to explain this in the revised manuscript: "During the course of regeneration some cells undergo apoptosis (reported in Alwes et al., 2016). Using the H2B-mRFPruby marker, apoptotic cells appear as bright pyknotic nuclei that break up and become engulfed by circulating phagocytes (see bright specks in Figure 2F)."

      We now also document apoptosis in regenerated legs that have not been subjected to live imaging in a new supplementary figure (Suppl. Figure 3),  and we refer to these observations as follows: "While some cell death might be caused by photodamage, apoptosis can also be observed in similar numbers in regenerating legs that have not been subjected to live imaging (Suppl. Figure 3)."

      Based on 22 movies, the authors divide the regeneration process into three phases and they describe that the timing of leg regeneration varies between individuals. Are the phases proportionally the same length between regenerating legs or do the authors find differences between fast/slow regenerating legs? If there is a difference in the proportions, why might this be?

      Both early and late phases contribute to variation in the speed of regeneration, but there is no clear relationship between the relative duration of each phase and the speed of regeneration. We now present graphs supporting these points in a new supplementary figure (Suppl. Figure 2).  

      To clarify this point, we have added the following sentence in the manuscript: "We find that the overall speed of leg regeneration is determined largely by variation in the speed of the early (wound closure) phase of regeneration, and to a lesser extent by variation in later phases when leg morphogenesis takes place (Suppl. Figure 2 A,B). There is no clear relationship between the relative duration of each phase and the speed of regeneration (Suppl. Figure 2 A',B')."

      Based on their initial cell tracing experiment, could the authors elaborate more on what kind of biological information can be extracted from the cell lineages, apart from determining which is the progenitor of a cell? What does it tell us about the cell population in the tissue? Is there indication of multi- or pluripotent stem cells? What does it say about the type of regeneration that is taking place in terms of epimorphosis and morphallaxis, the old concepts of regeneration?

      In the first paragraph of Future Directions we describe briefly the kind of biological information that could be gained by applying our live imaging approach with appropriate cell-type markers (see below). We do not comment further, as we do not currently have this information at hand. Regarding the concepts of epimorphosis and morphallaxis, as we explain in Alwes et al. 2016, these terms describe two extreme conditions that do not capture what we observe during Parhyale leg regeneration. Our current work does not bring new insights on this topic.

      Page 5. The authors mention the possibility of identifying the cell ID based on transcriptomic profiling data. Can they suggest how many and which cell types they expect to find in the last stage based on their transcriptomic data?

      We have added this sentence: "Using single-nucleus transcriptional profiling, we have identified approximately 15 transcriptionally-distinct cell types in adult Parhyale legs (Almazán et al., 2022), including epidermis, muscle, neurons, hemocytes, and a number of still unidentified cell types."

      Page 6. Correction: "..molecular and other makers.." should be "..molecular and other markers.."

      Corrected

      Page 8. The HCR in situ protocol probably has another important advantage over the conventional in situ protocol, which is not mentioned in this study. The hybridisation step in HCR is performed at a lower temperature (37˚C) than in conventional in situ hybridisation (65˚C, Rehm et al., 2009). In other organisms, a high hybridisation temperature affects the overall tissue morphology and cell location (tissue shrinkage). A lower hybridisation temperature has less impact on the tissue and makes manual cell alignment between the live imaging movie and the fixed HCR in situ stained specimen easier and more reliable. If this is also the case in Parhyale, the authors must mention it.

      This may be correct, but all our specimens were treated at 37˚C, so we cannot assess whether hybridisation temperature affects morphological preservation in our specimens.

      Page 9. The authors should include more information on the spineless study. What is spineless? What do the cell lineages tell about the spineless progenitors, apart from them being spread in the tissue at the time of amputation? Do spineless progenitors proliferate during regeneration? Do any spineless expressing cells share a common progenitor cell?

      We now point out that spineless encodes a transcription factor. We provide a summary of the lineages generating spineless-expressing cells in Suppl. Figure 6, and we explain that "These epidermal progenitors undergo 0, 1 or 2 cell divisions, and generate mostly spineless-expressing cells (Suppl. Figure 5)."

      Page 10. Regarding the imaging temperature, the Materials and Methods state "... a temperature control chamber set to 26 or 27˚C..."; however, in Suppl. Data 1, 26˚C and 29˚C are indicated as imaging temperatures. Which is correct?

      We corrected the Methods by adding "with the exception of dataset li51, imaged at 29°C"

      Page 10. Regarding the imaging step size, the Materials and Methods state "...step size of 1-2.46 µm..."; however, Suppl. Data 1 indicate a step size between 1.24 - 2.48 µm. Which is correct?

      We corrected the Methods.

      Page 11. Correct "...as the highest resolution data..." to "...at the highest resolution data..."

      The original text is correct ("standardised to the same dimensions as the highest resolution data").

      Page 11. Indicate which supplementary data set is referred to: "Using Mastodon, we generated ground truth annotations on the original image dataset, consisting of 278 cell tracks, including 13,888 spots and 13,610 links across 55 time points (see Supplementary Data)."

      Corrected

      p. 15. Indicate which supplementary data set is referred to: "In this study we used HCR probes for the Parhyale orthologues of futsch (MSTRG.441), nompA (MSTRG.6903) and spineless (MSTRG.197), ordered from Molecular Instruments (20 oligonucleotides per probe set). The transcript sequences targeted by each probe set are given in the Supplementary Data."

      Corrected

      Figure 3. Suggestion to the overview schematics: The authors might consider adding "molting" as the end point of the red bar (representing differentiation).

      The time of molting is not known in the majority of these datasets, because the specimens were fixed and stained prior to molting. We added the relevant information in the figure legend: "Datasets li-13 and li-16 were recorded until the molt; the other recordings were stopped before molting."

      Figure 4B': Please indicate that the nuclei signal is DAPI.

      Corrected

      Supplementary figure 1A. Word is missing in the figure legend: ...the image also shows weak…

      Corrected

      Supplementary Figure 2: Please indicate the autofluorescence in the granular cells. Does it correspond to the yellow cells?

      Corrected

      Video legend for video 1 and 2. Please correct "H2B-mREFruby" to "H2B-mRFPruby".

      Corrected

      Reviewer #2:

      Major comments:

      MC 1. Given that most of the technical advances necessary to achieve the work described in this manuscript have been published previously, it would be helpful for the authors to more clearly identify the primary novelty of this manuscript. The abstract and introduction to the manuscript focus heavily on the technical details of imaging and analysis optimization and some additional summary of the implications of these advances should be included here to aid the reader.

      This paper describes a technical advance. While previous work (Alwes et al. 2016) established some key elements of our live imaging approach, we were not at that time able to record the entire time course of leg regeneration (the longest recordings were 3.5 days long). Here we present a method for imaging the entire course of leg regeneration (up to 10 days of imaging), optimised to reduce photodamage and to improve cell tracking. We also develop a method of in situ staining in cuticularised adult legs (an important technical breakthrough in this experimental system), which we combine with live imaging to determine the fate of tracked cells. We have revised the abstract and introduction of the paper to point out these novelties, in relation to our previous publications.

      In the abstract we explain: "Building on previous work that allowed us to image different parts of the process of leg regeneration in the crustacean Parhyale hawaiensis, we present here a method for live imaging that captures the entire process of leg regeneration, spanning up to 10 days, at cellular resolution. Our method includes (1) mounting and long-term live imaging of regenerating legs under conditions that yield high spatial and temporal resolution but minimise photodamage, (2) fixing and in situ staining of the regenerated legs that were imaged, to identify cell fates, and (3) computer-assisted cell tracking to determine the cell lineages and progenitors of identified cells. The method is optimised to limit light exposure while maximising tracking efficiency."

      The introduction includes the following text: "Our first systematic study using this approach presented continuous live imaging over periods of 2-3 days, capturing key events of leg regeneration such as wound closure, cell proliferation and morphogenesis of regenerating legs with single-cell resolution (Alwes et al., 2016). Here, we extend this work by developing a method for imaging the entire course of leg regeneration, optimised to reduce photodamage and to improve cell tracking. We also develop a method of in situ staining of gene expression in cuticularised adult legs, which we combine with live imaging to determine the fate of tracked cells."

      MC 2. The description of the regeneration time course is nicely detailed but also very qualitative. A major advantage of continuous recording and automated cell tracking in the manner presented in this manuscript would be to enable deeper quantitative characterization of cellular and tissue dynamics during regeneration. Rather than providing movies and manually annotated timelines, some characterization of the dynamics of the regeneration process (the heterogeneity in this is very very interesting, but not analyzed at all) and correlating them against cellular behaviors would dramatically increase the impact of the work and leverage the advances presented here. For example, do migration rates differ between replicates? Division rates? Division synchrony? Migration orientation? This seems to be an incredibly rich dataset that would be fascinating to explore in greater detail, which seems to me to be the primary advance presented in this manuscript. I can appreciate that the authors may want to segregate some biological findings from the method, but I believe some nominal effort highlighting the quantitative nature of what this method enables would strengthen the impact of the paper and be useful for the reader. Selecting a small number of simple metrics (eg. Division frequency, average cell migration speed) and plotting them alongside the qualitative phases of the regeneration timeline that have already been generated would be a fairly modest investment of effort using tools that already exist in the Mastodon interface, I would roughly estimate on the order of an hour or two per dataset. I believe that this effort would be well worth it and better highlight a major strength of the approach.

      The primary goal of this work was to establish a robust method for continuous long-term live imaging of regeneration, but we do appreciate that a more quantitative analysis would add value to the data we are presenting. We tried to address this request in three steps:

      First, we examined whether clear temporal patterns in cell division, cell movements or other cellular features can be observed in an accurately tracked dataset (li13-t4, tracked in Sugawara et al. 2022). To test this we used the feature extraction functions now available on the Mastodon platform (see link). We could discern a meaningful temporal pattern for cell divisions (see below); the other features showed no interpretable pattern of variation.

      Second, we asked whether we could use automated cell tracking to analyse the patterns of cell division in all our datasets. Using an Elephant deep learning model trained on the tracks of the li13-t4 dataset, we performed automated cell tracking in the same dataset, and compared the pattern of cell divisions from the automated cell track predictions with that obtained from manually validated cell tracks. We observed that the automated tracks gave very imprecise results, with a high background of false positives obscuring the real temporal pattern (see images below, with validated data on the left, automated tracking on the right). These results show that the automated cell tracking is not accurate enough to provide a meaningful picture of the pattern of cell divisions.

      Third, we tried to improve the accuracy of detection of dividing cells by additional training of Elephant models on each dataset (to lower the rate of false positives), followed by manual proofreading. Given how labour intensive this is, we could only apply this approach to 4 additional datasets. The results of this analysis are presented in Figure 4.

      Author response image 1.
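      The comparison described in the three steps above can be sketched as follows: division events are binned over time, and the automated counts are compared with the manually validated ones. This is a minimal illustration only. The input layout (division times in hours), the 12-hour bin width and the function names are assumptions made for the example, not part of the Mastodon or Elephant interfaces, and the per-bin excess is only a crude proxy for the false-positive rate.

```python
import numpy as np

def division_histogram(division_times_h, total_hours, bin_hours=12.0):
    """Count division events per time bin (hypothetical input: event times in hours)."""
    edges = np.arange(0.0, total_hours + bin_hours, bin_hours)
    counts, _ = np.histogram(division_times_h, bins=edges)
    return edges, counts

def excess_detection_fraction(validated_counts, automated_counts):
    """Per-bin fraction of automated detections exceeding the validated count.

    This is a crude proxy for false positives: it ignores missed divisions
    that happen to be compensated by spurious ones in the same bin.
    """
    validated = np.asarray(validated_counts, dtype=float)
    automated = np.asarray(automated_counts, dtype=float)
    excess = np.clip(automated - validated, 0.0, None)
    # Bins without any automated detection are reported as 0.
    return np.divide(excess, automated, out=np.zeros_like(excess), where=automated > 0)
```

      Plotting the two histograms side by side gives the kind of left/right comparison shown in the response image.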

      MC 3. The authors describe the challenges faced by their described approach:

      Using this mode of semi-automated and manual cell tracking, we find that most cells in the upper slices of our image stacks (top 30 microns) can be tracked with a high degree of confidence. A smaller proportion of cell lineages are trackable in the deeper layers.

      Given that the authors quantify this in Table 1, it would aid the reader to provide metrics in the manuscript text at this point. Furthermore, the metrics provided in Table 1 appear to be for overall performance, but the text describes that performance appears to be heavily depth dependent. Segregating the performance metrics further, for example providing DET, TRA, precision and recall for superficial layers only and for the overall dataset, would help support these arguments and better highlight performance a potential adopter of the method might expect.

      In the revised manuscript we have added data on the tracking performance of Elephant in relation to imaging depth in Suppl. Figure 3. These data confirm our original statement (which was based on manual tracking) that nuclei are more challenging to track in deeper layers.

      We point to these new results in two parts of the paper, as follows: "A smaller proportion of cells are trackable in the deeper layers (see Suppl. Figure 3)", and "Our results, summarised in Table 1A, show that the detection of nuclei can be enhanced by doubling the z resolution at the expense of xy resolution and image quality. This improvement is particularly evident in the deeper layers of the imaging stacks, which are usually the most challenging to track (Suppl. Figure 3)."
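      As an illustration of how detection performance can be segregated by imaging depth, the sketch below computes recall per depth bin from ground-truth nuclei. It assumes that each ground-truth nucleus has a known depth and a flag saying whether it was detected; the function name and the 30 µm boundary used in the example are illustrative, and this simple recall is not the DET or TRA measure reported in Table 1.

```python
import numpy as np

def recall_by_depth(gt_depth_um, detected, bin_edges_um):
    """Fraction of ground-truth nuclei that were detected, per depth bin.

    gt_depth_um  : depth of each ground-truth nucleus (micrometres)
    detected     : boolean flag per nucleus (True if matched by a detection)
    bin_edges_um : increasing bin edges, e.g. [0, 30, 60]
    """
    gt_depth_um = np.asarray(gt_depth_um, dtype=float)
    detected = np.asarray(detected, dtype=bool)
    bin_index = np.digitize(gt_depth_um, bin_edges_um) - 1
    recalls = []
    for i in range(len(bin_edges_um) - 1):
        in_bin = bin_index == i
        # Empty bins are reported as NaN rather than 0.
        recalls.append(detected[in_bin].mean() if in_bin.any() else float("nan"))
    return np.array(recalls)
```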

      MC 4. Performance characterization in Table 1 appears to derive from a single dataset that is then subsampled and processed in different ways to assess the impact of these changes on cell tracking and detection performance. While this is a suitable strategy for this type of optimization it leaves open the question of performance consistency across datasets. I fully recognize that this type of quantification can be onerous and time consuming, but some attempt to assess performance variability across datasets would be valuable. Manual curation over a short time window over a random sampling of the acquired data would be sufficient to assess this.

      We think that similar trade-offs will apply to all our datasets because tracking performance is constrained by the same features, which are intrinsic to our system; e.g. by the crowding of nuclei in relation to axial resolution, or the speed of mitosis in relation to the temporal resolution of imaging. We therefore do not see a clear rationale for repeating this analysis. On a practical level, our existing image datasets could not be subsampled to generate the various conditions tested in Table 1, so proving this point experimentally would require generating new recordings, and tracking these to generate ground truth data. This would require months of additional work.

      A second, related question is whether Elephant would perform equally well in detecting and tracking nuclei across different datasets. This point has been addressed in the Sugawara et al. 2022 paper, where the performance of Elephant was tested on diverse fluorescence datasets.

      Reviewer #3:

      Major comments:

      • The authors should clearly specify what are the key technical improvements compared to their previous studies (Alwes et al. 2016, Elife; Konstantinides & Averof 2014, Science). There, the approaches for mounting, imaging, and cell tracking are already introduced, and the imaging is reported to run for up to 7 days in some cases.

      In Konstantinides and Averof (2014) we did not present any live imaging at cellular resolution. In Alwes et al. (2016) we described key elements of our live imaging approach, but we were never able to record the entire time course of leg regeneration. The longest recordings in that work were 3.5 days long.

      We have revised the abstract and introduction to clarify the novelty of this work, in relation to our previous publications. Please see our response to comment MC1 of reviewer 2.

      • While the authors mention testing the effect of imaging parameters (such as scanning speed and line averaging) on the imaging/tracking outcome, very little or no information is provided on how this was done beyond the parameters that they finally arrived to.

      Scan speed and averaging parameters were determined by measuring contrast and signal-to-noise ratios in images captured over a range of settings. We have now added these data in Supplementary Figure 1.
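      One way such a comparison across settings could be computed is sketched below. The definitions used here (signal-to-noise ratio as the signal-minus-background difference over the background standard deviation, and a Michelson-style contrast) are assumptions made for this example; the definitions actually used are those given with Supplementary Figure 1.

```python
import numpy as np

def snr_and_contrast(image, signal_mask):
    """Return (SNR, contrast) for one image, given a boolean mask of signal pixels.

    SNR      = (mean signal - mean background) / std of background
    contrast = (mean signal - mean background) / (mean signal + mean background)
    Both definitions are illustrative assumptions.
    """
    image = np.asarray(image, dtype=float)
    signal_mask = np.asarray(signal_mask, dtype=bool)
    signal = image[signal_mask]
    background = image[~signal_mask]
    difference = signal.mean() - background.mean()
    snr = difference / background.std()
    contrast = difference / (signal.mean() + background.mean())
    return snr, contrast
```

      Evaluating this function on images acquired at each scan speed and averaging setting would give one (SNR, contrast) pair per condition.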

      • The authors claim that, using the acquired live imaging data across entire regeneration time course, they are now able to confirm and extend their description of leg regeneration. However, many claims about the order and timing of various cellular events during regeneration are supported only by references to individual snapshots in figures or supplementary movies. Presenting a more quantitative description of cellular processes during regeneration from the acquired data would significantly enhance the manuscript and showcase the usefulness of the improved workflow.

      The events we describe can be easily observed in the maximum projections, available in Suppl. Data 2. Regarding the quantitative analysis, please see our response to comment MC2 of reviewer 2.  

      • Table 1 summarizes the performance of cell tracking using simulated datasets of different quality. However only averages and/or maxima are given for the different metrics, which makes it difficult to evaluate the associated conclusions. In some cases, only 1 or 2 test runs were performed.

      The metrics extracted from each of the three replicates, per dataset, are now included in Suppl. Data 4.

      We consistently used 3 replicates to measure tracking performance with each of the datasets. The "replicates" column label in Table 1 referred to the number of scans that were averaged to generate the image, not to the replicates used for estimating the tracking performance. To avoid confusion, we changed that label to "averaging".

      • OPTIONAL: An imaging approach that allows using the current mounting strategy but could help with some of the tradeoffs is using a spinning-disk confocal microscope instead of a laser scanning one. If the authors have such a system available, it could be interesting to compare it with their current scanning confocal setup.

      Preliminary experiments that we carried out several years ago on a spinning disk confocal (with a 20x objective and the CSU-W1 spinning disk) were not very encouraging, and we therefore did not pursue this approach further. The main problem was bad image quality in deeper tissue layers.

      Minor comments:

      • The presented imaging protocol was optimized for one laser wavelength only (561 nm) - this should be mentioned when discussing the technical limitations since animals tend to react differently to different wavelengths. Same settings might thus not be applicable for imaging a different fluorescent protein.

      In the second paragraph of the Results section, we explain that we perform the imaging at long wavelengths in order to minimise photodamage. It should be clear to the readers that changing the excitation wavelength will have an impact on long-term live imaging.

      • For transferability, it would be useful if the intensity of laser illumination was measured and given in the Methods, instead of just a relative intensity setting from the imaging software. Similarly, more details of the imaging system should be provided where appropriate (e.g., detector specifications).

      We have now measured the intensity of the laser illumination and added this information in the Methods: "Laser power was typically set to 0.3% to 0.8%, which yields 0.51 to 1.37 µW at 561 nm (measured with a ThorLabs Microscope Slide Power Sensor, #S170C)."

      Regarding the imaging system and the detector, we provide all the information that is available to us on the microscope's technical sheets.

      • The versions of analysis scripts associated with the manuscript should be uploaded to an online repository that permanently preserves the respective version.

      The scripts are now available on GitHub and in online repositories. The relevant links are included in the revised manuscript.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The image analysis pipeline is tested in analysing microscopy imaging data of gastruloids of varying sizes, for which an optimised protocol for in toto image acquisition is established based on whole mount sample preparation using an optimal refractive index matched mounting media, opposing dual side imaging with two-photon microscopy for enhanced laser penetration, dual view registration, and weighted fusion for improved in toto sample data representation. For enhanced imaging speed in a two-photon microscope, parallel imaging was used, and the authors performed spectral unmixing analysis to avoid issues of signal cross-talk.

      In the image analysis pipeline, different pre-treatments are done depending on the analysis to be performed (for nuclear segmentation - contrast enhancement and normalisation; for quantitative analysis of gene expression - corrections for optical artifacts inducing signal intensity variations). Stardist3D was used for the nuclear segmentation. The study analyses properties of gastruloids including nuclear density, patterns of cell division, morphology, deformation, and gene expression.

      Strengths:

      The methods developed are sound, well described, and well-validated, using a sample challenging for microscopy, gastruloids. Many of the established methods are very useful (e.g. registration, corrections, signal normalisation, lazy loading bioimage visualisation, spectral decomposition analysis), facilitate the development of quantitative research, and would be of interest to the wider scientific community.

      We thank the reviewer for this positive feedback.

      Weaknesses:

      A recommendation should be added on when or under which conditions to use this pipeline.

      We thank the reviewer for this valuable feedback, which will be addressed in the revision. In general, the pipeline is applicable to any tissue, but it is particularly useful for large and dense 3D samples—such as organoids, embryos, explants, spheroids, or tumors—that are typically composed of multiple cell layers and have a thickness greater than 50 µm.

      The processing and analysis pipeline are compatible with any type of 3D imaging data (e.g. confocal, 2 photon, light-sheet, live or fixed).

      - Spectral unmixing to remove signal cross-talk of multiple fluorescent targets is typically more relevant in two-photon imaging due to the broader excitation spectra of fluorophores compared to single-photon imaging. In confocal or light-sheet microscopy, alternating excitation wavelengths often circumvents the need for unmixing. Spectral decomposition performs even better with true spectral detectors; however, these are usually not non-descanned detectors, which are more appropriate for deep tissue imaging. Our approach demonstrates that simultaneous cross-talk-free four-color two-photon imaging can be achieved in dense 3D specimens with four non-descanned detectors and co-excitation by just two laser lines. Depending on the dispersion in optically dense samples, depth-dependent apparent emission spectra need to be considered.

      - Nuclei segmentation using our trained StarDist3D model is applicable to any system under two conditions: (1) the nuclei exhibit a star-convex shape, as required by the StarDist architecture, and (2) the image resolution is sufficient in XYZ to allow resampling. The exact sampling required is object- and system-dependent, but the goal is to achieve nearly isotropic objects with diameters of approximately 15 pixels while maintaining image quality. In practice, images containing objects that are natively close to or larger than 15 pixels in diameter should segment well after resampling. Conversely, images with objects that are significantly smaller along one or more dimensions will require careful inspection of the segmentation results.

      - Normalization is broadly applicable to multicolor data when at least one channel is expected to be ubiquitously expressed within its domain. Wavelength-dependent correction requires experimental calibration using a ubiquitous signal at each wavelength. Importantly, this calibration only needs to be performed once for a given set of experimental conditions (e.g., fluorophores, tissue type, mounting medium).

      - Multi-scale analysis of gene expression and morphometrics is applicable to any 3D multicolor image. This includes both the 3D visualization tools (Napari plugins) and the various analytical plots (e.g., correlation plots, radial analysis). Multi-scale analysis can be performed even with imperfect segmentation, as long as segmentation errors tend to cancel out when averaged locally at the relevant spatial scale. However, systematic errors—such as segmentation uncertainty along the Z-axis due to strong anisotropy—may accumulate and introduce bias in downstream analyses. Caution is advised when analyzing hollow structures (e.g., curved epithelial monolayers with large cavities), as the pipeline was developed primarily for 3D bulk tissues, and appropriate masking of cavities would be needed.
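      The resampling criterion given above for nuclei segmentation (nearly isotropic objects of about 15 pixels in diameter) can be turned into per-axis zoom factors, as in the following sketch. The function names and the example voxel sizes are hypothetical, and the nucleus diameter in micrometres is an input the user has to estimate for their own system.

```python
def isotropic_zoom_factors(voxel_size_zyx_um, nucleus_diameter_um, target_diameter_px=15.0):
    """Zoom factor per axis so that a nucleus spans ~target_diameter_px isotropically."""
    # Voxel size (in micrometres) that the resampled image should have on every axis.
    target_voxel_um = nucleus_diameter_um / target_diameter_px
    return tuple(size / target_voxel_um for size in voxel_size_zyx_um)

def resampled_shape(shape_zyx, zoom_zyx):
    """Array shape expected after applying the zoom factors."""
    return tuple(int(round(n * z)) for n, z in zip(shape_zyx, zoom_zyx))
```

      For a stack with 2.0 µm z steps, 0.5 µm pixels and nuclei of about 7.5 µm, the factors are (4.0, 1.0, 1.0): only the z axis needs upsampling. Factors of this kind can then be passed to a generic resampling routine such as `scipy.ndimage.zoom`. Large factors along one axis are a warning sign that the segmentation results need careful inspection.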

      Reviewer #2 (Public review):

      Summary:

      This study presents an integrated experimental and computational pipeline for high-resolution, quantitative imaging and analysis of gastruloids. The experimental module employs dual-view two-photon spectral imaging combined with optimized clearing and mounting techniques to image whole-mount immunostained gastruloids. This approach enables the acquisition of comprehensive 3D images that capture both tissue-scale and single-cell level information.

      The computational module encompasses both pre-processing of acquired images and downstream analysis, providing quantitative insights into the structural and molecular characteristics of gastruloids. The pre-processing pipeline, tailored for dual-view two-photon microscopy, includes spectral unmixing of fluorescence signals using depth-dependent spectral profiles, as well as image fusion via rigid 3D transformation based on content-based block-matching algorithms. Nuclei segmentation was performed using a custom-trained StarDist3D model, validated against 2D manual annotations, and achieving an F1 score of 85+/-3% at a 50% intersection-over-union (IoU) threshold. Another custom-trained StarDist3D model enabled accurate detection of proliferating cells and the generation of 3D spatial maps of nuclear density and proliferation probability. Moreover, the pipeline facilitates detailed morphometric analysis of cell density and nuclear deformation, revealing pronounced spatial heterogeneities during early gastruloid morphogenesis.
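      For readers unfamiliar with the metric quoted above, the sketch below shows how an F1 score at a 50% IoU threshold can be computed from a ground-truth and a predicted label image. It is a simplified illustration, not the evaluation code used in the study: the function name is hypothetical, label 0 is assumed to be background, and a match requires an IoU strictly greater than the threshold, which guarantees one-to-one matches for thresholds of 0.5 or more.

```python
import numpy as np

def f1_at_iou(gt_labels, pred_labels, iou_threshold=0.5):
    """F1 score of an instance segmentation at a given IoU threshold (label 0 = background)."""
    if iou_threshold < 0.5:
        raise ValueError("this simple matching is only one-to-one for thresholds >= 0.5")
    gt_labels = np.asarray(gt_labels)
    pred_labels = np.asarray(pred_labels)
    gt_ids = [i for i in np.unique(gt_labels) if i != 0]
    pred_ids = [i for i in np.unique(pred_labels) if i != 0]
    true_positives = 0
    for g in gt_ids:
        gt_mask = gt_labels == g
        # Only predictions that overlap this object can match it.
        for p in np.unique(pred_labels[gt_mask]):
            if p == 0:
                continue
            pred_mask = pred_labels == p
            intersection = np.logical_and(gt_mask, pred_mask).sum()
            union = np.logical_or(gt_mask, pred_mask).sum()
            if intersection / union > iou_threshold:
                true_positives += 1
                break
    false_positives = len(pred_ids) - true_positives
    false_negatives = len(gt_ids) - true_positives
    denominator = 2 * true_positives + false_positives + false_negatives
    return 2 * true_positives / denominator if denominator else 1.0
```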

      All computational tools developed in this study are released as open-source, Python-based software.

      Strengths:

      The authors applied two-photon microscopy to whole-mount deep imaging of gastruloids, achieving in toto visualization at single-cell resolution. By combining spectral imaging with an unmixing algorithm, they successfully separated four fluorescent signals, enabling spatial analysis of gene expression patterns.

      The entire computational workflow, from image pre-processing to segmentation with a custom-trained StarDist3D model and subsequent quantitative analysis, is made available as open-source software. In addition, user-friendly interfaces are provided through the open-source, community-driven Napari platform, facilitating interactive exploration and analysis.

      We thank the reviewer for this positive feedback.

      Weaknesses:

      The computational module appears promising. However, the analysis pipeline has not been validated on datasets beyond those generated by the authors, making it difficult to assess its general applicability.

      We agree that applying our analysis pipeline to published datasets—particularly those acquired with different imaging systems—would be valuable. However, only a few high-resolution datasets of large organoid samples are publicly available, and most of these either lack multiple fluorescence channels or represent 3D hollow structures. Our computational pipeline consists of several independent modules: spectral filtering, dual-view registration, local contrast enhancement, 3D nuclei segmentation, image normalization based on a ubiquitous marker, and multiscale analysis of gene expression and morphometrics.

      Spectral filtering has already been applied in other systems (e.g. [7] and [8]), but is here extended to account for imaging depth-dependent apparent emission spectra of the different fluorophores. In our pipeline, we provide code to run spectral filtering on multichannel images, integrated in Python. In order to apply the spectral filtering algorithm utilized here, spectral patterns of each fluorophore need to be calibrated as a function of imaging depth, which depend on the specific emission windows and detector settings of the microscope.
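      The idea of depth-dependent unmixing can be sketched as a linear problem: at each depth, the signals recorded by the detectors are modelled as the calibrated spectral patterns multiplied by the unknown fluorophore contributions. The example below solves this by ordinary least squares per z-slice and clips negative values. It is a generic illustration under those assumptions, not the algorithm distributed with the pipeline; `unmix_stack` and `spectra_by_depth` are hypothetical names.

```python
import numpy as np

def unmix_stack(stack, spectra_by_depth):
    """Least-squares unmixing with one spectral pattern matrix per z-slice.

    stack            : array (n_detectors, Z, Y, X) of recorded signals
    spectra_by_depth : array (Z, n_detectors, n_fluorophores); column j at depth z
                       is the calibrated detector pattern of fluorophore j
    returns          : array (n_fluorophores, Z, Y, X) of unmixed contributions
    """
    n_det, n_z, n_y, n_x = stack.shape
    n_fluo = spectra_by_depth.shape[2]
    unmixed = np.empty((n_fluo, n_z, n_y, n_x), dtype=float)
    for z in range(n_z):
        pixels = stack[:, z].reshape(n_det, -1)  # (n_detectors, Y*X)
        coeffs, *_ = np.linalg.lstsq(spectra_by_depth[z], pixels, rcond=None)
        # Negative contributions are not physical; clip them to zero.
        unmixed[:, z] = np.clip(coeffs, 0.0, None).reshape(n_fluo, n_y, n_x)
    return unmixed
```

      Using a single matrix for all depths would correspond to conventional unmixing; the per-slice matrix is what accommodates depth-dependent apparent emission spectra.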

      Image normalization using a wavelength-dependent correction also requires calibration on a given imaging setup to measure the difference in signal decay among the different fluorophore species. To our knowledge, the calibration procedures for spectral filtering and our image-normalization approach have not been performed previously in 3D samples, which is why validation on published datasets is not readily possible. Nevertheless, they are described in detail in the Methods section, and the code used (from the calibration measurements to the corrected images) is available open-source at the Zenodo link in the manuscript.
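      To make the notion of a per-wavelength decay calibration concrete, the sketch below fits a decay length to the mean intensity of a ubiquitous signal as a function of depth, then applies the inverse gain to a stack. It assumes a single-exponential attenuation, I(z) = I0 exp(-z / L), which is a simplification chosen for this example; the calibration actually used is the one described in the Methods, and the function names are hypothetical.

```python
import numpy as np

def fit_decay_length(depth_um, mean_intensity):
    """Fit I(z) = I0 * exp(-z / L) by a straight-line fit to log-intensity; return L."""
    slope, _intercept = np.polyfit(depth_um, np.log(mean_intensity), 1)
    return -1.0 / slope

def correct_depth_decay(stack_zyx, z_step_um, decay_length_um):
    """Multiply each z-slice by exp(z / L) to compensate the fitted decay."""
    depth = np.arange(stack_zyx.shape[0]) * z_step_um
    gain = np.exp(depth / decay_length_um)
    return stack_zyx * gain[:, None, None]
```

      Repeating the fit for a ubiquitous signal at each wavelength gives one decay length per channel, which is the kind of quantity that differs between fluorophore species.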

      Dual-view registration, local contrast enhancement, and multiscale analysis of gene expression and morphometrics are not limited to organoid data or our specific imaging modalities. If we identify suitable datasets to validate these modules, we will include them in the revised manuscript.

      To evaluate our 3D nuclei segmentation model, we plan to test it on diverse systems, including gastruloids stained with the nuclear marker DRAQ5 from Moos et al. [1]; breast cancer spheroids; primary ductal adenocarcinoma organoids; human colon organoids and HCT116 monolayers from Ong et al. [2]; and zebrafish tissues imaged by confocal microscopy from Li et al. [3]. These datasets were acquired using either light-sheet or confocal microscopy, with varying imaging parameters (e.g., objective lens, pixel size, staining method).

      Preliminary results are promising (see Author response image 1). We will provide quantitative comparisons of our model’s performance on these datasets, using annotations or reference predictions provided by the original authors where available.

      Author response image 1.

      Qualitative comparison of our custom StarDist3D segmentation strategy on diverse published 3D nuclei datasets. We show one slice from the XY plane for simplicity. (a) Gastruloid stained with the nuclear marker DRAQ5 imaged with an open-top dual-view and dual-illumination LSM [1]. (b) Breast cancer spheroid [2]. (c) Primary pancreatic ductal adenocarcinoma organoids imaged with confocal microscopy [2]. (d) Human colon organoid imaged with an LSM laser scanning confocal microscope [2]. (e) Monolayer HCT116 cells imaged with an LSM laser scanning confocal microscope [2]. (f) Fixed zebrafish embryo stained for nuclei and imaged with a Zeiss LSM 880 confocal microscope [3].

      Besides, the nuclei segmentation component lacks benchmarking against existing methods.

      We agree with the reviewer that a benchmark against existing segmentation methods would be very useful. We tried different pre-trained models:

      - CellPose, which we tested in a previous paper ([4]) and which performed poorly compared to our trained StarDist3D model.

      - DeepStar3D ([2]) is only available in the software 3DCellScope. We could not benchmark the model on our data, because the free and accessible version of the software is limited to small datasets. A single-channel image of one whole-mount gastruloid, with dimensions (347, 467, 477), was too large to be processed (see screenshot below). The segmentation model could not be extracted from the source code and tested externally because the trained DeepStar3D weights are encrypted.

      Author response image 2.

      Screenshot of the 3DCellScope software. We could not perform 3D nuclei segmentation of a whole-mount gastruloid because the image size was too large to be processed.

      - AnyStar ([5]), which is a model trained from the StarDist3D architecture, did not perform well on our data because of the heterogeneous staining. Basic pre-processing such as median and Gaussian filtering did not improve the results and led to wrong segmentation of touching nuclei. AnyStar was shown to segment colon organoids well in Ong et al., 2025 ([2]), but there the nuclei were more homogeneously stained. Our Hoechst staining displays bright chromatin spots that are incorrectly labeled as individual nuclei.

      - Cellos ([6]), another model trained from StarDist3D, also did not perform well. The objects used for training and to validate the results are sparse and not touching, so the predicted segmentation has many false negatives even when lowering the probability threshold to detect more objects. Additionally, the network was trained with an anisotropy of (9,1,1), based on images with low z resolution, so it performed poorly on almost isotropic images. Adapting our images to the network’s anisotropy results in an imprecise segmentation that cannot be used to measure 3D nuclei deformations.

      We tried both Cellos and AnyStar predictions on a gastruloid image from Fig. S2 of our main manuscript. Author response image 3 displays the results qualitatively compared to our trained model Stardist-tapenade. For the revision of the paper, we will perform a comprehensive benchmark of these state-of-the-art routines, including quantitative assessment of the performance.

      Author response image 3.

      Qualitative comparison of two published segmentation models versus our model. We show one slice from the XY plane for simplicity. Segmentations are displayed with their contours only. (Top left) Gastruloid stained with Hoechst, image extracted from Fig S2 of our manuscript. (Top right) Same image overlaid with the prediction from the Cellos model, showing many false negatives. (Bottom left) Same image overlaid with the prediction from our Stardist-tapenade model. (Bottom right) Same image overlaid with the prediction from the AnyStar model; false positives are indicated with a red arrow.

      Appraisal:

      The authors set out to establish a quantitative imaging and analysis pipeline for gastruloids using dual-view two-photon microscopy, spectral unmixing, and a custom computational framework for 3D segmentation and gene expression analysis. This aim is largely achieved. The integration of experimental and computational modules enables high-resolution in toto imaging and robust quantitative analysis at the single-cell level. The data presented support the authors' conclusions regarding the ability to capture spatial patterns of gene expression and cellular morphology across developmental stages.

      Impact and utility:

      This work presents a compelling and broadly applicable methodological advance. The approach is particularly impactful for the developmental biology community, as it allows researchers to extract quantitative information from high-resolution images to better understand morphogenetic processes. The data are publicly available on Zenodo, and the software is released on GitHub, making them highly valuable resources for the community.

      We thank the reviewer for this positive feedback.

      Reviewer #3 (Public review):

      Summary

      The paper presents an imaging and analysis pipeline for whole-mount gastruloid imaging with two-photon microscopy. The presented pipeline includes spectral unmixing, registration, segmentation, and a wavelength-dependent intensity normalization step, followed by quantitative analysis of spatial gene expression patterns and nuclear morphometry on a tissue level. The utility of the approach is demonstrated by several experimental findings, such as establishing spatial correlations between local nuclear deformation and tissue density changes, as well as the radial distribution pattern of mesoderm markers. The pipeline is distributed as a Python package, notebooks, and multiple napari plugins.

      Strengths

      The paper is well-written with detailed methodological descriptions, which I think would make it a valuable reference for researchers performing similar volumetric tissue imaging experiments (gastruloids/organoids). The pipeline itself addresses many practical challenges, including resolution loss within tissue, registration of large volumes, nuclear segmentation, and intensity normalization. Especially the intensity decay measurements and wavelength-dependent intensity normalization approach using nuclear (Hoechst) signal as reference are very interesting and should be applicable to other imaging contexts. The morphometric analysis is equally well done, with the correlation between nuclear shape deformation and tissue density changes being an interesting finding. The paper is quite thorough in its technical description of the methods (which are a lot), and their experimental validation is appropriate. Finally, the provided code and napari plugins seem to be well done (I installed a selected list of the plugins and they ran without issues) and should be very helpful for the community.

      We thank the reviewer for his positive feedback and appreciation of our work.

      Weaknesses

      I don't see any major weaknesses, and I would only have two issues that I think should be addressed in a revision:

      (1) The demonstration notebooks lack accompanying sample datasets, preventing users from running them immediately and limiting the pipeline's accessibility. I would suggest to include (selective) demo data set that can be used to run the notebooks (e.g. for spectral unmixing) and or provide easily accessible demo input sample data for the napari plugins (I saw that there is some sample data for the processing plugin, so this maybe could already be used for the notebooks?).

      We thank the reviewer for this relevant suggestion. The 7 notebooks have been updated to automatically download sample data, so the different parts of the pipeline can now be run immediately: https://github.com/GuignardLab/tapenade/tree/chekcs_on_notebooks/src/tapenade/notebooks

      (2) The results for the morphometric analysis (Figure 4) seem to be only shown in lateral (xy) views without the corresponding axial (z) views. I would suggest adding this to the figure and showing the density/strain/angle distributions for those axial views as well.

      We agree with the reviewer that a morphometric analysis based on the axial views would be informative and plan to perform this analysis for the revision.

      (1) Moos, F., Suppinger, S., de Medeiros, G., Oost, K.C., Boni, A., Rémy, C., Weevers, S.L., Tsiairis, C., Strnad, P. and Liberali, P., 2024. Open-top multisample dual-view light-sheet microscope for live imaging of large multicellular systems. Nature Methods, 21(5), pp.798-803.

      (2) Ong, H.T., Karatas, E., Poquillon, T., Grenci, G., Furlan, A., Dilasser, F., Mohamad Raffi, S.B., Blanc, D., Drimaracci, E., Mikec, D. and Galisot, G., 2025. Digitalized organoids: integrated pipeline for high-speed 3D analysis of organoid structures using multilevel segmentation and cellular topology. Nature Methods, 22(6), pp.1343-1354.

      (3) Li, L., Wu, L., Chen, A., Delp, E.J. and Umulis, D.M., 2023. 3D nuclei segmentation for multi-cellular quantification of zebrafish embryos using NISNet3D. Electronic Imaging, 35, pp.1-9.

      (4) Vanaret, J., Dupuis, V., Lenne, P. F., Richard, F., Tlili, S., & Roudot, P. (2023). A detector-independent quality score for cell segmentation without ground truth in 3D live fluorescence microscopy. IEEE Journal of Selected Topics in Quantum Electronics, 29(4: Biophotonics), 1-12.

      (5) Dey, N., Abulnaga, M., Billot, B., Turk, E. A., Grant, E., Dalca, A. V., & Golland, P. (2024). AnyStar: Domain randomized universal star-convex 3D instance segmentation. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (pp. 7593-7603).

      (6) Mukashyaka, P., Kumar, P., Mellert, D. J., Nicholas, S., Noorbakhsh, J., Brugiolo, M., ... & Chuang, J. H. (2023). High-throughput deconvolution of 3D organoid dynamics at cellular resolution for cancer pharmacology with Cellos. Nature Communications, 14(1), 8406.

      (7) Rakhymzhan, A., Leben, R., Zimmermann, H., Günther, R., Mex, P., Reismann, D., ... & Niesner, R. A. (2017). Synergistic strategy for multicolor two-photon microscopy: application to the analysis of germinal center reactions in vivo. Scientific reports, 7(1), 7101.

      (8) Dunsing, V., Petrich, A., & Chiantia, S. (2021). Multicolor fluorescence fluctuation spectroscopy in living cells via spectral detection. Elife, 10, e69687.

    1. Author Response:

      Reviewer #1 (Public Review):

      Summary:

      This manuscript reports the substrate-bound structure of SiaQM from F. nucleatum, which is the membrane component of a Neu5Ac-specific Tripartite ATP-independent Periplasmic (TRAP) transporter. Until recently, there was no experimentally derived structural information regarding the membrane components of the TRAP transporter, limiting our understanding of the transport mechanism. Since 2022, there have been 3 different studies reporting the structures of the membrane components of Neu5Ac-specific TRAP transporters. While it was possible to narrow down the binding site location by comparing the structures to proteins of the same fold, a structure with substrate bound has been missing. In this work, the authors report the Na+-bound state and the Na+ plus Neu5Ac state of FnSiaQM, revealing information regarding substrate coordination. In previous studies, 2 Na+ ion sites were identified. Here, the authors also tentatively assign a 3rd Na+ site. The authors reconstitute the transporter to assess the effects of mutating the binding site residues they identified in their structures. Of the 2 positions tested, only one of them appears to be critical to substrate binding.

      Strengths:

      The main strength of this work is the capture of the substrate-bound state of SiaQM, which provides insight into an important part of the transport cycle.

      Weaknesses:

      The main weakness is the lack of experimental validation of the structural findings. The authors identified the Neu5Ac binding site, but only tested 2 residues for their involvement in substrate interactions, which was very limited. The authors tentatively identified a 3rd Na+ binding site, which if true would be an impactful finding, but this site was not tested for its contribution to Na+ dependent transport, and the authors themselves report that the structural evidence is not wholly convincing. This lack of experimental validation undermines the confidence of the findings. However, the reporting of these new data is important as it will facilitate follow-up studies by the authors or other researchers.

      The main concern, also mentioned by other reviewers, is the lack of mutational data and functional studies on the identified binding sites. Two other structures of TRAP transporters have been determined, one from Haemophilus influenzae (Hi) and the other from Photobacterium profundum (Pp). We will refer to the references in this paper as [1], Peter et al. as [2], and Davies et al. as [3]. The table below lists all the mutations made in the Neu5Ac binding site, including direct polar interactions between Neu5Ac and the side chains, as well as the newly identified metal sites.

      The structure of Fusobacterium nucleatum (Fn) that we have reported shows a significant sequence identity with the previously reported Hi structure. When we superimpose the Pp and Fn structures, we observe that nearly all the residues that bind to the Neu5Ac and the third metal site are conserved. This suggests that mutagenesis and functional studies from other research can be related to the structure presented in our work.

      The table below shows that all three residues that directly interact with Neu5Ac have been tested by site-directed mutagenesis for their role in Neu5Ac transport. Both D521 and S300 are critical for transport, while S345 is not. We do not believe that a mutation of D521A in Fn, followed by transport studies, will provide any new information.

      However, Peter et al. have mutated only one of the 5 residues near the newly identified metal binding site, which resulted in no transport. The rest of the residues have not been functionally tested. We propose to mutate these residues into Ala, express and purify the proteins, and then carry out transport assays on those that show expression. We will include this information in the revised manuscript.

      Reviewer #2 (Public Review):

      In this exciting new paper from the Ramaswamy group at Purdue, the authors provide a new structure of the membrane domains of a tripartite ATP-independent periplasmic (TRAP) transporter for the important sugar acid, N-acetylneuraminic acid or sialic acid (Neu5Ac). While there have been a number of other structures in the last couple of years (the first for any TRAP-T) this is the first to trap the structure with Neu5Ac bound to the membrane domains. This is an important breakthrough as in this system the ligand is delivered by a substrate-binding protein (SBP), in this case, called SiaP, where Neu5Ac binding is well studied but the 'hand over' to the membrane component is not clear. The structure of the membrane domains, SiaQM, revealed strong similarities to other SBP-independent Na+-dependent carriers that use an elevator mechanism and have defined Na+ and ligand binding sites. Here they solve the cryo-EM structure of the protein from the bacterial oral pathogen Fusobacterium nucleatum and identify a potential third (and theoretically predicted) Na+ binding site but also locate for the first time the Neu5Ac binding site. While this sits in a region of the protein that one might expect it to sit, based on comparison to other transporters like VcINDY, it provides the first molecular details of the binding site architecture and identifies a key role for Ser300 in the transport process, which their structure suggests coordinates the carboxylate group of Neu5Ac. The work also uses biochemical methods to confirm the transporter from F. nucleatum is active and similar to those used by selected other human and animal pathogens and now provides a framework for the design of inhibitors of these systems.

      The strengths of the paper lie in the locating of Neu5Ac bound to SiaQM, providing important new information on how TRAP transporters function. The complementary biochemical analysis also confirms that this is not an atypical system and that the results are likely true for all sialic acid-specific TRAP systems.

      The main weakness is the lack of follow-up on the identified binding site in terms of structure-function analysis. While Ser300 is shown to be important, only one other residue is mutated and a much more extensive analysis of the newly identified binding site would have been useful.

      Please see the comments above.

      Reviewer #3 (Public Review):

      The manuscript by Goyal et al reports substrate-bound and substrate-free structures of a tripartite ATP-independent periplasmic (TRAP) transporter from a previously uncharacterized homolog, F. nucleatum. This is one of the most mechanistically fascinating transporter families, by means of its QM domain (the domain reported in this manuscript) operating as a monomeric 'elevator', and its P domain functioning as a substrate-binding 'operator' that is required to deliver the substrate to the QM domain; together, this is termed an 'elevator with an operator' mechanism. Remarkably, previous structures had not demonstrated the substrate Neu5Ac bound. In addition, they confirm the previously reported Na+ binding sites and report a new metal binding site in the transporter, which seems to be mechanistically relevant. Finally, they mutate the substrate binding site and use proteoliposomal uptake assays to show the mechanistic relevance of the proposed substrate binding residues.

      The structures are of good quality, the functional data is robust, the text is well-written, and the authors are appropriately careful with their interpretations. Determination of a substrate-bound structure is an important achievement and fills an important gap in the 'elevator with an operator' mechanism. Nevertheless, I have concerns with the data presentation, which in its current state does not intuitively demonstrate the discussed findings. Furthermore, the structural analysis appears limited, and even slight improvements in data processing and resulting resolution would greatly improve the authors' claims. I have several suggestions to hopefully improve the clarity and quality of the manuscript.

      We appreciate your feedback and will make the necessary modifications to the manuscript incorporating most of the suggestions. We will submit the revised version once the experiments are completed. We are also working on improving the quality of the figures and have made several attempts to enhance the resolution using CryoSPARC or RELION, but without success. We will continue to explore newer methods in an effort to achieve higher resolution and to model more lipids, particularly in the binding pocket.

    1. Author Response

      Joint Public Review

      Strengths

      Overall, the idea that the PAG interacts with the BLA via the midline thalamus during a predator vs. foraging test is new and quite interesting. The authors have used appropriate tools to address their questions. The major impact in the field would be to add evidence to claims that the BLA can be downstream of the dPAG to evoke defensive behaviors. The study also adds to a body of evidence that the PAG mediates primal fear responses.

      Weaknesses

      (Anatomical concerns)

      1) The authors claim that the recordings were performed in the dorsal PAG (dPAG), but the histological images in Fig. 1B and Supplementary S2 for example show the tip of the electrode in a different subregion of PAG (ventral/lateral). They should perform a more careful histological analysis of the recording sites and explain the histological inclusion and exclusion criteria. Diagrams showing the sites of all PAG and BLA recordings, as well as all fiber optics, would be helpful.

      The PAG is composed of dorsomedial (dm), dorsolateral (dl), lateral (l), and ventrolateral (vl) columns that extend along the rostro-caudal axis of the aqueduct. The term “dorsal PAG” (dPAG) generally encompasses dmPAG, dlPAG, and lPAG, as substantiated by tract-tracing, neurochemical, and immunohistochemical techniques (e.g., Bandler et al., 1991; Bandler & Keay, 1996; Carrive, 1993). As Bandler and Shipley (1994) summarized, “These findings suggest that what has been traditionally called the 'dorsal PAG' (a collective term for regions dorsal and lateral to the aqueduct), consists of three anatomically distinct longitudinal columns: dorsomedial and lateral columns…and a dorsolateral column…” Similarly, Schenberg et al. (2005) clarified in their review that, “According to this parcellation...the defensive behaviors (freezing, flight or fight) and aversion-related responses (switchoff behavior) were ascribed to the DMPAG, DLPAG, and LPAG (usually named the ‘dorsal’ PAG).” In our study, all recordings were conducted within the dPAG. Also, Figures 1B and S2 in our manuscript correspond to the -6.04 mm template from Paxinos & Watson’s atlas (1998), which is shown in the left panel in Author response image 1 and is considerably anterior to the location where the vlPAG emerges, as shown in the right panel. In our revised manuscript, we will provide a detailed definition of the dPAG, inclusive of dmPAG, dlPAG, and lPAG, and support this with the referenced literature.

      Author response image 1.

      2) Prior studies investigating the role of BLA neurons during a foraging vs. robot test similar to the one used in this study should be also cited and discussed (e.g., Amir et al 2019; Amir et al 2015). These two studies demonstrated that most neurons in the basal portion of the BLA exhibit inhibitory activity during foraging behavior and only a small fraction of neurons (~4%) display excitatory activity in response to the robot (in contrast to the 25% reported in the present study). A very accurate histological analysis of BLA recording sites should be performed to clarify whether distinct subregions of the BLA encode foraging and predator-related information, as previously shown in the two described studies.

      In the revised manuscript, we will discuss papers by Amir et al. (2015) and Amir et al. (2019) that utilized a similar 'approach food-avoid predator' paradigm. These studies found a correlation between the neuronal activities in the basolateral amygdala (BL) and the velocity of animal movement during foraging, regardless of the presence or absence of predators. Specifically, the majority of BL neurons were inhibited in both conditions, with only 4.5% being responsive to predators. Consequently, Amir et al. posited that amygdala activity predominantly aligns with behavioral output such as foraging, rather than with responses to threats.

      In contrast, our body of work (Kim et al., 2018; Kong et al., 2021; the present study) reveals that the majority of neurons in the BA/BLA displayed distinct responses in pre-robot and robot sessions. Kong et al. (2021) discussed in depth several factors that may account for this discrepancy, given that both Amir et al. and our research used similar behavioral paradigms. Differences in apparatus features, experimental procedures, and data analysis methodologies (refer to Amir et al., 2019) could be contributing to the conflicting results and interpretations concerning the significance of amygdalar neuronal activities.

      Additionally, our studies uniquely monitored the same set of amygdalar neurons during pre-robot and robot sessions, affording us the opportunity for a direct comparison of neuronal activities under different threat conditions.

      Another salient difference lies in the foraging success rates, which were markedly higher in Amir et al. (~80%) compared to our studies (<3-4%). We hypothesize that there may be an inverse relationship between the pellet procurement rate and the intensity of fear. The high foraging success rate in Amir et al., which correlates with subdued amygdalar activity, stands in contrast to our findings of heightened amygdalar activity associated with a lower foraging success rate. Supporting this notion, optogenetically induced amygdalar activity led naïve rats to abandon foraging and escape to the nest (Kong et al., 2021; the present study).

      3) An important claim of this study is that the PAG sends predator-related signals to BLA via the PVT (Fig. 4). The authors stated that PVT neurons labeled by intra-BLA injection of the retrograde tracer CTB were activated by the predator, but a proper immunohistochemical quantification with a control group was not provided to support this claim. To provide better support for their claim, the authors should quantify the double-labeled PVT neurons (cFos plus CTB positive neurons) during the robot test.

      As recommended, we will include a revised Fig. 4 in the manuscript to present the quantification of neurons that are double-labeled with c-Fos and CTB in the PVT. This updated figure will provide a more rigorous analysis and visual representation of the data.

      4) The AAV anterograde tracer deposit spread to a large part of the PAG, including dorsolateral and lateral PAG, and supraoculomotor regions (Fig. 4B). Is the projection to the PVT from the dPAG or other regions of the PAG?

      As previously addressed in response to Comment #1, the dPAG comprises the dmPAG, dlPAG, and lPAG. In the revised manuscript, we will acknowledge the diffusion of the AAV to the adjacent deep gray layer of the superior colliculus. Additionally, we are considering conducting more restricted AAV injections into the dPAG to verify terminal expressions in the PVT.

      (Concerns about the strength of the evidence supporting a role for the PVT)

      5) The authors conclude in the discussion section that the dPAG-amygdala pathway is involved in generating antipredatory defensive behavior. However, the current results are entirely based on correlational analyses of neural firing rate and there is no direct demonstration that the PAG provides information about the robot to the BLA. Therefore, the authors should tone down their interpretation or provide more evidence to support it by performing experiments applying inhibitory tools in the dPAG > PVT > BLA pathway and examining the impact on behavior and downstream neural firing.

      As suggested, we will moderate the assertions about the functional implications of the PVT, based on the data from anterograde and retrograde tracers, to present a more measured interpretation in the manuscript.

      (Other concerns)

      6) One of the main findings of this study is the observation that BLA neurons that are responsive to PAG photostimulation are preferentially recruited during the foraging vs. robot test (Fig. 3). However, the experimental design used to address this question is problematic because the laser photostimulation of PAG neurons preceded the foraging vs. robot test. Prior photoactivation of PAG may have caused indirect short-term synaptic plasticity in BLA cells, which would favor the response of these cells to the robot. Please see Oishi et al, 2019 PMID: 30621738, which demonstrated that 10 trains of 20Hz photoactivation (300 pulses each) were sufficient to induce LTP in brain slices.

      After approximately eight photostimulation trials of the dPAG, with 40 pulses each, the animals entered a post-photostimulation testing phase (referred to as "Post"; Fig. 3C), lasting 10-15 minutes over an average of eight trials before robot testing. Although the PAG does not directly project to the BLA, the remote possibility of trans-synaptic plasticity in the BLA cannot be completely excluded and will be acknowledged. Additionally, it is noteworthy that Oishi et al's (2019) study applied a total of 3,000 pulses (i.e., 10 15-s trains of 20-Hz pulses) and investigated CA3-CA3 synaptic plasticity, as opposed to a total of 320 pulses (i.e., 8 2-s trains of 20-Hz pulses) in our study.
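      The two pulse totals compared above follow from the number of trains, the train duration, and the pulse frequency. A minimal Python check of that arithmetic, using only the figures quoted in this response (the function name is illustrative and not taken from either study):

```python
def total_pulses(n_trains: int, train_duration_s: float, frequency_hz: float) -> int:
    """Total pulses delivered = trains x train duration (s) x pulse frequency (Hz)."""
    return round(n_trains * train_duration_s * frequency_hz)

# Oishi et al. (2019), as quoted above: 10 trains of 15 s at 20 Hz
oishi_total = total_pulses(10, 15, 20)

# Present study, as quoted above: 8 trains of 2 s at 20 Hz (40 pulses per train)
present_total = total_pulses(8, 2, 20)

print(oishi_total, present_total)  # 3000 320
```

      On these figures, the stimulation in the present study amounts to roughly one ninth of the pulses delivered in the cited protocol.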

      7) The authors should perform a longitudinal analysis of the behavioral responses of the rats across the trials to clarify whether the animals habituate to the robot or not. In Figure 1E, it appears that PAG neurons fire less across the trials, which could be associated with behavioral habituation to the predator robot. If that is the case, the activity of many other PAG and BLA neurons will also most likely vary according to the trial number, which would impact the current interpretation of the results.

      In Figure 1E, the y-axis represents the Z scores of individual dPAG neurons, instead of representing repeated tests of the same neuron across multiple trials. The raster plot in Figure 1F clearly depicts that the same dPAG neurons consistently display heightened neural activity in response to the approaching robot across successive trials.
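      For readers unfamiliar with the measure, a per-neuron Z score is commonly obtained by normalizing firing rates to the mean and standard deviation of a baseline window. The sketch below illustrates that generic calculation only; the choice of baseline window, the use of the sample standard deviation, and all numbers are assumptions made for illustration and are not taken from the manuscript.

```python
from statistics import mean, stdev

def z_score_rates(rates, baseline):
    """Normalize firing rates (Hz) to the mean and sample SD of a baseline window.

    Assumption: this is one common convention, not necessarily the
    normalization used for Figure 1E.
    """
    mu = mean(baseline)
    sigma = stdev(baseline)
    if sigma == 0:
        raise ValueError("baseline firing rate has zero variance")
    return [(r - mu) / sigma for r in rates]

# Hypothetical binned firing rates (Hz) for a single neuron
baseline_rates = [2.0, 4.0, 6.0]   # mean 4.0, sample SD 2.0
response_rates = [4.0, 8.0, 10.0]

z_scores = z_score_rates(response_rates, baseline_rates)
print(z_scores)
```

      Each row of a population plot built this way corresponds to a different neuron, which is why the vertical axis does not encode trial order.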

      8) In Figure 1, it is unclear why the authors compared the activity of neurons that respond to the robot activation against the activity of the neurons during the retrieval of the food pellets in the pre-robot and post-robot sessions. The best comparison would be aligning the cells that were responsive to the activation of the robot with the moment in which the animals run back to the nest after consuming the pellets during the pre-robot or post-robot sessions. This would enable the authors to demonstrate that the PAG responses are directly associated with the expression of escaping behavior in the presence of the robot rather than associated with the onset of goal-directed movement in the direction of the nest during the pre- and post-robot sessions. A graphic showing the correlation between PAG firing rate and escape response would also be informative.

      Figure 1E compares the dPAG neural activity when animals enter a designated pellet zone (time-stamped by camera tracking) during both pre-robot and post-robot trials to the dPAG neural activity when entering the robot trigger zone (time-stamped by robot activation). We wish to clarify that rats carry the large (0.5 g) pellet back to the nest for consumption rather than consume it in the open arena before returning to the nest.

      In our study, we aimed to investigate the direct response of dPAG neurons to the looming predator and explore the communication between dPAG and BLA in relation to antipredatory defensive responses. To build upon our previous research that suggests a potential role of dPAG in conveying such responses to the BLA (Kim et al., 2013) and the immediate firing of BLA neurons in response to predatory threats (Kim et al., 2018; Kong et al., 2021), we chose to narrow our testing window to a short latency period (< 500 ms) following robot activations. This specific time window allowed us to focus on the initial stages of the threat stimulus processing and minimize potential confounding factors such as the presence of residual firing activity triggered by the robot during the animals’ escape or any activity changes induced by the animals' behavior.

      Furthermore, Figure S1C clearly demonstrates that (i) increased activity of dPAG robot cells preceded the animals’ actual turning and fleeing behavior toward the nest, as indicated by the peak values of movement speed (dark yellow), and (ii) the presence of pellets did not affect activity changes of the robot cells during pre- and post-robot sessions. These observations suggest that the heightened activity of dPAG robot cells was not due to movement changes or pellet motivation.

      Lastly, as stated in the original manuscript, the vast majority of robot cells (90.9%) did not show significant correlations between movement speed and firing rates, lending further support to the interpretation that the dPAG activity observed was not merely a reflection of movement changes.

      References

      Bandler, R., Carrive, P., & Depaulis, A. (1991). Emerging principles of organization of the midbrain periaqueductal gray matter. The midbrain periaqueductal gray matter: functional, anatomical, and neurochemical organization, 1-8.

      Bandler, R. & Keay, K. A. (1996). Columnar organization in the midbrain periaqueductal gray and the integration of emotional expression. Progress in brain research, 107, 285-300.

      Bandler, R. & Shipley, M. T. (1994) Columnar organization in the midbrain periaqueductal gray: modules for emotional expression? Trends in Neurosciences, 17(9), 379-89.

      Carrive, P. (1993). The periaqueductal gray and defensive behavior: functional representation and neuronal organization. Behavioural brain research, 58(1-2), 27-47.

      Oishi, N., Nomoto, M., Ohkawa, N., Saitoh, Y., Sano, Y., Tsujimura, S., ... & Inokuchi, K. (2019). Artificial association of memory events by optogenetic stimulation of hippocampal CA3 cell ensembles. Molecular brain, 12, 1-10.

      Paxinos, G. & Watson, C. (1998). The Rat Brain in Stereotaxic Coordinates. Academic Press, San Diego.

      Schenberg, L. C., Póvoa, R. M. F., Costa, A. L. P., Caldellas, A. V., Tufik, S., & Bittencourt, A. S. (2005). Functional specializations within the tectum defense systems of the rat. Neuroscience & Biobehavioral Reviews, 29(8), 1279-1298.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This work by Ding et al uses agent-based simulations to explore the role of the structure of molecular motor myosin filaments in force generation in cytoskeletal structures. The focus of the study is on disordered actin bundles which can occur in the cell cytoskeleton and have also been investigated with in vitro purified protein experiments.

      Strengths:

      The key finding is that cooperative effects between multiple myosin filaments can enhance both total force and the efficiency of force generation (force per myosin). These trends were possible to obtain only because the detailed structure of the motor filaments with multiple heads is represented in the model.

      We appreciate your comments about the strength of our study.

      Weaknesses:

      It is not clearly described what scientific/biological questions about cellular force production the work answers. There should be more discussion of how their simulation results compare with existing experiments or can be tested in future experiments.

      Thank you for the comment. First, our study explains why non-muscle myosin II in stress fibers shows focal distributions rather than uniform distributions; if the filaments stay close together, they can generate much larger forces in the stress fibers via the cooperative overlap. Our study also predicts a difference between bipolar structures (found in skeletal muscle myosins and non-muscle myosins) and side polar structures (found in smooth muscle myosins) in terms of the likelihood of the cooperative overlap. As shown below, myosin filaments with the bipolar structure can add up their forces better than those with the side polar structure when their overlap level is the same. We will add discussion about these in the revised manuscript.

      Author response image 1.

      As the reviewer noticed, our results were briefly compared with prior observations in Ref. 4 (Thoresen et al., Biophys J, 2013) where different myosin isoforms were used for in vitro actin bundles. We will add more quantitative comparisons between the in vitro study and our results.

      In addition, at the end of the conclusion section, we suggested future experiments that can be used for verifying our results. In particular, experiments with synthetic myosin filaments with tunable geometry seem to be suitable for verifying our computational predictions and observations.

      The model assumptions and scientific context need to be described better.

      We apologize for the insufficient descriptions about the model. We will revise those parts to better explain model assumptions and scientific context.

      The network contractility seems to be a mere appendix to the bundle contractility which is presented in much more detail.

      We included some cases run with the two-dimensional network in this study to demonstrate the generality of our conclusions. We included only minimal preliminary results because we are currently working on a follow-up study with network structures. We hope that the reviewer will understand our intention and situation.

      Reviewer #2 (Public review):

      Summary:

      In this study, the authors use a mechanical model to investigate how the geometry and deformations of myosin II filaments influence their force generation. They introduce a force generation efficiency that is defined as the ratio of the total generated force and the maximal force that the motors can generate. By changing the architecture of the myosin II filaments, they study the force generation efficiency in different systems: two filaments, a disorganized bundle, and a 2D network. In the simple two-filament systems, they found that in the presence of actin cross-linking proteins motors cannot add up their force because of steric hindrances. In the disorganized bundle, the authors identified a critical overlap of motors for cooperative force generation. This overlap is also influenced by the arrangement of the motor on the filaments and influenced by the length of the bare zone between the motor heads.
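      The force generation efficiency described in this summary can be written as a simple ratio. The sketch below is an illustration only: it assumes that the maximal force equals the number of motors multiplied by a per-motor stall force, which the summary does not state, and all names and numbers are hypothetical rather than taken from the manuscript.

```python
def force_generation_efficiency(total_force_pn: float,
                                n_motors: int,
                                stall_force_pn: float) -> float:
    """Ratio of the total generated force to the maximal force the motors could generate.

    Assumption: maximal force = n_motors * stall_force_pn. The manuscript may
    define the maximal force differently.
    """
    max_force_pn = n_motors * stall_force_pn
    if max_force_pn <= 0:
        raise ValueError("maximal force must be positive")
    return total_force_pn / max_force_pn

# Hypothetical example: 50 motors with a 4 pN stall force jointly sustain 100 pN
efficiency = force_generation_efficiency(total_force_pn=100.0,
                                         n_motors=50,
                                         stall_force_pn=4.0)
print(efficiency)  # 0.5
```

      Under this definition, an efficiency of 1 would mean that every motor contributes its full force to the total, and lower values indicate that forces do not add up completely.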

      Strengths:

      The strength of the study is the identification of organizational principles in myosin II filaments that influence force generation. It provides a complementary mechanistic perspective on the operation of these motor filaments. The force generation efficiency and the cooperative overlap number are quantitative ways to characterize the force generation of molecular motors in clusters and between filaments. These quantities and their conceptual implications are most likely also applicable in other systems.

      Thank you for the comments about the strength of our study.

      Weaknesses:

      The detailed model that the authors present relies on over 20 numerical parameters that are listed in the supplement. Because of this vast amount of parameters, it is not clear how general the findings are. On the other hand, it was not obvious how specific the model is to myosin II, meaning how well it can describe experimental findings or make measurable predictions. The model seems to be quantitative, but the interpretation and connection to real experiments are rather qualitative in my point of view.

      As the reviewer mentioned, all agent-based computational models for simulating the actin cytoskeleton inevitably involve a large number of parameters. Some of the parameter values are not well known, so we have tuned our parameter values carefully by comparing our results with experimental observations in our previous studies since 2009.

      We were aware of the importance of a rigorous representation of the unbinding and walking rates of myosin motors, so we implemented in our model the parallel cluster model, which can predict those rates by taking the mechanochemical rates of myosin II into account. Thus, we are confident that our motors represent myosin II.

      In our manuscript, our results were compared with prior observations in Ref. 4 (Thoresen et al., Biophys J, 2013) several times. In particular, larger force generation with more myosin heads per thick filament was consistent between the experiment and our simulations.

      Our study can make various predictions. First, our study explains why non-muscle myosin II in stress fibers shows focal distributions rather than uniform distributions; if the filaments stay close together, they can generate much larger forces in the stress fibers via the cooperative overlap. Our study also predicts a difference between bipolar structures (found in skeletal muscle myosins and non-muscle myosins) and side polar structures (found in smooth muscle myosins) in terms of the likelihood of the cooperative overlap. As shown in Author response image 1, myosin filaments with the bipolar structure can add up their forces better than those with the side polar structure when their overlap level is the same. We will add more discussion about these in the revised manuscript.

      It was often difficult for me to follow what parameters were changed and what parameters were set to what numerical values when inspecting the curve shown in the figures. The manuscript could be more specific by explicitly giving numbers. For example, in the caption for Figure 6, instead of saying "is varied by changing the number of motor arms, the bare zone length, the spacing between motor arms", the authors could be more specific and give the ranges: "is varied by changing the number of motor arms from ... to ..., the bare zone length from ... to ..., and the spacing between motor arms from ... to ...".

      This unspecificity is also reflected in the text: "We ran simulations with a variation in either L<sub>sp</sub> or L<sub>bz</sub>" What is the range of this variation? "When L<sub>M</sub> was similar" similar to what? "despite different N<sub>M</sub>." What are the different values for N<sub>M</sub>? These are only a few examples that show that the text could be way more specific and quantitative instead of qualitative descriptions.

      We appreciate the comment. We will specify the range of the variation in each parameter in the revised manuscript.

      In the text, after equation (2) the authors discuss assumptions about the binding of the motor to the actin filament. I think these model-related assumptions and explanations should be discussed not in the results section but rather in the "model overview" section.

      Thank you for pointing this out. We will reorganize the text in the revised manuscript.

      The lines with different colors in Figure 2A are not explained. What systems and parameters do they represent?

      The different colors in Fig. 2A distinguish 20 cases. We will add an explanation of the colors to the figure caption in the revised manuscript.

    1. Author response:

      We thank the reviewers for their support of this work and insightful recommendations for how to improve it. We have provided specific responses to each reviewer comment below. To summarize how we intend to address the requested revisions:

      Many of the reviewers’ comments requested additional technical or quality details about the DMS libraries or assays (e.g., number of cells tested, number of sequencing reads, assay replication, assay sensitivity, library balance), and we provide additional information and analyses that we can incorporate into the relevant portions of the text, supplementary tables, and supplementary figures to address these questions.

      Some comments asked to clarify nomenclature/wording or provide additional labels to images, and we will make these changes as requested.

      A few questions would require additional experimental data to address. Where experiments have already been performed, we will incorporate those results or cite relevant work previously reported in the literature.

      Reviewer 1:

      Summary

      Howard et al. performed deep mutational scanning on the MC4R gene, using a reporter assay to investigate two distinct downstream pathways across multiple experimental conditions. They validated their findings with ClinVar data and previous studies. Additionally, they provided insights into the application of DMS results for personalized drug therapy and differential ligand responses across variant types.

      Strengths

      They captured over 99% of variants with robust signals and investigated subtle functionalities, such as pathway-specific activities and interactions with different ligands, by refining both the experimental design and analytical methods.

      Weaknesses

      While the study generated informative results, it lacks a detailed explanation regarding the input library, replicate correlation, and sequencing depth for a given number of cells.

      Additionally, there are several questions that it would be helpful for authors to clarify.

      (1) It would be helpful to clarify the information regarding the quality of the input library and experimental replicates. Are variants evenly represented in the library? Additionally, have the authors considered using long-read sequencing to confirm the presence of a single intended variant per construct? Finally, could the authors provide details on the correlation between experimental replicates under each condition?

      Are variants evenly represented in the library?

      We strive to achieve as evenly balanced a library as possible at every stage of the DMS process (e.g., from initial cloning in E. coli through integration into human cells). Below is a representative plot showing the number of barcodes per amino acid variant at each position in a given ~60 amino acid subregion of MC4R, which highlights how evenly variants are represented at the E. coli cloning stage.

      Author response image 1.

      We also make similar measurements after the library is integrated into HEK293T cell lines, and see similarly even coverage across all variants, as shown in the plot below.

      Author response image 2.

      Additionally, have the authors considered using long-read sequencing to confirm the presence of a single intended variant per construct?

      We agree long-read sequencing would be an excellent way to confirm that our constructs contain a single intended variant. However, we elected for an alternate method (outlined in more detail in Jones et al. 2020) that leverages multiple layers of validation. First, the oligo chip-synthesized portions of the protein containing the variants are cloned into a sequence-verified plasmid backbone, which greatly decreases the chances of spuriously generating a mutation in a different portion of the protein. We then sequence both the oligo portion and random barcode using overlapping paired-end reads during barcode mapping to avoid sequencing errors and to help detect DNA synthesis errors. At this stage, we computationally reject any constructs that have more than one variant. Given this, the vast majority of remaining unintended variants would come from somatic mutations introduced by the E. coli cloning or replication process, which should be low frequency. We have used our in-house full plasmid sequencing method, OCTOPUS, to sample and spot check this for several other DMS libraries we have generated using the same cloning methods. We have found variants in the plasmid backbone in only ~1% of plasmids in these libraries. Our statistical model also helps correct for this by accounting for barcode-specific variation. Finally, we believe this provides further motivation for having multiple barcodes per variant, which dilutes the effect of any unintended additional variants.

      Finally, could the authors provide details on the correlation between experimental replicates under each condition?

      Certainly! In general, the Gs reporter had higher correlation between replicates than the Gq system (r ~ 0.5 vs r ~ 0.4). The plots below show two representative correlations at the RNA-seq stage of read counts for barcodes between the low α-MSH conditions. One important advantage of our statistical model is that it’s able to leverage information from barcodes regardless of the number of replicates they appear in.

      Author response image 3.

      Since the functional readout of variants is conducted through RNA sequencing, it seems crucial to sequence a sufficient number of cells with adequate sequencing saturation. Could the authors clarify the coverage depth used for each RNA-seq experiment and how this depth was determined? Additionally, how many cells were sequenced in each experiment?

      This will be addressed by incorporating the following details into the manuscript:

      We seeded 17 million cells per replicate at the start of each assay and, with ~1.5x growth over the course of the assay, harvested ~25.5 million cells per replicate for RNA extraction and sequencing. We found this sufficient to get at least ~30-60x cellular coverage per amino acid variant.

      Total mapped reads per replicate at RNA-seq stage

      - Gs/CRE: 9.1-18.2 million mapped reads, median=12.3

      - Gq/UAS: 8.6-24.1 million mapped reads, median=14.5

      - Gs/CRE+Chaperone: 6.4-9.5 million mapped reads, median=7.5

      Reads per barcode distribution

      - Median read counts of 8, 10, and 6 reads per sample per barcode for Gs/CRE, Gq/UAS, and Gs/CRE+Chaperone assays, respectively.

      Barcodes per variant distribution

      - As reported, the median number of barcodes per variant across samples (the “median of medians”) is 56 for Gs/CRE and 28 for Gq/UAS

      - Additionally, it is 44 for Gs/CRE+Chaperone

      It appears that the frequencies of individual RNA-seq barcode variants were used as a proxy for MC4R activity. Would it be important to also normalize for heterogeneity in RNA-seq coverage across different cells in the experiment? Variability in cell representation (i.e., the distribution of variants across cells) could lead to misinterpretation of variant effects. For example, suppose barcode_a1 represents variant A and barcode_b1 represents variant B. If the RNA-seq results show 6 reads for barcode_a1 and 7 reads for barcode_b1, it might initially appear that both variants have similar effect sizes. However, if these reads correspond to 6 separate cells each containing 1 copy of barcode_a1, and only 1 cell containing 7 copies of barcode_b1, the interpretation changes significantly. Additionally, if certain variants occupy a larger proportion of the cell population, they are more likely to be overrepresented in RNA sequencing.

      We account for this heterogeneity in several ways. First, as shown above (Response to Reviewer 1, Question 1), we aim to have even representation of variants within our libraries. Second, we utilize compositional control conditions like forskolin or unstimulated conditions to obtain treatment-independent measurements of barcode abundance and, consequently, of mutant-vs-WT effects that are due to compositional rather than biological variability. We expect that variability observed under these controls is due to subtle effects of molecular cloning, gene expression, and stochasticity. Using these controls, we observe that mutant-vs-WT effects are generally close to zero in these normalization conditions (e.g., in untreated Gq, see Supplementary Figure 3) as compared to drug-treated conditions. For example, premature stops behave similarly to WT in normalization conditions. This indicates that mutant abundance is relatively homogeneous. Where there are barcode-dependent effects on abundance, we can use information from these conditions to normalize that effect. Finally, our mixed-effect model accounts for barcode-specific deviations from the expected mutant effect (e.g. a “high count” barcode consistently being high relative to the mean).

      Although the assay system appears to effectively represent MC4R functionality at the molecular level, we are curious about the potential disparity between the DMS score system and physiological relevance. How do variants reported in gnomAD distribute within the DMS scoring system?

      Figure 2D shows DMS scores (variant effect on Gs signaling) relative to human population frequency for all MC4R variants reported in gnomAD as of January 8, 2024.

      To measure Gq signaling, the authors used the GAL4-VPR relay system. Is there additional experimental data to support that this relay system accurately represents Gq signaling?

      The full Gq reporter uses an NFAT response element from the IL-2 promoter to regulate the expression of the GAL4-VPR relay. In this system, the activation of Gq signaling results in the activation of the NFAT response element, and this signal is then amplified by the GAL4-VPR relay. The NFAT response element has been previously well-validated to respond to the activation of Gq signaling (e.g., PMID: 8631834). We will add this reference to the text to further support the use of the Gq assay.

      Identifying the variants responsive to the corrector was impressive. However, we are curious about how the authors confirmed that the restoration of MC4R activity was due to the correction of the MC4R protein itself. Is there a possibility that the observed effect could be influenced by other factors affected by the corrector? When the corrector was applied to the cells, were any expected or unexpected differential gene expression changes observed?

      While we do not directly measure whether Ipsen-17 has effects on other signaling processes, previous work has shown that Ipsen-17 treatment does not indirectly alter signaling kinetics such as receptor internalization (Wang et al., 2014). Furthermore, our analysis methods inherently account for this by normalizing variant effects to WT signaling levels. Any observed rescue of a given variant inherently means that the variant is specifically more responsive to Ipsen-17 than WT, and the fact that different variants exhibit different levels of rescue is reassuring that the mechanism is on target to MC4R. Lastly, Ipsen-17 is known to be an antagonist of alpha-MSH activity and is thought to bind directly to the same site on MC4R (Wang et al., 2014).

      As mentioned in the introduction, gain-of-function (GoF) variants are known to be protective against obesity. It would be interesting to see further studies on the observed GoF variants. Do the authors have any plans for additional research on these variants?

      We agree this would be an excellent line of inquiry, but due to changes in company priorities we unfortunately do not have any plans for additional research on these variants.

      Reviewer 2:

      Overview

      In this manuscript, the authors use deep mutational scanning to assess the effect of ~6,600 protein-coding variants in MC4R, a G protein-coupled receptor associated with obesity. Reasoning that current deep mutational scanning approaches are insufficiently precise for some drug development applications, they focus on articulating new, more precise approaches. These approaches, which include a new statistical model and innovative reporter assay, enable them to probe molecular phenotypes directly relevant to the development of drugs that target this receptor with high precision and statistical rigor.

      They use the resulting data for a variety of purposes, including probing the relationship between MC4R's sequence and structure, analyzing the effect of clinically important variants, identifying variants that disrupt downstream MC4R signaling via one but not both pathways, identifying loss-of-function variants that are amenable to a corrector drug, and exploring how deep mutational scanning data could guide small molecule drug optimization.

      Strengths

      The analysis and statistical framework developed by the authors represent a significant advance. In particular, the study makes use of barcode-level internally replicated measurements to more accurately estimate measurement noise.

      The framework allows variant effects to be compared across experimental conditions, a task that is currently hard to do with rigor. Thus, this framework will be applicable to a large number of existing and future deep mutational scanning experiments.

      The authors refine their existing barcode transcription-based assay for GPCR signaling, and develop a clever new "relay" reporter system to boost signaling in a particular pathway. They show that these reporters can be used to measure both gain of function and loss of function effects, which many deep mutational scanning approaches cannot do.

      The use of systematic approaches to integrate and then interrogate high-dimensional deep mutational scanning data is a big strength. For example, the authors applied PCA to the variant effect results from reporters for two different MC4R signaling pathways and were able to discover variants that biased signaling through one or the other pathway. This approach paves the way for analyses of higher dimensional deep mutational scans.

      The authors use the deep mutational scanning data they collect to map how different variants impact the ability of small molecule agonists to activate MC4R signaling. This is an exciting idea, because developing small-molecule protein-targeting therapeutics is difficult, and this manuscript suggests a new way to map small-molecule-protein interactions.

      Weaknesses

      The authors derive insights into the relationship between MC4R signaling through different pathways and its structure. While these make sense based on what is already known, the manuscript would be stronger if some of these insights were validated using methods other than deep mutational scanning.

      Likewise, the authors use their data to identify positions where variants disrupt MC4R activation by one small molecule agonist but not another. They hypothesize these effects point to positions that are more or less important for the binding of different small molecule agonists. The manuscript would be stronger if some of these insights were explored further.

      Impact

      In this manuscript, the authors present new methods, including a statistical framework for analyzing deep mutational scanning data that will have a broad impact. They also generate MC4R variant effect data that is of interest to the GPCR community.

    1. Author response:

      Reviewer 1:

      There are no significant weaknesses to signal in the manuscript. However, in order to fully conclude that there is no obvious advantage for the linguistic dimension in neonates, it would have been most useful to test a third condition in which the two dimensions were pitted against each other, that is, in which they provide conflicting information as to the boundaries of the words comprised in the artificial language. This last condition would have allowed us to determine whether statistical learning weighs linguistic and non-linguistic features equally, or whether phonetic content is preferentially processed.

      We appreciate the reviewer's suggestion that a stream with conflicting information would provide valuable insights. In the present study, we started with a simpler case involving two orthogonal features (i.e., phonemes and voices), with one feature being informative and the other uninformative, and we found similar learning capacities for both. Future work should explore whether infants, and humans more broadly, can simultaneously track regularities in multiple speech features. However, creating a stream with two conflicting statistical structures is challenging. To use neural entrainment, the two features must lead to segmentation at different chunk sizes so that their effects lead to changes in power/PLV at different frequencies, for instance by using duplets for the voice dimension and triplets for the linguistic dimension (or vice versa). Consequently, the two dimensions would not be directly comparable within the same participant in terms of the number of distinguishable syllables/voices, memory demand, or SNR given the 1/f decrease in amplitude of background EEG activity. This would involve comparisons between two distinct groups, counterbalancing chunk size and linguistic versus non-linguistic dimension. Considering the test phase, words for one dimension would have been part-words for the other dimension. As we are measuring differences and not preferences, interpreting the results would also have been difficult. Additionally, it may be difficult to find a sufficient number of clearly discriminable voices for such a design (triplets imply 12 voices). Therefore, an entirely different experimental paradigm would need to be developed.

      If such a design were tested, one possibility is that the regularities for the two dimensions are calculated in parallel, in line with the idea that the calculation of statistical regularities is a ubiquitous implicit mechanism (see Benjamin et al., 2024, for a proposed neural mechanism). Yet, similar to our present study, possibly only phonetic features would be used as word candidates. Another possibility is that only one informative feature would be explicitly processed at a time due to the serial nature of perceptual awareness, which may prioritise one feature over the other.

      Note: The reviewer’s summary contains a typo: syllabic rate (4 Hz) –not 2 Hz, and word rate (2 Hz) –not 4 Hz.

      Reviewer 2:

      N400: I am skeptical regarding the interpretation of the phoneme-specific ERP effect as a precursor of the N400 and would suggest toning it down. While the authors are correct in that infant ERP components are typically slower and more posterior compared to adult components, and the observed pattern is hence consistent with an adult N400, at the same time, it could also be a lot of other things. On a functional level, I can't follow the authors' argument as to why a violation in phoneme regularity should elicit an N400, since there is no evidence for any semantic processing involved. In sum, I think there is just not enough evidence from the present paradigm to confidently call it an N400.

      The reviewer is correct that we cannot definitively determine the type of processing reflected by the ERP component that appears when neonates hear a triplet after exposure to a stream with phonetic regularities. We interpreted this component as a precursor to the N400, based on prior findings in speech segmentation tasks without semantic content, where a ~400 ms component emerged when adult participants recognised pseudowords (Sander et al., 2002) or during structured streams of syllables (Cunillera et al., 2006, 2009). Additionally, the component we observed had a similar topography and timing to those labelled as N400 in infant studies, where semantic processing was involved (Parise et al., 2010; Friedrich & Friederici, 2011).

      Given our experimental design, the difference we observed must be related to the type of regularity during familiarisation (either phonemes or voices). Thus, we interpreted this component as reflecting lexical search— a process which could be triggered by a linguistic structure but which would not be relevant to a non-linguistic regularity such as voices. However, we are open to alternative interpretations. In any case, this difference between the two streams reveals that computing regularities based on phonemes versus voices does not lead to the same processes. We will revise and tone down the corresponding part of the discussion to clarify that it is just a possible interpretation of the results.  

      Female and male voices: Why did the authors choose to include male and female voices? While using both female and male stimuli of course leads to a higher generalizability, it also introduces a second dimension for one feature that is not present for the other (i.e., phoneme for Experiment 1 and voice identity plus gender for Experiment 2). Hence, couldn't it also be that the infants extracted the regularity with which one gender voice followed the other? For instance, in List B, in the words, one gender is always followed by the other (M-F or F-M), while in 2/3 of the part-words, the gender is repeated (F-F and M-M). Wouldn't you expect the same pattern of results if infants learned regularities based on gender rather than identity?

      We used three female and three male voices to maximise acoustic variability. The streams were synthesised using MBROLA, which provides a limited set of artificial voices. Indeed, there were not enough French voices of acceptable quality, so we also used two Italian voices (the phonemes used existed in both Italian and French).

      Voices differ in timbre, and female voices tend to be higher pitched. However, it is sometimes difficult to categorise low-pitched female voices and high-pitched male voices. Given that gender may be an important factor in infants' speech perception (newborns, for instance, prefer female voices at birth), we conducted tests to assess whether this dimension could have influenced our results.  

      We first quantified the transitional probabilities matrices during the structured stream of Experiment 2, considering that there are only two types of voices: Female and Male.  

      For List A, all transition probabilities are equal to 0.5 (P(M|F), P(F|M), P(M|M), P(F|F)), resulting in flat TPs throughout the stream (see Author response image 1, top). Therefore, we would not expect neural entrainment at the word rate (2 Hz), nor would we anticipate ERP differences between the presented duplets in the test phase.

      For List B, P(M|F)=P(F|M)=0.66 while P(M|M)=P(F|F)=0.33. However, this does not produce a regular pattern of TP drops throughout the stream (see Author response image 1, bottom). As a result, strong neural entrainment at 2 Hz was unlikely, although some degree of entrainment might have occasionally occurred due to some drops occurring at a 2 Hz frequency. Regarding the test phase, all three Words and only one Part-word presented alternating patterns (TP=0.6). Therefore, the difference in the ERPs between Words and Part-words in List B might be attributed to gender alternation.

      However, it seems unlikely that gender alternation alone explains the entire pattern of results, as the effect is inconsistent and appears in only one of the lists. To rule out this possibility, we analysed the effects in each list separately.

      Author response image 1.

      Transition probabilities (TPs) across the structured stream in Experiment 2, considering voices processed by gender (Female or Male). Top: List A. Bottom: List B.
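
      The gender-level TPs quoted above can be checked with a short enumeration. The sketch below is illustrative only: it assumes three voice duplets per list, each word equally frequent and never immediately repeated, and the gender assignments in `LIST_A` and `LIST_B` are hypothetical choices that happen to reproduce the stated values, not the actual lists used in the experiment.

```python
from collections import defaultdict
from fractions import Fraction

# Hypothetical gender patterns of the three voice duplets in each list.
LIST_A = [("F", "F"), ("M", "M"), ("F", "M")]
LIST_B = [("F", "M"), ("M", "F"), ("F", "M")]


def gender_tps(words):
    """Return {(current, next): P(next | current)} for a stream of duplets.

    Assumes each word is equally frequent and is followed by one of the
    other words with equal probability (no immediate repetition).
    """
    n = len(words)
    weights = defaultdict(Fraction)  # expected frequency of each transition
    for i, (first, second) in enumerate(words):
        # Within-word transition, once per presentation of the word.
        weights[(first, second)] += Fraction(1, n)
        # Between-word transition to the first voice of another word.
        for j, (next_first, _) in enumerate(words):
            if j != i:
                weights[(second, next_first)] += Fraction(1, n * (n - 1))
    tps = {}
    for (current, nxt), weight in weights.items():
        total = sum(w for (c, _), w in weights.items() if c == current)
        tps[(current, nxt)] = weight / total
    return tps


if __name__ == "__main__":
    for name, words in (("List A", LIST_A), ("List B", LIST_B)):
        for (current, nxt), tp in sorted(gender_tps(words).items()):
            print(f"{name}: P({nxt}|{current}) = {float(tp):.2f}")
```

      Under these assumptions, List A gives 0.5 for every gender transition, while List B gives 2/3 for alternations and 1/3 for repetitions.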

      We computed the mean activation within the time windows and electrodes of interest and compared the effects of word type and list using a two-way ANOVA. For the difference between Words and Part-words over the positive cluster, we observed a main effect of word type (F(1,31) = 5.902, p = 0.021), with no effects of list or interactions (p > 0.1). Over the negative cluster, we again observed a main effect of word type (F(1,31) = 10.916, p = 0.0016), with no effects of list or interactions (p > 0.1). See Author response image 2.  

      Author response image 2.

      Difference in ERP voltage (Words – Part-words) for the two lists (A and B); W=Words; P=Part-words.

      We conducted a similar analysis for neural entrainment during the structured stream on voices. A comparison of entrainment at 2 Hz between participants who completed List A and List B showed no significant differences (t(30) = -0.27, p = 0.79). A test against zero for each list indicated significant entrainment in both cases (List A: t(17) = 4.44, p = 0.00036; List B: t(13) = 3.16, p = 0.0075). See Author response image 3.

      Author response image 3.

      Neural entrainment at 2Hz during the structured stream of Experiment 2 for Lists A and B.

      Words entrainment over occipital electrodes: Do you have any idea why the duplet entrainment effect occurs over the electrodes it does, in particular over the occipital electrodes (which seems a bit unintuitive given that this is a purely auditory experiment with sleeping neonates).

      Neural entrainment might be considered as a succession of evoked responses induced by the stream. After applying an average reference in high-density EEG recordings, the auditory ERP in neonates typically consists of a central positivity and a posterior negativity with a source located at the electrical zero in a single-dipole model (i.e., approximately in the superior temporal region; Dehaene-Lambertz & Dehaene, 1994). In adults, because of the average reference (i.e., the sum of voltages is equal to zero at each time point) and because the electrodes cannot capture the negative pole of the auditory response, the negativity is distributed around the head. In infants, however, the brain is higher within the skull, allowing for a more accurate recording of the negative pole of the auditory ERP (see Author response image 4 for the location of electrodes in an infant head model).

      Besides the posterior electrodes, we can see some entrainment on more anterior electrodes that probably corresponds to the positive pole of the auditory ERP.

      Author response image 4.

      International 10–20 sensors' location on the skull of an infant template, with the underlying 3-D reconstruction of the grey-white matter interface and projection of each electrode to the cortex. Computed across 16 infants (from Kabdebon et al, Neuroimage, 2014). The O1, O2, T5, and T6 electrodes project lower than in adults.
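
      The average-reference property mentioned above (voltages summing to zero at each time point) can be illustrated with a minimal sketch. The function name and the toy voltages are hypothetical and are not taken from the authors' preprocessing pipeline.

```python
def average_reference(samples):
    """Re-reference EEG data to the average of all electrodes.

    `samples` is a list of time points, each a list of per-electrode
    voltages. The mean across electrodes is subtracted at every time
    point, so the re-referenced voltages sum to zero at each time point.
    """
    rereferenced = []
    for voltages in samples:
        mean = sum(voltages) / len(voltages)
        rereferenced.append([v - mean for v in voltages])
    return rereferenced


if __name__ == "__main__":
    toy = [[1.0, 2.0, 6.0], [-3.0, 0.5, 0.5]]  # 2 time points, 3 electrodes
    for row in average_reference(toy):
        print(row, "sum =", round(sum(row), 12))
```

      Because the sum is constrained to zero, a positivity recorded over some electrodes must be balanced by negativity elsewhere, which is why the visible topography depends on which pole of the response the sensors can capture.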

      Reviewer 3:

      (1) While it's true that voice is not essential for language (i.e., sign languages are implemented over gestures; the use of voices to produce non-linguistic sounds, like laughter), it is a feature of spoken languages. Thus I'm not sure if we can really consider this study as a comparison between linguistic and non-linguistic dimensions. In turn, I'm not sure that these results show that statistical learning at birth operates on non-linguistic features, voices being a linguistic dimension at least in spoken languages. I'd like to hear the authors' opinions on this.

      On one hand, it has been shown that statistical learning (SL) operates across multiple modalities and domains in human adults and animals. On the other hand, SL is considered essential for infants to begin parsing speech. Therefore, we aimed to investigate whether SL capacities at birth are more effective on linguistic dimensions of speech, potentially as a way to promote language learning.

      We agree with the reviewer that voices play an important role in communication (e.g., for identifying who is speaking); however, they do not contribute to language structure or meaning, and listeners are expected to normalize across voices to accurately perceive phonemes and words. Thus, voices are speech features but not linguistic features. Additionally, in natural speech, there are no abrupt voice changes within a word as in our experiment; instead, voice changes typically occur on a longer timescale and involve only a limited number of voices, such as in a dialogue. Therefore, computing regularities based on voice changes would not be useful in real-life language learning. We considered that contrasting syllables and voices was an elegant way to test SL beyond its linguistic dimension, as the experimental paradigm is identical in both experiments.  

      Along the same line, in the Discussion section, the present results are interpreted within a theoretical framework showing statistical learning in auditory non-linguistic (string of tones, music) and visual domains as well as visual and other animal species. I'm not sure if that theoretical framework is the right fit for the present results.

      (2) I'm not sure whether the fact that we see parallel and independent tracking of statistics in the two dimensions of speech at birth indicates that newborns would be able to do so in all the other dimensions of the speech. If so, what other dimensions are the authors referring to?

      The reviewer is correct that demonstrating the universality of SL requires testing additional modalities and acoustic dimensions. However, we postulate that SL is grounded in a basic mechanism of long-term associative learning, as proposed in Benjamin et al. (2024), which relies on a slow decay in the representation of a given event. This simple mechanism, capable of operating on any representational output, accounts for many types of sequence learning reported in the literature (Benjamin et al., in preparation). We will revise the discussion section to clarify this theoretical framework.

      (3) Lines 341-345: Statistical learning is an evolutionarily ancient learning mechanism but I do not think that the present results are showing it. This is a study on human neonates and adults; there are no other animal species involved, therefore I do not see a connection with the evolutionary history of statistical learning. It would be much more interesting to make claims on the ontogeny (rather than phylogeny) of statistical learning, and what regularities newborns are able to detect right after birth. I believe that this is one of the strengths of this work.

      We did not intend to make claims about the phylogeny of SL. Since SL appears to be a learning mechanism shared across species, we use it as a framework to suggest that SL may arise from general operational principles applicable to diverse neural networks. Thus, while it is highly useful for language acquisition, it is not specific to it. We will revise this section to tone down our claims.  

      (4) The description of the stimuli in Lines 110-113 is a bit confusing. In Experiment 1, e.g., "pe" and "tu" are both uttered by the same voice, correct? ("random voice each time" is confusing). Whereas in Experiment 2, e.g., "pe" and "tu" are uttered by different voices, for example, "pe" by yellow voice and "tu" by red voice. If this is correct, then I recommend the authors to rephrase this section to make it more clear.

      To clarify, in Experiment 1, the voices were randomly assigned to each syllable, with the constraint that no voice was repeated consecutively. This means that syllables within the same word were spoken by different voices, and each syllable was heard with various voices throughout the stream. As a result, neonates had to retrieve the words based solely on syllabic patterns, without relying on consistent voice associations or specific voice relationships.

      In Experiment 2, the design was orthogonal: while the syllables were presented in a random order, the voices followed a structured pattern. Similar to Experiment 1, each syllable (e.g., “pe” and “tu”) was spoken by different voices. The key difference is that in Experiment 2, the structured regularities were applied to the voices rather than the syllables. In other words, the “green” voice was, for example, always followed by the “red” voice, but the two voices uttered different syllables each time.

      We will revise the methods section to clarify these important points.

      (5) Line 114: the sentence "they should compute a 36 x 36 TPs matrix relating each acoustic signal, with TPs alternating between 1/6 within words and 1/12 between words" is confusing as it seems like there are different acoustic signals. Can the authors clarify this point?

      Thank you for highlighting this point. To clarify, our suggestion is that neonates might not track regularities between phonemes and voices as separate features. Instead, they may treat each syllable-voice combination as a distinct item—for example, "pe" spoken by the "yellow" voice is one item, while "pe" spoken by the "red" voice is another. Under this scenario, there would be a total of 36 unique items (6 syllables × 6 voices), and infants would need to track regularities between these 36 combinations.

      We will rephrase this sentence in the manuscript to make it clearer.
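The item count and the two TP values quoted in point (5) can be reproduced with a short sketch. It is illustrative only and rests on assumptions that are not stated in the response: the syllable and voice labels are placeholders, the syllable-level TP is taken to be 1 within words and 1/2 between words, and the next voice is taken to be drawn uniformly from all six voices. If immediate voice repetition is excluded, the voice term would differ.

```python
from fractions import Fraction
from itertools import product

# Placeholder labels (assumption): the response names only a few syllables
# ("pe", "tu") and voices ("yellow", "red", "green").
SYLLABLES = ["syl1", "syl2", "syl3", "syl4", "syl5", "syl6"]
VOICES = ["voice1", "voice2", "voice3", "voice4", "voice5", "voice6"]

# Each syllable-voice pairing is treated as one distinct item.
ITEMS = list(product(SYLLABLES, VOICES))


def joint_tp(syllable_tp: Fraction, n_voices: int) -> Fraction:
    """TP between two syllable-voice items, assuming the next voice is
    drawn uniformly and independently of the syllable."""
    return syllable_tp * Fraction(1, n_voices)


if __name__ == "__main__":
    n = len(ITEMS)
    print(f"{n} items, so a {n} x {n} TP matrix")
    # Assumed syllable-level TPs: 1 within words, 1/2 between words.
    print("within words:", joint_tp(Fraction(1), len(VOICES)))
    print("between words:", joint_tp(Fraction(1, 2), len(VOICES)))
```

Under these assumptions the joint TP is the product of the syllable TP and the voice TP, which is why the within-word and between-word values differ by a factor of two.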

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      (1) I was surprised at the effect of BicD2 knockdown on LAMP (and VPS41) localization, which really suggests that in HeLa and Cos7 cells, BicD2 regulation of Kinesin-1 (rather than dynein) is the primary driver of lysosome localization. The KIF5B-knockout rescue of the BicD2 overexpression phenotype was a very powerful result that supports this conclusion. Have the authors looked at other cargos, e.g., Golgi or centrosomes in G2? Can the authors include more discussion about what this result means or how they imagine dynein and kinesin-1's interaction with BicD2 is regulated?

      We have performed this experiment as requested by the reviewer. The BICD2 siRNA also resulted in Golgi fragmentation and localization defects of the centrosome in cells that are in G2 phase of the cell cycle (Supplemental Fig. 2E-H).

      We have also added additional discussion related to how BICD2 might couple cargos to opposite polarity motors (lines 440-447). Interestingly, the lysosome motility defect we observe upon BICD2 knock down has similarity to the RAB6A trafficking phenotype. In both cases, what one sees is a sharp reduction in the number of motile particles rather than a reversal in the direction of motility. This suggests that both motors are involved in the steady state distribution of these cargoes.

      (2) Have the authors examined if the SMALED mutants show diminished or increased binding to KIF5B? While the authors are correct that the mutations could hyperactivate dynein because they reduce BicD2 autoinhibition, it is possible that the SMALED mutants hyperactivate dynein because they no longer bind kinesin. This would be particularly interesting, given the complex relationship between BicD2 regulation of dynein and kinesin that the authors show in Figure 3. 

      Thank you for this suggestion. We had not considered this. We have added this experiment in the revised manuscript (Supplemental Fig. 3H, I). We find that the interaction between wild-type BICD2 and KIF5B is only slightly above the control. This is consistent with published findings that indicate that although the isolated CC2 domain of BICD2 is able to interact with KIF5B, the binding is lower for the full-length protein. This is most likely due to the intramolecular interaction between the N and C-termini of BICD2 partially blocking the binding site. Interestingly, however, all three mutants display a reduced interaction with KIF5B, with the reduction being most severe for the cargo domain binding mutants. Thus, as we discuss in the revised manuscript, dynein hyperactivity likely results from increased binding to dynein and a concurrent reduction in binding to KIF5B.

      (3) What is already known about the protein GRAMD1A? Did the authors choose to focus on GRAMD1A because it was the only novel interaction found in the SMALED mutant interactomes, or was this protein interesting for a different reason? Does the known function of GRAMD1A explain the potential dysfunction of cells expressing BICD2_R747C or patients who have this mutation? More discussion of this protein and why the authors focused on it would really strengthen the manuscript. 

      We chose to focus on GRAMD1A for a few reasons. The protein that displayed the highest gain of function interaction with BICD2_R747C in our proteomic analysis was Plastin. However, using at least one antibody against Plastin, we were not able to validate this result. In addition, we had previously performed a proteomic screen using a BICD2_R747A (arginine to alanine) mutation and had compared the interactome of this mutant to the wild-type protein. Plastin was not recovered in that screen but the top hit was GRAMD1A. Given that we isolated GRAMD1A in two separate screens as a gain of function interaction, we believed the result was worth focusing on for follow-up studies.

      GRAMD1A (as well as its paralogs GRAMD1B and C) functions in non-vesicular transport of accessible cholesterol from the plasma membrane to the ER. We have added additional discussion on GRAMD1A (lines 484-495). While we observe a relocalization of GRAMD1A in mutant-expressing cells, we do not know whether this is sufficient to result in cholesterol transport defects. There are several routes for cholesterol uptake, with the GRAMD1A pathway representing just one of these routes.

      Reviewer #2 (Public review):

      (1) The authors use cells that have been engineered to express the different BICD2 constructs. As shown in Figure 4B, the authors see wide expression of BICD2_WT throughout the cell. However, WT BICD2 usually localizes to the TGN. This widespread localization introduces some uncertainty about the interactome data. The authors should either try to verify the interaction data (specifically with the HOPS complex and GRAMD1A) by immunoprecipitating endogenous BICD2 or by repeating their interactome experiment in Figure 1 using BICD2 knockout cells that express the BICD2_WT construct. This should also be done to verify the immunoprecipitation and microscopy data shown in Figure 7. 

      The localization of our exogenous BICD2-mNeon constructs is similar to what others have seen using GFP tagged versions of the protein (for example Peeters et al., 2013). In addition, in the experiment shown in the initial version of the paper, we were focusing on the centrosomal localization of BICD2. However, our BICD2-mNeon construct is also observed at the Golgi, in addition to its localization throughout the cell (Supplemental Fig. 3C). 

      We attempted to perform a co-immunoprecipitation experiment using endogenous proteins as suggested by the reviewer. Although a rabbit polyclonal antibody was able to co-immunoprecipitate RANBP2 with BICD2, the antibody complex of heavy and light chains co-migrated with the VPS41 band and was abundantly detected by the secondary antibody used in the western blot. Thus, we were not able to draw a conclusion regarding whether or not VPS41 was present in the co-immunoprecipitate. We also attempted the experiment using a mouse monoclonal antibody against BICD2. However, this antibody failed in the immunoprecipitation experiment and we could not detect either RANBP2 (a validated cargo) or VPS41. Although the VPS41 antibody we used in the paper works for western blot, it does not recognize the native protein. Thus, despite our best efforts, we are not able to draw a valid conclusion from these co-IP experiments.

      It is beyond the scope of the revision to perform the entire experiment in a BICD2 KO cell line. A BICD2 KO cell line does not exist and it would take several months to make such a knockout in the FLP IN HEK cells that were used in this manuscript. However, we have validated the interaction between BICD2 and VPS41 in cells that have been depleted of endogenous BICD2 (Supplemental Fig. 1B). The transgenic constructs contain silent mutations that make them refractory to BICD2 siRNA-1. Thus, although endogenous BICD2 is depleted by the siRNA treatment, wild-type and mutant BICD2_TurboID are not. A similar approach was also used to demonstrate the gain of function interaction between BICD2_R747C and GRAMD1A in cells depleted of endogenous BICD2 (Supplemental Fig. 5A).

      (2) The authors conclude that cargo transport defects resulting from BICD2 mutations may contribute to SMALED2 symptoms. However, the authors are unable to determine if BICD2 directly binds to the potential new cargo, the HOPS complex. To address this, the authors could purify full-length WT BICD2 and perform in vitro experiments. Furthermore, the authors were unable to identify the minimal region of BICD2 needed for HOPS interaction. The authors could expand on the experiment attempted with the extended BICD2 C-terminal using a deltaCC1 construct, which could also be used for in vitro experiments. 

      We have not been successful in purifying full length BICD2 in bacteria, perhaps due to solubility issues. However, we have added several experiments to further examine the nature of the BICD2-HOPS complex interaction.

      We have performed the experiment as requested. We find that BICD2_delCC1 is able to bind VPS41, but not as efficiently as the full length protein. However, unlike the CC3 cargo binding construct, the BICD2_delCC1 construct also displays reduced binding to RANBP2 (Supplemental Fig. 1D). We attribute this defect to either the intramolecular BICD2 interaction blocking cargo binding or potentially to a folding defect in the BICD2_delCC1 construct. Thus, although we performed this experiment as suggested by the reviewer, we are not able to make a solid conclusion.

      Based on the fact that VPS41 was the most abundantly detected HOPS component in the BICD2 interactome, we hypothesized that it was the point of direct contact between BICD2 and the HOPS complex. However, contrary to our hypothesis, depletion of VPS41 did not compromise the association between BICD2 and VPS16 and VPS18 (Supplemental Fig. 1E). Thus, we conclude that there are multiple points of contact between BICD2 and the HOPS complex, with BICD2 perhaps recognizing a common motif or domain present in these proteins.

      We next attempted to map the interaction site using AlphaFold2 Multimer. Although we were able to use this platform to predict a high confidence interaction between BICD2 and RAB6A (consistent with published results), this did not yield a high confidence prediction for the BICD2-HOPS complex interaction.

      Ultimately, although we added several new experiments, we were not able to determine the minimal region for binding, nor whether the interaction is direct or indirect. These caveats are clearly stated in the revised manuscript. Regardless of whether the interaction is direct or indirect, however, it is noteworthy that the association between BICD2 and the HOPS complex is reduced by the R747C SMALED2 mutation.

      (3) Again, the authors conclude that BICD2 mutants cause cargo transport defects that are likely to lead to SMALED2 symptoms. This would be better supported if the authors are able to find a protein relevant to SMALED2 and examine if/how its localization is changed under expression of the BICD2 mutants. The authors currently use the HOPS complex and GRAMD1A as indicators of cargo transport defects, but it is unclear if these are relevant to SMALED2 symptoms. 

      This point was addressed in the general discussion. Given the complexity of SMALED2 (autosomal dominant disorder; variable phenotypic severity; adult-onset disorder in many instances, etc.) it is very hard to model in a cell line. One of the reasons we focused our studies on the HOPS complex and VPS41 in particular was because mutations in VPS41 are associated with spinocerebellar ataxia, a neurodevelopmental disorder. However, we cannot conclude whether the reduction/loss of interaction of BICD2 with the HOPS complex is causative for disease symptoms. We also cannot conclude at present whether the mis-targeting of GRAMD1A is causative for disease symptoms. We have discussed these caveats in the revised manuscript and have included a section in the discussion that specifically lists the limitations of our study (lines 511-530).

      With that said, we can conclude that mutations in the cargo binding domain of BICD2 result in dynein hyperactivity, altered BICD2 localization in hippocampal neurons, and reduced neurite growth. Given that we observe interactome changes in HEK cells, it is plausible that interactome changes also exist in motor neurons. However, even in the absence of interactome changes, hyperactivation of dynein alone can result in cargo trafficking defects; the same cargos can be excessively localized in the soma vs the axon. As noted previously, however, a thorough examination of these points will require the use of genetically engineered motor neurons and is beyond the scope of the current study.

      Reviewer #3 (Public review):

      Strengths: 

      Extensive interactomes are presented for both WT BicD2 as well as the disease mutants, which will be valuable for the community. The HOPS complex was identified as a novel interactor of BicD2, which is important for fusion of late endosomes and lysosomes, which is of interest, since some of the BicD2 disease mutations result in Golgi-fragmentation phenotypes. The interaction with the HOPS complex is affected by the R747C mutation, which also results in a gain-of-function interaction with GRAMD1A. 

      Weaknesses: 

      The manuscript should be strengthened by further evidence of the BicD2/HOPS complex interaction and the functional implications for spinal muscular atrophy by changes in the interactome through mutations. Which functional implications does the loss of the BicD2/HOPS complex interaction and the gain of function interaction with GRAMD1A have in the context of the R747C mutant? 

      (1) In the biotin proximity ligation assay, a large number of targets were identified, but it is not clear why only the HOPS complex was chosen for further verification. Immunoprecipitation was used for target verification, but due to the very high number of targets identified in the screen, and the fact that the HOPS complex is a membrane protein that could potentially be immunoprecipitated along with lysosomes or dynein, additional experiments to verify the interaction of BicD2 with the HOPS complex (reconstitution of a complex in vitro, GST-pull down of a complex from cell extracts or other approaches) are needed to strengthen the manuscript. 

      As discussed for reviewer 2 (point 2), we have added several experiments to better characterize the BICD2-HOPS complex interaction.

      We chose to focus on the HOPS complex for a few reasons. The list of interactions that displayed a >2 fold enrichment vs control was actually not that large (66 proteins). Within this list, we identified 4 out of 6 HOPS components and VPS41 was the 5th most enriched protein in the BICD2 interactome (RANBP2 by contrast was #16 on this list). Furthermore, the BICD2_R747C mutation resulted in greatly reduced interaction of BICD2 with the HOPS complex, whereas its interaction with dynein was increased. These results indicate that these proteins are not simply immunoprecipitating with the BICD2/dynein complex. Apart from the HOPS complex, lysosomal proteins were not present in the interactome, making it unlikely that they were identified due to non-specific interactions between BICD2 and co-precipitating lysosomes.

      (2) In the biotin proximity ligation assay, a large number of BicD2 interactions were identified that are distinct between the mutant and the WT, but it was not clear why GRAMD1A in particular was chosen as a gain-of-function interaction, and what the functional role of a BicD2/GRAMD1A interaction may be. A Western blot shows a strengthened interaction with the R747C mutant, but GRAMD1A also interacts with WT BicD2.

      Please see the above discussion on GRAMD1A (reviewer 1, point 3). GRAMD1A comes down non-specifically with the binding control as well as BICD2_wt. We therefore conclude that wildtype BICD2 does not specifically interact with GRAMD1A above background levels (Fig. 7, compare the control lane vs BICD2-wt).

      (3) Furthermore, the functional implications of changed interactions with HOPS and GRAMD1A in the R747C mutant are unclear. Additional experiments are needed to establish the functional implication of the loss of the BicD2/HOPS interaction in the BicD2/R747C mutant. For the GRAMD1A gain of function interaction, according to the authors, a significant amount of the protein localized with BicD2/R747C at the centrosomal region. This changed localization is not very clear from the presented images (no centrosomal or other markers were used, and the changed localization could also be an effect of dynein hyperactivation in the mutant). Furthermore, the functional implication of a changed localization of GRAMD1A is unclear from the presented data. 

      We have performed the experiment as requested by the reviewer. The re-localized GRAMD1A localizes adjacent to Pericentrin, a centrosomal marker (Supplemental Fig. 5B-F). GRAMD1A and BICD2 appear to co-localize in a ring around the Pericentrin marked centrosome.

      The re-localization of GRAMD1A to the centrosomal area by BICD2_R747C appears to be unique to this mutant, and not simply an issue of dynein hyperactivity. The other two mutants tested, BICD2_N188T and BICD2_R694C also hyperactivate dynein. However, they do not result in the same type of dramatic re-localization of GRAMD1A as we observe with the BICD2_R747C mutant. We conclude that this altered localization results from a gain of function interaction with BICD2_R747C as well as dynein hyperactivity.

      Reviewer #1 (Recommendations for the authors): 

      Please add a discussion about how the authors calculated the Cell Body enrichment shown in 5E. Is this a ratio of the BicD2 intensity in the cell body:axon? Did the authors normalize for potential differences in BicD2 variant expression? 

      Yes, it is a ratio of the intensity between the cell body and axon. This is described in the Methods section under quantification (lines 725-728). We attempted to image cells expressing similar amounts of protein.  

      Reviewer #2 (Recommendations for the authors): 

      (1) The paper would benefit from an explanation of why the authors chose to follow up on the HOPS complex out of all proteins identified in the interactome experiment. 

      This discussion has been included in the revised manuscript.  

      (2) In panel B of Supplementary Figure 1, RFP mTurbo has a significant amount of non-specific binding to VPS18. The authors note that in the initial interactome experiment, there was a twofold enrichment of this protein in BICD2 pulldown versus control. Do the authors have a co-IP that has a similar enrichment?

      VPS18 occasionally comes down non-specifically with our RFP-TurboID control. However, the interaction is specific, because very little VPS18 comes down with the BICD2 construct lacking the cargo binding domain (Fig. 2B). An additional example of the VPS18 binding result is shown in Supplemental Fig. 1E.

      (3) In Figure 2B, there seems to be less VPS18 in the input for BICD2 delCC3-mTurbo. Do the authors have a blot where there is equal input across all conditions? This may increase the slight signal seen in the pulldown.

      The blot shown in Supplemental Fig. 1C has equivalent load for VPS18 across all lanes. Minimal binding of VPS18 is observed with the BICD2_delCC3 sample.

      (4) In Figure 3, can the authors show representative images of GFP-VPS-41 and LAMP1 localization that are at the same magnification? It currently looks as if the localization pattern differs between the two under control siRNA. Alternatively, the authors should show colocalization of the two, as the authors note both are localized to late endosomes/lysosomes. 

      We have provided additional images that are at the same magnification (Supplemental Fig. 2I-K). Co-localization between GFP-VPS41 (rabbit polyclonal antibody against GFP) and LAMP1 (rabbit polyclonal antibody) is not possible because both antibodies were raised in rabbit. However, published studies have shown that a subset of V5-tagged VPS41 vesicles are positive for LAMP1. We have cited this study.

      (5) In Supplementary Figure 2, the authors should show the knockdown efficiency of both BICD2 siRNAs. The VPS41 staining in panel B looks like there is less perinuclear localization than with BICD2 siRNA 1. Is this because of knockdown efficiency?

      We have included this data (Supplemental Fig. 2B). Both siRNAs are capable of depleting BICD2. However, we do see slightly more effective knock down with siRNA-1.

      (6) The data in Figure 4A would be more striking with quantification. 

      Quantifications have been provided (Supplemental Fig. 3A,B). Using a one-way ANOVA, BICD2_R747C is the only mutant that shows a significant change. Variability in the binding experiment resulted in the other two mutants not showing a statistically significant change. However, the additional assays that are provided (centrosomal enrichment of BICD2 and peroxisome tethering) clearly demonstrate that the R694C mutant also results in dynein hyperactivation. It should be noted that the analysis done by Huynh et al., 2017 also showed a binding increase between BICD2 disease mutants and dynein. However, due to binding variability, their results were not statistically significant.

      (7) Can the authors explain how centrosome enrichment is calculated in Figure 4F? The intensity of colocalization with the centrosome between mutant constructs visually does not look significantly different. Is this a ratio of centrosome localization to cell body localization? 

      We apologize for this omission. This has been added to the quantification section of the Methods (lines 721-723). Yes, it is a ratio of mean signal at the centrosome vs mean signal in the rest of the cell.
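As an illustration of the ratio described in this answer, the following minimal sketch divides the mean signal inside a region of interest by the mean signal in the rest of the cell. It is not the authors' analysis pipeline: the function name, the flat list of pixel intensities, and the boolean region mask are all hypothetical, and real measurements would come from segmented microscopy images.

```python
from statistics import mean


def enrichment_ratio(pixels, in_region):
    """Mean signal inside a region divided by mean signal outside it.

    pixels    -- intensities for every pixel of one cell (hypothetical input)
    in_region -- booleans, True where the pixel lies in the region of
                 interest (for example, the centrosome)
    """
    inside = [p for p, flag in zip(pixels, in_region) if flag]
    outside = [p for p, flag in zip(pixels, in_region) if not flag]
    if not inside or not outside:
        raise ValueError("both the region and the rest of the cell need pixels")
    rest = mean(outside)
    if rest == 0:
        raise ValueError("mean signal outside the region is zero")
    return mean(inside) / rest


if __name__ == "__main__":
    # Made-up intensities for a six-pixel "cell"; the middle two pixels
    # stand in for the centrosome.
    pixels = [10, 12, 40, 44, 8, 10]
    mask = [False, False, True, True, False, False]
    print(enrichment_ratio(pixels, mask))
```

A ratio above 1 indicates enrichment in the region relative to the rest of the cell. The same calculation would apply to the cell body versus axon comparison if the mask marks the cell body and the remaining pixels belong to the axon.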

      (8) The current input blot in Supplementary Figure 4A shows increasing amounts of importin beta across the lanes. Do the authors have a blot of panel A in which the input level of importin beta is the same between constructs? Does this change the level of importin beta that is pulled down?

      Another replicate of this experiment has been shown. We have retained the original experiment as well (Supplemental Figs. 4A, B).

      Reviewer #3 (Recommendations for the authors): 

      Minor points: 

      (1) In the .pdf version of the supplemental tables, the text is often cropped. It is recommended to delete the .pdf versions and just retain the Excel versions of the tables. 

      We are not sure why this occurred. Excel files were provided. In addition, the raw data from the mass spectrometry experiments will also be included with the final version of the manuscript.

      (2) Line 367: For transport of Rab6, kinesin-1 is the dominant motor, but dynein is still active and engaging in a tug of war (Serra Marquez et al 2022). 

      Thank you. We have revised our text to include this discussion. In this regard, LAMP1 vesicles are similar. Loss of BICD2 results in a greater number of stationary vesicles rather than vesicles that are excessively targeted towards the microtubule minus end.

      (3) Line 371: BicD2 is required for the transport of RanBP2 from annulate lamellae to nuclear pore complexes.

      Thank you. We have modified our text. 

      (4) Yi et al., 2023 have previously shown changed interactions of the BicD2/R747C mutant, such as decreased binding to Nup358 and increased binding to Nesprin-2, as well as functional implications for the associated brain developmental pathways, which should be acknowledged.

      We apologize for leaving this out. In the original version of the manuscript, we were attempting to keep the discussion more concise. We have added a discussion of these findings in the revised manuscript (lines 496-507).

    1. Author response:

      Reviewer #1 (Public review):

      The study examines how pyruvate, a key product of glycolysis that influences TCA metabolism and gluconeogenesis, impacts cellular metabolism and cell size. It primarily utilizes the Drosophila liver-like fat body, which is composed of large post-mitotic cells that are metabolically very active. The study focuses on the key observations that over-expression of the pyruvate importer MPC complex (which imports pyruvate from the cytoplasm into mitochondria) can reduce cell size in a cell-autonomous manner. They find this is by metabolic rewiring that shunts pyruvate away from TCA metabolism and into gluconeogenesis. Surprisingly, mTORC and Myc pathways are also hyper-active in this background, despite the decreased cell size, suggesting a non-canonical cell size regulation signaling pathway. They also show a similar cell size reduction in HepG2 organoids. Metabolic analysis reveals that enhanced gluconeogenesis suppresses protein synthesis. Their working model is that elevated pyruvate mitochondrial import drives oxaloacetate production and fuels gluconeogenesis during late larval development, thus reducing amino acid production and thus reducing protein synthesis.

      Strengths:

      The study is significant because stem cells and many cancers exhibit metabolic rewiring of pyruvate metabolism. It provides new insights into how the fate of pyruvate can be tuned to influence Drosophila biomass accrual, and how pyruvate pools can influence the balance between carbohydrate and protein biosynthesis. Strengths include its rigorous dissection of metabolic rewiring and use of Drosophila and mammalian cell systems to dissect carbohydrate:protein crosstalk.

      Weaknesses:

      However, questions on how these two pathways crosstalk, and how this interfaces with canonical Myc and mTORC machinery remain. There are also questions related to how this protein:carbohydrate crosstalk interfaces with lipid biosynthesis. Addressing these will increase the overall impact of the study.

      We thank the reviewer for recognizing the significance of our work and for providing constructive feedback. Our findings indicate that elevated pyruvate transport into mitochondria acts independently of canonical pathways, such as mTORC1 or Myc signaling, to regulate cell size. To investigate these pathways, we utilized immunofluorescence with well-validated surrogate measures (p-S6 and p-4EBP1) in clonal analyses of MPC expression, as well as RNA-seq analyses in whole fat body tissues expressing MPC. These methods revealed hyperactivation of mTORC1 and Myc signaling in fat body cells expressing MPC in Drosophila, which are dramatically smaller than control cells. One explanation of these seemingly contradictory observations could be an excess of nutrients that activate mTORC1 or Myc pathways. However, our data is inconsistent with a nutrient surplus that could explain this hyperactivation. Instead, we observed reduced amino acid abundance upon MPC expression, which is very surprising given the observed hyperactivation of mTORC1. This led us to hypothesize the existence of a feedback mechanism that senses inappropriate reductions in cell size and activates signaling pathways to promote cell growth. The best characterized “sizer” pathway for mammalian cells is the CycD/CDK4 complex which has been well studied in the context of cell size regulation of the cell cycle (PMID 10970848, 34022133). However, the mechanisms that sense cell size in post-mitotic cells, such as fat body cells and hepatocytes, remain poorly understood. Investigating the hypothesized size-sensing mechanisms at play here is a fascinating direction for future research.

      For the current study, we conducted epistatic analyses with mTOR pathway members by overexpressing PI3K and knocking down the TORC1 inhibitor Tuberous Sclerosis Complex 1 (Tsc1). These manipulations increased the size of control fat body cells but not those over-expressing the MPC (Supplementary Fig. 3c, 3d). Regarding Myc, its overexpression increased the size of both control and MPC+ clones (Supplementary Fig. 3e), but Myc knockdown had no additional effect on cell size in MPC+ clones (Supplementary Fig. 3f). These results suggest that neither mTORC1, PI3K, nor Myc are epistatic to the cell size effects of MPC expression. Consequently, we shifted our focus to metabolic mechanisms regulating biomass production and cell size.

      When analyzing cellular biomolecules contributing to biomass, we observed a significant impact on protein levels in Drosophila fat body cells and mammalian MPC-expressing HepG2 spheroids. TAG abundance in MPC-expressing HepG2 spheroids and whole fat body cells showed a statistically insignificant decrease compared to controls. Furthermore, lipid droplets in fat body cells were comparable in MPC-expressing clones when normalized to cell size.

      Interestingly, RNA-seq analysis revealed increased expression of fatty acid and cholesterol biosynthesis pathways in MPC-expressing fat body cells. Upregulated genes included major SREBP targets, such as ATPCL (2.08-fold), FASN1 (1.15-fold), FASN2 (1.07-fold), and ACC (1.26-fold). Since mTOR promotes SREBP activation and MPC-expressing cells showed elevated mTOR activity and upregulation of SREBP targets, we hypothesize that SREBP is activated in these cells. Nonetheless, our data on amino acid abundance and its impact on protein synthesis activity suggest that protein abundance, rather than lipids, is likely to play a larger causal role in regulating cell size in response to increased pyruvate transport into mitochondria.

      Reviewer #2 (Public review):

      In this manuscript, the authors leverage multiple cellular models including the Drosophila fat body and cultured hepatocytes to investigate the metabolic programs governing cell size. By profiling gene programs in the larval fat body during the third instar stage - in which cells cease proliferation and initiate a period of cell growth - the authors uncover a coordinated downregulation of genes involved in mitochondrial pyruvate import and metabolism. Enforced expression of the mitochondrial pyruvate carrier restrains cell size, despite active signaling of mTORC1 and other pathways viewed as traditional determinants of cell size. Mechanistically, the authors find that mitochondrial pyruvate import restrains cell size by fueling gluconeogenesis through the combined action of pyruvate carboxylase and phosphoenolpyruvate carboxykinase. Pyruvate conversion to oxaloacetate and use as a gluconeogenic substrate restrains cell growth by siphoning oxaloacetate away from aspartate and other amino acid biosynthesis, revealing a tradeoff between gluconeogenesis and provision of amino acids required to sustain protein biosynthesis. Overall, this manuscript is extremely rigorous, with each point interrogated through a variety of genetic and pharmacologic assays. The major conceptual advance is uncovering the regulation of cell size as a consequence of compartmentalized metabolism, which is dominant even over traditional signaling inputs. The work has implications for understanding cell size control in cell types that engage in gluconeogenesis but more broadly raises the possibility that metabolic tradeoffs determine cell size control in a variety of contexts.

      We thank the reviewer for their thoughtful recognition of our efforts, and we are honored by the enthusiasm the reviewer expressed for the findings and the significance of our research. We share the reviewer’s opinion that our work might help to unravel metabolic mechanisms that regulate biomass gain independent of the well-known signaling pathways.

      Reviewer #3 (Public review):

      Summary:

      In this article, Toshniwal et al. investigate the role of pyruvate metabolism in controlling cell growth. They find that elevated expression of the mitochondrial pyruvate carrier (MPC) leads to decreased cell size in the Drosophila fat body, a transformed human hepatocyte cell line (HepG2), and primary rat hepatocytes. Using genetic approaches and metabolic assays, the authors find that elevated pyruvate import into cells with forced expression of MPC increases the cellular NADH/NAD+ ratio, which drives the production of oxaloacetate via pyruvate carboxylase. Genetic, pharmacological, and metabolic approaches suggest that oxaloacetate is used to support gluconeogenesis rather than amino acid synthesis in cells over-expressing MPC. The reduction in cellular amino acids impairs protein synthesis, leading to impaired cell growth.

      Strengths:

      This study shows that the metabolic program of a cell, and especially its NADH/NAD+ ratio, can play a dominant role in regulating cell growth.

      The combination of complementary approaches, ranging from Drosophila genetics to metabolic flux measurements in mammalian cells, strengthens the findings of the paper and shows a conservation of MPC effects across evolution.

      Weaknesses:

      In general, the strengths of this paper outweigh its weaknesses. However, some areas of inconsistency and rigor deserve further attention.

      Thank you for reviewing our manuscript and offering constructive feedback. We appreciate your recognition of the significance of our work and your acknowledgment of the compelling evidence we have presented. We will carefully revise the manuscript in line with the reviewers' recommendations.

      The authors comment that MPC overrides hormonal controls on gluconeogenesis and cell size (Discussion, paragraph 3). Such a claim cannot be made for mammalian experiments that are conducted with immortalized cell lines or primary hepatocytes.

      We appreciate the reviewer’s insightful comment. Pyruvate is a primary substrate for gluconeogenesis, and our findings suggest that increased pyruvate transport into mitochondria increases the NADH-to-NAD+ ratio, and thereby elevates gluconeogenesis. Notably, we did not observe any changes in the expression of key glucagon targets, such as PC, PEPCK2, and G6PC, suggesting that the glucagon response is not activated upon MPC expression. By the statement referenced by the reviewer, we intended to highlight that excess pyruvate import into mitochondria drives gluconeogenesis independently of hormonal and physiological regulation.

      It seems the reviewer might also have been expressing the sentiment that our in vitro models may not fully reflect the in vivo situation, and we completely agree. Moving forward, we plan to perform similar analyses in mammalian models to test the in vivo relevance of this mechanism. For now, we will refine the language in the manuscript to clarify this point.

      Nuclear size looks to be decreased in fat body cells with elevated MPC levels, consistent with reduced endoreplication, a process that drives growth in these cells. However, acute, ex vivo EdU labeling and measures of tissue DNA content are equivalent in wild-type and MPC+ fat body cells. This is surprising - how do the authors interpret these apparently contradictory phenotypes?

      We thank the reviewer for raising this important issue. The size of the nucleus is regulated by DNA content and various factors, including the physical properties of DNA, chromatin condensation, the nuclear lamina, and other structural components (PMID 32997613). Additionally, cytoplasmic and cellular volume also impacts nuclear size, as extensively documented during development (PMID 17998401, PMID 32473090).

      In MPC-expressing cells, it is plausible that the reduced cellular volume impacts chromatin condensation or the nuclear lamina in a way that slightly decreases nuclear size without altering DNA content. Specifically, in our whole fat body experiments using CG-Gal4 (as shown in Supplementary Figure 2a-c), we noted that after 12 hours of MPC expression, cell size was significantly reduced (Supplementary Figure 2c and Author response image 1A). However, the reduction in nuclear size became significant only after 36 hours of MPC expression (Author response image 1B), suggesting that the reduction in cell size is a more acute response to MPC expression, followed only later by effects on nuclear size.

      In clonal analyses, this relationship was further clarified. MPC-expressing cells with a size greater than 1000 µm² displayed nuclear sizes comparable to control cells, whereas those with a drastic reduction in cell size (less than 1000 µm²) exhibited smaller nuclei (Author response image 1C and D). These observations collectively suggest that changes in nuclear size are more likely to be downstream rather than upstream of cell size reduction. Given that DNA content remains unaffected, we focused on investigating the rate of protein synthesis. Our findings suggest that protein synthesis might play a causal role in regulating cell size, thereby reinforcing the connection between cellular and nuclear size in this context.

      Author response image 1.

      Cell Size vs. Nuclear Size in MPC-Expressing Fat Body Cells. A. Cell size comparison between control (blue, ay-GFP) and MPC+ (red, ay-MPC) fat body cells over time, measured in hours after MPC expression induction. B. Nuclear area measurements from the same fat body cells in ay-GFP and ay-MPC groups. C. Scatter plot of nuclear area vs. cell area for control (ay-GFP) cells, including the corresponding R² value. D. Scatter plot of nuclear area vs. cell area for MPC-expressing (ay-MPC) cells, with the respective R² value.

      This image highlights the relationship between nuclear and cell size in MPC-expressing fat body cells, emphasizing the distinct cellular responses observed following MPC induction.
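
      The R² values reported in panels C and D summarize how well a straight line relates nuclear area to cell area. The sketch below shows one common way to compute such a value for a simple least-squares fit. It is an illustration only: the function name `r_squared` and all numbers are hypothetical and are not taken from the study, and the response does not state which fitting procedure was used.

```python
def r_squared(x, y):
    """Coefficient of determination for a simple least-squares line y = a*x + b.

    For a single predictor this equals the squared Pearson correlation.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return (sxy ** 2) / (sxx * syy)


# Hypothetical measurements in µm² (NOT data from the study).
cell_area = [600.0, 900.0, 1200.0, 1800.0, 2400.0]
nuclear_area = [42.0, 53.0, 71.0, 98.0, 131.0]

r2 = r_squared(cell_area, nuclear_area)
print(f"R^2 = {r2:.3f}")
```

      A value close to 1 indicates that nuclear area scales almost linearly with cell area in the sampled cells; a lower value indicates a weaker linear relationship.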

      In Figure 4d, oxygen consumption rates are measured in control cells and those over-expressing MPC. Values are normalized to protein levels, but protein is reduced in MPC+ cells. Is oxygen consumption changed by MPC expression on a per-cell basis?

      As described in the manuscript, MPC-expressing cells are smaller in size. In this context, we felt that it was most appropriate to normalize oxygen consumption rates (OCR) to cellular mass to enable an accurate interpretation of metabolic activity. Therefore, we normalized OCR to protein content to account for variations in cellular size and (probably) mitochondrial mass.
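
      The reviewer's question turns on the fact that the same raw OCR can look different depending on the denominator. The sketch below contrasts per-protein and per-cell normalization. All values, units, and the function name `normalize_ocr` are hypothetical assumptions chosen for illustration; they are not measurements from the study and do not indicate the direction of any real effect.

```python
def normalize_ocr(ocr_pmol_per_min, protein_ug, cell_count):
    """Return OCR normalized per µg protein and per cell."""
    per_ug_protein = ocr_pmol_per_min / protein_ug
    per_cell = ocr_pmol_per_min / cell_count
    return per_ug_protein, per_cell


# Hypothetical wells with equal cell numbers but less protein in the MPC+ well,
# as would be expected if MPC+ cells are smaller (illustrative values only).
control = normalize_ocr(ocr_pmol_per_min=200.0, protein_ug=20.0, cell_count=40_000)
mpc = normalize_ocr(ocr_pmol_per_min=150.0, protein_ug=12.0, cell_count=40_000)

print("control: per µg =", control[0], "| per cell =", control[1])
print("MPC+   : per µg =", mpc[0], "| per cell =", mpc[1])
```

      With these made-up numbers, OCR per µg protein is higher in the MPC+ well while OCR per cell is lower, which is why reporting both normalizations can help readers interpret the measurement.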

      Trehalose is the main circulating sugar in Drosophila and should be measured in addition to hemolymph glucose. Additionally, the units in Figure 4h should be related to hemolymph volume - it is not clear that they are.

      We appreciate this valuable suggestion. In the revised manuscript, we will quantify trehalose abundance in circulation and within fat bodies. As described in the Methods section, following the approach outlined in Ugrankar-Banerjee et al., 2023, we bled 10 larvae (either control or MPC-expressing) using forceps onto parafilm. From this, 2 microliters of hemolymph were collected for glucose measurement. We will apply this methodology to include the trehalose measurements as part of our updated analysis.

      Measurements of NADH/NAD ratios in conditions where these are manipulated genetically and pharmacologically (Figure 5) would strengthen the findings of the paper. Along the same lines, expression of manipulated genes - whether by RT-qPCR or Western blotting - would be helpful to assess the degree of knockdown/knockout in a cell population (for example, Got2 manipulations in Figures 6 and S8).

      We appreciate this suggestion, which will provide additional rigor to our study. We have already quantified NADH/NAD+ ratios in HepG2 cells under UK5099, NMN, and Asp supplementation, as presented in Figure 6k. As suggested, we will quantify the expression of Got2 manipulations mentioned in Figure 6j using RT-qPCR and validate the corresponding data in Supplementary Figure 8f through western blot analysis.

      Additionally, we will assess the efficiency of pcb, pdha, dlat, pepck2, and Got2 manipulations used to modulate the expression of these genes. These validations will ensure the robustness of our findings and strengthen the conclusions of our study.

    1. Author response:

      Reviewer #1:

      Weaknesses:

      (1) The crystal structure of HsIFT172c reveals a single globular domain formed by the last three TPR repeats and C-terminal residues of IFT172. However, the authors subdivide this globular domain into TPR, linker, and U-box-like regions that they treat as separate entities throughout the manuscript. This is potentially misleading as the U-box surface that is proposed to bind ubiquitin or E2 is not surface accessible but instead interacts with the TPR motifs. They justify this approach by speculating that the presented IFT172c structure represents an autoinhibited state and that the U-box-like domain can become accessible following phosphorylation. However, additional evidence supporting the proposed autoinhibited state and the potential accessibility of the U-box surface following phosphorylation is needed, as it is not tested or supported by the current data.

      We thank the reviewer for this comment. IFT172C contains a TPR region and a U-box-like region, which are admittedly tightly bound to each other. While it is possible that these regions exist and function as a single domain, we chose to classify them as two separate domains for the following reasons.

      (1) The TPR and U-box-like regions belong to two different structural classes.

      (2) The TPR region is connected to the U-box-like region via a long linker, which seems poised to regulate the relative movement between these regions.

      (3) Many ciliopathy mutations map to the interface between the TPR region and the U-box-like region, hinting at a regulatory mechanism governed by this interface.

      (2) While in vitro ubiquitination of IFT172 has been demonstrated, in vivo evidence of this process is necessary to support its physiological relevance.

      We thank the reviewer for this comment. We are currently working on identifying the substrates of IFT172 to reveal the physiological relevance of its ubiquitination activity.

      (3) The authors describe IFT172 as being autoubiquitinated. However, the identified E2 enzymes UBCH5A and UBCH5B can both function in E3-independent ubiquitination (as pointed out by the authors) and mediate ubiquitin chain formation in an E3-independent manner in vitro (see ubiquitin chain ladder formation in Figure 3A). In addition, point mutation of known E3-binding sites in UBCH5A or TPR/U-box interface residues in IFT172 has no effect on the mono-ubiquitination of IFT172c1. Together, these data suggest that IFT172 is an E3-independent substrate of UBCH5A in vitro. The authors should state this possibility more clearly and avoid terminology such as "autoubiquitination" as it implies that IFT172 is an E3 ligase, which is misleading. Similarly, statements on page 10 and elsewhere are not supported by the data (e.g. "the low in vitro ubiquitination activity exhibited by IFT172" and "ubiquitin conjugation occurring on HsIFT172C1 in the presence of UBCH5A, possibly in coordination with the IFT172 U-box domain").

      We now consider this possibility and tone down our statements about the autoubiquitination activity of IFT172 in a revised version of the manuscript.

      (4) Related to the above point, the conclusion on page 11, that mono-ubiquitination of IFT172 is U-box-independent while polyubiquitination of IFT172 is U-box-dependent appears implausible. The authors should consider that UBCH5A is known to form free ubiquitin chains in vitro and structural rearrangements in F1715A/C1725R variants could render additional ubiquitination sites or the monoubiquitinated form of IFT172 inaccessible/unfavorable for further processing by UBCH5A.

      We now consider this possibility and tone down our statements about the autoubiquitination activity of IFT172 in the conclusion on pg. 11.

      (5) Identification of the specific ubiquitination site(s) within IFT172 would be valuable as it would allow targeted mutation to determine whether the ubiquitination of IFT172 is physiologically relevant. Ubiquitination of the C1 but not the C2 or C3 constructs suggests that the ubiquitination site is located in TPRs ranging from residues 969-1470. Could this region of TPR repeats (lacking the IFT172C3 part) suffice as a substrate for UBCH5A in ubiquitination assays?

      We thank the reviewer for raising this important point about ubiquitination site identification. While not included in our manuscript, we did perform mass spectrometry analysis of ubiquitination sites using wild-type IFT172 and several mutants (P1725A, C1727R, and F1715A). As shown in the figure below, we detected multiple ubiquitination sites across these constructs. The wild-type protein showed ubiquitination at positions K1022, K1237, K1271, and K1551, while the mutants displayed slightly different patterns of modification. However, we should note that the MS intensity signals for these ubiquitinated peptides were relatively low compared to unmodified peptides, making it difficult to draw strong conclusions about site specificity or physiological relevance.

      Author response image 1.

      These results align with the reviewer's suggestion that ubiquitination occurs within the TPR-containing region. However, given the technical limitations of the MS analysis and the potential for E3-independent ubiquitination by UBCH5A, we have taken a conservative approach in interpreting these findings.

      (6) The discrepancy between the molecular weight shifts observed in anti-ubiquitin Western blots and Coomassie-stained gels is noteworthy. The authors show the appearance of a mono-ubiquitinated protein of ~108 kDa in anti-ubiquitin Western blots. However, this molecular weight shift is not observed for total IFT172 in the corresponding Coomassie-stained gels (Figures 3B, D, F). Surprisingly, this MW shift is visible in an anti-His Western blot of a ubiquitination assay (Fig 3C). Together, this raises the concern that only a small fraction of IFT172 is being modified with ubiquitin. Quantification of the percentage of ubiquitinated IFT172 in the in vitro experiments could provide helpful context.

      We acknowledge in the manuscript that the conjugation of ubiquitin to IFT172C is weak (Page 16). Future experiments to identify potential substrates and their implications for ciliary regulation will provide further context for our in vitro ubiquitination experiments.

      (7) The authors propose that IFT172 binds ubiquitin and demonstrate that GST-tagged HsIFT172C2 or HsIFT172C3 can pull down tetra-ubiquitin chains. However, ubiquitin is known to be "sticky" and to have a tendency for weak, nonspecific interactions with exposed hydrophobic surfaces. Given that only a small proportion of the ubiquitin chains bind in the pull-down, specific point mutations that identify the ubiquitin-binding site are required to convincingly show the ubiquitin binding of IFT172.

      (8) The authors generated structure-guided mutations based on the predicted Ub-interface and on the TPR/U-box interface and used these for the ubiquitination assays in Fig 3. These same mutations could provide valuable insights into ubiquitin binding assays as they may disrupt or enhance ubiquitin binding (by relieving "autoinhibition"), respectively. Surprisingly, two of these sites are highlighted in the predicted ubiquitin-binding interface (F1715, I1688; Figure 4E) but not analyzed in the accompanying ubiquitin-binding assays in Figure 4.

      We agree that these mutations could provide insights into ubiquitin binding by IFT172. We are currently pursuing further mutagenesis studies of the IFT172-Ub interface based on the AF model. We have, however, evaluated the ubiquitin-binding activity of the F1715A mutant using similar pulldowns, which showed no significant effect of the mutation on ubiquitin binding by IFT172. We have yet to evaluate the impact of alternative amino acid substitutions at these positions. The I1688 mutants we cloned could not be expressed in soluble form and thus could not be tested in ubiquitination or ubiquitin-binding assays.

      (9) If IFT172 is a ubiquitin-binding protein, it might be expected that the pull-down experiments in Figure S1 would identify ubiquitin, ubiquitinated proteins, or E2 enzymes. These were not observed, raising doubt that IFT172 is a ubiquitin-binding protein.

      It is likely that IFT172 binds ubiquitin only with low affinity, as indicated by our in vitro pulldowns and the AF interface. In our pulldown experiment using Chlamydomonas flagella extracts, we used extensive washes to remove non-specific interactors. This might also have prevented the identification of weak but bona fide interactors of IFT172. Additionally, we did not include any ubiquitination-preserving reagents such as NEM in our pulldown buffers, leaving cellular ubiquitinated proteins exposed to DUB-mediated proteolysis and further preventing their identification in our pulldown/MS experiment.

      (10) The cell-based experiments demonstrate that the U-box-like region is important for the stability of IFT172 but does not demonstrate that the effect on the TGFb pathway is due to the loss of ubiquitin-binding or ubiquitination activity of IFT172.

      We acknowledge that our current data cannot distinguish whether the TGFβ pathway defects arise from general protein instability or from specific loss of ubiquitin-related functions. Our experiments demonstrate that the U-box-like region is required for both IFT172 stability and proper TGFβ signaling, but we agree that establishing a direct mechanistic link between these phenomena would require additional evidence. We will revise our discussion to more clearly acknowledge this limitation in our current understanding of the relationship between IFT172's U-box region and TGFβ pathway regulation.

      (11) The challenges in experimentally validating the interaction between IFT172 and the UBX-domain-containing protein are understandable. Alternative approaches, such as using single domains from the UBX protein, implementing solubilizing tags, or disrupting the predicted binding interface in Chlamydomonas flagella pull-downs, could be considered. In this context, the conclusion on page 7 that "The uncharacterized UBX-domain-containing protein was validated by AF-M as a direct IFT172 interactor" is incorrect as a prediction of an interaction interface with AF-M does not validate a direct interaction per se.

      We agree with the reviewer that our AlphaFold-Multimer (AF-M) predictions alone do not constitute experimental validation of a direct interaction. We appreciate the reviewer's understanding of the technical challenges in validating this interaction experimentally. We will revise our text to more precisely state that "The uncharacterized UBX-domain-containing protein was validated by AF-M as a potential direct IFT172 interactor" and will discuss the AF-M predictions as computational evidence that suggests, but does not prove, a direct interaction. This more accurately reflects the current state of our understanding of this potential interaction.

      Reviewer #3:

      Weaknesses:

      (1) Interaction studies were carried out by pulldown experiments, which identified more IFT172 interaction partners. Whether these interactions can be seen in living cells remains to be elucidated in subsequent studies.

      We agree with the reviewer that validation of protein-protein interactions in living cells provides important physiological context. While our pulldown experiments have identified several promising interaction partners and the AF-M predictions provide computational support for these interactions, we acknowledge that demonstrating these interactions in vivo would strengthen our findings. However, we believe our current biochemical and structural analyses provide valuable insights into the molecular basis of IFT172's interactions, laying important groundwork for future cell-based studies.

      (2) The cell culture-based experiments in the IFT172 mutants are exciting and show that the U-box domain is important for protein stability and point towards involvement of the U-box domain in cellular signaling processes. However, the characterization of the generated cell lines falls behind the very rigorous analysis of other aspects of this work.

      We thank the reviewer for noting that the characterization of our cell lines could be more rigorous. In the revised manuscript, we will provide additional characterization of the cell lines, including detailed sequencing information and validation data for the IFT172 mutants. This will bring the documentation of our cell-based experiments up to the same standard as other aspects of our work.

    1. Author response:

      We thank the reviewers for their help and their suggestions to make this manuscript more rigorous. We would like to post provisional author responses when eLife publishes the reviewed preprint; more detailed responses will accompany the revised manuscript.

      • There are questions about choices made in the computational approach (architecture and type of generative model, training set).

      We will train a new generator model based on the current GAN architecture, but with ‘hybrid’ AMP/AVP training sets (Reviewers 1 and 3). This will allow us to directly compare the performance of the two generators. Based on our preliminary data, providing the GAN with more AVP sequences during training helped the designed peptides pass the AVP filter, at the cost of reducing the average AMPredictor scores. The new generator also increased the diversity of designed sequences.

      We also perturbed the detailed architecture of our deep learning models, including fully-connected graph edge encodings and different versions of ESM (e.g. esm1b_t33_650M_UR50S, esm2_t48_15B_UR50D; Reviewer 2). In the revised manuscript, we will report the effects of these modifications and suggest that the overall design of the GCN and GAN is suitable for a light-weight sequence label model, as demonstrated in Author response tables 1 and 2. For the generator, our results suggest that we may have reached a plateau for GAN sampling (Author response table 3).

      Author response table 1.

      Results of AMPredictor with different graph edge encodings

      Author response table 2.

      Results of AMPredictor with different ESM versions

      Author response table 3.

      Evaluation of generated sequences with different sampling numbers

      • There is an important concern about the small number of antimicrobial peptides tested, compared to other studies, and the origin of antiviral activities.

      We will address this concern by increasing the number of peptides tested in antimicrobial and antiviral experiments. As reported in the current version of our manuscript, the first generation of the GAN generated 128 unique designs, and the top 2% (3 designs) were tested experimentally. The second generation of the GAN will produce ~1024 designs (1-2 weeks), and the top 2% (~20 new sequences) will be tested. We are in the process of peptide synthesis (2-3 weeks) and MIC measurement (1 week). The total number of tested sequences will reach 20-30. We will focus on sequences with low similarity (< 30%) to any known AMPs, thus expanding the universe of functional peptides. We estimate that collecting these new data will take 6 weeks.
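
      The selection step described above (rank the designs, keep the top 2%) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function `top_fraction`, the design names, and the random placeholder scores are hypothetical, and the way the real designs are scored and ranked is an assumption here.

```python
import random


def top_fraction(scored_designs, fraction=0.02):
    """Return the highest-scoring designs, keeping round(n * fraction) of them.

    scored_designs: list of (name, score) tuples; higher score is better.
    """
    k = max(1, round(len(scored_designs) * fraction))
    ranked = sorted(scored_designs, key=lambda item: item[1], reverse=True)
    return ranked[:k]


# Placeholder scores from a seeded RNG (NOT real predictor output).
rng = random.Random(0)
first_generation = [(f"design_{i}", rng.random()) for i in range(128)]
second_generation = [(f"design_{i}", rng.random()) for i in range(1024)]

selected_first = top_fraction(first_generation)
selected_second = top_fraction(second_generation)
print(len(selected_first), len(selected_second))
```

      With a 2% cutoff, 128 designs yield 3 candidates (2.56 rounded) and 1024 designs yield 20 (20.48 rounded), consistent with the counts given in the response.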

    1. Author response:

      Reviewer #1 (Public Review):

      (1) Figure 3: it is unclear what is the efficiency of Msi2 deletion shRNA - could you demonstrate it by at least two independent methods? (QPCR, Western, or IHC?) please quantitate the data.

      In Figure 3, we did not delete Msi2 via shRNA. Instead, we utilized a genetic model in which the Msi2 gene was disrupted via gene trap mutagenesis. We have also used this model in previous publications to define the impact of Msi2 loss in other systems [1].

      (2) In Figure 4, similarly, it is unclear if Msi2 depletion was effective- and what is shRNA efficiency. Please test this by at least two independent methods (QPCR, Western, or IHC) and also please quantitate the data

      We demonstrated that the efficiency of Msi2 depletion was ~83% (Figures 4A and 4C) via qPCR analysis for our in vitro and in vivo experiments, respectively, and verified the knockdown via bulk RNA-seq analysis. The shRNA hairpin used was previously validated and published by our lab [2].
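
      For readers less familiar with how a qPCR knockdown percentage is derived, the sketch below uses the widely used 2^(-ΔΔCt) calculation. The response does not state which quantification method was applied, so the method, the function name `knockdown_efficiency`, the reference-gene normalization, and all Ct values are assumptions for illustration. The calculation also assumes roughly 100% amplification efficiency.

```python
def knockdown_efficiency(ct_target_ctrl, ct_ref_ctrl, ct_target_kd, ct_ref_kd):
    """Fractional knockdown from Ct values using the 2^(-ddCt) method."""
    delta_ctrl = ct_target_ctrl - ct_ref_ctrl  # dCt in control cells
    delta_kd = ct_target_kd - ct_ref_kd        # dCt in knockdown cells
    ddct = delta_kd - delta_ctrl
    relative_expression = 2 ** (-ddct)         # target level relative to control
    return 1.0 - relative_expression


# Hypothetical Ct values (NOT data from the study), chosen so that ddCt = 2.56.
efficiency = knockdown_efficiency(
    ct_target_ctrl=24.00, ct_ref_ctrl=18.00,
    ct_target_kd=26.56, ct_ref_kd=18.00,
)
print(f"knockdown efficiency: {efficiency:.1%}")
```

      In this made-up example a ΔΔCt of 2.56 cycles corresponds to about 17% residual expression, that is, a knockdown of roughly 83%.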

      (3) the reason for impairment of cell growth demonstrated in Figs 3 and 4 is not clear: is it apoptosis? Necrosis? Cell cycle defects? Autophagy? Senescence? Please probe 2-3 possibilities and provide the data.

      The basis of the cell growth impairment after Msi2 deletion/knockdown in this paper is certainly an important question, and future experiments will be performed to better delineate this. In previous publications, loss of Msi2 in leukemia cells has been shown to inhibit growth by arresting cell cycle progression through increased expression of p21 [3]. Further, loss of Msi2 was also shown to promote apoptosis, in part by upregulating Bax [3]. These data suggest that Msi2 can have an impact via multiple distinct mechanisms, including by mediating cell cycle arrest and blocking apoptosis. While these specific genes were not detectably changed after loss of Msi2 in lung cancer cells, other genes in these and other pathways will be important to study in the future.

      (4) Since Musashi-1 is a Musashi-2 paralogue that could compensate for Musashi-2 loss, please test Msi1 expression levels in matching Fig 3 and Fig 4 sections (in cells/ tumors with Msi2 deletion and in KP cells with Msi2 shRNA). One method could suffice here.

      In our RNA-seq of cells following Msi2 knockdown, Msi1 expression was undetectable. The TPM values for Msi1 in control and knockdown cells were less than 0.01, suggesting that it did not compensate for the loss of Msi2.
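
      TPM (transcripts per million) values such as those quoted above are length-normalized and scaled so that each sample sums to one million. The sketch below shows the standard calculation on placeholder data. The gene labels, counts, and lengths are hypothetical and the function `tpm` is an illustrative helper, not the pipeline used in the study.

```python
def tpm(counts, lengths_kb):
    """Convert raw read counts to TPM.

    counts: dict of gene -> read count
    lengths_kb: dict of gene -> transcript length in kilobases
    """
    # Reads per kilobase corrects for transcript length.
    rpk = {gene: counts[gene] / lengths_kb[gene] for gene in counts}
    # Scale so that all values in the sample sum to one million.
    per_million = sum(rpk.values()) / 1_000_000
    return {gene: value / per_million for gene, value in rpk.items()}


# Placeholder genes and numbers (NOT data from the study).
counts = {"gene_a": 5000, "gene_b": 1200, "gene_silent": 0}
lengths_kb = {"gene_a": 2.0, "gene_b": 1.5, "gene_silent": 3.0}

result = tpm(counts, lengths_kb)
for gene, value in result.items():
    print(gene, round(value, 2))
```

      A gene with essentially no mapped reads ends up with a TPM near zero regardless of its length, which is the situation described for Msi1 (TPM below 0.01 in both conditions).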

      (5) It is not exactly clear why RNA-seq (as opposed to proteomics) was done to investigate downstream Msi2 targets (since Msi2 is, in the first place, a translational and not a transcriptional regulator). RNA effects in Fig 5J are quite modest, 2-fold or so. It would be useful (if antibodies are available) to test four targets in Fig 5J by Western blot, to see any impact of Musashi-2 depletion on those target protein levels. Indeed, several papers - including Kudinov et al PNAS, PMID: 27274057, Makhov P et al PMID: 33723247 and PMID: 37173995 - used proteomics/RIP approaches and found direct Musashi-2 targets in lung cancer, including EGFR, and others.

      Previously published work from the lab showed that expression of Msi2 in the context of myeloid leukemia [1] can repress not only NUMB protein (as has been previously demonstrated in the nervous system) but also Numb RNA. This indicated that, as an RNA-binding protein, Msi2 can also bind and destabilize direct binding targets such as Numb; this was the reason for pursuing transcriptomic analysis. However, as the reviewer suggests, proteomic studies are certainly very important to develop a complete picture of the impact of Musashi and to determine which targets are controlled by Msi2 at the protein level.

      Reviewer #2 (Public Review):

      (1) It will be interesting to determine whether Msi2+ cells are a relatively stable subset or rather the Msi2+ cells in lung is a dynamic concept that is transient or interconvertible. This is relevant to the interpretation of what Msi2 positivity really means.

      In previous unpublished work from our lab, we have found that Msi2+ cells from a GFP reporter KPf/fC mouse are readily able to become GFP negative (Msi2-), but the inverse is not true. Specifically, when Msi2+ KPf/fC pancreatic cells were transplanted into the flanks of NSG mice, Msi2+ cells formed tumors in all recipients; these tumors contained both GFP+ and GFP- cells (over 80%), recapitulating the original heterogeneity and suggesting GFP+ cells can give rise to both GFP+ and GFP- cells (Lytle and Reya, unpublished observations). In contrast, only a small subset of GFP- transplanted mice formed tumors. One of the rare GFP- derived tumors was isolated and found to contain largely GFP- cells, with ~0.1% GFP+ cells. The small frequency of GFP expression could be from contaminating cells or may suggest that GFP- cells retain some ability to switch on Msi under selective pressure, and that although they pose a lower risk of driving tumorigenesis than Msi+ cells, they may nonetheless bear latent potential to become higher risk. These data may offer a possible model for projecting the potential of Msi2+ cells in the lung, but this needs to be studied further in this tissue.

      (2) Does Kras mutation and/or p53 loss upregulate Msi2? This point and the point above are related to whether Msi2+ cells are truly more susceptible to tumorigenesis, as the authors suggested.

      In unpublished work from our lab, we have found that Kras mutation upregulates Msi2 over baseline and subsequent p53 loss upregulates Msi2 further in the context of pancreatic cells (Lytle and Reya unpublished results), therefore it is possible that the same is true for the lung. Specifically, we have observed that Msi2 increased from normal acinar cells to Kras-mutated acinar (e.g. pancreatic intraepithelial neoplasia (PanIN)).

      To address whether Msi2+ cells are more susceptible to tumorigenesis, we have recently published data showing that stabilization of the oncogenic MYC protein in lung Msi2+ cells drives the formation of small-cell lung cancer in a new inducible Msi2-CreERT2; CAG-LSL-MycT58A (Msi2-Myc) mouse model [4]. More importantly, these data provide the first evidence that normal Msi2+ cells are primed and highly sensitive to MYC-driven transformation across many organs and not just the lung [4].

      (3) The KO of Msi2 reducing tumor number and burden in the lung cancer initiation model is interesting. However, there are two alternative interpretations. First, it is possible that the Msi2 KO mice (without Kras activation and p53 loss) have reduced total lung cell numbers or an altered percentage of stem cells. There is currently only one sentence citing data not shown on line 125, commenting that there is no difference in BASC and AT2 cell populations. It would be helpful if such data were shown and the effect of KO on overall lung mass or cellularity were clarified. Second, the phenotype may also be due to a difference in the efficiencies of Cre on Kras and p53 in the Msi2 WT and KO mice.

      We isolated the lungs of three Msi2 WT and three Msi2 KO mice and used immunofluorescence staining for CC10 (BASC) and SPC (AT2) to determine whether these cell populations were reduced after Msi2 loss alone. Below are representative images showing that the Msi2 KO mice did not have lower numbers of either BASC or AT2 cells.

      Author response image 1.

      (4) All shRNA experiments (for both Msi2 KD and the KD of candidate genes) utilized a single shRNA. This approach cannot exclude off-target effects of the shRNA.

      The shRNA hairpin used for Msi2 was previously validated and published by our lab (ref. 2). Additionally, in this work we developed and used an Msi2 genetic knockout mouse model, which validates our shRNA knockdown data showing the specific impact of Msi2 on lung tumor growth.

      (5) The technical details of the PDX experiment (Figure 4F) are not fully explained.

      Due to space considerations, we were unable to put the specifics in the legend, but the details are in the methods section (Flank Transplant Assays). In brief, 500,000 cells/well were plated in a 6-well plate coated with Matrigel, and 83,000 cells/well were plated in a 24-well plate coated with Matrigel for subsequent determination of transduction efficiency via FACS. 24 hours after transduction, media from the cells was collected and placed on ice. 1 mL of 2 mg/mL collagenase/dispase was then added to the well and incubated for 45 minutes at 37 °C to dissociate the remaining cells from Matrigel, followed by subsequent washes. Cells were pelleted by centrifugation, and an equivalent number of shControl and shMsi2 transduced cells were resuspended in full media, mixed at a 1:1 ratio with growth factor reduced Matrigel at a final volume of 100 μL, and transplanted subcutaneously into the flanks of NSG recipient mice.

      Reviewer #3 (Public Review):

      - In Figure 1, characterization of Msi2 expression in the normal mouse lung was carried out by using a Msi2-GFP Knock-in reporter and analyzed by flow cytometry followed by cytospins and immunostaining. Additional characterization of Msi2 expression by co-immunostaining with well-known markers of airway and alveolar cell types in intact lung tissue will strengthen the existing data and provide more specific information about Msi2 expression and abundancy in relevant cell types. It will be also interesting to know whether Msi2 is expressed or not in other abundant lung cell types such as ciliated and AT1 cells.

      We performed co-staining of Msi2 and CC10 as well as Msi2 and SPC in Figure 1C. In the future we can include additional markers as well as markers for airway and other alveolar cell types.

      - While this set of experiments provide strong evidence that Msi2 is required for tumor progression and growth in lung adenocarcinoma, it is unclear whether normal Msi2+ lung cells are more responsive to transformation or whether Msi2 is upregulated early during the process of tumorigenesis. Future lineage tracing experiments using Msi2-CreER and mouse models of chemically-induced lung carcinogenesis will provide additional data that will fully support this claim.

      Recently, we published data showing that Msi2 is expressed in Clara cells at the bronchoalveolar junction in the lung of our new Msi2-CreERT2 knock-in mouse model (ref. 4). Furthermore, stabilization of the oncogenic MYC protein in these specific cells to model Myc amplification was sufficient to drive the formation of small-cell lung cancer (ref. 4). These data demonstrate that Msi2+ cells are more responsive to transformation after Myc stabilization.

      - In Figure 4F, Patient-derived xenograft (PDX) assays were conducted in 2 patients only and the percentage of cells infected by shRNA-Msi2 is low in both PDX (30% and 10% for patient 1 and 2 respectively). It is surprising that Msi2 downregulation in a small percentage of tumor cells has such a dramatic effect on tumor growth and expansion. Confirmation of this finding with additional patient samples would suggest an important non-cell autonomous role for Msi2 in lung adenocarcinoma.

      In the future we hope to collect more patient samples to further validate the data presented with the first 2 patients shown here. We are not certain about the reason behind the large impact of Msi2 inhibition, but as cancer stem cells drive the formation of the rest of the tumor and also drive the stromal microenvironment, it is possible that when Msi2 is deleted, Msi2- cells no longer form tumors and their ability to build the stromal microenvironment is also impaired. This possibility needs to be tested in future experiments.

      References

      (1) Ito, T., Kwon, H. Y., Zimdahl, B., Congdon, K. L., Blum, J., Lento, W. E., Zhao, C., Lagoo, A., Gerrard, G., Foroni, L., Goldman, J., Goh, H., Kim, S. H., Kim, D. W., Chuah, C., Oehler, V. G., Radich, J. P., Jordan, C. T., & Reya, T. Regulation of myeloid leukaemia by the cell-fate determinant Musashi. Nature 466, 765–768 (2010).

      (2) Fox, R. G., Lytle, N. K., Jaquish, D. V., Park, F. D., Ito, T., Bajaj, J., Koechlein, C. S., Zimdahl, B., Yano, M., Kopp, J. L., Kritzik, M., Sicklick, J. K., Sander, M., Grandgenett, P. M., Hollingsworth, M. A., Shibata, S., Pizzo, D., Valasek, M. A., Sasik, R., Scadeng, M., Okano, H., Kim, Y., MacLeod, A. R., Lowy, A. M., & Reya, T. Image-based detection and targeting of therapy resistance in pancreatic adenocarcinoma. Nature 534, 407–411 (2016).

      (3) Zhang, H., Tan, S., Wang, J., Chen, S., Quan, J., Xian, J., Zhang, Ss., He, J., & Zhang, L. Musashi2 modulates K562 leukemic cell proliferation and apoptosis involving the MAPK pathway. Exp Cell Res 320, 119–127 (2014).

      (4) Rajbhandari, N., Hamilton, M., Quintero, C.M., Ferguson, L.P., Fox, R., Schürch, C.M., Wang, J., Nakamura, M., Lytle, N.K., McDermott, M., Diaz, E., Pettit, H., Kritzik, M., Han, H., Cridebring, D., Wen, K.W., Tsai, S., Goggins, M.G., Lowy, A.M., Wechsler-Reya, R.J., Von Hoff, D.D., Newman, A.M., & Reya, T. Single-cell mapping identifies MSI+ cells as a common origin for diverse subtypes of pancreatic cancer. Cancer Cell 41(11):1989-2005.e9 (2023).

    1. Author Response

      Reviewer #1 (Public Review):

      1) “It is unclear whether new in vivo experiments were conducted for this study”.

      All in vivo experiments shown were conducted independently by new researchers in the lab, using the original fly stocks. This will be more clearly stated in the revised supplement. The aim of repeating the experiments was to directly compare the consequences of impaired N- and C-terminal shedding side-by-side in two Hh-dependent developmental systems.

      2) “A critical shortcoming of the study is that experiments showing Shh secretion/export do not include a Shh(-) control condition. Without demonstration that the bands analyzed are specific for Shh(+) conditions, these experiments cannot be appropriately evaluated”.

      C9C5 antibody reactivity and specificity are shown below, and this control will be added to the revised manuscript. We established the C9C5 immunoblotting protocol, and generated the blot shown in Author Response Image 1, before any of the experiments in the manuscript were started. The immunoblot clearly shows Shh specificity similar to that of the R&D AF464 anti-Shh antibodies that were previously used in the lab. The immunoblot also shows that both antibodies detect the same Shh signals in media, that C9C5 is more sensitive, and that AF464 and C9C5 detect 5E1-IP’d dual-lipidated and monolipidated soluble Shh equally well. Also note that, in our hands, C9C5 is highly specific: this antibody detects N-truncated C25S;Δ26-35Shh of increased electrophoretic mobility, but does not produce nonspecific signals above or below, even if the blot is strongly overexposed (as shown here). Specific Shh detection by C9C5 is also discussed in our response to the editor’s comments below.

      Cells were transfected with constructs encoding full-length C25SShh or truncated C25S;Δ26-35Shh, and proteins in serum-containing media were 5E1 immunoprecipitated or concentrated by heparin-sepharose pulldown. Dual-lipidated R&D 8908-SH was dissolved in the same medium and subjected to the same 5E1 immunoprecipitation or heparin pulldown. The blot was incubated with antibody AF464 and (after stripping) with antibody C9C5. Immunoblot analysis revealed high specificity of both antibodies and also revealed poor interactions of dual-lipidated 8908-SH with highly charged heparin.

      3) “A stably expressing Shh/Hhat cell line would reduce condition to condition and experiment to experiment variability”.

      We fully agree with this reviewer and therefore aimed to establish stable Hhat-expressing cell lines several years ago. However, stable Hhat expression eliminated transfected cells after several passages, or cells gradually ceased to express Hhat, preventing us from establishing a stable line despite several attempts with different strategies. For this reason, we established transient co-expression of Shh/Hhat from the same mRNA to at least eliminate variability between relative Shh/Hhat expression levels and to ensure complete Shh palmitoylation in our assays.

      4) “Unusual normalization strategies are used for many experiments, and quantification/statistical analyses are missing for several experiments”.

      This comment refers to data shown in Fig. 3 (here, no quantification of Scube2 function in Disp-/- cells had been conducted) and to qPCR data shown in Fig. 4 (here, Shh and C25AShh were compared only indirectly via dual-lipidated R&D 8908-SH, but not directly in a side-by-side experiment, and Shh variants with an N-terminal alanine or a serine were directly compared). We agree with the reviewer and are therefore currently repeating the qPCR assays and quantifying the blots to eliminate these technical shortcomings from the final manuscript.

      5) “The study provides a modest advance in the understanding of the complex issue of Shh membrane extraction”

      Our investigation identified unexpected links between Disp as a furin-activated Hh exporter, sheddase-mediated Shh release, Scube2-mediated Shh release and lipoprotein-mediated Hh transport (established modes indeed, but with no previously established direct connections) that increase their relevance. We also identified a previously unknown N-processed Shh variant attached to lipoproteins and show that Disp/Scube2 function absolutely requires lipoproteins. Therefore, although we do agree that our findings are confirmatory for the above modes, they also provide new mechanistic insight and challenge the currently dominant model of Disp-mediated hand-over of dual-lipidated Hh to Scube2 chaperones (this model does not predict a role for lipoprotein particles but for both Shh lipids in signaling; for a recent discussion, see PMID 36932157). Our findings suggest an answer to the intensely debated question of whether Disp/Ptch extract cholesterol from the outer or inner plasma membrane leaflet, and suggest that N-palmitate is dispensable for signaling of lipoprotein-associated Shh to Ptch receptors. Finally, we note that previous in vivo studies in flies often relied on Hh overexpression in the fat body, raising questions about their physiological relevance. Our in vivo analyses of Hh function in wing and eye discs are more physiologically relevant and can explain the previously reported presence of non-lipidated bioactive Hh in disc tissue (PMID: 23554573).

      Reviewer #2 (Public Review):

      1) “However, the results concerning the roles of lipoproteins and Shh lipid modifications are largely confirmatory of previous results, and molecular identity/physiological relevance of the newly identified Shh variant remain unclear”.

      Regarding the confirmatory aspects of our work, please also refer to our response to reviewer 1. In addition, we would like to reply that our unbiased experimental approach was designed to challenge the model of Shh shedding by testing whether established Shh release regulators affect it (e.g. support it) or not. As described in our work, we show that Disp, Scube2 and lipoproteins all contribute to increased shedding (which is new), that Disp function depends on lipoprotein presence (also new), and that lipoproteins modify the outcome of Shh shedding (dual Shh shedding versus N-shedding and lipoprotein association), which is also new.

      Regarding physiological relevance, we would like to reply that our finding that artificially generated monolipidated variants (C25SShh and ShhN) solubilize in an uncontrolled manner from producing cells can explain previously observed, highly variable gain-of-function or loss-of-function phenotypes upon their overexpression in vivo 1, 2, 3, 4, 5. Our data are also supported by the observed presence of variably lipidated Shh/Hh variants in vivo 6, and by the in vivo observation that complete removal of Scube activity in zebrafish embryos phenocopies a complete loss of Hh function that is bypassed by increased ligand expression, and even results in wild-type-like ectopic Shh target gene expression 7. The in vivo observations are compatible with our data but are incompatible with proposed alternative models of Scube-mediated dual-lipidated Shh extraction and continued Shh/Scube association to allow for morphogen transport.

      2) “Thus, it would be important to demonstrate key findings in cells that secrete Shh endogenously”.

      Experimental data shown in Fig. S8B demonstrate that en-controlled expression of sheddase-resistant Hh variants blocks endogenous Hh function in the same wing disc compartment. To our knowledge, this assay is the most physiologically relevant test of the mechanism of Disp-mediated Hh release. Still, we have now started to analyze Hh from Drosophila disc tissue biochemically and hope that we can include our findings in the final manuscript.

      3) “The authors could use an orthogonal approach, optimally a demonstration of physical interaction, or at least fractionation by a different parameter”.

      We agree with this reviewer’s assessment and are currently in the process of establishing co-IP and density gradient conditions to test physical HDL/Shh interactions. The results will be included in the final version of record.

    1. Author Response

      eLife assessment

      This study presents potentially valuable results on glutamine-rich motifs in relation to protein expression and alternative genetic codes. The author's interpretation of the results is so far only supported by incomplete evidence, due to a lack of acknowledgment of alternative explanations, missing controls and statistical analysis and writing unclear to non experts in the field. These shortcomings could be at least partially overcome by additional experiments, thorough rewriting, or both.

      We thank both the Reviewing Editor and Senior Editor for handling this manuscript and will submit our revised manuscript after the reviewed preprint is published by eLife.  

      Reviewer #1 (Public Review):

      Summary

      This work contains 3 sections. The first section describes how protein domains with SQ motifs can increase the abundance of a lacZ reporter in yeast. The authors call this phenomenon autonomous protein expression-enhancing activity, and this finding is well supported. The authors show evidence that this increase in protein abundance and enzymatic activity is not due to changes in plasmid copy number or mRNA abundance, and that this phenomenon is not affected by mutants in translational quality control. It was not completely clear whether the increased protein abundance is due to increased translation or to increased protein stability.

      In section 2, the authors performed mutagenesis of three N-terminal domains to study how protein sequence changes protein stability and enzymatic activity of the fusions. These data are very interesting, but this section needs more interpretation. It is not clear if the effect is due to the number of S/T/Q/N amino acids or due to the number of phosphorylation sites.

      In section 3, the authors undertake an extensive computational analysis of amino acid runs in 27 species. Many aspects of this section are fascinating to an expert reader. They identify regions with poly-X tracks. These data were not normalized correctly: I think that a null expectation for how often poly-X track occur should be built for each species based on the underlying prevalence of amino acids in that species. As a result, I believe that the claim is not well supported by the data.

      Strengths

      This work is about an interesting topic and contains stimulating bioinformatics analysis. The first two sections, where the authors investigate how S/T/Q/N abundance modulates protein expression level, is well supported by the data. The bioinformatics analysis of Q abundance in ciliate proteomes is fascinating. There are some ciliates that have repurposed stop codons to code for Q. The authors find that in these proteomes, Q-runs are greatly expanded. They offer interesting speculations on how this expansion might impact protein function.

      Weakness

      At this time, the manuscript is disorganized and difficult to read. An expert in the field, who will not be distracted by the disorganization, will find some very interesting results included. In particular, the order of the introduction does not match the rest of the paper.

      In the first and second sections, where the authors investigate how S/T/Q/N abundance modulates protein expression levels, it is unclear if the effect is due to the number of phosphorylation sites or the number of S/T/Q/N residues.

      There are three reasons why the number of phosphorylation sites in the Q-rich motifs is not relevant to their autonomous protein expression-enhancing (PEE) activities:

      First, we have reported previously that phosphorylation-defective Rad51-NTD (Rad51-3SA) and wild-type Rad51-NTD exhibit similar autonomous PEE activity. Mec1/Tel1-dependent phosphorylation of Rad51-NTD antagonizes the proteasomal degradation pathway, increasing the half-life of Rad51 from ∼30 min to ≥180 min (Ref 27; Woo, T. T. et al. 2020).

      1. T. T. Woo, C. N. Chuang, M. Higashide, A. Shinohara, T. F. Wang, Dual roles of yeast Rad51 N-terminal domain in repairing DNA double-strand breaks. Nucleic Acids Res 48, 8474-8489 (2020).

      Second, in our preprint manuscript, we have shown that phosphorylation-defective Rad53-SCD1 (Rad53-SCD1-5STA) also exhibits autonomous PEE activity similar to that of wild-type Rad53-SCD1 (Figure 2D, Figure 4A and Figure 4C).

      Third, as revealed by the results of our preprint manuscript (Figure 4), it is the percentages, and not the numbers, of S/T/Q/N residues that are correlated with the PEE activities of Q-rich motifs.
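      The distinction drawn in the third point, between the percentage and the absolute number of S/T/Q/N residues, can be illustrated with a minimal sketch. The function name and the two toy sequences below are hypothetical and are not taken from the manuscript; they only show that two tags can carry the same number of S/T/Q/N residues while differing in percentage.

```python
def stqn_stats(seq: str, residues: str = "STQN") -> tuple[int, float]:
    """Return (count, percentage) of the chosen residues in a protein sequence."""
    seq = seq.upper()
    count = sum(seq.count(r) for r in residues)
    pct = 100.0 * count / len(seq) if seq else 0.0
    return count, pct


# Hypothetical toy sequences (not real protein domains): both contain six
# S/T/Q/N residues, but the second is twice as long.
short_tag = "MSQSTQNAEK"            # 10 residues
long_tag = "MSQSTQNAEKAAAAAAAAAA"  # 20 residues

if __name__ == "__main__":
    for name, tag in (("short_tag", short_tag), ("long_tag", long_tag)):
        count, pct = stqn_stats(tag)
        print(f"{name}: {count} S/T/Q/N residues, {pct:.1f}% of sequence")
```

      Under this toy comparison, a count-based measure would treat the two tags as equivalent, whereas a percentage-based measure would not.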

      The authors also do not discuss if the N-end rule for protein stability applies to the lacZ reporter or the fusion proteins.

      The autonomous PEE function of S/T/Q-rich NTDs is unlikely to be relevant to the N-end rule. The N-end rule links the in vivo half-life of a protein to the identity of its N-terminal residues. In S. cerevisiae, the N-end rule operates as part of the ubiquitin system and comprises two pathways. First, the Arg/N-end rule pathway, involving a single N-terminal amidohydrolase Nta1, mediates deamidation of N-terminal asparagine (N) and glutamine (Q) into aspartate (D) and glutamate (E), which in turn are arginylated by a single Ate1 R-transferase, generating the Arg/N degron. N-terminal R and other primary degrons are recognized by a single N-recognin Ubr1 in concert with ubiquitin-conjugating Ubc2/Rad6. Ubr1 can also recognize several other N-terminal residues, including lysine (K), histidine (H), phenylalanine (F), tryptophan (W), leucine (L) and isoleucine (I) (Bachmair, A. et al. 1986; Tasaki, T. et al. 2012; Varshavsky, A. 2019). Second, the Ac/N-end rule pathway targets proteins containing N-terminally acetylated (Ac) residues. Prior to acetylation, the first amino acid methionine (M) is catalytically removed by Met-aminopeptidases (MetAPs), unless the residue at position 2 is non-permissive (too large) for MetAPs. If a retained N-terminal M or otherwise a valine (V), cysteine (C), alanine (A), serine (S) or threonine (T) residue is followed by residues that allow N-terminal acetylation, the proteins containing these AcN degrons are targeted for ubiquitylation and proteasome-mediated degradation by the Doa10 E3 ligase (Hwang, C. S. et al. 2010).

      A. Bachmair, D. Finley, A. Varshavsky, In vivo half-life of a protein is a function of its amino-terminal residue. Science 234, 179-186 (1986).

      T. Tasaki, S. M. Sriram, K. S. Park, Y. T. Kwon, The N-end rule pathway. Annu Rev Biochem 81, 261-289 (2012).

      A. Varshavsky, N-degron and C-degron pathways of protein degradation. Proc Natl Acad Sci 116, 358-366 (2019).

      C. S. Hwang, A. Shemorry, D. Auerbach, A. Varshavsky, The N-end rule pathway is mediated by a complex of the RING-type Ubr1 and HECT-type Ufd4 ubiquitin ligases. Nat Cell Biol 12, 1177-1185 (2010).

      The PEE activities of these S/T/Q-rich domains are unlikely to arise from counteracting the N-end rule for two reasons. First, the first two amino acid residues of Rad51-NTD, Hop1-SCD, Rad53-SCD1, Sup35-PND, Rad51-ΔN, and LacZ-NVH are MS, ME, ME, MS, ME, and MI, respectively, where M is methionine, S is serine, E is glutamic acid and I is isoleucine. Second, Sml1-NTD behaves similarly to these N-terminal fusion tags, despite its methionine and glutamine (MQ) amino acid signature at the N-terminus.

      The most interesting part of the paper is an exploration of S/T/Q/N-rich regions and other repetitive AA runs in 27 proteomes, particularly ciliates. However, this analysis is missing a critical control that makes it nearly impossible to evaluate the importance of the findings. The authors find the abundance of different amino acid runs in various proteomes. They also report the background abundance of each amino acid. They do not use this background abundance to normalize the runs of amino acids to create a null expectation from each proteome. For example, it has been clear for some time (Ruff, 2017; Ruff et al., 2016) that Drosophila contains a very high background of Q's in the proteome and it is necessary to control for this background abundance when finding runs of Q's.

      We apologize for not explaining sufficiently well the topic eliciting this reviewer’s concern in our preprint manuscript. In the second paragraph of page 14, we cite six references to highlight that SCDs are overrepresented in yeast and human proteins involved in several biological processes (32, 74), and that polyX prevalence differs among species (43, 75-77).

      1. Cheung HC, San Lucas FA, Hicks S, Chang K, Bertuch AA, Ribes-Zamora A. An S/T-Q cluster domain census unveils new putative targets under Tel1/Mec1 control. BMC Genomics. 2012;13:664.

      2. Mier P, Elena-Real C, Urbanek A, Bernado P, Andrade-Navarro MA. The importance of definitions in the study of polyQ regions: A tale of thresholds, impurities and sequence context. Comput Struct Biotechnol J. 2020;18:306-13.

      3. Cara L, Baitemirova M, Follis J, Larios-Sanz M, Ribes-Zamora A. The ATM- and ATR-related SCD domain is over-represented in proteins involved in nervous system development. Sci Rep. 2016;6:19050.

      4. Kuspa A, Loomis WF. The genome of Dictyostelium discoideum. Methods Mol Biol. 2006;346:15-30.

      5. Davies HM, Nofal SD, McLaughlin EJ, Osborne AR. Repetitive sequences in malaria parasite proteins. FEMS Microbiol Rev. 2017;41(6):923-40.

      6. Mier P, Alanis-Lobato G, Andrade-Navarro MA. Context characterization of amino acid homorepeats using evolution, position, and order. Proteins. 2017;85(4):709-19.

      We will cite the two references by Kiersten M. Ruff in our revised manuscript.

      K. M. Ruff and R. V. Pappu, (2015) Multiscale simulation provides mechanistic insights into the effects of sequence contexts of early-stage polyglutamine-mediated aggregation. Biophysical Journal 108, 495a.

      K. M. Ruff, J. B. Warner, A. Posey and P. S. Tan (2017) Polyglutamine length dependent structural properties and phase behavior of huntingtin exon1. Biophysical Journal 112, 511a.

      The authors could easily address this problem with the data and analysis they have already collected. However, at this time, without this normalization, I am hesitant to trust the lists of proteins with long runs of amino acid and the ensuing GO enrichment analysis.

      Ruff KM. 2017. Washington University in St.

      Ruff KM, Holehouse AS, Richardson MGO, Pappu RV. 2016. Proteomic and Biophysical Analysis of Polar Tracts. Biophys J 110:556a.

      We thank Reviewer #1 for this helpful suggestion and now address this issue by means of a different approach described below.

      Based on a previous study (43; Pablo Mier et al. 2020), we applied seven different thresholds to seek both short and long, as well as pure and impure, polyX strings in 20 different representative near-complete proteomes, including 4X (4/4), 5X (4/5-5/5), 6X (4/6-6/6), 7X (4/7-7/7), 8-10X (≥50%X), 11-20X (≥50%X) and ≥21X (≥50%X).

      To normalize the runs of amino acids and create a null expectation from each proteome, we determined, for each species and each of the seven polyX motif types, the ratio of the number of X residues within those motifs to the total number of X residues in the entire proteome. The results for four different polyX motifs are shown below, i.e., polyQ (Author response image 1), polyN (Author response image 2), polyS (Author response image 3) and polyT (Author response image 4).
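      As an illustration of the ratio described above, the following is a minimal sketch that assumes a simple sliding-window definition of a polyX motif (a window of fixed length containing at least a minimum number of X residues, e.g. 4/5 for "5X"). The function name, parameters and toy sequences are hypothetical; the sketch does not reproduce the exact motif-calling procedure used for the author response images, and it does not cover the length-range thresholds (8-10X, 11-20X, ≥21X).

```python
def polyx_residue_fraction(proteome, x, window, min_count):
    """Fraction of all X residues in a proteome that lie inside polyX windows.

    A window of `window` consecutive residues is treated as a polyX motif when
    it contains at least `min_count` copies of residue `x`. Overlapping windows
    are merged, so each X residue is counted at most once.
    """
    total_x = 0
    x_in_motifs = 0
    for seq in proteome:
        seq = seq.upper()
        total_x += seq.count(x)
        covered = [False] * len(seq)
        for start in range(len(seq) - window + 1):
            if seq[start:start + window].count(x) >= min_count:
                for i in range(start, start + window):
                    covered[i] = True
        x_in_motifs += sum(1 for i, aa in enumerate(seq) if covered[i] and aa == x)
    return x_in_motifs / total_x if total_x else 0.0


# Hypothetical two-protein "proteome" used only to exercise the function.
toy_proteome = ["MQQQQAKLQ", "MAKQLE"]

if __name__ == "__main__":
    frac = polyx_residue_fraction(toy_proteome, "Q", window=4, min_count=4)
    print(f"Fraction of Q residues inside 4X (4/4) polyQ motifs: {frac:.3f}")
```

      Dividing by the total number of X residues in each proteome is what makes species with different background X usage comparable in this sketch.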

      Author response image 1.

      Q contents in 7 different types of polyQ motifs in 20 near-complete proteomes. The five ciliates with reassigned stop codons (TAAQ and TAGQ) are indicated in red. Stentor coeruleus, a ciliate with standard stop codons, is indicated in green.

      Author response image 2.

      N contents in 7 different types of polyN motifs in 20 near-complete proteomes. The five ciliates with reassigned stop codons (TAAQ and TAGQ) are indicated in red. Stentor coeruleus, a ciliate with standard stop codons, is indicated in green.

      Author response image 3.

      S contents in 7 different types of polyS motifs in 20 near-complete proteomes. The five ciliates with reassigned stop codons (TAAQ and TAGQ) are indicated in red. Stentor coeruleus, a ciliate with standard stop codons, is indicated in green.

      Author response image 4.

      T contents in 7 different types of polyT motifs in 20 near-complete proteomes. The five ciliates with reassigned stop codons (TAAQ and TAGQ) are indicated in red. Stentor coeruleus, a ciliate with standard stop codons, is indicated in green.

      The results summarized in these four new figures support the conclusions that polyX prevalence differs among species and that the overall X contents of polyX motifs often, but not always, correlate with the X usage frequency in entire proteomes (43; Pablo Mier et al. 2020).

      Most importantly, our results reveal that, compared to Stentor coeruleus or several non-ciliate eukaryotic organisms (e.g., Plasmodium falciparum, Caenorhabditis elegans, Danio rerio, Mus musculus and Homo sapiens), the five ciliates with reassigned TAAQ and TAGQ codons not only have higher Q usage frequencies, but also more polyQ motifs in their proteomes (Figure 1). In contrast, polyQ motifs prevail in Candida albicans, Candida tropicalis, Dictyostelium discoideum, Chlamydomonas reinhardtii, Drosophila melanogaster and Aedes aegypti, though the Q usage frequencies in their entire proteomes are not significantly higher than those of other eukaryotes (Figure 1). Due to their higher N usage frequencies, Dictyostelium discoideum, Plasmodium falciparum and Pseudocohnilembus persalinus have more polyN motifs than the other 23 eukaryotes we examined here (Figure 2). Generally speaking, all 26 eukaryotes we assessed have similar S usage frequencies and percentages of S contents in polyS motifs (Figure 3). Among these 26 eukaryotes, Dictyostelium discoideum possesses many more polyT motifs, though its T usage frequency is similar to that of the other 25 eukaryotes (Figure 4).

      In conclusion, these new normalized results confirm that the reassignment of stop codons to Q indeed results in both higher Q usage frequencies and more polyQ motifs in ciliates.  

      Reviewer #2 (Public Review):

      Summary:

      This study seeks to understand the connection between protein sequence and function in disordered regions enriched in polar amino acids (specifically Q, N, S and T). While the authors suggest that specific motifs facilitate protein-enhancing activities, their findings are correlative, and the evidence is incomplete. Similarly, the authors propose that the re-assignment of stop codons to glutamine-encoding codons underlies the greater use of glutamine in a subset of ciliates, but again, the conclusions here are, at best, correlative. The authors perform extensive bioinformatic analysis, with detailed (albeit somewhat ad hoc) discussion on a number of proteins. Overall, the results presented here are interesting, but are unable to exclude competing hypotheses.

      Strengths:

      Following up on previous work, the authors wish to uncover a mechanism associated with poly-Q and SCD motifs explaining proposed protein expression-enhancing activities. They note that these motifs often occur in IDRs and hypothesize that structural plasticity could be capitalized upon as a mechanism of diversification in evolution. To investigate this further, they employ bioinformatics to investigate the sequence features of proteomes of 27 eukaryotes. They deepen their sequence space exploration, uncovering sub-phylum-specific features associated with species in which a stop-codon substitution has occurred. The authors propose this stop-codon substitution underlies an expansion of poly-Q repeats and increased glutamine distribution.

      Weaknesses:

      The preprint provides extensive, detailed, and entirely unnecessary background information throughout, hampering reading and making it difficult to understand the ideas being proposed. The introduction provides a large amount of detailed background that appears entirely irrelevant for the paper. Many places detailed discussions on specific proteins that are likely of interest to the authors occur, yet without context, this does not enhance the paper for the reader.

      The paper uses many unnecessary, new, or redefined acronyms which makes reading difficult. As examples:

      (1) Prion forming domains (PFDs). Do the authors mean prion-like domains (PLDs), an established term with an empirical definition from the PLAAC algorithm? If yes, they should say this. If not, they must define what a prion-forming domain is formally.

      The N-terminal domain (1-123 amino acids) of S. cerevisiae Sup35 was already referred to as a “prion forming domain (PFD)” in 2006 (Tuite, M. F. 2006). Since then, PFD has also been employed as an acronym in other yeast prion papers (Cox, B. S. et al. 2007; Toombs, J. A. et al. 2011).

      M. F. Tuite, Yeast prions and their prion forming domain. Cell 27, 397-407 (2005).

      B. S. Cox, L. Byrne, M. F. Tuite, Protein Stability. Prion 1, 170-178 (2007).

      J. A. Toombs, N. M. Liss, K. R. Cobble, Z. Ben-Musa, E. D. Ross, [PSI+] maintenance is dependent on the composition, not primary sequence, of the oligopeptide repeat domain. PLoS One 6, e21953 (2011).

      (2) SCD is already an acronym in the IDP field (meaning sequence charge decoration) - the authors should avoid this as their chosen acronym for Serine(S) / threonine (T)-glutamine (Q) cluster domains. Moreover, do we really need another acronym here (we do not).

      SCD was first used in 2005 as an acronym for the Serine (S)/threonine (T)-glutamine (Q) cluster domain in the DNA damage checkpoint field (Traven, A. and Heierhorst, J. 2005). Almost a decade later, SCD became an acronym for “sequence charge decoration” (Sawle, L. et al. 2015; Firman, T. et al. 2018).

      A. Traven and J. Heierhorst, SQ/TQ cluster domains: concentrated ATM/ATR kinase phosphorylation site regions in DNA-damage-response proteins. Bioessays 27, 397-407 (2005).

      L. Sawle and K. Ghosh, A theoretical method to compute sequence dependent configurational properties in charged polymers and proteins. J. Chem Phys. 143, 085101 (2015).

      T. Firman and K. Ghosh, Sequence charge decoration dictates coil-globule transition in intrinsically disordered proteins. J. Chem Phys. 148, 123305 (2018).

      (3) Protein expression-enhancing (PEE) - just say expression-enhancing, there is no need for an acronym here.

      Thank you. Since we have shown that addition of Q-rich motifs to LacZ affects protein expression rather than transcription, we think it is better to use the “PEE” acronym.

      The results suggest autonomous protein expression-enhancing activities of regions of multiple proteins containing Q-rich and SCD motifs. Their definition of expression-enhancing activities is vague and the evidence they provide to support the claim is weak. While their previous work may support their claim with more evidence, it should be explained in more detail. The assay they choose is a fusion reporter measuring beta-galactosidase activity and tracking expression levels. Given the presented data they have shown that they can drive the expression of their reporters and that beta gal remains active, in addition to the increase in expression of fusion reporter during the stress response. They have not detailed what their control and mock treatment is, which makes complete understanding of their experimental approach difficult. Furthermore, their nuclear localization signal on the tag could be influencing the degradation kinetics or sequestering the reporter, leading to its accumulation and the appearance of enhanced expression. Their evidence refuting ubiquitin-mediated degradation does not have a convincing control.

      Based on the experimental results, the authors then go on to perform bioinformatic analysis of SCD proteins and polyX proteins. Unfortunately, there is no clear hypothesis for what is being tested; there is a vague sense of investigating polyX/SCD regions, but I did not find the connection between the first and second sections compelling (especially given polar-rich regions have been shown to engage in many different functions). As such, this bioinformatic analysis largely presents as many lists of percentages without any meaningful interpretation. The bioinformatics analysis lacks any kind of rigorous statistical tests, making it difficult to evaluate the conclusions drawn. The methods section is severely lacking. Specifically, many of the methods require the reader to read many other papers. While referencing prior work is, of course, important, the authors should ensure the methods in this paper provide the details needed to allow a reader to evaluate the work being presented. As it stands, this is not the case.

      Thank you. As described in detail below, we have now performed rigorous statistical testing using the GOfuncR package.

      Overall, my major concern with this work is that the authors make two central claims in this paper (as per the Discussion). The authors claim that Q-rich motifs enhance protein expression. The implication here is that Q-rich motif IDRs are special, but this is not tested. As such, they cannot exclude the competing hypothesis ("N-terminal disordered regions enhance expression").

      In fact, “N-terminal disordered regions enhance expression” exactly summarizes our hypothesis.

      On pages 12-13 and Figure 4 of our preprint manuscript, we explained our hypothesis in the paragraph entitled “The relationship between PEE function, amino acid contents, and structural flexibility”.

      The authors also do not explore the possibility that this effect is in part/entirely driven by mRNA-level effects (see Verma et al., Nat Comms 2019).

      As pointed out by the first reviewer, we show evidence that the increase in protein abundance and enzymatic activity is not due to changes in plasmid copy number or mRNA abundance (Figure 2), and that this phenomenon is not affected by translational quality control mutants (Figure 3).

      As such, while these observations are interesting, they feel preliminary and, in my opinion, cannot be used to draw hard conclusions on how N-terminal IDR sequence features influence protein expression. This does not mean the authors are necessarily wrong, but from the data presented here, I do not believe strong conclusions can be drawn.

      That re-assignment of stop codons to Q increases proteome-wide Q usage. I was unable to understand what result led the authors to this conclusion.

      My reading of the results is that a subset of ciliates has re-assigned UAA and UAG from the stop codon to Q. Those ciliates have more polyQ-containing proteins. However, they also have more polyN-containing proteins and proteins enriched in S/T-Q clusters. Surely if this were a stop-codon-dependent effect, we'd ONLY see an enhancement in Q-richness, not a corresponding enhancement in all polar-rich IDR frequencies? It seems the better working hypothesis is that free-floating ciliate proteomes are enriched in polar amino acids compared to sessile ciliates.

      Thank you. These comments are not supported by the results in Figure 1.

      Regardless, the absence of any kind of statistical analysis makes it hard to draw strong conclusions here.

      We apologize for not explaining more clearly the results of Tables 5-7 in our preprint manuscript.

      To address both reviewers' concerns about our GO enrichment analysis, we have now performed rigorous statistical testing for SCD and polyQ protein overrepresentation using the GOfuncR package (https://bioconductor.org/packages/release/bioc/html/GOfuncR.html). GOfuncR is an R package that conducts standard candidate vs. background enrichment analysis by means of the hypergeometric test. We then adjusted the raw p-values according to the family-wise error rate (FWER). The same method has been applied to GO enrichment analysis of the human genome (Huttenhower, C., et al. 2009).

      Huttenhower, C., Haley, E. M., Hibbs, M. A., Dumeaux, V., Barrett, D. R., Coller, H. A., and Troyanskaya, O. G. Exploring the human genome with functional maps. Genome Research 19, 1093-1106 (2009).
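
      To make the candidate vs. background test described above concrete, the following is a minimal Python sketch, not the GOfuncR implementation. It computes the one-sided hypergeometric p-value for overrepresentation of a GO term and then applies Holm's step-down procedure as a simple stand-in for family-wise error control (GOfuncR, to our understanding, estimates the FWER from randomized gene sets instead). The function names and all counts are hypothetical and serve only as illustration.

```python
from math import comb


def hypergeom_upper_tail(k, K, n, N):
    """P(X >= k): chance of seeing at least k annotated genes among n candidates,
    drawn without replacement from N background genes of which K carry the GO term."""
    upper = min(K, n)
    total = comb(N, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / total


def holm_adjust(pvalues):
    """Holm step-down adjustment (controls the family-wise error rate).
    Used here as an assumption in place of GOfuncR's randomization-based FWER."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        value = min(1.0, (m - rank) * pvalues[idx])
        running_max = max(running_max, value)  # keep adjusted values monotone
        adjusted[idx] = running_max
    return adjusted


if __name__ == "__main__":
    # Hypothetical counts: (candidates with term, background genes with term)
    # for three GO terms, with 200 candidate proteins in a 6000-protein proteome.
    terms = {"term_A": (12, 60), "term_B": (5, 120), "term_C": (9, 90)}
    raw = [hypergeom_upper_tail(k, K, 200, 6000) for k, K in terms.values()]
    for name, p_raw, p_adj in zip(terms, raw, holm_adjust(raw)):
        print(f"{name}: raw p = {p_raw:.3g}, Holm-adjusted p = {p_adj:.3g}")
```

      Exact binomial coefficients are used so that the sketch needs only the standard library; for proteome-scale analyses a statistics package would normally be preferred.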

      The results presented in Author response image 5 and Author response image 6 support our hypothesis that Q-rich motifs prevail in proteins involved in specialized biological processes, including Saccharomyces cerevisiae RNA-mediated transposition, Candida albicans filamentous growth, peptidyl-glutamic acid modification in ciliates with reassigned stop codons (TAAQ and TAGQ), Tetrahymena thermophila xylan catabolism, Dictyostelium discoideum sexual reproduction, Plasmodium falciparum infection, as well as the nervous systems of Drosophila melanogaster, Mus musculus, and Homo sapiens (74). In contrast, peptidyl-glutamic acid modification and microtubule-based movement are not overrepresented with Q-rich proteins in Stentor coeruleus, a ciliate with standard stop codons.

      1. Cara L, Baitemirova M, Follis J, Larios-Sanz M, Ribes-Zamora A. The ATM- and ATR-related SCD domain is over-represented in proteins involved in nervous system development. Sci Rep. 2016;6:19050.

      Author response image 5.

      Selection of biological processes with overrepresented SCD-containing proteins in different eukaryotes. The percentages and numbers of SCD-containing proteins in our search that belong to each indicated Gene Ontology (GO) group are shown. GOfuncR (Huttenhower, C., et al. 2009) was applied for GO enrichment and statistical analysis. The p values adjusted according to the family-wise error rate (FWER) are shown. The five ciliates with reassigned stop codons (TAAQ and TAGQ) are indicated in red. Stentor coeruleus, a ciliate with standard stop codons, is indicated in green.

      Author response image 6.

      Selection of biological processes with overrepresented polyQ-containing proteins in different eukaryotes. The percentages and numbers of polyQ-containing proteins in our search that belong to each indicated Gene Ontology (GO) group are shown. GOfuncR (Huttenhower, C., et al. 2009) was applied for GO enrichment and statistical analysis. The p values adjusted according to the family-wise error rate (FWER) are shown. The five ciliates with reassigned stop codons (TAAQ and TAGQ) are indicated in red. Stentor coeruleus, a ciliate with standard stop codons, is indicated in green.

    1. Author Response

      Reviewer #1 (Public Review):

      Wang et al. present an interesting body of work focused on the effects of high altitude and hypoxia on erythropoiesis, resulting in erythrocytosis. This work is specifically focused on the spleen, identifying splenic macrophages as central cells in this effect. This is logical since these cells are involved in erythrophagocytosis and iron recycling. The results suggest that hypoxia induces splenomegaly with a decreased number of splenic macrophages. There is also evidence that ferroptosis is induced in these macrophages, leading to cell destruction. Finally, the data suggest that ferroptosis in splenic red pulp macrophages causes the decrease in RBC clearance, resulting in erythrocytosis, i.e., lengthening the RBC lifespan. However, there are many issues with the presented results, with somewhat superficial data, meaning the conclusions are overstated and there is decreased confidence that the hypotheses and observed results are directly causally related to hypoxia.

      Major points:

      1) The spleen is a relatively poorly understood organ but what is known about its role in erythropoiesis, especially in mice, is that it functions both to clear as well as to generate RBCs. The latter process is termed extramedullary hematopoiesis and can occur in other bones beyond the pelvis, liver, and spleen. In mice, the spleen is the main organ of extramedullary erythropoiesis. The finding of transiently decreased spleen size prior to splenomegaly under hypoxic conditions is interesting but not well developed in the manuscript. This is a shortcoming as this is an opportunity to evaluate the immediate effect of hypoxia separately from its more chronic effect. Based just on spleen size, no conclusions can be drawn about what happens in the spleen in response to hypoxia.

      Thank you for your insightful comments and questions. The spleen is instrumental in both immune response and the clearance of erythrocytes, as well as serving as a significant reservoir of blood in the body. This organ, characterized by its high perfusion rate and pliability, constricts under conditions of intense stress, such as during peak physical exertion, the diving reflex, or protracted periods of apnea. This contraction can trigger an immediate release of red blood cells (RBCs) into the bloodstream in instances of substantial blood loss or significant reduction of RBCs. Moreover, elevated oxygen consumption rates in certain animal species can be partially attributed to splenic contractions, which augment hematocrit levels and the overall volume of circulating blood, thereby enhancing venous return and oxygen delivery (Dane et al. J Appl Physiol, 2006, 101:289-97; Longhurst et al. Am J Physiol, 1986, 251: H502-9). In our investigation, we noted a significant contraction of the spleen following exposure to hypoxia for a period of one day. We hypothesized that the body, under such conditions, is incapable of generating sufficient RBCs promptly enough to facilitate enhanced oxygen delivery. Consequently, the spleen reacts by releasing its stored RBCs through splenic constriction, leading to a measurable reduction in spleen size.

      However, we agree with you that further investigation is required to fully understand the implications of these changes. Considering the comments, we propose to extend our research by incorporating more detailed examinations of spleen morphology and function during hypoxia, including the potential impact on extramedullary hematopoiesis. We anticipate that such an expanded analysis would not only help elucidate the initial response to hypoxia but also provide insights into the more chronic effects of this condition on spleen function and erythropoiesis.

      2) Monocyte repopulation of tissue resident macrophages is a minor component of the process being described and it is surprising that monocytes in the bone marrow and spleen are also decreased. Can the authors conjecture why this is happening? Typically, the expectation would be that a decrease in tissue resident macrophages would be accompanied by an increase in monocyte migration into the organ in a compensatory manner.

      We appreciate your insightful query regarding the observed decrease in monocytes in the bone marrow and spleen, particularly considering the typical compensatory increase in monocyte migration into organs following a decrease in tissue resident macrophages.

      The observed decrease in monocytes within the bone marrow is likely attributable to the fact that monocytes and precursor cells for red blood cells (RBCs) both originate from the same hematopoietic stem cells within the bone marrow. It is well established that exposure to hypobaric hypoxia (HH) induces erythroid differentiation specifically within the bone marrow, originating from these hematopoietic stem cells. As such, we postulate that the differentiation into monocytes is reduced under hypoxic conditions, which may subsequently cause a decrease in migration to the spleen.

      Furthermore, we hypothesize that an increased migration of monocytes to other tissues under HH exposure may also contribute to the decreased migration to the spleen. The liver, which partially contributes to the clearance of RBCs, may play a role in this process. Our investigations to date have indeed identified an increased monocyte migration to the liver. We were pleased to discover an elevation in CSF1 expression in the liver following HH exposure for both 7 and 14 days. This finding was corroborated through flow cytometry, which confirmed an increase in monocyte migration to the liver.

      Consequently, we propose that under HH conditions, the liver requires an increased influx of monocytes, which in turn leads to a decrease in monocyte migration to the spleen. However, it is important to note that these findings will be discussed more comprehensively in our forthcoming publication, and as such, the data pertaining to these results have not been included in the current manuscript.

      3) Figure 3 does not definitively provide evidence that cell death is specifically occurring in splenic macrophages and the fraction of Cd11b+ cells is not changed in NN vs HH. Furthermore, the IHC of F4/80 in Fig 3U is not definitive as cells can express F4/80 more or less brightly and no negative/positive controls are shown for this panel.

      We appreciate your insightful comments and critiques regarding Figure 3. We acknowledge that the figure, as presented, does not definitively demonstrate that cell death is specifically occurring in splenic macrophages. While it is challenging to definitively determine the occurrence of cell death in macrophages based solely on Figure 3D-F, our single-cell analysis provides strong evidence that such an event occurs. We initially observed cell death within the spleen under hypobaric hypoxia (HH) conditions, and to discern the precise cell type involved, we conducted single-cell analyses. Regrettably, we did not articulate this clearly in our preliminary manuscript. In the revised version, we have modified the sequence of Figure 3A-C and Figure 3D-F for better clarity. Besides, we observed a significant decrease in the fraction of F4/80hiCD11bhi macrophages under HH conditions compared to NN. To make the changes more evident in CD86 and CD206, we have transformed these scatter plots into histograms in our revised manuscript.

      Considering the limitations of F4/80 as a conclusive macrophage identifier, we have concurrently presented the immunohistochemical (IHC) analyses of heme oxygenase-1 (HO-1). Functioning as a macrophage marker, particularly in cells involved in iron metabolism, HO-1 offers additional diagnostic accuracy. Observations from both F4/80 and HO-1 staining suggested a primary localization of positively stained cells within the splenic red pulp. Following exposure to hypobaric hypoxia (HH) conditions, a decrease was noted in the expression of both F4/80 and HO-1. This decrease implies that HH conditions contribute to a reduction in macrophage population and impede the iron metabolism process. In the revised version of our manuscript, we have enhanced the clarity of Figure 3U to illustrate the presence of positive staining, with an emphasis on HO-1 staining, which is predominantly observed in the red pulp.

      4) The phagocytic function of splenic red pulp macrophages relative to infection cannot be used directly to understand erythrophagocytosis. The standard approach is to use opsonized RBCs in vitro. Furthermore, RBC survival is a standard method to assess erythrophagocytosis function. In this method, biotin is injected via tail vein directly and small blood samples are collected to measure the clearance of biotinylated RBCs by flow cytometry; kits are available to accomplish this. Because the method is standard, Fig 4D is not necessary and Fig 4E needs to be performed only in blood by sampling mice repeatedly and comparing the rate of biotin decline in HH with NN (not comparing 7 d with 14 d).

      We appreciate your insightful comments and suggestions. We concur that the phagocytic function of splenic red pulp macrophages in the context of infection may not be directly translatable to understanding erythrophagocytosis. Given our assessment that the use of Cy5.5-labeled E. coli alone may not be sufficient to accurately evaluate the phagocytic function of macrophages, we extended our study to include the use of NHS-biotin-labeled RBCs to assess phagocytic capabilities. While the presence of biotin-labeled RBCs in the blood could provide an indication of RBC clearance, this measure does not exclusively reflect the spleen's role in the process, as it fails to account for the clearance activities of other organs.

      Consequently, we propose that the remaining biotin-labeled RBCs in the spleen may provide a more direct representation of the organ's function in RBC clearance and sequestration. Our observations of diminished erythrophagocytosis at both 7 and 14 days following exposure to HH guided our subsequent efforts to quantify biotin-labeled RBCs in both the circulatory system and spleen. These measurements were conducted during the 7 to 14-day span following the confirmation of impaired erythrophagocytosis. Comparative evaluation of RBC clearance rates under NN and HH conditions provided further evidence supporting our preliminary observations, with the data revealing a decrease in the RBC clearance rate in the context of HH conditions. In response to feedback from other reviewers, we have elected to exclude the phagocytic results and the diagram of the erythrocyte labeling assay. These amendments will be incorporated into the revised manuscript. The reviewers' constructive feedback has played a crucial role in refining the methodological precision and coherence of our investigation.
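
      The comparison of clearance rates discussed above can be sketched as follows. This is only an illustrative outline under stated assumptions: it assumes the percentage of biotin-labeled RBCs declines approximately linearly over the sampling window and fits a least-squares slope per group. The function name `decline_rate` and every number below are hypothetical and are not data from the study.

```python
def decline_rate(days, percent_labeled):
    """Least-squares slope of labeled-RBC percentage versus time, sign-flipped so
    that a larger value means faster clearance (percentage points per day)."""
    n = len(days)
    mean_x = sum(days) / n
    mean_y = sum(percent_labeled) / n
    sxx = sum((x - mean_x) ** 2 for x in days)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(days, percent_labeled))
    return -sxy / sxx


if __name__ == "__main__":
    # Hypothetical repeated blood samples from the same animals (day 0, 7, 14).
    sampling_days = [0, 7, 14]
    nn_percent = [100.0, 79.0, 58.0]   # normobaric normoxia (made-up values)
    hh_percent = [100.0, 86.0, 72.0]   # hypobaric hypoxia (made-up values)

    nn_rate = decline_rate(sampling_days, nn_percent)
    hh_rate = decline_rate(sampling_days, hh_percent)
    print(f"NN decline: {nn_rate:.2f} %/day, HH decline: {hh_rate:.2f} %/day")
    print("Slower clearance under HH" if hh_rate < nn_rate else "No slowing under HH")
```

      A per-animal slope followed by a group-level statistical test would be needed in practice; the sketch shows only the rate calculation itself.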

      5) It is unclear whether Tuftsin has a specific effect on phagocytosis of RBCs without other potential confounding effects. Furthermore, quantifying iron in red pulp splenic macrophages requires alternative readily available more quantitative methods (e.g. sorted red pulp macrophages non-heme iron concentration).

      We appreciate your comments and questions regarding the potential effect of Tuftsin on the phagocytosis of RBCs and the quantification of iron in red pulp splenic macrophages. Regarding the role of Tuftsin, we concur that the literature directly associating Tuftsin with erythrophagocytosis is scant. The work of Gino Roberto Corazza et al. does suggest a link between Tuftsin and general phagocytic capacity, but it does not specifically address erythrophagocytosis (Am J Gastroenterol, 1999;94:391-397). We agree that further investigations are required to elucidate the potential confounding effects and to ascertain whether Tuftsin has a specific impact on the phagocytosis of RBCs. Concerning the quantification of iron in red pulp splenic macrophages, we acknowledge your suggestion to employ readily available and more quantitative methods. We have incorporated additional Fe2+ staining in the spleen at two time points: 7 and 14 days subsequent to HH exposure (refer to the following Figure). The resultant data reveal an increased deposition of Fe2+ within the red pulp, as evidenced in Figure 5 (panels L and M) and Figure 7 (panels L and M).

      6) In Fig 5, PBMCs are not thought to represent splenic macrophages and although of some interest, does not contribute significantly to the conclusions regarding splenic macrophages at the heart of the current work. The data is also in the wrong direction, namely providing evidence that PBMCs are relatively iron poor which is not consistent with ferroptosis which would increase cellular iron.

      We appreciate your insightful critique regarding Figure 5 and the interpretation of our data on peripheral blood mononuclear cells (PBMCs) in relation to splenic macrophages. We understand that PBMCs do not directly represent splenic macrophages, and we agree that any conclusions drawn from PBMCs must be considered with caution when discussing the behavior of splenic macrophages.

      The primary rationale for incorporating PBMCs into our study was to investigate the potential correspondence between their gene expression changes and those observed in the spleen after HH exposure. This was posited as a working hypothesis for further exploration rather than a conclusive statement. The gene expression in PBMCs was congruent with changes in the spleen's gene expression, demonstrating an iron deficiency phenotype, ostensibly due to the mobilization of intracellular iron for hemoglobin synthesis. Thus, it is plausible that NCOA4 may facilitate iron mobilization through the degradation of ferritin, the protein that stores iron.

      It remains ambiguous whether ferroptosis was initiated in the PBMCs during our study. Ferroptosis primarily occurs as a response to an increase in Fe2+ rather than an overall increase in intracellular iron. Our preliminary proposition was that relative changes in gene expression in PBMCs could potentially mirror corresponding changes in protein expression in the spleen, thereby potentially indicating alterations in iron processing capacity post-HH exposure. However, we fully acknowledge that this is a conjecture requiring further empirical substantiation or clinical validation.

      7) Tfr1 increase is typically correlated with cellular iron deficiency while ferroptosis is consistent with iron loading. The direction of the changes in multiple elements relevant to iron trafficking is somewhat confusing and without additional evidence, there is little confidence that the authors have reached the correct conclusion. Furthermore, the results here are analyses of total spleen samples rather than specific cells in the spleen.

      We appreciate your astute comments and agree that the observed increase in transferrin receptor (TfR) expression, typically associated with cellular iron deficiency, appears contradictory to the expected iron-loading state associated with ferroptosis. We understand that this apparent contradiction might engender some uncertainty about our conclusions.

      In our investigation, we evaluated total spleen samples as opposed to distinct cell types within the spleen, a factor that could have contributed to the seemingly discordant findings. An integral element to bear in mind is the existence of immature RBCs in the spleen, particularly within the hematopoietic island where these immature RBCs cluster around nurse macrophages. These immature RBCs contain abundant TfR, which is needed for iron uptake and hemoglobin synthesis. These cells, which prove challenging to eliminate via perfusion, might have played a role in the observed upregulation in TfR expression, especially in the aftermath of HH exposure. Our further research revealed that the expression of TfR in macrophages diminished following hypoxic conditions, thereby suggesting that the elevated TfR expression in tissue samples may predominantly originate from other cell types, especially immature RBCs (refer to subsequent Figure).

      Reviewer #2 (Public Review):

      The authors aimed at elucidating the development of high altitude polycythemia which affects mice and men staying in the hypoxic atmosphere at high altitude (hypobaric hypoxia; HH). HH causes increased erythropoietin production which stimulates the production of red blood cells. The authors hypothesize that increased production is only partially responsible for exaggerated red blood cell production, i.e. polycythemia, but that decreased erythrophagocytosis in the spleen contributes to high red blood cell counts.

      The main strength of the study is the use of a mouse model exposed to HH in a hypobaric chamber. However, not all of the reported results are convincing due to some smaller effects which one may doubt to result in the overall increase in red blood cells as claimed by the authors. Moreover, direct proof for reduced erythrophagocytosis is compromised due to a strong spontaneous loss of labelled red blood cells, although effects of labelled E. coli phagocytosis are shown. Their discussion addresses some of the unexpected results, such as the reduced expression of HO-1 under hypoxia but due to the above-mentioned limitations much of the discussion remains hypothetical.

      Thank you for your valuable feedback and insight. We appreciate the recognition of the strength of our study model, the exposure of mice to hypobaric hypoxia (HH) in a hypobaric animal chamber. We also understand your concerns about the smaller effects and their potential impact on the overall increase in red blood cells (RBCs), as well as the apparent reduced erythrophagocytosis due to the loss of labelled RBCs.

      Under HH conditions, the rise in RBC numbers has been predominantly attributed to amplified erythropoiesis. The focus of our research was to underscore the potential acceleration of high-altitude polycythemia (HAPC) as a result of compromised erythrophagocytosis. Considering the spontaneous loss of labelled RBCs in vivo, we assessed the clearance rate of RBCs at 7 and 14 days within the HH environment, and subsequently compared this rate over the period from 7 to 14 days, after erythrophagocytosis impairment had clearly manifested at both time points. This approach was designed to negate the effects of spontaneous loss of labelled RBCs in both NN and HH conditions. Correspondingly, the results derived from blood and spleen analyses corroborated a decline in the RBC clearance rate under HH compared with NN conditions.

      Apart from the E. coli phagocytosis and the labeled RBCs experiment (this part of the results was removed in the revision), the injection of Tuftsin further substantiated the impairment of erythrophagocytosis in the HH spleen, as evidenced by the observed decrease in iron within the red pulp of the spleen post-perfusion. Furthermore, to validate our findings, we incorporated RBCs staining in splenic cells at 7 and 14 days of HH exposure, which provided concrete confirmation of impaired erythrophagocytosis (new Figure 4E).

      As for the reduced expression of heme oxygenase-1 (HO-1) under hypoxia, we agree that this was an unexpected result, and we are in the process of further exploring the underlying mechanisms. It is possible that there are other regulatory pathways at play that are yet to be identified. However, we believe that by offering possible interpretations of our data and potential directions for future research, we contribute to the ongoing scientific discourse in this area.

      Reviewer #3 (Public Review):

      The manuscript by Yang et al. investigated in mice how hypobaric hypoxia can modify the RBC clearance function of the spleen, a concept that is of interest. Via interpretation of their data, the authors proposed a model that hypoxia causes an increase in cellular iron levels, possibly in RPMs, leading to ferroptosis, and downregulates their erythrophagocytic capacity. However, most of the data is generated on total splenocytes/total spleen, and the conclusions are not always supported by the presented data. The model of the authors could be questioned by the paper by Youssef et al. (which the authors cite, but in an unclear context) that the ferroptosis in RPMs could be mediated by augmented erythrophagocytosis. As such, the loss of RPMs in vivo, which is indeed clear in the histological section shown (and is a strong and interesting finding), may not be directly caused by hypoxia, but by enhanced RBC clearance. Such a possibility should be taken into account.

      Thank you for your insightful comments and constructive feedback. In their research, Youssef et al. (2018) discerned that elevated erythrophagocytosis of stressed red blood cells (RBCs) instigates ferroptosis in red pulp macrophages (RPMs) within the spleen, as evidenced in a mouse model of transfusion. This augmentation of erythrophagocytosis was conspicuous five hours post-injection of RBCs. Conversely, our study elucidated the decrease in erythrophagocytosis in the spleen after both 7 and 14 days.

      Typically, macrophages exhibit an enhanced phagocytic capacity in the immediate aftermath of stress or stimulation. Nonetheless, the time points of observation in our study were considerably later (seven and fourteen days). It remains uncertain whether phagocytic capability was amplified during the acute phase of HH exposure, particularly within the first day (when splenoconstriction under HH releases stored RBCs into the bloodstream), and whether this initial response could precipitate ferroptosis and subsequently diminish erythrophagocytosis at the 7- or 14-day marks under continued HH conditions.

      Major points:

      1) The authors present data from total splenocytes and then relate the obtained data to RPMs, which are quantitatively a minor population in the spleen. Eg, labile iron is increased in the splenocytes upon HH, but the manuscript does not show that this occurs in the red pulp or RPMs. They also measure gene/protein expression changes in the total spleen and connect them to changes in macrophages, as indicated in the model Figure (Fig. 7). HO-1 and levels of Ferritin (L and H) can be attributed to the drop in RPMs in the spleen. Are any of these changes preserved cell-intrinsically in cultured macrophages? This should be shown to support the model (relates also to lines 487-88, where the authors again speculate that hypoxia decreases HO-1 which was not demonstrated). In the current stage, for example, we do not know if the labile iron increase in cultured cells and in the spleen in vivo upon hypoxia is the same phenomenon, and why labile iron is increased. To improve the manuscript, the authors should study specifically RPMs.

      We express our gratitude for your perceptive remarks. In our initial manuscript, we did not evaluate labile iron within the red pulp and red pulp macrophages (RPMs). To address this oversight, we utilized the Lillie staining method, in accordance with the protocol outlined by Liu et al., (Chemosphere, 2021, 264(Pt 1):128413), to discern Fe2+ presence within these regions. The outcomes were consistent with our antecedent Western blot and flow cytometry findings in the spleen, corroborating an increment in labile iron specifically within the red pulp of the spleen.

      However, we acknowledge the necessity for other supplementary experimental efforts to further validate these findings. Additionally, we scrutinized the expression of heme oxygenase-1 (HO-1) and iron-related proteins, including transferrin receptor (TfR), ferroportin (Fpn), ferritin (Ft), and nuclear receptor coactivator 4 (NCOA4) in primary macrophages subjected to 1% hypoxic conditions, both with and without hemoglobin treatment. Our results indicated that the expression of ferroptosis-related proteins was consistent with in vivo studies; however, the expression of iron-related proteins differed between the in vitro and in vivo settings. This suggests that the increase in labile iron in cultured cells and the increase in the spleen in vivo upon hypoxia are not identical phenomena. However, the precise mechanism remains elusive.

      In our study, we observed a decrease in HO-1 protein expression following 7 and 14 days of HH exposure, as shown in Figures 3U, 5A, and S1A. This finding contradicts previous research that identified HO-1 as a hypoxia-inducible factor (HIF) target under hypoxic conditions (P J Lee et al., 1997). Our discussion, therefore, addressed the potential discrepancy in HO-1 expression under HH. According to our findings, HO-1 regulation under HH appears to be predominantly influenced by macrophage numbers and the RBCs to be processed in the spleen or macrophages, rather than by hypoxia alone.

      It is challenging to discern whether the increased labile iron observed in vitro accurately reflects the in vivo phenomenon, as replicating in vitro the iron requirements for RBC production induced by HH is inherently difficult. However, by integrating our in vivo and in vitro studies, we determined that the elevated Fe2+ levels were not dependent on HO-1 protein expression, as HO-1 levels increased in vitro but decreased in vivo under hypoxic/HH exposure.

      2) The paper uses flow cytometry, but how this method was applied is suboptimal: there are no gating strategies, no indication of whether single events were determined or how cell viability was assessed, and no indication of which parent populations are used when % of cells is shown on the graphs. How could RBCs in the spleen be analyzed without dedicated cell surface markers? A drop in splenic RPMs is presented as the key finding of the manuscript but Fig. 3M shows gating (suboptimal) for monocytes, not RPMs. RPMs are typically F4/80-high, CD11-low (again no gating strategy is shown for RPMs). Also, the authors used single-cell RNAseq to detect a drop in splenic macrophages upon HH, but they do not indicate in Fig. A-C which cluster of cells relates to macrophages. Cell clusters are not identified in these panels; hence the data are not interpretable.

      Thank you for your comments and constructive critique regarding our flow cytometry methodology and presentation. We understand the need for greater transparency and detailed explanation of our procedures, and we acknowledge that the lack of gating strategies and other pertinent information in our initial manuscript may have affected the clarity of our findings.

      In our initial report, we provided an overview of the decline in migrated macrophages (F4/80hiCD11bhi), including both M1 and M2 expression in migrated macrophages, as illustrated in Figure 3, but did not specifically address the changes in red pulp macrophages (RPMs). Based on previous results, it is difficult to identify CD11b- and CD11blo cells. We will repeat the experiments and attempt to identify F4/80hiCD11blo cells in the revised manuscript. The results of the reanalysis are now included (Figure 3M). However, single-cell in vivo analysis studies may more accurately identify specific cell types that decrease after exposure to HH.

      Furthermore, we substantiated the reduction in red pulp, as evidenced by Figure 4J, given that iron processing primarily occurs within the red pulp. In Figure 3, our initial objective was merely to illustrate the reduction in total macrophages in the spleen following HH exposure.

      To further clarify the characterization of various cell types, we conducted a single-cell analysis. Our findings indicated that clusters 0, 1, 3, 4, 14, 18, and 29 represented B cells, clusters 2, 10, 12, and 28 represented T cells, clusters 15 and 22 corresponded to NK cells, clusters 5, 11, 13, and 19 represented NKT cells, clusters 6, 9, and 24 represented cell cycle cells, clusters 26 and 17 represented plasma cells, clusters 21 and 23 represented neutrophils, cluster 30 represented erythrocytes, and clusters 7, 8, 16, 20, 24, and 27 represented dendritic cells (DCs) and macrophages, as depicted in Figure 3E.
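      A cluster annotation of this kind can be encoded as a lookup table and checked for consistency. The following is a minimal sketch: the cluster lists are transcribed from the paragraph above, while the function names and the assumption that clusters are numbered 0 to 30 are illustrative and not taken from the manuscript.

```python
from collections import defaultdict

# Cluster-to-cell-type assignment transcribed from the response text.
CELL_TYPE_CLUSTERS = {
    "B cells": [0, 1, 3, 4, 14, 18, 29],
    "T cells": [2, 10, 12, 28],
    "NK cells": [15, 22],
    "NKT cells": [5, 11, 13, 19],
    "cell cycle cells": [6, 9, 24],
    "plasma cells": [26, 17],
    "neutrophils": [21, 23],
    "erythrocytes": [30],
    "DCs and macrophages": [7, 8, 16, 20, 24, 27],
}


def labels_per_cluster(assignment):
    """Invert the mapping: cluster id -> sorted list of cell-type labels."""
    inverted = defaultdict(list)
    for cell_type, clusters in assignment.items():
        for cluster in clusters:
            inverted[cluster].append(cell_type)
    return {c: sorted(labels) for c, labels in sorted(inverted.items())}


def ambiguous_clusters(assignment):
    """Clusters that carry more than one cell-type label."""
    return {c: labels for c, labels in labels_per_cluster(assignment).items()
            if len(labels) > 1}


def unassigned_clusters(assignment, n_clusters):
    """Cluster ids in range(n_clusters) that carry no label (n_clusters is assumed)."""
    assigned = {c for clusters in assignment.values() for c in clusters}
    return sorted(set(range(n_clusters)) - assigned)


if __name__ == "__main__":
    print("ambiguous:", ambiguous_clusters(CELL_TYPE_CLUSTERS))
    print("unassigned:", unassigned_clusters(CELL_TYPE_CLUSTERS, n_clusters=31))
```

      With the lists as written, the check reports cluster 24 under two labels (cell cycle cells and DCs/macrophages); whether that reflects a shared cluster or a transcription slip cannot be determined from the text alone.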

      3) The authors draw conclusions that are not supported by the data, some examples: a) they cannot exclude eg the compensatory involvement of the liver in the RBCs clearance (the differences between HH sham and HH splenectomy is mild in Fig. 2 E, F and G).

      Thank you for your insightful comments and for pointing out the potential involvement of other organs, such as the liver, in the RBC clearance under HH conditions. We concur with your observation that the differences between the HH sham and HH splenectomy conditions in Fig. 2 E, F, and G are modest. This could indeed suggest a compensatory role of other organs in RBC clearance when splenectomy is performed. Our intent, however, was to underscore the primary role of the spleen in this process under HH exposure.

      In fact, after our initial investigations, we conducted a more extensive study examining the role of the liver in RBC clearance under HH conditions. Our findings, as illustrated in the figures submitted with this response, indeed support a compensatory role for the liver. Specifically, we observed an increase in macrophage numbers and phagocytic activity in the liver under HH conditions. Although the differences in RBC count between the HH sham and HH splenectomy conditions may seem minor, it is essential to consider the unit of this measurement, which is value × 10¹²/ml. Even a small numerical difference can represent a significant biological variation at this scale.

      b) splenomegaly is typically caused by increased extramedullary erythropoiesis, not RBC retention. Why do the authors support the second possibility? Related to this, why do the authors conclude that data in Fig. 4 G,H support the model of RBC retention? A significant drop in splenic RBCs (poorly gated) was observed at 7 days, between NN and HH groups, which could actually indicate increased RBC clearance capacity = less retention.

      Prior investigations have predominantly suggested that spleen enlargement under hypoxic conditions stems from the spleen's extramedullary hematopoiesis. Nevertheless, an intriguing study conducted in 1994 by the General Hospital of Xizang Military Region reported substantial enlargement and congestion of splenic sinuses in high altitude polycythemia (HAPC) patients. This finding was based on the dissection of spleens from 12 patients with HAPC (Zou Xunda, et al., Southwest Defense Medicine, 1994;5:294-296). Moreover, a recent study indicated that extramedullary erythropoiesis reaches its zenith between 3 and 7 days (Wang H et al., 2021).

      Considering these findings, the present study postulates that hypoxia-induced inhibition of erythrophagocytosis may lead to RBC retention. However, we acknowledge that the manuscript in its current preprint form does not offer conclusive evidence to substantiate this hypothesis. To bridge this gap, we conducted further experiments in which the spleen was perfused and total cells were collected after HH exposure. These cells were then smeared onto slides and subjected to Wright staining. Our results show a clear increase in deformation and retention of RBCs in the spleen following 7 and 14 days of HH exposure. This finding strengthens our initial hypothesis and contributes a novel perspective to the understanding of splenic responses under hypoxic conditions.

      c) lines 452-54: there is no data for decreased phagocytosis in vivo, especially in the context of erythrophagocytosis. This should be done with stressed RBCs transfusion assays, very good examples, like from Youssef et al. or Threul et al. are available in the literature.

      Thank you. In their seminal work, Youssef and colleagues demonstrated that the transfusion of stressed RBCs triggers erythrophagocytosis and subsequently incites ferroptosis in red pulp macrophages (RPMs) within a span of five hours. Given these observations, the applicability of this model to evaluate macrophage phagocytosis in the spleen or RPMs under HH conditions may be limited, as HH has already induced erythropoiesis in vivo. In addition, it is unclear whether the membrane characteristics of stress-induced RBCs are similar to those of HH-induced RBCs, and these characteristics are an important signal for in vivo phagocytosis. The ambiguity arises from the fact that we currently lack sufficient knowledge to discern whether the changes in phagocytosis are instigated by the presence of stressed RBCs or by changes in macrophages induced by HH in vivo. Nonetheless, we appreciate the potential value of this approach and intend to explore its utility in our future investigations. The prospect of distinguishing the effects of stressed RBCs from those of HH on macrophage phagocytosis is an intriguing line of inquiry that could yield significant insights into the mechanisms governing these physiological processes.

      d) Line 475 - ferritinophagy was not shown in response to hypoxia by the manuscript, especially that NCOA4 is decreased, at least in the total spleen.

      Drawing on the research published in eLife in 2015, it was unequivocally established that ferritinophagy, facilitated by Nuclear Receptor Coactivator 4 (NCOA4), is indispensable for erythropoiesis. This process is modulated by iron-dependent HECT and RLD domain containing E3 ubiquitin protein ligase 2 (HERC2)-mediated proteolysis (Joseph D Mancias et al., eLife. 2015; 4: e10308). As is widely recognized, NCOA4 plays a critical role in directing ferritin (Ft) to the lysosome, where both NCOA4 and Ft undergo coordinated degradation.

      In our study, we provide evidence that exposure to HH stimulates erythropoiesis (Figure 1). We propose that this, in turn, could promote ferritinophagy via NCOA4, resulting in a decrease in NCOA4 protein levels post-HH exposure. We will perform additional experiments to address this concern. This proposal not only aligns with the established understanding of ferritinophagy and erythropoiesis but also adds a novel dimension to the understanding of cellular responses to hypoxic conditions.

      4) In a few cases, the authors show only representative dot plots or histograms, without quantification for n>1. In Fig. 4B the authors write about a significant decrease (although with n=1 no statistics could be applied here; of note, it is not clear what kind of samples were analyzed here). Another example is Fig. 6I. In this case, it is even more important as the data conflict with the cited article and the new one: PMCID: PMC9908853, which shows that hypoxia stimulates efferocytosis. Sometimes the manuscript claims that some changes are observed, although they are not visible in representative figures (eg for M1 and M2 macrophages in Fig. 3M).

      We recognize that our initial portrayal of Figure 4B was lacking in precision, given that it did not include the corresponding statistical graph. While our results demonstrated a significant reduction in the ability to phagocytose E. coli, in line with the recommendations of other reviewers, we have opted to remove the results pertaining to E. coli phagocytosis in this revision, as they primarily reflected immune function. In relation to PMC9908853, which reported metabolic adaptation facilitating enhanced macrophage efferocytosis in limited-oxygen environments, it is worth noting that the macrophages investigated in this study were derived from ER-Hoxb8 macrophage progenitors following the removal of β-estradiol. Consequently, questions arise regarding the comparability between these cultured macrophages and primary macrophages obtained fresh from the spleen post HH exposure. The characteristics and functions of these two different macrophage sources may not align precisely, and this distinction necessitates further investigation.

      5) There are several unclear issues in methodology:

      • what is the purity of primary RPMs in the culture? RPMs are quantitatively poorly represented in splenocyte single-cell suspensions. This reviewer is quite skeptical that the processing of splenocytes from approx 1 mm³ of tissue was sufficient to establish primary RPM cultures. The authors should prove that the cultured cells were indeed RPMs, not monocyte-derived macrophages or other splenic macrophage subtypes.

      Thank you for your thoughtful comments and inquiries. Firstly, we apologize if we did not make this clear in the original manuscript. The purity of the primary RPMs in our culture was found to be approximately 40%, as identified by F4/80hiCD11blo markers using flow cytometry. We recognize that RPMs are typically underrepresented in splenocyte single-cell suspensions, and the concern you raise about the potential for contamination by other cell types is valid.

      We apologize for any ambiguities in the methodological description that may have led to misunderstandings during the review. Indeed, the entirety of the spleen is typically employed for splenic macrophage culture. The size of the spleen can vary depending on the species and age of the animal, but in mice, it is commonly approximately 1 cm in length. The spleen is then dissected into minuscule fragments, each approximately 1 mm³ in volume, to aid in enzymatic digestion. This procedure does not merely utilize a single 1 mm³ tissue fragment for RPM cultures. Although the isolation and culture of spleen macrophages can present considerable challenges, our method has been optimized to enhance the yield of this specific cell population.

      • (around line 183) In the description of flow cytometry, there are several missing issues. In 1) it is unclear which type of samples were analyzed. In 2) it is not clear how splenocyte cell suspension was prepared.

      1) Whole blood was extracted from the mice and collected into an anticoagulant tube, which was then set aside for subsequent thiazole orange (TO) staining. 2) Splenic tissue was procured from the mice and subsequently processed into a single-cell suspension using a 40 μm filter. The erythrocytes within the entire sample were subsequently lysed and eliminated, and the remaining cell suspension was resuspended in phosphate-buffered saline (PBS) in preparation for ensuing analyses.

      We have meticulously revised these methodological details in the corresponding section of the manuscript to ensure clarity and precision.

      • In line 192: what does it mean: 'This step can be omitted from cell samples'?

      The methodology employed for the quantification of intracellular divalent iron content and lipid peroxidation level was executed as follows: Splenic tissue was first processed into a single cell suspension, followed by the lysis of RBCs. It should be noted that this particular stage is superfluous when dealing with isolated cell samples. Subsequently, a total of 1 × 10⁶ cells were incubated with 100 μL of BioTracker Far-red Labile Fe2+ Dye (1 mM, Sigma, SCT037, USA) for a duration of 1 hour, or alternatively, C11-Bodipy 581/591 (10 μM, Thermo Fisher, D3861, USA) for a span of 30 minutes. After incubation, cells were washed twice with PBS. Flow cytometric analysis was then performed, utilizing the FL6 (638 nm/660 nm) channel for the determination of intracellular divalent iron content, and the FL1 (488 nm/525 nm) channel for the quantification of the lipid peroxidation level.

      • 'TO method' is not commonly used anymore and hence it was unclear to this Reviewer. Reticulocytes should be analyzed with proper gating, using cell surface markers.

      We appreciate your observation regarding the methodology we employed to analyze reticulocytes in our study. We value your recommendation to utilize cell surface markers for effective gating, which indeed represents a more modern and accurate approach. However, as reticulocyte identification is not the central focus of our investigation, we opted for the TO staining method because of its simplicity and the credibility of its results. In our initial exploration, we adopted the TO staining method in accordance with the protocol outlined (Sci Rep, 2018, 8(1):12793), primarily owing to its established use and demonstrated efficacy in reticulocyte identification.

      • The description of 'phagocytosis of E. coli and RBCs' in the Methods section is unclear and incomplete. The Results section suggests that for the biotinylated RBCs, phagocytosis (or retention?) of RBCs was quantified in vivo, upon transfusion. However, the Methods section suggests either an in vitro or an ex vivo approach. It is vague what was indeed performed and how in detail. If RBC transfusion was done, this should be properly described. Of note, biotinylation of RBCs is typically done in vivo only, being a first step in an RBC lifespan assay. Such an assay is missing in the manuscript. Also, it is not clear if the detection of biotinylated RBCs was performed in permeabilized cells (this would be required).

      Thanks for the comments. In our initial methodology, we employed Cy5.5-labeled Escherichia coli to probe phagocytic function, albeit with the understanding that this may not constitute the most ideal model for phagocytosis detection within this context (in light of recommendations from other reviewers, we have removed the E. coli phagocytosis results from this revision, as they predominantly mirror immune function). Our fundamental aim was to ascertain whether HH compromises the erythrophagocytic potential of splenic macrophages. In pursuit of this, we subsequently analyzed the clearance of biotinylated RBCs in both the bloodstream and spleen to assess phagocytic functionality in vivo.

      In the present study, instead of transfusing biotinylated RBCs into mice, we opted to inject N-Hydroxysuccinimide (NHS)-biotin into the bloodstream. NHS-biotin is capable of binding with cell membranes in vivo and can be recognized by streptavidin-fluorescein isothiocyanate (FITC) after cells are extracted from the blood or spleen in vitro. Consequently, biotin-labeled RBCs were detectable in both the blood and spleen following NHS-biotin injection for a duration of 21 days.

      Ultimately, we employed flow cytometry to analyze the NHS-biotin labeled RBCs in the blood or spleen. This method facilitates the detection of live cells and is not applicable to permeabilized cells. We believe this approach better aligns with our investigative goals and offers a more robust evaluation of erythrophagocytic function under hypoxic conditions.

    1. Author response:

      Reviewer #1 (Public review):

      Summary:

      Jocher, Janssen, et al examine the robustness of comparative functional genomics studies in primates that make use of induced pluripotent stem cell-derived cells. Comparative studies in primates, especially amongst the great apes, are generally hindered by the very limited availability of samples, and iPSCs, which can be maintained in the laboratory indefinitely and differentiated into other cell types, have emerged as promising model systems because they allow the generation of data from tissues and cells that would otherwise be unobservable.

      Undirected differentiation of iPSCs into many cell types at once, using a method known as embryoid body differentiation, requires researchers to manually assign all cell types in the dataset so they can be correctly analysed. Typically, this is done using marker genes associated with a specific cell type. These are defined a priori, and have historically tended to be characterised in mice and humans and then employed to annotate other species. Jocher, Janssen, et al ask if the marker genes and features used to define a given cell type in one species are suitable for use in a second species, and then quantify the degree of usefulness of these markers. They find that genes that are informative and cell type specific in a given species are less valuable for cell type identification in other species, and that this value, or transferability, drops off as the evolutionary distance between species increases.

      This paper will help guide future comparative studies of gene expression in primates (and more broadly) as well as add to the growing literature on the broader challenges of selecting powerful and reliable marker genes for use in single-cell transcriptomics.

      Strengths:

      Marker gene selection and cell type annotation is a challenging problem in scRNA studies, and successful classification of cells often requires manual expert input. This can be hard to reproduce across studies, as, despite general agreement on the identity of many cell types, different methods for identifying marker genes will return different sets of genes. The rise of comparative functional genomics complicates this even further, as a robust marker gene in one species need not always be as useful in a different taxon. The finding that so many marker genes have poor transferability is striking, and by interrogating the assumption of transferability in a thorough and systematic fashion, this paper reminds us of the importance of systematically validating analytical choices. The focus on identifying how transferability varies across different types of marker genes (especially when comparing TFs to lncRNAs), and on exploring different methods to identify marker genes, also suggests additional criteria by which future researchers could select robust marker genes in their own data.

      The paper is built on a substantial amount of clearly reported and thoroughly considered data, including EBs and cells from four different primate species - humans, orangutans, and two macaque species. The authors go to great lengths to ensure the EBs are as comparable as possible across species, and take similar care with their computational analyses, always erring on the side of drawing conservative conclusions that are robustly supported by their data over more tenuously supported ones that could be impacted by data processing artefacts such as differences in mappability, etc. For example, I like the approach of using liftoff to robustly identify genes in non-human species that can be mapped to and compared across species confidently, rather than relying on the likely incomplete annotation of the non-human primate genomes. The authors also provide an interactive data visualisation website that allows users to explore the dataset in depth, examine expression patterns of their own favourite marker genes and perform the same kinds of analyses on their own data if desired, facilitating consistency between comparative primate studies.

      We thank the Reviewer for their kind assessment of our work.

      Weaknesses and recommendations:

      (1) Embryoid body generation is known to be highly variable from one replicate to the next for both technical and biological reasons, and the authors do their best to account for this, both by their testing of different ways of generating EBs, and by including multiple technical replicates/clones per species. However, there is still some variability that could be worth exploring in more depth. For example, the orangutan seems to have differentiated preferentially towards cardiac mesoderm whereas the other species seemed to prefer ectoderm fates, as shown in Figure 2C. Likewise, Supplementary Figure 2C suggests a significant unbalance in the contributions across replicates within a species, which is not surprising given the nature of EBs, while Supplementary Figure 6 suggests that despite including three different clones from a single rhesus macaque, most of the data came from a single clone. The manuscript would be strengthened by a more thorough exploration of the intra-species patterns of variability, especially for the taxa with multiple biological replicates, and how they impact the number of cell types detected across taxa, etc.

      You are absolutely correct in pointing out that the large clonal variability in cell type composition is a challenge for our analysis. We also noted the odd behavior of the orangutan EBs, and their underrepresentation of ectoderm. There are many possible sources for these variable differentiation propensities: clone, sample origin (in this case urine) and individual. However, unfortunately for the orangutan, we have only one individual and one sample origin and thus cannot say whether this germ layer preference says something about the species or is due to our specific sample.

      Because of this high variability from multiple sources, obtaining enough cell types with an appreciable overlap between species was the limiting factor for our analyses. In order to be able to derive meaningful conclusions from intra-species analyses and the impact of different sources of variation on cell type propensity, we would need to sequence many more EBs with an experimental design that balances possible sources of variation. This would go beyond the scope of this study.

      Instead, here we control for intra-species variation in our analyses as much as possible: For the analysis of cell type specificity and conservation the comparison is relative for the different specificity degrees (Figure 3C). For the analysis of marker gene conservation, we explicitly take intra-species variation into account (Figure 4D).

      The same holds for the temporal aspect of the data, which is not really discussed in depth despite being a strength of the design. Instead, days 8 and 16 are analysed jointly, without much attention being paid to the possible differences between them.

      Concerning the temporal aspect, we indeed deliberately omitted an explicit comparison of day 8 and day 16 EBs, because we felt that it was not directly relevant to our main message. Our pseudotime analysis showed that the differences between the two time points were a matter of degree rather than of kind. All major lineages were already present at day 8 and even though day 8 cells had on average earlier pseudotimes, there was a large overlap in the pseudotime distributions between the two sampling time points (Author response image 1). That is why we decided to analyse the data together.

      Are EBs at day 16 more variable between species than at day 8? Is day 8 too soon to do these kinds of analyses?

      When we started the experiment, we simply did not know what to expect. We were worried that cell types at day 8 might be too transient, but longer culture can also introduce biases. That is why we wanted to look at two time points; however, as mentioned above, the differences are a matter of degree.

      Concerning the cell type composition: yes, day 16 EBs are more heterogeneous than day 8 EBs. Firstly, older EBs have more distinguishable cell types and hence even if all EBs had identical composition, the sampling variance would be higher given that we sampled a similar number of cells from both time points. Secondly, in order to grow EBs for a longer time, we moved them from floating to attached culture on day 8 and it is unclear how much variance is added by this extra handling step.
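      The sampling argument in the first point can be made concrete with a small calculation. The sketch below is illustrative only: it assumes multinomial sampling from equally abundant cell types, and the cell and cell-type numbers are hypothetical rather than taken from our dataset. Under these assumptions, the coefficient of variation of an observed cell type fraction is sqrt((1 - p) / (n * p)) with p = 1/k, which grows with the number of cell types k when the number of sampled cells n is fixed.

```python
import math


def proportion_cv(n_cells, n_cell_types):
    """Coefficient of variation of the observed fraction of one cell type.

    Assumes multinomial sampling of n_cells cells from n_cell_types
    equally abundant cell types (an idealised, hypothetical setting).
    """
    if n_cells <= 0 or n_cell_types <= 0:
        raise ValueError("n_cells and n_cell_types must be positive")
    p = 1.0 / n_cell_types
    # Var(p_hat) = p (1 - p) / n, so CV = sd / mean = sqrt((1 - p) / (n p)).
    return math.sqrt((1.0 - p) / (n_cells * p))


if __name__ == "__main__":
    # Hypothetical numbers: same sampling depth, fewer vs. more cell types.
    for k in (5, 10, 20):
        print(f"{k:2d} cell types, 1000 cells: CV = {proportion_cv(1000, k):.3f}")
```

      The relative noise in each estimated fraction therefore increases with the number of distinguishable cell types even when the underlying composition is identical across EBs.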

      Are markers for earlier developmental progenitors better/more transferable than those for more derived cell types?

      We did not see any differences in the marker conservation between early and late cell types, but we have too little data to say whether this carries biological meaning.

      Author response image 1.

      Pseudotime analysis for a differentiation trajectory towards neurons. Single cells were first aggregated into metacells per species using SEACells (Persad et al. 2023). Pluripotent and ectoderm metacells were then integrated across all four species using Harmony and a combined pseudotime was inferred with Slingshot (Street et al. 2018), specifying iPSCs as the starting cluster. Here, lineage 3 is shown, illustrating a differentiation towards neurons. (A) PHATE embedding colored by pseudotime (Moon et al. 2019). (B) PHATE embedding colored by celltype. (C) Pseudotime distribution across the sampling timepoints (day 8 and day 16) in different species.

      (2) Closely tied to the point above, by necessity the authors collapse their data into seven fairly coarse cell types and then examine the performance of canonical marker genes (as well as those discovered de novo) across the species. However some of the clusters they use are somewhat broad, and so it is worth asking whether the lack of specificity exhibited by some marker genes and driving their conclusions is driven by inter-species heterogeneity within a given cluster.

      Author response image 2.

      UMAP visualization for the Harmony-integrated dataset across all four species for the seven shared cell types, colored by cell type identity (A) and species (B).

      Good point. If we understand correctly, the concern is that in our relatively broadly defined cell types, species are not well mixed and that this in turn is partly responsible for marker gene divergence. This problem is indeed difficult to address, because most approaches to evaluate this require integration across species, which might lead to questionable results (see our Discussion).

      Nevertheless, we attempted an integration across all four species. To this end, we subset the cells for the 7 cell types that we found in all four species and visualized cell types and species in the UMAPs above (Author response image 2).

      We see that cardiac fibroblasts appear poorly integrated in the UMAP, but they still have very transferable marker genes across species. We quantified integration quality using the cell-specific mixing score (cms) (Lütge et al. 2021) and indeed found that the proportion of well integrated cells is lowest for cardiac fibroblasts (Author response image 3A). On the other end of the cms spectrum, neural crest cells appear to have the best integration across species, but their marker transferability between species is worse than for cardiac fibroblasts (Supplementary Figure 9). The rank-biased overlap scores that we use for marker gene conservation, calculated per cell type, show the same trends (Author response image 3B) as the F1 scores for marker gene transferability. Hence, given our current dataset we do not see any indication that the low marker gene conservation is a result of too broadly defined cell types.

      Author response image 3.

      (A) Evaluation of species mixing per cell type in the Harmony-integrated dataset, quantified by the fraction of cells with an adjusted cell-specific mixing score (cms) above 0.05. (B) Summary of rank-biased overlap (RBO) scores per cell type to assess concordance of marker gene rankings for all species pairs.
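      For readers unfamiliar with rank-biased overlap, the following minimal sketch shows how such a score can be computed for two ranked marker lists. It implements the commonly used extrapolated form for rankings of equal depth; the persistence parameter p = 0.9 and the placeholder gene names are assumptions for illustration and are not necessarily the settings used in the manuscript.

```python
def rank_biased_overlap(ranking_a, ranking_b, p=0.9):
    """Extrapolated rank-biased overlap of two rankings without duplicates.

    Both rankings are truncated to the shorter length. Values range from
    0 (disjoint) to 1 (identical); p weights the top ranks more heavily
    the smaller it is. p = 0.9 is an illustrative default.
    """
    if not 0 < p < 1:
        raise ValueError("p must lie strictly between 0 and 1")
    depth = min(len(ranking_a), len(ranking_b))
    if depth == 0:
        return 0.0
    seen_a, seen_b = set(), set()
    overlap = 0          # size of the intersection of the two prefixes
    weighted_sum = 0.0   # sum over depths d of (overlap_d / d) * p**d
    for d in range(1, depth + 1):
        item_a, item_b = ranking_a[d - 1], ranking_b[d - 1]
        if item_a == item_b:
            overlap += 1
        else:
            # Each new item counts if the other list has already shown it.
            overlap += (item_a in seen_b) + (item_b in seen_a)
        seen_a.add(item_a)
        seen_b.add(item_b)
        weighted_sum += (overlap / d) * p ** d
    return (overlap / depth) * p ** depth + ((1 - p) / p) * weighted_sum


if __name__ == "__main__":
    # Placeholder gene names, not real marker lists.
    species_1 = ["geneA", "geneB", "geneC", "geneD", "geneE"]
    species_2 = ["geneB", "geneA", "geneF", "geneC", "geneG"]
    print(round(rank_biased_overlap(species_1, species_2), 3))
```

      Because agreement at the top of the lists is weighted most heavily, two species that share their strongest markers score high even if the lower ranks diverge.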

      Reviewer #2 (Public review):

      Summary:

      The authors present an important study on identifying and comparing orthologous cell types across multiple species. This manuscript focuses on characterizing cell types in embryoid bodies (EBs) derived from induced pluripotent stem cells (iPSCs) of four primate species, humans, orangutans, cynomolgus macaques, and rhesus macaques, providing valuable insights into cross-species comparisons.

      Strengths:

      To achieve this, the authors developed a semi-automated computational pipeline that integrates classification and marker-based cluster annotation to identify orthologous cell types across primates. This study makes a significant contribution to the field by advancing cross-species cell type identification.

      We thank the reviewer for their positive and thoughtful feedback.

      Weaknesses:

      However, several critical points need to be addressed.

      (1) Use of Liftoff for GTF Annotation

      The authors used Liftoff to generate GTF files for Pongo abelii, Macaca fascicularis, and Macaca mulatta by transferring the hg38 annotation to the corresponding primate genomes. However, it is unclear why they did not use species-specific GTF files, as all these genomes have existing annotations. Why did the authors choose not to follow this approach?

      As Reviewer 1 also points out, we have observed that the annotations of non-human primates often have truncated 3’UTRs. This is especially problematic for 3’ UMI transcriptome data such as the 10x dataset that we present here. To illustrate this, we compared the Liftoff annotation derived from Gencode v32, which we also used throughout our manuscript, to the Ensembl gene annotation Macaca_fascicularis_6.0.111. We used transcriptomes from human and cynomolgus iPSC bulk RNAseq (Kliesmete et al. 2024) generated with the Prime-seq protocol (Janjic et al. 2022), which is very similar to 10x in that it also uses 3’ UMIs. On average, using Liftoff produces higher counts than the Ensembl annotation (Author response image 4A). Moreover, when comparing across species, using Ensembl for the macaque leads to an asymmetry in differentially expressed genes, with apparently many more up-regulated genes in humans. In contrast, when we use the Liftoff annotation, we detect fewer DE genes, and a similar number of genes is up-regulated in macaques as in humans (Author response image 4B). We think that the many additional DE genes are artifacts due to mismatched annotation in human and cynomolgus macaques. We illustrate this for the case of the transcription factor SALL4 in Author response image 4C,D. The Ensembl annotation reports 2 transcripts, while Liftoff from Gencode v32 suggests 5 transcripts, one of which has a longer 3’UTR. This longer transcript is also supported by Nanopore data from macaque iPSCs. The truncation of the 3’UTR in this case leads to underestimation of the expression of SALL4 in macaques and hence SALL4 is detected as up-regulated in humans (DESeq2: LFC=1.34, p-adj<2e-9). In contrast, when using the Liftoff annotation, SALL4 does not appear to be DE between humans and macaques (LFC=0.33, p-adj=0.20).
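      The asymmetry described above can be tallied from any table of log fold changes and adjusted p-values. The sketch below is a hedged illustration rather than our actual analysis code: the significance cutoff of 0.05 and the function names are assumptions, positive LFC is taken to mean higher expression in human, and the only values used are the two SALL4 results quoted in the text (with 2e-9 standing in as the upper bound of the reported p-adj).

```python
from collections import Counter


def de_direction(log2_fold_change, adjusted_p, alpha=0.05):
    """Classify one gene; positive LFC means higher expression in human.

    alpha = 0.05 is an assumed cutoff, not one stated in the text.
    """
    if adjusted_p >= alpha:
        return "not DE"
    return "up in human" if log2_fold_change > 0 else "up in macaque"


def tally_directions(results, alpha=0.05):
    """Count genes per direction from (gene, LFC, adjusted p) tuples."""
    return Counter(de_direction(lfc, padj, alpha) for _, lfc, padj in results)


if __name__ == "__main__":
    # SALL4 under the two annotations, using the values quoted in the text.
    print("Ensembl:", de_direction(1.34, 2e-9))
    print("Liftoff:", de_direction(0.33, 0.20))
```

      A strongly unbalanced tally between "up in human" and "up in macaque" across one-to-one orthologues is the pattern that we interpret as an annotation artifact.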

      Author response image 4. 

      (A) UMI counts per gene for the same cynomolgus macaque iPSC samples. On the x-axis, the GTF file from Ensembl Macaca_fascicularis_6.0.111 was used for counting; on the y-axis, we used our filtered Liftoff annotation that transferred the human gene models from Gencode v32. (B) The number of DE genes between human and cynomolgus iPSCs detected with DESeq2. For Liftoff, we counted human samples using Gencode v32 and compared them to macaque samples counted with the Liftoff annotation of the same human gene models transferred to macFas6. For Ensembl, we used Gencode v32 for the human and Ensembl Macaca_fascicularis_6.0.111 for the macaque. For both comparisons, we subset the genes to one-to-one orthologues as annotated in BioMart. Up- and down-regulation is relative to human expression. (C) Read counts for one example gene, SALL4. Here, in addition to the Liftoff and Ensembl annotations, we also used transcripts derived from Nanopore cDNA sequencing of cynomolgus iPSCs. (D) Gene models for SALL4 in macFas6 coordinates and coverage from iPSC Prime-seq bulk RNA sequencing.

      (2) Transcript Filtering and Potential Biases

      The authors excluded transcripts with partial mapping (<50%), low sequence identity (<50%), or excessive length differences (>100 bp and >2× length ratio). Such filtering may introduce biases in read alignment. Did the authors evaluate the impact of these filtering choices on alignment rates?

      We excluded those transcripts from the analysis in both species because they present a convolution of sequence-annotation differences and expression. The focus of our study is on regulatory evolution, and we knowingly omit marker differences that are due to a marker being mutated away. We will make this clearer in the text of a revised version.
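The exclusion criteria quoted in the reviewer's comment above can be sketched as a simple predicate. This is a minimal illustration under stated assumptions, not the filtering code used in the study: the `LiftedTranscript` record, its field names, and the `passes_filter` function are hypothetical, and only the thresholds (50% mapped, 50% identity, more than 100 bp and more than 2× length difference) come from the comment.

```python
from dataclasses import dataclass


@dataclass
class LiftedTranscript:
    """Hypothetical summary of one transcript after annotation transfer."""
    transcript_id: str
    coverage: float      # fraction of the source transcript that mapped (0 to 1)
    identity: float      # sequence identity of the mapped part (0 to 1)
    source_length: int   # transcript length in the source genome (bp)
    target_length: int   # transcript length in the target genome (bp)


def passes_filter(t: LiftedTranscript,
                  min_coverage: float = 0.5,
                  min_identity: float = 0.5,
                  max_length_diff: int = 100,
                  max_length_ratio: float = 2.0) -> bool:
    """Return True if the transcript would be kept under the quoted criteria."""
    if t.coverage < min_coverage or t.identity < min_identity:
        return False
    shorter = min(t.source_length, t.target_length)
    longer = max(t.source_length, t.target_length)
    if shorter <= 0:
        return False  # no usable length information
    # Excluded only if BOTH the absolute and the relative difference are large.
    too_different = (longer - shorter > max_length_diff
                     and longer / shorter > max_length_ratio)
    return not too_different


example = LiftedTranscript("tx_example", coverage=0.9, identity=0.95,
                           source_length=1000, target_length=2500)
print(passes_filter(example))  # length differs by 1500 bp and 2.5x
```

Because the same predicate is applied to the shared gene models, a transcript that fails it is dropped from the comparison in both species rather than in one only.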

      (3) Data Integration with Harmony

      The methods section does not specify the parameters used for data integration with Harmony. Including these details would clarify how cross-species integration was performed.

      We want to stress that none of our conservation and marker gene analyses relies on cross-species integration. We only used the Harmony-integrated data for visualisation in Figure 1 and for the rough germ-layer check in Supplementary Figure S3. We will add a better description in the revised version.

      References

      Janjic, Aleksandar, Lucas E. Wange, Johannes W. Bagnoli, Johanna Geuder, Phong Nguyen, Daniel Richter, Beate Vieth, et al. 2022. “Prime-Seq, Efficient and Powerful Bulk RNA Sequencing.” Genome Biology 23 (1): 88.

      Kliesmete, Zane, Peter Orchard, Victor Yan Kin Lee, Johanna Geuder, Simon M. Krauß, Mari Ohnuki, Jessica Jocher, Beate Vieth, Wolfgang Enard, and Ines Hellmann. 2024. “Evidence for Compensatory Evolution within Pleiotropic Regulatory Elements.” Genome Research 34 (10): 1528–39.

      Lütge, Almut, Joanna Zyprych-Walczak, Urszula Brykczynska Kunzmann, Helena L. Crowell, Daniela Calini, Dheeraj Malhotra, Charlotte Soneson, and Mark D. Robinson. 2021. “CellMixS: Quantifying and Visualizing Batch Effects in Single-Cell RNA-Seq Data.” Life Science Alliance 4 (6): e202001004.

      Moon, Kevin R., David van Dijk, Zheng Wang, Scott Gigante, Daniel B. Burkhardt, William S. Chen, Kristina Yim, et al. 2019. “Visualizing Structure and Transitions in High-Dimensional Biological Data.” Nature Biotechnology 37 (12): 1482–92.

      Persad, Sitara, Zi-Ning Choo, Christine Dien, Noor Sohail, Ignas Masilionis, Ronan Chaligné, Tal Nawy, et al. 2023. “SEACells Infers Transcriptional and Epigenomic Cellular States from Single-Cell Genomics Data.” Nature Biotechnology 41 (12): 1746–57.

      Street, Kelly, Davide Risso, Russell B. Fletcher, Diya Das, John Ngai, Nir Yosef, Elizabeth Purdom, and Sandrine Dudoit. 2018. “Slingshot: Cell Lineage and Pseudotime Inference for Single-Cell Transcriptomics.” BMC Genomics 19 (1): 477.

    1. Author Response

      We would like to thank the senior editor, the reviewing editor and all the reviewers for taking the time to review our manuscript and for appreciating our study. We are pleased that all of you have found strengths in our work and have provided comments to strengthen it further. We sincerely appreciate the valuable comments and suggestions, which we believe will help us to further improve the quality of our work.

      Reviewer 1

      The manuscript by Dubey et al. examines the function of the acetyltransferase Tip60. The authors show that (auto)acetylation of a lysine residue in Tip60 is important for its nuclear localization and liquid-liquid-phase-separation (LLPS). The main observations are: (i) Tip60 is localized to the nucleus, where it typically forms punctate foci. (ii) An intrinsically disordered region (IDR) within Tip60 is critical for the normal distribution of Tip60. (iii) Within the IDR the authors show that a lysine residue (K187), that is auto-acetylated, is critical. Mutation of that lysine residue to a non-acetylable arginine abolishes the behavior. (iv) biochemical experiments show that the formation of the punctate foci may be consistent with LLPS.

      On balance, this is an interesting study that describes the role of acetylation of Tip60 in controlling its biochemical behavior as well as its localization and function in cells. The authors mention in their Discussion section other examples showing that acetylation can change the behavior of proteins with respect to LLPS; depending on the specific context, acetylation can promote (as here for Tip60) or impair LLPS.

      Strengths:

      The experiments are largely convincing and appear to be well executed.

      Weaknesses:

      The main concern I have is that all in vivo (i.e. in cells) experiments are done with overexpression in Cos-1 cells, in the presence of the endogenous protein. No attempt is made to use e.g. cells that would be KO for Tip60 in order to have a cleaner system or to look at the endogenous protein. It would be reassuring to know that what the authors observe with highly overexpressed proteins also takes place with endogenous proteins.

      Response: The main reason for performing these experiments with an overexpression system was to generate different point mutants and deletion mutants of TIP60 and analyse their effects on its properties and functions. To validate our observations with the overexpression system, we also examined the localization pattern of endogenous TIP60 by IFA, and the results show a similar foci pattern within the nucleus to that observed with overexpressed TIP60 protein (Figure 4A). However, we understand the reviewer’s concern and agree to repeat some of the overexpression experiments under endogenous TIP60 knockdown conditions using siRNA or shRNA against the 3’ UTR region.

      Also, it is not clear how often the experiments have been repeated and additional quantifications (e.g. of western blots) would be useful.

      Response: The experiments were performed as independent biological replicates (n=3), and this is mentioned in the figure legends. Regarding the suggestion to quantify Western blots, we would like to point out that for blots requiring quantitative estimation (such as Figures 2F and 6H), graphs showing the quantitated values with p-values had already been added. However, as suggested, quantitation for Figure 6D will additionally be performed and added in the revised version.

      In addition, regarding the LLPS description (Figure 1), it would be important to show the wetting behaviour and the temperature-dependent reversibility of the droplet formation.

      Response: We appreciate the suggestion, and we will perform these assays and include the results in the revised version.

      In Fig 3C the mutant (K187R) Tip60 is cytoplasmic, but still appears to form foci. Is this still reflecting phase separation, or some form of aggregation?

      Response: The TIP60 (K187R) mutant remains cytosolic with a homogenous distribution, as shown in Figure 2E. With TIP60 partners such as PXR or p53, this mutant protein also remains homogenously distributed in the cytosol. However, when co-expressed with TIP60 (Wild-type) protein, the mutant protein still remains cytosolic, but some foci-like pattern is also observed at the nuclear periphery, which we believe could be accumulated aggregates.

      Reviewer 2

      The manuscript "Autoacetylation-mediated phase separation of TIP60 is critical for its functions" by Dubey S. et al reported that the acetyltransferase TIP60 undergoes phase separation in vitro and cell nuclei. The intrinsically disordered region (IDR) of TIP60, particularly K187 within the IDR, is critical for phase separation and nuclear import. The authors showed that K187 is autoacetylated, which is important for TIP60 nuclear localization and activity on histone H4. The authors did several experiments to examine the function of K187R mutants including chromatin binding, oligomerization, phase separation, and nuclear foci formation. However, the physiological relevance of these experiments is not clear since TIP60 K187R mutants do not get into nuclei. The authors also functionally tested the cancer-derived R188P mutant, which mimics K187R in nuclear localization, disruption of wound healing, and DNA damage repair. However, similar to K187R, the R188P mutant is also deficient in nuclear import, and therefore, its defects cannot be directly attributed to the disruption of the phase separation property of TIP60. The main deficiency of the manuscript is the lack of support for the conclusion that "autoacetylation-mediated phase separation of TIP60 is critical for its functions".

      This study offers some intriguing observations. However, the evidence supporting the primary conclusion, specifically regarding the necessity of the intrinsically disordered region (IDR) and K187ac of TIP60 for its phase separation and function in cells, lacks sufficient support and warrants more scrutiny. Additionally, certain aspects of the experimental design are perplexing and lack controls to exclude alternative interpretations. The manuscript can benefit from additional editing and proofreading to improve clarity.

      Response: We understand the point raised by the reviewer; however, we would like to draw the reviewer’s attention to the data where we clearly demonstrated that acetylation of lysine 187 within the IDR of TIP60 is required for its phase separation (Figure 2J). We would also like to draw the reviewer’s attention to other TIP60 mutants within the IDR (R177H, R188H, K189R), all of which enter the nucleus and form phase-separated foci. The cancer-associated mutation at R188 behaves similarly to K187R because it also hampers TIP60 acetylation at the adjacent K187 residue. Our in vitro and in cellulo results clearly demonstrate that autoacetylation of TIP60 at K187 within its IDR is critical for multiple functions, including its translocation into the nucleus, its protein-protein interactions and its oligomerization, which are prerequisites for phase separation of TIP60.

      There are two putative NLS sequences (NLS #1 from aa145; NLS #2 from aa184) in TIP60, both of which are within the IDR. Deletion of the whole IDR is therefore expected to abolish the nuclear localization of TIP60. Since K187 is within NLS #2, the cytoplasmic localization of the IDR and K187R mutants may not be related to the ability of TIP60 to phase separation.

      Response: We are not disputing the presence of putative NLSs within the IDR of TIP60; however, our results with different mutations within the IDR (K76, K80, K148, K150, R177, R178, R188, K189) clearly demonstrate that only acetylation of the K187 residue is critical for shuttling TIP60 into the nucleus, while all other lysine mutants located within these putative NLS regions exhibited no impact on TIP60’s nuclear shuttling. We have mentioned in our discussion that autoacetylation of TIP60’s K187 may induce local structural modifications in its IDR, which are critical for translocating TIP60 into the nucleus, where it undergoes the phase separation critical for its functions. A previous example of a similar kind shows that acetylation of a lysine within the NLS region of TyrRS by PCAF promotes its nuclear localization (Cao X et al 2017, PNAS). The IDR (which also contains the K187 site) is important for phase separation once the protein has entered the nucleus. This could be the cell’s mechanism for preventing unwarranted action of TIP60 until it enters the nucleus and phase separates on chromatin at appropriate locations.

      The chromatin-binding activity of TIP60 depends on HAT activity, but not phase-separation (Fig 1I), (Fig 2B). How do the authors reconcile the fact that the K187R mutant is able to bind to chromatin with lower activity than the HAT mutant (Fig 2F, 2I)?

      Response: K187 acetylation is required for TIP60’s nuclear translocation but is not critical for chromatin binding. When the soluble fraction is prepared in the fractionation experiment, the nuclear membrane is disrupted, so the TIP60 (K187R) mutant no longer faces any hindrance in accessing chromatin and can load onto it (although not as efficiently as the Wild-type protein). Efficient chromatin binding requires auto-acetylation of other lysine residues in TIP60, which might be hampered by reduced catalytic activity or might be insufficient to maintain equilibrium with HDAC activity inside the nucleus. In the case of K187R, the reduced auto-acetylation is captured while the protein is in the cytosol. During fractionation, once this mutant has access to chromatin, it might auto-acetylate other lysine residues critical for chromatin loading (note that the catalytic domain is intact in this mutant). This is evident from the hyper auto-acetylation of the Wild-type protein compared to the K187R or HAT mutant proteins. We would like to point out that phase separation occurs only after efficient chromatin loading of TIP60, which is why, under in cellulo conditions, both K187R (which cannot enter the nucleus) and the HAT mutant (which enters the nucleus but fails to bind chromatin efficiently) fail to form phase-separated nuclear punctate foci.

      The DIC images of phase separation in Fig 2I need to be improved. The image for K187R showed the irregular shape of the condensates, which suggests particles in solution or on the slide. The authors may need to use fluorescent-tagged TIP60 in the in vitro LLPS experiments.

      Response: We believe this comment refers to Figure 2J. The irregularly shaped condensates observed for TIP60 K187R are unique to the mutant protein and are not caused by particles on the slide. We would like to draw the reviewer’s attention to Supplementary Figure S2A, where DIC images of TIP60 (Wild-type) protein tested under different protein and PEG8000 conditions are completely clear wherever the protein did not form phase-separated droplets, ruling out the possibility of particles in the solution or on the slides.

      The authors mentioned that the HAT mutant of TIP60 does not phase separate, which needs to be included.

      Response: We have already added the image of RFP-TIP60 (HAT mutant) in supplementary Fig S4A (panel 2) in the manuscript.

      Related to Point 3, the HAT mutant that doesn't form punctate foci by itself, can incorporate into WT TIP60 (Fig 5A). In vitro LLPS assay for WT, HAT, and K187R mutants with or without acetylation should be included. WT and mutant TIP can be labelled with GFP and RFP, respectively.

      Response: We would like to draw the reviewer’s attention to our co-expression experiments in Figure 5, where the Wild-type protein (under both tagged and untagged conditions) is able to phase separate and form punctate foci with the co-expressed HAT mutant protein (with depleted autoacetylation capacity). We believe these in cellulo experiments already answer the questions that the reviewer suggests addressing with in vitro experiments.

      Fig 3A and 3B showed that neither K187 mutant nor HAT mutant could oligomerize. If both experiments were conducted in the absence of in vitro acetylation, how do the authors reconcile these results?

      Response: We thank the reviewer for highlighting our oversight in omitting the mention of acetyl coenzyme A here. To induce acetylation under in vitro conditions, we have added 10 µM acetyl CoA into the reactions depicted in Figure 3A and 3B. The information for acetyl CoA for Figure 3B was already included in the GST-pull down assay (material and methods section). We will add the same in the oligomerization assay of material and methods in the revised manuscript.

      In Fig 4, the colocalization images showed little overlap between TIP60 and nuclear speckle (NS) marker SC35, indicating that the majority of TIP60 localized in the nuclear structure other than NS. Have the authors tried to perturbate the NS by depleting the NS scaffold protein and examining TIP60 foci formation? Do PXR and TP53 localize to NS?

      Response: Under normal conditions, the majority of TIP60 is not localized in nuclear speckles (NS), so we believe that perturbing NS will not have a significant effect on TIP60 foci formation. Interestingly, a recent study by Shelley Berger’s group (Alexander KA et al Mol Cell. 2021 15;81(8):1666-1681) showed that p53 localizes to NS to regulate a subset of its target genes. We have mentioned this in our discussion section. No information is available about the localization of PXR in NS.

      Were TIP60 substrates, H4 (or NCP), PXR, TP53, present in TIP60 condensates in vitro? It's interesting to see both PXR and TP53 had homogenous nuclear signals when expressed together with K187R, R188P (Fig 6E, 6G), or HAT (Suppl Fig S4A) mutants. Are PXR or TP53 nuclear foci dependent on their acetylation by TIP60? This can and should be tested.

      Response: Both p53 and PXR are known to be acetylated by TIP60. In the case of PXR, TIP60 acetylates PXR at lysine 170, and this TIP60-mediated acetylation of PXR at K170 is important for the formation of TIP60-PXR foci, which we now know are formed by phase separation (Bakshi K et al Sci Rep. 2017 Jun 16;7(1):3635).

      Since R188P mutant, like K187R, does not get into the nuclei, it is not suitable to use this mutant to examine the functional relevance of phase separation for TIP60. The authors need to find another mutant in IDR that retains nuclear localization and overall HAT activity but specifically disrupts phase separation. Otherwise, the conclusion needs to be restated. All cancer-derived mutants need to be tested for LLPS in vitro.

      Response: We appreciate the reviewer’s point here, but it is important to note that the objective of these experiments is to understand the impact of K187R (critical for multiple aspects of TIP60, including phase separation) and R188P (a naturally occurring cancer-associated mutation that behaves similarly to K187R) on TIP60’s activities in order to determine their functional relevance. The reviewer’s suggestion to identify an IDR mutant that fails to phase separate but retains nuclear localization and catalytic activity can be examined in future studies.

      For all cellular experiments, it is not mentioned whether endogenous TIP60 was removed and absent in the cell lines used in this study. It's important to clarify this point because the localization and function of mutant TIP60 are affected by WT TIP60 (Fig 5).

      Response: Endogenous TIP60 was present in in cellulo experiments, however as suggested by reviewer 1 we will perform some of the in cellulo experiments under endogenous TIP60 knockdown condition to validate our findings.

      It is troubling that H4 peptide is used for in vitro HAT assay since TIP60 has much higher activity on nucleosomes and its preferred substrates include H2A.

      Response: The purpose of using the H4 peptide in the HAT assay is to determine the impact of mutations on TIP60’s catalytic activity. As H4 is one of the major histone substrates of TIP60, we believe it satisfies the objective of the experiments.

      Reviewer 3

      This study presents results arguing that the mammalian acetyltransferase Tip60/KAT5 auto-acetylates itself on one specific lysine residue before the MYST domain, which in turn favors not only nuclear localization but also condensate formation on chromatin through LLPS. The authors further argue that this modification is responsible for the bulk of Tip60 autoacetylation and acetyltransferase activity towards histone H4. Finally, they suggest that it is required for association with txn factors and in vivo function in gene regulation and DNA damage response.

      These are very wide and important claims and, while some results are interesting and intriguing, there is not really close to enough work performed/data presented to support them. In addition, some results are redundant between them, lack consistency in the mutants analyzed, and show contradiction between them. The most important shortcoming of the study is the fact that every single experiment in cells was done in over-expressed conditions, from transiently transfected cells. It is well known that these conditions can lead to non-specific mass effects, cellular localization not reflecting native conditions, and disruption of native interactome. On that topic, it is quite striking that the authors completely ignore the fact that Tip60 is exclusively found as part of a stable large multi-subunit complex in vivo, with more than 15 different proteins. Thus, arguing for a single residue acetylation regulating condensate formation and most Tip60 functions while ignoring native conditions (and the fact that Tip60 cannot function outside its native complex) does not allow me to support this study.

      Response: We appreciate the reviewer’s point here, but it is important to note that the main purpose of using an overexpression system in the study is to analyse the effects of the different point and deletion mutations generated in TIP60. We have overexpressed proteins with different tags (GFP or RFP) or without tags (Figure 3C, Figure 5) to confirm that the behaviour of the protein remains unperturbed by the presence of tags. As validation, we have also examined the localization of endogenous TIP60 protein, which shows similar localization behaviour to the overexpressed protein. We would like to point out that there are several reports in the literature where similar overexpression systems are used to determine the functions of TIP60 and its mutants. The nuclear foci pattern observed for TIP60 in our studies has also been reported by several other groups.

      Sun, Y., et. al. (2005) A role for the Tip60 histone acetyltransferase in the acetylation and activation of ATM. Proc Natl Acad Sci U S A, 102(37):13182-7.

      Kim, C.-H. et al. (2015) ‘The chromodomain-containing histone acetyltransferase TIP60 acts as a code reader, recognizing the epigenetic codes for initiating transcription’, Bioscience, Biotechnology, and Biochemistry, 79(4), pp. 532–538.

      Wee, C. L. et al. (2014) ‘Nuclear Arc Interacts with the Histone Acetyltransferase Tip60 to Modify H4K12 Acetylation(1,2,3).’, eNeuro, 1(1). doi: 10.1523/ENEURO.0019-14.2014.

      However, as a precaution, and as also suggested by the other reviewers, we will perform some of these overexpression experiments in the absence of endogenous TIP60 by using 3’ UTR-specific siRNA/shRNA.

      We thank the reviewer for the comment on the multi-subunit complex proteins, and we would like to expand our study by determining the interaction of some of the complex subunits with TIP60 (Wild-type), which forms nuclear condensates; TIP60 (HAT mutant), which enters the nucleus but does not form condensates; and TIP60 (K187R), which does not enter the nucleus and does not form condensates. We will include the results of these experiments in the revised manuscript.

      • It is known that over-expression after transient transfection can lead to non-specific acetylation of lysines on the proteins, likely in part to protect from proteasome-mediated degradation. It is not clear whether the Kac sites targeted in the experiments are based on published/public data. In that sense, it is surprising that the K327R mutant does not behave like a HAT-dead mutant (which is what exactly?) or the K187R mutant as this site needs to be auto-acetylated to free the catalytic pocket, so essential for acetyltransferase activity like in all MYST-family HATs. In addition, the effect of K187R on the total acetyl-lysine signal of Tip60 is very surprising as this site does not seem to be a dominant one in public databases.

      Response: We chose the autoacetylation sites based on previously published studies in which LC-MS/MS and in vitro acetylation assays were used to identify autoacetylation sites in TIP60, which include K187. We have already mentioned this in the manuscript and have cited the references (1. Yang, C., et al (2012). Function of the active site lysine autoacetylation in Tip60 catalysis. PloS one 7, e32886. 10.1371/journal.pone.0032886. 2. Yi, J., et al (2014). Regulation of histone acetyltransferase TIP60 function by histone deacetylase 3. The Journal of biological chemistry 289, 33878–33886. 10.1074/jbc.M114.575266.). We would like to emphasize that both of these studies identified K187 as an autoacetylation site in TIP60. Since the TIP60 HAT mutant (with significantly reduced catalytic activity) can also enter the nucleus, it is not surprising that the K327R mutant could also enter the nucleus.

      • As the physiological relevance of the results is not clear, the mutants need to be analyzed at the native level of expression to study real functional effects on transcription and localization (ChIP/IF). It is not clear the claim that Tip60 forms nuclear foci/punctate signals at physiological levels is based on what. This is certainly debated because in part of the poor choice of antibodies available for IF analysis. In that sense, it is not clear which Ab is used in the Westerns. Endogenous Tip60 is known to be expressed in multiple isoforms from splice variants, the most dominant one being isoform 2 (PLIP) which lacks a big part (aa96-147) of the so-called IDR domain presented in the study. Does this major isoform behave the same?

      Response: The TIP60 antibody used in the study is from Santa Cruz (Cat. No. sc-166323). This antibody is widely used for TIP60 detection by several methods and has been cited in numerous publications. The catalogue number will be mentioned in the manuscript. Regarding isoforms, three isoforms are known for TIP60, among which isoform 2 is the most highly expressed and is the one used in our study. Isoforms 1 and 2 have the same IDR length (150 amino acids), while isoform 3 has an IDR of 97 amino acids. Interestingly, K187 is present in all the isoforms (already mentioned in the manuscript), and the region missing in isoform 3 (amino acids 96-147) has a lower propensity for disorder (marked with a blue circle). This indicates that all the isoforms of TIP60 have the tendency to phase separate.

      Author response image 1.

      • It is extremely strange to show that the K187R mutant fails to get in the nuclei by cell imaging but remains chromatin-bound by fractionation... If K187 is auto-acetylated and required to enter the nucleus, why would a HAT-dead mutant not behave the same?

      Response: We would like to point out that neither the HAT mutant nor the K187R mutant is completely catalytically dead. As our data show, both mutants retain catalytic activity, although at significantly decreased levels. We believe that K187 acetylation is critical for TIP60 to enter the nucleus and that, once TIP60 has shuttled into the nucleus, autoacetylation of other sites is required for its efficient chromatin binding. In the fractionation assay, the nuclear membrane is dissolved while preparing the soluble fraction, so there is no hindrance for the K187R mutant in accessing chromatin. The HAT mutant, in contrast, can acetylate the K187 site and is thus able to enter the nucleus; however, its residual catalytic activity is either unable to autoacetylate the other residues required for efficient chromatin binding or unable to counter the activity of the HDACs that deacetylate TIP60.

      • If K187 acetylation is key to Tip60 function, it would be most logical (and classical) to test a K187Q acetyl-mimic substitution. In that sense, what happens with the R188Q mutant? That all goes back to the fact that this cluster of basic residues looks quite like an NLS.

      Response: As suggested, we will generate an acetylation-mimicking mutant for the K187 site and examine it. The results will be added in the revised manuscript.

      • The effect of the mutant on the TIP60 complex itself needs to be analyzed, e.g. for associated subunits like p400, ING3, TRRAP, Brd8...

      Response: As suggested, we will examine the effect of the mutations on the TIP60 complex.

    1. Author Response:

      Reviewer #1:

      Summary:

      This research study utilizes a realistic motoneuron model to explore the potential to trace back the appropriate levels of excitation, inhibition, and neuromodulation in the firing patterns of motoneurons observed in in-vitro and in-vivo experiments in mammals. The research employs high-performance computing power to achieve its objectives. The work introduces a new framework that enhances understanding of the neural inputs to motoneuron pools, thereby opening up new avenues for hypothesis testing research.

      Strengths: The significance of the study holds relevance for all neuroscientists. Motoneurons are a unique class of neurons with known distribution of outputs for a wide range of voluntary and involuntary motor commands, and their physiological function is precisely understood. More importantly, they can be recorded in-vivo using minimally invasive methods, and they are directly impacted by many neurodegenerative diseases at the spinal cord level. The computational framework developed in this research offers the potential to reverse engineer the synaptic input distribution when assessing motor unit activity in humans, which holds particular importance. Overall, the strength of the findings focuses on providing a novel framework for studying and understanding the inputs that govern motoneuron behavior, with broad applications in neuroscience and potential implications for understanding neurodegenerative diseases. It highlights the significance of the study for various research domains, making it valuable to the scientific community.

      Weaknesses: The exact levels of inhibition, excitation, and neuromodulatory inputs to neural networks are unknown. Therefore the work is based on fine-tuned measures that are indirectly based on experimental results. However, obtaining such physiological information is challenging and currently impossible. From a computational perspective it is a challenge that in theory can be solved. Thus, although we have no ground-truth evidence, this framework can provide compelling evidence for all hypothesis testing research and potentially solve this physiological problem with the use of computers.

      We agree with the reviewer. This work was intended to determine the feasibility of reverse engineering motor unit firing patterns using neuron models with a high degree of realism. Given that the results support this feasibility, our model and technique will serve to construct new hypotheses as well as to test them.

      Reviewer #2:

      The study presents an extensive computational approach to identify the motor neuron input from the characteristics of single motor neuron discharge patterns during a ramp up/down contraction. This reverse engineering approach is relevant due to limitations in our ability to estimate this input experimentally. Using well-established models of single motor neurons, a (very) large number of simulations were performed that allowed identification of this relation. In this way, the results enable researchers to measure motor neuron behavior and from those results determine the underlying neural input scheme. Overall, the results are very convincing and represent an important step forward in understanding the neural strategies for controlling movement.

      Nevertheless, I would suggest that the authors consider the following recommendations to strengthen the message further. First, I believe that the relation between individual motor neuron behavioral characteristics (delta F, brace height etc.) and the motor neuron input properties can be illustrated more clearly. Although this is explained in the text, I believe that this is not optimally supported by figures. Figure 6 to some extent shows this, but Figures 8 and 9 as well as Table 1 show primarily the goodness of fit rather than the actual fit.

      We agree with the reviewer that showing the relationship between the motor neuron behavioral characteristics (delta F, brace height etc.) and the motor neuron input properties would be a great addition to the manuscript. Because the regression models have multiple dimensions (7 inputs and 3 outputs) it is difficult to show the relationship in a static image. We thought it best to show the goodness of fit even though it is more abstract and less intuitive. We added a supplemental diagram to Figure 8 to show the structure of the reverse engineered model that was fit (see Figure 8D).

      Author response image 1: Figure 8. Residual plots showing the goodness of fit of the different predicted values: (A) Inhibition, (B) Neuromodulation and (C) excitatory Weight Ratio. The summary plots are for the models showing the highest R² results in Table 1. The predicted values are calculated using the features extracted from the firing rates (see Figure 7, section Machine learning inference of motor pool characteristics and Regression using motoneuron outputs to predict input organization). Diagram (D) shows the multidimensionality of the RE models (see Model fits), which have 7 feature inputs (see Feature Extraction) predicting 3 outputs (Inhibition, Neuromodulation and Weight Ratio).

      Second, I would have expected the discussion to have addressed specifically the question of which of the two primary schemes (push-pull, balanced) is the most prevalent. This is the main research question of the study, but it is to some degree left unanswered. Now that the authors have identified the relation between the characteristics of motor neuron behaviors (which has been reported in many previous studies), why not exploit this finding by summarizing the results of previous studies (at least a few representative ones) and discuss the most likely underlying input scheme? Is there a consistent trend towards one of the schemes, or are both strategies commonly used?

      We agree with the reviewer that our discussion should have addressed which of the two primary schemes – push-pull or balanced – is the most prevalent. At first glance, the upper right of Figure 6 looks the most realistic when compared to real data. We would thus expect the push-pull scheme to dominate for the given task. We added a brief section (Push-Pull vs Balance Motor Command) to the discussion to address the reviewer’s comments. This section is not exhaustive but frames the debate using relevant literature. We are also now preparing to deploy these techniques on real data.

      In addition, it seems striking to me that highly non-linear excitation profiles are necessary to obtain a linear CST ramp in many model configurations. Although somewhat speculative, one may expect that an approximately linear relation is desired for robust and intuitive motor control. It seems to me that humans generally have a good ability to accurately grade the magnitude of the motor output, which implies that either a non-linear relation has been learnt (complex task), or that the central nervous system can generally rely on a somewhat linear relation between the neural drive to the muscle and the output (simpler task).

      We agree with the reviewer, and we were surprised by these results. Our motoneuron pool is equipped with persistent inward currents (PICs) which are nonlinear. Therefore, for the motoneuron to produce a linear output the central nervous system would have to incorporate these nonlinearities into its commands.

      Following this reasoning, it could be interesting to report also for which input scheme, the excitation profile is most linear. I understand that this is not the primary aim of the study, but it may be an interesting way to elaborate on the finding that in many cases non-linear excitation profiles were needed to produce the linear ramp.

      This is a very interesting point. The most realistic firing patterns – with respect to human data – are found in the parameter regions in the upper right in Figure 6, which in fact produce the most nonlinear input (see push-pull pattern in Figure 4C). However, in future studies we hope to separate the total motor command illustrated here into descending and feedback commands. This may result in a more linear descending drive.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public Review):

      (1) It will be interesting to monitor the levels of another MIM insertase namely, OXA1. This will help to understand whether some of the observed changes in levels of OXPHOS subunits are related to alterations in the amounts of this insertase.

      OXA1 was not detected in the untargeted mass spectrometry analysis, most likely due to the fact that it is a polytopic membrane protein, spanning the membrane five times (1,2). Consequently, we measured OXA1 levels with immunoblotting, comparing patient fibroblast cells to the HC. No significant change in OXA1 steady state levels was observed. 

      See the results below. These results will be added and discussed in the revised manuscript.

      Author response image 1.

      (2) Figure 3: How do the authors explain that although TIMM17 and TIMM23 were found to be significantly reduced by Western analysis they were not detected as such by the Mass Spec. method?

      The untargeted mass spectrometry in the current study failed to detect TIMM17 in both patient fibroblasts and mouse neurons. TIMM23 was detected only in mouse neurons, where a decrease was observed but was not significant. This is most likely because TIMM17 and TIMM23 are both polytopic membrane proteins, spanning the membrane four times, which makes it difficult to extract them in quantities suitable for MS detection (2,3).

      (3) How do the authors explain the higher levels of some proteins in the TIMM50 mutated cells?

      The levels of fully functional TIM23 complex are decreased in patients' fibroblasts. Therefore, the mechanism by which the steady state level of some TIM23 substrate proteins is increased can only be explained by events that occur outside the mitochondria. These could include an increase in transcription, translation or post-translational modifications, all of which may increase their steady state level, albeit with a decrease in the steady state level of the import complex.

      (4) Can the authors elaborate on why mutated cells are impaired in their ability to switch their energetic emphasis to glycolysis when needed?

      Cellular regulation of the metabolic switch to glycolysis occurs via two known pathways: 1) Activation of AMP-activated protein kinase (AMPK) by increased levels of AMP/ADP (4). 2) Inhibition of pyruvate dehydrogenase (PDH) complexes by pyruvate dehydrogenase kinases (PDK) (5). Therefore, changes in the steady state levels of any of these regulators could push the cells towards anaerobic energy production, when needed. In our model systems, we did not observe changes in any of the AMPK, PDH or PDK subunits that were detected in our untargeted mass spectrometry analysis (see volcano plots below, no PDK subunits were detected in patient fibroblasts). Although this doesn’t directly explain why the cells have an impaired ability to switch their energetic emphasis, it does possibly explain why the switch did not occur de facto.

      Author response image 2.

      Reviewer #2 (Public Review):

      (1) The authors claim in the abstract, the introduction, and the discussion that TIMM50 and the TIM23 translocase might not be relevant for mitochondrial protein import in mammals. This is misleading and certainly wrong!!!

      Indeed, it was not our intention to claim that the TIM23 complex might not be relevant. We have now rewritten the relevant parts to convey the correct message:

      Abstract – 

      Line 25 - “Strikingly, TIMM50 deficiency had no impact on the steady state levels of most of its putative substrates, suggesting that even low levels of a functional TIM23 complex are sufficient to maintain the majority of complex-dependent mitochondrial proteome.”

      Introduction – 

      Line 87 - Surprisingly, functional and physiological analysis points to the possibility that low levels of TIM23 complex core subunits (TIMM50, TIMM17 and TIMM23) are sufficient for maintaining steady-state levels of most presequence-containing proteins. However, the reduced TIM23CORE component levels do affect some critical mitochondrial properties and neuronal activity.

      Discussion – 

      Line 339 – “…surprising, as normal TIM23 complex levels are suggested to be indispensable for the translocation of presequence-containing mitochondrial proteins…”

      Line 344 – “…it is possible that unlike what occurs in yeast, normal levels of mammalian TIMM50 and TIM23 complex are mainly essential for maintaining the steady state levels of intricate complexes/assemblies.”

      Line 396 – “In summary, our results suggest that even low levels of TIMM50 and TIM23CORE components suffice in maintaining the majority of mitochondrial matrix and inner membrane proteome. Nevertheless, reductions in TIMM50 levels led to a decrease of many OXPHOS and MRP complex subunits, which indicates that normal TIMM50 levels might be mainly essential for maintaining the steady state levels and assembly of intricate complex proteins.”

      (1) Homberg B, Rehling P, Cruz-Zaragoza LD. The multifaceted mitochondrial OXA insertase. Trends Cell Biol. 2023;33(9):765–72. 

      (2) Carroll J, Altman MC, Fearnley IM, Walker JE. Identification of membrane proteins by tandem mass spectrometry of protein ions. Proc Natl Acad Sci U S A. 2007;104(36):14330–5.

      (3) Dekker PJT, Keil P, Rassow J, Maarse AC, Pfanner N, Meijer M. Identification of MIM23, a putative component of the protein import machinery of the mitochondrial inner membrane. FEBS Lett. 1993;330(1):66–70. 

      (4) Trefts E, Shaw RJ. AMPK: restoring metabolic homeostasis over space and time. Mol Cell [Internet]. 2021;81(18):3677–90. Available from: https://doi.org/10.1016/j.molcel.2021.08.015

      (5) Zhang S, Hulver MW, McMillan RP, Cline MA, Gilbert ER. The pivotal role of pyruvate dehydrogenase kinases in metabolic flexibility. Nutr Metab. 2014;11(1):1–9.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      We thank the reviewer for his valuable input and careful assessment, which have significantly improved the clarity and rigor of our manuscript.

      Summary:

      Mazer & Yovel 2025 dissect the inverse problem of how echolocators in groups manage to navigate their surroundings despite intense jamming using computational simulations.

      The authors show that despite the 'noisy' sensory environments that echolocating groups present, agents can still access some amount of echo-related information and use it to navigate their local environment. It is known that echolocating bats have strong small and large-scale spatial memory that plays an important role for individuals. The results from this paper also point to the potential importance of an even lower-level, short-term role of memory in the form of echo 'integration' across multiple calls, despite the unpredictability of echo detection in groups. The paper generates a useful basis to think about the mechanisms in echolocating groups for experimental investigations too.

      Strengths:

      (1) The paper builds on biologically well-motivated and parametrised 2D acoustics and sensory simulation setup to investigate the various key parameters of interest

      (2) The 'null-model' of echolocators not being able to tell apart objects & conspecifics while echolocating still shows agents successfully emerge from groups - even though the probability of emergence drops severely in comparison to cognitively more 'capable' agents. This is nonetheless an important result showing the direction-of-arrival of a sound itself is the 'minimum' set of ingredients needed for echolocators navigating their environment.

      (3) The results generate an important basis in unraveling how agents may navigate in sensorially noisy environments with a lot of irrelevant and very few relevant cues.

      (4) The 2D simulation framework is simple and computationally tractable enough to perform multiple runs to investigate many variables - while also remaining true to the aim of the investigation.

      Weaknesses:

      There are a few places in the paper that can be misunderstood or don't provide complete details. Here is a selection:

      (1) Line 61: '... studies have focused on movement algorithms while overlooking the sensory challenges involved' : This statement does not match the recent state of the literature. While the previous models may have had the assumption that all neighbours can be detected, there are models that specifically study the role of limited interaction arising from a potential inability to track all neighbours due to occlusion, and the effect of responding to only one/few neighbours at a time e.g. Bode et al. 2011 R. Soc. Interface, Rosenthal et al. 2015 PNAS, Jhawar et al. 2020 Nature Physics.

      We appreciate the reviewer's comment and the relevant references. We have revised the manuscript accordingly to clarify the distinction between studies that incorporate limited interactions and those that explicitly analyze sensory constraints and interference. We have refined our statement to acknowledge these contributions while maintaining our focus on sensory challenges beyond limited neighbor detection, such as signal degradation, occlusion effects, and multimodal sensory integration (see lines 61-64):

      While collective movement has been extensively studied in various species, including insect swarming, fish schooling, and bird murmuration (Pitcher, Partridge and Wardle, 1976; Partridge, 1982; Strandburg-Peshkin et al., 2013; Pearce et al., 2014; Rosenthal, Twomey, Hartnett, Wu, Couzin, et al., 2015; Bastien and Romanczuk, 2020; Davidson et al., 2021; Aidan, Bleichman and Ayali, 2024), as well as in swarm robotics agents performing tasks such as coordinated navigation and maze-solving (Faria Dias et al., 2021; Youssefi and Rouhani, 2021; Cheraghi, Shahzad and Graffi, 2022), most studies have focused on movement algorithms, often assuming full detection of neighbors (Parrish and Edelstein-Keshet, 1999; Couzin et al., 2002, 2005; Sumpter et al., 2008; Nagy et al., 2010; Bialek et al., 2012; Gautrais et al., 2012; Attanasi et al., 2014). Some models have incorporated limited interaction rules where individuals respond to one or a few neighbors due to sensory constraints (Bode, Franks and Wood, 2011; Jhawar et al., 2020). However, fewer studies explicitly examine how sensory interference, occlusion, and noise shape decision-making in collective systems (Rosenthal et al., 2015).

      (2) The word 'interference' is used loosely in places (Line 89: '...took all interference signals...', Line 319: 'spatial interference'). This is confusing, as it is not clear whether the authors refer to interference in the physics/acoustics sense, or use it broadly as a synonym for reflections and/or jamming.

      To improve clarity, we have revised the manuscript to distinguish between different types of interference:

      · Acoustic interference (jamming): Overlapping calls that completely obscure echo detection, preventing bats from perceiving necessary environmental cues.

      · Acoustic interference (masking): Partial reduction in signal clarity due to competing calls.

      · Spatial interference: Physical obstruction by conspecifics affecting movement and navigation.

      We have updated the manuscript to use these terms consistently and explicitly define them in relevant sections (see lines 87-94 and 329-330). This distinction ensures that the reader can differentiate between interference as an acoustic phenomenon and its broader implications in navigation.

      (3) The paper discusses original results without reference to how they were obtained or what was done. The lack of detail here must be considered while interpreting the Discussion e.g. Line 302 ('our model suggests...increasing the call-rate..' - no clear mention of how/where call-rate was varied) & Line 323 '..no benefit beyond a certain level..' - also no clear mention of how/where call-level was manipulated in the simulations.

      All tested parameters, including call rate dynamics and call intensity variations, are detailed in the Methods section and Tables 1 and 2. Specifically:

      · Call Rate Variation: The Inter-Pulse Interval (IPI) was modeled based on documented echolocation behavior, decreasing from 100 msec during the search phase to 35 msec (~28 calls per second) at the end of the approach phase, and to 5 msec (200 calls per second) during the final buzz (see Table 2). This natural variation in call rate was not manually manipulated in the model but emerged from the simulated bat behavior.

      · Call Intensity Variation: The tested call intensity levels (100, 110, 120, 130 dB SPL) are presented in Table 1 under the “Call Level” parameter. The effect of increasing call intensity was analyzed in relation to exit probability, jamming probability, and collision rate. This is now explicitly referenced in the Discussion.

      We have revised the manuscript to explicitly reference these aspects in the Results and Discussion sections.
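
      The call rates quoted above follow directly from the inter-pulse intervals. As a minimal sketch (the function name and dictionary are illustrative and are not taken from the authors' simulation code), the conversion can be checked as follows:

```python
def calls_per_second(ipi_ms: float) -> float:
    """Convert an inter-pulse interval (IPI) in milliseconds to a call rate in calls/s."""
    if ipi_ms <= 0:
        raise ValueError("IPI must be positive")
    return 1000.0 / ipi_ms


# IPI values quoted in the response above; the labels are shorthand for the phases.
PHASE_IPI_MS = {"search": 100.0, "end of approach": 35.0, "final buzz": 5.0}

for phase, ipi in PHASE_IPI_MS.items():
    print(f"{phase}: IPI {ipi:g} ms -> {calls_per_second(ipi):.1f} calls/s")
```

      An IPI of 35 msec gives roughly 28.6 calls per second, consistent with the "~28" quoted above, and 5 msec gives exactly 200 calls per second.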

      Reviewer #2 (Public review):

      We are grateful for the reviewer’s insightful feedback, which has helped us clarify key aspects of our research and strengthen our conclusions.

      This manuscript describes a detailed model of bats flying together through a fixed geometry. The model considers elements that are faithful to both bat biosonar production and reception and the acoustics governing how sound moves in the air and interacts with obstacles. The model also incorporates behavioral patterns observed in bats, like one-dimensional feature following and temporal integration of cognitive maps. From a simulation study of the model and comparison of the results with the literature, the authors gain insight into how often bats may experience destructive interference of their acoustic signals and those of their peers, and how much such interference may actually negatively affect the groups' ability to navigate effectively. The authors use generalized linear models to test the significance of the effects they observe.

      In terms of its strengths, the work relies on a thoughtful and detailed model that faithfully incorporates salient features, such as acoustic elements like the filter for a biological receiver and temporal aggregation as a kind of memory in the system. At the same time, the authors abstract away features that would complicate the model without being expected to give additional insights, as can be seen in the choice of a two-dimensional rather than three-dimensional system. I thought that the level of abstraction in the model was perfect, enough to demonstrate their results without needless details. The results are compelling and interesting, and the authors do a great job discussing them in the context of the biological literature.

      The most notable weakness I found in this work was that some aspects of the model were not entirely clear to me.

      For example, the directionality of the bat's sonar call in relation to its velocity. Are these the same?

      For simplicity, in our model, the head is aligned with the body, therefore the direction of the echolocation beam is the same as the direction of the flight.

      Moreover, call directionality (directivity) is not directly influenced by velocity. Instead, directionality is estimated using the piston model, as described in the Methods section. The directionality is based on the emission frequency and is thus primarily linked to the behavioral phases of the bat, with frequency shifts occurring as the bat transitions from search to approach to buzz phases. During the approach phase, the bat emits calls with higher frequencies, resulting in increased directionality. This is supported by the literature (Jakobsen and Surlykke, 2010; Jakobsen, Brinkløv and Surlykke, 2013). This phase is also associated with a natural reduction in flight speed, which is a well-documented behavioral adaptation in echolocating bats (Jakobsen et al., 2024).

      To clarify this in the manuscript, we have updated the text to explicitly state that directionality follows phase-dependent frequency changes rather than being a direct function of velocity, see lines 460-465.
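
      The dependence of directionality on emission frequency can be illustrated with the textbook far-field directivity of a circular piston, D(θ) = |2·J1(ka·sin θ) / (ka·sin θ)|, where k = 2πf/c. The sketch below is only an illustration under stated assumptions: the 5 mm aperture radius, the 343 m/s speed of sound, and the 40 and 80 kHz test frequencies are hypothetical values chosen for the example and are not the parameters of the manuscript.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s in air (assumed value)


def bessel_j1(x: float, steps: int = 2000) -> float:
    """Bessel function J1 via Bessel's integral, (1/pi) * int_0^pi cos(tau - x*sin(tau)) dtau,
    evaluated with the midpoint rule."""
    h = math.pi / steps
    total = 0.0
    for i in range(steps):
        tau = (i + 0.5) * h
        total += math.cos(tau - x * math.sin(tau))
    return total * h / math.pi


def piston_directivity(freq_hz: float, angle_rad: float, radius_m: float) -> float:
    """Relative pressure amplitude of a circular piston at an off-axis angle (1.0 on axis)."""
    k = 2.0 * math.pi * freq_hz / SPEED_OF_SOUND  # wavenumber
    x = k * radius_m * math.sin(angle_rad)
    if abs(x) < 1e-9:
        return 1.0  # limit of 2*J1(x)/x as x -> 0
    return abs(2.0 * bessel_j1(x) / x)


RADIUS_M = 0.005  # hypothetical emitter radius, for illustration only
angle = math.radians(30.0)
for freq in (40e3, 80e3):
    gain = piston_directivity(freq, angle, RADIUS_M)
    print(f"{freq / 1e3:.0f} kHz at 30 deg off axis: relative amplitude {gain:.3f}")
```

      With these assumed values the off-axis amplitude is lower at the higher frequency, that is, the beam is narrower, which is the qualitative effect described above.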

      If so, what is the difference between phi_target and phi_tx in the model equations?

      represents the angle between the bat and the reflected object (target).

      the angle [rad], between the masking bat and target (from the transmitter’s perspective)

      refers to the angle between the transmitting conspecific and the receiving focal bat, from the transmitter’s point of view.

      represents the angle between the receiving bat and the transmitting bat, from the receiver’s point of view.

      These definitions have been explicitly stated in the revised manuscript to prevent any ambiguity (lines 467-468). Additionally, a Supplementary figure demonstrating the geometrical relations has been added to the manuscript.

      Author response image 1.

      What is a bat's response to colliding with a conspecific (rather than a wall)?

      In nature, minor collisions between bats are common and typically do not result in significant disruptions to flight (Boerma et al., 2019; Roy et al., 2019; Goldstein et al., 2024). Given this, our model does not explicitly simulate the physical impact of a collision event. Instead, during a collision event the bat keeps decreasing its velocity and changing its flight direction until the distance between the bats is above the threshold (0.4 m). We assume that the primary cost of such interactions arises from the effort required to avoid collisions, rather than from the collision itself. This assumption aligns with observations of bat behavior in dense flight environments, where individuals prioritize collision avoidance; accordingly, we do not model post-collision dynamics.

      From the statistical side, it was not clear if replicate simulations were performed. If they were, which I believe is the right way due to stochasticity in the model, how many replicates were used, and are the standard errors referred to throughout the paper between individuals in the same simulation or between independent simulations, or both?

      The number of repetitions for each scenario is detailed in Table 1, but we have now also stated it in a more prominent location in the text for clarity. Specifically, we now state (Lines 274-275):

      "The number of repetitions for each scenario was as follows: 1 bat: 240; 2 bats: 120; 5 bats: 48; 10 bats: 24; 20 bats: 12; 40 bats: 12; 100 bats: 6."

      Regarding the reported standard errors, they are calculated across all individuals within each scenario, without distinguishing between different simulation trials.

      We have clarified this in the revised text (Lines 534-535 in Statistical Analysis).
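
      Because the standard errors are pooled across individuals, the effective sample size per scenario is the number of bats multiplied by the number of repetitions. The sketch below tabulates this from the repetition counts quoted above. It assumes that every simulated bat contributes one observation, which is our reading of the response rather than a detail stated in it, and the names are illustrative.

```python
# Repetitions per scenario, keyed by group size (values quoted in the response above).
REPETITIONS = {1: 240, 2: 120, 5: 48, 10: 24, 20: 12, 40: 12, 100: 6}


def pooled_sample_size(bats_per_run: int, runs: int) -> int:
    """Number of individual-level observations when pooling across all runs of a scenario."""
    return bats_per_run * runs


pooled = {n_bats: pooled_sample_size(n_bats, runs) for n_bats, runs in REPETITIONS.items()}

for n_bats, n_obs in pooled.items():
    print(f"{n_bats:>3} bats x {REPETITIONS[n_bats]:>3} runs = {n_obs} individuals")
```

      Under this assumption, group sizes of 1 to 20 bats each yield 240 individuals, while the 40-bat and 100-bat scenarios yield 480 and 600. Individuals within one run are not independent of one another, so the number of independent simulations remains the repetition count.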

      Overall, I found these weaknesses to be superficial and easily remedied by the authors. The authors presented well-reasoned arguments that were supported by their results, and which were used to demonstrate how call interference impacts the collective's roost exit as measured by several variables. As the authors highlight, I think this work is valuable to individuals interested in bat biology and behavior, as well as to applications in engineered multi-agent systems like robotic swarms.

      Reviewer #3 (Public review):

      We sincerely appreciate the reviewer’s thoughtful comments and the time invested in evaluating our work, which have greatly contributed to refining our study.

      We would like to note that, in general, our model often simplifies some of the bats’ abilities, under the assumption that if the simulated bats manage to perform this difficult task with simpler mechanisms, real, better-adapted bats will probably perform even better. This line of reasoning recurs in several of the answers below.

      Summary:

      The authors describe a model to mimic bat echolocation behavior and flight under high-density conditions and conclude that the problem of acoustic jamming is less severe than previously thought, conflating the success of their simulations (as described in the manuscript) with hard evidence for what real bats are actually doing. The authors base their model on two species of bats that fly at "high densities" (defined by the authors as colony sizes from tens to tens of thousands of individuals and densities of up to 33.3 bats/m2), Pipistrellus kuhli and Rhinopoma microphyllum. This work fits into the broader discussion of bat sensorimotor strategies during collective flight, and simulations are important to try to understand bat behavior, especially given a lack of empirical data. However, I have major concerns about the assumptions of the parameters used for the simulation, which significantly impact both the results of the simulation and the conclusions that can be made from the data. These details are elaborated upon below, along with key recommendations the authors should consider to guide the refinement of the model.

      Strengths:

      This paper carries out a simulation of bat behavior in dense swarms as a way to explain how jamming does not pose a problem in dense groups. Simulations are important when we lack empirical data. The simulation aims to model two different species with different echolocation signals, which is very important when trying to model echolocation behavior. The analyses are fairly systematic in testing all ranges of parameters used and discussing the differential results.

      Weaknesses:

      The justification for how the different foraging phase call types were chosen for different object detection distances in the simulation is unclear. Do these distances match those recorded from empirical studies, and if so, are they identical for both species used in the simulation?

      The distances at which bats transition between echolocation phases are identical for both species in our model (see Table 2). These distances are based on well-documented empirical studies of bat hunting and obstacle avoidance behavior (Griffin, Webster and Michael, 1958; Simmons and Kick, 1983; Schnitzler et al., 1987; Kalko, 1995; Hiryu et al., 2008; Vanderelst and Peremans, 2018). These references provide extensive evidence that insectivorous bats systematically adjust their echolocation calls in response to object proximity, following the characteristic phases of search, approach, and buzz.

      To improve clarity, we have updated the text to explicitly state that the phase transition distances are empirically grounded and apply equally to both modeled species (lines 430-447).

      What reasoning do the authors have for a bat using the same call characteristics to detect a cave wall as they would for detecting a small insect?

      In echolocating bats, call parameters are primarily shaped by the target distance and echo strength. Accordingly, there is little difference in call structure between prey capture and obstacle-related maneuvers, aside from intensity adjustments based on target strength (Hagino et al., 2007; Hiryu et al., 2008; Surlykke, Ghose and Moss, 2009; Kothari et al., 2014). In our study, due to the dense cave environment, the bats are found to operate in the approach phase nearly all the time, which is consistent with natural cave emergence, where they are navigating through a cluttered environment rather than engaging in open-space search. For one of the species (Rhinopoma microphyllum), we also have empirical recordings of individuals flying under similar conditions (Goldstein et al., 2024). Our model was designed to remain as simple as possible while relying on conservative assumptions that may underestimate bat performance. If, in reality, bats fine-tune their echolocation calls even earlier or more precisely during navigation than assumed, our model would still conservatively reflect their actual capabilities.

      We actually used logarithmically frequency-modulated (FM) chirps, generated using the MATLAB built-in function chirp(t, f0, t1, f1, 'logarithmic'). This method aligns with the nonlinear FM characteristics of Pipistrellus kuhlii (PK) and Rhinopoma microphyllum (RM) and provides a realistic approximation of their echolocation signals. We acknowledge that this was not sufficiently emphasized in the original text, and we have now explicitly highlighted this in the revised version to ensure clarity (see Lines 447-449 in Methods).
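
      For readers without MATLAB, a logarithmic sweep of this kind has instantaneous frequency f(t) = f0·(f1/f0)^(t/t1), so the frequency changes by a constant ratio per unit time. The sketch below reproduces that definition in plain Python. The 80 to 40 kHz sweep over 3 ms is a hypothetical example and does not correspond to the call parameters of either modeled species.

```python
import math


def log_chirp_frequency(t: float, f0: float, t1: float, f1: float) -> float:
    """Instantaneous frequency of a logarithmic sweep from f0 (at t=0) to f1 (at t=t1)."""
    return f0 * (f1 / f0) ** (t / t1)


def log_chirp_sample(t: float, f0: float, t1: float, f1: float) -> float:
    """Cosine chirp sample; the phase is the time integral of 2*pi*f(t). Requires f0 != f1."""
    ratio = f1 / f0
    phase = 2.0 * math.pi * f0 * t1 / math.log(ratio) * (ratio ** (t / t1) - 1.0)
    return math.cos(phase)


# Hypothetical downward sweep, for illustration only.
F0, F1, T1 = 80e3, 40e3, 3e-3
SAMPLE_RATE = 500e3  # Hz, assumed

signal = [log_chirp_sample(n / SAMPLE_RATE, F0, T1, F1) for n in range(int(T1 * SAMPLE_RATE))]
print(f"{len(signal)} samples, mid-sweep frequency {log_chirp_frequency(T1 / 2, F0, T1, F1):.0f} Hz")
```

      At mid-sweep the frequency equals the geometric mean of f0 and f1, not the arithmetic mean that a linear sweep would give.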

      The two species modeled have different calls. In particular, the bandwidth varies by a factor of 10, meaning the species' sonars will have different spatial resolutions. Range resolution is about 10x better for PK compared to RM, but the authors appear to use the same thresholds for "correct detection" for both, which doesn't seem appropriate.

      The detection process in our model is based on Saillant’s method using a filter bank, as detailed in the paper (Saillant et al., 1993; Neretti et al., 2003; Sanderson et al., 2003). This approach inherently incorporates the advantages of a wider bandwidth, meaning that the differences in range resolution between the species are already accounted for within the signal-processing framework. Thus, there is no need to explicitly adjust the model parameters for bandwidth variations, as these effects emerge from the applied method.

      Also, the authors did not mention incorporating/correcting for/exploiting Doppler, which leads me to assume they did not model it.

      The reviewer is correct. To maintain model simplicity, we did not incorporate the Doppler effect or its impact on echolocation. The exclusion of Doppler effects was based on the assumption that while Doppler shifts can influence frequency perception, their impact on jamming and overall navigation performance is minor within the modelled context.

      The maximal Doppler shifts expected for the bats in this scenario are ~1 kHz. These shifts would be applied variably across signals due to the semi-random relative velocities between bats, leading to a mixed effect on frequency changes. This variability would likely result in an overall reduction in jamming rather than exacerbating it, aligning with our previous statement that our model may overestimate the severity of acoustic interference. Such Doppler shifts would result in localization errors of 2-4 cm (i.e., 200-400 microseconds) (Boonman, Parsons and Jones, 2003).

      We have now explicitly highlighted this in the revised version (see Lines 468-470).
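
      The order of magnitude of such shifts can be checked with the first-order Doppler approximation, Δf ≈ f·v/c for a call received directly from another bat and Δf ≈ 2·f·v/c for an echo from an object the bat approaches. The sketch below is only an order-of-magnitude illustration: the 40 kHz call frequency, the speeds, and the 343 m/s speed of sound are assumed example values, and the response does not state how the ~1 kHz figure was derived.

```python
SPEED_OF_SOUND = 343.0  # m/s in air (assumed value)


def one_way_doppler_shift(freq_hz: float, closing_speed_ms: float) -> float:
    """First-order shift of a call received directly from a conspecific (valid for v << c)."""
    return freq_hz * closing_speed_ms / SPEED_OF_SOUND


def echo_doppler_shift(freq_hz: float, approach_speed_ms: float) -> float:
    """First-order shift of a bat's own echo from a stationary object (two-way path)."""
    return 2.0 * freq_hz * approach_speed_ms / SPEED_OF_SOUND


# Hypothetical values, for illustration only.
print(f"one-way, 40 kHz, 8 m/s closing speed: {one_way_doppler_shift(40e3, 8.0):.0f} Hz")
print(f"echo,    40 kHz, 4 m/s approach speed: {echo_doppler_shift(40e3, 4.0):.0f} Hz")
```

      Both assumed cases give shifts slightly below 1 kHz, the same order of magnitude as the maximal value quoted above.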

      The success of the simulation may very well be due to variation in the calls of the bats, which ironically enough demonstrates the importance of a jamming avoidance response in dense flight. This explains why the performance of the simulation falls when bats are not able to distinguish their own echoes from other signals. For example, in Figure C2, there are calls that are labeled as conspecific calls and have markedly shorter durations and wider bandwidths than others. These three phases for call types used by the authors may be responsible for some (or most) of the performance of the model since the correlation between different call types is unlikely to exceed the detection threshold. But it turns out this variation in and of itself is what a jamming avoidance response may consist of. So, in essence, the authors are incorporating a jamming avoidance response into their simulation.

      We fully agree that the natural variations in call design between the phases contribute significantly to interference reduction (see our discussion in a previous paper in Mazar & Yovel, 2020). However, we emphasize that this cannot be classified as a Jamming Avoidance Response (JAR). In our model, bats respond only to the physical presence of objects and not to the acoustic environment or interference itself. There is no active or adaptive adjustment of call design to minimize jamming beyond the natural phase-dependent variations in call structure. Therefore, while variation in call types does inherently reduce interference, this effect emerges passively from the modeled behavior rather than as an intentional strategy to avoid jamming.

      The authors claim that integration over multiple pings (though I was not able to determine the specifics of this integration algorithm) reduces the masking problem. Indeed, it should: if you have two chances at detection, you've effectively increased your SNR by 3dB.

      The reviewer is correct. Indeed, integration over multiple calls improves signal-to-noise ratio (SNR), effectively increasing it by approximately 3 dB per doubling of observations. The specifics of the integration algorithm are detailed in the Methods section, where we describe how sensory information is aggregated across multiple time steps to enhance detection reliability.
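
      The 3 dB-per-doubling figure follows from the ideal case in which the echo is identical across calls and the noise is uncorrelated between them. A minimal sketch under that assumption (it is not the integration algorithm of the model, which is described in the Methods):

```python
import math


def integration_gain_db(n_calls):
    """Ideal SNR gain (dB) from combining n_calls observations whose
    noise is uncorrelated: 10 * log10(n_calls)."""
    if n_calls < 1:
        raise ValueError("n_calls must be at least 1")
    return 10.0 * math.log10(n_calls)


# Gain for a few integration window sizes.
for n in (1, 2, 5, 10):
    print(n, round(integration_gain_db(n), 1))
```

      In practice the gain would be smaller whenever the interference is correlated across calls.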

      They also claim - although it is almost an afterthought - that integration dramatically reduces the degradation caused by false echoes. This also makes sense: from one ping to the next, the bat's own echo delays will correlate extremely well with the bat's flight path. Echo delays due to conspecifics will jump around kind of randomly. However, the main concern is regarding the time interval and number of pings of the integration, especially in the context of the bat's flight speed. The authors say that a 1s integration interval (5-10 pings) dramatically reduces jamming probability and echo confusion. This number of pings isn't very high, and it occurs over a time interval during which the bat has moved 5-10m. This distance is large compared to the 0.4m distance-to-obstacle that triggers an evasive maneuver from the bat, so integration should produce a latency in navigation that significantly hinders the ability to avoid obstacles. Can the authors provide statistics that describe this latency, and discussion about why it doesn't seem to be a problem?

      As described in the Methods section, the bat’s collision avoidance response does not rely solely on the integration process. Instead, the model incorporates real-time echoes from the last calls, which are used independently of the integration process for immediate obstacle avoidance maneuvers. This ensures that bats can react to nearby obstacles without being hindered by the integration latency. The slower integration, on the other hand, is used for clustering, outlier removal and estimation of wall directions to support the pathfinding process, as illustrated in Supplementary Figure 1.

      Additionally, our model assumes that bats store the physical positions of echoes in an allocentric coordinate system (x-y). The integration occurs after transforming these detections from a local relative reference frame to a global spatial representation. This allows for stable environmental mapping while maintaining responsiveness to immediate changes in the bat’s surroundings.

      See lines 518-523 in the revised version.
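
      The transformation from a bat-centred detection to an allocentric x-y position can be sketched as below. The function name, the argument names and the angle convention (radians, counter-clockwise positive, bearing measured from the flight heading) are assumptions made for illustration only.

```python
import math


def egocentric_to_allocentric(bat_x, bat_y, heading_rad, range_m, bearing_rad):
    """Convert a detection given as (range, bearing relative to the bat's
    heading) into world x-y coordinates, given the bat's own position and
    heading in the world frame."""
    angle = heading_rad + bearing_rad
    return (bat_x + range_m * math.cos(angle),
            bat_y + range_m * math.sin(angle))


# Hypothetical example: a bat at (1, 2) flying along +y detects a
# reflector 3 m straight ahead.
x, y = egocentric_to_allocentric(1.0, 2.0, math.pi / 2, 3.0, 0.0)
print(round(x, 6), round(y, 6))
```

      Once detections from successive calls are expressed in the same world frame, they can be accumulated even though the bat has moved between calls.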

      The authors are using a 2D simulation, but this very much simplifies the challenge of a 3D navigation task, and there is no explanation as to why this is appropriate. Bat densities and bat behavior are discussed per unit area when realistically it should be per unit volume. In fact, the authors reference studies to justify the densities used in the simulation, but these studies were done in a 3D world. If the authors have justification for why it is realistic to model a 3D world in a 2D simulation, I encourage them to provide references justifying this approach.

      We acknowledge that this is a simplification; however, from an echolocation perspective, a 2D framework represents a worst-case scenario in terms of bat densities and maneuverability:

      · Higher Effective Density: A 2D model forces all bats into a single plane rather than distributing them through a 3D volume, increasing the likelihood of overlap in calls and echoes and making jamming more severe. As described in the text, the average distance to the nearest bat in our simulation is 0.27 m (with 100 bats), whereas reported distances in very dense colonies are 0.5 m, as observed in Myotis grisescens and Tadarida brasiliensis (Fujioka et al., 2021; Sabol and Hudson, 1995; Betke et al., 2008; Gillam et al., 2010).

      · Reduced Maneuverability: In 3D space, bats can use vertical movement to avoid obstacles and conspecifics. A 2D constraint eliminates this degree of freedom, increasing collision risk and limiting escape options.

      Thus, our 2D model provides a conservative, difficult test case, ensuring that our findings are valid under conditions where jamming and collision risks are maximized. Additionally, the 2D framework is computationally efficient, allowing us to perform multiple simulation runs to explore a broad parameter space and systematically test the impact of different variables.

      To address the reviewer’s concern, we have clarified this justification in the revised text and will provide supporting references where applicable (see Methods, lines 407-412).

      The focus on "masking" (which appears to be just in-band noise), especially relative to the problem of misassigned echoes, is concerning. If the bat calls are all the same waveform (downsweep linear FM of some duration, I assume - it's not clear from the text), false echoes would be a major problem. Masking, as the authors define it, just reduces SNR. This reduction is something like sqrt(N), where N is the number of conspecifics whose echoes are audible to the bat, so this allows the detection threshold to be set lower, increasing the probability that a bat's echo will exceed a detection threshold. False echoes present a very different problem. They do not reduce SNR per se, but rather they cause spurious threshold excursions (N of them!) that the bat cannot help but interpret as obstacle detection. I would argue that in dense groups the mis-assignment problem is much more important than the SNR problem.

      There is substantial literature supporting the assumption that bats can recognize their own echoes and distinguish them from conspecific signals (Schnitzler and Kalko, 2001; Kazial, Burnett and Masters, 2001; Burnett and Masters, 2002; Kazial, Kenny and Burnett, 2008; Chili, Xian and Moss, 2009; Yovel et al., 2009; Beetz and Hechavarría, 2022). However, we acknowledge that false echoes may present a major challenge in dense groups. To address this, we explicitly tested the impact of the self-echo identification assumption in our study (see Results, Figure 4: The impact of confusion on performance, and lines 345-355 in the Discussion).

      Furthermore, we examined a full confusion scenario, where all reflected echoes from conspecifics were misinterpreted as obstacle reflections (i.e., 100% confusion). Our results show that this significantly degrades navigation performance, supporting the argument that echo misassignment is a critical issue. However, we also explored a simple mitigation strategy based on temporal integration with outlier rejection, which provided some improvement in performance. This suggests that real bats may possess additional mechanisms to enhance self-echo identification and reduce false detections. See lines XX in the manuscript for further discussion.
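
      To illustrate the general idea of temporal integration with outlier rejection, the sketch below keeps a detection only if detections from other calls fall close to it in world coordinates. This is not the authors' algorithm: the function name, the `(call_index, x, y)` data layout, and the `radius_m` and `min_support` parameters are hypothetical choices made for this example.

```python
import math


def reject_outliers(detections, radius_m=0.3, min_support=2):
    """Keep a detection only if detections from at least min_support
    *other* calls lie within radius_m of it.

    detections: list of (call_index, x, y) tuples in world coordinates.
    """
    kept = []
    for call_i, xi, yi in detections:
        # Set of other calls that produced a detection near this one.
        supporting_calls = {
            call_j
            for call_j, xj, yj in detections
            if call_j != call_i and math.hypot(xi - xj, yi - yj) <= radius_m
        }
        if len(supporting_calls) >= min_support:
            kept.append((call_i, xi, yi))
    return kept


# Hypothetical example: a wall point seen in three consecutive calls,
# plus one spurious detection caused by a conspecific echo in call 1.
detections = [
    (0, 5.00, 0.00),
    (1, 5.10, 0.05),
    (2, 4.95, -0.05),
    (1, 2.00, 3.00),  # spurious
]
print(reject_outliers(detections))
```

      Reflections from static obstacles recur at consistent world positions across calls, whereas false echoes from moving conspecifics tend not to, which is why a consistency test of this kind can remove part of the confusion.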

      The criteria set for flight behavior (lines 393-406) are not justified with any empirical evidence of the flight behavior of wild bats in collective flight. How did the authors determine the avoidance distances? Also, what is the justification for the time limit of 15 seconds to emerge from the opening? Instead of an exit probability, why not instead use a time criterion, similar to "How long does it take X% of bats to exit?"

      While we acknowledge that wild bats may employ more complex behaviors for collision avoidance, we chose to implement a simplified decision-making rule in our model to maintain computational tractability.

      The avoidance distances (1.5 m from walls and 0.4 m from other bats) were selected as internal parameters to ensure coherent flight trajectories while maintaining a reasonable collision rate. These distances provide a balance between maneuverability and stability, preventing erratic flight patterns while still enabling effective obstacle avoidance. In the revised paper, we have added supplementary figures illustrating the effect of model parameters on performance, specifically focusing on the avoidance distance.

      The 15-second exit limit was determined as described in the text (Lines 403-404): “A 15-second window was chosen because it is approximately twice the average exit time for 40 bats and allows for a second corrective maneuver if needed.” In other words, it allowed each bat to circle the ‘cave’ twice to exit even in the most crowded environment. This threshold was set to keep simulation time reasonable while allowing sufficient time for most bats to exit successfully.

      We acknowledge that the alternative approach suggested by the reviewer, measuring the time taken for a certain percentage of bats to exit, is also valid. However, in our model, some outlier bats fail to exit and continue flying for many minutes. Such simulations would lead to excessive simulation times, making it difficult to generate repetitions while teaching us little, because they usually result from the bat slightly missing the opening (see video S1). Our chosen approach ensures practical runtime constraints while still capturing relevant performance metrics.

      What is the empirical justification for the 1-10 calls used for integration?

      The "average exit time for 40 bats" is also confusing and not well explained. Was this determined empirically? From the simulation? If the latter, what are the conditions? Does it include masking, no masking, or which species?

      Previous studies have demonstrated that bats integrate acoustic information received sequentially over several echolocation calls (2-15), effectively constructing an auditory scene in complex environments (Ulanovsky and Moss, 2008; Chili, Xian and Moss, 2009; Moss and Surlykke, 2010; Yovel and Ulanovsky, 2017; Salles, Diebold and Moss, 2020). Additionally, bats are known to produce echolocation sound groups when spatiotemporal localization demands are high (Kothari et al., 2014). Studies have documented call sequences ranging from 2 to 15 grouped calls (Moss et al., 2010), and it has been hypothesized that grouping facilitates echo segregation.

      We did not use a single integration window: we tested integration sizes between 1 and 10 calls and presented the results in Figure 3A. This range was chosen based on prior empirical findings and to explore how different levels of temporal aggregation impact navigation performance. Indeed, the results showed that performance levels off with an integration window of 5-10 calls (Figure 3A).

      Regarding the average exit time for 40 bats, this value was determined from our simulations, where it represents the mean time for successful exits under standard conditions with masking.

      We have revised the text to clarify these details (see line 466).

      References:

      Aidan, Y., Bleichman, I. and Ayali, A. (2024) ‘Pausing to swarm: locust intermittent motion is instrumental for swarming-related visual processing’, Biology letters, 20(2), p. 20230468. Available at: https://doi.org/10.1098/rsbl.2023.0468.

      Attanasi, A. et al. (2014) ‘Collective Behaviour without Collective Order in Wild Swarms of Midges’. Edited by T. Vicsek. PLoS Computational Biology, 10(7). Available at: https://doi.org/10.1371/journal.pcbi.1003697.

      Bastien, R. and Romanczuk, P. (2020) ‘A model of collective behavior based purely on vision’, Science Advances, 6(6). Available at: https://doi.org/10.1126/sciadv.aay0792.

      Beetz, M.J. and Hechavarría, J.C. (2022) ‘Neural Processing of Naturalistic Echolocation Signals in Bats’, Frontiers in Neural Circuits, 16, p. 899370. Available at: https://doi.org/10.3389/fncir.2022.899370.

      Betke, M. et al. (2008) ‘Thermal Imaging Reveals Significantly Smaller Brazilian Free-Tailed Bat Colonies Than Previously Estimated’, Journal of Mammalogy, 89(1), pp. 18–24. Available at: https://doi.org/10.1644/07-MAMM-A-011.1.

      Bialek, W. et al. (2012) ‘Statistical mechanics for natural flocks of birds’, Proceedings of the National Academy of Sciences, 109(13), pp. 4786–4791. Available at: https://doi.org/10.1073/PNAS.1118633109.

      Bode, N.W.F., Franks, D.W. and Wood, A.J. (2011) ‘Limited interactions in flocks: Relating model simulations to empirical data’, Journal of the Royal Society Interface, 8(55), pp. 301–304. Available at: https://doi.org/10.1098/RSIF.2010.0397.

      Boerma, D.B. et al. (2019) ‘Wings as inertial appendages: How bats recover from aerial stumbles’, Journal of Experimental Biology, 222(20). Available at: https://doi.org/10.1242/jeb.204255.

      Boonman, A.M., Parsons, S. and Jones, G. (2003) ‘The influence of flight speed on the ranging performance of bats using frequency modulated echolocation pulses’, The Journal of the Acoustical Society of America, 113(1), p. 617. Available at: https://doi.org/10.1121/1.1528175.

      Burnett, S.C. and Masters, W.M. (2002) ‘Identifying Bats Using Computerized Analysis and Artificial Neural Networks’, North American Symposium on Bat Research, 9.

      Cheraghi, A.R., Shahzad, S. and Graffi, K. (2022) ‘Past, Present, and Future of Swarm Robotics’, in Lecture Notes in Networks and Systems. Available at: https://doi.org/10.1007/978-3-030-82199-9_13.

      Chili, C., Xian, W. and Moss, C.F. (2009) ‘Adaptive echolocation behavior in bats for the analysis of auditory scenes’, Journal of Experimental Biology, 212(9), pp. 1392–1404. Available at: https://doi.org/10.1242/jeb.027045.

      Couzin, I.D. et al. (2002) ‘Collective Memory and Spatial Sorting in Animal Groups’, Journal of Theoretical Biology, 218(1), pp. 1–11. Available at: https://doi.org/10.1006/jtbi.2002.3065.

      Couzin, I.D. et al. (2005) ‘Effective leadership and decision-making in animal groups on the move’, Nature, 433(7025), pp. 513–516. Available at: https://doi.org/10.1038/nature03236.

      Davidson, J.D. et al. (2021) ‘Collective detection based on visual information in animal groups’, Journal of the Royal Society Interface, 18(180), p. 20210142. Available at: https://doi.org/10.1098/rsif.2021.0142.

      Faria Dias, P.G. et al. (2021) ‘Swarm robotics: A perspective on the latest reviewed concepts and applications’, Sensors. Available at: https://doi.org/10.3390/s21062062.

      Fujioka, E. et al. (2021) ‘Three-Dimensional Trajectory Construction and Observation of Group Behavior of Wild Bats During Cave Emergence’, Journal of Robotics and Mechatronics, 33(3), pp. 556–563. Available at: https://doi.org/10.20965/jrm.2021.p0556.

      Gautrais, J. et al. (2012) ‘Deciphering Interactions in Moving Animal Groups’, PLOS Computational Biology, 8(9), p. e1002678. Available at: https://doi.org/10.1371/JOURNAL.PCBI.1002678.

      Gillam, E.H. et al. (2010) ‘Echolocation behavior of Brazilian free-tailed bats during dense emergence flights’, Journal of Mammalogy, 91(4), pp. 967–975. Available at: https://doi.org/10.1644/09-MAMM-A-302.1.

      Goldstein, A. et al. (2024) ‘Collective Sensing – On-Board Recordings Reveal How Bats Maneuver Under Severe Acoustic Interference’, Under Review, pp. 1–25.

      Griffin, D.R., Webster, F.A. and Michael, C.R. (1958) ‘The echolocation of flying insects by bats’, Animal Behaviour, 8(3-4).

      Hagino, T. et al. (2007) ‘Adaptive SONAR sounds by echolocating bats’, International Symposium on Underwater Technology, UT 2007 - International Workshop on Scientific Use of Submarine Cables and Related Technologies 2007, pp. 647–651. Available at: https://doi.org/10.1109/UT.2007.370829.

      Hiryu, S. et al. (2008) ‘Adaptive echolocation sounds of insectivorous bats, Pipistrellus abramus, during foraging flights in the field’, The Journal of the Acoustical Society of America, 124(2), pp. EL51–EL56. Available at: https://doi.org/10.1121/1.2947629.

      Jakobsen, L. et al. (2024) ‘Velocity as an overlooked driver in the echolocation behavior of aerial hawking vespertilionid bats’. Available at: https://doi.org/10.1016/j.cub.2024.12.042.

      Jakobsen, L., Brinkløv, S. and Surlykke, A. (2013) ‘Intensity and directionality of bat echolocation signals’, Frontiers in Physiology, 4 APR(April), pp. 1–9. Available at: https://doi.org/10.3389/fphys.2013.00089.

      Jakobsen, L. and Surlykke, A. (2010) ‘Vespertilionid bats control the width of their biosonar sound beam dynamically during prey pursuit’, Proceedings of the National Academy of Sciences, 107(31). Available at: https://doi.org/10.1073/pnas.1006630107.

      Jhawar, J. et al. (2020) ‘Noise-induced schooling of fish’, Nature Physics, 16(4), pp. 488–493. Available at: https://doi.org/10.1038/s41567-020-0787-y.

      Kalko, E.K.V. (1995) ‘Insect pursuit, prey capture and echolocation in pipistrelle bats (Microchiroptera)’, Animal Behaviour, 50(4), pp. 861–880.

      Kazial, K.A., Burnett, S.C. and Masters, W.M. (2001) ‘Individual and Group Variation in Echolocation Calls of Big Brown Bats, Eptesicus Fuscus (Chiroptera: Vespertilionidae)’, Journal of Mammalogy, 82(2), pp. 339–351. Available at: https://doi.org/10.1644/1545-1542(2001)082<0339:iagvie>2.0.co;2.

      Kazial, K.A., Kenny, T.L. and Burnett, S.C. (2008) ‘Little brown bats (Myotis lucifugus) recognize individual identity of conspecifics using sonar calls’, Ethology, 114(5), pp. 469–478. Available at: https://doi.org/10.1111/j.1439-0310.2008.01483.x.

      Kothari, N.B. et al. (2014) ‘Timing matters: Sonar call groups facilitate target localization in bats’, Frontiers in Physiology, 5 MAY. Available at: https://doi.org/10.3389/fphys.2014.00168.

      Moss, C.F. and Surlykke, A. (2010) ‘Probing the natural scene by echolocation in bats’, Frontiers in Behavioral Neuroscience. Available at: https://doi.org/10.3389/fnbeh.2010.00033.

      Nagy, M. et al. (2010) ‘Hierarchical group dynamics in pigeon flocks’, Nature, 464(7290), pp. 890–893. Available at: https://doi.org/10.1038/nature08891.

      Neretti, N. et al. (2003) ‘Time-frequency model for echo-delay resolution in wideband biosonar’, The Journal of the Acoustical Society of America, 113(4), pp. 2137–2145. Available at: https://doi.org/10.1121/1.1554693.

      Parrish, J.K. and Edelstein-Keshet, L. (1999) ‘Complexity, Pattern, and Evolutionary Trade-Offs in Animal Aggregation’, Science, 284(5411), pp. 99–101. Available at: https://doi.org/10.1126/SCIENCE.284.5411.99.

      Partridge, B.L. (1982) ‘The Structure and Function of Fish Schools’, Scientific American, 246(6), pp. 114–123. Available at: https://doi.org/10.2307/24966618.

      Pearce, D.J.G. et al. (2014) ‘Role of projection in the control of bird flocks’, Proceedings of the National Academy of Sciences of the United States of America, 111(29), pp. 10422–10426. Available at: https://doi.org/10.1073/pnas.1402202111.

      Pitcher, T.J., Partridge, B.L. and Wardle, C.S. (1976) ‘A blind fish can school’, Science, 194(4268), pp. 963–965. Available at: https://doi.org/10.1126/science.982056.

      Rosenthal, S.B., Twomey, C.R., Hartnett, A.T., Wu, H.S., Couzin, I.D., et al. (2015) ‘Revealing the hidden networks of interaction in mobile animal groups allows prediction of complex behavioral contagion’, Proceedings of the National Academy of Sciences of the United States of America, 112(15), pp. 4690–4695. Available at: https://doi.org/10.1073/pnas.1420068112.

      Roy, S. et al. (2019) ‘Extracting interactions between flying bat pairs using model-free methods’, Entropy, 21(1). Available at: https://doi.org/10.3390/e21010042.

      Sabol, B.M. and Hudson, M.K. (1995) ‘Technique using thermal infrared-imaging for estimating populations of gray bats’, Journal of Mammalogy, 76(4). Available at: https://doi.org/10.2307/1382618.

      Saillant, P.A. et al. (1993) ‘A computational model of echo processing and acoustic imaging in frequency- modulated echolocating bats: The spectrogram correlation and transformation receiver’, The Journal of the Acoustical Society of America, 94(5). Available at: https://doi.org/10.1121/1.407353.

      Salles, A., Diebold, C.A. and Moss, C.F. (2020) ‘Echolocating bats accumulate information from acoustic snapshots to predict auditory object motion’, Proceedings of the National Academy of Sciences of the United States of America, 117(46), pp. 29229–29238. Available at: https://doi.org/10.1073/pnas.2011719117.

      Sanderson, M.I. et al. (2003) ‘Evaluation of an auditory model for echo delay accuracy in wideband biosonar’, The Journal of the Acoustical Society of America, 114(3), pp. 1648–1659. Available at: https://doi.org/10.1121/1.1598195.

      Schnitzler, H.-U. and Kalko, E.K.V. (2001) ‘Echolocation by insect-eating bats’, BioScience, 51(7), pp. 557–569. Available at: https://academic.oup.com/bioscience/article-abstract/51/7/557/268230 (Accessed: 17 March 2025).

      Schnitzler, H.-U. et al. (1987) ‘The echolocation and hunting behavior of the bat,Pipistrellus kuhli’, Journal of Comparative Physiology A, 161(2), pp. 267–274. Available at: https://doi.org/10.1007/BF00615246.

      Simmons, J.A. and Kick, S.A. (1983) ‘Interception of Flying Insects by Bats’, Neuroethology and Behavioral Physiology, pp. 267–279. Available at: https://doi.org/10.1007/978-3-642-69271-0_20.

      Strandburg-Peshkin, A. et al. (2013) ‘Visual sensory networks and effective information transfer in animal groups’, Current Biology. Cell Press. Available at: https://doi.org/10.1016/j.cub.2013.07.059.

      Sumpter, D.J.T. et al. (2008) ‘Consensus Decision Making by Fish’, Current Biology, 18(22), pp. 1773–1777. Available at: https://doi.org/10.1016/J.CUB.2008.09.064.

      Surlykke, A., Ghose, K. and Moss, C.F. (2009) ‘Acoustic scanning of natural scenes by echolocation in the big brown bat, Eptesicus fuscus’, Journal of Experimental Biology, 212(7), pp. 1011–1020. Available at: https://doi.org/10.1242/JEB.024620.

      Theriault, D.H. et al. (no date) ‘Reconstruction and analysis of 3D trajectories of Brazilian free-tailed bats in flight’ [Preprint]. Available at: https://cs-web.bu.edu/faculty/betke/papers/2010-027-3d-bat-trajectories.pdf (Accessed: 4 May 2023).

      Ulanovsky, N. and Moss, C.F. (2008) ‘What the bat’s voice tells the bat’s brain’, Proceedings of the National Academy of Sciences of the United States of America, 105(25), pp. 8491–8498. Available at: https://doi.org/10.1073/pnas.0703550105.

      Vanderelst, D. and Peremans, H. (2018) ‘Modeling bat prey capture in echolocating bats: The feasibility of reactive pursuit’, Journal of Theoretical Biology, 456, pp. 305–314.

      Youssefi, K.A.R. and Rouhani, M. (2021) ‘Swarm intelligence based robotic search in unknown maze-like environments’, Expert Systems with Applications, 178. Available at: https://doi.org/10.1016/j.eswa.2021.114907.

      Yovel, Y. et al. (2009) ‘The voice of bats: How greater mouse-eared bats recognize individuals based on their echolocation calls’, PLoS Computational Biology, 5(6). Available at: https://doi.org/10.1371/journal.pcbi.1000400.

      Yovel, Y. and Ulanovsky, N. (2017) ‘Bat Navigation’, The Curated Reference Collection in Neuroscience and Biobehavioral Psychology, pp. 333–345. Available at: https://doi.org/10.1016/B978-0-12-809324-5.21031-6.

    1. Author response:

      We thank the reviewers for their thorough evaluation and constructive feedback on our manuscript.

      We think that their valuable suggestions will strengthen the manuscript and help us clarify several important points.

      All reviewers acknowledged the importance of our theoretical results and network classification in making pattern formation analysis a more tractable problem. At the same time, they have also raised a number of important concerns that we shall carefully consider.

      A. A major clarification that the reviewers found important concerns the definition of non-trivial pattern transformations and its generalization to higher dimensions. In this regard, the reviewers’ comments are:

      Reviewer #1:

      (on non-trivial pattern transformations):

      (3) All modelling is confined to one spatial dimension, and the very definition of a "non-trivial" transformation is framed in terms of peak positions along a line, which clearly must be reformulated for higher dimensions. It's well-known that diffusions in 1, 2, and 3 dimensions are also dramatically different, so the relevance of the three-class taxonomy to real multicellular tissues remains unclear, or at least should be explained in more detail.

      Reviewer #2:

      (on non-trivial pattern transformations):

      (5) The definition of non-trivial pattern formation is provided only in the Supplementary Information, despite its central importance for interpreting the main results. It would significantly improve clarity if this definition were included and explained in the main text. Additionally, it remains unclear how the definition is consistently applied across the different initial conditions. In particular, the authors should clarify how slope-based measures are determined for both the random noise and sharp peak/step function initial states. Furthermore, the authors do not specify how the sign function is evaluated at zero. If the standard mathematical definition sgn(0)=0 is used, then even a simple widening of a peak could fulfill the criterion for nontrivial pattern transformation.

      We agree with Reviewer #2 that including a more detailed definition of non-trivial pattern transformation in the main text would enhance the clarity of the paper. The one-dimensional (1D) definition currently provided in the Supplementary Information was chosen because all computations presented therein involve exclusively one-dimensional patterns. However, we acknowledge that this definition, as written, does not generalize unambiguously to higher dimensions. Therefore, in a revised version of the manuscript, we will incorporate an expanded definition applicable to higher-dimensional cases.

      This general definition of a non-trivial pattern transformation should make no reference to the sign of spatial derivatives of either the initial or resulting patterns. Specifically, a pattern transformation is considered non-trivial if it satisfies the following criteria:

      - It is heterogeneous: The resulting pattern is heterogeneous in space.

      - It is rearranging: The arrangement of critical points (i.e. peaks, valleys and saddle points in a gene product concentration) along the domain in the resulting pattern of a gene product is different to the arrangement of critical points in its initial pattern. This includes the emergence of new critical points, the disappearance of existing ones, or the spatial displacement of critical points from one location to another.

      - It is non-replicating: The spatial arrangement of critical points in the pattern of one gene product must differ from that of any other upstream gene product.

      Nonetheless, our two initial patterns are spatially discontinuous functions: in homogeneous initial patterns, the white noise is discontinuous by definition; and for the spike and spike+homogeneous initial patterns, we use sharp spikes defined by the rectangular function, which is discontinuous at the spike boundaries. Therefore, the aforementioned definition should be supplemented with the following two ad hoc assumptions:

      - Homogeneous initial patterns do not comprise any critical point. White noise in this type of initial patterns represents small thermodynamic fluctuations around the steady state and, for the purpose of pattern transformation, this is equivalent to a constant concentration along the domain.

      - Spike and spike+homogeneous initial patterns each contain a single critical point located at the center of the spike. The sharp spikes, modeled using the rectangular function, serve as a theoretical idealization to facilitate mathematical analysis. Once diffusion begins to act, these sharp boundaries are smoothed into differentiable gradients, maintaining a unique critical point at the center of the initial spike, which is the most relevant information for pattern transformation.
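
      For a 1D concentration profile sampled on a grid, the "rearranging" criterion above can be sketched as a comparison of critical-point arrangements. This is an illustrative sketch only: the function names and the `tol` threshold are assumptions, saddle points (which arise only in higher dimensions) are not handled, and setting `tol` above the noise amplitude is one way to encode the assumption that white noise contributes no critical points.

```python
def critical_points(profile, tol=1e-9):
    """Return (index, kind) for interior local maxima ('peak') and minima
    ('valley') of a sampled 1D profile. Slopes smaller than tol in
    magnitude are treated as flat."""
    points = []
    for i in range(1, len(profile) - 1):
        left = profile[i] - profile[i - 1]
        right = profile[i + 1] - profile[i]
        if left > tol and right < -tol:
            points.append((i, "peak"))
        elif left < -tol and right > tol:
            points.append((i, "valley"))
    return points


def is_rearranging(initial, final, tol=1e-9):
    """True if the arrangement of critical points differs between the
    initial and the resulting pattern."""
    return critical_points(initial, tol) != critical_points(final, tol)


# Hypothetical profiles: a smoothed spike, a widened version of it, and
# a pattern with two new peaks flanking a valley.
spike = [0, 0, 1, 3, 1, 0, 0]
widened = [0, 1, 2, 3, 2, 1, 0]
two_peaks = [0, 3, 1, 0, 1, 3, 0]

print(is_rearranging(spike, widened))    # widening keeps the single peak in place
print(is_rearranging(spike, two_peaks))  # new peaks and a valley appear
```

      Under this sketch a simple widening of a peak is not rearranging, because the single critical point stays where it was.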

      Finally, it is worth recalling that our gene network classification is fundamentally based on an analysis of the dispersion relation associated with the gene network, and the construction of this dispersion relation is independent of the spatial dimensionality of the domain (i.e. it does not require assuming any specific number of dimensions). Placing the description of this dispersion relation in the SI may have made the article harder to follow, so it will be moved to the main text in an upcoming version of the article. Thus, the gene networks that can lead to pattern transformation are the same in 1D, 2D or 3D. As for the resulting patterns, the broad description we provide also applies to any number of dimensions; these would be periodic, non-periodic as in the amplified noise patterns, or non-periodic as in the hierarchic networks. For the latter, note that, except for boundary effects that we discuss later, the spike initial condition is radially symmetric and thus the patterns resulting from it will also be radially symmetric. We will make this point more explicit in a revised version of the article, especially since, as suggested, this important portion of the Supplementary Information will be incorporated into the main text.

      Reviewer #2 suggests that, with our definition of non-trivial pattern transformation, the simple widening of a concentration peak would constitute a non-trivial pattern transformation. This is not the case, as already shown in the figures as an example, since in a widening there is no change in the position of the critical point. A different situation applies if a wide and completely flat concentration peak (i.e. a plateau) forms. As we will explain in the coming version, this is not possible because of requirement R5.
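
      To make the dimension-independence concrete: for a linearized reaction-diffusion system, a spatial mode with wavevector k grows at a rate given by the eigenvalues of J − |k|²D, where J is the Jacobian of the reaction terms and D the diagonal matrix of diffusion coefficients. Only |k|² enters, whatever the number of spatial dimensions. The sketch below evaluates this for a two-gene network; the Jacobian and diffusion values are illustrative assumptions, not taken from the manuscript, and the closed-form 2×2 eigenvalue formula is used only to keep the example dependency-free.

```python
import cmath


def growth_rate(jacobian, diffusion, k_squared):
    """Largest real part of the eigenvalues of J - k^2 * D for a two-gene
    network. jacobian is ((a, b), (c, d)); diffusion is (D1, D2);
    k_squared is |k|^2, e.g. kx**2 in 1D or kx**2 + ky**2 in 2D."""
    (a, b), (c, d) = jacobian
    d1, d2 = diffusion
    a_k = a - k_squared * d1
    d_k = d - k_squared * d2
    trace = a_k + d_k
    det = a_k * d_k - b * c
    disc = cmath.sqrt(trace * trace - 4.0 * det)
    return max(((trace + disc) / 2).real, ((trace - disc) / 2).real)


# Hypothetical activator-inhibitor network with a fast-diffusing inhibitor.
J = ((1.0, -1.0), (2.0, -1.5))
D = (1.0, 20.0)

print(growth_rate(J, D, 0.0))     # homogeneous mode
print(growth_rate(J, D, 0.4625))  # intermediate wavenumber
```

      With these assumed values the homogeneous mode decays while an intermediate band of wavenumbers grows, and the same curve applies in 1D, 2D or 3D because the function depends on the wavevector only through its squared magnitude.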

      We think that this clarification of the definition of non-trivial pattern transformation will also help clarify the next point (B below) since it would make it clearer that this article does not intend to explain which specific resulting pattern would arise from any given gene network.

      B. The main concern among these relates to the validity of our linearization of the model equations and the extension of the results obtained for the linear system to the fully nonlinear system. In this regard, the reviewers’ comments are:

      Reviewer #1:

      (on linearization):

      (2) A central step in the model formulation is the linearisation of the reaction term around a homogeneous steady state; higher-order kinetics, including ubiquitous bimolecular sinks such as A + B → AB, are simply collapsed into the Jacobian without any stated amplitude bound on the perturbations. Because the manuscript never analyses how far this assumption can be relaxed, the robustness of the three-class taxonomy under realistic nonlinear reactions or large spike amplitudes remains uncertain.

      Reviewer #2:

      (on linearization):

      (2) Most of the proofs presented in the Supplementary Information rely on linearized versions of the governing equations, and it remains unclear how these results extend to the fully nonlinear system. We are concerned that the generality of the conclusions drawn from the linear analysis may be overstated in the main text. For example, in Section S3, the authors introduce the concept of dynamic equivalence of transitive chains (Proposition S3.1) and intracellular transitive M-branching (Proposition S3.2), which pertains to the system's steady-state behavior. However, the proof is based solely on the linearized equations, without additional justification for why the result should hold in the presence of nonlinearities. Moreover, the linearized system is used to analyze the response to a "spike initial pattern of arbitrary height C" (SI Chapter S5.1), yet it is not clear how conclusions derived from the linear regime can be valid for large perturbations, where nonlinear effects are expected to play a significant role. We encourage the authors to clarify the assumptions under which the linearized analysis remains valid and to discuss the potential limitations of applying these results to the nonlinear regime.

      In this article, we address two main questions: first, which gene network topologies can give rise to non-trivial pattern transformations; and second, which broad types of resulting patterns these gene network topologies can give rise to. Thus, we do not intend to explain which exact resulting patterns would arise from any given gene network (i.e. a gene network topology with specific functions and interaction strengths or weights), a question for which non-linearities do indeed matter.

      For most known gene regulatory networks, available empirical information is typically limited to the nature of gene product regulations -indicating whether they act as activators or inhibitors- while details about the specific functional form of these regulations are rare. For instance, given two gene products, i and j, the network may indicate that i acts as an activator of j, implying that the concentration of j increases with that of i. However, this increase could follow a variety of functional forms: it may be quadratic (e.g., f_j ∝ g_i^2), cubic (e.g., f_j ∝ g_i^3), or any other function f_j(g_i). As we explain in the description of our model, we restrict our study to functions with a monotonicity constraint: higher concentrations of i lead to increased production of j (i.e., ∂f_j/∂g_i > 0). In other words, a given gene interaction is always inhibitory or always activatory; it does not change sign. This monotonicity constraint corresponds to requirement (R5) in our main text. This requirement is based on the biologically plausible idea that the complexity of gene regulation in development stems more from the topology of gene networks than from the complexity of the regulation by which a gene product may regulate another (i.e. we use simple monotonic functions).

      Question 1: A critical element for understanding question 1 is the dispersion relation, which was explained in the SI. From the reviewers’ comments it is clear that having this crucial part in the main text of an upcoming version of the article would improve understandability, especially for question 1.

      In brief, any pattern transformation requires the initial pattern to change. The trigger of such change is a change in the concentration of some gene product, either conceptualized as a noise fluctuation (in the homogeneous initial pattern) or a regulated change in a specific point (in the spike initial pattern). Mathematically, both can be conceptualized as perturbations and, for pattern transformation to be possible, such perturbation should grow so that the initial pattern becomes unstable and can change to another resulting pattern.

      If the perturbation is small, one can use the standard linear perturbation analysis in S6.2 of our Supplementary Information. In other words, the linear analysis is enough to ascertain whether a small perturbation would grow or not. A gene network in which this does not happen would be unable to lead to pattern transformation, whatever the nonlinear part of f(g). In that sense, the linear approximation provides a necessary condition that any gene network needs to fulfill to lead to pattern transformation.
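
      As a minimal sketch of this necessary condition (not the analysis code of the article), the growth rate of a small perturbation with wavenumber k can be read off the eigenvalues of J - k^2 D, where J is the Jacobian of the network at the steady state and D holds the diffusion coefficients. The two-gene matrix J and the diffusion values below are hypothetical numbers chosen only for illustration.

```python
import cmath

def growth_rate(J, D, k):
    """Largest real part of the eigenvalues of J - k^2 * diag(D) (2-gene network)."""
    a = J[0][0] - D[0] * k * k
    b = J[0][1]
    c = J[1][0]
    d = J[1][1] - D[1] * k * k
    trace, det = a + d, a * d - b * c
    disc = cmath.sqrt(trace * trace - 4 * det)
    return max(((trace + disc) / 2).real, ((trace - disc) / 2).real)

def can_amplify(J, D, wavenumbers):
    """True if at least one sampled wavenumber has a positive linear growth rate."""
    return any(growth_rate(J, D, k) > 0 for k in wavenumbers)

# Hypothetical activator-inhibitor network: gene 0 activates itself and gene 1,
# gene 1 inhibits gene 0 and decays.
J = [[1.0, -2.0], [3.0, -4.0]]
wavenumbers = [0.1 * m for m in range(51)]

print(can_amplify(J, [1.0, 40.0], wavenumbers))  # inhibitor diffuses much faster
print(can_amplify(J, [1.0, 1.0], wavenumbers))   # equal diffusion coefficients
```

      If no wavenumber has a positive growth rate, small perturbations decay and the initial pattern cannot change, whatever the nonlinear terms are; a positive growth rate only shows that the necessary condition is met.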

      However, the linear analysis would not ascertain whether a specific gene network will actually lead to pattern transformation (i.e., the condition is not sufficient). This, as well as the shape of the specific resulting pattern, may actually depend on the non-linear parts too. As we discuss, based on the dispersion relation and other complementary arguments throughout the article, we can also get some insights into the possible patterns from the linear approximation alone (question 2). These arguments hold thanks to the imposition of requirements (R1-R5) on the function f(g), which prevent strange behaviors stemming from the nonlinear part of the equation.

      The amplitude bound of perturbations mentioned by Reviewer #1 is addressed by requirements (R2) and (R4). Although the solution to the linear system predicts unbounded growth of unstable eigenmodes, we assume functions f(g) in which the nonlinear terms eventually halt this growth, thereby ensuring the boundedness of solutions as imposed by (R4). This assumption on the nonlinear part is literally requirement R2 on f(g) in the main text.

      The transitive chains and branchings in section S3 of the Supplementary Information mentioned by Reviewer #2 are topological properties of gene networks and therefore they influence only the linear part of the reaction-diffusion equations. This is why the proofs in that section are based on the linearized equations. We agree that clarifying this point in the text, as suggested by the reviewer, would improve the reader’s understanding of the section.

      Regarding Reviewer #2’s concerns about large perturbations, we acknowledge that the phrasing using “arbitrary height” may be confusing. For the homogeneous initial conditions these perturbations are assumed to be small because they are actually molecular noise (otherwise the initial condition could not be considered homogeneous in the classical sense of developmental biology models). In the spike initial conditions in hierarchic networks the perturbation is not necessarily small. For the analysis provided in the SI we indeed assume that the perturbations are small enough for the linear approximation to be possible. Notice, however, that since these networks require an intracellular self-activating loop upstream of the first extracellular signal, the effective perturbation would rapidly grow to a value determined by such a loop.

      In general, the height of the initial spike does not affect the fact that hierarchic networks can lead to non-trivial pattern transformation. By definition these networks require the secretion of an extracellular signal from the cells in the spike (otherwise no change in gene product concentrations can occur over space). By definition this signal is not produced by any other cells and, thus, its concentration is governed by diffusion from the spike and its production in the cells in the spike. Thus, whatever the initial height of the spike and whatever the non-linearities in f(g), the signal’s concentration would decrease with the distance from the spike. As explained in the main text, this would lead to non-trivial pattern transformations if other general conditions are met. In general, the height of the initial perturbation can affect which specific pattern transformation would arise from a specific gene network but not which gene network topologies can lead to pattern transformation. This will be more clearly stated in an upcoming version of the article.

      C. In the following, we respond to the remaining concerns raised by the reviewers:

      Reviewer #1:

      (1) The Results section is difficult to follow. Key logical steps and network configurations are described shortly in prose, which constantly require the reader to address either SI or other parts of the text (see numerous links on the requirements R1-R5 listed at the beginning of the paper) to gain minimal understanding. As a result, a scientifically literate but non-specialist reader may struggle to grasp the argument with a reasonable time invested.

      We acknowledge that the current version of the main text may not be as clear as we intended. Initially, we believed that placing the more technical mathematical passages in the Supplementary Information would make the main text more accessible to readers. However, we agree with the reviewer that including some of these computations in the main text could improve clarity. We also believe that adding a summary table outlining all the model’s requirements would further contribute to that goal.

      Reviewer #2:

      (1) We have serious concerns regarding the validity of the simulation results presented in the manuscript. Rather than simulating the full nonlinear system described by Equation (1), the authors base their results on a truncated expansion (Equation S.8.2) that captures only the time evolution of small deviations around a spatially homogeneous steady state. However, it remains unclear how this reduced system is derived from the full equations -specifically, which terms are retained or neglected and why- and how the expansion of the nonlinear function can be steady-state independent, as claimed. Additionally, in simulations involving the spike plus homogeneous initial condition, it is not evident -or, where equations are provided, it is not correct- that the assumed global homogeneous background actually corresponds to a steady state of the full dynamics. We elaborate on these concerns in the following:

      We believe there has been a misunderstanding regarding the presentation of the model equations (S8.2) used throughout our simulations. Accordingly, we agree that this relevant section of the Supplementary Information should be rewritten in a revised version of the manuscript to clarify this issue. Below, we address all the concerns raised by the reviewer.

      Equation (S8.2) represents the full nonlinear system described in Equation (1). While we recognize that the model may oversimplify real biological processes, its purpose is to illustrate our general statements about pattern formation rather than to capture any specific or detailed mechanism. In this context, model (S8.2) offers three key advantages for our goals: it allows rapid manipulation of gene network topology simply by modifying the matrix J, making it ideal for illustrating pattern formation across different network classes; it accommodates gene networks of arbitrary size -unlike other models, such as the classical Gierer-Meinhardt model, which are limited to two-element Turing or noise-amplifying networks-; and, due to the simplicity of its nonlinear terms, this model involves relatively few free parameters, facilitating the fine-tuning needed to identify parameter regions where non-trivial pattern transformations occur.

      Indeed, we find that the ability of model (S8.2) to illustrate our results despite having such simple nonlinear terms -bearing in mind that at least some nonlinearity is always necessary for self-organization- strongly supports the claim that the capacity of a gene network to produce pattern transformations is fully determined by the linear part of Equation (1). In this sense, nonlinear terms primarily influence the precise parameter values at which these transformations occur and contribute to shaping specific features of the resulting patterns.

      Model (S8.2) has been successfully employed in pattern formation studies elsewhere in the literature; accordingly, we provide relevant bibliographic references to support its widespread use.

      We believe the misunderstanding arises from our explanation of the biological interpretation of the model. As noted in the accompanying bibliography, the model is based on a general reaction-diffusion mechanism assuming the existence of a steady state. However, this conceptual reaction-diffusion framework is not the same as our Equation (1); rather, it was introduced by the original proponents of the model in the seminal paper cited in our text. In this context, Equation (S8.2) describes small concentration perturbations around that steady state, where the variables represent deviations in concentration relative to the general steady state.

      The aforementioned general steady state corresponds to the trivial equilibrium point g≡0 in equations (S8.2). Consequently, all our simulations based on model (S8.2) start from this steady state, to which we add white noise to generate homogeneous initial patterns or a sharp spike for the two types of spike initial patterns.

      It is also worth noting that Equations (S8.2) represent a non-dimensional model.
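
      To make the simulation scheme described above concrete, the following is a minimal sketch only, not the code used for the figures. It assumes the form dg_i/dt = sum_j J_ij g_j - g_i^3 + D_i d^2g_i/dx^2 (linear terms plus a cubic degradation term with unit coefficient), a periodic 1D line of cells and explicit Euler time stepping; the matrix J, the diffusion coefficients and all numerical settings are hypothetical.

```python
import random

def simulate(J, D, n_cells=32, dt=0.005, steps=6000, noise=0.01, seed=0):
    """Integrate the assumed model from g = 0 plus white noise on a periodic line."""
    rng = random.Random(seed)
    n_genes = len(J)
    # Homogeneous initial pattern: trivial steady state g = 0 plus small noise.
    g = [[rng.uniform(-noise, noise) for _ in range(n_cells)] for _ in range(n_genes)]
    for _ in range(steps):
        new_g = []
        for i in range(n_genes):
            row = []
            for x in range(n_cells):
                # Discrete Laplacian with unit spacing; index -1 wraps around.
                laplacian = g[i][x - 1] - 2.0 * g[i][x] + g[i][(x + 1) % n_cells]
                linear = sum(J[i][j] * g[j][x] for j in range(n_genes))
                reaction = linear - g[i][x] ** 3  # cubic term halts the growth
                row.append(g[i][x] + dt * (reaction + D[i] * laplacian))
            new_g.append(row)
        g = new_g
    return g

J = [[1.0, -2.0], [3.0, -4.0]]  # hypothetical activator-inhibitor topology
D = [1.0, 40.0]                 # hypothetical diffusion coefficients
pattern = simulate(J, D)
peak = max(abs(value) for value in pattern[0])
print(peak)  # amplitude of gene product 0 relative to the steady state
```

      Changing the entries of J is all that is needed to try a different network topology in this sketch, which is the practical advantage of this kind of model mentioned above.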

      It is assumed that the homogeneous steady states are given by g_i=0 and g_i=c_i, where 1/c_i = \mu_i or \hat{\mu}_i, independently of the specific network structure. However, the basis for this assumption is unclear, especially since some of the functions do not satisfy this condition -for example, f5 as defined below Eq. S8.10.5. Moreover, if g_i=c_i does not correspond to a true steady state, then the time evolution of deviations from this state is not correctly described by Eq. S8.2, as the zeroth-order terms do not vanish in that case.

      From the explanations above, it is important to distinguish two scales in the process: the scale of small perturbations, where equations (S8.2) apply; and the global scale, where the conceptual general reaction-diffusion system operates. Since the specific form of this general system does not affect equations (S8.2), we assume that it follows any of the models cited in the text, which yield a non-zero steady state.

      In this sense, Equation (S8.2) represents a small concentration deviation of such a global system, and g(t,x) is a relative concentration where g≡0 represents that steady state, g>0 are concentrations above it, and g<0 are concentrations below it.

      As previously mentioned, simulations are performed using Equations (S8.2) on the basis of the equilibrium point g≡0. The result of these simulations is then superimposed on the non-zero steady state and presented in the figures throughout the article.

      Using the full model instead of the simplified Equations (S8.2) may result in slightly different resulting patterns, but it does not affect the gene network’s ability to produce pattern transformations, nor does it alter the main structural properties of the patterns—for example, the periodic nature of patterns generated by Turing networks.

      Additionally, the equations used contain only linear terms and a cubic degradation term for each species g_i, while neglecting all quadratic terms and cubic terms involving cross-species interactions (i≠j). An explanation for this selective truncation is not provided, and without knowledge of the full equation (f), it is impossible to assess whether this expansion is mathematically justified. If, as suggested in the Supplementary Information, the linear and cubic terms are derived from f, then at the very least, the Jacobian matrix should depend on the background steady-state concentration. However, the equations for the small deviation around a steady state (including the Jacobian matrix) used in the simulations appear to be independent of the particular steady state concentration.

      The Jacobian of Equation (S8.2) is independent of g because g represents a small perturbation around a steady state of a general reaction-diffusion system. Consequently, the matrix J corresponds to the Jacobian of the general system evaluated at that steady state. Evaluating the Jacobian of equations (S8.2) at the equilibrium point g≡0 -which represents the general steady state- recovers the matrix J.
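
      This last statement can be checked numerically. The sketch below assumes a reaction term of the form f_i(g) = sum_j J_ij g_j - g_i^3 (linear terms plus a cubic degradation term with unit coefficient, an assumed stand-in for the reaction part of (S8.2)) and a hypothetical matrix J; it approximates the Jacobian at g≡0 by central finite differences.

```python
def reaction(J, g):
    """Assumed reaction term: linear part J*g plus a cubic degradation term."""
    n = len(g)
    return [sum(J[i][j] * g[j] for j in range(n)) - g[i] ** 3 for i in range(n)]

def numerical_jacobian(J, g, h=1e-4):
    """Central finite-difference approximation of d f_i / d g_j at the point g."""
    n = len(g)
    jac = [[0.0] * n for _ in range(n)]
    for j in range(n):
        plus, minus = list(g), list(g)
        plus[j] += h
        minus[j] -= h
        f_plus, f_minus = reaction(J, plus), reaction(J, minus)
        for i in range(n):
            jac[i][j] = (f_plus[i] - f_minus[i]) / (2.0 * h)
    return jac

J = [[1.0, -2.0], [3.0, -4.0]]  # hypothetical network
jac_at_zero = numerical_jacobian(J, [0.0, 0.0])
print(jac_at_zero)  # agrees with J up to the finite-difference error
```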

      This is why we believe that the differences observed between the spike-only initial condition and the spike superimposed on a homogeneous background are not due to the initial conditions themselves, but rather result from a modified reaction scheme introduced through a questionable cutoff.

      "In simulations with spike initial patterns, the reference value g≡0 represents an actual concentration of 0 and therefore, we must add to (S8.2) a Heaviside function Φ acting of f (i.e., Φ(f(g))=f(g) if f(g)>0 , Φ(f(g))=0 if f(g)≤0 ) to prevent the existence of negative concentrations for any gene product (i.e., g_i<0 for some i )." (SI chapter S8).

      This cutoff alters the dynamics (no inhibition) and introduces a different reaction scheme between the two simulations. The need for this correction may itself reflect either a problem in the original equations (which should fulfill the necessary conditions and prevent negative concentrations (R4 in main text)) or the inappropriateness of using an expanded approximation which assumes independence on the steady state concentration. It is already questionable if the linearized equations with a cubic degradation term are valid for the spike initial conditions (with different background concentration values), as the amplitude of this perturbation seems rather large.

      For homogeneous and spike+homogeneous initial conditions, we interpret equations (S8.2) as small perturbations around a non-zero steady state of a general reaction-diffusion system. For spike-only initial conditions, that steady state is zero. As mentioned before, g≡0 then represents that steady state of zero concentration, g>0 are positive concentrations of the general system, and g<0 would represent unfeasible negative concentrations of the general system. Therefore, the use of a cutoff function to handle such initial conditions is justified. Moreover, this cutoff function is the same as the one employed in the reference general system cited in our paper.
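
      The effect of the cutoff can be illustrated with a minimal sketch, assuming the componentwise definition quoted above (Φ(f)=f if f>0 and Φ(f)=0 otherwise) and an assumed reaction term f_i(g) = sum_j J_ij g_j - g_i^3. The matrix J, the state g and the step size are hypothetical, and diffusion is left out to keep the example short.

```python
def reaction(J, g):
    """Assumed reaction term: linear part J*g plus a cubic degradation term."""
    n = len(g)
    return [sum(J[i][j] * g[j] for j in range(n)) - g[i] ** 3 for i in range(n)]

def cutoff(rates):
    """Phi applied componentwise: keep positive rates, replace the rest by 0."""
    return [rate if rate > 0 else 0.0 for rate in rates]

def euler_step(J, g, dt, use_cutoff):
    """One explicit Euler step of the reaction part, with or without the cutoff."""
    rates = reaction(J, g)
    if use_cutoff:
        rates = cutoff(rates)
    return [g_i + dt * rate for g_i, rate in zip(g, rates)]

J = [[1.0, -2.0], [3.0, -4.0]]  # hypothetical network: gene 1 inhibits gene 0
g = [0.0, 1.0]                  # gene 0 absent, gene 1 present

without = euler_step(J, g, dt=0.1, use_cutoff=False)
with_phi = euler_step(J, g, dt=0.1, use_cutoff=True)
print(without)   # gene 0 is pushed below zero concentration
print(with_phi)  # gene 0 stays at zero concentration
```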

      We acknowledge that the cutoff influences the simulations and accounts for the differences observed between spike and spike+homogeneous initial conditions. However, this distinction reflects what occurs in real biological systems, which is precisely why we differentiate these two types of initial states. For instance, the emergence of a periodic pattern in a noise-amplifying network depends critically on the formation of regions with concentrations below the steady state near the initial spike. Such regions can form in spike-plus-homogeneous initial patterns but not in spike-only initial patterns, where concentrations below the steady state would correspond to biologically unfeasible negative values.

      Lastly, we note that under the current simulation scheme, it is not possible to meaningfully assess criteria RH2a and RH2b, as they rely on nonlinear interactions that are absent from the implemented dynamics.

      It is explicitly stated in the relevant subsections of Section S7 in the Supplementary Information that, for the simulations involving RH2a and RH2b, the function f(g) in equation (S8.2) is modified by adding an ad hoc quadratic term to enable the assessment of these criteria.

      (3) Several statements in the main text are presented without accompanying proof or sufficient explanation, which makes it difficult to assess their validity. In some cases, the lack of justification raises serious doubts about whether the claims are generally true. Examples are:

      "For the purpose of clarity we will explain our results as if these cells have a simple arrangement in space (e.g., a 1D line or a 2D square lattice) but, as we will discuss, our results shall apply with the same logic to any distribution of cells in space." (Main text l.145-l.148).

      We believe that the confusion in this statement arises from the ambiguous use of the phrase “our results”. We will revise the text to provide a more precise description. Specifically, by “our results,” we refer to the conclusion that it is possible to determine whether a gene network leads to non-trivial pattern transformations based solely on its topology. This conclusion is independent of the dimensionality of space, as none of our arguments rely on assumptions specific to spatial dimensions. While one-dimensional examples are used for clarity and illustration, the underlying reasoning applies generally. In an improved version of the article, we will clarify this point explicitly and move relevant arguments from the Supplementary Information into the main text.

      Critically, our classification of gene networks is ultimately based on an argument concerning the dispersion relation associated with the network, and the construction of this dispersion relation is independent of the spatial dimensionality of the domain. In this sense, the networks identified in the text as capable of producing pattern transformations will be able to generate non-trivial pattern transformations in any spatial domain and in any number of dimensions. While the specific parameter values that permit such transformations may vary depending on the geometry and dimensionality of the domain, the existence of at least one such parameter set remains unaffected.
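
      The dimension independence can be seen in a short sketch: for a plane-wave perturbation with wavevector k, the Laplacian contributes -|k|^2 in any number of dimensions, so the linear growth rate depends only on the length of k. The two-gene matrix J and the diffusion coefficients D below are hypothetical.

```python
import cmath
import math

def growth_rate(J, D, wavevector):
    """Largest real part of the eigenvalues of J - |k|^2 * diag(D) (2-gene network)."""
    q = sum(component * component for component in wavevector)  # |k|^2
    a = J[0][0] - D[0] * q
    b = J[0][1]
    c = J[1][0]
    d = J[1][1] - D[1] * q
    trace, det = a + d, a * d - b * c
    disc = cmath.sqrt(trace * trace - 4 * det)
    return max(((trace + disc) / 2).real, ((trace - disc) / 2).real)

J = [[1.0, -2.0], [3.0, -4.0]]
D = [1.0, 40.0]
k = 0.7  # same wavevector length in 1D, 2D and 3D

rate_1d = growth_rate(J, D, (k,))
rate_2d = growth_rate(J, D, (k * math.cos(1.0), k * math.sin(1.0)))
rate_3d = growth_rate(J, D, (k / math.sqrt(3),) * 3)
print(rate_1d, rate_2d, rate_3d)  # equal up to rounding
```

      On a bounded domain the set of admissible wavevectors does depend on geometry and dimensionality, which is why the parameter values that permit a transformation may vary while the dispersion relation itself does not.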

      The geometry of the domain can influence the specific form of the resulting patterns, but it does not alter the broader class of patterns (e.g., periodic patterns, peaks emerging around a spike, etc.) that a given gene network topology can produce. One such geometric influence, commonly observed in simulations, involves boundary effects. For example, structures such as peaks or rings forming near the boundaries may appear higher, broader, or spatially shifted compared to those arising in the central regions of the domain. However, we think a pattern consisting of a periodic train of peaks where only those near the boundary are slightly different can still be classified as a periodic pattern.

      "For any non-trivial pattern transformation (as long as it is symmetric around the initial spike), there exists an H gene network capable of producing it from a spike initial pattern." (Main text l.366f).

      A justification for this statement is provided shortly after the claim, although we acknowledge that the current explanation is somewhat cumbersome and would benefit from a clearer presentation in a revised version of the main text.

      A more detailed justification is provided in the Supplementary Information, based on three key ideas. First, any pattern (provided it is symmetric with respect to the initial spike) can be described as an arrangement of peaks with varying heights and spatial positions along a one-dimensional domain. Second, there exists a simple gene network—the diamond network—that, through parameter tuning, can produce two peaks of arbitrary height and symmetric position relative to the initial spike. Third, by placing multiple diamond networks positively upstream of a common gene product, that gene product can express peaks at each location where the upstream diamond networks induce them. Under mild additional conditions, this mechanism allows the formation of essentially any symmetric pattern. These mild conditions, along with a detailed analysis of the diamond network’s ability to generate peaks with controllable height and position, are discussed in the Supplementary Information.

      "In 2D there are no peaks but concentric rings of high gene product concentration centered around the spike, while in 3D there are concentric spherical shells." (Main text l. 447ff).

      This result pertains specifically to pattern transformations arising from spike initial patterns. As defined in the text, spike initial patterns are radially symmetric. Since diffusion preserves radial symmetry, pattern transformations from spike initial patterns in two or three dimensions reduce to effectively one-dimensional transformations along each radial direction. In this framework, each pair of concentration peaks symmetric with respect to the spike in one dimension corresponds to a ring surrounding the spike in two dimensions, and each ring in two dimensions becomes a hollow spherical shell around the spike in three dimensions.

      We agree that including a brief section in the Supplementary Information to clarify these subtleties would be helpful for readers to better understand the generalization of certain patterns to higher dimensions.

      (4) The study identifies one-signal networks and examines how combinations of these structures can give rise to minimal pattern-forming subnetworks. However, the analysis of the combinations of these minimal pattern-forming subnetworks remains relatively brief, and the manuscript does not explore how the results might change if the subnetworks were combined in upstream and downstream configurations. In our view, it is not evident that all possible gene regulatory networks can be fully characterized by these categories, nor that the resulting patterns can be reliably predicted. Rather, the approach appears more suited to identifying which known subnetworks are present within a larger network, without necessarily capturing the full dynamics of more complex configurations.

      We acknowledge that our explanation regarding the combination of sub-networks was relatively brief, and we intend to address this in a revised version. Our argument that combining sub-networks does not produce qualitatively new types of pattern transformations -beyond those already described- is based on the dispersion relation. Although this relation was only detailed in the Supplementary Information, it is central to our argument and will therefore be moved to the main text. Below, we provide an outline of this argument:

      Our study identifies two distinct behaviors of the principal branch of the dispersion relation at large wavenumbers. Based on this, gene networks capable of pattern formation can be classified into two categories: networks of the first kind, where the real part of the principal branch diverges to infinity as the wavenumber increases; and networks of the second kind, where the real part of the principal branch converges to a positive finite value for large wavenumbers. Naturally, this argument applies to any gene network irrespective of which, or how many, sub-networks are used to build it.
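
      The two large-wavenumber behaviors can be illustrated with a minimal two-gene sketch. The matrix J and the diffusion coefficients are hypothetical, and the example makes no claim about which sub-network class each case belongs to. When both gene products diffuse, the principal branch diverges (towards minus infinity in this example, which is our reading of "diverges" here and is an assumption); when the self-activating gene product does not diffuse, the principal branch converges to a positive finite value.

```python
import cmath

def principal_branch(J, D, k):
    """Largest real part of the eigenvalues of J - k^2 * diag(D) (2-gene network)."""
    a = J[0][0] - D[0] * k * k
    b = J[0][1]
    c = J[1][0]
    d = J[1][1] - D[1] * k * k
    trace, det = a + d, a * d - b * c
    disc = cmath.sqrt(trace * trace - 4 * det)
    return max(((trace + disc) / 2).real, ((trace - disc) / 2).real)

J = [[1.0, -2.0], [3.0, -4.0]]

# Both gene products diffuse: the branch keeps decreasing with k.
both_diffuse = [principal_branch(J, [1.0, 40.0], k) for k in (10.0, 100.0)]

# Gene product 0 is intracellular (D = 0) and self-activating (J[0][0] > 0):
# the branch approaches J[0][0] for large k.
one_intracellular = [principal_branch(J, [0.0, 1.0], k) for k in (10.0, 100.0)]

print(both_diffuse)
print(one_intracellular)
```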

      Any gene regulatory network capable of pattern formation falls into one of these two categories. We identified that networks of the first kind contain at least one Turing sub-network, whereas networks of the second kind include either an H sub-network or a noise-amplifying sub-network. In this way, the primary objective of our study -namely, achieving a topological classification of gene regulatory networks capable of pattern formation- is fulfilled. It is important to note that while the dispersion relation provides broad information about the possible resulting patterns a gene network topology can produce (e.g., periodic versus noisy), it does not specify the exact patterns that emerge for each particular set of parameter values.

      Finally, regarding the shape of the resulting patterns, Figure S10 in the Supplementary Information exemplifies the notion that the behavior of combined networks can be understood as a combination of the individual behaviors of each constituent sub-network (note that the contribution of each type of sub-network in the resulting pattern is readily distinguishable). Consequently, we focus our detailed analysis on the patterning properties of the fundamental classes.

      (6) The manuscript lacks a clear and detailed explanation of the underlying model and its assumptions. In particular, it is not well-defined what constitutes a "cell" in the context of the model, nor is it justified why spatial features of cells -such as their size or boundaries- can be neglected. Furthermore, the concept of the extracellular space in the one-dimensional model remains ambiguous, making it unclear which gene products are assumed to diffuse.

      The size of cells is ignored in our model because we assume that they are small enough with respect to the total size of the domain that the space-continuous reaction-diffusion equation (equation (1) in the main text) holds. Conceptually, one could understand the cells in our model as the pieces of an even partition of the domain into small subdomains, each surrounding a position x. This is, in any case, the standard procedure in most models of pattern formation by reaction-diffusion in embryonic development.

      For extracellular signals, we assume that g(t,x) corresponds to the concentration of the signal in the extracellular space surrounding the cell located at position x. The extracellular space is any fluid medium for which Fick's laws apply and, therefore, the Fickian diffusion term in equation (1) is valid.

      For intracellular gene products, we assume that g(t,x) corresponds to the concentration of such a gene product within the cell at position x (if the gene product at hand is a transcription factor, for example), or on its surface (if it is a membrane-bound receptor). When collapsed into the continuous equations there is no such difference between being strictly within the cell or on its boundary. The only important fact is that these gene products cannot diffuse.

      Regarding cell boundaries, let us consider an extracellular signal s that regulates a transcription factor i within cells (in our model, i is an intracellular gene product). Such regulation shall be mediated by a membrane-bound receptor, which corresponds to intracellular gene product j. In terms of the gene regulatory network this is s → j → i. Cell boundary effects mentioned by the reviewer should be encapsulated in the specific functional form of the regulation function f(g), but they have no effect on the actual topology of the network. Consequently, they are out of the scope of this study: as we mentioned before, considering different non-linear terms for f(g) will affect the parameter range for which a gene network is capable of producing non-trivial pattern transformations, but not its overall ability to produce non-trivial pattern transformations (i.e., the existence of at least one choice of model parameters for which such transformations take place).

      Finally, we would like to once again express our sincere gratitude to all reviewers for their insightful and constructive feedback. We are confident that the thorough peer review process will significantly enhance both the clarity and depth of our work. We greatly value the detailed comments provided and will carefully incorporate them in the preparation of a revised manuscript, which we intend to submit in the coming months.

    1. Author Response

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Given knowledge of the amino acid sequence and of some version of the 3D structure of two monomers that are expected to form a complex, the authors investigate whether it is possible to accurately predict which residues will be in contact in the 3D structure of the expected complex. To this effect, they train a deep learning model that takes as inputs the geometric structures of the individual monomers, per-residue features (PSSMs) extracted from MSAs for each monomer, and rich representations of the amino acid sequences computed with the pre-trained protein language models ESM-1b, MSA Transformer, and ESM-IF. Predicting inter-protein contacts in complexes is an important problem. Multimer variants of AlphaFold, such as AlphaFold-Multimer, are the current state of the art for full protein complex structure prediction, and if the three-dimensional structure of a complex can be accurately predicted then the inter-protein contacts can also be accurately determined. By contrast, the method presented here seeks state-of-the-art performance among models that have been trained end-to-end for inter-protein contact prediction.

      Strengths:

      The paper is carefully written and the method is very well detailed. The model works both for homodimers and heterodimers. The ablation studies convincingly demonstrate that the chosen model architecture is appropriate for the task. Various comparisons suggest that PLMGraph-Inter performs substantially better, given the same input, than DeepHomo, GLINTER, CDPred, DeepHomo2, and DRN-1D2D_Inter. As a byproduct of the analysis, a potentially useful heuristic criterion for acceptable contact prediction quality is found by the authors: namely, to have at least 50% precision in the prediction of the top 50 contacts.

      We thank the reviewer for recognizing the strengths of our work!
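
      The top-50 precision criterion mentioned by the reviewer can be illustrated with a minimal sketch. It assumes predictions arrive as a dictionary from residue-pair indices to contact probabilities; that input format and the function names are hypothetical and are not taken from PLMGraph-Inter.

```python
def top_k_precision(scores, true_contacts, k=50):
    """Fraction of the k highest-scoring residue pairs that are true contacts.

    scores: dict mapping (i, j) inter-protein residue pairs to a predicted
            contact probability (hypothetical input format).
    true_contacts: set of (i, j) pairs that are in contact in the native complex.
    """
    # Rank pairs by predicted probability, highest first, and keep the top k.
    ranked = sorted(scores, key=scores.get, reverse=True)[:k]
    if not ranked:
        return 0.0
    hits = sum(1 for pair in ranked if pair in true_contacts)
    # Divide by the number of pairs actually ranked (fewer than k for tiny inputs).
    return hits / len(ranked)


def is_acceptable(scores, true_contacts, k=50, threshold=0.5):
    """Heuristic from the review: at least 50% precision among the top 50 contacts."""
    return top_k_precision(scores, true_contacts, k) >= threshold
```

      With k = 50 and threshold = 0.5, a prediction passes when at least 25 of its 50 highest-scoring pairs are native contacts.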

      Weaknesses:

      My biggest issue with this work is the evaluations made using bound monomer structures as inputs, coming from the very complexes to be predicted. Conformational changes in protein-protein association are the key element of the binding mechanism and are challenging to predict. While the GLINTER paper (Xie & Xu, 2022) is guilty of the same sin, the authors of CDPred (Guo et al., 2022) correctly only report test results obtained using predicted unbound tertiary structures as inputs to their model. Test results using experimental monomer structures in bound states can hide important limitations in the model, and thus say very little about the realistic use cases in which only the unbound structures (experimental or predicted) are available. I therefore strongly suggest reducing the importance given to the results obtained using bound structures and emphasizing instead those obtained using predicted monomer structures as inputs.

      We thank the reviewer for the suggestion! We evaluated PLMGraph-Inter with the predicted monomers and analyzed the results in detail (see the “Impact of the monomeric structure quality on contact prediction” section and Figure 3). To mimic real cases, we even deliberately reduced the performance of AF2 by using reduced MSAs (see the 2nd paragraph in the “Impact of the monomeric structure quality on contact prediction” section). Some of these results are currently in the supplementary material of the manuscript (Table S2). We will move them to the main text to emphasize the performance of PLMGraph-Inter with the predicted monomers in the revision.

      In particular, the most relevant comparison with AlphaFold-Multimer (AFM) is given in Figure S2, not Figure 6. Unfortunately, it substantially shrinks the proportion of structures for which AFM fails while PLMGraph-Inter performs decently. Still, it would be interesting to investigate why this occurs. One possibility would be that the predicted monomer structures are of bad quality there, and PLMGraph-Inter may be able to rely on a signal from its language model features instead. Finally, AFM multimer confidence values ("iptm + ptm") should be provided, especially in the cases in which AFM struggles.

      We thank the reviewer for the suggestion! Yes! The performance of PLMGraph-Inter drops when the predicted monomers are used in the prediction. However, it is difficult to say which is a fairer comparison, Figure 6 or Figure S2, since AFM also searched monomer templates (see the third paragraph in 7. Supplementary Information : 7.1 Data in the AlphaFold-Multimer preprint: https://www.biorxiv.org/content/10.1101/2021.10.04.463034v2.full) in the prediction. When we checked our AFM runs, we found that 99% of the targets in our study (including all the targets in the four datasets: HomoPDB, HeteroPDB, DHTest and DB5.5) employed at least 20 templates in their predictions, and 87.8% of the targets employed the native templates. We will provide the AFM confidence values of the AFM predictions in the revision.

      Besides, in cases where any experimental structures - bound or unbound - are available and given to PLMGraph-Inter as inputs, they should also be provided to AlphaFold-Multimer (AFM) as templates. Withholding these from AFM only makes the comparison artificially unfair. Hence, a new test should be run using AFM templates, and a new version of Figure 6 should be produced. Additionally, AFM's mean precision, at least for top-50 contact prediction, should be reported so it can be compared with PLMGraph-Inter's.

      We thank the reviewer for the suggestion! We would like to note that AFM also searched monomer templates (see the third paragraph in 7. Supplementary Information : 7.1 Data in the AlphaFold-Multimer preprint: https://www.biorxiv.org/content/10.1101/2021.10.04.463034v2.full) in the prediction. When we checked our AFM runs, we found that 99% of the targets in our study (including all the targets in the four datasets: HomoPDB, HeteroPDB, DHTest and DB5.5) employed at least 20 templates in their predictions, and 87.8% of the targets employed the native template.

      It's a shame that many of the structures used in the comparison with AFM are actually in the AFM v2 training set. If there are any outside the AFM v2 training set and, ideally, not sequence- or structure-homologous to anything in the AFM v2 training set, they should be discussed and reported on separately. In addition, why not test on structures from the "Benchmark 2" or "Recent-PDB-Multimers" datasets used in the AFM paper?

      We thank the reviewer for the suggestion! The biggest challenge in objectively evaluating AFM is that, as far as we know, AFM does not release the PDB ids of its training set or of the “Recent-PDB-Multimers” dataset. “Benchmark 2” only includes 17 heterodimer proteins, and the number can be further decreased after removing targets redundant to our training set. We think it is difficult to draw conclusions from such a small number of targets. In the revision, we will analyze the performance of AFM on targets released after the date cutoff of the AFM training set, although for these targets we cannot totally remove the redundancy between the training and the test sets of AFM.

      It is also worth noting that the AFM v2 weights have now been outdated for a while, and better v3 weights now exist, with a training cutoff of 2021-09-30.

      We thank the reviewer for reminding us of the new version of AFM. The only difference between AFM V3 and V2 is the cutoff date of the training set. Our test set would have more overlaps with the training set of AFM V3, which is one reason that we think AFM V2 is more appropriate to be used in the comparison.

      Another weakness in the evaluation framework: because PLMGraph-Inter uses structural inputs, it is not sufficient to make its test set non-redundant in sequence to its training set. It must also be non-redundant in structure. The Benchmark 2 dataset mentioned above is an example of a test set constructed by removing structures with homologous templates in the AF2 training set. Something similar should be done here.

      We agree with the reviewer that testing whether the model can keep its performance on targets with no templates (i.e. non-redundant in structure) is important. We will perform the analysis in the revision.

      Finally, the performance of DRN-1D2D for top-50 precision reported in Table 1 suggests to me that, in an ablation study, language model features alone would yield better performance than geometric features alone. So, I am puzzled why model "a" in the ablation is a "geometry-only" model and not a "LM-only" one.

      Using the protein geometric graph to integrate multiple protein language models is the main idea of PLMGraph-Inter. Compared with our previous work (DRN-1D2D_Inter), we consider the building of the geometric graph as one major contribution of this work. To emphasize the efficacy of this geometric graph, we chose to use the “geometry-only” model as the base model. We will further clarify this in the revision.

      Reviewer #2 (Public Review):

      This work introduces PLMGraph-Inter, a new deep-learning approach for predicting inter-protein contacts, which is crucial for understanding protein-protein interactions. Despite advancements in this field, especially driven by AlphaFold, prediction accuracy and efficiency (in terms of computational cost) still remain areas for improvement. PLMGraph-Inter utilizes invariant geometric graphs to integrate the features from multiple protein language models into the structural information of each subunit. When compared against other inter-protein contact prediction methods, PLMGraph-Inter shows better performance which indicates that utilizing both sequence embeddings and structural embeddings is important to achieve high-accuracy predictions with relatively smaller computational costs for the model training.

      The conclusions of this paper are mostly well supported by data, but test examples should be revisited with a more strict sequence identity cutoff to avoid any potential information leakage from the training data. The main figures should be improved to make them easier to understand.

      We thank the reviewer for recognizing the significance of our work! We will revise the manuscript carefully to address the reviewer’s concerns.

      1. The sequence identity cutoff to remove redundancies between training and test set was set to 40%, which is a bit high to remove test examples having homology to training examples. For example, CDPred uses a sequence identity cutoff of 30% to strictly remove redundancies between training and test set examples. To make their results more solid, the authors should have curated test examples with lower sequence identity cutoffs, or have provided the performance changes against sequence identities to the closest training examples.

      We thank the reviewer for the valuable suggestion! Using different thresholds to reduce the redundancy between the test set and the training set is a very good suggestion, and we will perform the analysis in the revision. In the current version of the manuscript, 40% sequence identity is used as the cutoff because many previous studies used this cutoff (e.g. the Recent-PDB-Multimers used in AlphaFold-Multimer (see: 7.8 Datasets in the AlphaFold-Multimer paper); the work of DSCRIPT: https://www.cell.com/action/showPdf?pii=S2405-4712%2821%2900333-1 (see: the PPI dataset paragraph in the METHODS DETAILS section of the STAR METHODS)). One reason for using the relatively higher threshold for PPI studies is that PPIs are generally not as conserved as protein monomers.

      We performed a preliminary analysis using different thresholds to remove redundancy when preparing this provisional response letter:

      Author response table 1.

      Table 1. The performance of PLMGraph-Inter on the HomoPDB and HeteroPDB test sets using native structures (AlphaFold2 predicted structures).

      Method:

      To remove redundancy, we clustered 11096 sequences from the training set and test sets (HomoPDB, HeteroPDB) using MMseqs2 with different sequence identity thresholds (40%, 30%, 20%, 10%) (the lowest cutoff for CD-HIT is 40%, so we switched to MMseqs2). Each sequence is then uniquely labeled by the cluster (e.g. cluster 0, cluster 1, …) to which it belongs, from which each PPI can be marked with a pair of clusters (e.g. cluster 0-cluster 1). The PPIs belonging to the same cluster pair (note: cluster n-cluster m and cluster m-cluster n were considered as the same pair) were considered redundant. For each PPI in the test set, if the cluster pair it belongs to contains a PPI belonging to the training set, we remove that PPI from the test set.
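
      The cluster-pair filtering step described above can be sketched as follows. This is a minimal illustration and not the authors' script: it assumes the sequence-to-cluster assignment has already been produced by the clustering tool, and the names cluster_of, cluster_pair and filter_test_set are hypothetical.

```python
def cluster_pair(ppi, cluster_of):
    """Return the unordered pair of cluster labels for one PPI.

    ppi: tuple of two sequence identifiers (hypothetical representation).
    cluster_of: dict mapping each sequence identifier to its cluster label.
    A frozenset makes (n, m) and (m, n) compare equal, and also handles
    the case where both chains fall in the same cluster.
    """
    a, b = ppi
    return frozenset((cluster_of[a], cluster_of[b]))


def filter_test_set(train_ppis, test_ppis, cluster_of):
    """Drop every test PPI whose cluster pair also occurs in the training set."""
    train_pairs = {cluster_pair(p, cluster_of) for p in train_ppis}
    return [p for p in test_ppis if cluster_pair(p, cluster_of) not in train_pairs]
```

      Repeating the filter with cluster assignments obtained at each identity threshold (40%, 30%, 20%, 10%) would give one reduced test set per threshold.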

      We will perform more detailed analyses in the revised manuscript.

      2. Figures with head-to-head comparison scatter plots are hard to understand as scatter plots because too many different methods are abstracted into a single plot with multiple colors. It would be better to provide individual head-to-head scatter plots as supplementary figures, not in the main figure.

      We thank the reviewer for the suggestion! We will include the individual head-to-head scatter plots as supplementary figures in the revision.

      3) The authors claim that PLMGraph-Inter is complementary to AlphaFold-multimer as it shows better precision for the cases where AlphaFold-multimer fails. To strengthen the point, the qualities of predicted complex structures via protein-protein docking with predicted contacts as restraints should have been compared to those of AlphaFold-multimer structures.

      We thank the reviewer for the suggestion! We will add this comparison in the revision.

      4) It would be interesting to further analyze whether there is a difference in prediction performance depending on the depth of multiple sequence alignment or the type of complex (antigen-antibody, enzyme-substrates, single species PPI, multiple species PPI, etc).

      We thank the reviewer for the suggestion! We will perform such analysis in the revision.

    1. Author response:

      eLife Assessment 

      This valuable study investigates how the neural representation of individual finger movements changes during the early period of sequence learning. By combining a new method for extracting features from human magnetoencephalography data and decoding analyses, the authors provide incomplete evidence of an early, swift change in the brain regions correlated with sequence learning, including a set of previously unreported frontal cortical regions. The addition of more control analyses to rule out that head movement artefacts influence the findings, and to further explain the proposal of offline contextualization during short rest periods as the basis for improved performance would strengthen the manuscript. 

      We appreciate the Editorial assessment on our paper’s strengths and novelty.  We have implemented additional control analyses to show that neither task-related eye movements nor increasing overlap of finger movements during learning account for our findings, which are that contextualized neural representations in a network of bilateral frontoparietal brain regions actively contribute to skill learning.  Importantly, we carried out additional analyses showing that contextualization develops predominantly during rest intervals.

      Public Reviews:

      We thank the Reviewers for their comments and suggestions, prompting new analyses and additions that strengthened our report.

      Reviewer #1 (Public review): 

      Summary: 

      This study addresses the issue of rapid skill learning and whether individual sequence elements (here: finger presses) are differentially represented in human MEG data. The authors use a decoding approach to classify individual finger elements and accomplish an accuracy of around 94%. A relevant finding is that the neural representations of individual finger elements dynamically change over the course of learning. This would be highly relevant for any attempts to develop better brain machine interfaces - one now can decode individual elements within a sequence with high precision, but these representations are not static but develop over the course of learning. 

      Strengths: The work follows a large body of work from the same group on the behavioural and neural foundations of sequence learning. The behavioural task is well established and neatly designed to allow for tracking learning and how individual sequence elements contribute. The inclusion of short offline rest periods between learning epochs has been influential because it has revealed that a lot, if not most of the gains in behaviour (ie speed of finger movements) occur in these so-called micro-offline rest periods. The authors use a range of new decoding techniques, and exhaustively interrogate their data in different ways, using different decoding approaches. Regardless of the approach, impressively high decoding accuracies are observed, but when using a hybrid approach that combines the MEG data in different ways, the authors observe decoding accuracies of individual sequence elements from the MEG data of up to 94%. 

      We have previously shown that neural replay of MEG activity representing the practiced skill correlated with micro-offline gains during rest intervals of early learning,1 consistent with the recent report that hippocampal ripples during these offline periods predict human motor sequence learning2.  However, decoding accuracy in our earlier work1 needed improvement.  Here, we report a strategy to improve decoding accuracy that could benefit future studies of neural replay or BCI using MEG.

      Weaknesses: 

      There are a few concerns which the authors may well be able to resolve. These are not weaknesses as such, but factors that would be helpful to address as these concern potential contributions to the results that one would like to rule out. Regarding the decoding results shown in Figure 2 etc, a concern is that within individual frequency bands, the highest accuracy seems to be within frequencies that match the rate of keypresses. This is a general concern when relating movement to brain activity, so is not specific to decoding as done here. As far as reported, there was no specific restraint to the arm or shoulder, and even then it is conceivable that small head movements would correlate highly with the vigor of individual finger movements. This concern is supported by the highest contribution in decoding accuracy being in middle frontal regions - midline structures that would be specifically sensitive to movement artefacts and don't seem to come to mind as key structures for very simple sequential keypress tasks such as this - and the overall pattern is remarkably symmetrical (despite being a unimanual finger task) and spatially broad. This issue may well be matching the time course of learning, as the vigor and speed of finger presses will also influence the degree to which the arm/shoulder and head move. This is not to say that useful information is contained within either of the frequencies or broadband data. But it raises the question of whether a lot is dominated by movement "artefacts" and one may get a more specific answer if removing any such contributions. 

      Reviewer #1 expresses concern that the combination of the low-frequency narrow-band decoder results, and the bilateral middle frontal regions displaying the highest average intra-parcel decoding performance across subjects is suggestive that the decoding results could be driven by head movement or other artefacts.

      Head movement artefacts are highly unlikely to contribute meaningfully to our results for the following reasons. First, in addition to ICA denoising, all “recordings were visually inspected and marked to denoise segments containing other large amplitude artifacts due to movements” (see Methods). Second, the response pad was positioned in a manner that minimized wrist, arm or more proximal body movements during the task. Third, while head position was not monitored online for this study, the head was restrained using an inflatable air bladder, and head position was assessed at the beginning and at the end of each recording. Head movement did not exceed 5 mm between the beginning and end of each scan for all participants included in the study. Fourth, we agree that despite the steps taken above, it is possible that minor head movements could still contribute to some remaining variance in the MEG data in our study. The Reviewer states a concern that “it is conceivable that small head movements would correlate highly with the vigor of individual finger movements”. However, in order for any such correlations to meaningfully impact decoding performance, such head movements would need to: (A) be consistent and pervasive throughout the recording (which might not be the case if the head movements were related to movement vigor and vigor changed over time); and (B) systematically vary between different finger movements, and also between the same finger movement performed at different sequence locations (see 5-class decoding performance in Figure 4B). It is extremely unlikely that any head movement artefacts meet all these conditions.

      Given the task design, a much more likely confound in our estimation would be the contribution of eye movement artefacts to the decoder performance (an issue appropriately raised by Reviewer #3 in the comments below). Remember from Figure 1A in the manuscript that an asterisk marks the current position in the sequence and is updated at each keypress. Since participants make very few performance errors, the position of the asterisk on the display is highly correlated with the keypress being made in the sequence. Thus, it is possible that if participants are attending to the visual feedback provided on the display, they may move their eyes in a way that is systematically related to the task.  Since we did record eye movements simultaneously with the MEG recordings (EyeLink 1000 Plus; Fs = 600 Hz), we were able to perform a control analysis to address this question. For each keypress event during trials in which no errors occurred (which is the same time-point that the asterisk position is updated), we extracted three features related to eye movements: 1) the gaze position at the time of asterisk position update (or keyDown event), 2) the gaze position 150ms later, and 3) the peak velocity of the eye movement between the two positions. We then constructed a classifier from these features with the aim of predicting the location of the asterisk (ordinal positions 1-5) on the display. As shown in the confusion matrix below (Author response image 1), the classifier failed to perform above chance levels (Overall cross-validated accuracy = 0.21817):

      Author response image 1.

      Confusion matrix showing that three eye movement features fail to predict asterisk position on the task display above chance levels (Fold 1 test accuracy = 0.21718; Fold 2 test accuracy = 0.22023; Fold 3 test accuracy = 0.21859; Fold 4 test accuracy = 0.22113; Fold 5 test accuracy = 0.21373; Overall cross-validated accuracy = 0.2181). Since the ordinal position of the asterisk on the display is highly correlated with the ordinal position of individual keypresses in the sequence, this analysis provides strong evidence that keypress decoding performance from MEG features is not explained by systematic relationships between finger movement behavior and eye movements (i.e. – behavioral artefacts).
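
      As a quick arithmetic check on the numbers reported in this caption, the sketch below averages the five fold accuracies and compares the result with the chance level for five ordinal positions. It assumes equally sized folds, so that the overall accuracy is the plain mean of the fold accuracies; this is an assumption and is not stated in the response.

```python
from statistics import mean

# Fold-wise test accuracies as reported in the caption above.
fold_accuracies = [0.21718, 0.22023, 0.21859, 0.22113, 0.21373]

# Five ordinal asterisk positions, so uniform guessing gives 1/5.
n_classes = 5
chance = 1 / n_classes

# Assumes equally sized folds (an assumption, not stated in the source).
overall = mean(fold_accuracies)
margin = overall - chance

print(f"overall = {overall:.5f}, chance = {chance:.2f}, margin = {margin:.5f}")
```

      Under this assumption the mean reproduces the reported overall cross-validated accuracy to five decimal places, and the margin over the 0.2 chance level is below two percentage points.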

      In fact, inspection of the eye position data revealed that a majority of participants on most trials displayed random walk gaze patterns around a center fixation point, indicating that participants did not attend to the asterisk position on the display. This is consistent with intrinsic generation of the action sequence, and congruent with the fact that the display does not provide explicit feedback related to performance. A similar real-world example would be manually inputting a long password into a secure online application. In this case, one intrinsically generates the sequence from memory and receives similar feedback about the password sequence position (also provided as asterisks), which is typically ignored by the user. The minimal participant engagement with the visual task display observed in this study highlights another important point – that the behavior in explicit sequence learning motor tasks is highly generative in nature rather than reactive to stimulus cues as in the serial reaction time task (SRTT).  This is a crucial difference that must be carefully considered when designing investigations and comparing findings across studies.

      We observed that initial keypress decoding accuracy was predominantly driven by contralateral primary sensorimotor cortex in the initial practice trials before transitioning to bilateral frontoparietal regions by trials 11 or 12 as performance gains plateaued.  The contribution of contralateral primary sensorimotor areas to early skill learning has been extensively reported in humans and non-human animals. 1,3-5  Similarly, the increased involvement of bilateral frontal and parietal regions to decoding during early skill learning in the non-dominant hand is well known.  Enhanced bilateral activation in both frontal and parietal cortex during skill learning has been extensively reported6-11, and appears to be even more prominent during early fine motor skill learning in the non-dominant hand12,13.  The frontal regions identified in these studies are known to play crucial roles in executive control14, motor planning15, and working memory6,8,16-18 processes, while the same parietal regions are known to integrate multimodal sensory feedback and support visuomotor transformations6,8,16-18, in addition to working memory19. Thus, it is not surprising that these regions increasingly contribute to decoding as subjects internalize the sequential task.  We now include a statement reflecting these considerations in the revised Discussion.

      A somewhat related point is this: when combining voxel and parcel space, a concern is whether a degree of circularity may have contributed to the improved accuracy of the combined data, because it seems to use the same MEG signals twice - the voxels most contributing are also those contributing most to a parcel being identified as relevant, as parcels reflect the average of voxels within a boundary. In this context, I struggled to understand the explanation given, ie that the improved accuracy of the hybrid model may be due to "lower spatially resolved whole-brain and higher spatially resolved regional activity patterns".

      We strongly disagree with the Reviewer’s assertion that the construction of the hybrid-space decoder is circular. To clarify, the base feature set for the hybrid-space decoder constructed for all participants includes whole-brain spatial patterns of MEG source activity averaged within parcels. As stated in the manuscript, these 148 inter-parcel features reflect “lower spatially resolved whole-brain activity patterns” or global brain dynamics. We then independently test how well spatial patterns of MEG source activity for all voxels distributed within individual parcels can decode keypress actions. Again, the tests of these intra-parcel spatial patterns, intended to capture “higher spatially resolved regional brain activity patterns”, are completely independent of one another and of the weighting of individual inter-parcel features. These intra-parcel features could, for example, provide additional information about muscle activation patterns or the task environment. These approximately 1150 intra-parcel voxels (on average, with the total number varying between subjects) are then combined with the 148 inter-parcel features to construct the final hybrid-space decoder. In fact, this varied spatial filter approach shares some similarities with the construction of convolutional neural networks (CNNs) used to perform object recognition in image classification applications. One could also view this hybrid-space decoding approach as a spatial analogue to common time-frequency based analyses such as theta-gamma phase amplitude coupling (PAC), which combine information from two or more narrow-band spectral features derived from the same time-series data.
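
      The way the two feature sets are combined can be illustrated with a toy sketch. It is not the authors' pipeline: it assumes a single scalar activity value per voxel at one time point, and the names voxel_activity, parcel_of and top_parcels are hypothetical.

```python
from statistics import mean


def parcel_features(voxel_activity, parcel_of):
    """Inter-parcel features: average the voxel activity within each parcel."""
    grouped = {}
    for voxel, value in voxel_activity.items():
        grouped.setdefault(parcel_of[voxel], []).append(value)
    return {parcel: mean(values) for parcel, values in grouped.items()}


def hybrid_features(voxel_activity, parcel_of, top_parcels):
    """Concatenate whole-brain parcel averages with voxel values of selected parcels.

    voxel_activity: dict voxel id -> activity value (toy, single time point).
    parcel_of: dict voxel id -> parcel label.
    top_parcels: set of parcel labels whose individual voxels are kept.
    """
    parcels = parcel_features(voxel_activity, parcel_of)
    # Lower spatial resolution: one averaged value per parcel, whole brain.
    inter = [parcels[p] for p in sorted(parcels)]
    # Higher spatial resolution: individual voxels inside the selected parcels.
    intra = [voxel_activity[v] for v in sorted(voxel_activity)
             if parcel_of[v] in top_parcels]
    return inter + intra
```

      In this toy setting a parcel average is a single number, whereas the voxel values retain the spatial pattern inside the parcel, so the two blocks of the feature vector are not duplicates of each other even when they cover the same region.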

      We directly tested this hypothesis – that spatially overlapping intra- and inter-parcel features portray different information – by constructing an alternative hybrid-space decoder (HybridAlt) that excluded average inter-parcel features which spatially overlapped with intra-parcel voxel features, and comparing the performance to the decoder used in the manuscript (HybridOrig). The prediction was that if the overlapping parcel contained similar information to the more spatially resolved voxel patterns, then removing the parcel features (n=8) from the decoding analysis should not impact performance. In fact, despite making up less than 1% of the overall input feature space, removing those parcels resulted in a significant drop in overall performance greater than 2% (78.15% ± SD 7.03% for HybridOrig vs. 75.49% ± SD 7.17% for HybridAlt; Wilcoxon signed rank test, z = 3.7410, p = 1.8326e-04) (Author response image 2).

      Author response image 2.

      Comparison of decoding performances with two different hybrid approaches. HybridAlt: Intra-parcel voxel-space features of top ranked parcels and inter-parcel features of remaining parcels. HybridOrig:  Voxel-space features of top ranked parcels and whole-brain parcel-space features (i.e. – the version used in the manuscript). Dots represent decoding accuracy for individual subjects. Dashed lines indicate the trend in performance change across participants. Note, that HybridOrig (the approach used in our manuscript) significantly outperforms the HybridAlt approach, indicating that the excluded parcel features provide unique information compared to the spatially overlapping intra-parcel voxel patterns.

      Firstly, there will be a relatively high degree of spatial contiguity among voxels because of the nature of the signal measured, i.e. nearby individual voxels are unlikely to be independent. Secondly, the voxel data gives a somewhat misleading sense of precision; the inversion can be set up to give an estimate for each voxel, but there will not just be dependence among adjacent voxels, but also substantial variation in the sensitivity and confidence with which activity can be projected to different parts of the brain. Midline and deeper structures come to mind, where the inversion will be more problematic than for regions along the dorsal convexity of the brain, and a concern is that in those midline structures, the highest decoding accuracy is seen. 

      We definitely agree with the Reviewer that some inter-parcel features representing neighboring (or spatially contiguous) voxels are likely to be correlated. This has been well documented in the MEG literature20,21 and is a particularly important confound to address in functional or effective connectivity analyses (not performed in the present study). In the present analysis, any correlation between adjacent voxels presents a multi-collinearity problem, which effectively reduces the dimensionality of the input feature space. However, as long as there are multiple groups of correlated voxels within each parcel (i.e. - the effective dimensionality is still greater than 1), the intra-parcel spatial patterns could still meaningfully contribute to the decoder performance. Two specific results support this assertion.

      First, we obtained higher decoding accuracy with voxel-space features [74.51% (± SD 7.34%)] compared to parcel space features [68.77% (± SD 7.6%)] (Figure 3B), indicating individual voxels carry more information in decoding the keypresses than the averaged voxel-space features or parcel-space features.  Second, individual voxels within a parcel showed varying feature importance scores in decoding keypresses (Author response image 3). This finding supports the Reviewer’s assertion that neighboring voxels express similar information, but also shows that the correlated voxels form mini subclusters that are much smaller spatially than the parcel they reside in.

      Author response image 3.

      Feature importance score of individual voxels in decoding keypresses: MRMR was used to rank the individual voxel space features in decoding keypresses and the min-max normalized MRMR score was mapped to a structural brain surface. Note that individual voxels within a parcel showed different contribution to decoding.

       

      Some of these concerns could be addressed by recording head movement (with enough precision) to regress out these contributions. The authors state that head movement was monitored with 3 fiducials, and their time courses ought to provide a way to deal with this issue. The ICA procedure may not have sufficiently dealt with removing movement-related problems, but one could eg relate individual components that were identified to the keypresses as another means for checking. An alternative could be to focus on frequency ranges above the movement frequencies. The accuracy for those still seems impressive and may provide a slightly more biologically plausible assessment. 

      We have already addressed the issue of movement-related artefacts in the first response above. With respect to a focus on frequency ranges above movement frequencies, the Reviewer states the “accuracy for those still seems impressive and may provide a slightly more biologically plausible assessment”. First, it is important to note that cortical delta-band oscillations measured with local field potentials (LFPs) in macaques are known to contain important information related to end-effector kinematics22,23, muscle activation patterns24 and temporal sequencing25 during skilled reaching and grasping actions. Thus, there is a substantial body of evidence that low-frequency neural oscillatory activity in this range contains important information about the skill learning behavior investigated in the present study. Second, our own data show (as the Reviewer also points out) that significant information related to the skill learning behavior is also present in higher frequency bands (see Figure 2A and Figure 3—figure supplement 1). As we pointed out in our earlier response to questions about the hybrid space decoder architecture (see above), it is likely that different, yet complementary, information is encoded across different temporal frequencies (just as it is encoded across different spatial frequencies). Again, this interpretation is supported by our data as the highest performing classifiers in all cases (when holding all parameters constant) were always constructed from broadband input MEG data (Figure 2A and Figure 3—figure supplement 1).

      One question concerns the interpretation of the results shown in Figure 4. They imply that during the course of learning, entirely different brain networks underpin the behaviour. Not only that, but they also include regions that would seem rather unexpected to be key nodes for learning and expressing relatively simple finger sequences, such as here. What then is the biological plausibility of these results? The authors seem to circumnavigate this issue by moving into a distance metric that captures the (neural network) changes over the course of learning, but the discussion seems detached from which regions are actually involved; or they offer a rather broad discussion of the anatomical regions identified here, eg in the context of LFOs, where they merely refer to "frontoparietal regions". 

      The Reviewer notes the shift in brain networks driving keypress decoding performance between trials 1, 11 and 36 as shown in Figure 4A. The Reviewer questions whether these substantial shifts in brain network states underpinning the skill are biologically plausible, as well as the likelihood that bilateral superior and middle frontal and parietal cortex are important nodes within these networks.

      First, previous fMRI work in humans performing a similar sequence learning task showed that flexibility in brain network composition (i.e. – changes in brain region members displaying coordinated activity) is up-regulated in novel learning environments and explains differences in learning rates across individuals26.  This work supports our interpretation of the present study data, that brain networks engaged in sequential motor skills rapidly reconfigure during early learning.

      Second, frontoparietal network activity is known to support motor memory encoding during early learning27,28. For example, reactivation events in the posterior parietal29 and medial prefrontal30,31 cortex (MPFC) have been temporally linked to hippocampal replay, and are posited to support memory consolidation across several memory domains32, including motor sequence learning1,33,34. Further, synchronized interactions between MPFC and hippocampus are more prominent during early learning as opposed to later stages27,35,36, perhaps reflecting “redistribution of hippocampal memories to MPFC” 27. MPFC contributes to very early memory formation by learning associations between contexts, locations, events and adaptive responses during rapid learning37. Consistently, coupling between hippocampus and MPFC has been shown during, and importantly immediately following (rest), initial memory encoding38,39. Importantly, MPFC activity during initial memory encoding predicts subsequent recall40. Thus, the spatial map required to encode a motor sequence memory may be “built under the supervision of the prefrontal cortex” 28, also engaged in the development of an abstract representation of the sequence41. In more abstract terms, the prefrontal, premotor and parietal cortices support novice performance “by deploying attentional and control processes” required during early learning42-44. The dorsolateral prefrontal cortex (DLPFC) specifically is thought to engage in goal selection and sequence monitoring during early skill practice45, all consistent with the schema model of declarative memory in which prefrontal cortices play an important role in encoding46,47. Thus, several prefrontal and frontoparietal regions contributing to long term learning48 are also engaged in early stages of encoding. Altogether, there is strong biological support for the involvement of bilateral prefrontal and frontoparietal regions in decoding during early skill learning.
We now address this issue in the revised manuscript.

      If I understand correctly, the offline neural representation analysis is in essence the comparison of the last keypress vs the first keypress of the next sequence. In that sense, the activity during offline rest periods is actually not considered. This makes the nomenclature somewhat confusing. While it matches the behavioural analysis, having only key presses one can't do it in any other way, but here the authors actually do have recordings of brain activity during offline rest. So at the very least calling it offline neural representation is misleading to this reviewer because what is compared is activity during the last and during the next keypress, not activity during offline periods. But it also seems a missed opportunity - the authors argue that most of the relevant learning occurs during offline rest periods, yet there is no attempt to actually test whether activity during this period can be useful for the questions at hand here. 

      We agree with the Reviewer that our previous “offline neural representation” nomenclature could be misinterpreted. In the revised manuscript we refer to this difference as the “offline neural representational change”. Please note that our previous work did link offline neural activity (i.e. – 16-22 Hz beta power and neural replay density during inter-practice rest periods) to observed micro-offline gains49.

      Reviewer #2 (Public review): 

      Summary 

      Dash et al. asked whether and how the neural representation of individual finger movements is "contextualized" within a trained sequence during the very early period of sequential skill learning by using decoding of MEG signal. Specifically, they assessed whether/how the same finger presses (pressing index finger) embedded in the different ordinal positions of a practiced sequence (4-1-3-2-4; here, the numbers 1 through 4 correspond to the little through the index fingers of the non-dominant left hand) change their representation (MEG feature). They did this by computing either the decoding accuracy of the index finger at the ordinal positions 1 vs. 5 (index_OP1 vs index_OP5) or pattern distance between index_OP1 vs. index_OP5 at each training trial and found that both the decoding accuracy and the pattern distance progressively increase over the course of learning trials. More interestingly, they also computed the pattern distance for index_OP5 for the last execution of a practice trial vs. index_OP1 for the first execution in the next practice trial (i.e., across the rest period). This "off-line" distance was significantly larger than the "on-line" distance, which was computed within practice trials and predicted micro-offline skill gain. Based on these results, the authors conclude that the differentiation of representation for the identical movement embedded in different positions of a sequential skill ("contextualization") primarily occurs during early skill learning, especially during rest, consistent with the recent theory of the "micro-offline learning" proposed by the authors' group. I think this is an important and timely topic for the field of motor learning and beyond.

      Strengths

      The specific strengths of the current work are as follows. First, the use of temporally rich neural information (MEG signal) has a large advantage over previous studies testing sequential representations using fMRI. This allowed the authors to examine the earliest period (= the first few minutes of training) of skill learning with finer temporal resolution. Second, through the optimization of MEG feature extraction, the current study achieved extremely high decoding accuracy (approx. 94%) compared to previous works. As claimed by the authors, this is one of the strengths of the paper (but see my comments). Third, although some potential refinement might be needed, comparing "online" and "offline" pattern distance is a neat idea. 

      Weaknesses 

      Along with the strengths I raised above, the paper has some weaknesses. First, the pursuit of high decoding accuracy, especially the choice of time points and window length (i.e., 200 msec window starting from 0 msec from key press onset), casts a shadow on the interpretation of the main result. Currently, it is unclear whether the decoding results simply reflect behavioral change or true underlying neural change. As shown in the behavioral data, the key press speed reached 3~4 presses per second already at around the end of the early learning period (11th trial), which means inter-press intervals become as short as 250-330 msec. Thus, in almost more than 60% of training period data, the time window for MEG feature extraction (200 msec) spans around 60% of the inter-press intervals. Considering that the preparation/cueing of subsequent presses starts ahead of the actual press (e.g., Kornysheva et al., 2019) and/or potential online planning (e.g., Ariani and Diedrichsen, 2019), the decoder likely has captured these future press information as well as the signal related to the current key press, independent of the formation of genuine sequential representation (e.g., "contextualization" of individual press). This may also explain the gradual increase in decoding accuracy or pattern distance between index_OP1 vs. index_OP5 (Figure 4C and 5A), which co-occurred with performance improvement, as shorter inter-press intervals are more favorable for the dissociating the two index finger presses followed by different finger presses. The compromised decoding accuracies for the control sequences can be explained in similar logic. Therefore, more careful consideration and elaborated discussion seem necessary when trying to both achieve high-performance decoding and assess early skill learning, as it can impact all the subsequent analyses.

      The Reviewer raises the possibility that (given the windowing parameters used in the present study) an increase in “contextualization” with learning could simply reflect faster typing speeds as opposed to an actual change in the underlying neural representation. The issue can essentially be framed as a mixing problem. As correct sequences are generated at higher and higher speeds over training, MEG activity patterns related to the planning, execution, evaluation and memory of individual keypresses overlap more in time. Thus, increased overlap between the “4” and “1” keypresses (at the start of the sequence) and “2” and “4” keypresses (at the end of the sequence) could artefactually increase contextualization distances even if the underlying neural representations for the individual keypresses remain unchanged (assuming this mixing of representations is used by the classifier to differentially tag each index finger press). If this were the case, it follows that such mixing effects reflecting the ordinal sequence structure would also be observable in the distribution of decoder misclassifications. For example, “4” keypresses would be more likely to be misclassified as “1” or “2” keypresses (or vice versa) than as “3” keypresses. The confusion matrices presented in Figures 3C and 4B and Figure 3—figure supplement 3A in the previously submitted manuscript do not show this trend in the distribution of misclassifications across the four fingers.

      Moreover, if the representation distance is largely driven by this mixing effect, it’s also possible that the increased overlap between consecutive index finger keypresses during the 4-4 transition marking the end of one sequence and the beginning of the next one could actually mask contextualization-related changes to the underlying neural representations and make them harder to detect. In this case, a decoder tasked with separating individual index finger keypresses into two distinct classes based upon sequence position might show decreased performance with learning as adjacent keypresses overlapped in time with each other to an increasing extent. However, Figure 4C in our previously submitted manuscript does not support this possibility, as the 2-class hybrid classifier displays improved classification performance over early practice trials despite greater temporal overlap.

      We also conducted a new multivariate regression analysis to directly assess whether the neural representation distance score could be predicted by the 4-1, 2-4 and 4-4 keypress transition times observed for each complete correct sequence (both predictor and response variables were z-score normalized within-subject). The results of this analysis affirmed that the possible alternative explanation put forward by the Reviewer is not supported by our data (adjusted R² = 0.00431; F = 5.62). We now include this new negative control analysis result in the revised manuscript.
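
      To illustrate the kind of negative control analysis described above, the following is a minimal sketch that z-scores each variable and computes the adjusted R² of an ordinary least-squares fit with three predictors. The function names (`zscore`, `solve`, `adjusted_r2`) are hypothetical, and the use of the population standard deviation and a plain normal-equation solver are assumptions made for illustration, not a description of the analysis code actually used.

```python
from statistics import mean, pstdev


def zscore(values):
    """Z-score one subject's values (population SD is an assumption)."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def adjusted_r2(predictors, response):
    """OLS of response on predictor columns plus an intercept.

    Returns adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1).
    """
    n, p = len(response), len(predictors)
    cols = [[1.0] * n] + predictors
    xtx = [[sum(u * v for u, v in zip(ci, cj)) for cj in cols] for ci in cols]
    xty = [sum(u * v for u, v in zip(ci, response)) for ci in cols]
    beta = solve(xtx, xty)
    fitted = [sum(b * c[i] for b, c in zip(beta, cols)) for i in range(n)]
    ybar = mean(response)
    ss_res = sum((y - f) ** 2 for y, f in zip(response, fitted))
    ss_tot = sum((y - ybar) ** 2 for y in response)
    r2 = 1.0 - ss_res / ss_tot
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)
```

      In this sketch, the three predictor columns would hold the z-scored 4-1, 2-4 and 4-4 transition times and the response would hold the z-scored representation distance; an adjusted R² near zero indicates that transition times explain almost none of the variance in distance.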

      Overall, we do strongly agree with the Reviewer that the naturalistic, self-paced, generative task employed in the present study results in overlapping brain processes related to planning, execution, evaluation and memory of the action sequence. We also agree that there are several tradeoffs to consider in the construction of the classifiers depending on the study aim. Given our aim of optimizing keypress decoder accuracy in the present study, the set of trade-offs resulted in representations reflecting more the latter three processes, and less so the planning component. Whether separate decoders can be constructed to tease apart the representations or networks supporting these overlapping processes is an important future direction of research in this area. For example, work presently underway in our lab constrains the selection of windowing parameters in a manner that allows individual classifiers to be temporally linked to specific planning, execution, evaluation or memory-related processes to discern which brain networks are involved and how they adaptively reorganize with learning. Results from the present study (Figure 4—figure supplement 2) showing hybrid-space decoder prediction accuracies exceeding 74% for temporal windows spanning as little as 25ms and located up to 100ms prior to the keyDown event strongly support the feasibility of such an approach.

      Related to the above point, testing only one particular sequence (4-1-3-2-4), aside from the control ones, limits the generalizability of the finding. This also may have contributed to the extremely high decoding accuracy reported in the current study. 

      The Reviewer raises a question about the generalizability of the decoder accuracy reported in our study. Fortunately, a comparison between decoder performances on Day 1 and Day 2 datasets does provide some insight into this issue. As the Reviewer points out, the classifiers in this study were trained and tested on keypresses performed while practicing a specific sequence (4-1-3-2-4). The study was designed this way so as to avoid the impact of interference effects on learning dynamics. The cross-validated performance of classifiers on MEG data collected within the same session was 90.47% overall accuracy (4-class; Figure 3C). We then tested classifier performance on data collected during a separate MEG session conducted approximately 24 hours later (Day 2; see Figure 3—supplement 3). We observed a reduction in overall accuracy rate to 87.11% when tested on MEG data recorded while participants performed the same learned sequence, and 79.44% when they performed several previously unpracticed sequences. Both changes in accuracy are important with regard to the generalizability of our findings. First, 87.11% performance accuracy for the trained sequence data on Day 2 (a reduction of only 3.36%) indicates that the hybrid-space decoder performance is robust over multiple MEG sessions, and thus, robust to variations in SNR across the MEG sensor array caused by small differences in head position between scans. This indicates a substantial advantage over sensor-space decoding approaches. Furthermore, when tested on data from unpracticed sequences, overall performance dropped an additional 7.67%. This difference reflects the performance bias of the classifier for the trained sequence, possibly caused by high-order sequence structure being incorporated into the feature weights. In the future, it will be important to understand in more detail how random or repeated keypress sequence training data impact overall decoder performance and generalization.
We strongly agree with the Reviewer that the issue of generalizability is extremely important and have added a new paragraph to the Discussion in the revised manuscript highlighting the strengths and weaknesses of our study with respect to this issue.

      In terms of clinical BCI, one of the potential relevance of the study, as claimed by the authors, it is not clear that the specific time window chosen in the current study (up to 200 msec since key press onset) is really useful. In most cases, clinical BCI would target neural signals with no overt movement execution due to patients' inability to move (e.g., Hochberg et al., 2012). Given the time window, the surprisingly high performance of the current decoder may result from sensory feedback and/or planning of subsequent movement, which may not always be available in the clinical BCI context. Of course, the decoding accuracy is still much higher than chance even when using signal before the key press (as shown in Figure 4 Supplement 2), but it is not immediately clear to me that the authors relate their high decoding accuracy based on post-movement signal to clinical BCI settings.

      The Reviewer questions the relevance of the specific window parameters used in the present study for clinical BCI applications, particularly for paretic patients who are unable to produce finger movements or for whom afferent sensory feedback is no longer intact. We strongly agree with the Reviewer that any intended clinical application must carefully consider these specific input feature constraints dictated by the clinical cohort, and in turn impose appropriate and complementary constraints on classifier parameters that may differ from the ones used in the present study. We now highlight this issue in the Discussion of the revised manuscript and relate our present findings to published clinical BCI work within this context.

      One of the important and fascinating claims of the current study is that the "contextualization" of individual finger movements in a trained sequence specifically occurs during short rest periods in very early skill learning, echoing the recent theory of micro-offline learning proposed by the authors' group. Here, I think two points need to be clarified. First, the concept of "contextualization" is kept somewhat blurry throughout the text. It is only at the later part of the Discussion (around line #330 on page 13) that some potential mechanism for the "contextualization" is provided as "what-and-where" binding. Still, it is unclear what "contextualization" actually is in the current data, as the MEG signal analyzed is extracted from 0-200 msec after the keypress. If one thinks something is contextualizing an action, that contextualization should come earlier than the action itself. 

      The Reviewer requests that we: 1) more clearly define our use of the term “contextualization” and 2) provide the rationale for assessing it over a 200ms window aligned to the keyDown event. This choice of window parameters means that the MEG activity used in our analysis was coincident with, rather than preceding, the actual keypresses.  We define contextualization as the differentiation of representation for the identical movement embedded in different positions of a sequential skill. That is, representations of individual action elements progressively incorporate information about their relationship to the overall sequence structure as the skill is learned. We agree with the Reviewer that this can be appropriately interpreted as “what-and-where” binding. We now incorporate this definition in the Introduction of the revised manuscript as requested.

      The window parameters for optimizing accurate decoding of individual finger movements were determined using a grid search of the parameter space (a sliding window of variable width between 25-350 ms with 25 ms increments variably aligned from 0 to +100ms with 10ms increments relative to the keyDown event). This approach generated 140 different temporal windows for each keypress for each participant, with the final parameter selection determined through comparison of the resulting performance between each decoder. Importantly, the decision to optimize for decoding accuracy placed an emphasis on keypress representations characterized by the most consistent and robust features shared across subjects, which in turn maximize statistical power in detecting common learning-related changes. In this case, the optimal window encompassed a 200ms epoch aligned to the keyDown event (t0 = 0 ms). We then asked if the representations (i.e. – spatial patterns of combined parcel- and voxel-space activity) of the same digit at two different sequence positions changed with practice within this optimal decoding window. Of course, our findings do not rule out the possibility that contextualization can also be found before or even after this time window, as we did not directly address this issue in the present study. Ongoing work in our lab, as pointed out above, is investigating contextualization within different time windows tailored specifically for assessing sequence skill action planning, execution, evaluation and memory processes.
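
      The window grid search described above can be sketched as follows. This is a hedged illustration only: the names `WIDTHS_MS`, `ONSETS_MS`, `candidate_windows` and `best_window` are hypothetical, and `score` stands in for whatever cross-validated decoder accuracy is computed for a window. One reading that reproduces the reported count of 140 windows is 14 widths by 10 onsets, which treats the +100 ms onset endpoint as excluded; that endpoint convention is an assumption, not something the text states.

```python
from itertools import product

# Assumed grid: widths 25..350 ms inclusive in 25 ms steps (14 values).
WIDTHS_MS = range(25, 351, 25)
# Assumed onsets: 0..90 ms in 10 ms steps (10 values). Excluding +100 ms is
# an assumption chosen so that the grid has 140 entries.
ONSETS_MS = range(0, 100, 10)


def candidate_windows():
    """Return (start_ms, end_ms) pairs relative to the keyDown event."""
    return [(onset, onset + width)
            for width, onset in product(WIDTHS_MS, ONSETS_MS)]


def best_window(windows, score):
    """Return the window with the highest score.

    `score` is a hypothetical callable mapping a (start_ms, end_ms) window
    to a decoder accuracy estimate.
    """
    return max(windows, key=score)
```

      In practice `score` would wrap feature extraction and cross-validated classification for each participant; the sketch only shows how the candidate windows are enumerated and compared.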

      The second point is that the result provided by the authors is not yet convincing enough to support the claim that "contextualization" occurs during rest. In the original analysis, the authors presented the statistical significance regarding the correlation between the "offline" pattern differentiation and micro-offline skill gain (Figure 5. Supplement 1), as well as the larger "offline" distance than "online" distance (Figure 5B). However, this analysis looks like regressing two variables (monotonically) increasing as a function of the trial. Although some information in this analysis, such as what the independent/dependent variables were or how individual subjects were treated, was missing in the Methods, getting a statistically significant slope seems unsurprising in such a situation. Also, curiously, the same quantitative evidence was not provided for its "online" counterpart, and the authors only briefly mentioned in the text that there was no significant correlation between them. It may be true looking at the data in Figure 5A as the online representation distance looks less monotonically changing, but the classification accuracy presented in Figure 4C, which should reflect similar representational distance, shows a more monotonic increase up to the 11th trial. Further, the ways the "online" and "offline" representation distance was estimated seem to make them not directly comparable. While the "online" distance was computed using all the correct press data within each 10 sec of execution, the "offline" distance is basically computed by only two presses (i.e., the last index_OP5 vs. the first index_OP1 separated by 10 sec of rest). Theoretically, the distance between the neural activity patterns for temporally closer events tends to be closer than that between the patterns for temporally far-apart events. It would be fairer to use the distance between the first index_OP1 vs. the last index_OP5 within an execution period for "online" distance, as well. 

      The Reviewer suggests that the current data is not convincing enough to show that contextualization occurs during rest and raises two important concerns: 1) the relationship between online contextualization and micro-online gains is not shown, and 2) the online distance was calculated differently from its offline counterpart (i.e. - instead of calculating the distance between last IndexOP5 and first IndexOP1 from a single trial, the distance was calculated for each sequence within a trial and then averaged).

      We addressed the first concern by performing individual subject correlations between 1) contextualization changes during rest intervals and micro-offline gains; 2) contextualization changes during practice trials and micro-online gains, and 3) contextualization changes during practice trials and micro-offline gains (Author response image 4). We then statistically compared the resulting correlation coefficient distributions and found that within-subject correlations for contextualization changes during rest intervals and micro-offline gains were significantly higher than online contextualization and micro-online gains (t = 3.2827, p = 0.0015) and online contextualization and micro-offline gains (t = 3.7021, p = 5.3013e-04). These results are consistent with our interpretation that micro-offline gains are supported by contextualization changes during the inter-practice rest period.
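
      The comparison of within-subject correlation distributions can be sketched as below. This is a minimal illustration under stated assumptions: the response does not say whether the t-test was paired or whether coefficients were transformed beforehand, so the sketch assumes a paired t-statistic on raw Pearson coefficients. The function names `pearson_r` and `paired_t` are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def paired_t(a, b):
    """Paired t-statistic for per-subject values a versus b (assumed test)."""
    d = [u - v for u, v in zip(a, b)]
    return mean(d) / (stdev(d) / sqrt(len(d)))
```

      Here `pearson_r` would be applied once per subject (for example, to trial-wise offline contextualization change versus micro-offline gain), and `paired_t` would then compare the resulting per-subject coefficients across two conditions.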

      Author response image 4.

      Distribution of individual subject correlation coefficients between contextualization changes occurring during practice or rest with micro-online and micro-offline performance gains. Note that the correlation distributions were significantly higher for the relationship between contextualization changes during rest and micro-offline gains than for contextualization changes during practice and either micro-online or micro-offline gains.

      With respect to the second concern highlighted above, we agree with the Reviewer that one limitation of the analysis comparing online versus offline changes in contextualization as presented in the reviewed manuscript is that it does not eliminate the possibility that any differences could simply be explained by the passage of time (which is smaller for the online analysis compared to the offline analysis). The Reviewer suggests an approach that addresses this issue, which we have now carried out. When quantifying online changes in contextualization from the first IndexOP1 to the last IndexOP5 keypress in the same trial we observed no learning-related trend (Author response image 5, right panel). Importantly, offline distances were significantly larger than online distances regardless of the measurement approach and neither predicted online learning (Author response image 6).

      Author response image 5.

      Trial by trial trend of offline (left panel) and online (middle and right panels) changes in contextualization. Offline changes in contextualization were assessed by calculating the distance between neural representations for the last IndexOP5 keypress in the previous trial and the first IndexOP1 keypress in the present trial. Two different approaches were used to characterize online contextualization changes. The analysis included in the reviewed manuscript (middle panel) calculated the distance between IndexOP1 and IndexOP5 for each correct sequence, which was then averaged across the trial. This approach is limited by the lack of control for the passage of time when making online versus offline comparisons. Thus, the second approach controlled for the passage of time by calculating distance between the representations associated with the first IndexOP1 keypress and the last IndexOP5 keypress within the same trial. Note that while the first approach showed an increasing online contextualization trend with practice, the second approach did not.

      Author response image 6.

      Relationship between online contextualization and online learning is shown for both within-sequence (left; note that this is the online contextualization measure used in the reviewed manuscript) and across-sequence (right) distance calculation. There was no significant relationship between online learning and online contextualization regardless of the measurement approach.

      A related concern regarding the control analysis, where individual values for max speed and the degree of online contextualization were compared (Figure 5 Supplement 3), is whether the individual difference is meaningful. If I understood correctly, the optimization of the decoding process (temporal window, feature inclusion/reduction, decoder, etc.) was performed for individual participants, and the same feature extraction was also employed for the analysis of representation distance (i.e., contextualization). If this is the case, the distances are individually differently calculated and they may need to be normalized relative to some stable reference (e.g., 1 vs. 4 or average distance within the control sequence presses) before comparison across the individuals. 

      The Reviewer makes a good point here. We have now implemented the suggested normalization procedure in the analysis provided in the revised manuscript.

      Reviewer #3 (Public review): 

      Summary: 

      One goal of this paper is to introduce a new approach for highly accurate decoding of finger movements from human magnetoencephalography data via dimension reduction of a "multi-scale, hybrid" feature space. Following this decoding approach, the authors aim to show that early skill learning involves "contextualization" of the neural coding of individual movements, relative to their position in a sequence of consecutive movements. Furthermore, they aim to show that this "contextualization" develops primarily during short rest periods interspersed with skill training and correlates with a performance metric which the authors interpret as an indicator of offline learning.

      Strengths:

      A clear strength of the paper is the innovative decoding approach, which achieves impressive decoding accuracies via dimension reduction of a "multi-scale, hybrid space". This hybrid-space approach follows the neurobiologically plausible idea of the concurrent distribution of neural coding across local circuits as well as large-scale networks. A further strength of the study is the large number of tested dimension reduction techniques and classifiers (though the manuscript reveals little about the comparison of the latter). 

      We appreciate the Reviewer’s comments regarding the paper’s strengths.

      A simple control analysis based on shuffled class labels could lend further support to this complex decoding approach. As a control analysis that completely rules out any source of overfitting, the authors could test the decoder after shuffling class labels. Following such shuffling, decoding accuracies should drop to chance level for all decoding approaches, including the optimized decoder. This would also provide an estimate of actual chance-level performance (which is informative over and beyond the theoretical chance level). Furthermore, currently, the manuscript does not explain the huge drop in decoding accuracies for the voxel-space decoding (Figure 3B). Finally, the authors' approach to cortical parcellation raises questions regarding the information carried by varying dipole orientations within a parcel (which currently seems to be ignored?) and the implementation of the mean-flipping method (given that there are two dimensions - space and time - what do the authors refer to when they talk about the sign of the "average source", line 477?). 

      The Reviewer recommends that we: 1) conduct an additional control analysis on classifier performance using shuffled class labels, 2) provide a more detailed explanation regarding the drop in decoding accuracies for the voxel-space decoding following LDA dimensionality reduction (see Fig 3B), and 3) provide additional details on how problems related to dipole solution orientations were addressed in the present study.  

      In relation to the first point, we have now implemented a random shuffling approach as a control for the classification analyses. The results of this analysis indicated that the chance level accuracy was 22.12% (± SD 9.1%) for individual keypress decoding (4-class classification), and 18.41% (± SD 7.4%) for individual sequence item decoding (5-class classification), irrespective of the input feature set or the type of decoder used. Thus, the decoding accuracy observed with the final model was substantially higher than these chance levels.  
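
      For readers who wish to reproduce the logic of this control, the sketch below illustrates a shuffled-label analysis. It is not the decoder used in the study: the nearest-centroid classifier, the synthetic 4-class data, and all function names (`cv_accuracy`, `shuffled_label_chance`, `make_toy_data`) are assumptions chosen to keep the example self-contained. The idea is simply that cross-validated accuracy is re-estimated after the class labels have been randomly permuted, which yields an empirical chance level.

```python
import random
from statistics import mean


def nearest_centroid_fit(X, y):
    """Return a dict mapping each class label to its mean feature vector."""
    groups = {}
    for features, label in zip(X, y):
        groups.setdefault(label, []).append(features)
    return {label: [mean(col) for col in zip(*rows)] for label, rows in groups.items()}


def nearest_centroid_predict(centroids, features):
    """Predict the label whose centroid is closest (squared Euclidean distance)."""
    def sq_dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(features, centroid))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))


def cv_accuracy(X, y, n_folds=5):
    """Cross-validated accuracy using interleaved folds."""
    idx = list(range(len(X)))
    correct = 0
    for fold in range(n_folds):
        test = idx[fold::n_folds]
        held_out = set(test)
        train = [i for i in idx if i not in held_out]
        centroids = nearest_centroid_fit([X[i] for i in train], [y[i] for i in train])
        correct += sum(nearest_centroid_predict(centroids, X[i]) == y[i] for i in test)
    return correct / len(X)


def shuffled_label_chance(X, y, n_shuffles=100, seed=0):
    """Mean cross-validated accuracy after randomly permuting the labels."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n_shuffles):
        y_shuffled = y[:]
        rng.shuffle(y_shuffled)
        scores.append(cv_accuracy(X, y_shuffled))
    return mean(scores)


def make_toy_data(n_per_class=40, n_classes=4, n_features=6, seed=1):
    """Synthetic data (hypothetical): class k has an elevated mean on feature k."""
    rng = random.Random(seed)
    X, y = [], []
    for label in range(n_classes):
        for _ in range(n_per_class):
            X.append([rng.gauss(3.0 if j == label else 0.0, 1.0) for j in range(n_features)])
            y.append(label)
    order = list(range(len(X)))
    rng.shuffle(order)  # mix classes so every fold contains all labels
    return [X[i] for i in order], [y[i] for i in order]


X, y = make_toy_data()
true_acc = cv_accuracy(X, y)              # accuracy with the real labels
chance_acc = shuffled_label_chance(X, y)  # empirical chance level
```

      With separable classes, `true_acc` should sit well above `chance_acc`, and `chance_acc` should fall near the theoretical value of one over the number of classes.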

      Second, please note that the dimensionality of the voxel-space feature set is very high (15,684 dimensions). LDA attempts to map the input features onto a much lower-dimensional space (number of classes minus one; e.g., 3 dimensions for 4-class keypress decoding). Given the very high dimension of the voxel-space input features in this case, the resulting mapping exhibits reduced accuracy. Despite this general consideration, please refer to Figure 3—figure supplement 3, where we observe improvement in voxel-space decoder performance when utilizing alternative dimensionality reduction techniques.

      The decoders constructed in the present study assess the average spatial patterns across time (as defined by the windowing procedure) in the input feature space.  We now provide additional details in the Methods of the revised manuscript pertaining to the parcellation procedure and how the sign ambiguity problem was addressed in our analysis.

      Weaknesses: 

      A clear weakness of the paper lies in the authors' conclusions regarding "contextualization". Several potential confounds, described below, question the neurobiological implications proposed by the authors and provide a simpler explanation of the results. Furthermore, the paper follows the assumption that short breaks result in offline skill learning, while recent evidence, described below, casts doubt on this assumption. 

      We thank the Reviewer for giving us the opportunity to address these issues in detail (see below).

      The authors interpret the ordinal position information captured by their decoding approach as a reflection of neural coding dedicated to the local context of a movement (Figure 4). One way to dissociate ordinal position information from information about the moving effectors is to train a classifier on one sequence and test the classifier on other sequences that require the same movements, but in different positions50. In the present study, however, participants trained to repeat a single sequence (4-1-3-2-4). As a result, ordinal position information is potentially confounded by the fixed finger transitions around each of the two critical positions (first and fifth press). Across consecutive correct sequences, the first keypress in a given sequence was always preceded by a movement of the index finger (=last movement of the preceding sequence), and followed by a little finger movement. The last keypress, on the other hand, was always preceded by a ring finger movement, and followed by an index finger movement (=first movement of the next sequence). Figure 4 - Supplement 2 shows that finger identity can be decoded with high accuracy (>70%) across a large time window around the time of the key press, up to at least +/-100 ms (and likely beyond, given that decoding accuracy is still high at the boundaries of the window depicted in that figure). This time window approaches the keypress transition times in this study. Given that distinct finger transitions characterized the first and fifth keypress, the classifier could thus rely on persistent (or "lingering") information from the preceding finger movement, and/or "preparatory" information about the subsequent finger movement, in order to dissociate the first and fifth keypress. 
Currently, the manuscript provides no evidence that the context information captured by the decoding approach is more than a by-product of temporally extended, and therefore overlapping, but independent neural representations of consecutive keypresses that are executed in close temporal proximity - rather than a neural representation dedicated to context. 

      Such temporal overlap of consecutive, independent finger representations may also account for the dynamics of "ordinal coding"/"contextualization", i.e., the increase in 2-class decoding accuracy, across Day 1 (Figure 4C). As learning progresses, both tapping speed and the consistency of keypress transition times increase (Figure 1), i.e., consecutive keypresses are closer in time, and more consistently so. As a result, information related to a given keypress is increasingly overlapping in time with information related to the preceding and subsequent keypresses. The authors seem to argue that their regression analysis in Figure 5 - Figure Supplement 3 speaks against any influence of tapping speed on "ordinal coding" (even though that argument is not made explicitly in the manuscript). However, Figure 5 - Figure Supplement 3 shows inter-individual differences in a between-subject analysis (across trials, as in panel A, or separately for each trial, as in panel B), and, therefore, says little about the within-subject dynamics of "ordinal coding" across the experiment. A regression of trial-by-trial "ordinal coding" on trial-by-trial tapping speed (either within-subject or at a group-level, after averaging across subjects) could address this issue. Given the highly similar dynamics of "ordinal coding" on the one hand (Figure 4C), and tapping speed on the other hand (Figure 1B), I would expect a strong relationship between the two in the suggested within-subject (or group-level) regression. Furthermore, learning should increase the number of (consecutively) correct sequences, and, thus, the consistency of finger transitions. 
Therefore, the increase in 2-class decoding accuracy may simply reflect an increasing overlap in time of increasingly consistent information from consecutive keypresses, which allows the classifier to dissociate the first and fifth keypress more reliably as learning progresses, simply based on the characteristic finger transitions associated with each. In other words, given that the physical context of a given keypress changes as learning progresses - keypresses move closer together in time and are more consistently correct - it seems problematic to conclude that the mental representation of that context changes. To draw that conclusion, the physical context should remain stable (or any changes to the physical context should be controlled for). 

      The issues raised by Reviewer #3 here are similar to two issues raised by Reviewer #2 above, and we agree that both must be carefully considered in any evaluation of our findings.

      As both Reviewers pointed out, the classifiers in this study were trained and tested on keypresses performed while practicing a specific sequence (4-1-3-2-4). The study was designed this way to avoid the impact of interference effects on learning dynamics. The cross-validated performance of classifiers on MEG data collected within the same session was 90.47% overall accuracy (4-class; Figure 3C). We then tested classifier performance on data collected during a separate MEG session conducted approximately 24 hours later (Day 2; see Figure 3—supplement 3). We observed a reduction in overall accuracy to 87.11% when tested on MEG data recorded while participants performed the same learned sequence, and to 79.44% when they performed several previously unpracticed sequences. This classification performance difference of 7.67% when tested on the Day 2 data could reflect a performance bias of the classifier toward the trained sequence, possibly caused by mixed information from temporally close keypresses being incorporated into the feature weights.

      Along these same lines, both Reviewers also raise the possibility that an increase in “ordinal coding/contextualization” with learning could simply reflect an increase in this mixing effect caused by faster typing speeds as opposed to an actual change in the underlying neural representation. The basic idea is that as correct sequences are generated at higher and higher speeds over training, MEG activity patterns related to the planning, execution, evaluation and memory of individual keypresses overlap more in time. Thus, increased overlap between the “4” and “1” keypresses (at the start of the sequence) and “2” and “4” keypresses (at the end of the sequence) could artefactually increase contextualization distances even if the underlying neural representations for the individual keypresses remain unchanged (assuming this mixing of representations is used by the classifier to differentially tag each index finger press). If this were the case, it follows that such mixing effects reflecting the ordinal sequence structure would also be observable in the distribution of decoder misclassifications. For example, “4” keypresses would be more likely to be misclassified as “1” or “2” keypresses (or vice versa) than as “3” keypresses. The confusion matrices presented in Figures 3C and 4B and Figure 3—figure supplement 3A in the previously submitted manuscript do not show this trend in the distribution of misclassifications across the four fingers.
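
      The check described above can be illustrated with a short sketch. The confusion-matrix counts below are hypothetical and are not the values reported in our figures; the function names are likewise illustrative. For the sequence 4-1-3-2-4, the keypresses adjacent to "4" are "1" and "2", so a mixing effect would predict that "4" is misclassified as "1" or "2" more often than as "3".

```python
def misclassification_rates(confusion, true_label):
    """Fraction of `true_label` trials assigned to each other label."""
    row = confusion[true_label]
    total = sum(row.values())
    return {pred: count / total for pred, count in row.items() if pred != true_label}


def neighbour_bias(confusion, true_label, neighbours):
    """Mean error rate toward sequence neighbours minus mean rate toward non-neighbours.

    A clearly positive value would be expected if temporal mixing of adjacent
    keypresses drove the misclassifications; a value near zero would not.
    """
    rates = misclassification_rates(confusion, true_label)
    near = [rate for label, rate in rates.items() if label in neighbours]
    far = [rate for label, rate in rates.items() if label not in neighbours]
    return sum(near) / len(near) - sum(far) / len(far)


# Hypothetical counts: rows are true keypresses, columns are predicted keypresses.
toy_confusion = {
    "1": {"1": 92, "2": 3, "3": 2, "4": 3},
    "2": {"1": 3, "2": 91, "3": 3, "4": 3},
    "3": {"1": 2, "2": 3, "3": 93, "4": 2},
    "4": {"1": 7, "2": 6, "3": 7, "4": 180},
}

bias = neighbour_bias(toy_confusion, "4", neighbours={"1", "2"})
```

      In this toy example the bias is close to zero, which is the pattern one would expect when errors are spread evenly across fingers rather than concentrated on sequence neighbours.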

      Following this logic, it’s also possible that if the ordinal coding is largely driven by this mixing effect, the increased overlap between consecutive index finger keypresses during the 4-4 transition marking the end of one sequence and the beginning of the next one could actually mask contextualization-related changes to the underlying neural representations and make them harder to detect. In this case, a decoder tasked with separating individual index finger keypresses into two distinct classes based upon sequence position might show decreased performance with learning as adjacent keypresses overlapped in time with each other to an increasing extent. However, Figure 4C in our previously submitted manuscript does not support this possibility, as the 2-class hybrid classifier displays improved classification performance over early practice trials despite greater temporal overlap.

      As noted in the above reply to Reviewer #2, we also conducted a new multivariate regression analysis to directly assess whether the neural representation distance score could be predicted by the 4-1, 2-4 and 4-4 keypress transition times observed for each complete correct sequence (both predictor and response variables were z-score normalized within-subject). The results of this analysis affirmed that the possible alternative explanation put forward by the Reviewer is not supported by our data (adjusted R² = 0.00431; F = 5.62). We now include this new negative control analysis result in the revised manuscript.
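
      Two steps of this analysis, within-subject z-scoring and the adjustment of R² for the number of predictors, can be sketched as follows. This is a minimal illustration under stated assumptions: the helper names and toy values are hypothetical, and the use of the population standard deviation is an assumption rather than a description of our pipeline.

```python
from statistics import mean, pstdev


def zscore_within_subject(values, subject_ids):
    """Z-score each value against the mean and SD of its own subject.

    Assumes the population SD (pstdev); the sample SD would work equally well.
    """
    by_subject = {}
    for value, subject in zip(values, subject_ids):
        by_subject.setdefault(subject, []).append(value)
    stats = {s: (mean(vs), pstdev(vs)) for s, vs in by_subject.items()}
    return [(v - stats[s][0]) / stats[s][1] for v, s in zip(values, subject_ids)]


def adjusted_r2(r2, n_obs, n_predictors):
    """Standard adjusted R^2: penalizes R^2 for the number of predictors."""
    return 1.0 - (1.0 - r2) * (n_obs - 1) / (n_obs - n_predictors - 1)


# Toy transition times (hypothetical) from two subjects with different overall speeds.
times = [1.0, 2.0, 3.0, 10.0, 20.0, 30.0]
subjects = ["a", "a", "a", "b", "b", "b"]
z = zscore_within_subject(times, subjects)

# Three predictors, matching the 4-1, 2-4 and 4-4 transition times.
example_adj = adjusted_r2(r2=0.5, n_obs=101, n_predictors=3)
```

      After within-subject normalization the two toy subjects yield identical z-scores, which shows how this step removes between-subject differences in overall speed before the regression is fitted.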

      Finally, the Reviewer hints that one way to address this issue would be to compare MEG responses before and after learning for sequences typed at a fixed speed. However, given that the speed-accuracy trade-off should improve with learning, a comparison between unlearned and learned skill states would dictate that the skill be evaluated at a very low fixed speed. Essentially, such a design presents the problem that the post-training test is evaluating the representation in the unlearned behavioral state that is not representative of the acquired skill. Thus, this approach would not address our experimental question: “do neural representations of the same action performed at different locations within a skill sequence contextually differentiate or remain stable as learning evolves”.

      A similar difference in physical context may explain why neural representation distances ("differentiation") differ between rest and practice (Figure 5). The authors define "offline differentiation" by comparing the hybrid space features of the last index finger movement of a trial (ordinal position 5) and the first index finger movement of the next trial (ordinal position 1). However, the latter is not only the first movement in the sequence but also the very first movement in that trial (at least in trials that started with a correct sequence), i.e., not preceded by any recent movement. In contrast, the last index finger of the last correct sequence in the preceding trial includes the characteristic finger transition from the fourth to the fifth movement. Thus, there is more overlapping information arising from the consistent, neighbouring keypresses for the last index finger movement, compared to the first index finger movement of the next trial. A strong difference (larger neural representation distance) between these two movements is, therefore, not surprising, given the task design, and this difference is also expected to increase with learning, given the increase in tapping speed, and the consequent stronger overlap in representations for consecutive keypresses. Furthermore, initiating a new sequence involves pre-planning, while ongoing practice relies on online planning (Ariani et al., eNeuro 2021), i.e., two mental operations that are dissociable at the level of neural representation (Ariani et al., bioRxiv 2023). 

      The Reviewer argues that the last index finger movement of a trial and the first in the next trial are performed in different circumstances and contexts. This is an important point and one we tend to agree with. For this task, the first sequence in a practice trial (which is pre-planned offline) is performed in a somewhat different context from the sequence iterations that follow, which involve temporally overlapping planning, execution and evaluation processes. The Reviewer is particularly concerned about a difference in the temporal mixing effect raised above between the first and last keypresses performed in a trial. However, in contrast to the Reviewer's stated argument above, findings from Kornysheva et al. (2019) showed that neural representations of individual actions are competitively queued during the pre-planning period in a manner that reflects the ordinal structure of the learned sequence. Thus, mixing effects are likely still present for the first keypress in a trial. Also note that we now present new control analyses in multiple responses above confirming that hypothetical mixing effects between adjacent keypresses do not explain our reported contextualization finding. A statement addressing these possibilities raised by the Reviewer has been added to the Discussion in the revised manuscript.

      In relation to pre-planning, ongoing MEG work in our lab is investigating contextualization within different time windows tailored specifically for assessing how sequence skill action planning evolves with learning.

      Given these differences in the physical context and associated mental processes, it is not surprising that "offline differentiation", as defined here, is more pronounced than "online differentiation". For the latter, the authors compared movements that were better matched regarding the presence of consistent preceding and subsequent keypresses (online differentiation was defined as the mean difference between all first vs. last index finger movements during practice).  It is unclear why the authors did not follow a similar definition for "online differentiation" as for "micro-online gains" (and, indeed, a definition that is more consistent with their definition of "offline differentiation"), i.e., the difference between the first index finger movement of the first correct sequence during practice, and the last index finger of the last correct sequence. While these two movements are, again, not matched for the presence of neighbouring keypresses (see the argument above), this mismatch would at least be the same across "offline differentiation" and "online differentiation", so they would be more comparable. 

      This is the same point made earlier by Reviewer #2, and we agree with this assessment. As stated in the response to Reviewer #2 above, we have now carried out quantification of online contextualization using this approach and included it in the revised manuscript. We thank the Reviewer for this suggestion.
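
      To make the two definitions concrete, the sketch below computes both quantities from per-trial feature vectors. It is illustrative only: the dictionary keys, the two-dimensional toy vectors, and the use of Euclidean distance are assumptions made for the example and do not describe the hybrid-space features or the distance measure used in the manuscript.

```python
from math import dist


def online_differentiation(trial):
    """Distance between the first index finger press of the first correct sequence
    and the last index finger press of the last correct sequence in one trial."""
    return dist(trial["first_index_press"], trial["last_index_press"])


def offline_differentiation(trial, next_trial):
    """Distance between the last index finger press of one trial and the first
    index finger press of the next trial, i.e. across the intervening rest."""
    return dist(trial["last_index_press"], next_trial["first_index_press"])


# Hypothetical feature vectors for two consecutive practice trials.
trials = [
    {"first_index_press": [0.0, 0.0], "last_index_press": [3.0, 4.0]},
    {"first_index_press": [3.0, 0.0], "last_index_press": [6.0, 4.0]},
]

online = [online_differentiation(t) for t in trials]
offline = [offline_differentiation(a, b) for a, b in zip(trials, trials[1:])]
```

      Because both measures now compare the same pair of keypress types (first press of the first correct sequence and last press of the last correct sequence), any mismatch in neighbouring keypresses is shared between them.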

      A further complication in interpreting the results regarding "contextualization" stems from the visual feedback that participants received during the task. Each keypress generated an asterisk shown above the string on the screen, irrespective of whether the keypress was correct or incorrect. As a result, incorrect (e.g., additional, or missing) keypresses could shift the phase of the visual feedback string (of asterisks) relative to the ordinal position of the current movement in the sequence (e.g., the fifth movement in the sequence could coincide with the presentation of any asterisk in the string, from the first to the fifth). Given that more incorrect keypresses are expected at the start of the experiment, compared to later stages, the consistency in visual feedback position, relative to the ordinal position of the movement in the sequence, increased across the experiment. A better differentiation between the first and the fifth movement with learning could, therefore, simply reflect better decoding of the more consistent visual feedback, based either on the feedback-induced brain response, or feedback-induced eye movements (the study did not include eye tracking). It is not clear why the authors introduced this complicated visual feedback in their task, besides consistency with their previous studies.

      We strongly agree with the Reviewer that eye movements related to task engagement are important to rule out as a potential driver of the decoding accuracy or contextualization effect. We address this issue above in response to a question raised by Reviewer #1 about the impact of movement related artefacts in general on our findings.

      First, the assumption the Reviewer makes here about the distribution of errors in this task is incorrect. On average across subjects, 2.32% ± 1.48% (mean ± SD) of all keypresses performed were errors, which were evenly distributed across the four possible keypress responses. While errors increased progressively over practice trials, they did so in proportion to the increase in correct keypresses, so that the overall ratio of correct-to-incorrect keypresses remained stable over the training session. Thus, the Reviewer’s assumptions that there is a higher relative frequency of errors in early trials, and a resulting systematic trend in phase shifts between the visual display updates (i.e., a change in asterisk position above the displayed sequence) and the keypress performed, are not substantiated by the data. To the contrary, the asterisk position on the display and the keypress being executed remained highly correlated over the entire training session. We now include a statement about the frequency and distribution of errors in the revised manuscript.

      Given this high correlation, we firmly agree with the Reviewer that the issue of eye movement-related artefacts is still an important one to address. Fortunately, we did collect eye movement data during the MEG recordings, so we were able to investigate this. As detailed in the response to Reviewer #1 above, we found that gaze positions and eye-movement velocity time-locked to visual display updates (i.e., a change in asterisk position above the displayed sequence) did not reflect the asterisk location above chance levels (overall cross-validated accuracy = 0.21817; see Author response image 1). Furthermore, an inspection of the eye position data revealed that a majority of participants on most trials displayed random walk gaze patterns around a center fixation point, indicating that participants did not attend to the asterisk position on the display. This is consistent with intrinsic generation of the action sequence, and congruent with the fact that the display does not provide explicit feedback related to performance. As pointed out above, a similar real-world example would be manually inputting a long password into a secure online application. In this case, one intrinsically generates the sequence from memory and receives similar feedback about the password sequence position (also provided as asterisks), which is typically ignored by the user. Notably, the minimal participant engagement with the visual task display observed in this study highlights an important difference between behavior observed during explicit sequence learning motor tasks (which is highly generative in nature) and reactive responses to stimulus cues in a serial reaction time task (SRTT). This is a crucial difference that must be carefully considered when comparing findings across studies. All elements pertaining to this new control analysis are now included in the revised manuscript.

      The authors report a significant correlation between "offline differentiation" and cumulative micro-offline gains. However, it would be more informative to correlate trial-by-trial changes in each of the two variables. This would address the question of whether there is a trial-by-trial relation between the degree of "contextualization" and the amount of micro-offline gains - are performance changes (micro-offline gains) less pronounced across rest periods for which the change in "contextualization" is relatively low? Furthermore, is the relationship between micro-offline gains and "offline differentiation" significantly stronger than the relationship between micro-offline gains and "online differentiation"? 

      In response to a similar issue raised above by Reviewer #2, we now include new analyses comparing correlation magnitudes between (1) “online differentiation” vs micro-online gains, (2) “online differentiation” vs micro-offline gains and (3) “offline differentiation” vs micro-offline gains (see Author response images 4, 5 and 6 above). These new analyses and results have been added to the revised manuscript. Once again, we thank both Reviewers for this suggestion.

      The authors follow the assumption that micro-offline gains reflect offline learning.

      This statement is incorrect. The original Bönstrup et al. (2019) 49 paper clearly states that micro-offline gains must be carefully interpreted based upon the behavioral context within which they are observed, and lays out the conditions under which one can have confidence that micro-offline gains reflect offline learning. In fact, the excellent meta-analysis of Pan & Rickard (2015) 51, which re-interprets the benefits of sleep in overnight skill consolidation from a “reactive inhibition” perspective, was a crucial resource in the experimental design of our initial study49, as well as in all our subsequent work. Pan & Rickard stated:

      “Empirically, reactive inhibition refers to performance worsening that can accumulate during a period of continuous training (Hull, 1943). It tends to dissipate, at least in part, when brief breaks are inserted between blocks of training. If there are multiple performance-break cycles over a training session, as in the motor sequence literature, performance can exhibit a scalloped effect, worsening during each uninterrupted performance block but improving across blocks52,53. Rickard, Cai, Rieth, Jones, and Ard (2008) and Brawn, Fenn, Nusbaum, and Margoliash (2010) 52,53 demonstrated highly robust scalloped reactive inhibition effects using the commonly employed 30 s–30 s performance break cycle, as shown for Rickard et al.’s (2008) massed practice sleep group in Figure 2. The scalloped effect is evident for that group after the first few 30 s blocks of each session. The absence of the scalloped effect during the first few blocks of training in the massed group suggests that rapid learning during that period masks any reactive inhibition effect.”

      Crucially, Pan & Rickard51 made several concrete recommendations for reducing the impact of the reactive inhibition confound on offline learning studies. One of these recommendations was to reduce practice times to 10s (most prior sequence learning studies up until that point had employed 30s long practice trials). They stated:

      “The traditional design involving 30 s-30 s performance break cycles should be abandoned given the evidence that it results in a reactive inhibition confound, and alternative designs with reduced performance duration per block used instead 51. One promising possibility is to switch to 10 s performance durations for each performance-break cycle Instead 51. That design appears sufficient to eliminate at least the majority of the reactive inhibition effect 52,53.”

      We mindfully incorporated recommendations from Pan and Rickard51  into our own study designs including 1) utilizing 10s practice trials and 2) constraining our analysis of micro-offline gains to early learning trials (where performance monotonically increases and 95% of overall performance gains occur), which are prior to the emergence of the “scalloped” performance dynamics that are strongly linked to reactive inhibition effects. 
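
      For clarity, the quantities discussed here can be sketched in a few lines. The example assumes the operationalization in which each trial is summarized by the tapping speed of its first and last correct sequence; the function names, the toy speeds, and the choice of three "early learning" trials are hypothetical and serve only to illustrate the bookkeeping.

```python
def micro_online_gains(trial_speeds):
    """Within-trial change: speed at the end of a trial minus speed at its start."""
    return [last - first for first, last in trial_speeds]


def micro_offline_gains(trial_speeds):
    """Across-rest change: speed at the start of the next trial minus speed
    at the end of the current trial."""
    return [
        next_first - last
        for (_, last), (next_first, _) in zip(trial_speeds, trial_speeds[1:])
    ]


def cumulative(gains, n_early_trials):
    """Sum of gains restricted to the early learning period."""
    return sum(gains[:n_early_trials])


# Hypothetical (first, last) tapping speeds in keypresses per second for three trials.
speeds = [(1.0, 1.2), (1.6, 1.5), (1.9, 1.9)]

online = micro_online_gains(speeds)
offline = micro_offline_gains(speeds)
early_offline_total = cumulative(offline, n_early_trials=2)
```

      Restricting the sum to early trials mirrors the constraint described above, in which later trials showing "scalloped" performance dynamics are excluded from the analysis of micro-offline gains.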

      However, there is no direct evidence in the literature that micro-offline gains really result from offline learning, i.e., an improvement in skill level.

      We strongly disagree with the Reviewer’s assertion that “there is no direct evidence in the literature that micro-offline gains really result from offline learning, i.e., an improvement in skill level.”  The initial Bönstrup et al. (2019) 49 report was followed up by a large online crowd-sourcing study (Bönstrup et al., 2020) 54. This second (and much larger) study provided several additional important findings supporting our interpretation of micro-offline gains in cases where the important behavioral conditions clarified above were met (see Author response image 7 below for further details on these conditions).

      Author response image 7.

      Micro-offline gains observed in learning and non-learning contexts are attributed to different underlying causes. (A) Micro-offline and online changes relative to overall trial-by-trial learning. This figure is based on data from Bönstrup et al. (2019) 49. During early learning, micro-offline gains (red bars) closely track trial-by-trial performance gains (green line with open circle markers), with minimal contribution from micro-online gains (blue bars). The stated conclusion in Bönstrup et al. (2019) is that micro-offline gains only during this Early Learning stage reflect rapid memory consolidation (see also 54). After early learning, about practice trial 11, skill plateaus. This plateau skill period is characterized by a striking emergence of coupled (and relatively stable) micro-online drops and micro-offline increases. Bönstrup et al. (2019) as well as others in the literature 55-57, argue that micro-offline gains during the plateau period likely reflect recovery from inhibitory performance factors such as reactive inhibition or fatigue, and thus must be excluded from analyses relating micro-offline gains to skill learning.  The Non-repeating groups in Experiments 3 and 4 from Das et al. (2024) suffer from a lack of consideration of these known confounds.

      Evidence documented in that paper54 showed that micro-offline gains during early skill learning were: 1) replicable and generalized to subjects learning the task in their daily living environment (n=389); 2) equivalent when significantly shortening practice period duration, thus confirming that they are not a result of recovery from performance fatigue (n=118);  3) reduced (along with learning rates) by retroactive interference applied immediately after each practice period relative to interference applied after passage of time (n=373), indicating stabilization of the motor memory at a microscale of several seconds consistent with rapid consolidation; and 4) not modified by random termination of the practice periods, ruling out a contribution of predictive motor slowing (N = 71) 54.  Altogether, our findings were strongly consistent with the interpretation that micro-offline gains reflect memory consolidation supporting early skill learning. This is precisely the portion of the learning curve Pan and Rickard51 refer to when they state “…rapid learning during that period masks any reactive inhibition effect”.

      This interpretation is further supported by brain imaging evidence linking known memory-related networks and consolidation mechanisms to micro-offline gains. First, we reported that the density of fast hippocampo-neocortical skill memory replay events increases approximately three-fold during early learning inter-practice rest periods with the density explaining differences in the magnitude of micro-offline gains across subjects1. Second, Jacobacci et al. (2020) independently reproduced our original behavioral findings and reported BOLD fMRI changes in the hippocampus and precuneus (regions also identified in our MEG study1) linked to micro-offline gains during early skill learning33. These functional changes were coupled with rapid alterations in brain microstructure on the order of minutes, suggesting that the same network that operates during rest periods of early learning undergoes structural plasticity over several minutes following practice58. Third, even more recently, Chen et al. (2024) provided direct evidence from intracranial EEG in humans linking sharp-wave ripple events (which are known markers for neural replay59) in the hippocampus (80-120 Hz in humans) with micro-offline gains during early skill learning. The authors report that the strong increase in ripple rates tracked learning behavior, both across blocks and across participants. The authors conclude that hippocampal ripples during resting offline periods contribute to motor sequence learning2.

      Thus, there is actually now substantial evidence in the literature directly supporting the assertion “that micro-offline gains really result from offline learning”. On the contrary, according to Gupta & Rickard (2024) “…the mechanism underlying RI [reactive inhibition] is not well established” after over 80 years of investigation60, possibly due to the fact that “reactive inhibition” is a categorical description of behavioral effects that likely result from several heterogeneous processes with very different underlying mechanisms.

      On the contrary, recent evidence questions this interpretation (Gupta & Rickard, npj Sci Learn 2022; Gupta & Rickard, Sci Rep 2024; Das et al., bioRxiv 2024). Instead, there is evidence that micro-offline gains are transient performance benefits that emerge when participants train with breaks, compared to participants who train without breaks, however, these benefits vanish within seconds after training if both groups of participants perform under comparable conditions (Das et al., bioRxiv 2024). 

      It is important to point out that the recent work of Gupta & Rickard (2022, 2024) 55 does not present any data that directly opposes our finding that early skill learning49 is expressed as micro-offline gains during rest breaks. These studies are essentially an extension of the Rickard et al. (2008) paper that employed a massed (30s practice followed by 30s breaks) vs spaced (10s practice followed by 10s breaks) design to assess if recovery from reactive inhibition effects could account for performance gains measured after several minutes or hours. Gupta & Rickard (2022) added two additional groups (30s practice/10s break and 10s practice/10s break as used in the work from our group). The primary aim of the study was to assess whether it was more likely that changes in performance when retested 5 minutes after skill training (consisting of 12 practice trials for the massed groups and 36 practice trials for the spaced groups) had ended reflected memory consolidation effects or recovery from reactive inhibition effects. The Gupta & Rickard (2024) follow-up paper employed a similar design with the primary difference being that participants performed a fixed number of sequences on each trial as opposed to trials lasting a fixed duration. This was done to facilitate the fitting of a quantitative statistical model to the data. To reiterate, neither study included any analysis of micro-online or micro-offline gains, and neither included any comparison focused on skill gains during early learning. Instead, Gupta & Rickard (2022) reported evidence for reactive inhibition effects for all groups over much longer training periods. Again, we reported the same finding for trials following the early learning period in our original Bönstrup et al. (2019) paper49 (Author response image 7). 
Also, please note that we reported in this paper that cumulative micro-offline gains over early learning did not correlate with overnight offline consolidation measured 24 hours later49 (see the Results section and further elaboration in the Discussion). Thus, while the composition of our data is supportive of a short-term memory consolidation process operating over several seconds during early learning, it likely differs from those involved over longer training times and offline periods, as assessed by Gupta & Rickard (2022).

      In the recent preprint from Das et al. (2024) (61), the authors make the strong claim that “micro-offline gains during early learning do not reflect offline learning”, which is not supported by their own data. The authors hypothesize that if “micro-offline gains represent offline learning, participants should reach higher skill levels when training with breaks, compared to training without breaks”. The study utilizes a spaced vs. massed practice group between-subjects design inspired by the reactive inhibition work from Rickard and others to test this hypothesis. Crucially, the design incorporates only a small fraction of the training used in other investigations to evaluate early skill learning (1, 33, 49, 54, 57, 58, 62). A direct comparison between the practice schedule designs for the spaced and massed groups in Das et al., and the training schedule all participants experienced in the original Bönstrup et al. (2019) paper highlights this issue as well as several others (Author response image 8):

      Author response image 8.

      (A) Comparison of Das et al. Spaced & Massed group training session designs, and the training session design from the original Bönstrup et al. (2019) (49) paper. Similar to the approach taken by Das et al., all practice is visualized as 10-second practice trials with a variable number (either 0, 1 or 30) of 10-second-long inter-practice rest intervals to allow for direct comparisons between designs. The two key takeaways from this comparison are that (1) the intervention differences (i.e. – practice schedules) between the Massed and Spaced groups from the Das et al. report are extremely small (less than 12% of the overall session schedule) and (2) the overall amount of practice is much less than in the design from the original Bönstrup report (49) (which has been utilized in several subsequent studies). (B) Group-level learning curve data from Bönstrup et al. (2019) (49) are used to estimate the performance range accounted for by the equivalent periods covering Test 1, Training 1 and Test 2 from Das et al. (2024). Note that the intervention in the Das et al. study is limited to a period covering less than 50% of the overall learning range.

      First, participants in the original Bönstrup et al. study (49) experienced 157.14% more practice time and 46.97% less inter-practice rest time than the Spaced group in the Das et al. study (Author response image 8). Thus, the overall amounts of practice and rest differ substantially between studies, with much more limited training occurring for participants in Das et al.
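      The "more" and "less" percentages quoted above are relative differences taken against the Das et al. Spaced group as the reference. The sketch below only illustrates that arithmetic. The four durations are placeholder values, chosen so that the calculation reproduces the quoted percentages; they are assumptions and are not taken from either study, so the actual schedule totals should be read from Author response image 8.

```python
def percent_change(value: float, reference: float) -> float:
    """Relative difference of `value` with respect to `reference`, in percent.

    Positive means `value` is larger than `reference`; negative means smaller.
    """
    return (value - reference) / reference * 100.0


# Placeholder session totals in seconds (assumptions for illustration only).
bonstrup_practice_s = 360.0
das_spaced_practice_s = 140.0
bonstrup_rest_s = 350.0
das_spaced_rest_s = 660.0

practice_diff = percent_change(bonstrup_practice_s, das_spaced_practice_s)
rest_diff = percent_change(bonstrup_rest_s, das_spaced_rest_s)

print(f"Practice time: {practice_diff:+.2f}% relative to Das et al. Spaced")
print(f"Rest time:     {rest_diff:+.2f}% relative to Das et al. Spaced")
```

      The choice of reference matters: expressing the same practice difference relative to the larger total would give a smaller percentage, so both the direction and the reference group need to be stated when such figures are compared.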

      Second, and perhaps most importantly, the actual intervention (i.e. – the difference in practice schedule between the Spaced and Massed groups) employed by Das et al. covers a very small fraction of the overall training session. Identical practice schedule segments for both the Spaced & Massed groups are indicated by the red shaded area in Author response image 8. Please note that these identical segments cover 94.84% of the Massed group training schedule and 88.01% of the Spaced group training schedule (since it has 60 seconds of additional rest). This means that the actual interventions cover only about 5% (for Massed) and 12% (for Spaced) of the total training session, which minimizes any chance of observing a difference between groups.
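      The intervention share follows directly from the coverage figures above: whatever part of a session is not shared by both groups is the intervention. A minimal sketch of that subtraction, using only the two percentages quoted in the text (the function name is a hypothetical helper introduced here for illustration):

```python
def intervention_share(identical_pct: float) -> float:
    """Percentage of a session not covered by segments shared by both groups."""
    return round(100.0 - identical_pct, 2)


# Coverage of identical schedule segments, as quoted in the text above.
identical_coverage = {"Massed": 94.84, "Spaced": 88.01}

for group, pct in identical_coverage.items():
    share = intervention_share(pct)
    print(f"{group}: {share:.2f}% of the session differs between groups")
```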

      Also note that the very beginning of the practice schedule (during which Figure R9 shows substantial learning is known to occur) is labeled in the Das et al. study as Test 1.  Test 1 encompasses the first 20 seconds of practice (alternatively viewed as the first two 10-second-long practice trials with no inter-practice rest). This is immediately followed by the Training 1 intervention, which is composed of only three 10-second-long practice trials (with 10-second inter-practice rest for the Spaced group and no inter-practice rest for the Massed group). Author response image 8 also shows that since there is no inter-practice rest after the third Training practice trial for the Spaced group, this third trial (for both Training 1 and 2) is actually a part of an identical practice schedule segment shared by both groups (Massed and Spaced), reducing the magnitude of the intervention even further.

      Moreover, we know from the original Bönstrup et al. (2019) paper (49) that 46.57% of all overall group-level performance gains occurred between trials 2 and 5 for that study. Thus, Das et al. are limiting their designed intervention to a period covering less than half of the early learning range discussed in the literature, which, again, minimizes any chance of observing an effect.

      This issue is amplified even further at Training 2 since skill learning prior to the long 5-minute break is retained, further constraining the performance range over these three trials. A related issue pertains to the trials labeled as Test 1 (trials 1-2) and Test 2 (trials 6-7) by Das et al. Again, we know from the original Bönstrup et al. paper (49) that 18.06% and 14.43% (32.49% total) of all overall group-level performance gains occurred during trials corresponding to Das et al. Test 1 and Test 2, respectively. In other words, Das et al. averaged skill performance over 20 seconds of practice at two time-points where dramatic skill improvements occur. Pan & Rickard (2015) previously showed that such averaging can inject artefacts into analyses of performance gains.

      Furthermore, the structure of the Test in the Das et al. study appears to have an interference effect on the Spaced group performance after the training intervention. This makes sense if you consider that the Spaced group is now required to perform the task in a Massed practice environment (i.e., two 10-second-long practice trials merged into one long trial), further blurring the true intervention effects. This effect is observable in Figure 1C,E of their preprint. Specifically, while the Massed group continues to show an increase in performance during test relative to the last 10 seconds of practice during training, the Spaced group displays a marked decrease. This decrease is in stark contrast to the monotonic increases observed for both groups at all other time-points.

      Interestingly, when statistical comparisons between the groups are made at the time-points when the intervention is present (as opposed to after it has been removed) then the stated hypothesis, “If micro-offline gains represent offline learning, participants should reach higher skill levels when training with breaks, compared to training without breaks”, is confirmed.

      The data presented by Gupta and Rickard (2022, 2024) and Das et al. (2024) are in many ways more confirmatory of the constraints employed by our group and others with respect to experimental design, analysis and interpretation of study findings, rather than contradictory. Still, they do highlight a limitation of the current micro-online/offline framework, which was originally only intended to be applied to early skill learning over spaced practice schedules when reactive inhibition effects are minimized (49). Extrapolation of this current framework to post-plateau performance periods, longer timespans, or non-learning situations (e.g. – the Non-repeating groups from Experiments 3 & 4 in Das et al. (2024)), when reactive inhibition plays a more substantive role, is not warranted. Ultimately, it will be important to develop new paradigms allowing one to independently estimate the different coincident or antagonistic features (e.g. - memory consolidation, planning, working memory and reactive inhibition) contributing to micro-online and micro-offline gains during and after early skill learning within a unifying framework.

      References

      (1) Buch, E. R., Claudino, L., Quentin, R., Bonstrup, M. & Cohen, L. G. Consolidation of human skill linked to waking hippocampo-neocortical replay. Cell Rep 35, 109193 (2021). https://doi.org:10.1016/j.celrep.2021.109193

      (2) Chen, P.-C., Stritzelberger, J., Walther, K., Hamer, H. & Staresina, B. P. Hippocampal ripples during offline periods predict human motor sequence learning. bioRxiv, 2024.10.06.614680 (2024). https://doi.org:10.1101/2024.10.06.614680

      (3) Classen, J., Liepert, J., Wise, S. P., Hallett, M. & Cohen, L. G. Rapid plasticity of human cortical movement representation induced by practice. J Neurophysiol 79, 1117-1123 (1998).

      (4) Karni, A. et al. Functional MRI evidence for adult motor cortex plasticity during motor skill learning. Nature 377, 155-158 (1995). https://doi.org:10.1038/377155a0

      (5) Kleim, J. A., Barbay, S. & Nudo, R. J. Functional reorganization of the rat motor cortex following motor skill learning. J Neurophysiol 80, 3321-3325 (1998).

      (6) Shadmehr, R. & Holcomb, H. H. Neural correlates of motor memory consolidation. Science 277, 821-824 (1997).

      (7) Doyon, J. et al. Experience-dependent changes in cerebellar contributions to motor sequence learning. Proc Natl Acad Sci U S A 99, 1017-1022 (2002).

      (8) Toni, I., Ramnani, N., Josephs, O., Ashburner, J. & Passingham, R. E. Learning arbitrary visuomotor associations: temporal dynamic of brain activity. Neuroimage 14, 1048-1057 (2001).

      (9) Grafton, S. T. et al. Functional anatomy of human procedural learning determined with regional cerebral blood flow and PET. J Neurosci 12, 2542-2548 (1992).

      (10) Kennerley, S. W., Sakai, K. & Rushworth, M. F. Organization of action sequences and the role of the pre-SMA. J Neurophysiol 91, 978-993 (2004). https://doi.org:10.1152/jn.00651.2003

      (11) Hardwick, R. M., Rottschy, C., Miall, R. C. & Eickhoff, S. B. A quantitative meta-analysis and review of motor learning in the human brain. Neuroimage 67, 283-297 (2013). https://doi.org:10.1016/j.neuroimage.2012.11.020

      (12) Sawamura, D. et al. Acquisition of chopstick-operation skills with the non-dominant hand and concomitant changes in brain activity. Sci Rep 9, 20397 (2019). https://doi.org:10.1038/s41598-019-56956-0

      (13) Lee, S. H., Jin, S. H. & An, J. The difference in cortical activation pattern for complex motor skills: A functional near-infrared spectroscopy study. Sci Rep 9, 14066 (2019). https://doi.org:10.1038/s41598-019-50644-9

      (14) Battaglia-Mayer, A. & Caminiti, R. Corticocortical Systems Underlying High-Order Motor Control. J Neurosci 39, 4404-4421 (2019). https://doi.org:10.1523/JNEUROSCI.2094-18.2019

      (15) Toni, I., Thoenissen, D. & Zilles, K. Movement preparation and motor intention. Neuroimage 14, S110-117 (2001). https://doi.org:10.1006/nimg.2001.0841

      (16) Wolpert, D. M., Goodbody, S. J. & Husain, M. Maintaining internal representations: the role of the human superior parietal lobe. Nat Neurosci 1, 529-533 (1998). https://doi.org:10.1038/2245

      (17) Andersen, R. A. & Buneo, C. A. Intentional maps in posterior parietal cortex. Annu Rev Neurosci 25, 189-220 (2002). https://doi.org:10.1146/annurev.neuro.25.112701.142922

      (18) Buneo, C. A. & Andersen, R. A. The posterior parietal cortex: sensorimotor interface for the planning and online control of visually guided movements. Neuropsychologia 44, 2594-2606 (2006). https://doi.org:10.1016/j.neuropsychologia.2005.10.011

      (19) Grover, S., Wen, W., Viswanathan, V., Gill, C. T. & Reinhart, R. M. G. Long-lasting, dissociable improvements in working memory and long-term memory in older adults with repetitive neuromodulation. Nat Neurosci 25, 1237-1246 (2022). https://doi.org:10.1038/s41593-022-01132-3

      (20) Colclough, G. L. et al. How reliable are MEG resting-state connectivity metrics? Neuroimage 138, 284-293 (2016). https://doi.org:10.1016/j.neuroimage.2016.05.070

      (21) Colclough, G. L., Brookes, M. J., Smith, S. M. & Woolrich, M. W. A symmetric multivariate leakage correction for MEG connectomes. NeuroImage 117, 439-448 (2015). https://doi.org:10.1016/j.neuroimage.2015.03.071

      (22) Mollazadeh, M. et al. Spatiotemporal variation of multiple neurophysiological signals in the primary motor cortex during dexterous reach-to-grasp movements. J Neurosci 31, 15531-15543 (2011). https://doi.org:10.1523/JNEUROSCI.2999-11.2011

      (23) Bansal, A. K., Vargas-Irwin, C. E., Truccolo, W. & Donoghue, J. P. Relationships among low-frequency local field potentials, spiking activity, and three-dimensional reach and grasp kinematics in primary motor and ventral premotor cortices. J Neurophysiol 105, 1603-1619 (2011). https://doi.org:10.1152/jn.00532.2010

      (24) Flint, R. D., Ethier, C., Oby, E. R., Miller, L. E. & Slutzky, M. W. Local field potentials allow accurate decoding of muscle activity. J Neurophysiol 108, 18-24 (2012). https://doi.org:10.1152/jn.00832.2011

      (25) Churchland, M. M. et al. Neural population dynamics during reaching. Nature 487, 51-56 (2012). https://doi.org:10.1038/nature11129

      (26) Bassett, D. S. et al. Dynamic reconfiguration of human brain networks during learning. Proc Natl Acad Sci U S A 108, 7641-7646 (2011). https://doi.org:10.1073/pnas.1018985108

      (27) Albouy, G., King, B. R., Maquet, P. & Doyon, J. Hippocampus and striatum: dynamics and interaction during acquisition and sleep-related motor sequence memory consolidation. Hippocampus 23, 985-1004 (2013). https://doi.org:10.1002/hipo.22183

      (28) Albouy, G. et al. Neural correlates of performance variability during motor sequence acquisition. Neuroimage 60, 324-331 (2012). https://doi.org:10.1016/j.neuroimage.2011.12.049

      (29) Qin, Y. L., McNaughton, B. L., Skaggs, W. E. & Barnes, C. A. Memory reprocessing in corticocortical and hippocampocortical neuronal ensembles. Philos Trans R Soc Lond B Biol Sci 352, 1525-1533 (1997). https://doi.org:10.1098/rstb.1997.0139

      (30) Euston, D. R., Tatsuno, M. & McNaughton, B. L. Fast-forward playback of recent memory sequences in prefrontal cortex during sleep. Science 318, 1147-1150 (2007). https://doi.org:10.1126/science.1148979

      (31) Molle, M. & Born, J. Hippocampus whispering in deep sleep to prefrontal cortex--for good memories? Neuron 61, 496-498 (2009). https://doi.org:10.1016/j.neuron.2009.02.002

      (32) Frankland, P. W. & Bontempi, B. The organization of recent and remote memories. Nat Rev Neurosci 6, 119-130 (2005). https://doi.org:10.1038/nrn1607

      (33) Jacobacci, F. et al. Rapid hippocampal plasticity supports motor sequence learning. Proc Natl Acad Sci U S A 117, 23898-23903 (2020). https://doi.org:10.1073/pnas.2009576117

      (34) Albouy, G. et al. Maintaining vs. enhancing motor sequence memories: respective roles of striatal and hippocampal systems. Neuroimage 108, 423-434 (2015). https://doi.org:10.1016/j.neuroimage.2014.12.049

      (35) Gais, S. et al. Sleep transforms the cerebral trace of declarative memories. Proc Natl Acad Sci U S A 104, 18778-18783 (2007). https://doi.org:10.1073/pnas.0705454104

      (36) Sterpenich, V. et al. Sleep promotes the neural reorganization of remote emotional memory. J Neurosci 29, 5143-5152 (2009). https://doi.org:10.1523/JNEUROSCI.0561-09.2009

      (37) Euston, D. R., Gruber, A. J. & McNaughton, B. L. The role of medial prefrontal cortex in memory and decision making. Neuron 76, 1057-1070 (2012). https://doi.org:10.1016/j.neuron.2012.12.002

      (38) van Kesteren, M. T., Fernandez, G., Norris, D. G. & Hermans, E. J. Persistent schema-dependent hippocampal-neocortical connectivity during memory encoding and postencoding rest in humans. Proc Natl Acad Sci U S A 107, 7550-7555 (2010). https://doi.org:10.1073/pnas.0914892107

      (39) van Kesteren, M. T., Ruiter, D. J., Fernandez, G. & Henson, R. N. How schema and novelty augment memory formation. Trends Neurosci 35, 211-219 (2012). https://doi.org:10.1016/j.tins.2012.02.001

      (40) Wagner, A. D. et al. Building memories: remembering and forgetting of verbal experiences as predicted by brain activity. Science (New York, N.Y.) 281, 1188-1191 (1998).

      (41) Ashe, J., Lungu, O. V., Basford, A. T. & Lu, X. Cortical control of motor sequences. Curr Opin Neurobiol 16, 213-221 (2006).

      (42) Hikosaka, O., Nakamura, K., Sakai, K. & Nakahara, H. Central mechanisms of motor skill learning. Curr Opin Neurobiol 12, 217-222 (2002).

      (43) Penhune, V. B. & Steele, C. J. Parallel contributions of cerebellar, striatal and M1 mechanisms to motor sequence learning. Behav. Brain Res. 226, 579-591 (2012). https://doi.org:10.1016/j.bbr.2011.09.044

      (44) Doyon, J. et al. Contributions of the basal ganglia and functionally related brain structures to motor learning. Behavioural brain research 199, 61-75 (2009). https://doi.org:10.1016/j.bbr.2008.11.012

      (45) Schendan, H. E., Searl, M. M., Melrose, R. J. & Stern, C. E. An FMRI study of the role of the medial temporal lobe in implicit and explicit sequence learning. Neuron 37, 1013-1025 (2003). https://doi.org:10.1016/s0896-6273(03)00123-5

      (46) Morris, R. G. M. Elements of a neurobiological theory of hippocampal function: the role of synaptic plasticity, synaptic tagging and schemas. The European journal of neuroscience 23, 2829-2846 (2006). https://doi.org:10.1111/j.1460-9568.2006.04888.x

      (47) Tse, D. et al. Schemas and memory consolidation. Science 316, 76-82 (2007). https://doi.org:10.1126/science.1135935

      (48) Berlot, E., Popp, N. J. & Diedrichsen, J. A critical re-evaluation of fMRI signatures of motor sequence learning. Elife 9 (2020). https://doi.org:10.7554/eLife.55241

      (49) Bonstrup, M. et al. A Rapid Form of Offline Consolidation in Skill Learning. Curr Biol 29, 1346-1351 e1344 (2019). https://doi.org:10.1016/j.cub.2019.02.049

      (50) Kornysheva, K. et al. Neural Competitive Queuing of Ordinal Structure Underlies Skilled Sequential Action. Neuron 101, 1166-1180 e1163 (2019). https://doi.org:10.1016/j.neuron.2019.01.018

      (51) Pan, S. C. & Rickard, T. C. Sleep and motor learning: Is there room for consolidation? Psychol Bull 141, 812-834 (2015). https://doi.org:10.1037/bul0000009

      (52) Rickard, T. C., Cai, D. J., Rieth, C. A., Jones, J. & Ard, M. C. Sleep does not enhance motor sequence learning. J Exp Psychol Learn Mem Cogn 34, 834-842 (2008). https://doi.org:10.1037/0278-7393.34.4.834

      (53) Brawn, T. P., Fenn, K. M., Nusbaum, H. C. & Margoliash, D. Consolidating the effects of waking and sleep on motor-sequence learning. J Neurosci 30, 13977-13982 (2010). https://doi.org:10.1523/JNEUROSCI.3295-10.2010

      (54) Bonstrup, M., Iturrate, I., Hebart, M. N., Censor, N. & Cohen, L. G. Mechanisms of offline motor learning at a microscale of seconds in large-scale crowdsourced data. NPJ Sci Learn 5, 7 (2020). https://doi.org:10.1038/s41539-020-0066-9

      (55) Gupta, M. W. & Rickard, T. C. Dissipation of reactive inhibition is sufficient to explain post-rest improvements in motor sequence learning. NPJ Sci Learn 7, 25 (2022). https://doi.org:10.1038/s41539-022-00140-z

      (56) Jacobacci, F. et al. Rapid hippocampal plasticity supports motor sequence learning. Proceedings of the National Academy of Sciences 117, 23898-23903 (2020).

      (57) Brooks, E., Wallis, S., Hendrikse, J. & Coxon, J. Micro-consolidation occurs when learning an implicit motor sequence, but is not influenced by HIIT exercise. NPJ Sci Learn 9, 23 (2024). https://doi.org:10.1038/s41539-024-00238-6

      (58) Deleglise, A. et al. Human motor sequence learning drives transient changes in network topology and hippocampal connectivity early during memory consolidation. Cereb Cortex 33, 6120-6131 (2023). https://doi.org:10.1093/cercor/bhac489

      (59) Buzsaki, G. Hippocampal sharp wave-ripple: A cognitive biomarker for episodic memory and planning. Hippocampus 25, 1073-1188 (2015). https://doi.org:10.1002/hipo.22488

      (60) Gupta, M. W. & Rickard, T. C. Comparison of online, offline, and hybrid hypotheses of motor sequence learning using a quantitative model that incorporate reactive inhibition. Sci Rep 14, 4661 (2024). https://doi.org:10.1038/s41598-024-52726-9

      (61) Das, A., Karagiorgis, A., Diedrichsen, J., Stenner, M.-P. & Azanon, E. “Micro-offline gains” convey no benefit for motor skill learning. bioRxiv, 2024.07.11.602795 (2024). https://doi.org:10.1101/2024.07.11.602795

      (62) Mylonas, D. et al. Maintenance of Procedural Motor Memory across Brief Rest Periods Requires the Hippocampus. J Neurosci 44 (2024). https://doi.org:10.1523/JNEUROSCI.1839-23.2024

    1. Author Response

      Reviewer #1 (Public Review):

      Summary:

      By examining the prevalence of interactions with ancient amino acids of coenzymes in ancient versus recent folds, the authors noticed an increased interaction propensity for ancient interactions. They infer from this that coenzymes might have played an important role in prebiotic proteins.

      Strengths:

      (1) The analysis, which is very straightforward, is technically correct. However, the conclusions might not be as strong as presented.

      (2) This paper presents an excellent summary of contemporary thought on what might have constituted prebiotic proteins and their properties.

      (3) The paper is clearly written.

      We are grateful for the kind comments of the reviewer on our manuscript. However, we would like to clarify a possible misunderstanding in the summary of our study. Specifically, analysis of "ancient versus recent folds" was not really reported in our results. Our analysis concerned "coenzyme age" rather than the "protein folds age" and was focused mainly on interaction with early vs. late amino acids in protein sequence. While structural propensities of the coenzyme binding sites were also analyzed, no distinction on the level of ancient vs. recent folds was assumed and this was only commented on in the discussion, based on previous work of others.

      Weaknesses:

      (1) The conclusions might not be as strong as presented. First of all, while ancient amino acids interact less frequently in late with a given coenzyme, maybe this just reflects the fact that proteins that evolved later might be using residues that have a more favorable binding free energy.

      We would like to point out that no distinction was made between proteins that evolved early or late in our dataset of coenzyme-binding proteins. The aim of our analysis was purely to observe trends in the age of amino acids vs. age of coenzymes. While no direct inference can be made from this about early life, as all the proteins are from extant life (as highlighted in the discussion of our work), our goal was to look for intrinsic propensities of early vs. late amino acids in binding to the different coenzyme entities. Indeed, very early interactions would be smeared by the eons of evolutionary history (perhaps also towards more favourable binding free energy, as also pointed out by the reviewer). Nevertheless, significant trends have been recorded across the PDB dataset, pointing to different propensities and mechanistic properties of the binding events. Rather than to a specific evolutionary past, our data therefore point to a “capacity” of the early amino acids to bind certain coenzymes, and we believe that this is the major (and standing) conclusion of our work, along with the properties of such interactions. In our revised version, we will carefully go through all the conclusions and make sure that this message stands out, but we are confident that the following concluding sentences copied from the abstract and the discussion of our manuscript fully comply with our data:

      “These results imply the plausibility of a coenzyme-peptide functional collaboration preceding the establishment of the Central Dogma and full protein alphabet evolution”

      “While no direct inferences about distant evolutionary past can be drawn from the analysis of extant proteins, the principles guiding these interactions can imply their potential prebiotic feasibility and significance.”

      “This implies that late amino acids would not be necessarily needed for the sovereignty of coenzyme-peptide interplay.”

      We would also like to add that proteins that evolved later might not always have higher free energy of binding. Musil et al., 2021 (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8294521/) showed, using the example of haloalkane dehalogenase DhaA, that ancestral sequence reconstruction is a powerful tool for designing proteins that are more stable but also more active. Ancestral sequence reconstruction relies on finding ancient states of protein families to suggest mutations that will lead to proteins more stable than those currently existing. Their study did not explore ligand-protein interactions specifically, but showed that ancient states often show more favourable properties than modern proteins.

      (2) What about other small molecules that existed in the prebiotic soup? Do they also prefer such ancient amino acids? If so, this might reflect the interaction propensity of specific amino acids rather than the inferred important role of coenzymes.

      We appreciate the reviewer's comment on other small molecules, which we assume points mainly towards metal ions (i.e. inorganic cofactors). We completely agree with the reviewer that such interactions are of utmost importance to the origins of life. They were intentionally not part of our study, as they have already been studied previously by others (e.g. Bromberg et al., 2022; and reviewed in Frenkel-Pinter et al., 2020) and also by us (Fried et al., 2022). For example, it is noteworthy that prebiotically relevant metal binding sites (e.g. of Mg2+) exhibit enrichment in early amino acids such as Asp and Glu, while sites of more recent metals (e.g. Cu and Zn) are enriched in the late amino acids His and Cys (Fried et al., 2022). At the same time, comparable analyses of amino acid - coenzyme trends were not available.

      Nevertheless, the involvement of metal ions in the coenzyme binding sites was also studied here and pointed to their greater involvement with the Ancient coenzymes. In the revised version of the manuscript, we will be happy to enlarge the discussion of the studies concerning inorganic cofactors.

      (3) Perhaps the conclusions just reflect the types of active sites that evolved first and nothing more.

      We partly agree with the reviewer on this point, but not on why it is listed as a weakness of our study, nor on the “nothing more” notion. Understanding the properties of the earliest binding sites is key to bridging the gap between prebiotic chemistry and biochemistry. The potential of peptides preceding ribosomal synthesis (and the full alphabet evolution), along with prebiotically plausible coenzymes, addresses exactly this gap, which is currently not understood.

      Reviewer #2 (Public Review):

      I enjoyed reading this paper and appreciate the careful analysis performed by the investigators examining whether 'ancient' cofactors are preferentially bound by the first-available amino acids, and whether later 'LUCA' cofactors are bound by the late-arriving amino acids. I've always found this question fascinating as there is a contradiction in inorganic metal-protein complexes (not what is focused on here). Metal coordination of Fe, Ni heavily relies on softer ligands like His and Cys - which are by most models latecomer amino acids. There are no traces of thiols or imidazoles in meteorites - although work by Dvorkin has indicated that could very well be due to acid degradation during extraction. Chris Dupont (PNAS 2005) showed that metal speciation in the early earth (such as proposed by Anbar and prior RJP Williams) matched the purported order of fold emergence.

      As such, cofactor-protein interactions as a driving force for evolution has always made sense to me and I admittedly read this paper biased in its favor. But to make sure, I started to play around with the data that the authors kindly and importantly shared in the supplementary files. Here's what I found:

      Point 1: The correlation between abundance of amino acids and protein age is dominated by glycine. There is a small, but visible difference in old vs new amino acid fractional abundance between Ancient and LUCA proteins (Figure 3, Supplementary Table 3). However, the bias is not evenly distributed among the amino acids - which Figure 4A shows but is hard to digest as presented. So instead I used the spreadsheet in Supplement 3 to calculate the fractional difference FDaa = F(old aa)-F(new aa). As expected from Figure 3, the mean FD for Ancient is greater than the mean FD for LUCA. But when you look at the same table for each amino acid FDcofactor = F(ancient cofactor) - F(LUCA cofactor), you now see that the bias is not evenly distributed between older and newer amino acids at all. In fact, most of the difference can be explained by glycine (FDcofactor = 3.8) and the rest by also including tryptophan (FDcofactor = -3.8). If you remove these two amino acids from the analysis, the trend seen in Figure 3 all but disappears.

      Troubling - so you might argue that Gly is the oldest of the old and Trp is the newest of the new so the argument still stands. Unfortunately, Gly is a lot of things - flexible, small, polar - so what is the real correlation, age, or chemistry? This leads to point 2.

      We sincerely appreciate the effort the reviewer made in re-examining the data and the thoughtful, deeper analysis. We agree that this deserves further discussion of our data. As invited by the reviewer, we repeated the analysis on the whole dataset. First, we would like to point out that the reviewer was most probably referring to Supplementary Fig. 2 (and not 3, which concerns protein folds). While the difference between Ancient and LUCA coenzyme binding is indeed most pronounced for Gly and Trp, we could not confirm that the trend disappears if those two amino acids are removed from the analysis (FDcofactor values of 3.2 and -3.2 remain for the early and late amino acids, respectively), as seen in Author response table 1 below. The main additional contributors to this effect are Asp (FD of 2.1) and Ser (FD of 1.8) from the early amino acids and Arg (FD of -2.6) and Cys (FD of -1.7) from the late amino acids. Hence, while we agree with the reviewer that Gly and Trp (the oldest and the youngest) contribute to this effect the most, we disagree that the trend reduces to these two amino acids.

      In addition, the most recent coenzyme temporality (Post-LUCA) was neglected in the reviewer’s analysis. The difference between F (old) and F (new) is even more pronounced in Post-LUCA than in LUCA, vs. Ancient (Author response table 2), and depends much less on Trp. Meanwhile, Asp, Ser, Leu, Phe, and Arg dominate the observed phenomenon (Author response table 1). This further supports our disagreement with the reviewer’s point. Nevertheless, we remain grateful for this discussion and we will happily include this additional analysis in the Supplementary Material of our revised manuscript.
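      The fractional-difference comparison discussed in this exchange can be sketched as follows. For each amino acid, FDcofactor is its abundance among Ancient-coenzyme binders minus its abundance among LUCA-coenzyme binders; the values are then summed over the early and late groups, optionally leaving out selected residues such as Gly and Trp. All abundance values below are hypothetical placeholders, not data from the manuscript or its supplement, and only six amino acids are included to keep the example short.

```python
from typing import Dict, Iterable


def fractional_difference(f_ancient: Dict[str, float],
                          f_luca: Dict[str, float]) -> Dict[str, float]:
    """Per-residue FD = F(ancient cofactor) - F(LUCA cofactor), in percentage points."""
    return {aa: f_ancient[aa] - f_luca[aa] for aa in f_ancient}


def group_total(fd: Dict[str, float], group: Iterable[str],
                exclude: Iterable[str] = ()) -> float:
    """Sum FD over one amino acid group, skipping any residues in `exclude`."""
    skipped = set(exclude)
    return sum(fd[aa] for aa in group if aa not in skipped)


# Hypothetical abundances (percent of binding residues); assumptions for illustration.
ancient = {"Gly": 12.0, "Asp": 8.0, "Ser": 7.0, "Arg": 5.0, "Cys": 2.0, "Trp": 1.0}
luca = {"Gly": 9.0, "Asp": 6.5, "Ser": 6.0, "Arg": 7.0, "Cys": 3.0, "Trp": 3.5}

EARLY = ("Gly", "Asp", "Ser")  # early amino acids named in the response
LATE = ("Arg", "Cys", "Trp")   # late amino acids named in the response

fd = fractional_difference(ancient, luca)

print("All residues:      early", group_total(fd, EARLY), "late", group_total(fd, LATE))
print("Without Gly, Trp:  early", group_total(fd, EARLY, exclude=("Gly", "Trp")),
      "late", group_total(fd, LATE, exclude=("Gly", "Trp")))
```

      If the early and late totals keep opposite signs after the exclusion, the trend does not reduce to the excluded residues; if both collapse towards zero, it does. Which of these holds for the real dataset is exactly what the author response tables are meant to show.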

      Author response table 1.

      Amino acid fractional difference of all coenzymes at residue level

      Author response table 2.

      Amino acid fractional difference of all coenzymes

      Point 2 - The correlation is dominated by phosphate.

      In the ancient cofactor list, all but 4 comprise at least one phosphate (SAM, tetrahydrofolic acid, biopterin, and heme). Except for SAM, the rest have very low Gly abundance. The overall high Gly abundance in the ancient enzymes is due to the chemical property of glycine that can occupy the right-hand side of the Ramachandran plot. This allows it to make the alternating α-left/α-right (αLαR) conformation of the P-loop forming Milner-White's anionic nest. If you remove phosphate binding folds from the analysis the trend in Figure 3 vanishes.

      Likewise, Trp is an important functional residue for binding quinones and tuning its redox potential. The LUCA cofactor set is dominated by quinone and derivatives, which likely drives up the new amino acid score for this class of cofactors.

      Once again, we are thankful to the reviewer for raising this point. The role of Gly in the anionic nests proposed by Milner-White and Russell, as well as the role of Trp in quinone binding, are important points that we would be happy to highlight more in the discussion of the revised manuscript. Nevertheless, we disagree that the trends reduce only to the phosphate-containing coenzymes and, importantly, that “the trend in Figure 3 vanishes” upon their removal. Tables III and IV (below) show the data for coenzymes excluding those with a phosphate moiety, and the trend in Fig. 3 remains, albeit less pronounced.

      Author response table 3.

      Amino acid fractional difference of non-phosphate containing coenzymes

      Author response table 4.

      Amino acid fractional difference of non-phosphate containing coenzymes at residue level

      In summary, while I still believe the premise that cofactors drove the shape of peptides and the folds that came from them - and that Rossmann folds are ancient phosphate-binding proteins, this analysis does not really bring anything new to these ideas that have already been stated by Tawfik/Longo, Milner-White/Russell, and many others.

      I did this analysis ad hoc on a slice of the data the authors provided and could easily have missed something and I encourage the authors to check my work. If it holds up it should be noted that negative results can often be as informative as strong positive ones. I think the signal here is too weak to see in the noise using the current approach.

      We are grateful to the reviewer for encouraging a further look at our data. While we hope that the analysis on the whole dataset (listed in Tables I - IV) will change the reviewer’s standpoint on our work, we would still like to comment on the questioned novelty of our results. In fact, the extraordinary works by Tawfik/Longo and Milner-White/Russell (which were cited in our manuscript multiple times) presented one of the motivations for this study. We take the opportunity to copy the part of our discussion that specifically highlights the relevance of their studies, and points out the contribution of our work with respect to theirs.

      “While all the coenzymes bind preferentially to protein residue sidechains, more backbone interactions appear in the ancient coenzyme class when compared to others. This supports an earlier hypothesis that functions of the earliest peptides (possibly of variable compositions and lengths) would be performed with the assistance of the main chain atoms rather than their sidechains (Milner-White and Russell 2011). Longo et al., recently analyzed binding sites of different phosphate-containing ligands which were arguably of high relevance during earliest stages of life, connecting all of today’s core metabolism (Longo et al., 2020 (b)). They observed that unlike the evolutionary younger binding motifs (which rely on sidechain binding), the most ancient lineages indeed bind to phosphate moieties predominantly via the protein backbone. Our analysis assigns this phenomenon primarily to interactions via early amino acids that (as mentioned above) are generally enriched in the binding interface of the ancient coenzymes. This implies that late amino acids would not be necessarily needed for the sovereignty of coenzyme-peptide interplay.”

      Unlike any previous work, our study involves all the major coenzymes (not just the phosphate-containing ones) and is based on their evolutionary age, as well as the age of amino acids. It is the first PDB-wide systematic evolutionary analysis of coenzyme-amino acid binding. Besides confirming some earlier theoretical assertions (such as the role of backbone interactions in early peptide-coenzyme evolution) and observations (such as the occurrence of the ancient phosphate-containing coenzymes in the oldest protein folds), it uncovers substantial novel knowledge. For example, (i) enrichment of early amino acids in the binding of ancient coenzymes, vs. enrichment of late amino acids in the binding of LUCA and Post-LUCA coenzymes, (ii) the trends in secondary structure content of the binding sites of coenzymes of different temporalities, (iii) increased involvement of metal ions in the ancient coenzyme binding events, and (iv) the capacity of only early amino acids to bind ancient coenzymes. In our humble opinion, all of these points bring important contributions to the peptide-coenzyme knowledge gap, which has been discussed in a number of previous studies.

    1. Author response:

      eLife assessment

      This potentially useful study involves neuro-imaging and electrophysiology in a small cohort of congenital cataract patients after sight recovery and age-matched control participants with normal sight. It aims to characterize the effects of early visual deprivation on excitatory and inhibitory balance in the visual cortex. While the findings are taken to suggest the existence of persistent alterations in Glx/GABA ratio and aperiodic EEG signals, the evidence supporting these claims is incomplete. Specifically, small sample sizes, lack of a specific control cohort, and other methodological limitations will likely restrict the usefulness of the work, with relevance limited to scientists working in this particular subfield.

      As pointed out in the public reviews, there are only very few human models which allow for assessing the role of early experience on neural circuit development. While the prevalent research in permanent congenital blindness reveals the response and adaptation of the developing brain to an atypical situation (blindness), research in sight restoration addresses the question of whether and how atypical development can be remediated if typical experience (vision) is restored. The literature on the role of visual experience in the development of E/I balance in humans, assessed via Magnetic Resonance Spectroscopy (MRS), has been limited to a few studies on congenital permanent blindness. Thus, we assessed sight recovery individuals with a history of congenital blindness, as limited evidence from other researchers indicated that the visual cortex E/I ratio might differ compared to normally sighted controls.

      Individuals with total bilateral congenital cataracts who remained untreated until later in life are extremely rare, particularly if only carefully diagnosed patients are included in a study sample. A sample size of 10 patients is, at the very least, typical of past studies in this population, even for exclusively behavioral assessments. In the present study, in addition to behavioral assessment as an indirect measure of sensitive periods, we investigated participants with two neuroimaging methods (Magnetic Resonance Spectroscopy and electroencephalography) to directly assess the neural correlates of sensitive periods in humans. The electroencephalography data allowed us to link the results of our small sample to findings documented in large cohorts of both sight recovery individuals and permanently congenitally blind individuals. As pointed out in a recent editorial recommending an “exploration-then-estimation procedure” (“Consideration of Sample Size in Neuroscience Studies,” 2020), exploratory studies like ours provide crucial direction and specific hypotheses for future work.

      We included an age-matched sighted control group recruited from the same community, measured in the same scanner and laboratory, to assess whether early experience is necessary for a typical excitatory/inhibitory (E/I) ratio to emerge in adulthood. The present findings indicate that this is indeed the case. Based on these results, a possible question to answer in future work, with individuals who had developmental cataracts, is whether later visual deprivation causes similar effects. Note that even if visual deprivation at a later stage in life caused similar effects, the current results would not be invalidated; by contrast, they are essential to understand future work on late (permanent or transient) blindness.

      Thus, we think that the present manuscript has far reaching implications for our understanding of the conditions under which E/I balance, a crucial characteristic of brain functioning, emerges in humans.

      Finally, our manuscript is one of the first few studies which relates MRS neurotransmitter concentrations to parameters of EEG aperiodic activity. Since present research has been using aperiodic activity as a correlate of the E/I ratio, and partially of higher cognitive functions, we think that our manuscript additionally contributes to a better understanding of what might be measured with aperiodic neurophysiological activity.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this human neuroimaging and electrophysiology study, the authors aimed to characterize the effects of a period of visual deprivation in the sensitive period on excitatory and inhibitory balance in the visual cortex. They attempted to do so by comparing neurochemistry conditions ('eyes open', 'eyes closed') and resting state, and visually evoked EEG activity between ten congenital cataract patients with recovered sight (CC), and ten age-matched control participants (SC) with normal sight.

      First, they used magnetic resonance spectroscopy to measure in vivo neurochemistry from two locations, the primary location of interest in the visual cortex, and a control location in the frontal cortex. Such voxels are used to provide a control for the spatial specificity of any effects because the single-voxel MRS method provides a single sampling location. Using MR-visible proxies of excitatory and inhibitory neurotransmission, Glx and GABA+ respectively, the authors report no group effects in GABA+ or Glx, no difference in the functional conditions 'eyes closed' and 'eyes open'. They found an effect of the group in the ratio of Glx/GABA+ and no similar effect in the control voxel location. They then performed multiple exploratory correlations between MRS measures and visual acuity, and reported a weak positive correlation between the 'eyes open' condition and visual acuity in CC participants.

      The same participants then took part in an EEG experiment. The authors selected only two electrodes placed in the visual cortex for analysis and reported a group difference in an EEG index of neural activity, the aperiodic intercept, as well as the aperiodic slope, considered a proxy for cortical inhibition. They report an exploratory correlation between the aperiodic intercept and Glx in one out of three EEG conditions.

      The authors report the difference in E/I ratio, and interpret the lower E/I ratio as representing an adaptation to visual deprivation, which would have initially caused a higher E/I ratio. Although intriguing, the strength of evidence in support of this view is not strong. Amongst the limitations are the low sample size, a critical control cohort that could provide evidence for a higher E/I ratio in CC patients without recovered sight for example, and lower data quality in the control voxel.

      Strengths of study:

      How sensitive period experience shapes the developing brain is an enduring and important question in neuroscience. This question has been particularly difficult to investigate in humans. The authors recruited a small number of sight-recovered participants with bilateral congenital cataracts to investigate the effect of sensitive period deprivation on the balance of excitation and inhibition in the visual brain using measures of brain chemistry and brain electrophysiology. The research is novel, and the paper was interesting and well-written.

      Limitations:

      (1.1) Low sample size. Ten for CC and ten for SC, and a further two SC participants were rejected due to a lack of frontal control voxel data. The sample size limits the statistical power of the dataset and increases the likelihood of effect inflation.

      Applying strict criteria, we only included individuals who were born with no patterned vision in the CC group. The population of individuals who have remained untreated past infancy is small in India, despite a higher prevalence of childhood cataract than in Germany. Indeed, from the original 11 CC and 11 SC participants tested, one participant each from the CC and SC group had to be rejected, as their data had been corrupted, resulting in 10 participants in each group.

      It was a challenge to recruit participants from this rare group with no history of neurological diagnosis/intake of neuromodulatory medications, who were able and willing to undergo both MRS and EEG. For this study, data collection took more than 1.5 years.

      We safeguarded the validity of our results with two measures. First, we assessed not just MRS but also EEG measures of the E/I ratio. The latter allowed us to link our results to a larger population of CC individuals; that is, we replicated the results of a larger group of 38 individuals (Ossandón et al., 2023) in our sub-group.

      Second, we included a control voxel. As predicted, all group effects were restricted to the occipital voxel.

      (1.2) Lack of specific control cohort. The control cohort has normal vision. The control cohort is not specific enough to distinguish between people with sight loss due to different causes and patients with congenital cataracts with co-morbidities. Further data from more specific populations, such as patients whose cataracts have not been removed, with developmental cataracts, or congenitally blind participants, would greatly improve the interpretability of the main finding. The lack of a more specific control cohort is a major caveat that limits a conclusive interpretation of the results.

      The existing work on visual deprivation and neurochemical changes, as assessed with MRS, has been limited to permanent congenital blindness. In fact, most of the studies on permanent blindness included only congenitally blind or early blind humans (Coullon et al., 2015; Weaver et al., 2013), or, in separate studies, only late-blind individuals (Bernabeu et al., 2009). Thus, accordingly, we started with the most “extreme” visual deprivation model, sight recovery after congenital blindness. If we had not observed any group difference compared to normally sighted controls, investigating other groups might have been trivial. Based on our results, subsequent studies in late blind individuals, and then individuals with developmental cataracts, can be planned with clear hypotheses.

      (1.3) MRS data quality differences. Data quality in the control voxel appears worse than in the visual cortex voxel. The frontal cortex MRS spectrum shows far broader linewidth than the visual cortex (Supplementary Figures). Compared to the visual voxel, the frontal cortex voxel has less defined Glx and GABA+ peaks; lower GABA+ and Glx concentrations, lower NAA SNR values; lower NAA concentrations. If the data quality is a lot worse in the FC, then small effects may not be detectable.

      Worse data quality in the frontal than the visual cortex has been repeatedly observed in the MRS literature, attributable to magnetic field distortions (Juchem & Graaf, 2017) resulting from the proximity of the region to the sinuses (recent example: (Rideaux et al., 2022)). Nevertheless, we chose the frontal control region rather than a parietal voxel, given the potential neurochemical changes in multisensory regions of the parietal cortex due to blindness. Such reorganization would be less likely in frontal areas associated with higher cognitive functions. Further, prior MRS studies of the visual cortex have used the frontal cortex as a control region as well (Pitchaimuthu et al., 2017; Rideaux et al., 2022).

      In the present study, we checked that the frontal cortex datasets for Glx and GABA+ concentrations were of sufficient quality: the fit error was below 8.31% in both groups (Supplementary Material S3). For reference, Mikkelsen et al. reported a mean GABA+ fit error of 6.24 +/- 1.95% from a posterior cingulate cortex voxel across 8 GE scanners, using the Gannet pipeline. No absolute cutoffs have been proposed for fit errors. However, MRS studies in special populations (I/E ratio assessed in narcolepsy (Gao et al., 2024), GABA concentration assessed in Autism Spectrum Disorder (Maier et al., 2022)) have used frontal cortex data with a fit error of <10% to identify differences between cohorts (Gao et al., 2024; Pitchaimuthu et al., 2017). Based on the literature, MRS data from the frontal voxel of the present study would have been of sufficient quality to uncover group differences.

      In the revised manuscript, we will add the recently published MRS quality assessment form to the supplementary materials. Additionally, we would like to point to our a priori prediction of group differences for the visual cortex, but not for the frontal cortex voxel.

      (1.4) Because of the direction of the difference in E/I, the authors interpret their findings as representing signatures of sight improvement after surgery without further evidence, either within the study or from the literature. However, the literature suggests that plasticity and visual deprivation drive the E/I index up rather than down. Decreasing GABA+ is thought to facilitate experience-dependent remodelling. What evidence is there that cortical inhibition increases in response to a visual cortex that is over-sensitised due to congenital cataracts? Without further experimental or literature support this interpretation remains very speculative.

      Indeed, higher inhibition was not predicted, which we attempt to reconcile in our discussion section. We base our discussion mainly on the non-human animal literature, which has shown evidence of homeostatic changes after prolonged visual deprivation in the adult brain (Barnes et al., 2015). It is also interesting to note that after monocular deprivation in adult humans, resting GABA+ levels decreased in the visual cortex (Lunghi et al., 2015). Assuming that after delayed sight restoration, adult neuroplasticity mechanisms must be employed, these studies would predict a “balancing” of the increased excitatory drive following sight restoration by a commensurate increase in inhibition (Keck et al., 2017). Additionally, the EEG results of the present study allowed for speculation regarding the underlying neural mechanisms of an altered E/I ratio. The aperiodic EEG activity suggested higher spontaneous spiking (increased intercept) and increased inhibition (steeper aperiodic slope between 1-20 Hz) in CC vs SC individuals (Ossandón et al., 2023).

      In the revised manuscript, we will more clearly indicate that these speculations are based primarily on non-human animal work, due to the lack of human studies on the subject.
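      For readers less familiar with the aperiodic parameters discussed in this response, the sketch below shows one simplified way to obtain an intercept and a slope from a power spectrum: an ordinary least-squares line fitted in log-log space over 1-20 Hz. This is an illustrative assumption, not the fitting pipeline used in the study, and the function name `fit_aperiodic` and the synthetic spectrum are hypothetical. A "steeper" slope in this convention is a more negative value.

```python
import math

def fit_aperiodic(freqs, power, fmin=1.0, fmax=20.0):
    """Fit log10(power) = intercept + slope * log10(freq) by least squares.

    Simplified illustration only: no peak removal and no knee term.
    Returns (intercept, slope).
    """
    pts = [(math.log10(f), math.log10(p))
           for f, p in zip(freqs, power)
           if fmin <= f <= fmax and p > 0]
    n = len(pts)
    if n < 2:
        raise ValueError("need at least two points in the fitting range")
    mean_x = sum(x for x, _ in pts) / n
    mean_y = sum(y for _, y in pts) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pts)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pts)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical noiseless 1/f-like spectrum: power = 10**1.5 / f**1.2
freqs = [0.5 * i for i in range(1, 81)]           # 0.5 Hz to 40 Hz
power = [10 ** 1.5 / f ** 1.2 for f in freqs]
intercept, slope = fit_aperiodic(freqs, power)
print(round(intercept, 3), round(slope, 3))
```

      On this synthetic input the fit recovers the generating parameters, so a higher intercept corresponds to a broadband upward shift of the spectrum and a more negative slope to a faster fall-off of power with frequency.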

      (1.5) Heterogeneity in the patient group. Congenital cataract (CC) patients experienced a variety of duration of visual impairment and were of different ages. They presented with co-morbidities (absorbed lens, strabismus, nystagmus). Strabismus has been associated with abnormalities in GABAergic inhibition in the visual cortex. The possible interactions with residual vision and confounds of co-morbidities are not experimentally controlled for in the correlations, and not discussed.

      The goal of the present study was to assess whether we would observe changes in E/I ratio after restoring vision at all. We would not have included patients without nystagmus in the CC group of the present study, since it would have been unlikely that they experienced congenital patterned visual deprivation. Amongst diagnosticians, nystagmus or strabismus might not be considered genuine “comorbidities” that emerge in people with congenital cataracts. Rather, these are consequences of congenital visual deprivation, which we employed as diagnostic criteria. Similarly, absorbed lenses are clear signs that cataracts were congenital. As in other models of experience-dependent brain development (e.g. the extant literature on congenital permanent blindness, including anophthalmic individuals (Coullon et al., 2015; Weaver et al., 2013)), some uncertainty remains regarding whether the (remaining, in our case) abnormalities of the eye, or the blindness they caused, are the factors driving neural changes. In the case of people with reversed congenital cataracts, at least the retina is considered to be intact, as they would otherwise not receive cataract removal surgery.

      However, we consider it unlikely that strabismus caused the group differences, because the present study shows group differences in the Glx/GABA+ ratio at rest, regardless of eye opening or eye closure, for which strabismus would have caused distinct effects. By contrast, the link between GABA concentration and, for example, interocular suppression in strabismus has so far been documented during visual stimulation (Mukerji et al., 2022; Sengpiel et al., 2006), and differed in direction depending on the amblyopic vs. non-amblyopic eye. Further, one MRS study did not find group differences in GABA concentration between the visual cortices of 16 amblyopic individuals and sighted controls (Mukerji et al., 2022), supporting the view that the differences in Glx/GABA+ concentration which we observed were driven by congenital deprivation, and not by amblyopia-associated visual acuity or eye movement differences.

      In the revised manuscript, we will discuss the inclusion criteria in more detail, and the aforementioned reasons why our data remains interpretable.

      (1.6) Multiple exploratory correlations were performed to relate MRS measures to visual acuity (shown in Supplementary Materials), and only specific ones were shown in the main document. The authors describe the analysis as exploratory in the 'Methods' section. Furthermore, the correlation between visual acuity and E/I metric is weak, and not corrected for multiple comparisons. The results should be presented as preliminary, as no strong conclusions can be made from them. They can provide a hypothesis to test in a future study.

      In the revised manuscript, we will clearly indicate that the exploratory correlation analyses are reported to put forth hypotheses for future studies.
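      As an illustration of what a correction for multiple comparisons would involve if a future confirmatory study tested several correlations at once, the sketch below implements the Holm step-down adjustment. It is not an analysis performed in the present study, and the p-values used are hypothetical placeholders rather than results from the manuscript.

```python
def holm_adjust(pvalues):
    """Return Holm step-down adjusted p-values, in the original order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The k-th smallest p-value is multiplied by (m - k + 1), capped at 1.
        candidate = min(1.0, (m - rank) * pvalues[idx])
        # Enforce monotonicity so adjusted values never decrease with rank.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted

# Hypothetical p-values from three exploratory correlations.
example = holm_adjust([0.01, 0.04, 0.03])
print([round(p, 3) for p in example])
```

      A correlation would then be reported as significant only if its adjusted p-value falls below the chosen alpha level.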

      (1.7) P.16 Given the correlation of the aperiodic intercept with age ("Age negatively correlated with the aperiodic intercept across CC and SC individuals, that is, a flattening of the intercept was observed with age"), age needs to be controlled for in the correlation between neurochemistry and the aperiodic intercept. Glx has also been shown to negatively correlate with age.

      The correlation between chronological age and aperiodic intercept was observed across groups, but the correlation between Glx and the intercept of the aperiodic EEG activity was seen only in the CC group, even though the SC group was matched for age. Thus, such a correlation was very unlikely to be predominantly driven by an effect of chronological age.

      In the revised manuscript, we will add the linear regressions with age as a covariate included below, for the relationship between aperiodic intercept and Glx concentration in the CC group. 

      a. A linear regression was conducted within the CC group to predict the intercept during visual stimulation, based on age and visual cortex Glx concentration. The results of the regression analysis indicated that the model explained a significant proportion of the variance in the aperiodic intercept, R² = 0.82, F(2,7) = 16.1, p = 0.0024. Note that the coefficient for age was not significant, β = 0.007, t(7) = 0.82, p = 0.439. The regression coefficients and their respective statistics are presented in Author response table 1.

      Author response table 1.

      Regression Analysis Summary for Predicting Aperiodic Intercept (Visual Stimulation) in the CC group

      b. A linear regression was conducted to predict the intercept during eye opening at rest, based on age and visual cortex Glx concentration. The results of the regression analysis indicated that the model explained a significant proportion of the variance in the aperiodic intercept, R² = 0.842, F(2,7) = 18.6, p = 0.00159. Note that the coefficient for age was not significant, β = −0.005, t(7) = −0.90, p = 0.400. The regression coefficients and their respective statistics are presented in Author response table 2.

      Author response table 2.

      Regression Analysis Summary for Predicting Aperiodic Intercept (Eyes Open) in the CC group

      c. Given that the Glx coefficient is significant in both models and age does not significantly predict either outcome, it can be concluded that Glx predicts the aperiodic intercept independently of age.
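      The model-level statistics quoted in (a) and (b), which carry (2, 7) degrees of freedom, can be cross-checked against the reported R² values using the standard identity for the overall F statistic of a linear regression. The sketch below assumes n = 10 participants and k = 2 predictors (age and Glx), which is what (2, 7) degrees of freedom imply; the function name is hypothetical.

```python
def f_from_r2(r2, n, k):
    """Overall regression F statistic from R², sample size n, and k predictors.

    F = (R² / k) / ((1 - R²) / (n - k - 1)), with (k, n - k - 1) degrees of freedom.
    """
    if not 0.0 <= r2 < 1.0:
        raise ValueError("r2 must be in [0, 1)")
    df_resid = n - k - 1
    if df_resid <= 0:
        raise ValueError("need n > k + 1")
    return (r2 / k) / ((1.0 - r2) / df_resid)

# R² values as reported above; n and k are assumptions stated in the lead-in.
f_visual_stim = f_from_r2(0.82, n=10, k=2)    # about 15.9
f_eyes_open = f_from_r2(0.842, n=10, k=2)     # about 18.7
print(round(f_visual_stim, 1), round(f_eyes_open, 1))
```

      Both values are close to the reported 16.1 and 18.6; the small discrepancies are what one would expect from R² having been rounded before this back-calculation.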

      (1.8) Multiple exploratory correlations were performed to relate MRS to EEG measures (shown in Supplementary Materials), and only specific ones were shown in the main document. Given the multiple measures from the MRS, the correlations with the EEG measures were exploratory, as stated in the text, p.16, and in Figure 4. Yet the introduction said that there was a prior hypothesis "We further hypothesized that neurotransmitter changes would relate to changes in the slope and intercept of the EEG aperiodic activity in the same subjects." It would be great if the text could be revised for consistency and the analysis described as exploratory.

      In the revised manuscript, we will improve the phrasing. We consider the correlation analyses as exploratory due to our sample size and the absence of prior work. However, we did hypothesize that both MRS and EEG markers would concurrently be altered in CC vs SC individuals.

      (1.9) The analysis for the EEG needs to take more advantage of the available data. As far as I understand, only two electrodes were used, yet far more were available as seen in their previous study (Ossandon et al., 2023). The spatial specificity is not established. The authors could use the frontal cortex electrode (FP1, FP2) signals as a control for spatial specificity in the group effects, or even better, all available electrodes and correct for multiple comparisons. Furthermore, they could use the aperiodic intercept vs Glx in SC to evaluate the specificity of the correlation to CC.

      The aperiodic intercept and slope did not differ between CC and SC individuals for Fp1 and Fp2, suggesting the spatial specificity of the results. In the revised manuscript, we will add this analysis to the supplementary material.

      Author response image 1.

      Aperiodic intercept (top) and slope (bottom) for congenital cataract-reversal (CC, red) and age-matched normally sighted control (SC, blue) individuals. Distributions of these parameters are displayed as violin plots for three conditions; at rest with eyes closed (EC), at rest with eyes open (EO) and during visual stimulation (LU). Aperiodic parameters were calculated across electrodes Fp1 and Fp2. Solid black lines indicate mean values, dotted black lines indicate median values. Coloured lines connect values of individual participants across conditions.

      Further, Glx concentration in the visual cortex did not correlate with the aperiodic intercept in the SC group (Figure 4), suggesting that this relationship was indeed specific to the CC group.

      The data from all electrodes has been analyzed and published in other studies as well (Pant et al., 2023; Ossandón et al., 2023).

      Reviewer #2 (Public Review):

      Summary:

      The manuscript reports non-invasive measures of activity and neurochemical profiles of the visual cortex in congenitally blind patients who recovered vision through the surgical removal of bilateral dense cataracts. The declared aim of the study is to find out how restoring visual function after several months or years of complete blindness impacts the balance between excitation and inhibition in the visual cortex.

      Strengths:

      The findings are undoubtedly useful for the community, as they contribute towards characterising the many ways this special population differs from normally sighted individuals. The combination of MRS and EEG measures is a promising strategy to estimate a fundamental physiological parameter - the balance between excitation and inhibition in the visual cortex, which animal studies show to be heavily dependent upon early visual experience. Thus, the reported results pave the way for further studies, which may use a similar approach to evaluate more patients and control groups.

      Weaknesses:

      (2.1) The main issue is the lack of an appropriate comparison group or condition to delineate the effect of sight recovery (as opposed to the effect of congenital blindness). Few previous studies suggested an increased excitation/Inhibition ratio in the visual cortex of congenitally blind patients; the present study reports a decreased E/I ratio instead. The authors claim that this implies a change of E/I ratio following sight recovery. However, supporting this claim would require showing a shift of E/I after vs. before the sight-recovery surgery, or at least it would require comparing patients who did and did not undergo the sight-recovery surgery (as common in the field).

      Longitudinal studies would indeed be the best way to test the hypothesis that the lower E/I ratio in the CC group observed by the present study is a consequence of sight restoration. However, longitudinal studies involving neuroimaging are an effortful challenge, particularly in research conducted outside of major developed countries and dedicated neuroimaging research facilities. Crucially, however, had CC and SC individuals, as well as permanently congenitally blind vs SC individuals (Coullon et al., 2015; Weaver et al., 2013), not differed on any neurochemical markers, such a longitudinal study might have been trivial. Thus, in order to justify and better tailor longitudinal studies, cross-sectional studies are an initial step.

      (2.2) MR Spectroscopy shows a reduced GLX/GABA ratio in patients vs. sighted controls; however, this finding remains rather isolated, not corroborated by other observations. The difference between patients and controls only emerges for the GLX/GABA ratio, but there is no accompanying difference in either the GLX or the GABA concentrations. There is an attempt to relate the MRS data with acuity measurements and electrophysiological indices, but the explorative correlational analyses do not help to build a coherent picture. A bland correlation between GLX/GABA and visual impairment is reported, but this is specific to the patients' group (N=10) and would not hold across groups (the correlation is positive, predicting the lowest GLX/GABA ratio values for the sighted controls - the opposite of what is found). There is also a strong correlation between GLX concentrations and the EEG power at the lowest temporal frequencies. Although this relation is intriguing, it only holds for a very specific combination of parameters (of the many tested): only with eyes open, only in the patient group.

      We interpret these findings differently, that is, in the context of experiments from non-human animals and the larger MRS literature.

      Homeostatic control of E/I balance assumes that the ratio of excitation (reflected here by Glx) and inhibition (reflected here by GABA+) is regulated. Like prior work (Gao et al., 2024, 2024; Narayan et al., 2022; Perica et al., 2022; Steel et al., 2020; Takado et al., 2022; Takei et al., 2016), we assumed that the ratio of Glx/GABA+ is indicative of E/I balance rather than solely the individual neurotransmitter levels. One of the motivations for assessing the ratio vs the absolute concentration is that as per the underlying E/I balance hypothesis, a change in excitation would cause a concomitant change in inhibition, and vice versa, which has been shown in non-human animal work (Fang et al., 2021; Haider et al., 2006; Tao & Poo, 2005) and modeling research (Vreeswijk & Sompolinsky, 1996; Wu et al., 2022). Importantly, our interpretation of the lower E/I ratio is not just from the Glx/GABA+ ratio, but additionally, based on the steeper EEG aperiodic slope (1-20 Hz).  

      As noted in the discussion section and in response 1.4, we did not expect to see a lower Glx/GABA+ ratio in CC individuals. We discuss the possible reasons for the direction of the correlation with visual acuity and aperiodic offset during passive visual stimulation, and offer interpretations and (testable) hypotheses.

      We interpret the direction of the Glx/GABA+ correlation with visual acuity to imply that the patients who show the strongest (compensatory) balancing of the consequences of congenital blindness (hyperexcitation), in light of visual stimulation, are those who recover best. Note that the sighted control group was selected based on their “normal” vision. Thus, clinical visual acuity measures are not expected to vary sufficiently, nor to have the resolution, to show strong correlations with neurophysiological measures. By contrast, the CC group comprised patients who varied widely in visual outcomes, and was thus ideal for investigating such correlations.

      The same holds for the correlation between Glx and the aperiodic intercept. Previous work has suggested that the intercept of the aperiodic activity is associated with broadband spiking activity in neural circuits (Manning et al., 2009). Thus, an atypical increase of spiking activity during visual stimulation, as indirectly suggested by “old” non-human primate work on visual deprivation (Hyvärinen et al., 1981), might drive a correlation not observed in healthy populations.

      In the revised manuscript, we will more clearly indicate in the discussion that these are possible post-hoc interpretations. We argue that given the lack of such studies in humans, it is all the more important that extant data be presented completely, even if the direction of the effects is not as expected.

      (2.3) For these reasons, the reported findings do not allow us to draw firm conclusions on the relation between EEG parameters and E/I ratio or on the impact of early (vs. late) visual experience on the excitation/inhibition ratio of the human visual cortex.

      Indeed, the correlations we have tested between the E/I ratio and EEG parameters were exploratory, and have been reported as such. The goal of our study was not to compare the effects of early vs. late visual experience. The goal was to study whether early visual experience is necessary for a typical E/I ratio in visual neural circuits. We provided clear evidence in favor of this hypothesis. Thus, the present results suggest the necessity of investigating the effects of late visual deprivation. In fact, such research is missing in permanent blindness as well.

      Reviewer #3 (Public Review):

      This manuscript examines the impact of congenital visual deprivation on the excitatory/inhibitory (E/I) ratio in the visual cortex using Magnetic Resonance Spectroscopy (MRS) and electroencephalography (EEG) in individuals whose sight was restored. Ten individuals with reversed congenital cataracts were compared to age-matched, normally sighted controls, assessing the cortical E/I balance and its interrelationship to visual acuity. The study reveals that the Glx/GABA ratio in the visual cortex and the intercept and aperiodic signal are significantly altered in those with a history of early visual deprivation, suggesting persistent neurophysiological changes despite visual restoration.

      My expertise is in EEG (particularly in the decomposition of periodic and aperiodic activity) and statistical methods. I have several major concerns in terms of methodological and statistical approaches along with the (over)interpretation of the results. These major concerns are detailed below.

      (3.1) Variability in visual deprivation:

      - The document states a large variability in the duration of visual deprivation (probably also the age at restoration), with significant implications for the sensitivity period's impact on visual circuit development. The variability and its potential effects on the outcomes need thorough exploration and discussion.

      We work with a rare, unique patient population, which makes it difficult to systematically assess the effects of different visual histories while maintaining stringent inclusion criteria such as complete patterned visual deprivation at birth. Regardless, we considered the large variance in age at surgery and time since surgery as supportive of our interpretation: group differences were found despite the large variance in duration of visual deprivation. Moreover, the existing variance was used to explore possible associations between behavior and neural measures, as well as neurochemical and EEG measures.

      In the revised manuscript, we will detail the advantages and disadvantages of our CC sample, with respect to duration of congenital visual deprivation.

      (3.2) Sample size:

      - The small sample size is a major concern as it may not provide sufficient power to detect subtle effects and/or may overestimate significant effects, which then tend not to generalize to new data. This is one of the biggest drivers of the replication crisis in neuroscience.

      We address the small sample size in our discussion, and make clear that small sample sizes were due to the nature of investigations in special populations. It is worth noting that our EEG results fully align with those of a larger sample of CC individuals (Ossandón et al., 2023), giving us confidence in their validity and reproducibility. Moreover, our MRS results and correlations of those with EEG parameters were spatially specific to occipital cortex measures, as predicted.

      The main problem with the correlation analyses between MRS and EEG measures is that the sample size is simply too small to conduct such an analysis. Moreover, it is unclear from the methods section that this analysis was only conducted in the patient group (which the reviewer assumed from the plots), and not explained why this was done only in the patient group. I would highly recommend removing these correlation analyses.

      We marked the correlation analyses as exploratory; note that we do not base most of our discussion on the results of these analyses. As indicated by Reviewer 1, reporting them allows for deriving more precise hypotheses for future studies. It has to be noted that we investigate an extremely rare population, tested outside of major developed economies and dedicated neuroimaging research facilities. In addition to being a rare patient group, these individuals come from poor communities. Therefore, we consider it justified to report these correlations as exploratory, providing direction for future research.

      (3.3) Statistical concerns:

      - The statistical analyses, particularly the correlations drawn from a small sample, may not provide reliable estimates (see https://www.sciencedirect.com/science/article/pii/S0092656613000858, which clearly describes this problem).

      It would undoubtedly be better to have a larger sample size. We nonetheless think it is of value to the research community to publish this dataset, since 10 multimodal data sets from a carefully diagnosed, rare population, representing a human model for the effects of early experience on brain development, are quite a lot. Sample sizes in prior neuroimaging studies in transient blindness have most often ranged from n = 1 to n = 10. They nevertheless provided valuable direction for future research, and integration of results across multiple studies provides scientific insights.

      Identifying possible group differences was the goal of our study, with the correlations being an exploratory analysis, which we have clearly indicated in the methods, results and discussion.

      - Statistical analyses for the MRS: The authors should consider some additional permutation statistics, which are more suitable for small sample sizes. The current statistical model (2x2) design ANOVA is not ideal for such small sample sizes. Moreover, it is unclear why the condition (EO & EC) was chosen as a predictor and not the brain region (visual & frontal) or neurochemicals. Finally, the authors did not provide any information on the alpha level nor any information on correction for multiple comparisons (in the methods section). Finally, even if the groups are matched w.r.t. age, the time between surgery and measurement, the duration of visual deprivation, (and sex?), these should be included as covariates as it has been shown that these are highly related to the measurements of interest (especially for the EEG measurements) and the age range of the current study is large.

      In our ANOVA models, the neurochemicals were the outcome variables, and the conditions were chosen as predictors based on prior work suggesting that Glx/GABA+ might vary with eye closure (Kurcyus et al., 2018). The study was designed based on a hypothesis of group differences localized to the occipital cortex, due to visual deprivation. The frontal cortex voxel was chosen to indicate whether these differences were spatially specific. Therefore, we conducted separate ANOVAs based on this study design.

      In the revised manuscript, we will add permutation analyses for our outcomes, as well as multiple regression models investigating whether the variance in visual history might have driven these results. Note that in the supplementary materials (S6, S7), we have reported the correlations between visual history metrics and MRS/EEG outcomes.
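
      As a rough illustration of what a permutation analysis for a small-sample group comparison can look like, the sketch below shuffles group labels and compares the observed mean difference against the resulting null distribution. It is a deliberately simplified two-group test, not the 2x2 design of the manuscript; the function name `permutation_test`, the number of permutations, and the example values are assumptions made for illustration only.

```python
import random


def permutation_test(group_a, group_b, n_perm=5000, seed=0):
    """Two-sided permutation p-value for a difference in group means.

    Hypothetical helper for illustration; not the analysis used in the study.
    """
    rng = random.Random(seed)

    def mean(xs):
        return sum(xs) / len(xs)

    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    count = 0
    for _ in range(n_perm):
        # Randomly reassign group labels by shuffling the pooled values.
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed - 1e-12:  # tolerance for floating-point ties
            count += 1
    # The +1 terms count the observed labeling itself as one permutation.
    return (count + 1) / (n_perm + 1)


# Made-up values for illustration only; these are not data from the study.
p_separated = permutation_test([10, 11, 12, 13, 14], [1, 2, 3, 4, 5])
p_identical = permutation_test([1, 2, 3], [1, 2, 3])
```

      With groups this small, the number of distinct label assignments is limited, which bounds the smallest attainable p-value; an exact enumeration of all assignments would be an alternative to random shuffling.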

      The alpha level for the ANOVA models, as specified in the methods section, was 0.05. The alpha level for the exploratory analyses reported in the main manuscript was 0.008 (0.05/6, rounded), following Bonferroni correction for six comparisons, as also specified in the methods. Note that we report corrected p-values multiplied by 6, so that they can be compared against the alpha level of 0.05 that most readers assume (see the response regarding large p-values).

      We used a control group matched for age and sex. Moreover, the controls were recruited and tested in the same institutes, using the same setup. We feel that we followed the gold standards for recruiting a healthy control group for a patient group.

      - EEG statistical analyses: The same critique as for the MRS statistical analyses applies to the EEG analysis. In addition: was the 2x3 ANOVA conducted for EO and EC independently? This seems to be inconsistent with the approach in the MRS analyses, in which the authors chose EO & EC as predictors in their 2x2 ANOVA.

      The 2x3 ANOVA was not conducted independently for the eyes-open and eyes-closed conditions. The ANOVA on the EEG metrics was 2x3 because it had group (CC, SC) and condition (eyes open (EO), eyes closed (EC) and visual stimulation (LU)) as predictors.

      - Figure 4: The authors report a p-value of >0.999 with a correlation coefficient of -0.42 with a sample size of 10 subjects. This can't be correct (it should be around: p = 0.22). All statistical analyses should be checked.

      As specified in the methods and figure legend, the reported p values in Figure 4 have been corrected using the Bonferroni correction, and therefore multiplied by the number of comparisons, leading to the seemingly large values.
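
      The arithmetic behind this point can be checked with a short sketch. It assumes a two-sided test of a Pearson correlation via the t distribution with n - 2 degrees of freedom, and a Bonferroni adjustment that multiplies the p-value by the number of comparisons and caps it at 1. The function names are hypothetical, and the t-distribution tail is integrated numerically so that only the standard library is needed.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def two_sided_p_from_r(r, n, steps=20000):
    """Two-sided p-value for a Pearson correlation r (|r| < 1) from n pairs."""
    df = n - 2
    t = abs(r) * math.sqrt(df / (1 - r * r))
    # Simpson's rule (steps must be even) for the integral of the density on [0, t].
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_mass = 2 * total * h / 3  # P(-t < T < t), by symmetry
    return 1 - central_mass


def bonferroni(p, n_comparisons):
    """Bonferroni-adjusted p-value, capped at 1."""
    return min(1.0, p * n_comparisons)


p_raw = two_sided_p_from_r(-0.42, 10)  # uncorrected, roughly 0.23
p_adj = bonferroni(p_raw, 6)           # exceeds 1 before capping
```

      Under these assumptions the uncorrected value is close to the reviewer's estimate, and multiplying by 6 pushes it past 1, which is consistent with a reported value of >0.999.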

      Additionally, to check all statistical analyses, we put the manuscript through an independent Statistics Check (Nuijten & Polanin, 2020) (https://michelenuijten.shinyapps.io/statcheck-web/) and will upload the consistency report with the revised supplementary material.

      - Figure 2c. Eyes closed condition: The highest score of the Glx/GABA ratio seems to be ~3.6. In subplot 2a, there seem to be 3 subjects that show a Glx/GABA ratio score > 3.6. How can this be explained? There is also a discrepancy for the eyes-closed condition.

      The three subjects that show the Glx/GABA+ ratio > 3.6 in subplot 2a are in the SC group, whereas the correlations plotted in figure 2c are only for the CC group, where the highest score is indeed ~3.6.

      (3.4) Interpretation of aperiodic signal:

      - Several recent papers demonstrated that the aperiodic signal measured in EEG or ECoG is related to various important aspects such as age, skull thickness, electrode impedance, as well as cognition. Thus, currently, very little is known about the underlying effects which influence the aperiodic intercept and slope. The entire interpretation of the aperiodic slope as a proxy for E/I is based on a computational model and simulation (as described in the Gao et al. paper).

      Apart from the modeling work of Gao et al., multiple papers that we have also cited used ECoG, EEG and MEG and showed concomitant changes in aperiodic activity with pharmacological manipulation of the E/I ratio (Colombo et al., 2019; Molina et al., 2020; Muthukumaraswamy & Liley, 2018). Further, several prior studies have interpreted changes in the aperiodic slope as reflective of changes in the E/I ratio, including studies of developmental groups (Favaro et al., 2023; Hill et al., 2022; McSweeney et al., 2023; Schaworonkow & Voytek, 2021) as well as patient groups (Molina et al., 2020; Ostlund et al., 2021).

      In the revised manuscript, we will cite those studies not already included in the introduction.

      - Especially the aperiodic intercept is a very sensitive measure to many influences (e.g. skull thickness, electrode impedance...). As crucial results (correlation aperiodic intercept and MRS measures) are facing this problem, this needs to be reevaluated. It is safer to make statements on the aperiodic slope than intercept. In theory, some of the potentially confounding measures are available to the authors (e.g. skull thickness can be computed from T1w images; electrode impedances are usually acquired alongside the EEG data) and could be therefore controlled.

      All electrophysiological measures indeed depend on parameters such as skull thickness and electrode impedance. As in the extant literature using neurophysiological measures to compare brain function between patient and control groups, we used a control group matched for age and sex, recruited in the same region, tested with the same devices, and analyzed with the same analysis pipeline. For example, impedance was kept below 10 kOhm for all subjects. There is no evidence available suggesting that congenital cataracts are associated with changes in skull thickness that would cause the observed pattern of group results. Moreover, we cannot think of how any of the exploratory correlations between neurophysiological measures and MRS measures could be accounted for by a difference in, for example, skull thickness.

      - The authors wrote: "Higher frequencies (such as 20-40 Hz) have been predominantly associated with local circuit activity and feedforward signaling (Bastos et al., 2018; Van Kerkoerle et al., 2014); the increased 20-40 Hz slope may therefore signal increased spontaneous spiking activity in local networks. We speculate that the steeper slope of the aperiodic activity for the lower frequency range (1-20 Hz) in CC individuals reflects the concomitant increase in inhibition." The authors confuse the interpretation of periodic and aperiodic signals. This section refers to the interpretation of the periodic signal (higher frequencies). This interpretation cannot simply be translated to the aperiodic signal (slope).

      Prior work has not always separated the aperiodic and periodic components, making it unclear what might have driven these effects in our data. The interpretation of the higher frequency range was intended to contrast with the interpretation of the lower frequency range, in order to speculate as to why the two aperiodic fits might go in differing directions. We will clarify our interpretation in the revised manuscript. Note that Ossandón et al. reported highly similar results (group differences for CC individuals and for permanently congenitally blind humans) for the aperiodic activity between 20-40 Hz and oscillatory activity in the gamma range. We will allude to these findings in the revised manuscript.

      - The authors further wrote: We used the slope of the aperiodic (1/f) component of the EEG spectrum as an estimate of E/I ratio (Gao et al., 2017; Medel et al., 2020; Muthukumaraswamy & Liley, 2018). This is a highly speculative interpretation with very little empirical evidence. These papers were conducted with ECoG data (mostly in animals) and mostly under anesthesia. Thus, these studies only allow an indirect interpretation by what the 1/f slope in EEG measurements is actually influenced.

      Note that Muthukumaraswamy & Liley (2018) used different types of pharmacological manipulations and analyzed periodic and aperiodic MEG activity in addition to monkey ECoG. Medel et al. (2020; now published as Medel et al., 2023) compared EEG activity in addition to ECoG data after propofol administration. The interpretation of our results is in line with a number of recent studies using EEG in developing (Hill et al., 2022; Schaworonkow & Voytek, 2021) and special populations. As mentioned above, several prior studies have used the slope of the 1/f component/aperiodic activity as an indirect measure of the E/I ratio (Favaro et al., 2023; Hill et al., 2022; McSweeney et al., 2023; Molina et al., 2020; Ostlund et al., 2021; Schaworonkow & Voytek, 2021), including studies using scalp-recorded EEG. We will make clearer in the introduction of the revised manuscript that this metric is indirect.

      While a full understanding of aperiodic activity remains to be established, some convergent ideas have emerged. We think that our results contribute to this enterprise, since our study is, to the best of our knowledge, the first to assess both MRS-measured neurotransmitter levels and EEG aperiodic activity.

      (3.5) Problems with EEG preprocessing and analysis:

      - It seems that the authors did not identify bad channels nor address the line noise issue (even a problem if a low pass filter of below-the-line noise was applied).

      As pointed out in the methods and Figure 1, we only analyzed data from two channels, O1 and O2, neither of which were rejected for any participant. Channel rejection was performed for the larger dataset, published elsewhere (Ossandón et al., 2023; Pant et al., 2023).

      In both published works, we did not consider frequency ranges above 40 Hz to avoid any possible contamination with line noise. Here, we focused on activity between 0 and 20 Hz, definitely excluding line noise contaminations. The low pass filter (FIR, 1-45 Hz) guaranteed that any spill-over effects of line noise would be restricted to frequencies just below the upper cutoff frequency.

      Additionally, a prior version of the analysis used the cleanline.m function to remove line noise before filtering, and the group differences remained stable. We will report this analysis in the supplementary material of the revised manuscript. Further, both groups were measured in the same lab, making line noise as an account for the observed group effects highly unlikely. Finally, any of the exploratory MRS-EEG correlations would be hard to explain if the EEG parameters were contaminated with line noise.

      - What was the percentage of segments that needed to be rejected due to the 120μV criteria? This should be reported specifically for EO & EC and controls and patients.

      The mean percentage of 1-second segments rejected for each resting-state condition is given below. The mean percentage of 6.25-second segments rejected in each group for the visual stimulation condition is also included. These values will be added to the revised manuscript:

      Author response table 3.
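
      For readers who want to reproduce this kind of summary, a minimal sketch of how a rejection percentage can be computed is given below. It assumes that a segment is rejected when any sample exceeds an absolute amplitude of 120 µV; the exact form of the criterion, the function name, and the example values are assumptions for illustration and are not taken from the study.

```python
def percent_rejected(epochs, threshold_uv=120.0):
    """Percentage of epochs containing at least one sample with |amplitude| > threshold.

    `epochs` is a list of epochs, each a list of amplitudes in microvolts.
    The absolute-amplitude rule is an assumed reading of the 120 µV criterion.
    """
    rejected = sum(1 for epoch in epochs if max(abs(v) for v in epoch) > threshold_uv)
    return 100.0 * rejected / len(epochs)


# Made-up amplitudes for illustration only; these are not data from the study.
example_epochs = [
    [5.0, -30.0, 12.0],    # kept
    [40.0, 130.0, -10.0],  # rejected: 130 > 120
    [-125.0, 3.0, 8.0],    # rejected: |-125| > 120
    [60.0, -60.0, 0.0],    # kept
]
pct = percent_rejected(example_epochs)
```

      Computing this per participant and condition, and then averaging within each group, would give values of the kind summarized in the table.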

      - The authors downsampled the data to 60 Hz "to match the stimulation rate". What is the intention of this? Because the subsequent spectral analyses are conflated by this choice (see Nyquist theorem).

      These data were collected as part of a study designed to evoke alpha activity with visual white noise, which varied in luminance with equal power at all frequencies from 1-60 Hz, restricted by the refresh rate of the monitor on which stimuli were presented (Pant et al., 2023). This paradigm and method were developed by VanRullen and colleagues (Schwenk et al., 2020; Vanrullen & MacDonald, 2012); the analysis requires that the presented stimulation and the EEG data share the same sampling rate. The downsampling function used here automatically applies an anti-aliasing filter (EEGLAB 2019).
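
      The relation between sampling rate and the analyzable frequency range that the reviewer refers to can be made explicit with a small sketch. The helper names below are hypothetical; the only fact used is that a signal sampled at a given rate can represent frequencies up to half that rate (the Nyquist frequency).

```python
def nyquist_hz(sampling_rate_hz):
    """Highest frequency representable at the given sampling rate."""
    return sampling_rate_hz / 2.0


def band_is_resolvable(band_hz, sampling_rate_hz):
    """True if the whole band lies strictly below the Nyquist frequency."""
    low, high = band_hz
    return 0 <= low < high < nyquist_hz(sampling_rate_hz)


# At a 60 Hz sampling rate the Nyquist frequency is 30 Hz.
fit_band_ok = band_is_resolvable((1.0, 20.0), 60.0)    # the 1-20 Hz fit range
wide_band_ok = band_is_resolvable((20.0, 40.0), 60.0)  # extends above 30 Hz
```

      Under this check, the 1-20 Hz range lies within what data sampled at 60 Hz can represent, whereas a band reaching above 30 Hz would require a higher sampling rate.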

      - "Subsequently, baseline removal was conducted by subtracting the mean activity across the length of an epoch from every data point." The actual baseline time segment should be specified.

      The time segment was the length of the epoch, that is, 1 second for the resting state conditions and 6.25 seconds for the visual stimulation conditions. This will be explicitly stated in the revised manuscript.
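
      The baseline step described here amounts to removing each epoch's own mean. A minimal sketch follows; the function name and the example values are illustrative assumptions.

```python
def remove_epoch_mean(epoch):
    """Subtract the mean across the whole epoch from every data point."""
    epoch_mean = sum(epoch) / len(epoch)
    return [value - epoch_mean for value in epoch]


# Made-up samples for illustration only.
epoch = [2.0, 4.0, 6.0, 8.0]
corrected = remove_epoch_mean(epoch)
```

      After this step every epoch has zero mean, regardless of whether it is a 1-second resting-state epoch or a 6.25-second visual stimulation epoch.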

      - "We excluded the alpha range (8-14 Hz) for this fit to avoid biasing the results due to documented differences in alpha activity between CC and SC individuals (Bottari et al., 2016; Ossandón et al., 2023; Pant et al., 2023)." This does not really make sense, as the FOOOF algorithm first fits the 1/f slope, for which the alpha activity is not relevant.

      We did not use the FOOOF algorithm/toolbox in this manuscript. As stated in the methods, we used a 1/f fit to the 1-20 Hz spectrum in the log-log space, and subtracted this fit from the original spectrum to obtain the corrected spectrum. Given the pronounced difference in alpha power between groups (Bottari et al., 2016; Ossandón et al., 2023; Pant et al., 2023), we were concerned it might drive differences in the exponent values.  Our analysis pipeline had been adapted from previous publications of our group and other labs (Ossandón et al., 2023; Voytek et al., 2015; Waschke et al., 2017).

      We have conducted the analysis with and without the exclusion of the alpha range, as well as using the FOOOF toolbox both in the 1-20 Hz and 20-40 Hz ranges (Ossandón et al., 2023). The findings of a steeper slope in the 1-20 Hz range as well as lower alpha power in CC vs SC individuals remained stable. In Ossandón et al., the comparison between the piecewise fits and FOOOF fits led the authors to use the former as it outperformed the FOOOF algorithm for their data.
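
      The fitting procedure described in this response can be sketched as an ordinary least-squares line in log-log space. The sketch below assumes base-10 logarithms, inclusive band edges, and a plain least-squares fit; the function names and the synthetic spectrum are hypothetical and only illustrate the general technique, not the exact pipeline used in the manuscript.

```python
import math


def fit_aperiodic(freqs, power, fmin=1.0, fmax=20.0, exclude=(8.0, 14.0)):
    """Least-squares line in log10-log10 space over [fmin, fmax], skipping `exclude`.

    Returns (slope, intercept) of log10(power) = intercept + slope * log10(freq).
    """
    xs, ys = [], []
    for f, p in zip(freqs, power):
        in_fit_range = fmin <= f <= fmax
        in_excluded = exclude[0] <= f <= exclude[1]
        if in_fit_range and not in_excluded:
            xs.append(math.log10(f))
            ys.append(math.log10(p))
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def corrected_spectrum(freqs, power, slope, intercept):
    """Log-power remaining after subtracting the aperiodic fit."""
    return [math.log10(p) - (intercept + slope * math.log10(f))
            for f, p in zip(freqs, power)]


# Synthetic spectrum: 1/f^1.5 background with an artificial bump at 8-14 Hz.
freqs = [float(f) for f in range(1, 21)]
power = [10.0 * f ** -1.5 * (3.0 if 8.0 <= f <= 14.0 else 1.0) for f in freqs]
slope, intercept = fit_aperiodic(freqs, power)
residual = corrected_spectrum(freqs, power, slope, intercept)
```

      Because the 8-14 Hz points are left out of the fit, the synthetic bump does not bias the estimated slope and instead appears in the corrected spectrum, which is the rationale given above for excluding the alpha range.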

      - The model fits of the 1/f fitting for EO, EC, and both participant groups should be reported.

      In Figure 3 of the manuscript, we depicted the mean spectra and 1/f fits for each group. We will add the fit quality metrics and show individual subjects’ fits in the revised manuscript.

      (3.6) Validity of GABA measurements and results:

      - According to a newer study by the authors of the Gannet toolbox (https://analyticalsciencejournals.onlinelibrary.wiley.com/doi/abs/10.1002/nbm.5076), the reliability and reproducibility of the gamma-aminobutyric acid (GABA) measurement can vary significantly depending on acquisition and modeling parameters. Thus, did the authors address these challenges?

      We took care of data quality while acquiring MRS data by ensuring appropriate voxel placement and linewidth prior to scanning. Acquisition as well as modeling parameters were constant for both groups, so they cannot have driven group differences.

      The linked article compares the reproducibility of GABA measurement using Osprey, which was released in 2020 and uses linear combination modeling to fit the peak as opposed to Gannet’s simple peak fitting (Hupfeld et al., 2024). The study finds better test-retest reliability for Osprey compared to Gannet’s method.

      As the present work was conceptualized in 2018, we used Gannet 3.0, which was the state-of-the-art edited spectral analysis toolbox at the time, and still is widely used. In the revised manuscript, we will include a supplementary section reanalyzing the main findings with Osprey.

      - Furthermore, the authors wrote: "We confirmed the within-subject stability of metabolite quantification by testing a subset of the sighted controls (n=6) 2-4 weeks apart." Looking at the supplementary Figure 5 (which would be better plotted as ICC or Bland-Altman plots), the within-subject stability compared to between-subject variability seems not to be great. Furthermore, I don't think such a small sample size qualifies for a rigorous assessment of stability.

      Indeed, we did not intend to provide a rigorous assessment of within-subject stability. Rather, we aimed to confirm that data quality/concentration ratios did not systematically differ between the same subjects tested longitudinally; driven, for example, by scanner heating or time of day. As with the phantom testing, we attempted to give readers an idea of the quality of the data, as they were collected from a primarily clinical rather than a research site.

      In the revised manuscript we will remove the statement regarding stability, and add the Bland-Altman plot.
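
      For reference, the quantities shown in a Bland-Altman plot can be computed as in the sketch below: the per-subject mean of the two sessions (x-axis), the per-subject difference (y-axis), the mean difference (bias), and the conventional 95% limits of agreement at bias ± 1.96 standard deviations of the differences. The function name and the example values are assumptions for illustration and are not data from the study.

```python
import statistics


def bland_altman(session_1, session_2):
    """Return per-subject means, differences, bias, and 95% limits of agreement."""
    diffs = [a - b for a, b in zip(session_1, session_2)]
    means = [(a + b) / 2 for a, b in zip(session_1, session_2)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)  # sample standard deviation of the differences
    limits = (bias - 1.96 * sd, bias + 1.96 * sd)
    return means, diffs, bias, limits


# Made-up ratios for six hypothetical subjects measured twice.
first = [3.0, 3.2, 2.8, 3.5, 3.1, 2.9]
second = [3.1, 3.0, 2.9, 3.4, 3.3, 2.8]
means, diffs, bias, (lower, upper) = bland_altman(first, second)
```

      With only six retest subjects the limits of agreement are themselves imprecise, which is in line with the decision not to present this as a rigorous stability assessment.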

      - "Why might an enhanced inhibitory drive, as indicated by the lower Glx/GABA ratio" Is this interpretation really warranted, as the results of the group differences in the Glx/GABA ratio seem to be rather driven by a decreased Glx concentration in CC rather than an increased GABA (see Figure 2).

      We used the Glx/GABA+ ratio as a measure, rather than individual Glx or GABA+ concentration, which did not significantly differ between groups. As detailed in Response 2.2, we think this metric aligns better with an underlying E/I balance hypothesis and has been used in many previous studies (Gao et al., 2024; Liu et al., 2015; Narayan et al., 2022; Perica et al., 2022).

      Our interpretation of an enhanced inhibitory drive additionally comes from the combination of aperiodic EEG (1-20 Hz) and MRS measures, which, when considered together, are consistent with a decreased E/I ratio.

      In the revised manuscript, we will rephrase this sentence accordingly. 

      - Glx concentration predicted the aperiodic intercept in CC individuals' visual cortices during ambient and flickering visual stimulation. Why specifically investigate the Glx concentration, when the paper is about E/I ratio?

      As stated in the methods, we exploratorily assessed the relationship between all MRS parameters (Glx, GABA+ and Glx/GABA+ ratio) with the aperiodic parameters (slope, offset), and corrected for multiple comparisons accordingly. We think this is a worthwhile analysis considering the rarity of the dataset/population (see 1.2, 1.6, 2.1 and reviewer 1’s comments about future hypotheses). We only report the Glx – aperiodic intercept correlation in the main manuscript as it survived correction for multiple comparisons.

      (3.7) Interpretation of the correlation between MRS measurements and EEG aperiodic signal:

      - The authors wrote: "The intercept of the aperiodic activity was highly correlated with the Glx concentration during rest with eyes open and during flickering stimulation (also see Supplementary Material S11). Based on the assumption that the aperiodic intercept reflects broadband firing (Manning et al., 2009; Winawer et al., 2013), this suggests that the Glx concentration might be related to broadband firing in CC individuals during active and passive visual stimulation." These results should not be interpreted (or only with great caution) for several reasons (see also problem with influences on aperiodic intercept and small sample size). This is a result of the exploratory analyses of correlating every EEG parameter with every MRS parameter. This requires well-powered replication before any interpretation can be provided. Furthermore and importantly: why should this be specifically only in CC patients, but not in the SC control group?

      We indicate clearly in all parts of the manuscript that these correlations are presented as exploratory. Further, we interpret the Glx-aperiodic offset correlation, and none of the others, as it survived the Bonferroni correction for multiple comparisons. We offer a hypothesis in the discussion section as to why such a correlation might exist in the CC but not the SC group (see response 2.2), and do not speculate further.

      (3.8) Language and presentation:

      - The manuscript requires language improvements and correction of numerous typos. Over-simplifications and unclear statements are present, which could mislead or confuse readers (see also interpretation of aperiodic signal).

      In the revision, we will check that speculations are clearly marked and typos are removed.

      - The authors state that "Together, the present results provide strong evidence for experience-dependent development of the E/I ratio in the human visual cortex, with consequences for behavior." The results of the study do not provide any strong evidence, because of the small sample size and exploratory analyses approach and not accounting for possible confounding factors.

      We disagree with this statement and allude to convergent evidence of both MRS and neurophysiological measures. The latter link to corresponding results observed in a larger sample of CC individuals (Ossandón et al., 2023).

      - "Our results imply a change in neurotransmitter concentrations as a consequence of *restoring* vision following congenital blindness." This is a speculative statement to infer a causal relationship on cross-sectional data.

      As mentioned under 2.1, we conducted a cross-sectional study which might justify future longitudinal work. In order to advance science, new testable hypotheses were put forward at the end of a manuscript.

      In the revised manuscript we will add “might imply” to better indicate the hypothetical character of this idea.

      - In the limitation section, the authors wrote: "The sample size of the present study is relatively high for the rare population, but undoubtedly, overall, rather small." This sentence should be rewritten, as the study is plainly underpowered. The further justification "We nevertheless think that our results are valid. Our findings neurochemically (Glx and GABA+ concentration), and anatomically (visual cortex) specific. The MRS parameters varied with parameters of the aperiodic EEG activity and visual acuity. The group differences for the EEG assessments corresponded to those of a larger sample of CC individuals (n=38) (Ossandón et al., 2023), and effects of chronological age were as expected from the literature." These statements do not provide any validation or justification of small samples. Furthermore, the current data set is a subset of an earlier published paper by the same authors "The EEG data sets reported here were part of data published earlier (Ossandón et al., 2023; Pant et al., 2023)." Thus, the statement "The group differences for the EEG assessments corresponded to those of a larger sample of CC individuals (n=38)" is a circular argument and should be avoided.

      Our intention was not to justify having a small sample, but to justify why we think the results might be valid as they align with/replicate existing literature.

      In the revised manuscript, we will add a figure showing that the EEG results of the 10 subjects considered here correspond to those of the 28 other subjects of Ossandón et al. (2023). We will adapt the text accordingly, clearly stating that the pattern of EEG results of the ten subjects reported here replicates that of the 28 additional subjects of Ossandón et al. (2023).

      References

      Barnes, S. J., Sammons, R. P., Jacobsen, R. I., Mackie, J., Keller, G. B., & Keck, T. (2015). Subnetwork-specific homeostatic plasticity in mouse visual cortex in vivo. Neuron, 86(5), 1290–1303. https://doi.org/10.1016/J.NEURON.2015.05.010

      Bernabeu, A., Alfaro, A., García, M., & Fernández, E. (2009). Proton magnetic resonance spectroscopy (1H-MRS) reveals the presence of elevated myo-inositol in the occipital cortex of blind subjects. NeuroImage, 47(4), 1172–1176. https://doi.org/10.1016/j.neuroimage.2009.04.080

      Bottari, D., Troje, N. F., Ley, P., Hense, M., Kekunnaya, R., & Röder, B. (2016). Sight restoration after congenital blindness does not reinstate alpha oscillatory activity in humans. Scientific Reports. https://doi.org/10.1038/srep24683

      Colombo, M. A., Napolitani, M., Boly, M., Gosseries, O., Casarotto, S., Rosanova, M., Brichant, J. F., Boveroux, P., Rex, S., Laureys, S., Massimini, M., Chieregato, A., & Sarasso, S. (2019). The spectral exponent of the resting EEG indexes the presence of consciousness during unresponsiveness induced by propofol, xenon, and ketamine. NeuroImage, 189(September 2018), 631–644. https://doi.org/10.1016/j.neuroimage.2019.01.024

      Consideration of Sample Size in Neuroscience Studies. (2020). Journal of Neuroscience, 40(21), 4076–4077. https://doi.org/10.1523/JNEUROSCI.0866-20.2020

      Coullon, G. S. L., Emir, U. E., Fine, I., Watkins, K. E., & Bridge, H. (2015). Neurochemical changes in the pericalcarine cortex in congenital blindness attributable to bilateral anophthalmia. Journal of Neurophysiology. https://doi.org/10.1152/jn.00567.2015

      Fang, Q., Li, Y. T., Peng, B., Li, Z., Zhang, L. I., & Tao, H. W. (2021). Balanced enhancements of synaptic excitation and inhibition underlie developmental maturation of receptive fields in the mouse visual cortex. Journal of Neuroscience, 41(49), 10065–10079. https://doi.org/10.1523/JNEUROSCI.0442-21.2021

      Favaro, J., Colombo, M. A., Mikulan, E., Sartori, S., Nosadini, M., Pelizza, M. F., Rosanova, M., Sarasso, S., Massimini, M., & Toldo, I. (2023). The maturation of aperiodic EEG activity across development reveals a progressive differentiation of wakefulness from sleep. NeuroImage, 277. https://doi.org/10.1016/J.NEUROIMAGE.2023.120264

      Gao, Y., Liu, Y., Zhao, S., Liu, Y., Zhang, C., Hui, S., Mikkelsen, M., Edden, R. A. E., Meng, X., Yu, B., & Xiao, L. (2024). MRS study on the correlation between frontal GABA+/Glx ratio and abnormal cognitive function in medication-naive patients with narcolepsy. Sleep Medicine, 119, 1–8. https://doi.org/10.1016/j.sleep.2024.04.004

      Haider, B., Duque, A., Hasenstaub, A. R., & McCormick, D. A. (2006). Neocortical network activity in vivo is generated through a dynamic balance of excitation and inhibition. Journal of Neuroscience. https://doi.org/10.1523/JNEUROSCI.5297-05.2006

      Hill, A. T., Clark, G. M., Bigelow, F. J., Lum, J. A. G., & Enticott, P. G. (2022). Periodic and aperiodic neural activity displays age-dependent changes across early-to-middle childhood. Developmental Cognitive Neuroscience, 54, 101076. https://doi.org/10.1016/J.DCN.2022.101076

      Hupfeld, K. E., Zöllner, H. J., Hui, S. C. N., Song, Y., Murali-Manohar, S., Yedavalli, V., Oeltzschner, G., Prisciandaro, J. J., & Edden, R. A. E. (2024). Impact of acquisition and modeling parameters on the test–retest reproducibility of edited GABA+. NMR in Biomedicine, 37(4), e5076. https://doi.org/10.1002/nbm.5076

      Hyvärinen, J., Carlson, S., & Hyvärinen, L. (1981). Early visual deprivation alters modality of neuronal responses in area 19 of monkey cortex. Neuroscience Letters, 26(3), 239–243. https://doi.org/10.1016/0304-3940(81)90139-7

      Juchem, C., & Graaf, R. A. de. (2017). B0 magnetic field homogeneity and shimming for in vivo magnetic resonance spectroscopy. Analytical Biochemistry, 529, 17–29. https://doi.org/10.1016/j.ab.2016.06.003

      Keck, T., Hübener, M., & Bonhoeffer, T. (2017). Interactions between synaptic homeostatic mechanisms: An attempt to reconcile BCM theory, synaptic scaling, and changing excitation/inhibition balance. Current Opinion in Neurobiology, 43, 87–93. https://doi.org/10.1016/J.CONB.2017.02.003

      Kurcyus, K., Annac, E., Hanning, N. M., Harris, A. D., Oeltzschner, G., Edden, R., & Riedl, V. (2018). Opposite Dynamics of GABA and Glutamate Levels in the Occipital Cortex during Visual Processing. Journal of Neuroscience, 38(46), 9967–9976. https://doi.org/10.1523/JNEUROSCI.1214-18.2018

      Liu, B., Wang, G., Gao, D., Gao, F., Zhao, B., Qiao, M., Yang, H., Yu, Y., Ren, F., Yang, P., Chen, W., & Rae, C. D. (2015). Alterations of GABA and glutamate-glutamine levels in premenstrual dysphoric disorder: A 3T proton magnetic resonance spectroscopy study. Psychiatry Research - Neuroimaging, 231(1), 64–70. https://doi.org/10.1016/J.PSCYCHRESNS.2014.10.020

      Lunghi, C., Berchicci, M., Morrone, M. C., & Russo, F. D. (2015). Short‐term monocular deprivation alters early components of visual evoked potentials. The Journal of Physiology, 593(19), 4361. https://doi.org/10.1113/JP270950

      Maier, S., Düppers, A. L., Runge, K., Dacko, M., Lange, T., Fangmeier, T., Riedel, A., Ebert, D., Endres, D., Domschke, K., Perlov, E., Nickel, K., & Tebartz van Elst, L. (2022). Increased prefrontal GABA concentrations in adults with autism spectrum disorders. Autism Research, 15(7), 1222–1236. https://doi.org/10.1002/aur.2740

      Manning, J. R., Jacobs, J., Fried, I., & Kahana, M. J. (2009). Broadband shifts in local field potential power spectra are correlated with single-neuron spiking in humans. The Journal of Neuroscience : The Official Journal of the Society for Neuroscience, 29(43), 13613–13620. https://doi.org/10.1523/JNEUROSCI.2041-09.2009

      McSweeney, M., Morales, S., Valadez, E. A., Buzzell, G. A., Yoder, L., Fifer, W. P., Pini, N., Shuffrey, L. C., Elliott, A. J., Isler, J. R., & Fox, N. A. (2023). Age-related trends in aperiodic EEG activity and alpha oscillations during early- to middle-childhood. NeuroImage, 269, 119925. https://doi.org/10.1016/j.neuroimage.2023.119925

      Medel, V., Irani, M., Crossley, N., Ossandón, T., & Boncompte, G. (2023). Complexity and 1/f slope jointly reflect brain states. Scientific Reports, 13(1), 21700. https://doi.org/10.1038/s41598-023-47316-0

      Medel, V., Irani, M., Ossandón, T., & Boncompte, G. (2020). Complexity and 1/f slope jointly reflect cortical states across different E/I balances. bioRxiv, 2020.09.15.298497. https://doi.org/10.1101/2020.09.15.298497

      Molina, J. L., Voytek, B., Thomas, M. L., Joshi, Y. B., Bhakta, S. G., Talledo, J. A., Swerdlow, N. R., & Light, G. A. (2020). Memantine Effects on Electroencephalographic Measures of Putative Excitatory/Inhibitory Balance in Schizophrenia. Biological Psychiatry: Cognitive Neuroscience and Neuroimaging, 5(6), 562–568. https://doi.org/10.1016/j.bpsc.2020.02.004

      Mukerji, A., Byrne, K. N., Yang, E., Levi, D. M., & Silver, M. A. (2022). Visual cortical γ−aminobutyric acid and perceptual suppression in amblyopia. Frontiers in Human Neuroscience, 16. https://doi.org/10.3389/fnhum.2022.949395

      Muthukumaraswamy, S. D., & Liley, D. T. (2018). 1/F electrophysiological spectra in resting and drug-induced states can be explained by the dynamics of multiple oscillatory relaxation processes. NeuroImage, 179(November 2017), 582–595. https://doi.org/10.1016/j.neuroimage.2018.06.068

      Narayan, G. A., Hill, K. R., Wengler, K., He, X., Wang, J., Yang, J., Parsey, R. V., & DeLorenzo, C. (2022). Does the change in glutamate to GABA ratio correlate with change in depression severity? A randomized, double-blind clinical trial. Molecular Psychiatry, 27(9), 3833–3841. https://doi.org/10.1038/s41380-022-01730-4

      Nuijten, M. B., & Polanin, J. R. (2020). “statcheck”: Automatically detect statistical reporting inconsistencies to increase reproducibility of meta-analyses. Research Synthesis Methods, 11(5), 574–579. https://doi.org/10.1002/jrsm.1408

      Ossandón, J. P., Stange, L., Gudi-Mindermann, H., Rimmele, J. M., Sourav, S., Bottari, D., Kekunnaya, R., & Röder, B. (2023). The development of oscillatory and aperiodic resting state activity is linked to a sensitive period in humans. NeuroImage, 275, 120171. https://doi.org/10.1016/J.NEUROIMAGE.2023.120171

      Ostlund, B. D., Alperin, B. R., Drew, T., & Karalunas, S. L. (2021). Behavioral and cognitive correlates of the aperiodic (1/f-like) exponent of the EEG power spectrum in adolescents with and without ADHD. Developmental Cognitive Neuroscience, 48, 100931. https://doi.org/10.1016/j.dcn.2021.100931

      Pant, R., Ossandón, J., Stange, L., Shareef, I., Kekunnaya, R., & Röder, B. (2023). Stimulus-evoked and resting-state alpha oscillations show a linked dependence on patterned visual experience for development. NeuroImage: Clinical, 103375. https://doi.org/10.1016/J.NICL.2023.103375

      Perica, M. I., Calabro, F. J., Larsen, B., Foran, W., Yushmanov, V. E., Hetherington, H., Tervo-Clemmens, B., Moon, C.-H., & Luna, B. (2022). Development of frontal GABA and glutamate supports excitation/inhibition balance from adolescence into adulthood. Progress in Neurobiology, 219, 102370. https://doi.org/10.1016/j.pneurobio.2022.102370

      Pitchaimuthu, K., Wu, Q. Z., Carter, O., Nguyen, B. N., Ahn, S., Egan, G. F., & McKendrick, A. M. (2017). Occipital GABA levels in older adults and their relationship to visual perceptual suppression. Scientific Reports, 7(1). https://doi.org/10.1038/S41598-017-14577-5

      Rideaux, R., Ehrhardt, S. E., Wards, Y., Filmer, H. L., Jin, J., Deelchand, D. K., Marjańska, M., Mattingley, J. B., & Dux, P. E. (2022). On the relationship between GABA+ and glutamate across the brain. NeuroImage, 257, 119273. https://doi.org/10.1016/J.NEUROIMAGE.2022.119273

      Schaworonkow, N., & Voytek, B. (2021). Longitudinal changes in aperiodic and periodic activity in electrophysiological recordings in the first seven months of life. Developmental Cognitive Neuroscience, 47. https://doi.org/10.1016/j.dcn.2020.100895

      Schwenk, J. C. B., VanRullen, R., & Bremmer, F. (2020). Dynamics of Visual Perceptual Echoes Following Short-Term Visual Deprivation. Cerebral Cortex Communications, 1(1). https://doi.org/10.1093/TEXCOM/TGAA012

      Sengpiel, F., Jirmann, K.-U., Vorobyov, V., & Eysel, U. T. (2006). Strabismic Suppression Is Mediated by Inhibitory Interactions in the Primary Visual Cortex. Cerebral Cortex, 16(12), 1750–1758. https://doi.org/10.1093/cercor/bhj110

      Steel, A., Mikkelsen, M., Edden, R. A. E., & Robertson, C. E. (2020). Regional balance between glutamate+glutamine and GABA+ in the resting human brain. NeuroImage, 220. https://doi.org/10.1016/J.NEUROIMAGE.2020.117112

      Takado, Y., Takuwa, H., Sampei, K., Urushihata, T., Takahashi, M., Shimojo, M., Uchida, S., Nitta, N., Shibata, S., Nagashima, K., Ochi, Y., Ono, M., Maeda, J., Tomita, Y., Sahara, N., Near, J., Aoki, I., Shibata, K., & Higuchi, M. (2022). MRS-measured glutamate versus GABA reflects excitatory versus inhibitory neural activities in awake mice. Journal of Cerebral Blood Flow & Metabolism, 42(1), 197. https://doi.org/10.1177/0271678X211045449

      Takei, Y., Fujihara, K., Tagawa, M., Hironaga, N., Near, J., Kasagi, M., Takahashi, Y., Motegi, T., Suzuki, Y., Aoyama, Y., Sakurai, N., Yamaguchi, M., Tobimatsu, S., Ujita, K., Tsushima, Y., Narita, K., & Fukuda, M. (2016). The inhibition/excitation ratio related to task-induced oscillatory modulations during a working memory task: A multimodal-imaging study using MEG and MRS. NeuroImage, 128, 302–315. https://doi.org/10.1016/J.NEUROIMAGE.2015.12.057

      Tao, H. W., & Poo, M. M. (2005). Activity-dependent matching of excitatory and inhibitory inputs during refinement of visual receptive fields. Neuron, 45(6), 829–836. https://doi.org/10.1016/J.NEURON.2005.01.046

      Vanrullen, R., & MacDonald, J. S. P. (2012). Perceptual echoes at 10 Hz in the human brain. Current Biology. https://doi.org/10.1016/j.cub.2012.03.050

      Voytek, B., Kramer, M. A., Case, J., Lepage, K. Q., Tempesta, Z. R., Knight, R. T., & Gazzaley, A. (2015). Age-related changes in 1/f neural electrophysiological noise. Journal of Neuroscience, 35(38). https://doi.org/10.1523/JNEUROSCI.2332-14.2015

      Vreeswijk, C. V., & Sompolinsky, H. (1996). Chaos in neuronal networks with balanced excitatory and inhibitory activity. Science, 274(5293), 1724–1726. https://doi.org/10.1126/SCIENCE.274.5293.1724

      Waschke, L., Wöstmann, M., & Obleser, J. (2017). States and traits of neural irregularity in the age-varying human brain. Scientific Reports, 7(1), 1–12. https://doi.org/10.1038/s41598-017-17766-4

      Weaver, K. E., Richards, T. L., Saenz, M., Petropoulos, H., & Fine, I. (2013). Neurochemical changes within human early blind occipital cortex. Neuroscience. https://doi.org/10.1016/j.neuroscience.2013.08.004

      Wu, Y. K., Miehl, C., & Gjorgjieva, J. (2022). Regulation of circuit organization and function through inhibitory synaptic plasticity. Trends in Neurosciences, 45(12), 884–898. https://doi.org/10.1016/J.TINS.2022.10.006

    1. Author response:

      Reviewer #1 (Public review):

      (1) Legionella effectors are often activated by binding to eukaryote-specific host factors, including actin. The authors should test the following: a) whether Lfat1 can fatty acylate small G-proteins in vitro; b) whether this activity is dependent on actin binding; and c) whether expression of the Y240A mutant in mammalian cells affects the fatty acylation of Rac3 (Figure 6B), or other small G-proteins.

      We were not able to express and purify full-length recombinant Lfat1 to perform fatty acylation of small GTPases in vitro. However, when overexpressed in cellulo, the Y240A mutant retained the ability to fatty acylate Rac3 and another small GTPase, RheB (see Author response image 1 below). We postulate that under infection conditions, actin binding might be required to fatty acylate certain GTPases because only a small amount of effector protein is secreted into the host cell.

      Author response image 1.

      (2) It should be demonstrated that lysine residues on small G-proteins are indeed targeted by Lfat1. Ideally, the functional consequences of these modifications should also be investigated. For example, does fatty acylation of G-proteins affect GTPase activity or binding to downstream effectors?

      We mutated K178 on RheB and showed that this mutation abolished its fatty acylation by Lfat1 (see Author response image 2 below). We were not able to test whether fatty acylation by Lfat1 affects downstream effector binding.

      Author response image 2.

      (3) Line 138: Can the authors clarify whether the Lfat1 ABD induces bundling of F-actin filaments or promotes actin oligomerization? Does the Lfat1 ABD form multimers that bring multiple filaments together? If Lfat1 induces actin oligomerization, this effect should be experimentally tested and reported. Additionally, the impact of Lfat1 binding on actin filament stability should be assessed. This is particularly important given the proposed use of the ABD as an actin probe.

      The ABD does not form oligomers, as evidenced by its gel filtration profile. However, we do see F-actin bundling in our in vitro F-actin polymerization experiments when both actin and the ABD are at high concentrations (data not shown). At low ABD concentrations, there is no aggregation or bundling of F-actin.

      (4) Line 180: I think it's too premature to refer to the interaction as having "high specificity and affinity." We really don't know what else it's binding to.

      We have revised the text and reworded the sentence by removing "high specificity and affinity."

      (5) The authors should reconsider the color scheme used in the structural figures, particularly in Figures 2D and S4.

      We are not sure what the reviewer's specific concern is regarding the color scheme of the structural figures.

      (6) In Figure 3E, the WT curve fits the data poorly, possibly because the actin concentration exceeds the Kd of the interaction. It might fit better to a quadratic.

      We have performed quadratic fitting and replaced Figure 3E.
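
      For readers unfamiliar with the distinction raised in this comment, the following is a minimal sketch of the standard 1:1 quadratic binding equation next to the simple hyperbolic form. It is not the fitting code used for Figure 3E; the function names and the example concentrations are illustrative assumptions, and all quantities are taken to share the same concentration units.

      ```python
      import math


      def fraction_bound_quadratic(fixed_total, titrant_total, kd):
          """Fraction of the fixed-concentration species in complex (1:1 binding).

          Solves the mass-action quadratic exactly, so it stays valid when the
          fixed species is not negligible relative to Kd (ligand depletion).
          """
          if fixed_total <= 0:
              raise ValueError("fixed_total must be positive")
          s = fixed_total + titrant_total + kd
          complex_conc = (s - math.sqrt(s * s - 4.0 * fixed_total * titrant_total)) / 2.0
          return complex_conc / fixed_total


      def fraction_bound_hyperbolic(titrant_total, kd):
          """Hyperbolic approximation: assumes free titrant equals total titrant."""
          return titrant_total / (kd + titrant_total)


      if __name__ == "__main__":
          # Hypothetical values: fixed species at 1 unit, Kd of 1 unit.
          for titrant in (0.25, 0.5, 1.0, 2.0, 4.0):
              q = fraction_bound_quadratic(1.0, titrant, 1.0)
              h = fraction_bound_hyperbolic(titrant, 1.0)
              print(f"titrant={titrant:5.2f}  quadratic={q:.3f}  hyperbolic={h:.3f}")
      ```

      When the fixed species is present at a concentration comparable to or above Kd, the hyperbolic form overestimates the bound fraction at each titration point, which is the usual reason for preferring the quadratic form in such cases.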

      (7) The authors propose that the individual helices of the Lfat1 ABD could be expressed on separate proteins and used to target multi-component biological complexes to F-actin by genetically fusing each component to a split alpha-helix. This is an intriguing idea, but it should be tested as a proof of concept to support its feasibility and potential utility.

      It is a good suggestion. We plan to thoroughly test the feasibility of this idea as one of our future directions.

      (7) The plot in Figure S2D appears cropped on the X-axis or was generated from a ~2× binned map rather than the deposited one (pixel size ~0.83 Å, plot suggests ~1.6 Å). The reported pixel size is inconsistent between the Methods and Table 1; please clarify whether 0.83 Å refers to super-resolution.

      Yes, 0.83 Å is the super-resolution pixel size. We have updated the cryo-EM table accordingly.

      Reviewer #2 (Public review):

      Weaknesses:

      (1) The authors should use biochemical reactions to analyze the KFAT of Lfat1 on one or two small GTPases shown to be modified by this effector in cellulo. Such reactions may allow them to determine the role of actin binding in its biochemical activity. This notion is particularly relevant in light of recent studies showing that actin is a co-factor for the activity of LnaB and Ceg14 (PMID: 39009586; PMID: 38776962; PMID: 40394005). In addition, the study should be discussed in the context of these recent findings on the role of actin in the activity of L. pneumophila effectors.

      We have new data showing that actin binding does not affect Lfat1 enzymatic activity (see the figure in our response to Reviewer #1). We have added these data to the paper as Figure S7. Accordingly, we also revised the discussion by adding the following paragraph.

      “The discovery of Lfat1 as an F-actin–binding lysine fatty acyl transferase raised the intriguing question of whether its enzymatic activity depends on F-actin binding. Recent studies have shown that other Legionella effectors, such as LnaB and Ceg14, use actin as a co-factor to regulate their activities. For instance, LnaB binds monomeric G-actin to enhance its phosphoryl-AMPylase activity toward phosphorylated residues, resulting in unique ADPylation modifications in host proteins (Fu et al, 2024; Wang et al, 2024). Similarly, Ceg14 is activated by host actin to convert ATP and dATP into adenosine and deoxyadenosine monophosphate, thereby modulating ATP levels in L. pneumophila–infected cells (He et al, 2025). However, this does not appear to be the case for Lfat1. We found that Lfat1 mutants defective in F-actin binding retained the ability to modify host small GTPases when expressed in cells (Figure S7). These findings suggest that, rather than serving as a co-factor, F-actin may serve to localize Lfat1 via its actin-binding domain (ABD), thereby confining its activity to regions enriched in F-actin and enabling spatial specificity in the modification of host targets.”

      (2) The development of the ABD domain of Lfat1 as an F-actin domain is a nice extension of the biochemical and structural experiments. The authors need to compare the new probe to commonly used ones, such as Lifeact, in labeling of the actin cytoskeleton structure.

      We fully agree with the reviewer’s insightful suggestion. However, a direct comparison of the Lfat1 ABD domain with commonly used actin probes such as Lifeact, as well as evaluation of the split α-helix probe (as suggested by Reviewer #1), would require extensive and technically demanding experiments. These are important directions that we plan to pursue in future studies.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This study reveals that TRPV1 signaling plays a key role in tympanic membrane (TM) healing by promoting macrophage recruitment and angiogenesis. Using a mouse TM perforation model, researchers found that blood-derived macrophages accumulated near the wound, driving angiogenesis and repair. TRPV1-expressing nerve fibers triggered neuroinflammatory responses, facilitating macrophage recruitment. Genetic Trpv1 mutation reduced macrophage infiltration, angiogenesis, and delayed healing. These findings suggest that targeting TRPV1 or stimulating sensory nerve fibers could enhance TM repair, improve blood flow, and prevent infections. This offers new therapeutic strategies for TM perforations and otitis media in clinical settings. This is an excellent and high-quality study that provides valuable insights into the mechanisms underlying TM wound healing.

      Strengths:

      The work is particularly important for elucidating the cellular and molecular processes involved in TM repair. However, there are several concerns about the current version.

      We sincerely thank Reviewer #1 for their time and effort in evaluating and improving our study. Below, we are pleased to address the Reviewer's concerns point by point.

      Weaknesses:

      Major concerns

      (1) The method of administration will be a critical factor when considering potential therapeutic strategies to promote TM healing. It would be beneficial if the authors could discuss possible delivery methods, such as topical application, transtympanic injection, or systemic administration, and their respective advantages and limitations for targeting TRPV1 signaling. For example, Dr. Kanemaru and his colleagues have proposed the use of Trafermin and Spongel to regenerate the eardrum.

      We are grateful to the reviewer for raising this important point. While the present study primarily focuses on the mechanistic role of TRPV1 in TM repair, we agree that the mode of therapeutic delivery will be pivotal in translating these findings into clinical practice. In response, we will expand the discussion to explore possible delivery methods—such as topical application, transtympanic injection, and systemic routes—along with their respective benefits and challenges. We will also cite the work by Dr. Kanemaru and colleagues as an example of how local delivery systems may facilitate TM regeneration.

      (2) The authors appear to have used surface imaging techniques to observe the TM. However, the TM consists of three distinct layers: the epithelial layer, the fibrous middle layer, and the inner mucosal layer. The authors should clarify whether the proposed mechanism involving TRPV1-mediated macrophage recruitment and angiogenesis is limited to the epithelial layer or if it extends to the deeper layers of the TM.

      We apologize for any confusion caused by our previous description. In our study, we utilized Z-stack confocal imaging to capture the full thickness of the TM, as illustrated in Author response image 1 (reconstructed from the acquired Z-sections). This imaging technique allowed us to encompass all three layers of the TM entirely. Each sample was imaged using a 10X objective on an Olympus fluorescence microscope. Given the conical shape and size of the TM, we imaged it in four quadrants, acquiring approximately 30 optical sections (with a 3 µm step) per region. Each acquired image stack was projected and exported using FV10ASW 4.2 Viewer, and the projections were then stitched together using Photoshop. The resulting Z-stack projections enabled us to visualize the distribution of macrophages, angiogenesis, and the localization of nerve fibers throughout the TM. We will include this detailed methodology in our revision to clarify any potential confusion.
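
      To make the projection step concrete, the following is a minimal sketch of a maximum-intensity projection over a Z-stack. It is only an illustration: the projections in this study were produced with FV10ASW 4.2 Viewer, the response does not state which projection mode was used, and the function names and toy pixel values below are assumptions.

      ```python
      def max_intensity_projection(stack):
          """Collapse a Z-stack into one 2D image by taking the per-pixel maximum over Z.

          `stack` is a list of optical sections; each section is a list of rows,
          and every section has the same height and width.
          """
          if not stack:
              raise ValueError("stack must contain at least one optical section")
          # zip(*stack) groups the rows at the same y across all sections;
          # zip(*rows) then groups the pixels at the same x across all sections.
          return [[max(pixels) for pixels in zip(*rows)] for rows in zip(*stack)]


      def stack_depth_um(n_sections, step_um):
          """Distance between the first and last optical section, in micrometres."""
          return (n_sections - 1) * step_um


      if __name__ == "__main__":
          # Toy 3-section stack of 2 x 2 pixels (hypothetical intensities).
          toy_stack = [
              [[0, 5], [2, 1]],
              [[3, 1], [0, 4]],
              [[1, 1], [9, 0]],
          ]
          print(max_intensity_projection(toy_stack))
          # Roughly 30 sections at a 3 µm step, as described above.
          print(stack_depth_um(30, 3.0))
      ```

      With an array library, the same projection is a single maximum along the Z axis; the pure-Python form is used here only to keep the sketch free of dependencies.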

      Author response image 1.

      Representative confocal images showing one quadrant of the TM collected from a CSF1R<sup>EGFP</sup> bone marrow-transplanted mouse at day 7 post-perforation. (A-B) 3D-rendered views from different angles reveal the close spatial relationship between CSF1R<sup>EGFP</sup> cells (green) and blood vessels (red) within the TM. (C) Cross-sectional view highlights the depth-wise distribution of CSF1R<sup>EGFP</sup> cells (green) and blood vessels (red) across the layered TM architecture. All images were processed using Imaris Viewer x64 (version 10.2.0).

      Minor concerns

      In Figure 8, the schematic illustration presents a coronal section of the TM. However, based on the data provided in the manuscript, it is unclear whether the authors directly obtained coronal images in their study. To enhance the clarity and impact of the schematic, it would be helpful to include representative images of coronal sections showing macrophage infiltration, angiogenesis, and nerve fiber distribution in the TM.

      As noted above, we utilized Z-stack confocal imaging to capture the full thickness of the TM, enabling us to visualize structures across all three layers. This approach ensured that all layers were included in our analysis. Due to the thin and curved nature of the TM, traditional cross-sectional imaging often struggles to clearly depict the spatial relationships between macrophages, blood vessels, and nerve fibers, especially at low magnification as shown in Author response image 2. In response to the reviewer's suggestion, we will include representative coronal images in the revised manuscript to better illustrate the distribution of these structures at higher magnification.

      Author response image 2.

      Confocal images of eardrum cross-sections collected at day 1 (A), 3 (B), and 7 (C) post-perforation to demonstrate the wound healing process.

      Reviewer #2 (Public review):

      Summary:

      This study examines the role of TRPV1 signaling in the recruitment of monocyte-derived macrophages and the promotion of angiogenesis during tympanic membrane (TM) wound healing. The authors use a combination of genetic mouse models, macrophage depletion, and transcriptomic approaches to suggest that neuronal TRPV1 activity contributes to macrophage-driven vascular responses necessary for tissue repair.

      Strengths:

      (1) The topic of neuroimmune interactions in tissue regeneration is of interest and underexplored in the context of the TM, which presents a unique model due to its anatomical features.

      (2) The use of reporter mice and bone marrow chimeras allows for some dissection of immune cell origin.

      (3) The authors incorporate transcriptomic data to contextualize inflammatory and angiogenic processes during wound healing.

      We sincerely thank Reviewer #2 for their time and effort in improving our study and recognizing its strengths. Below, we are pleased to address the reviewer's concerns point by point.

      Weaknesses:

      (1) The primary claims of the manuscript are not convincingly supported by the evidence presented. Most of the data are correlative in nature, and no direct mechanistic experiments are included to establish causality between TRPV1 signaling and macrophage recruitment or function.

      We appreciate Reviewer #2's perspective on the lack of molecular mechanisms linking TRPV1 signaling and macrophages. However, our data demonstrate that TRPV1 mutations significantly affect macrophage recruitment and angiogenesis. This initial study primarily focuses on the intriguing phenomenon of how sensory nerve fibers are involved in eardrum immunity and wound healing, an area that has not been clearly reported in the literature before. We believe that further research is necessary to explore this topic in greater depth.

      (2) Functional validation of key molecular players (such as Tac1 or Spp1) is lacking, and their roles are inferred primarily from gene expression data rather than experimentally tested.

      Although we have identified the TAC1 and SPP1 signals as potentially important for TM wound healing for the first time, we agree with the Reviewer's view regarding the lack of molecular mechanisms explored in this study. We have not yet tested the downstream signaling pathways, but we plan to investigate them in a series of future studies. As this is an early report, we will continue to explore these signals and their potential clinical applications based on our initial findings moving forward.

      (3) The reuse of publicly available scRNA-seq data is not sufficiently integrated or extended to yield new biological insights, and it remains largely descriptive.

      We thank Reviewer #2 for highlighting this point. Leveraging publicly available scRNA-seq datasets and established analysis pipelines saves time and resources, and it allows researchers to build on previous work and address new biological questions without repeating experiments. For context, our lab recently collected macrophages from the eardrums of postnatal day 15 (P15) mice, and each trial required 20 eardrums from 10 animals to obtain a sufficient number of cells. A prior study by Dr. Tward and his team used scRNA-seq data to make initial discoveries related to eardrum wound healing, focusing primarily on epithelial cells rather than macrophages. We are building on their raw data to uncover new biological insights regarding macrophages. Although we have not yet experimentally tested the candidate signals, we believe these insights will be valuable to our peers.

      (4) The macrophage depletion model (CX3CR1CreER; iDTR) lacks specificity, and possible off-target or systemic effects are not addressed.

      We agree with Reviewer #2. The macrophage depletion model used in our study is a standard animal model (Shi, Hua et al. 2018) that has been used by many other laboratories; nevertheless, any macrophage depletion model may have potential issues. We will discuss this in our revision.

      (5) Several interpretations of the data appear overstated, particularly regarding the necessity of TRPV1 for monocyte recruitment and wound healing.

      We thank the reviewer for pointing this out. We will revise the statements in our manuscript that are overstated.

      (6) Overall, the study appears to apply known concepts - namely, TRPV1-mediated neurogenic inflammation and macrophage-driven angiogenesis - to a new anatomical site without providing new mechanistic insight or advancing the field substantially.

      Although our study may not seem highly innovative at first glance, it reveals a previously unknown role of the TRPV1 pain signaling pathway in promoting eardrum healing for the first time. This healing process includes the recruitment of monocyte-derived macrophages and the formation of new blood vessels (angiogenesis). While this process has been documented in other organs, most research on macrophage-driven angiogenesis has been conducted using in vitro models, with very few studies demonstrating this process in vivo. Our findings could lead to new translational opportunities, especially considering that tympanic membrane perforation, along with damage-induced otitis media and conductive hearing loss, are common clinical issues affecting millions of people worldwide. Targeting TRPV1 signaling could enhance tympanic membrane immunity, improve blood circulation, promote the repair of damaged tympanic membranes, and ultimately prevent middle ear infections—an idea that has not been previously proposed.

      Overall:

      While the study addresses an interesting topic, the current version does not provide sufficiently strong or novel evidence to support its major conclusions. Additional mechanistic experiments and more rigorous validation would be necessary to substantiate the proposed model and clarify the relevance of the findings beyond this specific tissue context.

      We greatly thank the two reviewers for their helpful critiques to improve our study. We especially thank the Section Editors for their insightful and constructive comments on this initial study.

      References:

      Shi, J., L. Hua, D. Harmer, P. Li and G. Ren (2018). "Cre Driver Mice Targeting Macrophages." Methods Mol Biol 1784: 263-275.

    1. Author response:

      Reviewer #1 (Public review):

      Summary:

      This article investigates the origin of movement slowdown in weightlessness by testing two possible hypotheses: the first is based on a strategic and conservative slowdown, presented as a scaling of the motion kinematics without altering its profile, while the second is based on the hypothesis of a misestimation of effective mass by the brain due to an alteration of gravity-dependent sensory inputs, which alters the kinematics following a controller parameterization error.

      Strengths:

      The article convincingly demonstrates that trajectories are affected in 0g conditions, as in previous work. It is interesting, and the results appear robust. However, I have two major reservations about the current version of the manuscript that prevent me from endorsing the conclusion in its current form.

      Weaknesses:

      (1) First, the hypothesis of a strategic and conservative slow down implicitly assumes a similar cost function, which cannot be guaranteed, tested, or verified. For example, previous work has suggested that changing the ratio between the state and control weight matrices produced an alteration in movement kinematics similar to that presented here, without changing the estimated mass parameter (Crevecoeur et al., 2010, J Neurophysiol, 104 (3), 1301-1313). Thus, the hypothesis of conservative slowing cannot be rejected. Such a strategy could vary with effective mass (thus showing a statistical effect), but the possibility that the data reflect a combination of both mechanisms (strategic slowing and mass misestimation) remains open.

      We tested whether changing the ratio between the state and control weight matrices can generate the observed effect. As shown in Author response image 1 and Author response image 2, a change in the cost function cannot simultaneously produce a reduced peak velocity/acceleration and an advance in their timing, but a change in mass estimation can. In other words, mass underestimation alone can explain the two key findings, amplitude reduction and timing advance. We cannot exclude the possibility of a change in cost function on top of the mass underestimation, but the principle of Occam’s razor favors the simpler explanation, i.e., using body mass underestimation to explain the key findings. We will include our exploration of possible changes in the cost function in the revision (in the Supplemental Materials).

      Author response image 1.

      Simulation using an altered cost function with α = 3.0. Panels A, B, and E show simulated position, velocity, and acceleration profiles, respectively, for the three movement directions. Solid lines correspond to pre- and post-exposure conditions, while dashed lines represent the in-flight condition. Panels C and D display the peak velocity and its timing across the three phases (Pre, In, Post), and Panels F and G show the corresponding peak acceleration and its timing. Note, varying the cost function, while leading to reduced peak velocity/acceleration, leads to an erroneous prediction of delayed timing of peak velocity/acceleration.

      Author response image 2.

      Simulation results using a cost function with α = 0.3. The format is the same as in Author response image 1. Note that this ten-fold decrease in α, while correctly predicting the advanced timing of peak velocity/acceleration, leads to an erroneous prediction of increased peak velocity/acceleration.

      (2) The main strength of the article is the presence of directional effects expected under the hypothesis of mass estimation error. However, the article lacks a clear demonstration of such an effect: indeed, although there appears to be a significant effect of direction, I was not sure that this effect matched the model's predictions. A directional effect is not sufficient because the model makes clear quantitative predictions about how this effect should vary across directions. In the absence of a quantitative match between the model and the data, the authors' claims regarding the role of misestimating the effective mass remain unsupported.

      Our paper does not aim to quantitatively reproduce human reaching movements in microgravity. We will make this clearer in the revision.

      (1) The model is a simplification of the actual situation. For example, the model simulates an ideal case of moving a point mass (effective mass) without friction and without considering Coriolis and centripetal torques, while the actual situation is that people move their finger across a touch screen. The two-link arm model assumes planar movements, but our participants move their hand on a table top without vertical support to constrain their movement in 2D.

      (2) Our study merely uses well-established (though simplified) models to qualitatively predict the overall behavioral patterns if mass underestimation is at play. For this purpose, the results are well in line with models’ qualitative predictions: we indeed confirm that key kinematic features (peak velocity and acceleration) follow the same ranking order of movement direction conditions as predicted.

      (3) Using model simulation to qualitatively predict human behavioral patterns is a common practice in motor control studies, prominent examples including the papers on optimal feedback control (Todorov, 2004 and 2005) and movement vigor (Shadmehr et al., 2016). In fact, our model was inspired by the model in the latter paper.

      Citations:

      Todorov, E. (2004). Optimality principles in sensorimotor control. Nature Neuroscience, 7(9), 907.

      Todorov, E. (2005). Stochastic optimal control and estimation methods adapted to the noise characteristics of the sensorimotor system. Neural Computation, 17(5), 1084–1108.

      Shadmehr, R., Huang, H. J., & Ahmed, A. A. (2016). A Representation of Effort in Decision-Making and Motor Control. Current Biology: CB, 26(14), 1929–1934.

      In general, both the hypotheses of slowing motion (out of caution) and misestimating mass have been put forward in the past, and the added value of this article lies in demonstrating that the effect depended on direction. However, (1) a conservative strategy with a different cost function can also explain the data, and (2) the quantitative match between the directional effect and the model's predictions has not been established.

      Specific points:

      (1) I noted a lack of presentation of raw kinematic traces, which would be necessary to convince me that the directional effect was related to effective mass as stated.

      We are happy to include exemplary speed and acceleration trajectories. One example subject’s detailed trajectories are shown below and will be included in the revision. The reduced and advanced velocity/acceleration peaks are visible in typical trials.

      Author response image 3.

      Hand speed profiles (upper panels), hand acceleration profiles (middle panels) and speed profiles of the primary submovements (lower panels) towards different directions from an example participant.

      (2) The presentation and justification of the model require substantial improvement; the reason for their presence in the supplementary material is unclear, as there is space to present the modelling work in detail in the main text. Regarding the model, some choices require justification: for example, why did the authors ignore the nonlinear Coriolis and centripetal terms?

      Response: In brief, our simulations show that Coriolis and centripetal forces, despite having some directional anisotropy, only have small effects on predicted kinematics (see our responses to Reviewer 2). We will move descriptions of the model into the main text with more justifications for using a simple model.

      (3) The increase in the proportion of trials with subcomponents is interesting, but the explanatory power of this observation is limited, as the initial percentage was already quite high (from 60-70% during the initial study to 70-85% in flight). This suggests that the potential effect of effective mass only explains a small increase in a trend already present in the initial study. A more critical assessment of this result is warranted.

      Response: Indeed, the percentage of submovements only increases slightly, but the more important change is that the IPI (the inter-peak interval between submovements) also increases at the same time. Moreover, it is the effect of IPI that significantly predicts the duration increase in our linear mixed model. We will highlight this fact in our revision to avoid confusion.

      Reviewer #2 (Public review):

      This study explores the underlying causes of the generalized movement slowness observed in astronauts in weightlessness compared to their performance on Earth. The authors argue that this movement slowness stems from an underestimation of mass rather than a deliberate reduction in speed for enhanced stability and safety.

      Overall, this is a fascinating and well-written work. The kinematic analysis is thorough and comprehensive. The design of the study is solid, the collected dataset is rare, and the model tends to add confidence to the proposed conclusions. That being said, I have several comments that could be addressed to consolidate interpretations and improve clarity.

      Main comments:

      (1) Mass underestimation

      a) While this interpretation is supported by data and analyses, it is not clear whether this gives a complete picture of the underlying phenomena. The two hypotheses (i.e., mass underestimation vs deliberate speed reduction) can only be distinguished in terms of velocity/acceleration patterns, which should display specific changes during the flight with a mass underestimation. The experimental data generally shows the expected changes but for the 45° condition, no changes are observed during flight compared to the pre- and post-phases (Figure 4). In Figure 5E, only a change in the primary submovement peak velocity is observed for 45°, but this finding relies on a more involved decomposition procedure. It suggests that there is something specific about 45° (beyond its low effective mass). In such planar movements, 45° often corresponds to a movement which is close to single-joint, whereas 90° and 135° involve multi-joint movements. If so, the increased proportion of submovements in 90° and 135° could indicate that participants had more difficulties in coordinating multi-joint movements during flight. Besides inertia, Coriolis and centripetal effects may be non-negligible in such fast planar reaching (Hollerbach & Flash, Biol Cyber, 1982) and, interestingly, they would also be affected by a mass underestimation (thus, this is not necessarily incompatible with the author's view; yet predicting the effects of a mass underestimation on Coriolis/centripetal torques would require a two-link arm model). Overall, I found the discrepancy between the 45° direction and the other directions under-exploited in the current version of the article. In sum, could the corrective submovements be due to a misestimation of Coriolis/centripetal torques in the multi-joint dynamics (caused specifically -or not- by a mass underestimation)?

      We agree that the effect of mass underestimation is smaller in the 45° direction than in the other two directions, possibly because this movement relies on a single joint (elbow) rather than two joints (elbow and shoulder). In addition, movement correction using one joint is probably easier (as also suggested by another reviewer); this possibility will be further discussed in the revision. However, we find that our model simplification (excluding Coriolis and centripetal torques) does not affect our main conclusions. First, we performed a simple simulation and found that, under the current optimal hand trajectory, incorporating Coriolis and centripetal torques has only a limited impact on the resulting joint torques (see simulations in Author response image 4). One reason is that we used smaller movements than Hollerbach & Flash did. In addition, we applied an optimal feedback control model to a more realistic 2-joint arm configuration. Despite its simplicity, this model produced a speed profile consistent with our current predictions and made similar predictions regarding the effects of mass underestimation (Author response image 5). We will provide a more realistic 2-joint arm model with muscle dynamics in the revision to improve the simulation further, but the message will be the same: including or excluding Coriolis and centripetal torques will not affect the theoretical predictions about mass underestimation. Second, as the reviewer correctly pointed out, the mass (and its underestimation) also affects these two torque terms, so its effect on kinematic measures changes little even with the full model.

      Author response image 4.

      Joint angles and joint torques of the shoulder and elbow with simulated trajectories towards different directions. A. Shoulder (green) and elbow (blue) angles change with time for the 45° movement direction. B. Components of joint interaction torques at the shoulder. Solid line: net torque at the shoulder; dotted line: shoulder inertia torque; dashed line: shoulder Coriolis and centripetal torque. C. Same plot as B for the elbow joint. D–F. Coriolis and centripetal components in the full 360° workspace, beyond three movement directions (45°, 90°, and 135°). D. Net torque. E. Inertial torque. F. Combined Coriolis and centripetal torque. Note that the polar plots of Coriolis/centripetal torques (F) have a scale that is two orders of magnitude smaller than that of inertial torque in our simulation. All torques were simulated with the optimal movement duration. Torques were squared and integrated over each trajectory.
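
      The torque decomposition described in this caption can be sketched as follows. This is a minimal illustration, not the authors' simulation: it assumes a planar two-link arm with textbook rigid-body dynamics, a straight minimum-jerk hand path, and hypothetical segment parameters, start position, and function name (`torque_effort`). Only the 12 cm amplitude and the roughly 350 ms duration are taken from the text of this response.

```python
import numpy as np

# Hypothetical arm parameters (illustrative values, not those of the study).
L1, L2 = 0.30, 0.33      # upper-arm and forearm lengths (m)
M1, M2 = 1.4, 1.0        # segment masses (kg)
LC1, LC2 = 0.11, 0.16    # distances from joint to segment centre of mass (m)
I1, I2 = 0.025, 0.045    # segment moments of inertia about the centre of mass (kg m^2)


def torque_effort(direction_deg, start=(0.0, 0.35), distance=0.12,
                  duration=0.35, n=2001):
    """Return (inertial, velocity_dependent) squared torque integrated over a reach.

    The hand follows a straight minimum-jerk path from `start` (relative to
    the shoulder) in the given direction; joint angles come from inverse
    kinematics and are differentiated numerically.
    """
    t = np.linspace(0.0, duration, n)
    dt = t[1] - t[0]
    tau = t / duration
    s = distance * (10 * tau**3 - 15 * tau**4 + 6 * tau**5)  # minimum-jerk path
    ang = np.deg2rad(direction_deg)
    x = start[0] + s * np.cos(ang)
    y = start[1] + s * np.sin(ang)

    # Inverse kinematics (elbow angle q2 measured relative to the upper arm).
    c2 = (x**2 + y**2 - L1**2 - L2**2) / (2 * L1 * L2)
    q2 = np.arccos(np.clip(c2, -1.0, 1.0))
    q1 = np.arctan2(y, x) - np.arctan2(L2 * np.sin(q2), L1 + L2 * np.cos(q2))

    dq1, dq2 = np.gradient(q1, dt), np.gradient(q2, dt)
    ddq1, ddq2 = np.gradient(dq1, dt), np.gradient(dq2, dt)

    # Configuration-dependent inertia matrix of the two-link arm.
    m11 = I1 + I2 + M1 * LC1**2 + M2 * (L1**2 + LC2**2 + 2 * L1 * LC2 * np.cos(q2))
    m12 = I2 + M2 * (LC2**2 + L1 * LC2 * np.cos(q2))
    m22 = I2 + M2 * LC2**2
    h = M2 * L1 * LC2 * np.sin(q2)

    inertial = np.stack([m11 * ddq1 + m12 * ddq2, m12 * ddq1 + m22 * ddq2])
    velocity_dep = np.stack([-h * (2 * dq1 * dq2 + dq2**2), h * dq1**2])

    def effort(torque):
        # Squared torque summed over both joints, integrated over time.
        return float(np.sum(torque**2) * dt)

    return effort(inertial), effort(velocity_dep)


for direction in (45, 90, 135):
    inertial, velocity_dep = torque_effort(direction)
    print(direction, inertial, velocity_dep)
```

      Comparing the two returned values per direction gives a rough sense of how much the Coriolis/centripetal terms contribute relative to the inertial term under these assumed parameters; the actual ratio depends on the arm model and posture used.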

      Author response image 5.

      Comparison between simulation results from the full model with the addition of Coriolis/centripetal torques (left) and the simplified model (right). The position profiles (top) and the corresponding speed profiles (bottom) are shown. Solid lines are for normal mass estimation and dashed lines for mass underestimation in microgravity. The three colors represent three movement directions (dark red: 45°, red: 90°, yellow: 135°). The full model used a 2-link arm model without realistic muscle dynamics (these will be included in the formal revision); thus the speed profile is not smooth. Importantly, the full model also predicts the same effect of mass underestimation, i.e., reduced peak velocity/acceleration and an advance in their timing.

      b) Additionally, since the taikonauts are tested after 2 or 3 weeks in flight, one could also assume that neuromuscular deconditioning explains (at least in part) the general decrease in movement speed. Can the authors explain how to rule out this alternative interpretation? For instance, weaker muscles could account for slower movements within a classical time-effort trade-off (as more neural effort would be needed to generate a similar amount of muscle force, thereby suggesting a purposive slowing down of movement). Therefore, could the observed results (slowing down + more submovements) be explained by some neuromuscular deconditioning combined with a difficulty in coordinating multi-joint movements in weightlessness (due to a misestimation of Coriolis/centripetal torques)?

      Response: Neuromuscular deconditioning is indeed a space or microgravity effect; thanks for bringing this up, as we omitted the discussion of its possible contribution in the initial submission. However, muscle weakness is less pronounced for upper-limb muscles than for postural and lower-limb muscles (Tesch et al., 2005). Handgrip strength decreases by 5% to 15% after several months (Moosavi et al., 2021); atrophy of shoulder and elbow muscles, though not directly measured, was estimated to be minimal (Shen et al., 2017). Muscle weakness is unlikely to play a major role here since our reaching task involves small movements (~12 cm) with joint torques of a magnitude of ~2 N·m. Coriolis/centripetal torques do not affect the putative mass effect (as shown in the simulations above). The reviewer suggests that poor coordination of these torques in microgravity might contribute to slowing down + more submovements. Poor coordination is an umbrella term for any motor control problem, and it can explain any microgravity effect. The feedforward control changes caused by mass underestimation can also be viewed as poor coordination. If we limit it to the coordination of the two joints or the handling of Coriolis/centripetal torques, we should expect to see some trajectory curvature changes in microgravity. However, we further analyzed our reaching trajectories and found no sign of curvature increase in our large collection of reaching movements. We probably have the largest dataset of reaching movements collected in microgravity thus far, given that we had 12 taikonauts and each of them performed about 480 to 840 reaching trials during their spaceflight. We believe the probability of Type II error is quite low here. We will include descriptive statistics of these new analyses in our revision.

      Citation: Tesch, P. A., Berg, H. E., Bring, D., Evans, H. J., & LeBlanc, A. D. (2005). Effects of 17-day spaceflight on knee extensor muscle function and size. European journal of applied physiology, 93(4), 463-468.

      Moosavi, D., Wolovsky, D., Depompeis, A., Uher, D., Lennington, D., Bodden, R., & Garber, C. E. (2021). The effects of spaceflight microgravity on the musculoskeletal system of humans and animals, with an emphasis on exercise as a countermeasure: A systematic scoping review. Physiological Research, 70(2), 119.

      Shen, H., Lim, C., Schwartz, A. G., Andreev-Andrievskiy, A., Deymier, A. C., & Thomopoulos, S. (2017). Effects of spaceflight on the muscles of the murine shoulder. The FASEB Journal, 31(12), 5466.

      (2) Modelling

      a) The model description should be improved as it is currently a mix of discrete time and continuous time formulations. Moreover, an infinite-horizon cost function is used, but I thought the authors used a finite-horizon formulation with the prefixed duration provided by the movement utility maximization framework of Shadmehr et al. (Curr Biol, 2016). Furthermore, was the mass underestimation reflected both in the utility model and the optimal control model? If so, did the authors really compute the feedback control gain with the underestimated mass but simulate the system with the real mass? This is important because the mass appears both in the utility framework and in the LQ framework. Given the current interpretations, the feedforward command is assumed to be erroneous, and the feedback command would allow for motor corrections. Therefore, it could be clarified whether the feedback command also misestimates the mass or not, which may affect its efficiency. For instance, if both feedforward and feedback motor commands are based on wrong internal models (e.g., due to the mass underestimation), one may wonder how the astronauts would execute accurate goal-directed movements.

      b) The model seems to be deterministic in its current form (no motor and sensory noise). Since the framework developed by Todorov (2005) is used, sensorimotor noise could have been readily considered. One could also assume that motor and sensory noise increase in microgravity, and the model could inform on how microgravity affects the number of submovements or endpoint variance due to sensorimotor noise changes, for instance.

      c) Finally, how does the model distinguish the feedforward and feedback components of the motor command that are discussed in the paper, given that the model only yields a feedback control law? Does 'feedforward' refer to the motor plan here (i.e., the prefixed duration and arguably the precomputed feedback gain)?

      We appreciate these very helpful suggestions about our model presentation. Indeed, our initial submission did not give detailed model descriptions in the main text, due to text limits for early submissions. We actually used a finite-horizon framework throughout, with a pre-specified duration derived from the utility model. In the revision, we will make that point clear, and we will also revise the Methods section to explicitly distinguish feedforward vs. feedback components, clarify the use of mass underestimation in both utility and control models, and update the equations accordingly.

      (3) Brevity of movements and speed-accuracy trade-off

      The tested movements are much faster (average duration approx. 350 ms) than similar self-paced movements that have been studied in other works (e.g., Wang et al., J Neurophysiology, 2016; Berret et al., PLOS Comp Biol, 2021, where movements can last about 900-1000 ms). This is consistent with the instructions to reach quickly and accurately, in line with a speed-accuracy trade-off. Was this instruction given to highlight the inertial effects related to the arm's anisotropy? One may however, wonder if the same results would hold for slower self-paced movements (are they also with reduced speed compared to Earth performance?). Moreover, a few other important questions might need to be addressed for completeness: how to ensure that astronauts did remember this instruction during the flight? (could the control group move faster because they better remembered the instruction?). Did the taikonauts perform the experiment on their own during the flight, or did one taikonaut assume the role of the experimenter?

      Thanks for highlighting the brevity of movements in our experiment. Our intention in emphasizing fast movements is to rigorously test whether movement is indeed slowed down in microgravity. The observed prolonged movement duration clearly shows that microgravity affects people’s movement duration, even when they are pushed to move fast. The second reason for using fast movement is to highlight that feedforward control is affected in microgravity. Mass underestimation specifically affects feedforward control in the first place. Slow movement would inevitably have online corrections that might obscure the effect of mass underestimation. Note that movement slowing is observed not only in our speed-emphasized reaching task, but also in whole-arm pointing in other astronaut studies (Berger, 1997; Sangals, 1999), which have been cited in our paper. We thus believe these findings are generalizable.

      Regarding the consistency of instructions: all our experiments conducted in the Tiangong space station were monitored in real time by experimenters in the Control Center located in Beijing. The task instructions were presented on the initial display of the data acquisition application and ample reading time was allowed. In fact, all the pre-, in-, and post-flight test sessions were administered by the same group of experimenters with the same instruction. It is common for astronauts to serve as both participants and experimenters at the same time, and they were well trained for this type of role on the ground. Note that we had multiple pre-flight test sessions to familiarize them with the task. All these rigorous measures were in place to obtain high-quality data. We will include these experimental details and the rationales for emphasizing fast movements in the revision.

      Citations:

      Berger, M., Mescheriakov, S., Molokanova, E., Lechner-Steinleitner, S., Seguer, N., & Kozlovskaya, I. (1997). Pointing arm movements in short- and long-term spaceflights. Aviation, Space, and Environmental Medicine, 68(9), 781–787.

      Sangals, J., Heuer, H., Manzey, D., & Lorenz, B. (1999). Changed visuomotor transformations during and after prolonged microgravity. Experimental Brain Research. Experimentelle Hirnforschung. Experimentation Cerebrale, 129(3), 378–390.

      (4) No learning effect

      This is a surprising effect, as mentioned by the authors. Other studies conducted in microgravity have indeed revealed an optimal adaptation of motor patterns in a few dozen trials (e.g., Gaveau et al., eLife, 2016). Perhaps the difference is again related to single-joint versus multi-joint movements. This should be better discussed given the impact of this claim. Typically, why would a "sensory bias of bodily property" persist in microgravity and be a "fundamental constraint of the sensorimotor system"?

      We believe the differences between our study and Gaveau et al.’s study cannot be simply attributed to single-joint versus multi-joint movements. One of the most salient differences is that their adaptation is about incorporating microgravity in control for minimizing effort, while our adaptation is about correctly perceiving body mass. We will elaborate on possible reasons for the lack of learning in the light of this previous study.

      We can elaborate on “sensory bias” and “fundamental constraint of the sensorimotor system”. If an inertial change is perceived (like an extra weight attached to the forearm, as in previous motor adaptation studies), people can adapt their reaching in tens of trials. In this case, sensory cues are veridical as they correctly inform about the inertial perturbation. However, in microgravity, reduced gravitational pull and proprioceptive inputs constantly inform the controller that the body mass is less than its actual magnitude. In other words, sensory cues in space are misleading for estimating body mass. The resulting sensory bias prevents the sensorimotor system from adapting correctly. Our statement was too brief in the initial submission; we will expand it in the revision.

      Reviewer #3 (Public review):

      Summary:

      The authors describe an interesting study of arm movements carried out in weightlessness after a prolonged exposure to the so-called microgravity conditions of orbital spaceflight. Subjects performed radial point-to-point motions of the fingertip on a touch pad. The authors note a reduction in movement speed in weightlessness, which they hypothesize could be due to either an overall strategy of lowering movement speed to better accommodate the instability of the body in weightlessness or an underestimation of body mass. They conclude for the latter, mainly based on two effects. One, slowing in weightlessness is greater for movement directions with higher effective mass at the end effector of the arm. Two, they present evidence for an increased number of corrective submovements in weightlessness. They contend that this provides conclusive evidence to accept the hypothesis of an underestimation of body mass.

      Strengths:

      In my opinion, the study provides a valuable contribution, the theoretical aspects are well presented through simulations, the statistical analyses are meticulous, the applicable literature is comprehensively considered and cited, and the manuscript is well written.

      Weaknesses:

      Nevertheless, I am of the opinion that the interpretation of the observations leaves room for other possible explanations of the observed phenomenon, thus weakening the strength of the arguments.

      First, I would like to point out an apparent (at least to me) divergence between the predictions and the observed data. Figures 1 and S1 show that the difference between predicted values for the 3 movement directions is almost linear, with predictions for 90º midway between predictions for 45º and 135º. The effective mass at 90º appears to be much closer to that of 45º than to that of 135º (Figure S1A). But the data shown in Figure 2 and Figure 3 indicate that movements at 90º and 135º are grouped together in terms of reaction time, movement duration, and peak acceleration, while both differ significantly from those values for movements at 45º.

      Furthermore, in Figure 4, the change in peak acceleration time and relative time to peak acceleration between 1g and 0g appears to be greater for 90º than for 135º, which appears to me to be at least superficially in contradiction with the predictions from Figure S1. If the effective mass is the key parameter, wouldn't one expect as much difference between 90º and 135º as between 90º and 45º? It is true that peak speed (Figure 3B) and peak speed time (Figure 4B) appear to follow the ordering according to effective mass, but is there a mathematical explanation as to why the ordering is respected for velocity but not acceleration? These inconsistencies weaken the author's conclusions and should be addressed.

      Indeed, the model predicts an almost equal separation between 45° and 90° and between 90° and 135°, while the data indicate that the spacing between 45° and 90° is much smaller than between 90° and 135°. We do not regard the divergence as evidence undermining our main conclusion since 1) the model is a simplification of the actual situation. For example, the model simulates an ideal case of moving a point mass (effective mass) without friction and without considering Coriolis and centripetal torques. 2) Our study does not make quantitative predictions of all the key kinematic measures; that will require model fitting and parameter estimation; instead, our study uses well-established (though simplified) models to qualitatively predict the overall behavioral pattern we would observe. For this purpose, our results are well in line with our expectations: though we did not find equal spacing between direction conditions, we do confirm that the key kinematic properties (Figure 2 and Figure 3 as questioned) follow the same ranking order of directions as predicted.

      We thank the reviewer for pointing out the apparent discrepancy between model simulation and observed data. We will elaborate on the reasons behind the discrepancy in the revision.
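
      For readers who want to see where a direction-dependent effective mass comes from, the following is a minimal sketch using one standard definition, m(u) = 1 / (uᵀ J M⁻¹ Jᵀ u), where J is the hand Jacobian and M the joint-space inertia matrix of a planar two-link arm. Whether this matches the exact formulation in the manuscript is an assumption; the segment parameters, the posture, and the function name `effective_mass` are hypothetical and chosen only for illustration.

```python
import numpy as np

# Hypothetical arm parameters (illustrative values, not those of the study).
L1, L2 = 0.30, 0.33      # upper-arm and forearm lengths (m)
M1, M2 = 1.4, 1.0        # segment masses (kg)
LC1, LC2 = 0.11, 0.16    # distances from joint to segment centre of mass (m)
I1, I2 = 0.025, 0.045    # segment moments of inertia about the centre of mass (kg m^2)


def effective_mass(q1, q2, direction_deg):
    """Effective mass (kg) felt at the hand along a unit direction in the plane.

    q1 is the shoulder angle and q2 the elbow angle relative to the upper arm,
    both in radians.
    """
    s1, c1 = np.sin(q1), np.cos(q1)
    s12, c12 = np.sin(q1 + q2), np.cos(q1 + q2)

    # Hand Jacobian of the two-link arm.
    jac = np.array([[-L1 * s1 - L2 * s12, -L2 * s12],
                    [L1 * c1 + L2 * c12, L2 * c12]])

    # Joint-space inertia matrix.
    m11 = I1 + I2 + M1 * LC1**2 + M2 * (L1**2 + LC2**2 + 2 * L1 * LC2 * np.cos(q2))
    m12 = I2 + M2 * (LC2**2 + L1 * LC2 * np.cos(q2))
    m22 = I2 + M2 * LC2**2
    inertia = np.array([[m11, m12], [m12, m22]])

    # Hand-space mobility (inverse of the hand-space inertia).
    mobility = jac @ np.linalg.inv(inertia) @ jac.T

    ang = np.deg2rad(direction_deg)
    u = np.array([np.cos(ang), np.sin(ang)])
    return float(1.0 / (u @ mobility @ u))


# Assumed posture: shoulder at 45 degrees, elbow flexed by 90 degrees.
posture = (np.deg2rad(45.0), np.deg2rad(90.0))
masses = {d: effective_mass(*posture, d) for d in (45, 90, 135)}
print(masses)
```

      Because the mobility matrix is generally not isotropic, the three directions yield different effective masses, and their spacing depends on posture and segment parameters. This is one reason why a qualitative ranking across directions is a more robust prediction than equal spacing.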

      Then, to strengthen the conclusions, I feel that the following points would need to be addressed:

      (1) The authors model the movement control through equations that derive the input control variable in terms of the force acting on the hand and treat the arm as a second-order low-pass filter (Equation 13). Underestimation of the mass in the computation of a feedforward command would lead to a lower-than-expected displacement to that command. But it is not clear if and how the authors account for a potential modification of the time constants of the 2nd order system. The CNS does not effectuate movements with pure torque generators. Muscles have elastic properties that depend on their tonic excitation level, reflex feedback, and other parameters. Indeed, Fisk et al.* showed variations of movement characteristics consistent with lower muscle tone, lower bandwidth, and lower damping ratio in 0g compared to 1g. Could the variations in the response to the initial feedforward command be explained by a misrepresentation of the limbs' damping and natural frequency, leading to greater uncertainty about the consequences of the initial command? This would still be an argument for unadapted feedforward control of the movement, leading to the need for more corrective movements. But it would not necessarily reflect an underestimation of body mass.

      *Fisk, J., Lackner, J. R., & DiZio, P. (1993). Gravitoinertial force level influences arm movement control. Journal of Neurophysiology, 69(2), 504-511.

      We agree that muscle properties, tonic excitation level, and proprioception-mediated reflexes all contribute to reaching control. The study by Fisk et al. (1993) indeed showed that arm movement kinematics change, possibly owing to lower muscle tone and/or damping. However, reduced muscle damping and reduced spindle activity are more likely to affect feedback-based movements. In Fisk et al.’s study, people performed continuous arm movements with eyes closed; thus their movements largely relied on proprioceptive control. Our major findings concern feedforward control, i.e., the reduced and “advanced” peak velocity/acceleration in discrete and ballistic reaching movements. Note that the peak acceleration happens as early as approximately 90-100 ms into the movements, clearly showing that feedforward control is affected, a different effect from Fisk et al.’s findings. It is unlikely that people “advanced” their peak velocity/acceleration because they felt the need for more corrective movements later. Thus, underestimation of body mass remains the most plausible explanation.
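
      The reviewer's premise that a feedforward command computed with an underestimated mass produces a smaller-than-expected displacement can be illustrated with a deliberately simple sketch. It assumes a frictionless point mass, a minimum-jerk plan, and a purely open-loop force command with no feedback; the mass values and the function name `feedforward_reach` are hypothetical. It does not reproduce the timing advance, which in the model discussed in this response depends on the planned duration and on feedback corrections.

```python
import numpy as np


def feedforward_reach(true_mass, assumed_mass, distance=0.12, duration=0.35, n=3501):
    """Open-loop reach of a point mass driven by a force planned for `assumed_mass`.

    Returns (final displacement in m, peak velocity in m/s, peak acceleration in m/s^2).
    """
    t = np.linspace(0.0, duration, n)
    dt = t[1] - t[0]
    tau = t / duration

    # Acceleration of the planned minimum-jerk trajectory.
    planned_acc = distance / duration**2 * (60 * tau - 180 * tau**2 + 120 * tau**3)

    # The controller converts the plan into force using its (possibly wrong) mass estimate.
    force = assumed_mass * planned_acc

    # The real limb responds according to its true mass.
    acc = force / true_mass
    vel = np.cumsum(acc) * dt
    pos = np.cumsum(vel) * dt
    return float(pos[-1]), float(vel.max()), float(acc.max())


# Hypothetical masses: the controller assumes 70% of the true effective mass.
accurate = feedforward_reach(true_mass=1.5, assumed_mass=1.5)
underestimated = feedforward_reach(true_mass=1.5, assumed_mass=1.05)
print("accurate:", accurate)
print("underestimated:", underestimated)
```

      In this open-loop setting, displacement, peak velocity, and peak acceleration all scale with the ratio of assumed to true mass, so the hand falls short of the target and corrective submovements would be needed to reach it.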

      (2) The movements were measured by having the subjects slide their finger on the surface of a touch screen. In weightlessness, the implications of this contact are expected to be quite different than those on the ground. In weightlessness, the taikonauts would need to actively press downward to maintain contact with the screen, while on Earth, gravity will do the work. The tangential forces that resist movement due to friction might therefore be different in 0g. This could be particularly relevant given that the effect of friction would interact with the limb in a direction-dependent fashion, given the anisotropy of the equivalent mass at the fingertip evoked by the authors. Is there some way to discount or control for these potential effects?

      We agree that friction might play a role here, but normal interaction with a touch screen typically involves friction forces between 0.1 and 0.5 N (e.g., Ayyildiz et al., 2018). We believe that the directional variation is even smaller than 0.1 N, which is very small compared to the force used to accelerate the arm for the reaching movement (10-15 N). Thus, friction anisotropy is unlikely to explain our data.

      Citation: Ayyildiz M, Scaraggi M, Sirin O, Basdogan C, Persson BNJ. Contact mechanics between the human finger and a touchscreen under electroadhesion. Proc Natl Acad Sci U S A. 2018 Dec 11;115(50):12668-12673.

      (3) The carefully crafted modelling of the limb neglects, nevertheless, the potential instability of the base of the arm. While the taikonauts were able to use their left arm to stabilize their bodies, it is not clear to what extent active stabilization with the contralateral limb can reproduce the stability of the human body seated in a chair in Earth gravity. Unintended motion of the shoulder could account for a smaller-than-expected displacement of the hand in response to the initial feedforward command and/or greater propensity for errors (with a greater need for corrective submovements) in 0g. The direction of movement with respect to the anchoring point could lead to the dependence of the observed effects on movement direction. Could this be tested in some way, e.g., by testing subjects on the ground while standing on an unstable base of support or sitting on a swing, with the same requirement to stabilize the torso using the contralateral arm?

      Body stabilization is always a challenge for human movement studies in space. We minimized its potential confounding effects by using left-hand grasping and foot straps for postural support throughout the experiment. We would argue that shoulder instability is an unlikely explanation because unexpected shoulder instability should not affect the feedforward (early) part of the ballistic reaching movement: the reduced peak acceleration and its early peak were observed at about 90-100 ms after movement initiation. This effect is too early to be explained by an unexpected stability issue.

      The arguments for an underestimation of body mass would be strengthened if the authors could address these points in some way.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this study from Zhu and colleagues, a clear role for MED26 in mouse and human erythropoiesis is demonstrated that is also mapped to amino acids 88-480 of the human protein. The authors also show the unique expression of MED26 in later-stage erythropoiesis and propose transcriptional pausing and condensate formation mechanisms for MED26's role in promoting erythropoiesis. Despite the authors' introductory claim that many questions regarding Pol II pausing in mammalian development remain unanswered, the importance of transcriptional pausing in erythropoiesis has actually already been demonstrated (Martell-Smart, et al. 2023, PMID: 37586368, which the authors notably did not cite in this manuscript). Here, the novelty and strength of this study is MED26 and its unique expression kinetics during erythroid development.

      Strengths:

      The widespread characterization of kinetics of mediator complex component expression throughout the erythropoietic timeline is excellent and shows the interesting divergence of MED26 expression pattern from many other mediator complex components. The genetic evidence in conditional knockout mice for erythropoiesis requiring MED26 is outstanding. These are completely new models from the investigators and are an impressive amount of work to have both EpoR-driven deletion and inducible deletion. The effect on red cell number is strong in both. The genetic over-expression experiments are also quite impressive, especially the investigators' structure-function mapping in primary cells. Overall the data is quite convincing regarding the genetic requirement for MED26. The authors should be commended for demonstrating this in multiple rigorous ways.

      Thank you for your positive feedback.

      Weaknesses:

      (1) The authors state that MED26 was nominated for study based on RNA-seq analysis of a prior published dataset. They do not however display any of that RNA-seq analysis with regards to Mediator complex subunits. While they do a good job showing protein-level analysis during erythropoiesis for several subunits, the RNA-seq analysis would allow them to show the developmental expression dynamics of all subunit members.

      Thank you for this helpful suggestion. While we did not originally nominate MED26 based on RNA-seq analysis, we have analyzed the transcript levels of Mediator complex subunits in our RNA-seq data across different stages of erythroid differentiation (Author response image 1). The results indicate that most Mediator subunits, including MED26, display decreased RNA expression over the course of differentiation, with the exception of MED25, as reported previously (Pope et al., Mol Cell Biol 2013. PMID: 23459945).

      Notably, our study is based on initial observations at the protein level, where we found that, unlike most other Mediator subunits that are downregulated during erythropoiesis, MED26 remains relatively abundant. Protein expression levels more directly reflect the combined influences of transcription, translation and degradation processes within cells, and are likely more closely related to biological functions in this context. It is possible that post-transcriptional regulation (such as m6A-mediated improvement of translational efficiency) or post-translational modifications (like escape from ubiquitination) could contribute to the sustained levels of MED26 protein, and this will be an interesting direction for future investigation.

      Author response image 1.

      Relative RNA expression of Mediator complex subunits during erythropoiesis in human CD34+ erythroid cultures. Different differentiation stages from HSPCs to late erythroblasts were identified using CD71 and CD235a markers, progressing sequentially as CD71-CD235a-, CD71+CD235a-, CD71+CD235a+, and CD71-CD235a+. Expression levels were presented as TPM (transcripts per million).

      (2) The authors use an EpoR Cre for red cell-specific MED26 deletion. However, other studies have now shown that the EpoR Cre can also lead to recombination in the macrophage lineage, which clouds some of the in vivo conclusions for erythroid specificity. That being said, the in vitro erythropoiesis experiments here are convincing that there is a major erythroid-intrinsic effect.

      Thank you for this insightful comment. We recognize that EpoR-Cre can drive recombination in both erythroid and macrophage lineages (Zhang et al., Blood 2021, PMID: 34098576). However, EpoR-Cre remains the most widely used Cre for studying erythroid lineage effects in the hematopoietic community. Numerous studies have employed EpoR-Cre for erythroid-specific gene knockout models (Pang et al., Mol Cell Biol 2021, PMID: 22566683; Santana-Codina et al., Haematologica 2019, PMID: 30630985; Xu et al., Science 2013, PMID: 21998251).

      While a GYPA (CD235a)-Cre model with erythroid specificity has recently been developed (https://www.sciencedirect.com/science/article/pii/S0006497121029074), it has not yet been officially published. We look forward to utilizing the GYPA-Cre model for future studies. As you noted, our in vivo mouse model and primary human CD34+ erythroid differentiation system both demonstrate that MED26 is essential for erythropoiesis, suggesting that the regulatory effects of MED26 in our study are predominantly erythroid-intrinsic.

      (3) The donor chimerism assessment of mice transplanted with MED26 knockout cells is a bit troubling. First, there are no staining controls shown and the full gating strategy is not shown. Furthermore, the authors use the CD45.1/CD45.2 system to differentiate between donor and recipient cells in erythroblasts. However, CD45 is not expressed from the CD235a+ stage of erythropoiesis onwards, so it is unclear how the authors are detecting essentially zero CD45-negative cells in the erythroblast compartment. This is quite odd and raises questions about the results. That being said, the red cell indices in the mice are the much more convincing data.

      Thank you for your careful and thorough feedback. We have now included negative staining controls (Author response image 2A, top). We agree that CD45 is typically not expressed in erythroid precursors in normal development. Prior studies have characterized BFU-E and CFU-E stages as c-Kit+CD45+Ter119−CD71low and c-Kit+CD45−Ter119−CD71high cells in fetal liver (Katiyar et al, Cells 2023, PMID: 37174702).

      However, our observations indicate that erythroid surface markers differ during hematopoiesis reconstitution following bone marrow transplantation. We found that nearly all nucleated erythroid progenitors/precursors (Ter119+Hoechst+) express CD45 after hematopoiesis reconstitution (Author response image 2A, bottom).

      To validate our assay, we performed next-generation sequencing by first mixing mouse CD45.1 and CD45.2 total bone marrow cells at a 1:2 ratio. We then isolated nucleated erythroid progenitors/precursors (Ter119+Hoechst+) by FACS and sequenced the CD45 gene locus by targeted sequencing. The resulting CD45 allele distribution matched our initial mixing ratio, confirming the accuracy of our approach (Author response image 2B).

      Moreover, a recent study supports that reconstituted erythroid progenitors can indeed be distinguished by CD45 expression following bone marrow transplantation (He et al., Nature Aging 2024, PMID: 38632351. Extended Data Fig. 8). 

      In conclusion, our data indicate that newly formed erythroid progenitors/precursors post-transplant express CD45, enabling us to identify nucleated erythroid progenitors/precursors by Ter119+Hoechst+ and determine their origin using CD45.1 and CD45.2 markers.

      Author response image 2.

      Representative flow cytometry gating strategy of erythroid chimerism following mouse bone marrow transplantation. A. Gating strategy used in the erythroid chimerism assay. B. Targeted sequencing result of Ter119+Hoechst+ cells isolated by FACS. The cell sample was pre-mixed with 1/3 CD45.2 and 2/3 CD45.1 bone marrow cells. Ptprc is the gene locus for CD45.

      (4) The authors make heavy use of defining "erythroid gene" sets and "non-erythroid gene" sets, but it is unclear what those lists of genes actually are. This makes it hard to assess any claims made about erythroid and non-erythroid genes.

      Thank you for this helpful suggestion. We defined "erythroid genes" and "non-erythroid genes" based on RNA-seq data from Ludwig et al. (Cell Reports 2019. PMID: 31189107. Figure 2 and Table S1). Genes downregulated from stages k1 to k5 are classified as “non-erythroid genes,” while genes upregulated from stages k6 to k7 are classified as “erythroid genes.” We will add this description in the revised manuscript.

      (5) Overall the data regarding condensate formation is difficult to interpret and is the weakest part of this paper. It is also unclear how studies of in vitro condensate formation or studies in 293T or K562 cells can truly relate to highly specialized erythroid biology. This does not detract from the major findings regarding genetic requirements of MED26 in erythropoiesis.

      Thank you for the rigorous feedback. Assessing the condensate properties of MED26 protein in primary CD34+ erythroid cells or mouse models is indeed challenging. As is common in many condensate studies, we used in vitro assays and cellular assays in HEK293T and K562 cells to examine the biophysical properties (Figure S7), condensate formation capacity (Figure 5C and Figure S7C), key phase-separation regions of MED26 protein (Figure S6), and recruitment of pausing factors (Figure 6A-B) in live cells. We then conducted functional assays to demonstrate that the phase-separation region of MED26 can promote erythroid differentiation similarly to the full-length protein in the CD34+ system and K562 cells (Figure 5A). Specifically, overexpressing the MED26 phase-separation domain accelerates erythropoiesis in primary human erythroid culture, while deleting the Intrinsically Disordered Region (IDR) impairs MED26’s ability to form condensates and recruit PAF1 in K562 cells.

      In summary, we used HEK293T cells to study the biochemical and biophysical properties of MED26, and the primary CD34+ differentiation system to examine its developmental roles. Our findings support the conclusion that MED26-associated condensate formation promotes erythropoiesis.

      (6) For many figures, there are some panels where conclusions are drawn, but no statistical quantification of whether a difference is significant or not.

      Thank you for your thorough feedback. We have checked all figures for statistical quantification and added the relevant statistical analysis methods to the corresponding figure legends (Figure 2L and Figure S4C) to clarify the significance of the observed differences. The updated information will be incorporated into the revised manuscript.

      Reviewer #2 (Public review):

      Summary:

      The manuscript by Zhu et al describes a novel role for MED26, a subunit of the Mediator complex, in erythroid development. The authors have discovered that MED26 promotes transcriptional pausing of RNA Pol II, by recruiting pausing-related factors.

      Strengths:

      This is a well-executed study. The authors have employed a range of cutting-edge and appropriate techniques to generate their data, including: CUT&Tag to profile chromatin changes and mediator complex distribution; nuclear run-on sequencing (PRO-seq) to study Pol II dynamics; knockout mice to determine the phenotype of MED26 perturbation in vivo; an ex vivo erythroid differentiation system to perform additional, important, biochemical and perturbation experiments; immunoprecipitation mass spectrometry (IP-MS); and the "optoDroplet" assay to study phase-separation and molecular condensates.

      This is a real highlight of the study. The authors have managed to generate a comprehensive picture by employing these multiple techniques. In doing so, they have also managed to provide greater molecular insight into the workings of the MEDIATOR complex, an important multi-protein complex that plays an important role in a range of biological contexts. The insights the authors have uncovered for different subunits in erythropoiesis will very likely have ramifications in many other settings, in both healthy biology and disease contexts.

      Thank you for your thoughtful summary and encouraging feedback.

      Weaknesses:

      There are almost no discernible weaknesses in the techniques used, nor the interpretation of the data. The IP-MS data was generated in HEK293 cells when it could have been performed in the human CD34+ HSPC system that they employed to generate a number of the other data. This would have been a more natural setting and would have enabled a more like-for-like comparison with the other data.

      Thank you for your positive feedback and insightful suggestions. We will perform validation of the immunoprecipitation results in CD34+ derived erythroid cells to further confirm our findings.

      Reviewer #3 (Public review):

      Summary:

      The authors aim to explore whether other subunits besides MED1 exert specific functions during the process of terminal erythropoiesis with global gene repression, and finally they demonstrated that MED26-enriched condensates drive erythropoiesis through modulating transcription pausing.

      Strengths:

      Through both in vitro and in vivo models, the authors showed that while MED1 and MED26 co-occupy a plethora of genes important for cell survival and proliferation at the HSPC stage, MED26 preferentially marks erythroid genes and recruits pausing-related factors for cell fate specification. Gradually, MED26 becomes the dominant factor in shaping the composition of transcription condensates and transforms the chromatin towards a repressive yet permissive state, achieving global transcription repression in erythropoiesis.

      Thank you for your positive summary and feedback.

      Weaknesses:

      In the in vitro model, the author only used CD34+ cell-derived erythropoiesis as the validation, which is relatively simple, and more in vitro erythropoiesis models need to be used to strengthen the conclusion.

      Thank you for your thoughtful suggestions. We have shown that MED26 promotes erythropoiesis using the primary human CD34+ differentiation system (Figure 2K-M and Figure S4) and have demonstrated its essential role in erythropoiesis through multiple mouse models (Figure 2A-G and Figure S1-3). Together, these in vitro and in vivo results support our conclusion that MED26 regulates erythropoiesis. However, we are open to further validating our findings with additional in vitro erythropoiesis models, such as iPSC or HUDEP erythroid differentiation systems.

    1. Author Response

      Reviewer #1 (Public Review):

      [...] Genes expressed in the same direction in lowland individuals facing hypoxia (the plastic state) as what is found in the colonised state are defined as adaptive, while genes with the opposite expression pattern were labelled as maladaptive, using the assumption that the colonised state must represent the result of natural selection. Furthermore, genes could be classified as representing reversion plasticity when the expression pattern differed between the plasticity and colonised states and as reinforcement when they were in the same direction (for example more expressed in the plastic state and the colonised state than in the ancestral state). They found that more genes had a plastic expression pattern that was labelled as maladaptive than adaptive. Therefore, some of the genes have an expression pattern in accordance with what would be predicted based on the plasticity-first hypothesis, while others do not.

      Thank you for a precise summary of our work. We appreciate the very encouraging comments recognizing the value of our work. We have addressed concerns from the reviewer in greater detail below.

      Q1. As pointed out by the authors themselves, the fact that temperature was not included as a variable, which would make the experimental design much more complex, misses the opportunity to more accurately reflect the environmental conditions that the colonizer individuals face at high altitude. Also pointed out by the authors, the acclimation experiment in hypoxia lasted 4 weeks. It is possible that longer term effects would be identifiable in gene expression in the lowland individuals facing hypoxia on a longer time scale. Furthermore, a sample size of 3 or 4 individuals per group depending on the tissue for wild individuals may miss some of the natural variation present in these populations. Stating that they have a n=7 for the plastic stage and n= 14 for the ancestral and colonized stages refers to the total number of tissue samples and not the number of individuals, according to supplementary table 1.

      We share the reviewer's concerns. They arise partly because it is quite challenging to bring wild birds into captivity for hypoxia acclimation experiments. We had to work hard to perform the acclimation experiments by keeping lowland sparrows under hypoxic conditions for a month. We recognize the same set of limitations that the reviewer pointed out and have discussed them in the study, i.e., considering hypoxia alone, the short acclimation period, etc. Regarding sample sizes, we collected cardiac muscle from nine individuals (three individuals for each stage) and flight muscle from 12 individuals (four individuals for each stage). We have clarified this in Supplementary Table 1.

      Q2. Finally, I could not find a statement indicating that the lowland individuals placed in hypoxia (plastic stage) were from the same population as the lowland individuals for which transcriptomic data was already available, used as the "ancestral state" group (which themselves seem to come from 3 populations Qinhuangdao, Beijing, and Tianjin, according to supplementary table 2) nor if they were sampled in the same time of year (pre reproduction, during breeding, after, or if they were juveniles, proportion of males or females, etc). These two aspects could affect both gene expression (through neutral or adaptive genetic variation among lowland populations that can affect gene expression, or environmental effects other than hypoxia that differ in these populations' environments or because of their sexes or age). This could potentially also affect the FST analysis done by the authors, which they use to claim that strong selective pressure acted on the expression level of some of the genes in the colonised group.

      The reviewer asked how the individual tree sparrows used in the transcriptomic analyses were collected. The individuals used for the hypoxia acclimation experiment and those representing the ancestral lowland population were collected from the same locality (Beijing) and in the same season (i.e., pre-breeding) of the year. They are all adults and weigh approximately 18 g. We have clarified this in Supplementary Table S1 and the Methods. We did not distinguish males from females (both sexes look similar) under the assumption that both sexes respond similarly to hypoxia acclimation in their cardiac and flight muscle gene expression.

      Supplementary Table 2 lists the individuals that were used for sequence analyses. These individuals were used only for sequence comparisons and not for the transcriptomic analyses. The population genetic structure analyzed in a previously published study showed that there is no clear genetic divergence within the lowland population (i.e., individuals collected from Beijing, Tianjin and Qinhuangdao) or the highland population (i.e., Gangcha and Qinghai Lake). In addition, there was no clear genetic divergence between the highland and lowland populations (Qu et al. 2020).

      Author response image 1.

      Population genetic structure of the Eurasian Tree Sparrow (Passer montanus). The genetic structure generated using FRAPPE. The colors in each column represent the contribution from each subcluster (Qu et al. 2020). Yellow, highland population; blue, lowland population.

      Q4. Impact of the work: There has been work showing that populations adapted to high altitude environments show changes in their hypoxia response that differ from the short-term acclimation response of lowland populations of the same species. For example, in humans, see Erzurum et al. 2007 and Peng et al. 2017, where they show that the hypoxia response cascade, which starts with the gene HIF (Hypoxia-Inducible Factor) and includes the EPO gene, which codes for erythropoietin, which in turn activates the production of red blood cells, is LESS activated in high altitude individuals compared to the activation level in lowland individuals (which gives it its name). The present work adds to this body of knowledge showing that the short-term response to hypoxia and the long term one can affect different pathways and that acclimation/plasticity does not always predict what physiological traits will evolve in populations that colonize these environments over many generations and under additional selection pressures (UV exposure, temperature, nutrient availability). Altogether, this work provides new information on the evolution of reaction norms of genes associated with the physiological response to one of the main environmental variables that affects almost all animals, oxygen availability. It also provides an interesting model system to study this type of question further in a natural population of homeotherms.

      Erzurum, S. C., S. Ghosh, A. J. Janocha, W. Xu, S. Bauer, N. S. Bryan, J. Tejero et al. "Higher blood flow and circulating NO products offset high-altitude hypoxia among Tibetans." Proceedings of the National Academy of Sciences 104, no. 45 (2007): 17593-17598.

      Peng, Y., C. Cui, Y. He, Ouzhuluobu, H. Zhang, D. Yang, Q. Zhang, Bianbazhuoma, L. Yang, Y. He, et al. 2017. Down-regulation of EPAS1 transcription and genetic adaptation of Tibetans to high-altitude hypoxia. Molecular Biology and Evolution 34:818-830.

      Thank you for highlighting the potential novelty of our work in the context of the broader field. We found it very interesting to discuss our results (from a bird species) together with similar findings from humans. In the revised version of the manuscript, we have discussed the short-term acclimation response and long-term adaptive evolution to a high-elevation environment, as well as how our work improves understanding of the relative roles of short-term plasticity and long-term adaptation. We appreciate the two important studies pointed out by the reviewer and have cited them in the revised version of the manuscript.

      Reviewer #2 (Public Review):

      This is a well-written paper using gene expression in tree sparrow as model traits to distinguish between genetic effects that either reinforce or reverse initial plastic response to environmental changes. Tree sparrow tissues (cardiac and flight muscle) collected in lowland populations subject to hypoxia treatment were profiled for gene expression and compared with previously collected data in 1) highland birds; 2) lowland birds under normal condition to test for differences in directions of changes between initial plastic response and subsequent colonized response. The question is an important and interesting one but I have several major concerns on experimental design and interpretations.

      Thank you for a precise summary of our work and constructive comments to improve this study. We have addressed your concerns in greater detail below.

      Q1. The datasets consist of two sources of data. The hypoxia treated birds collected from the current study and highland and lowland birds in their respective native environment from a previous study. This creates a complete confounding between the hypoxia treatment and experimental batches that it is impossible to draw any conclusions. The sample size is relatively small. Basically correlation among tens of thousands of genes was computed based on merely 12 or 9 samples.

      We appreciate the critical comments from the reviewer. The reviewer raised a concern about a batch effect between birds collected in the previous study and those collected in this study. There is an important detail that we did not describe in the previous version: all tissues from the hypoxia-acclimated birds and from the highland and lowland birds were collected at the same time (i.e., Qu et al. 2020). RNA library construction and sequencing of these samples were also conducted at the same time, although only the transcriptomic data of lowland and highland tree sparrows were included in Qu et al. (2020). The data from acclimated birds have not been published before.

      In the revised version of the manuscript, we also compared log-transformed transcripts per million (TPM) across all genes and determined the most conserved genes (i.e., coefficient of variation ≤ 0.3 and average TPM ≥ 1 for each sample) for the flight and cardiac muscles, respectively (Hao et al. 2023). We compared the median expression levels of these conserved genes and found no difference among the lowland, hypoxia-exposed lowland, and highland tree sparrows (Wilcoxon signed-rank test, P<0.05). As these results suggested little batch effect on the transcriptomic data, we used TPM values to calculate gene expression level and intensity. This methodological detail has been further clarified in the Methods, and we have also provided a new supplementary figure (Figure S5) to show the comparative results.

      Author response image 2.

      The median expression levels of the conserved genes (i.e., coefficient of variation ≤ 0.3 and average TPM ≥ 1 for each sample) did not differ among the lowland, hypoxia-exposed lowland, and highland tree sparrows (Wilcoxon signed-rank test, P<0.05).
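
      The conserved-gene filter described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the pipeline used in the study: the function names and the toy TPM table are hypothetical, the coefficient of variation is computed on raw TPM values, and "average TPM ≥ 1 for each sample" is read here as TPM ≥ 1 in every sample.

```python
import math
import statistics


def conserved_genes(tpm, max_cv=0.3, min_tpm=1.0):
    """Return gene IDs with TPM >= min_tpm in every sample and a
    coefficient of variation (SD / mean) across samples <= max_cv.

    `tpm` maps a gene ID to a list of TPM values, one per sample.
    Both cut-offs follow the text; their exact application is an assumption.
    """
    kept = []
    for gene, values in tpm.items():
        if min(values) < min_tpm:
            continue  # too weakly expressed in at least one sample
        cv = statistics.stdev(values) / statistics.mean(values)
        if cv <= max_cv:
            kept.append(gene)
    return kept


def median_log_tpm(tpm, genes, sample_index):
    """Median log2(TPM + 1) of the given genes in one sample."""
    return statistics.median(
        math.log2(tpm[g][sample_index] + 1) for g in genes
    )


# Hypothetical toy table: four samples per gene.
toy_tpm = {
    "geneA": [10.0, 11.0, 9.5, 10.5],
    "geneB": [1.0, 40.0, 5.0, 0.2],
    "geneC": [100.0, 95.0, 110.0, 105.0],
    "geneD": [0.5, 0.6, 0.4, 0.5],
}
stable = conserved_genes(toy_tpm)
per_sample_medians = [median_log_tpm(toy_tpm, stable, i) for i in range(4)]
```

      Per-sample medians of this kind could then be compared between groups with a rank-based test; similar medians across groups would be consistent with a small batch effect.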

      The reviewer also raised the issue of sample size. We certainly would have liked to have more individuals in the study, but this was not possible because of the logistical difficulty of keeping wild birds in a common garden experiment for a long time. We have acknowledged this in the manuscript. To mitigate this, we tested the hypothesis of plasticity followed by genetic change using two different tissues (cardiac and flight muscles) and two different datasets (co-expressed gene set and muscle-associated gene set). As all these analyses show similar results, they indicate that the main conclusion drawn from this study is robust.

      Q2. Genes are classified into two classes (reversion and reinforcement) based on arbitrarily chosen thresholds. More "reversion" genes are found and this was taken as evidence reversal is more prominent. However, a trivial explanation is that genes must be expressed within a certain range and those plastic changes simply have more space to reverse direction rather than having any biological reason to do so.

      Thank you for the critical comments. Two questions were raised, which we would like to address separately. The first concern centered on the arbitrarily chosen thresholds. In our manuscript, we used a range of thresholds, i.e., 50%, 100%, 150% and 200% of change in the gene expression levels of the ancestral lowland tree sparrow, to detect genes with reinforcement and reversion plasticity. With this design, we wanted to explore the magnitudes of gene expression plasticity (i.e., Ho & Zhang 2018) and whether the strength of selection (i.e., genetic variation) changes with the magnitude of gene expression plasticity (i.e., Campbell-Staton et al. 2021).
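
      The threshold-based categorization can be illustrated with a minimal sketch. It assumes the convention of Ho & Zhang (2018), in which the plastic change is the plastic-stage level minus the ancestral level and the genetic change is the colonized-stage level minus the plastic-stage level; the function name, the handling of ties, and the example values are hypothetical.

```python
def classify_plasticity(ancestral, plastic, colonized, threshold=0.5):
    """Label one gene as 'reinforcement', 'reversion' or 'unclassified'.

    ancestral, plastic, colonized: mean expression at the three stages.
    threshold: minimum change, as a fraction of the ancestral level
               (0.5, 1.0, 1.5 and 2.0 correspond to 50%-200%).
    """
    cutoff = threshold * ancestral
    plastic_change = plastic - ancestral   # initial plastic response
    genetic_change = colonized - plastic   # subsequent change after colonization
    if abs(plastic_change) <= cutoff or abs(genetic_change) <= cutoff:
        return "unclassified"              # one of the changes is too small
    same_direction = (plastic_change > 0) == (genetic_change > 0)
    return "reinforcement" if same_direction else "reversion"


# Hypothetical expression values for three genes.
labels = [
    classify_plasticity(10.0, 20.0, 11.0),  # up, then back down
    classify_plasticity(10.0, 20.0, 35.0),  # up, then further up
    classify_plasticity(10.0, 12.0, 30.0),  # plastic change below cut-off
]
```

      Repeating the call with `threshold` set to 1.0, 1.5 and 2.0 would reproduce the kind of threshold sweep described in the response.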

      As the reviewer pointed out, we now realize that this threshold selection is arbitrary. We have thus implemented two other categorization schemes to test the robustness of the observation of unequal proportions of genes with reinforcement and reversion plasticity. Specifically, we used a parametric bootstrap procedure as described in Ho & Zhang (2019), which aims to identify genes whose classification results from genuine differences rather than random sampling errors. The bootstrap results suggested that genes exhibiting reversing plasticity significantly outnumber those exhibiting reinforcing plasticity, indicating that our inference of an excess of genes with reversion plasticity is robust to random sampling errors. We have added these analyses to the revised version of the manuscript and provided the results in Figure 2d and Figure 3d.

      Author response image 3.

      Figure 2a (left) and Figure 2b (right). Frequencies of genes with reinforcement and reversion plasticity (>50%) and their subsets that acquire strong support in the parametric bootstrap analyses (≥ 950/1000).
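
      A parametric bootstrap of this kind could look roughly like the sketch below. It is a simplified illustration, not the procedure of Ho & Zhang: each stage mean is redrawn from a normal distribution defined by its observed mean and standard error, the gene is reclassified, and the number of draws agreeing with the observed label is counted (the legend above uses ≥ 950 of 1,000 as strong support). All names and input values are hypothetical.

```python
import random


def classify(ancestral, plastic, colonized, threshold=0.5):
    """Reinforcement/reversion label under a fractional change threshold."""
    if ancestral <= 0:
        return "unclassified"  # guard against non-positive simulated levels
    cutoff = threshold * ancestral
    plastic_change = plastic - ancestral
    genetic_change = colonized - plastic
    if abs(plastic_change) <= cutoff or abs(genetic_change) <= cutoff:
        return "unclassified"
    same = (plastic_change > 0) == (genetic_change > 0)
    return "reinforcement" if same else "reversion"


def bootstrap_support(means, std_errors, n_draws=1000, threshold=0.5, seed=0):
    """Return (observed label, number of draws reproducing that label).

    means, std_errors: (ancestral, plastic, colonized) triples.
    """
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    observed = classify(*means, threshold)
    agree = 0
    for _ in range(n_draws):
        draw = [rng.gauss(m, se) for m, se in zip(means, std_errors)]
        if classify(*draw, threshold) == observed:
            agree += 1
    return observed, agree


# Hypothetical genes: one with tight estimates, one with noisy estimates.
tight_label, tight_support = bootstrap_support((10.0, 20.0, 11.0), (0.1, 0.1, 0.1))
noisy_label, noisy_support = bootstrap_support((10.0, 15.5, 10.2), (3.0, 3.0, 3.0))
```

      Under the ≥ 950/1000 criterion, only the gene with tight estimates would be retained as a strongly supported reversion gene.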

      In addition, we adopted a bin scheme (i.e., 20%, 40% and 60% bin settings along the spectrum of reinforcement/reversion plasticity). Analyses based on these different categorization schemes revealed similar results, suggesting that our inference of an excess of genes with reversion plasticity is robust. We have provided these results in Supplementary Figures S2 and S4.

      Author response image 4.

      Figure S2 (A) and Figure S4 (B). Frequencies of genes with reinforcement and reversion plasticity in the flight and cardiac muscle. (A) For genes identified by WGCNA, all comparisons show that there are more genes showing reversion plasticity than those showing reinforcement plasticity for both the flight and cardiac muscles. (B) For genes associated with muscle phenotypes, all comparisons show that there are more genes showing reversion plasticity than those showing reinforcement plasticity for the flight muscle, while more than 50% of comparisons support an excess of genes with reversion plasticity for the cardiac muscle. Two-tailed binomial test: NS, non-significant; *, P < 0.05; **, P < 0.01; ***, P < 0.001.
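For readers who want to reproduce this kind of comparison, a two-tailed exact binomial test against equal proportions can be sketched as below. The function name and the gene counts are hypothetical; the two-sided P value sums the probabilities of all outcomes that are no more likely than the observed one, which is one common convention and is assumed here.

```python
from math import exp, lgamma, log


def binom_two_tailed(k, n, p=0.5):
    """Exact two-sided binomial P value for k successes in n trials.

    Works in log space so that counts in the thousands do not overflow.
    """
    def log_pmf(i):
        return (lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1)
                + i * log(p) + (n - i) * log(1 - p))

    log_obs = log_pmf(k)
    # Sum every outcome at most as likely as the observed count
    # (small tolerance guards against floating-point ties).
    total = sum(exp(lp) for lp in map(log_pmf, range(n + 1)) if lp <= log_obs + 1e-7)
    return min(1.0, total)


# Hypothetical counts: 700 reversion genes out of 1000 classified genes.
p_value = binom_two_tailed(700, 1000)
```

A small P value indicates that the reversion and reinforcement classes are unlikely to be equally frequent, but it does not by itself explain why one class is in excess.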

      The second issue that the reviewer raised is that the plastic changes may simply have more space to reverse direction rather than having any biological reason to do so. While the causal reason why more genes have their expression levels reversed than reinforced at the late stages is still contentious, an increasing number of studies show that gene expression plasticity at the early stage may be functionally maladaptive to the novel environment that a species has recently colonized (e.g., lizards, Campbell-Staton et al. 2021; Escherichia coli, yeast, guppies, chickens and babblers, Ho and Zhang 2018; Ho et al. 2020; Kuo et al. 2023). Our comparisons based on the two gene sets that are associated with muscle phenotypes corroborate these previous studies and show that initial gene expression plasticity may be nonadaptive to novel environments (e.g., Ghalambor et al. 2015; Ho & Zhang 2018; Ho et al. 2020; Kuo et al. 2023; Campbell-Staton et al. 2021).

      Q3. The correlation between plastic change and evolved divergence is an artifact due to the definitions of adaptive versus maladaptive changes. For example, the definition of adaptive changes requires that plastic change and evolved divergence are in the same direction (Figure 3a), so the positive correlation was a result of this selection (Figure 3d).

      The reviewer raised the issue that the correlation between plastic change and evolved divergence is an artifact of the definition of adaptive versus maladaptive changes, for example in Figure 3d. We agree with the reviewer that the correlation analysis is circular, because the definition of adaptive and maladaptive plasticity depends on whether the direction of plastic change matched or opposed that of the colonized tree sparrows. We have thus removed the previous Figure 3d-e and the related text from the revised version of the manuscript. Meanwhile, we have changed Figure 3a to further clarify the schematic framework.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      The manuscript by Chiu et al describes the modification of the Zwitch strategy to efficiently generate conditional knockouts of zebrafish betapix. They leverage this system to identify a surprising glia-exclusive function of betapix in mediating vascular integrity and angiogenesis. Betapix has been previously associated with vascular integrity and angiogenesis in zebrafish, and betapix function in glia has also been proposed. However, this study identifies glial betapix in vascular stability and angiogenesis for the first time.

      The study derives its strength from the modified CRISPR-based Zwitch approach to identify the specific role of glial betapix (and not neuronal, mural, or endothelial). Using RNA-in situ hybridization and analysis of scRNA-Seq data, they also identify delayed maturation of neurons and glia and implicate a reduction in stathmin levels in the glial knockouts in mediating vascular homeostasis and angiogenesis. The study also implicates a betapix-zfhx3/4-vegfa axis in mediating cerebral angiogenesis.

      There is both technical (the generation of conditional KOs) and knowledge-related (the exclusive role of glial betapix in vascular stability/angiogenesis) novelty in this work that is going to benefit the community significantly.

      While the text is well written, it often elides details of experiments and relies on implicit understanding on the part of the reader. Similarly, the figure legends are laconic and often fail to provide all the relevant details.

      We thank the reviewer for the overall support of our manuscript. We have now revised the manuscript text and figure legends to include all relevant details as far as we can.

      Specific comments:

      (1) While the evidence from cKO's implicating glial betapix in vascular stability/angiogenesis is exciting, glia-specific rescue of betapix in the global KOs/mutants (like those performed for stathmin) would be necessary to make a water-tight case for glial betapix.

      We fully agree with the reviewer that it would be ideal to examine glia-specific rescue of betaPix in its global KOs. However, it is difficult to achieve optimal transient expression of betaPix by injecting a gfap:betaPix plasmid, and it takes a long time to establish a stable gfap:betaPix transgenic line for rescuing mutant phenotypes. We would like to pursue this line of research in the future.

      (2) Splice variants of betapix have been shown to have differential roles in haemorrhaging (Liu, 2007). What are the major glial isoforms, and are there specific splice variants in the glial that contribute to the phenotypes described?

      We agree that it would be important to address whether any specific splice variants in glia contribute to betaPix mutant phenotypes. Previous studies have shown that isoform a of betaPix is ubiquitously expressed across various tissues, while isoforms b, c, and d are predominantly expressed in the nervous system. In mice, the expression level of isoform betaPix-d is essential for neurite outgrowth and migration. We have not directly assessed glia-specific betaPix isoforms in the nervous system, and our current data cannot determine whether a specific isoform is involved in glial function. The Zwitch cassette of betaPix resides in intron 5, thus disrupting all transcripts when Cre is activated. However, we are fully aware of the potential value of identifying glial betaPix isoforms and their direct downstream targets. Further studies to dissect their roles in cerebral vascular development and diseases are part of our future plans.

      (3) Liu et al, 2012 demonstrated reduced proliferation of endothelial cells in bbh fish and linked it to deficits in angiogenesis. Are there proliferation/survival defects in endothelial cells in the glial KOs?

      We thank the reviewer for highlighting endothelial cell phenotypes in betaPix mutants. We are aware that endothelial cells might directly contribute to the angiogenesis defects of the mutants. We assessed and quantified endothelial migration by measuring the length of developing central arteries, but we did not examine endothelial cell proliferation/survival defects in glial KOs. In our scRNA-seq analysis, the proportion of endothelial cells was reduced upon betaPix deficiency, indicating that endothelial cell proliferation/survival might decrease in mutants. In this endothelial cell cluster, we found a disrupted transcriptional landscape in a set of angiogenesis-associated genes (Figure 6M). While these analyses highlight an altered angiogenic transcriptome profile in endothelial cells of betaPix knockouts, we acknowledge that our study does not directly address proliferation/survival phenotypes in endothelial cells, which warrants future investigation of the role of betaPix in regulating glia-endothelial cell interaction.

      Reviewer #2 (Public review):

      Summary:

      Using a genetic model of beta-pix conditional trap, the authors are able to regulate the spatio-temporal depletion of beta-pix, a gene with an established role in maintaining vascular integrity (shown elsewhere). This study provides strong in vivo evidence that glial beta-pix is essential to the development of the blood-brain barrier and maintaining vascular integrity. Using genetic and biochemical approaches, the authors show that PAK1 and Stathmins are in the same signaling axis as beta-pix, and act downstream to it, potentially regulating cytoskeletal remodeling and controlling glial migration. How exactly the glial-specific (beta-pix driven-) signaling influences angiogenesis or vascular integrity is not clear.

      Strengths:

      (1) Developing a conditional gene-trap genetic model which allows for tracking knockin reporter driven by endogenous promoter, plus allowing for knocking down genes. This genetic model enabled the authors to address the relevant scientific questions they were interested in, i.e., a) track expression of beta-pix gene, b) deletion of beta-pix gene in a cell-specific manner.

      (2) The study reveals the glial-specific role of beta-pix, which was unknown earlier. This opens up avenues for further research. (For instance, how do such (multiple) cell-specific signaling converge onto endothelial cells which build the central artery and maintain the blood-brain barriers?)

      We thank the reviewer for the overall support of our work.

      Weaknesses:

      Major:

      (1) The study clearly establishes a role of beta-pix in glial cells, which regulates the length of the central artery and keeps the hemorrhages under control. Nevertheless, it is not clear how this is accomplished.

      (a) Is this phenotype (hemorrhage) a result of the direct interaction of glial cells and the adjacent endothelial cells? If direct, is the communication established through junctions or through secreted molecules?

      Thanks for this critical question. We attempted to address this issue by live imaging using light-sheet confocal microscopy, but failed to achieve sub-cellular resolution. We therefore do not have data to address this critical issue, which warrants future investigation.

      (b) The authors do not exclude the possibility that the effects observed on endothelial cells (quantified as length of central artery) could be secondary to the phenotype observed with deletion of glial beta-pix. For instance, can glial beta-pix regulate angiogenic factors secreted by peri-vascular cells, which consequently regulate the length of the central artery or vascular integrity?

      We thank the reviewer for this critical point. While we found major defects in endothelial cell migration, quantified by the central artery length, we could not rule out the participation of signals from other peri-vascular cells. We fully agree that it will be important to address the cell-type-specific relationships mediated by angiogenic factors. Of note, degradation of extracellular matrix and focal adhesions is critical for the hemorrhagic phenotypes of bbh mutants. In a previously published study from our group, we found that suppressing the globally induced MEK/ERK/MMP9 signaling in bbh mutants significantly decreases hemorrhages. Accordingly, we edited a paragraph in the Discussion section on pages 24-25. We plan to continue investigating whether the complex interactions in the perivascular space contribute to the disruption of vascular integrity, as well as the cross-talk among different cell types during vascular development in these mutants. We believe that our model of glia-specific betaPix function will guide further study of cellular interactions in follow-up work.

      (c) The pictorial summary of the findings (Figure 7) does not include Zfhx or Vegfa. The data do not provide clarity on how these molecules contribute (directly or indirectly) to endothelial cell integrity. Vegfaa is expressed in the central artery, but the expression of the receptor in these endothelial cells is not shown. Similarly, all other experimental analyses for Zfhx and Vegfa expression were performed in glial cells. More experimental evidence is necessary to show the regulation of angiogenesis (of endothelial cells) by glial beta-pix. Is the Vegfaa receptor present on central arteries, and how does glial depletion of beta-pix affect its expression or response of central artery endothelial cells (both pertaining to angiogenesis and vascular integrity).

      We thank the reviewer for pointing out this critical issue. We have now revised the pictorial summary in Figure 7 to include Zfhx and Vegfa. The key receptors of the VEGF-A ligand are VEGFR-1 and VEGFR-2. In zebrafish, expression of Vegfr-2, also known as kdrl, is well documented in endothelial cells, including those of the hindbrain central arteries. We fully agree that it would be of great value to assess changes in the kdrl expression pattern after betaPix deficiency in vivo. How VEGFA-VEGFR2 signaling in endothelial cells is altered in betaPix mutants warrants future investigation.

      (2) Microtubule stabilization via glial beta-pix, claimed in Figure 5M, is unclear. Magnified images for h-betapix OE and h-stmn-1 glial cells are absent. Is this migration regulated by beta-pix through its GEF activity for Cdc42/Rac?

      We have now revised Figure 5M to include magnified images for the h-betaPIX and h-STMN1 overexpression groups. A positive feedback loop of microtubule regulation consisting of Rac1-Pak1-Stathmin has been shown at the cell edge (Zeitz and Kierfeld, 2014 Biophys J.). Previous studies have shown that betaPix activates Rac1 through its GEF activity and also regulates the activity of Pak1 via direct binding. As reported by Kwon et al., the betaPix-d isoform promotes neurite outgrowth via PAK-dependent inactivation of Stathmin1. In this work, we did not assess the binding activity of betaPix to Rac1 or Pak1. Nevertheless, our data from the IPA-3 rescue experiments suggest that betaPix deficiency impairs migration through Pak1 signaling.

      (3) Hemorrhages are caused by compromised vascular integrity, which was not measured (either qualitatively or quantitatively) throughout the manuscript. The authors do measure the length of the central artery in several gene deletion models (2I, 3C. 5F/J, 6G/K), which is indicative of artery growth/ angiogenesis. How (if at all) defects in angiogenesis are an indication of hemorrhage should be explained or established. Do these angiogenic growth defects translate into junctional defects at later developmental time points? Formation and maintenance of endothelial cell junctions within the hemorrhaging arteries should be assessed in fish with deleted beta-pix from astrocytes.

      We appreciate the reviewer’s point and agree that this is a key aspect we need to clarify. To address junctional defects in our model, we re-examined the scRNA-seq data and found mild downregulation of the junction protein claudin-5a (cldn5a) in the transcriptome analysis of the endothelial cluster (Author response image 1). We agree in principle that single-cell RNA sequencing findings should be validated by immunostaining. While we did not measure junctional defects directly in this work, we have previously reported comparable expression of the tight junction protein zonula occludens-1 (ZO1) between siblings and bbh mutants (Yang et al., 2017 Dis Model Mech). In zebrafish, a functionally characterized blood-brain barrier (BBB) is only identified after 3 dpf. The lack of a mature BBB might reflect the immature status of the barrier signature at this developmental stage. The hemorrhage phenotype occurs around 40 hpf, and hematomas are almost completely absorbed at later stages, since most mutants recover and survive to adulthood. Thus, future studies are needed to address the junctional characteristics at the cellular and molecular level in later developmental stages of betaPix mutants.

      Author response image 1.

      Violin plots showing cdh5, cldn5a, cldn5b and oclna expression levels in endothelial sub-cluster. ctrl, control siblings; ko, betaPix knockouts (CRISPR mutants); 1d or 2d, 1 or 2 days post fertilization.

      (4) More information is required about the quality control steps for 10X sequencing (Figure 4, number of cells, reads, etc.). What steps were taken to validate the data quality? The EC groups, 1 and 2-days post-KO are not visible in 4C. One appreciates that the progenitor group is affected the most 2 days post-KO. But since the effects are expected to be on the endothelial cell group as well (which is shown in in vivo data), an extensive analysis should be done on the EC group (like markers for junctional integrity, angiogenesis, mesenchymal interaction, etc.). Are Stathmins limited to glial cells? Are there indicators for angiogenic responses in endothelial cells?

      We thank the reviewer for these critical suggestions. Detailed statements about the quality control steps for 10X sequencing are now provided in the Materials and Methods section. We validated the data quality through multiple steps, including verification of the number of viable cells used in the experiment, assessment of peak shapes and fragment sizes of the scRNA-seq libraries, confirmation of sufficient cell counts and sequencing reads for data analyses, and implementation of stringent filtering steps to exclude low-quality cells. Stathmin expression is shown in violin plots in Figure 4E, and stmn1a, stmn1b and stmn4l expression is shown in UMAP plots in Figure S6C. Their expression is not limited to glial cells but is distributed more widely among zebrafish tissues. We would like to point out that, despite their small number, the endothelial cell clusters are presented in Figure 4C in brown. The proportions of the EC groups split by the four samples are visualized in Figure S6B and show a significant reduction in betaPix knockouts at 2 dpf, a trend similar to that of glial progenitors. In addition, gene ontology analysis identified a set of down-regulated angiogenic genes in the endothelial cluster (Figure 6M). We realize that our interpretation of endothelial cell phenotypes was not sufficiently clear in this work and have now added sentences to the manuscript text on pages 16-17. As noted above, future studies are needed to address how glial betaPix regulates endothelial cell and BBB function.

      Reviewing Editor Comments:

      comments on your manuscript. Addressing comments 1-3 from Reviewer 1 and comment 1 and its subparts from Reviewer 2 (major weaknesses) will significantly improve the manuscript by reinforcing the cell autonomous requirement of betaPix and also gain mechanistic insights. In addition, extensive proofreading and editing of the text, as well as changes to the figure, figure legends, and the discussion as indicated by both reviewers, will improve the readability and clarity of this manuscript.

      We thank the Reviewing Editor for the support of this manuscript. As noted above, we have tried to address the reviewers’ comments using the data obtained in this work, and have outlined our plans for future investigations. We have now extensively proofread and edited the manuscript text and figure legends to improve the readability and clarity of this manuscript.

      Reviewer #1 (Recommendations for the authors):

      (1) The Discussion is written like an introduction with very little engagement with the data generated in the manuscript. The role of betapix-Pak-stathmin and betapix-zfhx3/4-vegfaa is barely discussed and contextualised vis-à-vis the current knowledge in the field.

      We appreciate the reviewer’s critical comments regarding the Discussion section. We have now revised the manuscript text on pages 20-23 to address the role of betapix-Pak-stathmin and betapix-zfhx3/4-vegfaa axis with contributions from this work.

      (2) Line 145: "light sheet microscopy" - explain that this was only for experiments involving fluorescence. Currently, it reads as if the data presented in Figures 1D and E are also obtained via light sheet microscopy. E.g., the paragraph starting on line 139 does not say what line was imaged (and what it labels) to reach the conclusions reached. This detail is not there even in the associated figure legend. Similarly, line 153 discusses radial glia, but there is no indication that these were labelled using Tg (GFAP:GFP) except in the figure annotation. There are various instances of such omissions throughout the text, and they should be remedied to indicate what each line is and what it labels, at least in the first instance.

      We thank the reviewer for these thoughtful points. In this revised version, we have incorporated more statements of the objectives and methodologies in the text on pages 8-9. We hope that the revised manuscript better presents the data by clarifying the methodologies and materials used in this work.

      (3) Figure 1E legend: What is the haemorrhage percentage? Is it the number of embryos per experiment showing hemorrhage? Indicate in the text. In the right panel, what is the number of embryos used? Please ensure all numbers (number of embryos, experiments, etc) used to plot any data in the set of figures in the entire manuscript are clearly indicated.

      We thank the reviewer for the suggestion. In this revised version, we have incorporated more detailed statements in the figures and figure legends to show the numbers of embryos used.

      (4) The Discussion section suddenly introduces the blood-brain barrier and extensively discusses it. However, while cerebral haemorrhage can disrupt the BBB and exacerbate the effects of the haemorrhage, this manuscript does not suggest that a weakened BBB is the cause of haemorrhages in betapix mutants. More likely, betapix stabilises and maintains vascular integrity, and loss of this function causes haemorrhaging and subsequent disruption of the BBB. The glial function noted in this study is likely to be distinct from the glial function in BBB development and maintenance. The authors do not show any direct evidence for the latter. These should be shortened, and only relevant aspects facilitating contextualisation of data generated in this manuscript should be retained.

      We have now revised the Discussion section to shorten the treatment of the blood-brain barrier and to add statements according to the suggestions from both reviewers. We hope that the revisions provide a more relevant and balanced discussion.

      (5) Is the scratch assay in Figure 5 controlled for differences in cell proliferation among the different manipulations?

      We plated the same number of cells and cultured them under the same conditions. Before conducting the scratch assay, we replaced the medium with serum-free culture medium to reduce the effect of cell proliferation among the different manipulation groups.

      (6) In the glioblastoma experiments involving betapix KD, does stathmin RNA/protein decrease? What about Ser 16 phosphorylation (as shown for neurons in Kwon et al, 2020)?

      STMN1 RNA was down-regulated by betaPIX deficiency, and this was rescued by betaPIX overexpression in glial cells (Author response image 2). These results are similar to those from the in vivo analysis (Figure 5A, 5B and S7A). We agree with the reviewer that it would have been ideal to examine Ser 16 phosphorylation of Stathmin in our models. However, we believe that our data have established that Stathmins function downstream of betaPix.

      Author response image 2.

      qRT-PCR analysis showing that betaPIX over-expression (betaPix OE) rescued STMN1 expression in betaPIX siRNA knockdown (betaPix KD) U251 cells. Data are presented as mean ± SEM; one-way ANOVA with Dunnett's test; individual P values are given in the figure.

      (7) How was the rescue of betapix in glioblastoma cells with siRNA-mediated betapix knockdown performed? Is this by betapix-resistant cDNA? Further, no information about isoforms of betapix (both for siRNA-mediated KD and rescue) or stathmin is provided.

      Similar to our Zwitch method, which disrupts all betaPix transcripts in vivo, the knockdown of human betaPIX was designed to target a conserved region of all transcripts in glioblastoma cell lines. The human betaPIX used for rescue was obtained from a U251 cDNA library, so ideally all isoforms enriched in the glioblastoma cell line would be isolated. The missing details are now provided in the Materials and Methods section, page 26.

      (8) It is unclear what the authors' thoughts are on the decrease in stathmin observed and the functional outcome of this decrease. The Discussion could benefit from this.

      Thanks. We have now incorporated a new paragraph in the Discussion section on pages 21-22 discussing how the down-regulated expression of Stathmins relates to its functional outcome.

      (9) Zfhx4 mRNA injection is performed on bbh and betapixKO (is this a global or glial KO?) and found to rescue haemorrhaging. While vegfaa mRNA increases, it is formally possible that the rescue is not due to the increase in vegfaa (or that vegfaa is sufficient). Injection of vegfaa mRNA could address this issue.

      Zfhx4 mRNA injection was performed on bbh mutants and global betaPix knockouts (CRISPR mutants). To avoid confusion, we have now included a sentence highlighting that global knockout mutants were used for this rescue experiment. For the second part, we acknowledge that this study cannot definitively prove the necessity of increased vegfaa levels in the rescue experiment. However, our data establish Zfhx3/4 as novel downstream effectors of betaPix in cerebral vessel development, and these effects might partly be linked to angiogenic responses regulated by Zfhx3/4. In this revised version, we cautiously propose that Vegfaa signals act downstream of the betaPix-Zfhx3/4 axis, and we highlight, in the Discussion section on page 24, that our manuscript does not fully investigate the sufficiency of Vegfaa. We intend to pursue a more extensive analysis in our follow-up studies.

      (10) A significant part of the manuscript looks at angiogenesis/vascularisation, however, the title of the paper only reflects vessel integrity (which can be distinct from angiogenesis).

      Thanks. We have now changed the title to: Glial betaPix is essential for blood vessel development in the zebrafish brain

      (11) Line 366: The BBB abbreviation is used without indicating the full form. Perhaps this can be introduced in the preceding sentence.

      We have now edited the following sentence: “The maturation hallmark of central nervous system (CNS) vasculature is acquisition of blood brain barrier (BBB) properties, establishing a stable environment ...” in lines 386-387, Discussion section.

      (12) Line 371: "rupture" and not "rapture".

      We thank the reviewer for pointing out the spelling error, and have now made this correction. 

      (13) Line 416: "is enriched" instead of "enriches"?

      We have now edited as: “...end feet that is enriched with aquaporin-4 ...” in line 411, page 19. 

      (14) The sentence in lines 121-123 should be simplified.

      We have now revised this sentence as the following: “A previous work has shown that bubblehead (bbh<sup>fn40a</sup>) mutant has a global reduction in betaPix transcripts, and bbh<sup>m292</sup> mutant has a hypomorphic mutation in betaPix, thus establishing that betaPix is responsible for bubblehead mutant phenotypes [10]”. 

      (15) No mention in the text of what o-dianisine labels.

      We have now edited the following sentence: “By using o-dianisidine staining to label hemoglobins, we found severe brain hemorrhages ...” in lines 131-133.

      (16) Line 165: Sentence requires improvement. Perhaps "Vascularisation of the central arteries in the zebrafish hindbrain ...".

      We have now edited this sentence as: “Vascularisation of the central arteries in the zebrafish hindbrain starts at 29 hpf.” in this revised version (line 176). 

      (17) Line 184: Why is "hematopoiesis" mentioned? The genesis of blood cells is not tested anywhere in the manuscript.

      Thanks. We have now edited this statement as: “IPA-3 treatment had no effect on heamorrhage induction in betaPix<sup>ct/ct</sup> control siblings.” 

      (18) Line 222-223: Improve "increasing trends". Perhaps "increased relative proportions". Clarify "progenitors" means neuronal and glial progenitors.

      We have now edited this statement: “we found that most neuronal clusters increased relative proportions ...” in this revised version.

      (19) Line 232-233: "arrow indicates" - perhaps "indicated by the arrow"? Also, the arrow indicating gfap needs to be mentioned in the Figure S6A legend. Cannot understand what is meant by "as of its enriched gfap".

      We have now edited in the text as: “Figure S6A, indicated by the arrow”, and added “Box area and arrow highlighting gfap expressions.” in Figure S6 legend. To avoid confusion, we have revised "as of its enriched gfap" sentence as the following: “We next focused on the progenitor cluster owing to the enriched gfap expression and the significantly reduced numbers of cells in this cluster by betaPix deficiency.”

      (20) Line 239 - 240: While the sentence says "... revealed three major categories:", well, more than 3 are mentioned subsequently.

      To avoid possible confusion in the text, we have now removed the sub-category examples and presented the data as: “three major categories: epigenetic remodeling, microtubule organizations and neurotransmitter secretion/transportation (Figure 4D).” 

      (21) Line 252: Stathmins negatively regulate microtubule stability. Why are they referred to as "microtubule polymerization genes stathmins"?

      We are thankful to the reviewer for pointing out this error, and we have now corrected the text to “microtubule-destabilizing protein Stathmins”.

      (22) Line 262-265: The citation used to indicate concurrence with mouse data is disingenuous. That study did not show a reduction in stathmin levels upon betapix loss. Rather, it showed an increase in Ser16 phosphorylation on stathmin, which reduces stathmin's microtubule destabilising function. Please elaborate on the difference between the two studies.

      We completely agree with the reviewer’s statement that, in the cited article, increased Ser16 phosphorylation on stathmin reduces its microtubule-destabilising function. While that study did not show a reduction in Stathmin levels, others have shown that transcriptionally downregulated Stathmins are associated with impaired neuronal and glial development. We have now revised the Discussion section by adding a new paragraph to address the disrupted homeostasis of Stathmins in these previous studies and their possible association with our data. We hope that these changes clarify this issue.

      (23) Line 310: While ZFHX3 levels are reduced in betapix mutants and KD in glioblastomas, were ZFHX3 and 4 up- or downregulated in the scRNA-Seq data?

      Thanks for this critical point. Indeed, our results showed that ZFHX3 and 4 were down-regulated in betaPix knockouts, both in the glial progenitor cluster in the scRNA-Seq data (Figure S8A) and in FACS-sorted glial cells (Figure S8B).

      (24) Line 317: "... betaPix acts upstream to Zfhx3/4-VEGFA signaling in regulating angiogenesis ...". While this is established later, the data at the time of this sentence does not warrant this claim.

      We agree with the reviewer’s statement and have restated this sentence as follows: “Zfhx3/4 might act as downstream effector of betaPix.”

      Reviewer #2 (Recommendations for the authors):

      (1) The images shown in 2E/H, 3B, 6F/J can use a schematic that helps readers to understand what to expect or look for. Splitting up the channels may also help in visualizing the vasculature clearly.

      We thank the reviewer for these suggestions. In this revised version, we have included schematic diagrams in the figures and incorporated more detailed statements in the legends.

      (2) Many times, arrows are pointing to structures (2E/H, 3B), but are not explained clearly (neither in the text nor in the legends). In 3B, the arrow is pointing to a negative space.

      (3) Legends are minimalistic and do not provide much information. The reader is left to interpret the data on their own.

      We apologize for not explaining the figures in enough detail. In this revised version, we have incorporated more detailed statements in the figure legends and have adjusted the arrows in all figures.

      (4) The text needs heavy proofreading. For example:

      (a) Line 208- the title does not seem appropriate since the following text does not discuss Stathmins at all, which comes later.

      We agree with the reviewer’s statement and have restated the title as follows: “Single-cell transcriptome profiling reveals that gfap-positive progenitors were affected in betaPix knockouts.”

      (b) There is no mention of Figure 7 throughout the text.

      (c) Figure 7 does not include Zfhx or Vegfaa.

      We thank the reviewer for pointing out these errors. We have now revised Figure 7 and incorporated it into the corresponding paragraphs of the Discussion section.

      (5) The discussion seems incoherent in its current state.

      We have now revised the Discussion section according to the suggestions from both reviewers. We hope these revisions adequately address your concerns.

      (6) Please include some of the following points, if possible, in the discussion.

      (a) How is GEF activity of Rac/Cdc42 expected to be affected in beta-pix KO fishes?

      (b) What are the possible different ways the angiogenic pathways merge onto endothelial cells? Or do the authors imagine this process to be entirely driven by glial cells (directly)?

      We would like to thank the reviewer for these invaluable suggestions. We have now revised the Discussion section and hope that these changes provide a better and more balanced discussion. Because we have no data directly addressing how the GEF activity of Rac/Cdc42 might be affected in betaPix mutants, and only very limited data showing how glial betaPix regulates cerebral endothelial cells and BBB function, we would like to keep the Discussion focused on the CRISPR-induced KI and cKO technologies, glial betaPix function and brain hemorrhage, and the putative role of betaPix-Zfhx3/4-VEGF function in central artery development.

      References:

      Daub, H., Gevaert, K., Vandekerckhove, J., Sobel, A., and Hall, A. (2001). Rac/Cdc42 and p65PAK regulate the microtubule-destabilizing protein stathmin through phosphorylation at serine 16. J Biol Chem 276, 1677-1680. 10.1074/jbc.C000635200.

      Kim S, Park H, Kang J, Choi S, Sadra A, Huh SO. β-PIX-d, a Member of the ARHGEF7 Guanine Nucleotide Exchange Factor Family, Activates Rac1 and Induces Neuritogenesis in Primary Cortical Neurons. Exp Neurobiol. 2024;33(5):215-224. doi:10.5607/en24026

      Kwon Y, Jeon YW, Kwon M, Cho Y, Park D, Shin JE. βPix-d promotes tubulin acetylation and neurite outgrowth through a PAK/Stathmin1 signaling pathway [published correction appears in PLoS One. 2020 May 13;15(5):e0233327. doi: 10.1371/journal.pone.0233327.]. PLoS One. 2020;15(4):e0230814. Published 2020 Apr 6. doi:10.1371/journal.pone.0230814

      Kwon Y, Lee SJ, Shin YK, Choi JS, Park D, Shin JE. Loss of neuronal βPix isoforms impairs neuronal morphology in the hippocampus and causes behavioral defects. Anim Cells Syst (Seoul). 2025;29(1):57-71. Published 2025 Jan 8. doi:10.1080/19768354.2024.2448999

      Wittmann, T., Bokoch, G.M., and Waterman-Storer, C.M. (2004). Regulation of microtubule destabilizing activity of Op18/stathmin downstream of Rac1. J Biol Chem 279, 6196-6203. 10.1074/jbc.M307261200.

      Zeitz, M., and Kierfeld, J. (2014). Feedback mechanism for microtubule length regulation by stathmin gradients. Biophys J 107, 2860-2871. 10.1016/j.bpj.2014.10.056.

    1. Author response:

      Thank you for sharing a detailed review of our manuscript titled, Variations and predictability of epistasis on an intragenic fitness landscape. We have now carefully gone through the reviewers’ and the editor’s comments and have the following preliminary responses.

      (1) Measurement noise in the folA fitness landscape. All three reviewers and the editors raise the important matter of incorporating measurement noise in the fitness landscape. The paper by Papkou and coworkers makes the fitness measurements of the landscape in six independent repeats. They show that the fitness data is highly correlated in each repeat, and use the weighted mean of the repeats to report their results. They do not study how measurement noise influences their findings. The results by Papkou and coworkers were our starting point, and hence, we built on the landscape properties reported in their study. As a result, we also analyse our results working with the same mean of the six independent measurements.

      The main result of the work by Papkou and coworkers is that the largest subgraph in the landscape has 514 fitness peaks.

      We revisit this result by quantifying how measurement noise changes this number. Doing so, we find that the subgraph contains only 127 statistically significant peaks. We define a sequence as a peak when its corresponding fitness is greater than that of all its one-distance neighbours with a p-value < 0.05. This shows that, as pointed out in the reviews, incorporating noise in the landscape results significantly changes how we view the landscape, a facet not included in Papkou et al. or in the current version of our manuscript.
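
      As an illustration only, the peak definition above can be sketched as follows. The statistical test (a one-sided comparison of replicate means under a normal approximation), the data layout, and all function names are assumptions made for this sketch; they are not the analysis code used in the study, and no multiple-testing correction is applied.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

BASES = "ACGT"


def neighbours(seq):
    """Yield every sequence at Hamming distance 1 from seq."""
    for i, base in enumerate(seq):
        for alt in BASES:
            if alt != base:
                yield seq[:i] + alt + seq[i + 1:]


def p_greater(a, b):
    """One-sided p-value for mean(a) > mean(b).

    Assumption: normal approximation on replicate means; the study may
    use a different test.
    """
    se = sqrt(stdev(a) ** 2 / len(a) + stdev(b) ** 2 / len(b))
    if se == 0:
        return 0.0 if mean(a) > mean(b) else 1.0
    z = (mean(a) - mean(b)) / se
    return 1.0 - NormalDist().cdf(z)


def is_significant_peak(seq, replicates, alpha=0.05):
    """True if seq is fitter than every measured neighbour at p < alpha.

    replicates maps each sequence to its list of replicate fitness values
    (hypothetical layout).
    """
    return all(
        p_greater(replicates[seq], replicates[n]) < alpha
        for n in neighbours(seq)
        if n in replicates
    )


# Toy one-position landscape: "A" has the highest mean, but "G" is too
# close for the difference to be significant.
toy = {
    "A": [1.0, 1.1, 0.96],
    "C": [0.2, 0.3, 0.25],
    "G": [0.95, 1.05, 1.0],
    "T": [0.1, 0.2, 0.15],
}
print(is_significant_peak("A", toy))
```

      In this toy landscape, "A" would count as a peak when only mean fitness is compared, but not once replicate noise is taken into account, which is the kind of reduction in peak number described above.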

      Not incorporating measurement noise means that the entire landscape has 4055 peaks. When measurement noise is included in the analysis, this number reduces to 137, out of which 136 are high fitness backgrounds (functional). 

      In the revised version of our manuscript, we will incorporate measurement noise in our analysis. Through this, we will also address the concern regarding the use of an arbitrary cut-off to study “fluid” epistasis. However, we note that arbitrary cut-offs to define DFEs have been recently used (Sane et al., PNAS, 2023).

      We also note that previous work with large scale landscapes (Wu et al, eLife, 2016) also reported a fitness landscape with a single experiment, with no repeats. 

      (2) Global nonlinearities and higher-order leading to fluid epistasis. Attempts at building models for higher-order epistasis from empirical data have largely been confined to landscapes of a limited data size. For example, Sailer & Harms, Genetics, 2017 propose models for higher-order epistasis from seven empirical data sets, each with fewer than 100 data points. Another recent attempt (Park et al, Nat Comm, 2024) proposes rules for protein structure-function using 20 fitness landscapes. In this study, only one landscape which used fitness as a phenotype had ~160000 data points (of which only 42% were included for analysis). All other data sets which used fitness as a phenotype contained fewer than 10000 data points. While these statistical proposals of how higher-order epistasis operates exist, none of them relies on a large-scale, exhaustive network like the one reported by Papkou and coworkers.

      In the edited manuscript, we will replace our arbitrary cut-off with results of statistical tests carried out based on measurement noise. 

      Global non-linearities shape evolutionary responses. We would like to emphasize that the goal of this work is to study and understand how these global non-linearities result in patterns on a large fitness landscape by presenting the sum total of these fundamental factors in shaping statistical patterns.

      While we understand that we may not have sufficiently explained the effects of global non-linearities on our results, we do not agree with the reviewer’s conclusion that our results are artifacts of these non-linearities. We will expand on the role of these nonlinearities on the patterns that we observe (like, fitness being bounded, as pointed out by reviewer 2, or differential impact of a mutation in functional vs. non-functional variants).

      We also speculate that changing our arbitrary cut-off (selection coefficient of 0.05) to measurement noise will not alter our results qualitatively. 

      The question we address in our work is, therefore, how the nature of epistasis changes with genetic background over a large, exhaustive landscape. The nature of epistasis between two mutations is analysed in all 4<sup>7</sup> backgrounds. The causative agents for the change in epistasis will be context-dependent, depending on the precise nature of the two mutations and the background. For instance, a certain background might simply introduce a Stop codon in the sequence. Notwithstanding these precise, local mechanistic explanations, we seek to answer how epistasis changes statistically in a sequence. Investigating statistical patterns that explain the switch in the nature of epistasis in deep, exhaustive landscapes is a long-term goal of this research.
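
      To make the background enumeration concrete, here is a minimal sketch. It assumes nine variable nucleotide positions (which is what 4<sup>7</sup> backgrounds for a pair of sites implies), fitness values on an additive (for example log) scale, and the common magnitude / sign / reciprocal-sign classification; the function names and the `tol` noise threshold are hypothetical and not taken from the manuscript.

```python
from itertools import product

BASES = "ACGT"


def backgrounds(length, pos_i, pos_j):
    """Yield every assignment of the positions other than pos_i and pos_j."""
    free = [p for p in range(length) if p not in (pos_i, pos_j)]
    for combo in product(BASES, repeat=len(free)):
        yield dict(zip(free, combo))


def classify_epistasis(f00, f10, f01, f11, tol=0.0):
    """Classify pairwise epistasis from four fitness values.

    f00: neither mutation, f10: mutation 1 only, f01: mutation 2 only,
    f11: both. tol is an assumed threshold below which epistasis is
    treated as absent.
    """
    d1_alone, d1_with2 = f10 - f00, f11 - f01  # effect of mutation 1
    d2_alone, d2_with1 = f01 - f00, f11 - f10  # effect of mutation 2
    eps = f11 - f10 - f01 + f00
    if abs(eps) <= tol:
        return "none"
    flip1 = d1_alone * d1_with2 < 0
    flip2 = d2_alone * d2_with1 < 0
    if flip1 and flip2:
        return "reciprocal sign"
    if flip1 or flip2:
        return "sign"
    return "magnitude"


# Number of backgrounds for one pair of positions in a 9-site sequence.
n_backgrounds = sum(1 for _ in backgrounds(9, 0, 1))
print(n_backgrounds)  # 16384, i.e. 4**7
```

      Applying `classify_epistasis` to the same pair of mutations in each background, and tallying how often the returned class changes, is one way to quantify how the nature of epistasis depends on background.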

      (3) Last, in our revised manuscript, we will address the reviewers’ other minor comments on the various aspects of the manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary: 

      The idea is appealing, but the authors have not sufficiently demonstrated the utility of this approach.

      Strengths: 

      Novelty of the approach, potential implications for discovering novel interactions

      Weaknesses:

      The Duong lab had introduced their highly elegant peptidisc approach several years ago. In this present work, they combine it with thermal proteome profiling (TPP) and attempt to demonstrate the utility of this combination for identifying novel membrane protein-ligand interactions.

      While I find this idea intriguing, and the approach potentially useful, I do not feel that the authors had sufficiently demonstrated the utility of this approach. My main concern is that no novel interactions are identified and validated. For the presentation of any new methodology, I think this is quite necessary. In addition, except for MsbA, no orthogonal methods are used to support the conclusions, and the authors rely entirely on quantifying rather small differences in abundances using either iBAQ or LFQ.

      We thank the reviewer for their thoughtful comments. In this revision, we have experimentally addressed the reviewer’s concerns in three ways:

      (1) To demonstrate the utility of our MM-TPP method over the detergent-based TPP workflow (termed DB-TPP), we performed a side-by-side comparison using ATP–VO₄ at 51 °C (Figure 3B and Figure 4A). From the DB-TPP dataset, 7.4% of all identified proteins were annotated as ATP-binding, while 6.4% of proteins differentially stabilized were annotated as ATP-binding. In contrast, in the MM-TPP dataset, 9.3% of all identified proteins were annotated as ATP-binding proteins, while 17% of proteins differentially stabilized were annotated as ATP-binding. The lack of enrichment in the detergent-based approach indicates that the observed differences are likely stochastic, rather than a result of specific ATP–VO₄-mediated stabilization as found with MM-TPP. For instance, several key proteins found differentially stabilized using the MM-TPP method (BCS1, P2RY6, SLC27A2, ABCB1, ABCC2, and ABCC9) showed no such pattern in the DB-TPP dataset. This divergence strongly supports the specificity and utility of our Peptidisc approach.
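
      The enrichment argument above reduces to a simple ratio, worked through here using only the percentages quoted in the text. The helper name is hypothetical, and because the underlying protein counts are not given, no significance test (for example a hypergeometric test) is attempted.

```python
def fold_enrichment(pct_in_hits, pct_in_background):
    """Ratio of ATP-binding share among stabilized proteins to the overall share."""
    return pct_in_hits / pct_in_background


# Percentages of ATP-binding proteins as quoted in the text.
datasets = {
    "DB-TPP": {"background": 7.4, "hits": 6.4},
    "MM-TPP": {"background": 9.3, "hits": 17.0},
}

for name, d in datasets.items():
    ratio = fold_enrichment(d["hits"], d["background"])
    print(f"{name}: {ratio:.2f}x")
# DB-TPP: 0.86x
# MM-TPP: 1.83x
```

      A ratio close to or below 1 (DB-TPP) indicates no enrichment of ATP-binding proteins among the stabilized set, whereas MM-TPP shows a roughly 1.8-fold enrichment.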

      (2) To demonstrate that MM-TPP can resolve not only the broader effects of ATP–VO₄ but also specific ligand–protein interactions, we employed 2-methylthio-ADP (2-MeS-ADP), a selective agonist of the P2RY12 receptor [PMID: 24784220]. In that case, we observed clear thermal stabilization of P2RY12, with a more than 6-fold increase in stability at both 51 °C and 57 °C (–log₁₀ p > 5.97; Figure 4B and Figure S4). Notably, no other protein, including the structurally related but non-responsive P2RY6 receptor, showed a comparable stabilization fold change at these temperatures.

      (3) To further probe the reproducibility of the method, we performed an independent MM-TPP evaluation with ATP–VO₄ at 51 °C using data-independent acquisition (DIA), in contrast to the data-dependent acquisition (DDA) approach used in the initial study (Figure S5). Overall, 7.8% of all identified proteins were annotated as ATP-binding, and as before, this proportion increased to 17% among proteins with log₂ fold changes greater than 0.5. Specifically, BCS1 and SLC27A2 exhibited strong stabilization (log₂ fold change > 1), while P2RY6, ABCB11, ABCC2, and ABCG2 showed moderate stabilization (log₂ fold changes between 0.5 and 1), and consistent with previous results, P2RX4 was destabilized, with a log₂ fold change below –1. These findings support the consistency and reproducibility of the method across distinct data acquisition methods.

      My main concern is that no novel interactions are identified and validated. For the presentation of any new methodology, I think this is quite necessary.  

      The primary objective of our study is to establish and benchmark the MM-TPP workflow using known targets, rather than to discover novel ligand–protein interactions. Identifying new binders requires extensive screening and downstream validations, which we believe is beyond the scope of this methodological report. Instead, our study highlights the sensitivity and reliability of the MM-TPP approach by demonstrating consistent and reproducible results with well-characterized interactions.

      We respectfully disagree with the notion that introducing a new methodology must necessarily include the discovery of novel interactions. For instance, Martinez Molina et al. [PMID: 23828940] introduced the cellular thermal shift assay (CETSA) by validating established targets such as MetAP2 with TNP-470 and CDK2 with AZD-5438, without identifying novel protein–ligand pairs. Similarly, Kalxdorf et al. [PMID: 33398190] published their cell-surface thermal proteome profiling (CS-TPP) using Ouabain to stabilize the Na⁺/K⁺-ATPase pump in K562 cells, and SB431542 to stabilize its canonical target JAG1. In fact, when these methods revealed additional stabilizations, these were not validated but instead interpreted through reasoning grounded in the literature. For instance, they attributed the SB431542-induced stabilization of MCT1 to its reported role in cell migration and tumor invasiveness, and explained that SLC1A2 stabilization is related to the disruption of Na⁺/K⁺-ATPase activity by Ouabain. In the same way, our interpretation of ATP-VO₄–mediated stabilization of Mao-B is justified by predictive AlphaFold-3 rather than direct orthogonal assays, which are beyond the scope of our methodological presentation. 

      Collectively, the influential studies cited above have set methodological precedents by prioritizing validation and proof-of-concept over merely finding uncharacterized binders. In the same spirit, our work is centred on establishing MM-TPP as a robust platform for probing membrane protein–ligand interactions in a water-soluble format. The discovery of novel binders remains an exciting future direction—one that will build upon the methodological foundation laid by the present study.

      In addition, except for MsbA, no orthogonal methods are used to support the conclusions, and the authors rely entirely on quantifying rather small differences in abundances using either iBAQ or LFQ.

      We deliberately began this study with our model protein, MsbA, examined under both native and overexpressed conditions, to establish agreement between MM-TPP (Figure 2D) and biochemical stability assays (Figure 2A). This validation provided the foundation to confidently extend MM-TPP to the mouse organ proteome. To demonstrate the validity of our workflow, we used ATP-VO₄ because its expected targets are known.

      We note that orthogonal validation often requires overproduction and purification of the candidate proteins, including suitable antibodies, which is a true challenge for membrane proteins. Here, we demonstrate that MM-TPP can detect ligand-induced thermal shifts directly in native membrane preparations, without requiring protein overproduction or purification. We also emphasize several influential studies in TPP, including Martinez Molina et al. (PMID: 23828940) and Fang et al. (PMID: 34188175), which focused primarily on establishing and benchmarking the methodology, rather than on extensive orthogonal validation. In the same spirit, our study prioritizes methodological development, and accordingly, several orthogonal validations are now included in this revision.

      [...] and the authors rely entirely on quantifying rather small differences in abundances using either iBAQ or LFQ.

      To clarify, all analyses of ligand-induced stabilization or destabilization were carried out using LFQ values. The sole exception is Figure 2B, where we used iBAQ values to depict the relative abundance of proteins within a single sample, in order to show MsbA's relative level within the E. coli peptidisc library.

      Respectfully, we disagree with the assertion that we are “quantifying rather small differences in abundances using either iBAQ or LFQ.” We were able to clearly distinguish between stabilizations driven by specific ligands binding to their targets versus those caused by non-specific ligands with broader activity. This is further confirmed by comparing 2-MeS-ADP, a selective ligand for P2RY12, with ATP-VO₄, a highly promiscuous ligand, and AMP-PNP, which exhibits intermediate breadth. When tested in triplicate at 51 °C, 2-MeS-ADP significantly altered the thermal stability of 27 proteins, AMP-PNP 44 proteins, and ATP-VO₄ 230 proteins, consistent with the expectation that broader ligands stabilize more proteins nonspecifically. Importantly, 2-MeS-ADP produced markedly stronger stabilization of its intended target, P2RY12 (–log₁₀p = 9.32), than the top stabilized proteins for ATP–VO₄ (DNAJB3, –log₁₀p = 5.87) or AMP-PNP (FTH1, –log₁₀p = 5.34). Moreover, 2-MeS-ADP did not significantly stabilize proteins that were consistently stabilized by the broad ligands, such as SLC27A2, which was strongly stabilized by both ATP-VO₄ and AMP-PNP (–log₁₀p > 2.5). Together, these findings demonstrate that MM-TPP can robustly distinguish between broad-spectrum and target-specific ligands, with selective ligands inducing stronger and more physiologically meaningful stabilization at their intended targets compared to promiscuous ligands.

      Finally, we emphasize that our findings are not marginal, but meet quantitative and statistical rigor consistent with best practices in proteomics. We apply dual thresholds combining effect size (|log₂FC| ≥ 1, i.e., at least a two-fold change) with statistical significance (FDR-adjusted p ≤ 0.05)—criteria commonly used in proteomics methodology studies (e.g., PMID: 24942700, 38724498). Moreover, the stabilization and destabilization events we report are reproducible across biological replicates (n = 3), consistent across adjacent temperatures for most targets, and technically robust across acquisition modes (DDA vs. DIA). Taken together, these results reflect statistically valid and biologically meaningful effects, fully aligned with standards set by prior published proteomics studies.
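
      For readers who want to see the dual-threshold rule spelled out, a minimal sketch follows. It assumes the Benjamini-Hochberg procedure for the FDR adjustment (the text says only "FDR-adjusted") and uses hypothetical function names and made-up example values; it is not the pipeline used for the reported analyses.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def call_hits(log2fc, pvalues, fc_cutoff=1.0, alpha=0.05):
    """Flag proteins with |log2FC| >= fc_cutoff and adjusted p <= alpha."""
    adjusted = benjamini_hochberg(pvalues)
    return [
        abs(fc) >= fc_cutoff and q <= alpha
        for fc, q in zip(log2fc, adjusted)
    ]


# Made-up values for four proteins, for illustration only.
example_fc = [1.5, 2.0, -0.4, 3.0]
example_p = [0.001, 0.04, 0.03, 0.5]
print(call_hits(example_fc, example_p))
```

      In this toy input only the first protein passes both filters: the second has a large fold change but its adjusted p-value exceeds 0.05, and the third is below the effect-size cut-off.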

      Furthermore, the reported changes in abundances are solely based on iBAQ or LFQ analysis. This must be supported by a more quantitative approach such as SILAC or labeled peptides. In summary, I think this story requires a stronger and broader demonstration of the ability of peptidisc-TPP to identify novel physiologically/pharmacologically relevant interactions.

      With respect to labeling strategies, we deliberately avoided using TMT due to concerns about both cost and potential data quality issues. Some recent studies have documented the drawbacks of TMT in contexts directly relevant to our work. For example, a benchmarking study of LiP-MS workflows showed that although TMT increased proteome depth and reduced technical variance, it was less accurate in identifying true drug–protein interactions and produced weaker dose–response correlations compared with label-free DIA approaches [PMID: 40089063]. More broadly, technical reviews have highlighted that isobaric tagging is intrinsically prone to ratio compression and reporter-ion interference due to co-isolation and co-fragmentation of peptides, which flatten measured fold-changes and obscure biologically meaningful differences [PMID: 22580419, 22036744]. In terms of SILAC, the technique requires metabolic incorporation of heavy amino acids, which is feasible in cultured cells but not in physiologically relevant tissues such as the liver organ used here. SILAC mouse models exist, but they are expensive and time-consuming [PMID: 18662549, 21909926]. We are not a mouse lab, and introducing liver organ SILAC labeling in our workflow is beyond the scope of these revisions. We also note that several hallmark TPP studies have been successfully carried out using label-free quantification [PMID: 25278616, 26379230, 33398190, 23828940], establishing this as an accepted and widely applied approach in the field.

      To further support our conclusions, we added controls showing that detergent solubilization of mouse liver membranes followed by SP4 cleanup fails to detect ATP-VO₄-mediated stabilization of ATP-binding proteins, underscoring the necessity of Peptidisc reconstitution for capturing ligand-induced thermal stabilization. We also present new data demonstrating selective stabilization of the P2Y12 receptor by its agonist 2-MeS-ADP, providing orthogonal, receptor-specific validation within the MM-TPP framework. Finally, an orthogonal DIA acquisition on separate replicates confirmed robust ATP-vanadate stabilization of ATP-binding proteins, including BCS1l and SLC27A2. Together, these additions reinforce that the observed stabilizations are genuine, physiologically relevant ligand–protein interactions and highlight the unique advantage of the Peptidisc-based workflow in capturing such events.

      Cited References:

      24784220: Zhang J, Zhang K, Gao ZG, et al. Agonist-bound structure of the human P2Y₁₂ receptor. Nature.  2014;509(7498):119-122. doi:10.1038/nature13288. 

      23828940: Martinez Molina D, Jafari R, Ignatushchenko M, et al. Monitoring drug target engagement in cells and tissues using the cellular thermal shift assay. Science. 2013;341(6141):84-87. doi:10.1126/science.1233606.

      33398190: Kalxdorf M, Günthner I, Becher I, et al. Cell surface thermal proteome profiling tracks perturbations and drug targets on the plasma membrane. Nat Methods. 2021;18(1):84-91. doi:10.1038/s41592-020-01022-1.

      34188175: Fang S, Kirk PDW, Bantscheff M, Lilley KS, Crook OM. A Bayesian semi-parametric model for thermal proteome profiling. Commun Biol. 2021;4(1):810. doi:10.1038/s42003-021-02306-8.

      24942700: Cox J, Hein MY, Luber CA, Paron I, Nagaraj N, Mann M. Accurate proteome-wide label-free quantification by delayed normalization and maximal peptide ratio extraction, termed MaxLFQ. Mol Cell Proteomics. 2014;13(9):2513-2526. doi:10.1074/mcp.M113.031591.

      38724498: Peng H, Wang H, Kong W, Li J, Goh WWB. Optimizing differential expression analysis for proteomics data via high-performing rules and ensemble inference. Nat Commun. 2024;15(1):3922. doi:10.1038/s41467-024-47899-w.

      40089063: Koudelka T, Bassot C, Piazza I. Benchmarking of quantitative proteomics workflows for limited proteolysis mass spectrometry. Mol Cell Proteomics. 2025;24(4):100945. doi:10.1016/j.mcpro.2025.100945.

      22580419: Christoforou AL, Lilley KS. Isobaric tagging approaches in quantitative proteomics: the ups and downs. Anal Bioanal Chem. 2012;404(4):1029-1037. doi:10.1007/s00216-012-6012-9. 

      22036744: Christoforou AL, Lilley KS. Isobaric tagging approaches in quantitative proteomics: the ups and downs. Anal Bioanal Chem. 2012;404(4):1029-1037. doi:10.1007/s00216-012-6012-9. 

      18662549: Krüger M, Moser M, Ussar S, et al. SILAC mouse for quantitative proteomics uncovers kindlin-3 as an essential factor for red blood cell function. Cell. 2008;134(2):353-364. doi:10.1016/j.cell.2008.05.033.

      21909926: Zanivan S, Krueger M, Mann M. In vivo quantitative proteomics: the SILAC mouse. Methods Mol Biol. 2012;757:435-450. doi:10.1007/978-1-61779-166-6_25. 

      25278616: Kalxdorf M, Becher I, Savitski MM, et al. Temperature-dependent cellular protein stability enables high-precision proteomics profiling. Nat Methods. 2015;12(12):1147-1150. doi:10.1038/nmeth.3651.

      26379230: Savitski MM, Reinhard FBM, Franken H, et al. Tracking cancer drugs in living cells by thermal profiling of the proteome. Science. 2015;346(6205):1255784. doi:10.1126/science.1255784. 

      33452728: Leuenberger P, Ganscha S, Kahraman A, et al. Cell-wide analysis of protein thermal unfolding reveals determinants of thermostability. Science. 2020;355(6327):eaai7825. doi:10.1126/science.aai7825. 

      23066101: Savitski MM, Zinn N, Faelth-Savitski M, et al. Quantitative thermal proteome profiling reveals ligand interactions and thermal stability changes in cells. Nat Methods. 2013;10(12):1094-1096. doi:10.1038/nmeth.2766.  

      30858367: Piazza I, Kochanowski K, Cappelletti V, et al. A machine learning-based chemoproteomic approach to identify drug targets and binding sites in complex proteomes. Nat Commun. 2019;10(1):1216. doi:10.1038/s41467-019-09199-0.

      Reviewer #2 (Public Review):

      Summary:

      The membrane mimetic thermal proteome profiling (MM-TPP) presented by Jandu et al. seems to be a useful way to minimize the interference of detergents in efficient mass spectrometry analysis of membrane proteins. Thermal proteome profiling is a mass spectrometric method that measures binding of a drug to different proteins in a cell lysate by monitoring thermal stabilization of the proteins because of the interaction with the ligands that are being studied. This method has been underexplored for membrane proteome because of the inefficient mass spectrometric detection of membrane proteins and because of the interference from detergents that are used often for membrane protein solubilization.

      Strengths:

      In this report the binding of ligands to membrane protein targets has been monitored in crude membrane lysates or tissue homogenates exalting the efficacy of the method to detect both intended and off-target binding events in a complex physiologically relevant sample setting.

      The manuscript is lucidly written and the data presented seems clear. The only insignificant grammatical error I found was that the 'P' in the word peptidisc is not capitalized in the beginning of the methods section "MM-TPP profiling on membrane proteomes". The clear writing made it easy to understand and evaluate what has been presented. Kudos to the authors.

      Weaknesses:

      While this is a solid report and a promising tool for analyzing membrane protein drug interactions, addressing some of the minor caveats listed below could make it much more impactful.

      The authors claim that MM-TPP is done by "completely circumventing structural perturbations invoked by detergents[1] ". This may not be entirely accurate, because before reconstitution of the membrane proteins in peptidisc, the membrane fractions are solubilized by 1% DDM. The solubilization and following centrifugation step lasts at least for 45 min. It is less likely that all the structural perturbations caused by DDM to various membrane proteins and their transient interactions become completely reversed or rescued by peptidisc reconstitution.

      We thank the reviewer for this insightful comment. In response, we have revised the sentence and expanded the discussion to clarify that the Peptidisc provides a complementary approach to detergent-based preparations for studying membrane proteins, preserving native lipid–protein interactions and stabilization effects that may be diminished in detergent.

      To further address the structural perturbations invoked by detergents, and as already detailed in our response to Reviewer 1, we have compared the thermal profile of the Peptidisc library to that of mouse liver membranes solubilized with 1% DDM, after incubation with ATP–VO₄ at 51 °C (Figure 4A). The results with the detergent extract revealed random patterns of stabilization and destabilization, with only 6.4% of differentially stabilized proteins being ATP-binding, comparable to the 7.4% observed in the background. In contrast, in the Peptidisc library, 17% of differentially stabilized proteins were ATP-binding, compared to 9.3% in the background. Thus, while Peptidisc reconstitution does not fully avoid initial detergent exposure, these findings underscore the importance of implementing Peptidisc in the TPP workflow when dealing with membrane proteins.

      In the introduction, the authors make statements such as "..it is widely acknowledged that even mild detergents can disrupt protein structures and activities, leading to challenges in accurately identifying drug targets.." and "[peptidisc] libraries are instrumental in capturing and stabilizing IMPs in their functional states while preserving their interactomes and lipid allosteric modulators...'. These need to be rephrased, as it has been shown by countless studies that even with membrane protein suspended in micelles robust ligand binding assays and binding kinetics have been performed leading to physiologically relevant conclusions and identification of protein-protein and protein-ligand interactions.

      We thank the reviewer for this valuable feedback and fully agree with the point raised. In response, we have revised the Introduction and conclusion to moderate the language concerning the limitations of detergent use. We now explicitly acknowledge that numerous studies have successfully used detergent micelles for ligand-binding assays and kinetic analyses, yielding physiologically relevant insights into both protein–protein and protein–ligand interactions [e.g., PMID: 22004748, 26440106, 31776188].

      At the same time, we clarify that the Peptidisc method offers a complementary advantage, particularly in the context of thermal proteome profiling (TPP), which involves mass spectrometry workflows that are incompatible with detergents. In this setting, Peptidiscs facilitate the detection of ligand-binding events that may be more difficult to observe in detergent micelles.

      We have reframed our discussion accordingly to present Peptidiscs not as a replacement for detergent-based methods, but rather as a complementary tool that broadens the available methodological landscape for studying membrane protein interactions.

      If the method involves detergent solubilization, for example using 1% DDM, it is a bit disingenuous to argue that 'interactomes and lipid allosteric modulators' characterized by low-affinity interactions will remain intact or can be rescued upon detergent removal. Authors should discuss this or at least highlight the primary caveat of the peptidisc method of membrane protein reconstitution - which is that it begins with detergent solubilization of the proteome and does not completely circumvent structural perturbations invoked by detergents.

      We would like to clarify that, in our current workflow, ligand incubation occurs after reconstitution into Peptidiscs. As such, the method is designed to circumvent the negative effects of detergent during the critical steps involving low-affinity interactions.

      That said, we fully acknowledge that Peptidisc reconstitution begins with detergent solubilization (e.g., 1% DDM), and we have revised the conclusion to explicitly state this important caveat. As the reviewer correctly points out, this initial step may introduce some structural perturbations or result in the loss of weakly associated lipid modulators.

      However, reconstitution into Peptidiscs rapidly restores a detergent-free environment for membrane proteins, which has been shown in our previous studies [PMID: 38577106, 38232390, 31736482, 31364989] to mitigate these effects. Specifically, we have demonstrated that time-limited DDM exposure, followed by Peptidisc reconstitution, minimizes membrane protein delipidation, enhances thermal stability, retains functionality, and preserves multi-protein assemblies.

      It would also be important to test detergents that are even milder than 1% DDM and ones which are harsher than 1% DDM to show that this method of reconstitution can indeed rescue the perturbations to the structure and interactions of the membrane protein done by detergents during solubilization step. 

      We selected 1% DDM based on our previous work [PMID: 37295717, 39313981, 38232390], where it consistently enabled robust and reproducible solubilization for Peptidisc reconstitution. We agree that comparing milder detergents (e.g., LMNG) and harsher ones (e.g., SDC) would provide valuable insights into how detergent strength influences structural perturbations, and how effectively these can be mitigated by Peptidisc reconstitution. Preliminary data (not shown) from mouse liver membranes indicate broadly similar proteomic profiles following solubilization with DDM, LMNG, and SDC, although potential differences in functional activity or ligand binding remain to be investigated.

      Based on the methods provided, it appears that the final amount of detergent in peptidisc membrane protein library was 0.008%, which is ~150 uM. The CMC of DDM depending on the amount of NaCl could be between 120-170 uM.

      While we cannot entirely rule out the presence of residual DDM (0.008%) in the raw library, its free concentration may be lower than initially estimated. This is related to the formation of mixed micelles with the amphipathic peptide scaffold, which is supplied in excess during reconstitution. These mixed micelles are subsequently removed during the ultrafiltration step. Furthermore, in related work using His-tagged Peptidiscs [PMID: 32364744], we purified the library by nickel-affinity chromatography following a 5× dilution into a detergent-free buffer. Although this purification step reduced the number of soluble proteins, the same membrane proteins were retained, suggesting that any residual detergent does not significantly interfere with Peptidisc reconstitution. Supporting this, our MM-TPP assays on purified libraries (data not shown) consistently demonstrated stabilization of ATP-binding proteins (e.g., SLC27A2, DNAJB3), indicating that the observed ligand–protein interactions result from successful incorporation into Peptidiscs.
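
      The reviewer's estimate of roughly 150 µM residual DDM can be checked with a short unit conversion. The sketch below is illustrative only: it assumes the 0.008% figure is weight per volume and uses the molar mass of n-dodecyl-β-D-maltoside (about 510.6 g/mol). The function name is hypothetical and not part of the original workflow.

```python
# Molar mass of n-dodecyl-beta-D-maltoside (DDM, C24H46O11) in g/mol.
DDM_MOLAR_MASS = 510.62


def percent_wv_to_micromolar(percent_wv: float, molar_mass: float) -> float:
    """Convert a % (w/v) concentration to micromolar.

    Assumes % means grams of solute per 100 mL of solution.
    """
    grams_per_litre = percent_wv * 10.0  # 1% (w/v) = 1 g / 100 mL = 10 g/L
    return grams_per_litre / molar_mass * 1e6


# Residual DDM reported for the raw Peptidisc library (assumed to be % w/v).
residual_ddm_um = percent_wv_to_micromolar(0.008, DDM_MOLAR_MASS)
print(f"Residual DDM: {residual_ddm_um:.0f} µM")
```

      Under these assumptions the total residual DDM comes out slightly above 150 µM, within the 120-170 µM CMC range quoted by the reviewer. This is the total concentration, not the free concentration discussed above.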

      Perhaps, to completely circumvent the perturbations from detergents other methods of detergent-free solubilization such as using SMA polymers and SMALP reconstitution could be explored for a comparison. Moreover, a comparison of the peptidisc reconstitution with detergent-free extraction strategies, such as SMA copolymers, could lend more strength to the presented method.

      We agree that detergent-free methods such as SMA polymers hold promise for membrane protein solubilization. However, in preliminary single-replicate experiments using SMA2000 at 51 °C in the presence of ATP–VO₄ (data not shown), we observed broad, non-specific stabilization effects. Of the 2,287 quantified proteins, 9.3% were annotated as ATP-binding, yet 9.9% of the 101 proteins showing a log₂ fold change >1 or <–1 were ATP-binding, indicating no meaningful enrichment. Given this lack of specificity and the limited dataset, we chose not to pursue further SMA experiments and have not included them here. However, in a recent study (https://doi.org/10.1101/2025.08.25.672181), we directly compared Peptidisc, SMA, and nanodiscs for liver membrane proteome profiling. In that work, Peptidisc outperformed both SMA and nanodiscs in detecting membrane protein dysregulation between healthy and diseased liver. By extension, we expect Peptidisc to offer superior sensitivity and specificity for detecting ligand-induced stabilization events, such as those observed here with ATP–vanadate.
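
      The "no meaningful enrichment" statement for the SMA experiment can be illustrated with a one-sided hypergeometric test. This is a sketch under stated assumptions rather than the analysis actually performed: the protein counts are reconstructed by rounding the reported percentages (9.3% of 2,287 and 9.9% of 101), and the helper function is hypothetical.

```python
from math import comb


def hypergeom_upper_tail(N: int, K: int, n: int, k: int) -> float:
    """P(X >= k) when n items are drawn without replacement from N items,
    K of which carry the annotation of interest."""
    total = comb(N, n)
    upper = min(K, n)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / total


quantified = 2287                               # proteins quantified with SMA2000
atp_binding_total = round(0.093 * quantified)   # assumed count behind "9.3%"
hits = 101                                      # proteins with |log2 fold change| > 1
atp_binding_hits = round(0.099 * hits)          # assumed count behind "9.9%"

# Ratio of the ATP-binding fraction among hits to the background fraction.
fold_enrichment = (atp_binding_hits / hits) / (atp_binding_total / quantified)
p_value = hypergeom_upper_tail(quantified, atp_binding_total, hits, atp_binding_hits)

print(f"fold enrichment = {fold_enrichment:.2f}, one-sided p = {p_value:.2f}")
```

      With these reconstructed counts the fold enrichment is close to 1 and the p-value is far above conventional significance thresholds, consistent with the conclusion stated above.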

      Cross-verification of the identified interactions, and subsequent stabilization or destabilizations, should be demonstrated by other in vitro methods of thermal stability and ligand binding analysis using purified protein to support the efficacy of the MM-TPP method. An example cross-verification using SDS-PAGE, of the well-studied MsbA, is shown in Figure 2. In a similar fashion, other discussed targets such as, BCS1L, P2RX4, DgkA, Mao-B, and some un-annotated IMPs shown in supplementary figure 3 that display substantial stabilization or destabilization should be cross-verified.

      We appreciate this suggestion and note that a similar point was raised in R1’s comment “In addition, except for MsbA, no orthogonal methods are used to support the conclusions, and the authors rely entirely on quantifying rather small differences in abundances using either iBAQ or LFQ.” We have developed a detailed response to R1 on this matter, which equally applies here. 

      Cited Reference:

      35616533: Young JW, Wason IS, Zhao Z, et al. Development of a Method Combining Peptidiscs and Proteomics to Identify, Stabilize, and Purify a Detergent-Sensitive Membrane Protein Assembly. J Proteome Res. 2022;21(7):1748-1758. doi:10.1021/acs.jproteome.2c00129. PMID: 35616533.

      31364989: Carlson ML, Stacey RG, Young JW, et al. Profiling the Escherichia coli membrane protein interactome captured in Peptidisc libraries. Elife. 2019;8:e46615. doi:10.7554/eLife.46615. 

      22004748: O'Malley MA, Helgeson ME, Wagner NJ, Robinson AS. Toward rational design of protein detergent complexes: determinants of mixed micelles that are critical for the in vitro stabilization of a G-protein coupled receptor. Biophys J. 2011;101(8):1938-1948. doi:10.1016/j.bpj.2011.09.018.

      26440106: Allison TM, Reading E, Liko I, Baldwin AJ, Laganowsky A, Robinson CV. Quantifying the stabilizing effects of protein-ligand interactions in the gas phase. Nat Commun. 2015;6:8551. doi:10.1038/ncomms9551.

      31776188: Beckner RL, Zoubak L, Hines KG, Gawrisch K, Yeliseev AA. Probing thermostability of detergent-solubilized CB2 receptor by parallel G protein-activation and ligand-binding assays. J Biol Chem. 2020;295(1):181-190. doi:10.1074/jbc.RA119.010696.

      38577106: Jandu RS, Yu H, Zhao Z, Le HT, Kim S, Huan T, Duong van Hoa F. Capture of endogenous lipids in peptidiscs and effect on protein stability and activity. iScience. 2024;27(4):109382. doi:10.1016/j.isci.2024.109382.

      38232390: Antony F, Brough Z, Zhao Z, Duong van Hoa F. Capture of the Mouse Organ Membrane Proteome Specificity in Peptidisc Libraries. J Proteome Res. 2024;23(2):857-867. doi:10.1021/acs.jproteome.3c00825.

      31736482: Saville JW, Troman LA, Duong Van Hoa F. PeptiQuick, a one-step incorporation of membrane proteins into biotinylated peptidiscs for streamlined protein binding assays. J Vis Exp. 2019;(153). doi:10.3791/60661. 

      37295717: Zhao Z, Khurana A, Antony F, et al. A Peptidisc-Based Survey of the Plasma Membrane Proteome of a Mammalian Cell. Mol Cell Proteomics. 2023;22(8):100588. doi:10.1016/j.mcpro.2023.100588. 

      39313981: Antony F, Brough Z, Orangi M, Al-Seragi M, Aoki H, Babu M, Duong van Hoa F. Sensitive Profiling of Mouse Liver Membrane Proteome Dysregulation Following a High-Fat and Alcohol Diet Treatment. Proteomics. 2024;24(23-24):e202300599. doi:10.1002/pmic.202300599. 

      32364744: Young JW, Wason IS, Zhao Z, Rattray DG, Foster LJ, Duong Van Hoa F. His-Tagged Peptidiscs Enable Affinity Purification of the Membrane Proteome for Downstream Mass Spectrometry Analysis. J Proteome Res. 2020;19(7):2553-2562. doi:10.1021/acs.jproteome.0c00022.

      32591519: The M, Käll L. Focus on the spectra that matter by clustering of quantification data in shotgun proteomics. Nat Commun. 2020;11(1):3234. doi:10.1038/s41467-020-17037-3. 

      33188197: Kurzawa N, Becher I, Sridharan S, et al. A computational method for detection of ligand-binding proteins from dose range thermal proteome profiles. Nat Commun. 2020;11(1):5783. doi:10.1038/s41467-020-19529-8. 

      26524241: Reinhard FBM, Eberhard D, Werner T, et al. Thermal proteome profiling monitors ligand interactions with cellular membrane proteins. Nat Methods. 2015;12(12):1129-1131. doi:10.1038/nmeth.3652. 

      23828940: Martinez Molina D, Jafari R, Ignatushchenko M, et al. Monitoring drug target engagement in cells and tissues using the cellular thermal shift assay. Science. 2013;341(6141):84-87. doi:10.1126/science.1233606. 

      32133759: Mateus A, Kurzawa N, Becher I, et al. Thermal proteome profiling for interrogating protein interactions. Mol Syst Biol. 2020;16(3):e9232. doi:10.15252/msb.20199232. 

      14755328: Dorsam RT, Kunapuli SP. Central role of the P2Y12 receptor in platelet activation. J Clin Invest. 2004;113(3):340-345. doi:10.1172/JCI20986. 

      Reviewer #1 (Recommendations for the authors):

      “The authors use iBAC or LFQ to compare across samples. This inconsistency is puzzling. As far as I know, LFQ should always be used when comparing across samples”

      As mentioned above, we use iBAQ only in Fig. 2B to illustrate within-sample relative abundance; all comparative analyses elsewhere use LFQ. We have updated the Fig. 2B legend to state this explicitly.

      We used iBAQ in Fig. 2B because it provides a measure of protein abundance within a sample, normalizing the summed peptide intensities by the number of theoretically observable peptides. This normalization facilitates comparisons between proteins within the same sample, offering a clearer understanding of their relative molar proportions [PMID: 33452728]. LFQ, by contrast, is optimized for comparing the same protein across different samples. It achieves this by performing delayed normalization to reduce run-to-run variability and by applying maximal peptide ratio extraction, which integrates pairwise peptide intensity ratios across all samples to build a consistent protein-level quantification matrix [PMID: 24942700]. These features make LFQ more robust to missing values and technical variation, thereby enabling accurate detection of relative abundance changes in the same protein under different experimental conditions. This distinction is well supported by the proteomics literature: Smits et al. [PMID: 23066101] used iBAQ specifically to determine the relative abundance of proteins within one sample, whereas LFQ was applied for comparative analyses between conditions.
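
      The iBAQ normalization described above can be sketched in a few lines. This is a minimal illustration, not the MaxQuant implementation: the protein names, peptide intensities, and theoretical peptide counts are invented for the example, and LFQ (delayed normalization with maximal peptide ratio extraction) is deliberately not reproduced.

```python
def ibaq(peptide_intensities: list[float], n_theoretical_peptides: int) -> float:
    """Summed peptide intensity divided by the number of theoretically
    observable peptides for the protein."""
    return sum(peptide_intensities) / n_theoretical_peptides


# Hypothetical single-sample data: (observed peptide intensities, theoretical peptide count).
sample = {
    "ProteinA": ([4.0e6, 2.0e6, 2.0e6], 8),
    "ProteinB": ([9.0e6, 3.0e6], 24),
}

ibaq_values = {name: ibaq(ints, n) for name, (ints, n) in sample.items()}

# Within-sample relative share, a proxy for relative molar proportion.
total = sum(ibaq_values.values())
relative_share = {name: value / total for name, value in ibaq_values.items()}

for name, share in relative_share.items():
    print(f"{name}: iBAQ = {ibaq_values[name]:.2e}, share = {share:.2f}")
```

      In this toy sample ProteinB has the larger summed intensity, but ProteinA has the larger iBAQ value once the number of observable peptides is taken into account. This is the reason iBAQ suits within-sample comparisons between different proteins.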

      “[Regarding Figure 2A] Why does the control also contain ATP-vanadate? Also, I am not aware of a commercially available chemical "ATP-VO4". I assume this is a mistake”

      The control condition in Figure 2A was mislabeled, and the figure has been corrected to remove this discrepancy. In our experiments, ATP and orthovanadate (VO₄) were added together, and for simplicity this was annotated as “ATP-VO₄.”

      “[Regarding Figure 2B] What is the fold change in MsbA iBAQ values? It seems that the differences are quite small, and as such require a more quantitative approach than iBAQ (e.g SILAC or some other internal standard). In addition, what information does this panel add relative to 2C”

      The figure has been updated to clarify that the values shown are log₂-transformed iBAQ intensities. Figures 2B and 2C are complementary: Figure 2B shows that in the control sample, MsbA’s peptide abundance decreases with increasing temperature (51, 56, and 61 °C) relative to the remaining bulk proteins. Figure 2C shows the specific thermal profiles of MsbA in control and ATP–vanadate conditions. To make this clearer, we have added a sentence to the Results section explaining the specific role of Figure 2B.

      Together, these panels indicate that the method can identify ligand-induced stabilization even for proteins whose abundance decreases faster than the bulk during the TPP assay. We have provided the rationale for not using SILAC or TMT labeling in our public response.

      “[Regarding Figure 2C] Although not mentioned in the legend, I assume this is iBAQ quantification, which as mentioned above isn't accurate enough for such small differences. In addition, I find this data confusing: why is MsbA more stable at the lower temperatures in the absence of ATP-vanadate? The smoothed-line representation is misleading, certainly given the low number of data points”

      The data presented represent LFQ values for MsbA, and we have updated the figure legend to clearly indicate this. Additionally, as suggested, we have removed the smoothing line to more accurately reflect the data. Regarding the reviewer’s concern about stability at lower temperatures, we note that MsbA exhibits comparable abundance at 38 °C and 46 °C under both conditions, with overlapping error bars. We therefore interpret these data as indicating no significant difference in stability at the lower temperatures, with ligand-dependent stabilization becoming apparent only at elevated temperatures. We do not exclude the possibility that MsbA stability at these temperatures is affected by the conformational dynamics of this ABC transporter upon ATP binding and hydrolysis.

      “[Regarding Figure 3A] is this raw LFQ data? Why did the authors suddenly change from iBAQ to LFQ? I find this inconsistency puzzling”

      To clarify, all analyses of protein stabilization or destabilization presented in the manuscript are based on LFQ values. The only instance where iBAQ was used is Figure 2B, where it served to illustrate the relative peptide abundance of MsbA within the same sample. We have revised the figure legends and text to make this distinction explicit and ensure consistency in presentation.

      “[Regarding Figure 3B] The non-specific ATP-dependent stabilization increases the likelihood of false positive hits. This limitation is not mentioned by the authors. I think it is important to show other small molecules, in addition to ATP. The authors suggest that their approach is highly relevant for drug screening. Therefore, a good choice is to test an effect of a known stabilizing drug (eg VX-809 and CFTR)”

      We thank the reviewer for this suggestion. As noted in the manuscript (Results and Discussion sections), ATP is a natural hydrotrope and is therefore expected to induce broad, non-specific stabilization effects, a phenomenon also observed in previous proteome-wide studies, which demonstrated ATP’s widespread influence on cytosolic protein solubility and thermal stability (PMID: 30858367). To demonstrate that MM-TPP can resolve specific ligand–protein interactions beyond these global ATP effects, we tested 2-methylthio-ADP (2-MeS-ADP), a selective agonist of P2RY12 (PMID: 14755328). In these experiments, we observed robust and reproducible stabilization of P2RY12 at both 51 °C and 57 °C, with no consistent stabilization of unrelated proteins across temperatures. This provides direct evidence that our workflow can distinguish specific from non-specific ligand-induced effects. We selected 2-MeS-ADP because of its structural stability and its higher receptor affinity compared with ADP, allowing us to extend our existing workflow while testing a receptor-specific interaction. We agree that extending this approach to clinically relevant small-molecule drugs, such as VX-809 with CFTR, would further underscore the pharmacological potential of MM-TPP, and we have now noted this as an important avenue for future studies.

      “X axis of Figure 3B: Log 2 fold difference of what? iBAQ? LFQ? Similar ambiguity regarding the Y axis of 3E. What peptide? And why the constant changes in estimating abundances?”

      We thank the reviewer for pointing out these inaccuracies in the figure annotations. As mentioned above, all analyses (except Figure 2B) are based on LFQ values. We have revised the figure legends and text to make this clear.

      In Figure 3E, “peptide intensity” refers to log2 LFQ peptide intensities derived from the BCS1L protein, as indicated in the figure caption. 

      “The authors suggest that P2RY6 and P2RY12 are stabilized by ADP, the hydrolysis product of ATP. Currently, the support for this suggestion is highly indirect. To support this claim, the authors need to directly show the effect of ADP. In reference to the alpha fold results shown in Figure 4D, the authors state that "Collectively, these data highlight the ability of MM-TPP to detect the side effects of parent compounds, an important consideration for drug development". To support this claim, it is necessary to show that Mao-B is indeed best stabilized with ADP or AMP, rather than ATP.”

      In this revision, we chose not to test ADP directly, as it is a broadly binding, relatively weak ligand that would likely stabilize many proteins without revealing clear target-specific effects. Since we had already evaluated ATP-VO₄, a similarly broad, non-specific ligand, additional testing with ADP would provide limited additional insight. Instead, we prioritized 2-methylthio-ADP, a selective agonist of P2RY12, to more effectively demonstrate the specificity of MM-TPP. With this ligand, we observed clear and reproducible stabilization of P2RY12, underscoring the ability of MM-TPP to resolve receptor–ligand interactions beyond ATP’s broad hydrotropic effects. Importantly, and as expected, we did not observe stabilization of the related purinergic receptor P2RY6, further supporting the specificity of the observed effect.

      We have also revised the AlphaFold-related statement in Figure 4D to adopt a more cautious tone: “Collectively, these data suggest that MM-TPP may detect potential side effects of parent compounds, an important consideration for drug development.” In this context, we use AlphaFold not as a validation tool, but rather as a structural aid to help rationalize why certain off-target proteins (e.g., ATP with Mao-B) exhibit stabilization.

      Reviewer #2 (Recommendations for the authors):

      “In the main text, it will be useful to include the unique peptides table of at least the targets discussed in the manuscript. For example, in presence of AMP-PNP at 51 °C P2RY6 shows 4-6 peptides in all n=3 positive & negative ionization modes. But, for P2RY12 only 1-3 peptides were observed. Depending on the sequence length and the relative abundance in the cell of a protein of interest, the number of peptides observed could vary a lot per protein. Given the unique peptide abundance reported in the supplementary file, for various proteins in different conditions, it appears the threshold of observation of two unique peptides for a protein to be analyzed seems less stringent.”

      By applying a filter requiring at least two unique peptides in at least one replicate, we exclude, on average, 15–20% of the total identified proteins. We consider this a reasonable level of stringency that balances confidence in protein identification with the retention of relevant data. This threshold was selected because it aligns with established LC-MS/MS data analysis practices (PMID: 32591519, 33188197, 26524241), and we have included these references in the Methods section to justify our approach. We have included in this revision a Supplemental Table 2 showing the unique peptide counts for proteins highlighted in this study.  
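
      The filtering rule described above (at least two unique peptides in at least one replicate) can be expressed as a simple predicate. The sketch below uses hypothetical protein names and peptide counts, and is not the actual processing script used for the study.

```python
MIN_UNIQUE_PEPTIDES = 2  # threshold stated in the response


def passes_filter(unique_peptide_counts: list[int]) -> bool:
    """True if any replicate reaches the minimum unique-peptide count."""
    return any(count >= MIN_UNIQUE_PEPTIDES for count in unique_peptide_counts)


# Hypothetical unique-peptide counts per replicate (n = 3).
unique_peptides = {
    "PROTEIN_A": [4, 5, 6],  # well covered in every replicate
    "PROTEIN_B": [1, 2, 1],  # reaches the threshold in one replicate
    "PROTEIN_C": [1, 1, 0],  # never reaches the threshold
}

retained = {name for name, counts in unique_peptides.items() if passes_filter(counts)}
excluded_fraction = 1 - len(retained) / len(unique_peptides)

print(sorted(retained), f"excluded: {excluded_fraction:.0%}")
```

      A stricter variant would require the threshold in every replicate (`all` instead of `any`), which would also remove PROTEIN_B in this toy example.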

      “It appears that the time of heat treatment for peptidisc library subjected to MM-TPP profiling was chosen as 3 min based on the results presented in Supplementary Figure 1A, especially the loss of MsbA observed in 1% DDM after 3 min heat perturbation. However, when reconstituted in peptidisc there seems to be no loss in MsbA even after 12 mins at 45 °C. So, perhaps a longer heat treatment would be a more efficient perturbation.”

      Previous studies indicate that heat exposure of 3–5 minutes is optimal for visualizing protein denaturation (PMID: 23828940, 32133759). We have added a statement to the Results section to justify our choice of heat exposure. Although MsbA remains stable at 45 °C for extended periods, higher temperatures allow for more effective perturbation to reveal destabilization. Supplementary Figure 1A specifically illustrates MsbA instability in detergent environments.

      “Some of the stabilized temperatures listed in Table 1 are a bit confusing. For example, ABCC3 and ABCG2. In the case of ABCC3 stabilization was observed at 51 °C and 60 °C, but 56 °C is not mentioned. In the same way, 51 °C is not mentioned for ABCG2. You would expect protein to be stabilized at 56 °C if it is stabilized at both 51 °C and 60 °C. So, it is unclear if the stabilizations were not monitored for these proteins at the missing temperatures in the table or if no peptides could be recorded at these temperatures as in the case of P2RX4 at 60 °C in Figure 4C.”

      Both scenarios are represented in our data. For some proteins, like ABCG2, sufficient peptide coverage was achieved, but no stabilization was observed at intermediate temperatures (e.g., 56 °C), likely because the perturbation was not strong enough to reveal an effect. In other cases, such as ABCC3 at 56 °C or P2RX4 at 60 °C, the proteins were not detected due to insufficient peptide identifications at those temperatures, which explains their omission from the table. 

      “In Figure 4C, it is perplexing to note that despite n = 3 there were no peptide fragments detected for P2RX4 at 60 °C in presence of ATP-VO4, but they were detected in presence of AMP-PNP. It will be useful to learn the authors' explanation for this, especially because both of these ligands destabilize P2RX4. In Figure 4B, it would have been great to see the effect of ADP too, to corroborate the theory that ATP metabolites could impact the thermal stability.”

      In Figure 4C, the absence of P2RX4 peptide detection at 60 °C with ATP–VO₄ mirrors variability observed in the corresponding control (n = 6). Specifically, neither the control nor ATP–VO₄ produced unique peptides for P2RX4 at 60 °C in that replicate, whereas peptides were detected at 60 °C in other replicates for both the control and AMP-PNP, and at 64 °C for ATP–VO₄, the controls, and AMP-PNP. Such missing values are a natural feature of MS-based proteomics and can arise from multiple technical factors, including inconsistent heating, incomplete digestion, stochastic MS injection, or interference from Peptidisc peptides. We therefore interpret the absence of peptides in this replicate as a technical artifact rather than evidence against protein destabilization. Importantly, the overall dataset consistently shows that both ATP–VO₄ and AMP-PNP destabilize P2RX4, supporting their characterization as broad, non-specific ligands with off-target effects.

      Because ATP and ADP belong to the same class of broadly binding, non-specific ligands, additional testing with ADP would not provide meaningful mechanistic insight. Instead, we chose to test 2-methylthio-ADP, a selective P2RY12 agonist. This experiment revealed robust, reproducible stabilization of P2RY12, without consistent effects on unrelated proteins at 51 °C and 57 °C, thereby demonstrating the ability of MM-TPP to detect specific receptor–ligand interactions.

      Finally, we note that P2RX4 is not a primary target of ATP–VO₄ or AMP-PNP. Consequently, the observed destabilization of P2RX4 is expected to be less pronounced than the strong, physiologically consistent stabilization of ABC transporters by ATP–VO₄, as shown in Figure 3D, where the majority of ABC transporters are thermally stabilized across all tested temperatures.

      “As per Figure 4, P2Y receptors P2RY6 and P2RY12 both showed great thermal stability in presence of ATP-VO4 despite their preference for ADP. The authors argue this could be because of ATP metabolism, and binding of the resultant ADP to the P2RY6. If P2RX4 prefers ATP and not the metabolized product ADP that apparently is available, ideally you should not see a change in stability. A stark destabilization would indicate interaction of some sorts. P2X receptors are activated by ATP and are not naturally activated by AMP-PNP. So, destabilization of P2RX4 upon binding to ATP that can activate P2X receptors is conceivable. However, destabilization both in presence of ATP-VO4 and AMP-PNP is unclear. It is perhaps useful to test effect of ADP using this method, and maybe even compare some antagonists such as TNP-ATP.”

      In this study, we did not directly test ADP, as we had already demonstrated that MM-TPP detects stabilization by broad-binding ligands such as ATP–VO₄. Instead, we focused on a more selective ligand, 2-MeS-ADP, a specific agonist of P2RY12 [PMID: 14755328]. Here, we observed robust and reproducible stabilization of P2RY12 at 51 °C and 57 °C, while P2RY6 showed no significant changes, and no other proteins were consistently stabilized (Figure 4B, S4). This confirms that MM-TPP can distinguish specific ligand–receptor interactions from broader ATP-induced effects. To further explore the assay’s nuance and sensitivity, testing additional nucleotide ligands—including antagonists like TNP-ATP or ATPγS—would provide valuable insights, and we have identified this as an important future direction.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      In the present manuscript, Mashiko and colleagues describe a novel phenotype associated with deficient SLC35G3, a testis-specific sugar transporter that is important in glycosylation of key proteins in sperm function. The study characterizes a knockout mouse for this gene and the multifaceted male infertility that ensues. The manuscript is well-written and describes novel physiology through a broad set of appropriate assays.

      Strengths:

      Robust analysis with detailed functional and molecular assays

      Weaknesses:

      (1) The abstract references reported mutations in human SLC35G3, but this is not discussed or correlated to the murine findings to a sufficient degree in the manuscript. The HEK293T experiments are reasonable and add value, but a more detailed discussion of the clinical phenotype of the known mutations in this gene and whether they are recapitulated in this study (or not) would be beneficial.

      Since no patients have been identified, our experiment was conducted to investigate the activity of the mutation found in humans.

      (2) Can the authors expand on how this mutation causes such a wide array of phenotypic defects? I am surprised there is a morphological defect, a fertilization defect, and a transit defect. Do the authors believe all of these are present in humans as well?

      Thank you for your comment. Many glycoprotein-coding genes that influence sperm head morphology, fertilization, and transit have been identified in knockout mouse studies, and most of these are conserved in humans. Therefore, we believe that glycan modification by SLC35G3 is also involved in the regulation of human sperm.

      Reviewer #2 (Public review):

      Summary:

      This study characterized the function of SLC35G3, a putative transmembrane UDP-N-acetylglucosamine transporter, in spermatogenesis. They showed that SLC35G3 is testis-specific and expressed in round spermatids. Slc35g3-null males were sterile, but females were fertile. Slc35g3-null males produced a normal sperm count, but sperm showed subtle head morphology. Sperm from Slc35g3-null males have defects in uterotubal junction passage, ZP binding, and oocyte fusion. Loss of SLC35G3 causes abnormal processing and glycosylation of a number of sperm proteins in the testis and sperm. They demonstrated that SLC35G3 functions as a UDP-GlcNAc transporter in cell lines. Two human SLC35G3 variants impaired their transporter activity, implicating these variants in human infertility.

      Strengths:

      This study is thorough. The mutant phenotype is strong and interesting. The major conclusions are supported by the data. This study demonstrated SLC35G3 as a new and essential factor for male fertility in mice, which is likely conserved in humans.

      Weaknesses:

      Some data interpretations need to be revised.

      Thank you for your comments. We have revised the interpretations.

      Reviewer #1 (Recommendations for the authors):

      (1) The introduction could be structured more efficiently. Much of what is discussed in the first paragraph appears to be redundant to the second paragraph (or perhaps unrelated to the present manuscript).

      In the Introduction, we described the process of glycoprotein formation: 1) quality control of nascent glycoproteins in the ER and its importance for sperm fertilizing ability, 2) glycan maturation in the Golgi apparatus and its importance for sperm fertilizing ability, and 3) the supply of nucleotide sugars as the basis of these processes.

      We would like to retain this structure in the revised manuscript and appreciate your understanding.

      (2) Given the significant difference in morphology between murine and human sperm, can the authors comment on whether these findings are directly translatable to humans?

      Thank you for your comment. There are significant differences in sperm morphology between mice and humans, but many glycoprotein-coding genes that influence sperm head morphology have been identified in knockout mouse studies, and most of these are conserved in humans. Therefore, we believe that glycan modification by SLC35G3 is also involved in the regulation of human sperm head morphology. Observing sperm samples from individuals with SLC35G3 mutations is the most direct approach to verify this point and is considered an important goal for future research. The following text has been added to clarify the point:

      New Line 338; While these proteins are also found in humans, it is still too early to infer the importance of SLC35G3 in the morphogenesis of human sperm heads. Observing sperm samples from individuals with SLC35G3 mutations would be the most direct approach to address this, and we consider it an important objective for future studies.

      (3) Line 194 - while the inability to pass the UTJ may indeed be a component of this infertility phenotype, I would argue that a complete lack of ability to fertilize (even with IVF but not ICSI) suggests that the primary defect is elsewhere. This statement should be removed, and the topic of these two separate mechanisms should be compared/contrasted in the discussion.

      We agree that this was an overstatement and have changed it:

      New line 187; Thus, the defective UTJ migration is one of the primary causes of Slc35g3-/- male infertility. 

      We believe the current statement in the discussion can stay as it is. 

      Line 379; We reaffirmed that glycosylation-related genes specific to the testis play a crucial role in the synthesis, quality control, and function of glycoproteins on sperm, which are essential for male fertility through their interactions with eggs and the female reproductive system.

      (4) Did the authors consider performing TEM to assess the sperm ultrastructure and the acrosome?

      Since morphological abnormalities were evident even at the macro level, TEM was not performed in this study. In the future, we plan to use immuno-TEM against affected/non-affected glycoproteins when the antibodies become available.

      (5) I would argue that Figure 3 should not be labeled as "essential", given the abnormal sperm head morphology compared to humans, the relatively modest difference between the groups on PCA, and more broadly speaking, the relatively poor correlation with morphology and human male infertility. While globozoospermia is clearly an exception, the data in this figure may not translate to human sperm and/or may not be clinically relevant even if it does.

      Indeed, other KO spermatozoa with similar morphological features are known to cause a reduction in litter size but do not result in complete infertility. As discussed in line 1, this head shape is not essential for fertilization. Reviewer 2 also pointed out that the phrase "Slc35g3 is essential for sperm head formation" is too strong; therefore, we would like to revise the title of Figure 3 to "Slc35g3 is involved in the regulation of sperm head morphology."

      (6) Have the authors generated slc35b4 KO mice?

      No, we did not. Since Slc35b4 is expressed throughout the body, a straight knockout may affect other organs or developmental processes. To investigate its role specifically in the testis, it will be necessary to generate a conditional knockout (cKO) model. As this requires considerable cost, time, and labor, we would like to leave it for future investigation.

      Reviewer #2 (Recommendations for the authors):

      (1) Lines 122-123: "it is prominently expressed in the testis, beginning 21 days postpartum (Figure 1B), suggesting expression from the secondary spermatocyte stage to the round spermatid stage in mice." Day 21 indicates the first appearance of round spermatids, but not secondary spermatocytes. Please change to the following: ...suggesting that its expression begins in round spermatids in mice.

      I agree with your comment and have revised the text accordingly (New line 114).

      (2) Figure 1E: What germ cells are they? The type of germ cells needs to be labelled on the image. Double staining with a germ cell marker would be helpful to distinguish germ cells from testicular somatic cells.

      Thank you for your comment. We replaced Figure 1E as follows.

      To distinguish germ cells from testicular somatic cells, we used the germ cell marker TRA98 antibody. Furthermore, based on the nuclear and GM130 staining pattern, we consider that the Golgi apparatus of round spermatids is labeled.

      (3) Figure 2C: The most abundant WB band is between 20 and 25 kD and is non-specific. Does the arrow point to the expected SLC35G3 band? There are two minor bands above the main non-specific band. Are both bands specific to SLC35G3? Given the strong non-specific band on WB, how specific is the immunofluorescence signal produced by this antibody? These need to be explained and discussed.

      The arrow pointed to the band at the expected size (35 kDa).

      We thought that these non-specific bands could be due to blood contamination, so we retried with testicular germ cells. We confirmed that non-specific bands disappeared in the subsequent Western blot analysis. The specificity of the immunofluorescence signal is supported by its complete absence in the KO, as shown in the Supplementary Figures. We have decided to include this improved dataset. Thank you for your comment, which helped us improve the data.

      Author response image 1.

      (4) Line 184: "Slc35g3-/--derived sperm have defects in ZP binding and oolemma fusion ability, but genomic integrity is intact." Producing viable offspring does not necessarily mean that genomic integrity is intact. Suggestion: Slc35g3-/--derived sperm have defects in ZP binding and oolemma fusion ability but produce viable offspring. Likewise, the Figure S9 caption also needs to be changed.

      Thank you for your constructive comment. We have revised the text as you suggested.

      (5) Figure 3. "Slc35g3 is essential for sperm head formation". This statement is too strong. It is not essential for sperm head formation. The sperm head is still formed, but shows subtle deformation.

      Thank you for your suggestion. We changed as follows:

      Fig. 3; "Slc35g3 is involved in the regulation of sperm head morphology."

      (6) Lines 204-205: Figure 6B: "Interestingly, some bands of sperm acrosome-associated 1 (SPACA1; 26) disappeared in Slc35g3-/- testis lysates." I don't see the absence of SPACA1 bands in -/- testis. This needs to be clearly labeled with arrows. On the contrary, the bands are stronger in Slc35g3-/- testis lysates.

      Thank you for your comment. After carefully considering your comments, we concluded that using "disappeared" is indeed inappropriate. We would like to revise the sentence as follows: New line 197; "Interestingly, SPACA1 (Sperm Acrosome Associated 1; 26) exhibited a subtle difference in banding pattern in the Slc35g3-/- testis lysate."

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      Zhang et al. used a conditional knockout mouse model to re-examine the role of the RNA-binding protein PTBP1 in the transdifferentiation of astroglial cells into neurons. Several earlier studies reported that PTBP1 knockdown can efficiently induce the transdifferentiation of rodent glial cells into neurons, suggesting potential therapeutic applications for neurodegenerative diseases. However, these findings have been contested by subsequent studies, which in turn have been challenged by more recent publications. In their current work, Zhang et al. deleted exon 2 of the Ptbp1 gene using an astrocyte-specific, tamoxifen-inducible Cre line and investigated, using fluorescence imaging and bulk and single-cell RNA-sequencing, whether this manipulation promotes the transdifferentiation of astrocytes into neurons across various brain regions. The data strongly indicate that genetic ablation of PTBP1 is not sufficient to drive efficient conversion of astrocytes into neurons. Interestingly, while PTBP1 loss alters splicing patterns in numerous genes, these changes do not shift the astroglial transcriptome toward a neuronal profile.

      Strengths:

      Although this is not the first report of PTBP1 ablation in mouse astrocytes in vivo, this study utilizes a distinct knockout strategy and provides novel insights into PTBP1-regulated splicing events in astrocytes. The manuscript is well written, and the experiments are technically sound and properly controlled. I believe this study will be of considerable interest to a broad readership.

      Weaknesses:

      (1) The primary point that needs to be addressed is a better understanding of the effect of exon 2 deletion on PTBP1 expression. Figure 4D shows successful deletion of exon 2 in knockout astrocytes. However, assuming that the coverage plots are CPM-normalized, the overall PTBP1 mRNA expression level appears unchanged. Figure 6A further supports this observation. This is surprising, as one would expect that the loss of exon 2 would shift the open reading frame and trigger nonsense-mediated decay of the PTBP1 transcript. Given this uncertainty, the authors should confirm the successful elimination of PTBP1 protein in cKO astrocytes using an orthogonal approach, such as Western blotting, in addition to immunofluorescence. They should also discuss possible reasons why PTBP1 mRNA abundance is not detectably affected by the frameshift.

      We thank the reviewer for raising this important point. Indeed, the deletion of exon 2 introduces a frameshift that is predicted to disrupt the PTBP1 open reading frame and trigger nonsense-mediated decay (NMD). While our CPM-normalized coverage plots (Figure 4D) and gene-level expression analysis (Figure 6A) suggest that PTBP1 mRNA levels remain largely unchanged in cKO astrocytes, we acknowledge that this observation is counterintuitive and merits further clarification.

      We suspect that the process of brain tissue dissociation and FACS sorting for bulk or single-cell RNA-seq may enrich for nuclear material and thus dilute the NMD signal, since NMD occurs in the cytoplasm. Alternatively, the transcripts (like those of some other genes) may escape NMD through unknown mechanisms. Although a frameshift is a strong indicator for triggering NMD, it does not guarantee that NMD will occur in every case (lines 346-353).
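
      The frameshift reasoning above rests on simple arithmetic: removing an exon preserves the downstream reading frame only if the exon length is a multiple of three. A minimal sketch of that check follows. The function name and the example lengths are hypothetical and are not the actual length of Ptbp1 exon 2, which is not stated in this response.

```python
def causes_frameshift(deleted_exon_length: int) -> bool:
    """Return True if deleting an exon of this length (in nucleotides)
    shifts the downstream reading frame.

    Codons are 3 nt long, so the frame is preserved only when the
    deleted length is divisible by 3.
    """
    if deleted_exon_length <= 0:
        raise ValueError("exon length must be a positive integer")
    return deleted_exon_length % 3 != 0


# Hypothetical lengths, for illustration only.
for length in (99, 100):
    print(length, causes_frameshift(length))
```

      A frame-shifting deletion makes a transcript a candidate for NMD, but, as noted in the response, it does not by itself establish that the transcript is degraded.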

      Regarding the validation of PTBP1 protein depletion in cKO astrocytes by Western blotting, we acknowledge that orthogonal approaches to confirm PTBP1 elimination would address uncertainty around the effect of exon 2 deletion on PTBP1 expression. However, the low yield of cKO astrocytes via FACS poses a significant obstacle to obtaining sufficient samples for immunoblot detection of PTBP1 depletion. On average, 3-5 adult animals per genotype (with three different alleles) are needed for each biological replicate. The manuscript contains PTBP1 immunofluorescence staining of brain slides to demonstrate PTBP1 deletion (Figures 1-2, Figure 3 supplement 1). Our characterization of this Ptbp1 deletion allele in other contexts shows the loss of full-length PTBP1 protein in ESCs by Western blotting (PMID: 30496473). Furthermore, germline homozygous mutant mice do not survive beyond embryonic day 6, supporting that it is a loss-of-function allele.

      (2) The authors should analyze PTBP1 expression in WT and cKO substantia nigra samples shown in Figure 3 or justify why this analysis is not necessary.

      We thank the reviewer for raising this important question. We are using an astrocyte-specific PTBP1 knockout (KO) mouse model, which is designed to delete PTBP1 in all astrocytes throughout the mouse brain, and we have systematically verified PTBP1 elimination in different brain regions (cortex and striatum) at multiple time points (from 4w to 12w after tamoxifen administration). Nevertheless, we agree that it remains necessary and important to demonstrate whether the observed lack of astrocyte-to-neuron conversion is indeed associated with sufficient PTBP1 depletion.

      We have analyzed PTBP1 expression in the substantia nigra, as we did in the cortex and striatum, and added a new figure (Figure 3-figure supplement 1) to show the results. In cKO samples, tdT+ cells lack PTBP1 immunostaining, and there is no overlap between NeuN+ and tdT+ signals. These results show effective PTBP1 depletion in the substantia nigra, similar to that observed in the cortex and striatum (lines 221-224).

      (3) Lines 236-238 and Figure 4E: The authors report an enrichment of CU-rich sequences near PTBP1-regulated exons. To better compare this with previous studies on position-specific splicing regulation by PTBP1, it would be helpful to assess whether the position of such motifs differs between PTBP1-activated and PTBP1-repressed exons.

      We thank the reviewer for this insightful comment. We agree that assessing the positional distribution of CU-rich motifs between PTBP1-activated and PTBP1-repressed exons would provide valuable insight into the position-specific regulatory mechanisms of PTBP1. In response, we have performed separate motif enrichment analyses for PTBP1-activated and PTBP1-repressed exons and examined whether their positional patterns differ (Figure 4–figure supplement 2).

      Our analysis revealed that CU-rich motifs were significantly enriched in the upstream introns of both the exons activated and the exons repressed by PTBP1 loss, with higher enrichment observed for repressed exons (Enrichment ratio = 2.14, q = 9.00×10⁻⁵) than for activated exons (Enrichment ratio = 1.72, q = 7.75×10⁻⁵) (Figure 4–figure supplement 2B–C). In contrast, no CU-rich motifs were found downstream of activated exons (Figure 4–figure supplement 2D), while a weak, non-significant enrichment was observed downstream of repressed exons (Enrichment ratio = 1.21, q = 0.225; Figure 4–figure supplement 2E). These results do not fully fit with a couple of earlier PTBP1 CLIP studies showing differential PTBP1 binding for repressed vs. activated exons but are more in line with the Black Lab study (PMID: 24499931) showing that PTBP1 binds upstream introns of both repressed and activated exons. In either case, PTBP1 affects a diverse set of alternative exons and likely involves diverse context-dependent binding patterns (lines 244-257).
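
      For readers unfamiliar with the quantity, an enrichment ratio of this kind compares how often a motif occurs in a set of target sequences with how often it occurs in a background set. The sketch below is illustrative only: the response does not state which tool or formula produced the reported ratios and q-values, so the pseudocount, the exact-substring match, the function names, and the toy sequences are all assumptions.

```python
def motif_fraction(seqs, motif, pseudocount=1):
    """Fraction of sequences containing the motif, with a pseudocount
    so that an empty hit set never yields a zero denominator later."""
    hits = sum(1 for s in seqs if motif in s)
    return (hits + pseudocount) / (len(seqs) + pseudocount)


def enrichment_ratio(target_seqs, background_seqs, motif):
    """Ratio of motif-containing fractions: target over background."""
    return motif_fraction(target_seqs, motif) / motif_fraction(background_seqs, motif)


# Toy RNA sequences (hypothetical), with "CUCUCU" standing in for a CU-rich motif.
upstream_of_regulated = ["AACUCUCUGG", "CUCUCUAA", "GGGGAAAA"]
upstream_of_background = ["GGGGAAAA", "AAAAGGGG", "CUCUCUGG", "ACGUACGU", "UUUUAAAA"]

print(enrichment_ratio(upstream_of_regulated, upstream_of_background, "CUCUCU"))
```

      A ratio above 1 indicates that the motif is more frequent near the regulated exons than in the background. Significance (the q-values quoted above) requires a separate statistical test with multiple-testing correction, which this sketch does not attempt.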

      (4) The analyses in Figure 5 and its supplement strongly suggest that the splicing changes in PTBP1-depleted astrocytes are distinct from those occurring during neuronal differentiation. However, the authors should ensure that these comparisons are not confounded by transcriptome-wide differences in gene expression levels between astrocytes and developing neurons. One way to address this concern would be to compare the new PTBP1 cKO data with publicly available RNA-seq datasets of astrocytes induced to transdifferentiate into neurons using proneural transcription factors (e.g., PMID: 38956165).

      We would like to express our gratitude for the thoughtful feedback. We agree that transcriptome-wide differences in gene expression between astrocytes and developing neurons could confound the interpretation of splicing differences. To address this concern, we have incorporated publicly available RNA-seq datasets from studies in which astrocytes are reprogrammed into neurons using proneural transcription factors, Ngn2 or PmutNgn2 (PMID: 38956165).

      The results of principal component analysis (PCA) for splicing profiles revealed that the in vivo splicing profiles from this study and the in vitro splicing profiles from PMID: 38956165 are well separated on PC1 and PC2. While Ngn2/PmutNgn2-induced neurons and control astrocytes started to show distinction on PC3 (and to some degree on PC4), Ptbp1 cKO samples remained tightly grouped with control astrocytes and showed no directional shift toward the neuronal cluster (Figure 5–figure supplement 2B). These findings further support the conclusion that PTBP1 depletion in mature astrocytes does not induce a neuronal-like splicing program, even when compared against neurons derived from the astrocyte lineage (lines 306-318).

      The pairwise correlation analysis of percent spliced in (PSI) between Ptbp1 cKO astrocytes, control astrocytes, and induced neurons confirmed that Ptbp1 cKO astrocytes are highly similar to control astrocytes (ρ = 0.81) and clearly distinct from induced neurons (ρ = 0.62) (Figure 5–figure supplement 2C), reinforcing the notion that PTBP1 loss alone is insufficient to drive a neuronal-like splicing transition (lines 319-336).
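
      A pairwise correlation of this kind can be sketched as follows. The response does not say which correlation coefficient ρ denotes; the sketch assumes Spearman's rank correlation, and the function names and PSI values are hypothetical.

```python
import math


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical PSI values for the same five exons in two conditions.
psi_control = [0.10, 0.45, 0.80, 0.30, 0.95]
psi_cko = [0.12, 0.50, 0.70, 0.35, 0.90]
print(spearman(psi_control, psi_cko))
```

      Because the coefficient depends only on rank order, it is insensitive to monotonic rescaling of PSI, which is one reason a rank-based measure might be preferred when comparing datasets generated under different conditions.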

      Consistent with the analysis for splicing profiles, PCA for gene expression profiles showed that control and Ptbp1 cKO astrocytes clustered tightly together and showed no directional shift toward the neuronal cluster, while Ngn2/PmutNgn2-induced neurons and control astrocytes were distributed across a broader range (Figure 6–figure supplement 1A–B). Correlation analysis further supported this result, with a strong similarity between Ptbp1 cKO and control astrocytes (ρ = 0.97), and low similarity between Ptbp1 cKO astrocytes and induced neurons (ρ = 0.27) (Figure 6–figure supplement 1C). These findings indicate that, even with PTBP1 loss, cKO astrocytes retain a transcriptional profile very distinct from that of neurons, underscoring that Ptbp1 deficiency alone does not induce astrocyte-to-neuron reprogramming at the transcriptomic level (lines 366-373).

      Reviewer #2 (Public review):

      Summary:

      The manuscript by Zhang and colleagues describes a study that investigated whether the deletion of PTBP1 in adult astrocytes in mice led to an astrocyte-to-neuron conversion. The study revisited the hypothesis that reduced PTBP1 expression reprogrammed astrocytes to neurons. More than 10 studies have been published on this subject, with contradicting results. Half of the studies supported the hypothesis while the other half did not. The question being addressed is an important one because if the hypothesis is correct, it can lead to exciting therapeutic applications for treating neurodegenerative diseases such as Parkinson's disease.

      In this study, Zhang and colleagues conducted a conditional mouse knockout study to address the question. They used the Cre-LoxP system to specifically delete PTBP1 in adult astrocytes. Through a series of carefully controlled experiments, including cell lineage tracing, the authors found no evidence for the astrocyte-to-neuron conversion.

      The authors then carried out a key experiment that none of the previous studies on the subject did: investigating alternative splicing pattern changes in PTBP1-depleted cells using RNA-seq analysis. The idea is to compare the splicing pattern change caused by PTBP1 deletion in astrocytes to what occurs during neurodevelopment. This is an important experiment that will help illuminate whether the astrocyte-to-neuron transition occurred in the system. The result was consistent with that of the cell staining experiments: no significant transition was detected.

      These experiments demonstrate that, in this experimental setting, PTBP1 deletion in adult astrocytes did not convert the cells to neurons.

      Strengths:

      This is a well-designed, elegantly conducted, and clearly described study that addresses an important question. The conclusions provide important information to the field.

      To this reviewer, this study provided convincing and solid experimental evidence to support the authors' conclusions.

      Weaknesses:

      The Discussion in this manuscript is short and can be expanded. Can the authors speculate what led to the contradictory results in the published studies? The current study, in combination with the study published in Cell in 2021 by Wang and colleagues, suggests that the observed difference is not caused by the difference between knockdown and knockout. Is it possible that other glial cell types are responsible for the transition? If so, what cells? Oligodendrocytes?

      We are grateful for the reviewer’s careful reading and valuable suggestions. We have expanded the Discussion to include discussion of possible origins of glial cells responsible for neuronal transition. (lines 441-461)

      Reviewer #1 (Recommendations for the authors):

      (1) Throughout the text and figures, it is customary to write loxP with a capital "P".

      We have capitalized “P” in loxP throughout the text and figures.

      (2) It would be helpful to indicate the brain regions analyzed above the images in Figure 1B-C, Figure 2A-B, Figure 1 - Supplement 3, and Figure 2 - Supplement 2, as was done in Figure 1 - Supplement 1.

      The labels indicating brain regions of corresponding images have been added to the figures. 

      (3) The arrowheads in Figure 1C, Figure 2B, Figure 3, and several supplemental panels are nearly equilateral triangles, making their direction difficult to discern. Consider using a more slender or indented design (e.g., ➤).

      We have replaced triangular arrowheads with indented arrowheads in the figures. 

      (4) Lines 181-209: This section should be revised, given that the striatum is not a midbrain structure.

      We have revised this section to reflect our analysis of the striatum as a brain region of the nigrostriatal pathway rather than a midbrain structure. 

      Reviewer #2 (Recommendations for the authors):

      In Supplemental Figure 1, the two open triangles are almost indistinguishable. It would be better if the colors of these open triangles were changed so that it is easier to tell what's what. There is not enough contrast between white and yellow.

      We have changed the open triangle arrowheads to solid yellow and violet arrowheads to improve contrast between labels.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      The paper presents a model for sequence generation in the zebra finch HVC, which adheres to cellular properties measured experimentally. However, the model is fine-tuned and exhibits limited robustness to noise inherent in the inhibitory interneurons within the HVC, as well as to fluctuations in connectivity between neurons. Although the proposed microcircuits are introduced as units for sub-syllabic segments (SSS), the backbone of the network remains a feedforward chain of HVC_RA neurons, similar to previous models.

      Strengths:

      The model incorporates all three of the major types of HVC neurons. The ion channels used and their kinetics are based on experimental measurements. The connection patterns of the neurons are also constrained by the experiments.

      Weaknesses:

      The model is described as consisting of micro-circuits corresponding to SSS. This presentation gives the impression that the model's structure is distinct from previous models, which connected HVC_RA neurons in feedforward chain networks (Jin et al 2007, Li & Greenside, 2006; Long et al 2010; Egger et al 2020). However, the authors implement single HVC_RA neurons into chain networks within each micro-circuit and then connect the end of the chain to the start of the chain in the subsequent micro-circuit. Thus, the HVC_RA neuron in their model forms a single-neuron chain. This structure is essentially a simplified version of earlier models.

      In the model of the paper, the chain network drives the HVC_I and HVC_X neurons. The role of the micro-circuits is more significant in organizing the connections: specifically, from HVC_RA neurons to HVC_I neurons, and from HVC_I neurons to both HVC_X and HVC_RA neurons.

      We thank Reviewer 1 for their thoughtful comments.

      While the reviewer is correct that the propagation of sequential activity in this model is primarily carried by HVC<sub>RA</sub> neurons in a feed-forward manner, we need to emphasize that this is true only if there is no intrinsic or synaptic perturbation to the HVC network. For example, we showed in Figures 10 and 12 how altering the intrinsic properties of HVC<sub>X</sub> neurons or of interneurons disrupts sequence propagation. In other words, while HVC<sub>RA</sub> neurons are the key forces that carry the chain forward, the interplay between excitation and inhibition in our network, as well as the intrinsic parameters of all classes of HVC neurons, are equally important in carrying the chain of activity forward. Thus, the stability of activity propagation necessary for song production depends on a finely balanced network of HVC neurons, with all classes contributing to the overall dynamics. Moreover, all existing models that describe premotor sequence generation in the HVC either assume a distributed model (Elmaleh et al., 2021), in which local HVC circuitry is not sufficient to advance the sequence but rather depends upon moment-to-moment feedback through Uva (Hamaguchi et al., 2016), or assume that intrinsic connections within HVC propagate sequential activity. In the latter case, some models assume that HVC is composed of multiple discrete subnetworks that encode individual song elements (Glaze & Troyer, 2013; Long & Fee, 2008; Wang et al., 2008) but lack the local connectivity to link the subnetworks, while other models assume that HVC may have sufficient information in its intrinsic connections to form a single continuous network sequence (Long et al. 2010). The HVC model we present extends the concept of a feedforward network by incorporating additional neuronal classes that influence the propagation of activity (interneurons and HVC<sub>X</sub> neurons). We have shown that any disturbance of the intrinsic or synaptic conductances of these latter neurons will disrupt activity in the circuit even when the properties of HVC<sub>RA</sub> neurons are maintained.

      In regard to the similarities between our model and earlier models, several aspects of our model distinguish it from prior work. In short, while several models of how sequence is generated within HVC have been proposed (Cannon et al., 2015; Drew & Abbott, 2003; Egger et al., 2020; Elmaleh et al., 2021; Galvis et al., 2018; Gibb et al., 2009a, 2009b; Hamaguchi et al., 2016; Jin, 2009; Long & Fee, 2008; Markowitz et al., 2015), all of them rely on intrinsic HVC circuitry to propagate sequential activity, on extrinsic feedback to advance the sequence, or on both. These models do not capture the complex details of spike morphology, do not include the right ionic currents, do not incorporate all classes of HVC neurons, or do not generate realistic firing patterns as seen in vivo. Our model is the first biophysically realistic model that incorporates all classes of HVC neurons and their intrinsic properties. We tuned the intrinsic and synaptic properties based on the traces collected by Daou et al. (2013) and Mooney and Prather (2005), as shown in Figure 3. The three classes of model neurons incorporated into our network, as well as the synaptic currents that connect them, are based on Hodgkin-Huxley formalisms that contain ion channels and synaptic currents that have been pharmacologically identified. This is an advance over prior models that primarily focused on the role of synaptic interactions or external inputs. The model is based on a feedforward chain of microcircuits that encode the different sub-syllabic segments and that interact with each other through structured feedback inhibition, defining an ordered sequence of cell firing. Moreover, while several models highlight the critical role of inhibitory interneurons in shaping the timing and propagation of bursts of activity in HVC<sub>RA</sub> neurons, our work offers an intricate and comprehensive model that helps explain this critical role played by inhibition in shaping song dynamics and ensuring sequence propagation.

      How useful is this concept of micro-circuits? HVC neurons fire continuously even during the silent gaps. There are no SSS during these silent gaps.

      Regarding the concern about the usefulness of the 'microcircuit' concept in our study, we appreciate the comment and we are glad to clarify its relevance in our network. While we acknowledge that HVC<sub>RA</sub> neurons interconnect microcircuits, our model's dynamics are still best described within the framework of microcircuitry particularly due to the firing behavior of HVC<sub>X</sub> neurons and interneurons. Here, we are referring to microcircuits in a more functional sense, rather than rigid, isolated spatial divisions (Cannon et al. 2015), and we now make this clear on page 21. A microcircuit in our model reflects the local rules that govern the interaction between all HVC neuron classes within the broader network, and that are essential for proper activity propagation. For example, HVC<sub>INT</sub> neurons belonging to any microcircuit burst densely and at times other than the moments when the corresponding encoded SSS is being “sung”. What makes a particular interneuron belong to this microcircuit or the other is merely the fact that it cannot inhibit HVC<sub>RA</sub> neurons that are housed in the microcircuit it belongs to. In particular, if HVC<sub>INT</sub> inhibits HVC<sub>RA</sub> in the same microcircuit, some of the HVC<sub>RA</sub> bursts in the microcircuit might be silenced by the dense and strong HVC<sub>INT</sub> inhibition breaking the chain of activity again. Similarly, HVC<sub>X</sub> neurons were selected to be housed within microcircuits due to the following reason: if an HVC<sub>X</sub> neuron belonging to microcircuit i sends excitatory input to an HVC<sub>INT</sub> neuron in microcircuit j, and that interneuron happens to select an HVC<sub>RA</sub> neuron from microcircuit i, then the propagation of sequential activity will halt, and we’ll be in a scenario similar to what was described earlier for HVC<sub>INT</sub> neurons inhibiting HVC<sub>RA</sub> neurons in the same microcircuit.

      We agree that there are no sub-syllabic segments described during the silent gaps, and we thank the reviewer for pointing this out. Although silent gaps are integral to the overall process of song production, we have not elaborated on them in this model due to the lack of a clear, biophysically grounded representation for the gaps themselves at the level of HVC. Our primary focus has been on modeling the active, syllable-producing phases of the song, where the HVC network’s sequential dynamics are critical for song. However, one can think of the encoding of silent gaps via mechanisms similar to those that encode SSSs, where each gap is encoded by similar microcircuits composed of the three classes of HVC neurons (let’s call them GAP rather than SSS) that are active only during the silent gaps. In this case, the propagation of sequential activity is carried throughout the GAPs from the last SSS of the previous syllable to the first SSS of the subsequent syllable. This is now described more clearly on page 22 of the manuscript.

      A significant issue of the current model is that the HVC_RA to HVC_RA connections require fine-tuning, with the network functioning only within a narrow range of g_AMPA (Figure 2B). Similarly, the connections from HVC_I neurons to HVC_RA neurons also require fine-tuning. This sensitivity arises because the somatic properties of HVC_RA neurons are insufficient to produce the stereotypical bursts of spikes observed in recordings from singing birds, as demonstrated in previous studies (Jin et al 2007; Long et al 2010). In these previous works, to address this limitation, a dendritic spike mechanism was introduced to generate an intrinsic bursting capability, which is absent in the somatic compartment of HVC_RA neurons. This dendritic mechanism significantly enhances the robustness of the chain network, eliminating the need to fine-tune any synaptic conductances, including those from HVC_I neurons (Long et al 2010). Why is it important that the model should NOT be sensitive to the connection strengths?

      We thank the reviewer for the comment. While mathematical models of highly complex nonlinear biological processes only tangentially capture biological realism, the current network is the first sufficiently biologically realistic network model of HVC that explains sequence propagation. Although dendritic processes would make the dynamics more realistic, we do not include them in our network, for several reasons. 1) The ion channels we integrated into the somatic compartment are known pharmacologically (Daou et al. 2013), but we do not know the intrinsic properties of the dendritic compartments of HVC neurons or the cocktail of ion channels expressed there. 2) We are able to generate realistic bursting in HVC<sub>RA</sub> neurons despite using a single compartment, and the main emphasis in this network is on the interactions between excitation and inhibition, the effects of ion channels in modulating sequence propagation, and related questions. 3) The network model already incorporates thousands of ODEs that govern the dynamics of each of the HVC neurons, so we did not want to add more complexity to the network, especially since we do not know the biophysical properties of the dendritic compartments.

      Therefore, our present focus is on somatic dynamics and the interaction between HVC<sub>RA</sub> and HVC<sub>INT</sub> neurons, but we acknowledge the importance of these processes in enhancing network resiliency. Although we agree that adding dendritic processes improves robustness, we still think that somatic processes alone can offer insightful information on the sequential dynamics of the HVC network. While the network should be robust across a wide range of parameters, it is also essential that certain parameters are designed to filter out weaker signals, ensuring that only reliable, precise patterns of activity propagate. Hence, we specifically chose to make the HVC<sub>RA</sub>-to-HVC<sub>RA</sub> excitatory connections more sensitive (narrow range of values) such that only strong, precise and meaningful stimuli can propagate through the network representing the high stereotypy and precision seen in song production.

      First, the firing of HVC_I neurons is highly noisy and unreliable. HVC_I neurons fire spontaneous, random spikes under baseline conditions. During singing, their spike timing is imprecise and can vary significantly from trial to trial, with spikes appearing or disappearing across different trials. As a result, their inputs to HVC_RA neurons are inherently noisy. If the model relies on precisely tuned inputs from HVC_I neurons, the natural fluctuations in HVC_I firing would render the model non-functional. The authors should incorporate noisy HVC_I neurons into their model to evaluate whether this noise would render the model non-functional.

      We acknowledge that under baseline and singing conditions, interneurons fire in an extremely noisy and imprecise manner, although they exhibit time-locked episodes in their activity (Hahnloser et al 2002, Kozhevnikov and Fee 2007). In order to mimic the biological variability of these neurons, our model does, in fact, include a stochastic current to reflect the intrinsic noise and random variations in interneuron firing shown in vivo (and we highlight this in the Methods). To make sure the network is resilient to this randomness in interneuron firing, we introduced a stochastic input current of the form I<sub>noise</sub>(t) = σ·ξ(t), where ξ(t) is a Gaussian white noise with zero mean and unit variance, and σ is the noise amplitude. This stochastic drive was introduced to every model neuron, and it mimics the fluctuations in synaptic input arising from random presynaptic activity and background noise. For values of σ within 1-5% of the mean synaptic conductance, the stochastic current has no effect on network propagation. For larger values of σ, the desired network activity was disrupted or halted. We now discuss this on page 22 of the manuscript.
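
      As a rough illustration of the stochastic drive described above, the sketch below adds I<sub>noise</sub>(t) = σ·ξ(t) to a single leaky membrane and integrates it with the Euler-Maruyama scheme. The passive membrane, all parameter values, and the function name `simulate_noisy_membrane` are assumptions made for illustration; this is not the conductance-based network used in the manuscript.

```python
import math
import random
import statistics


def simulate_noisy_membrane(sigma, t_stop=200.0, dt=0.01, seed=0):
    """Integrate C dV/dt = -g_L (V - E_L) + sigma * xi(t) with Euler-Maruyama.

    All parameter values are hypothetical and chosen only for illustration.
    """
    rng = random.Random(seed)
    c_m, g_l, e_l = 1.0, 0.1, -65.0  # capacitance, leak conductance, leak reversal
    v = e_l
    trace = []
    for _ in range(int(round(t_stop / dt))):
        drift = -g_l * (v - e_l) / c_m
        # xi(t) has unit variance per unit time, so its integral over one
        # step of length dt has standard deviation sqrt(dt).
        noise = (sigma / c_m) * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        v += drift * dt + noise
        trace.append(v)
    return trace


quiet = simulate_noisy_membrane(sigma=0.0)
weak = simulate_noisy_membrane(sigma=0.5)
strong = simulate_noisy_membrane(sigma=1.0)
print(statistics.pstdev(weak), statistics.pstdev(strong))
```

      With σ = 0 the membrane stays at its resting value, and the spread of the voltage grows with σ. In a full network, the analogous test would be to raise σ until sequence propagation fails.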

      Second, Kosche et al. (2015) demonstrated that reducing inhibition by suppressing HVC_I neuron activity makes HVC_RA firing less sparse but does not compromise the temporal precision of the bursts. In this experiment, the local application of gabazine should have severely disrupted HVC_I activity. However, it did not affect the timing precision of HVC_RA neuron firing, emphasizing the robustness of the HVC timing circuit. This robustness is inconsistent with the predictions of the current model, which depends on finely tuned inputs and should, therefore, be vulnerable to such disruptions.

      We thank the reviewer for the comment. The differences between the Kosche et al. (2015) findings and the predictions of our model arise from differences in the aspect of HVC function we are modeling. Our model is more sensitive to inhibition, which is a designed mechanism for achieving precise song patterning. This is a modeling simplification we adopted to capture specific characteristics of HVC function. Hence, the Kosche et al. (2015) findings do not invalidate the approach of our model, but highlight that HVC likely operates with several redundant mechanisms that together ensure temporal precision.

      Third, the reliance on fine-tuning of HVC_RA connections becomes problematic if the model is scaled up to include groups of HVC_RA neurons forming a chain network, rather than the single HVC_RA neurons used in the current work. With groups of HVC_RA neurons, the summation of presynaptic inputs to each HVC_RA neuron would need to be precisely maintained for the model to function. However, experimental evidence shows that the HVC circuit remains functional despite perturbations, such as a few degrees of cooling, micro-lesions, or turnover of HVC_RA neurons. Such robustness cannot be accounted for by a model that depends on finely tuned connections, as seen in the current implementation.

      As stated previously, our model of individual HVC<sub>RA</sub> neurons is a reductive model that focuses on understanding the mechanisms that govern sequential neural activity. We agree that scaling the model to include many HVC<sub>RA</sub> neurons poses challenges, specifically concerning the summation of presynaptic inputs. However, our model can still be adapted to a larger network without requiring the level of fine-tuning currently needed. In fact, the current fine-tuning of synaptic connections in the model is a reflection of fundamental network mechanisms rather than a limitation when scaling to a larger network. Besides, one important feature of this neural network is redundancy. Even if some neurons or synaptic connections are impaired, other neurons or pathways can compensate for these changes, allowing the activity propagation to remain intact.

      The authors examined how altering the channel properties of neurons affects the activity in their model. While this approach is valid, many of the observed effects may stem from the delicate balancing required in their model for proper function. In the current model, HVC_X neurons burst as a result of rebound activity driven by the I_H current. Rebound bursts mediated by the I_H current typically require a highly hyperpolarized membrane potential. However, this mechanism would fail if the reversal potential of inhibition is higher than the required level of hyperpolarization. Furthermore, Mooney (2000) demonstrated that depolarizing the membrane potential of HVC_X neurons did not prevent bursts of these neurons during forward playback of the bird's own song, suggesting that these bursts (at least under anesthesia, which may be a different state altogether) are not necessarily caused by rebound activity. This discrepancy should be addressed or considered in the model.

      In our HVC network model, one goal with HVC<sub>X</sub> neurons is to generate bursts in their underlying neuron population. Since HVC<sub>X</sub> neurons in our model receive only inhibitory inputs from interneurons, we rely on inhibition followed by rebound bursts orchestrated by the I<sub>H</sub> and the I<sub>CaT</sub> currents to achieve this goal. The interplay between the T-type Ca<sup>++</sup> current and the H current in our model is fundamental to generating these bursts, as the two currents are sufficient for producing the desired behavior in the network. Due to this interplay, we do not need significant inhibition to generate rebound bursts, because the conductance of the T-type Ca<sup>++</sup> current can be made stronger, leading to robust rebound bursting even when the degree of inhibition is not very strong. This is now highlighted on page 42 in the revised version.

      Some figures contain direct copies of figures from published papers. It is perhaps a better practice to replace them with schematics if possible.

      We deliberately kept the results of Mooney and Prather (2005) as originally published so that they can be compared directly with our model simulations, highlighting the degree of resemblance. We believe that schematics of the Mooney and Prather (2005) results would not have the same impact, and similarly a schematic of the Hahnloser et al. (2002) results would not help much. However, if the reviewer still believes that we should do that, we are happy to do it.

      Reviewer #2 (Public review):

      Summary:

      In this paper, the authors use numerical simulations to try to understand better a major experimental discovery in songbird neuroscience from 2002 by Richard Hahnloser and collaborators. The 2002 paper found that a certain class of projection neurons in the premotor nucleus HVC of adult male zebra finch songbirds, the neurons that project to another premotor nucleus RA, fired sparsely (once per song motif) and precisely (to about 1 ms accuracy) during singing.

      The experimental discovery is important to understand since it initially suggested that the sparsely firing RA-projecting neurons acted as a simple clock that was localized to HVC and that controlled all details of the temporal hierarchy of singing: notes, syllables, gaps, and motifs. Later experiments suggested that the initial interpretation might be incomplete: that the temporal structure of adult male zebra finch songs instead emerged in a more complicated and distributed way, still not well understood, from the interaction of HVC with multiple other nuclei, including auditory and brainstem areas. So at least two major questions remain unanswered more than two decades after the 2002 experiment: What is the neurobiological mechanism that produces the sparse precise bursting: is it a local circuit in HVC or is it some combination of external input to HVC and local circuitry? And how is the sparse precise bursting in HVC related to a songbird's vocalizations? The authors only investigate part of the first question, whether the mechanism for sparse precise bursts is local to HVC. They do so indirectly, by using conductance-based Hodgkin-Huxley-like equations to simulate the spiking dynamics of a simplified network that includes three known major classes of HVC neurons and such that all neurons within a class are assumed to be identical. A strength of the calculations is that the authors include known biophysically deduced details of the different conductances of the three major classes of HVC neurons, and they take into account what is known, based on sparse paired recordings in slices, about how the three classes connect to one another. One weakness of the paper is that the authors make arbitrary and not well-motivated assumptions about the network geometry, and they do not use the flexibility of their simulations to study how their results depend on their network assumptions. 
      A second weakness is that they ignore many known experimental details such as projections into HVC from other nuclei, dendritic computations (the somas and dendrites are treated by the authors as point-like isopotential objects), the role of neuromodulators, and known heterogeneity of the interneurons. These weaknesses make it difficult for readers to know the relevance of the simulations for experiments and for advancing theoretical understanding.

      Strengths:

      The authors use conductance-based Hodgkin-Huxley-like equations to simulate spiking activity in a network of neurons intended to model more accurately songbird nucleus HVC of adult male zebra finches. Spiking models are much closer to experiments than models based on firing rates or on 2-state neurons.

      The authors include information deduced from modeling experimental current-clamp data such as the types and properties of conductances. They also take into account how neurons in one class connect to neurons in other classes via excitatory or inhibitory synapses, based on sparse paired recordings in slices by other researchers. The authors obtain some new results of modest interest such as how changes in the maximum conductances of four key channels (e.g., A-type K+ currents or Ca-dependent K+ currents) influence the structure and propagation of bursts, while simultaneously being able to mimic accurately current-clamp voltage measurements.

      Weaknesses:

      One weakness of this paper is the lack of a clearly stated, interesting, and relevant scientific question to try to answer. In the introduction, the authors do not discuss adequately which questions recent experimental and theoretical work have failed to explain adequately, concerning HVC neural dynamics and its role in producing vocalizations. The authors do not discuss adequately why they chose the approach of their paper and how their results address some of these questions.

      For example, the authors need to explain in more detail how their calculations relate to the works of Daou et al, J. Neurophys. 2013 (which already fitted spiking models to neuronal data and identified certain conductances), to Jin et al J. Comput. Neurosci. 2007 (which already discussed how to get bursts using some experimental details), and to the rather similar paper by E. Armstrong and H. Abarbanel, J. Neurophys 2016, which already postulated and studied sequences of microcircuits in HVC. This last paper is not even cited by the authors.

      We thank the reviewer for this valuable comment, and we agree that we did not clarify enough throughout the paper the utility of our model or how it advanced our understanding of the HVC dynamics and circuitry. To that end, we revised several places of the manuscript and made sure to cite and highlight the relevance and relatedness of the mentioned papers.

      In short, and as mentioned to Reviewer 1, while several models of how sequence is generated within HVC have been proposed (Cannon et al., 2015; Drew & Abbott, 2003; Egger et al., 2020; Elmaleh et al., 2021; Galvis et al., 2018; Gibb et al., 2009a, 2009b; Hamaguchi et al., 2016; Jin, 2009; Long & Fee, 2008; Markowitz et al., 2015; Jin et al., 2007), all the models proposed either rely on intrinsic HVC circuitry to propagate sequential activity, rely on extrinsic feedback to advance the sequence or rely on both. These models do not capture the complex details of spike morphology, do not include the right ionic currents, do not incorporate all classes of HVC neurons, or do not generate realistic firing patterns as seen in vivo. Our model is the first biophysically realistic model that incorporates all classes of HVC neurons and their intrinsic properties. 

      No existing hypothesis has been challenged by our model; rather, our model is a distillation of the various models that have been proposed for the HVC network. We go over this in detail in the Discussion. We believe that the network model we developed provides a step forward in describing the biophysics of HVC circuitry, and may throw new light on certain dynamics in the mammalian brain, particularly in the motor cortex and the hippocampus, where precisely timed sequential activity is crucial. We suggest that temporally precise sequential activity may be a manifestation of neural networks composed of a chain of microcircuits, each containing pools of excitatory and inhibitory neurons, with local interplay among neurons of the same microcircuit and global interplay across the various microcircuits, and with structured inhibition as well as intrinsic properties synchronizing the neuronal pools and stabilizing timing within a firing sequence.

      The authors' main achievement is to show that simulations of a certain simplified and idealized network of spiking neurons, which includes some experimental details but ignores many others, match some experimental results like current-clamp-derived voltage time series for the three classes of HVC neurons (although this was already reported in earlier work by Daou and collaborators in 2013), and simultaneously the robust propagation of bursts with properties similar to those observed in experiments. The authors also present results about how certain neuronal details and burst propagation change when certain key maximum conductances are varied. However, these are weak conclusions for two reasons. First, the authors did not do enough calculations to allow the reader to understand how many parameters were needed to obtain these fits and whether simpler circuits, say with fewer parameters and simpler network topology, could do just as well. Second, many previous researchers have demonstrated robust burst propagation in a variety of feed-forward models. So what is new and important about the authors' results compared to the previous computational papers?

      A major novelty of our work is the incorporation of experimental data into detailed network models. While earlier works have established robust burst propagation, our model uses realistic ion channel kinetics and feedback inhibition not only to reproduce experimental neural activity patterns but also to suggest prospective mechanisms for song sequence production in the most biophysical way possible. This aspect distinguishes our work from other feed-forward models. We go over this in detail in the Discussion. However, the reviewer is right regarding the details of the calculations conducted for the fits, and we will make sure to describe them in more detail in the Methods and throughout the manuscript.

      We believe that the network model we developed provides a step forward in describing the biophysics of HVC circuitry, and may throw new light on certain dynamics in the mammalian brain, particularly in the motor cortex and the hippocampus, where precisely timed sequential activity is crucial. We suggest that temporally precise sequential activity may be a manifestation of neural networks composed of a chain of microcircuits, each containing pools of excitatory and inhibitory neurons, with local interplay among neurons of the same microcircuit and global interplay across the various microcircuits, and with structured inhibition as well as intrinsic properties synchronizing the neuronal pools and stabilizing timing within a firing sequence.

      Also missing is a discussion, or at least an acknowledgment, of the fact that not all of the fine experimental details of undershoots, latencies, spike structure, spike accommodation, etc may be relevant for understanding vocalization. While it is nice to know that some models can match these experimental details and produce realistic bursts, that does not mean that all of these details are relevant for the function of producing precise vocalizations. Scientific insights in biology often require exploring which of the many observed details can be ignored and especially identifying the few that are essential for answering some questions. As one example, if HVC-X neurons are completely removed from the authors' model, does one still get robust and reasonable burst propagation of HVC-RA neurons? While part of the nucleus HVC acts as a premotor circuit that drives the nucleus RA, part of HVC is also related to learning. It is not clear that HVC-X neurons, which carry out some unknown calculation and transmit information to area X in a learning pathway, are relevant for burst production and propagation of HVCRA neurons, and so relevant for vocalization. Simulations provide a convenient and direct way to explore questions of this kind.

      One key question to answer is whether the bursting of HVC-RA projection neurons is based on a mechanism local to HVC or is some combination of external driving (say from auditory nuclei) and local circuitry. The authors do not contribute to answering this question because they ignore external driving and assume that the mechanism is some kind of intrinsic feed-forward circuit, which they put in by hand in a rather arbitrary and poorly justified way, by assuming the existence of small microcircuits consisting of a few HVC-RA, HVC-X, and HVC-I neurons that somehow correspond to "sub-syllabic segments". To my knowledge, experiments do not suggest the existence of such microcircuits nor does theory suggest the need for such microcircuits. 

      Recent results showed a tight correlation between the intrinsic properties of neurons and features of song (Daou and Margoliash 2020, Medina and Margoliash 2024), where adult birds that exhibit similar songs tend to have similar intrinsic properties. While this is relevant, we acknowledge that not all details may be necessary for every aspect of vocalization, and future models could simply concentrate on core dynamics and exclude certain features while still providing insights into the primary mechanisms.

      On the question of whether HVC<sub>X</sub> neurons are relevant for burst propagation: our model includes these neurons as part of the network for completeness, and the reviewer is correct that the propagation of sequential activity in this model is primarily carried by HVC<sub>RA</sub> neurons in a feed-forward manner, but only if there is no perturbation to the HVC network. For example, we have shown how altering the intrinsic properties of HVC<sub>X</sub> neurons or of interneurons disrupts sequence propagation. In other words, while HVC<sub>RA</sub> neurons are the key forces carrying the chain forward, the interplay between excitation and inhibition in our network as well as the intrinsic parameters of all classes of HVC neurons are equally important in carrying the chain of activity forward. Thus, the stability of activity propagation necessary for song production depends on a finely balanced network of HVC neurons, with all classes contributing to the overall dynamics.

      We agree with the reviewer, however, that a potential drawback of our model is that its sole focus is on local excitatory connectivity within the HVC (Kornfeld et al., 2017; Long et al., 2010), while HVC neurons receive afferent excitatory connections (Akutagawa & Konishi, 2010; Nottebohm et al., 1982) that play significant roles in their local dynamics. For example, the excitatory inputs that HVC neurons receive from Uvaeformis may be crucial in initiating (Andalman et al., 2011; Danish et al., 2017; Galvis et al., 2018) or sustaining (Hamaguchi et al., 2016) the sequential activity. While we acknowledge this limitation, our main contribution in this work is the biophysical insight into how the patterning activity in HVC is largely shaped by the intrinsic properties of the individual neurons as well as the synaptic properties, where excitation and inhibition play a major role in enabling neurons to generate their characteristic bursts during singing. This holds irrespective of whether an external drive is injected onto the microcircuits or not. We elaborated on this further in the Discussion of the revised version.

      Another weakness of this paper is an unsatisfactory discussion of how the model was obtained, validated, and simulated. The authors should state as clearly as possible, in one location such as an appendix, what is the total number of independent parameters for the entire network and how parameter values were deduced from data or assigned by hand. With enough parameters and variables, many details can be fit arbitrarily accurately so researchers have to be careful to avoid overfitting. If parameter values were obtained by fitting to data, the authors should state clearly what the fitting algorithm was (some iterative nonlinear method, whose results can depend on the initial choice of parameters), what the error function used for fitting (sum of least squares?) was, and what data were used for the fitting.

      The authors should also state clearly the dynamical state of the network, the vector of quantities that evolve over time. (What is the dimension of that vector, which is also the number of ordinary differential equations that have to be integrated?) The authors do not mention what initial state was used to start the numerical integrations, whether transient dynamics were observed and what were their properties, or how the results depended on the choice of the initial state. The authors do not discuss how they determined that their model was programmed correctly (it is difficult to avoid typing errors when writing several pages or more of a code in any language) or how they determined the accuracy of the numerical integration method beyond fitting to experimental data, say by varying the time step size over some range or by comparing two different integration algorithms.

      We thank the reviewer again. The fitting process in our model occurred only at the first stage, where the synaptic parameters were fit to the Mooney and Prather results as well as the Kosche et al. results. No data were shared with us; we examined the figures in those papers, checked the amplitudes of the elicited currents, the magnitudes of DC-evoked excitations, and so on, and replicated them in our model. While this is suboptimal, we preferred to start this way rather than take equations for synaptic currents from the literature on other types of neurons (not HVC neurons, and not even from songbirds) and integrate them into our network model. The number of ODEs that govern the dynamics of every model neuron is listed on page 10 of the manuscript as well as in the Appendix. Moreover, we highlighted the details of this fitting process in the revised version.
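
      The reviewer's suggestion of checking numerical accuracy by varying the time step can be sketched on a toy problem. The example below integrates a passive membrane decay with forward Euler at several step sizes and compares each result with the exact solution. The test equation, parameter values, and function names are assumptions for illustration and do not describe the integrator or equations used in the manuscript.

```python
import math


def euler_decay(dt, t_stop=10.0, tau=10.0, e_l=-65.0, v0=-55.0):
    """Forward-Euler solution of dV/dt = -(V - E_L) / tau at time t_stop."""
    v = v0
    for _ in range(int(round(t_stop / dt))):
        v += dt * (-(v - e_l) / tau)
    return v


def exact_decay(t_stop=10.0, tau=10.0, e_l=-65.0, v0=-55.0):
    """Closed-form solution of the same equation, used as the reference."""
    return e_l + (v0 - e_l) * math.exp(-t_stop / tau)


# Halve the step size twice and record the absolute error each time.
errors = {dt: abs(euler_decay(dt) - exact_decay()) for dt in (0.1, 0.05, 0.025)}
for dt, err in errors.items():
    print(f"dt = {dt:<6} error = {err:.6f}")
```

      For a first-order method such as forward Euler, the error should roughly halve each time the step is halved. A network model has no closed-form solution, so the same check would instead compare runs at successive step sizes, or two different integration schemes, against each other.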

      Also disappointing is that the authors do not make any predictions to test, except rather weak ones such as that varying a maximum conductance sufficiently (which might be possible by using dynamic clamps) might cause burst propagation to stop or change its properties. Based on their results, the authors do not make suggestions for further experiments or calculations, but they should.

      We agree that making experimentally testable predictions is crucial for the advancement of the model. Our predictions include testing whether eradication of a class of neurons, such as HVC<sub>X</sub> neurons, disrupts activity propagation, which can be done through targeted neuron elimination. It can also be done by preventing rebound bursting in HVC<sub>X</sub> neurons through pharmacological block of the I<sub>H</sub> channels. Other predictions involve downregulating certain ion channels (pharmacologically, through channel blockers) and testing which current is fundamental for song production (there are plenty of tests based on our results, involving for example the SK current, the T-type Ca<sup>2+</sup> current, and the A-type K<sup>+</sup> current). We incorporated these into the Discussion of the revised manuscript to better demonstrate the model's applicability and to guide future research directions.

      Main issues:

      (1) Parameters are overly fine-tuned and often do not match known biology to generate chains. This fine-tuning does not reveal fundamental insights.

      (1a) Specific conductances (e.g. AMPA) are finely tweaked to generate bursts, in part due to a lack of a dendritic mechanism for burst generation. A dendritic mechanism likely reflects the true biology of HVC neurons.

      We acknowledge that the model does not include active dendritic processes and we do not regard this as a limitation. In fact, our present approach, although simplified, is intended to focus on somatic mechanisms to identify minimal conditions required for stable sequential propagation. We know HVC<sub>RA</sub> neurons possess thin, spiny dendrites which can contribute to burst initiation and shaping. Future models that include such nonlinear dendritic mechanisms would likely reduce the need for fine tuning of specific conductances at the soma and consequently better match the known biology of HVC<sub>RA</sub> neurons. 

      In text: “While our simplified, somatically driven architecture enables better exploration of mechanisms for sequence propagation, future extensions of the model will incorporate dendritic compartments to more accurately reflect the intrinsic bursting mechanisms observed in HVC<sub>RA</sub> neurons.”

      (1b) In this paper, microcircuits are simulated and then concatenated to make the HVC chain, resulting in no representations during silent gaps. This is out of touch with the known HVC function. There is neither anatomical nor functional evidence for microcircuits of the kind discussed in this paper or in the earlier and rather similar paper by Eve Armstrong and Henry Abarbanel (J. Neurophys. 2016). One can write a large number of papers in which one makes arbitrary unconstrained guesses of network structure in HVC and, unless they reveal some novel principle or surprising detail, they are all going to be weak.

      Although the model is composed of sequentially activated microcircuits, the gaps between each microcircuit’s output do not represent complete silence in the network. During these periods, other neurons such as those in other microcircuits may still exhibit bursting activity. Thus, what may appear as a 'silent gap' from the perspective of a given output microcircuit is, in fact, part of the ongoing background dynamics of the larger HVC neuron network. We fully acknowledge the reviewer's point that there is no direct anatomical or physiological evidence supporting the presence of microcircuits with this structure in HVC. Our intention was not to propose the existence of such a physical model but to use it as a computational simplification to make precise sequential bursting activity feasible given the biologically realistic neuronal dynamics used. Hence, our use of 'microcircuits' refers to a modeling construct rather than a structural hypothesis. Even if the network topology is hypothetical, we still believe that the temporal structuring suggested allows us to generate specific predictions for future work about burst timing and neuronal connections.

      (1c) HVC interneuron discharge in the authors' model is overly precise; the authors should address the observation that these neurons can exhibit noisy discharge. Real HVC interneurons are noisy. This issue is critical: All reviewers strongly recommend that the authors should, at the minimum in a revision, focus on incorporating HVC-I noise in their model.

      We agree that capturing the variability in interneuron bursting is critical for biological realism. In our model, HVC interneurons receive stochastic background current that introduces variability in their firing patterns as observed in vivo. This variability is seen in our simulations and produces more biologically realistic dynamics while maintaining sequence propagation. We clarify this implementation in the Methods section. 

      (1d) Address the finding that Kosche et al show that even with reduced inhibition, HVCra neuronal timing is preserved; it is the burst pattern that is affected.

      The differences between the Kosche et al. (2015) findings and the predictions of our model arise from differences in the aspect of HVC function we are modeling. Our model is more sensitive to inhibition, which is a designed mechanism for achieving precise song patterning. This is a modeling simplification we adopted to capture specific characteristics of HVC function. 

      We acknowledged this point in the discussion: “While findings of Kosche et al. (2015) emphasize the robustness of the HVC timing circuit to inhibition, our model is more sensitive to inhibition, highlighting that HVC likely operates with several, redundant mechanisms that overall ensure temporal precision.”

      (1e) The real HVC is robust to microlesions, cooling, and HVCra neuron turnover. The model in this paper relies on precise HVCra connectivity and is not robust.

      Although our model is grounded in the biologically observed behavior of HVC neurons in vivo, we don’t claim that it fully captures the resilience seen in the HVC network. Instead, we see this as a simplified framework that helps us explore the basic principles of sequential activity. In the future, adding features like recurrent excitation, synaptic plasticity, or homeostatic mechanisms could make the model more robust.

      (1f) There is unclear motivation for Ih-driven HVCx bursting, given past findings from the Mooney group.

      Daou et al (2013) noticed that the sag observed in HVC<sub>X</sub> and HVC<sub>INT</sub> neurons in response to hyperpolarizing current pulses (Dutar et al. 1998; Kubota and Saito 1991; Kubota and Taniguchi 1998) was completely abolished after the application of the drug ZD 7288 in all of the neurons tested, indicating that the sag in these HVC neurons is due to the hyperpolarization-activated inward current (I<sub>h</sub>). In addition, the sag and the rebound seen in these two neuron groups were larger for larger hyperpolarizing current pulses.
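
      To illustrate how a hyperpolarization-activated current can produce both a sag and a rebound, the sketch below simulates a passive membrane with a generic I<sub>h</sub> term during a hyperpolarizing current step. This is a minimal toy model: the kinetics, every parameter value, and the names `r_inf` and `simulate_sag` are assumptions, and the model omits the spiking and T-type Ca<sup>++</sup> currents, so it shows a rebound depolarization rather than a rebound burst.

```python
import math


def r_inf(v, v_half=-75.0, k=6.0):
    """Steady-state I_h activation; it increases as V becomes more negative."""
    return 1.0 / (1.0 + math.exp((v - v_half) / k))


def simulate_sag(i_step=-1.0, dt=0.05, t_phase=500.0):
    """Rest, hyperpolarizing step, then release, each lasting t_phase ms.

    Integrates C dV/dt = -g_L (V - E_L) - g_h r (V - E_h) + I_app and
    dr/dt = (r_inf(V) - r) / tau_r with forward Euler. Values are hypothetical.
    """
    c_m, g_l, e_l = 1.0, 0.1, -65.0
    g_h, e_h, tau_r = 0.05, -30.0, 100.0
    n = int(round(t_phase / dt))
    v, r = e_l, r_inf(e_l)
    volts = []
    for step in range(3 * n):
        i_app = i_step if n <= step < 2 * n else 0.0
        dv = (-g_l * (v - e_l) - g_h * r * (v - e_h) + i_app) / c_m
        dr = (r_inf(v) - r) / tau_r
        v += dt * dv
        r += dt * dr
        volts.append(v)
    return {
        "rest": volts[n - 1],               # settled voltage before the step
        "step_min": min(volts[n:2 * n]),    # most negative point during the step
        "step_end": volts[2 * n - 1],       # voltage just before release
        "rebound_peak": max(volts[2 * n:])  # overshoot after release
    }


result = simulate_sag()
print(result)
```

      The voltage first drops, then relaxes upward as I<sub>h</sub> slowly activates (the sag), and after release it overshoots the resting level while I<sub>h</sub> deactivates (the rebound). Setting `g_h` to zero in this sketch removes both features, loosely mirroring the effect of a blocker.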

      (1g) The initial conditions of the network and its activity under those conditions, as well as the possible reliance on external inputs, are not defined.

      In our model, network activity is initiated through a brief, stochastic excitatory input to a small HVC<sub>RA</sub> neuron of one microcircuit. This drive represents a simplified version of external input from upstream brain regions known to project to HVC, such as Nif and Uva, nuclei in the auditory pathways of HVC. Modeling the activity of these upstream regions and their influence on HVC dynamics is ongoing work to be published in the future.

      (1h) It has been known from the time of Hodgkin and Huxley how to include temperature dependences for neuronal dynamics so another suggestion is for the authors to add such dependences for the three classes of neurons and see if their simulation causes burst frequencies to speed up or slow down as T is varied.

      We added this as a limitation to the discussion section: “Our model was run at a fixed physiological temperature, but it's well known going all the way back to Hodgkin and Huxley that both ion channel activity and synaptic dynamics can change with temperature. In future work, adding temperature scaling (like Q10 factors) could help us explore how burst timing and sequence speed change with temperature changes, and how neural activity in HVC would/would not preserve its precision under different physiological conditions.”
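
      The Q10 scaling mentioned above can be sketched as follows. This is a minimal illustration, not code from the model: the function names, the reference temperature of 40 °C, and the Q10 value of 3 are all placeholder assumptions, and each channel in a real model would need its own measured Q10.

```python
def q10_factor(temp_c, temp_ref_c=40.0, q10=3.0):
    """Multiplicative speed-up of a rate at temp_c relative to temp_ref_c.

    temp_ref_c and q10 are illustrative defaults (assumptions), not values
    taken from the model discussed in the response.
    """
    return q10 ** ((temp_c - temp_ref_c) / 10.0)


def scale_rate(rate_ref, temp_c, temp_ref_c=40.0, q10=3.0):
    """Scale a gating rate constant (e.g. an alpha or beta) with temperature."""
    return rate_ref * q10_factor(temp_c, temp_ref_c, q10)


def scale_time_constant(tau_ref, temp_c, temp_ref_c=40.0, q10=3.0):
    """Time constants scale inversely to rates: cooling lengthens them."""
    return tau_ref / q10_factor(temp_c, temp_ref_c, q10)
```

      Under these assumptions, cooling by 10 °C with a Q10 of 3 slows every scaled rate to one third of its reference value and triples the corresponding time constant, which is the kind of change one would then propagate to burst timing and sequence speed.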

      (2) The scope of the paper and its objectives must be clearly defined. Defining the scope and providing caveats for what is not considered will help the reader contextualize this study with other work.

      (2a) The paper does not consider the role of external inputs to HVC, which are very likely important for the capacity of the HVC chain to tile the entire song, including silent gaps.

      The role of afferent input to HVC, particularly from nuclei such as Uva and Nif, is critical in shaping the timing and initiation of HVC sequences throughout the song, including silent intervals. In fact, external inputs are likely involved in more than just triggering sequences; they may also influence the continuity of activity across motifs. However, in this study we chose to focus on the intrinsic dynamics of HVC as a step toward understanding the internal mechanisms required for generating temporally precise sequences. For this reason, we used a simplified external input only to initiate activity in the chain.

      (2b) The paper does not consider important dendritic mechanisms that almost certainly facilitate the all-or-none bursting behavior of HVC projection neurons. The authors need to mention and discuss that current-clamped neuronal response - in which an electrode is inserted into the soma and then a constant current-step is applied - bypasses dendritic structure and dendritic processing and so is an incomplete way to characterize a neuron's properties. In particular, claiming to fit current-clamp data accurately and then claiming that one now has a biophysically accurate network model, as the authors do, is greatly misleading.

      While we addressed this in 1a, we do not suggest that our model is a fully accurate biophysical representation of the HVC network. Instead, we see it as a simplified framework that helps reveal how much of HVC’s sequential activity can be explained by somatic properties and synaptic interactions alone. However, additional biological mechanisms, like dendritic processing, are likely to play an important role and should be explored in future work.

      (2c) The introduction does not provide a clear motivation for the paper - what hypotheses are being tested? What is at stake in the model outcomes? It is not inherently informative to take a known biological representation and fine-tune a limited model to replicate that representation.

      We explicitly added the hypotheses to the revised introduction.

      (2d) There have been several published modeling efforts applied to the HVC chain (Seung, Fee, Long, Greenside, Jin, Margoliash, Abarbanel). These and others need to be introduced adequately, and it needs to be crystal clear what, if anything, the present study is adding to the canon.

      While several influential models have explored how HVC might generate sequences, ranging from synfire chains to recurrent dynamics or externally driven sequences (e.g., Seung, Fee, Long, Greenside, Jin, Abarbanel, and others), these models could not capture the detailed dynamics observed in vivo. Our aim was to bridge a gap in the modeling literature by exploring how far biophysically grounded intrinsic properties and experimentally supported synaptic connections that are local to HVC can alone produce temporally precise sequences. Our simulations show that these mechanisms are sufficient to generate these sequences, although some missing components (such as dendritic mechanisms or external inputs) might be needed to fully capture the complexity and robustness of HVC function.

      (2e) The authors mention learning prominently in the abstract, summary, and introduction but this paper has nothing to do with learning. Most or all mentions of learning should be deleted since they are misleading.

      We appreciate the reviewer’s observation. Our intent in referencing learning was not to suggest that our model directly simulates learning processes, but rather to place HVC function within the broader context of song learning and production, where temporal sequencing plays a fundamental role. We agree, however, that repeated references to learning may be misleading given that our current model does not incorporate plasticity, synaptic modification, or developmental changes. We have therefore carefully revised the manuscript to rephrase mentions of learning unless they are directly relevant to context.

      (3) Using the model for hypothesis generation and prediction of experimental results.

      (3a) The utility of a model is to provide conceptual insight into how or why the real HVC functions as it does, or to predict outcomes in yet-to-be conducted experiments to help motivate future studies. This paper does not adequately achieve these goals.

      We revised the Discussion of the manuscript to better emphasize potential contributions and point out many experiments that could validate or challenge the model’s predictions. These include dynamic clamp or ion channel blockers targeting A-type K<sup>+</sup> in HVC<sub>RA</sub> neurons to assess their impact on burst precision, optogenetic disruption of inhibitory interneurons to observe changes in burst timing and sequence propagation, pharmacological modulation of I<sub>h</sub> or I<sub>CaT</sub> in HVC<sub>X</sub> and interneurons etc. 

      (3b) Additionally, it can be interesting to conduct an experiment on an existing model; for example, what happens to the HVCra chain in your model if you delete the HVCx neurons? What happens if you block NMDA receptors? Such an approach in a modeling paper can help motivate hypotheses and endow the paper with a sense of purpose.

      We agree that running targeted experiments to test our computational model such as removing an HVC neuron population or blocking a synaptic receptor can be a powerful way to generate new ideas and guide future experiments. While we didn’t include these specific tests in the current study, the model is well suited for this kind of exploration. For instance, removing interneurons could help us better understand their role in shaping the timing of HVC<sub>RA</sub> bursts. These are great directions for future experiments, and we now highlight this in the discussion as a way the model could be used to guide experiments.

      (4) Changes to the paper's organization may improve clarity.

      (4a) Nearly all equations should be moved to an Appendix so that the main part of the paper can focus on the science: assumptions made, details of simulations, conclusions obtained, and their significance. The authors present many equations without discussion which weakens the paper.

      Equations moved to appendix.

      (4b) There are many grammatical errors, e.g., verbs do not match the subject in terms of being single or plural. The authors need to run their manuscript through a grammar checker.

      Done.

      (4c) Many of the figures are poorly designed and should be substantially modified. E.g. in Figure 1B, too many colors are used, making it hard to grasp what is being plotted and the colors are not needed. Figures 1C and 1D are entire figures taken from other papers, and there is no way a reader will be able to see or appreciate all the details when this figure is published on a single page. Figure 2 uses colors for dots that are almost identical, and the colors could be avoided by using different symbols. Figure 5 fills an entire page but most of the figure conveys no information, there is no need to show the same details for all 120 neurons, just show the top 1/3 of this figure; the same for Figure 7, a lot of unnecessary information is being included. Figure 10, the bottom time series of spikes should be replaced with a time series of rates, cannot extract useful information.

      Adjusted as requested. 

      (4d) Table 1 is long and largely uninteresting, and should be moved to an appendix.

      Table 1 moved to appendix.

      (4e) Many sentences are not carefully written, which greatly weakens the paper. As one typical example, the first sentence in the Discussion section "In this study, we have designed a neural network model that describes [sic] zebra finch song production in the HVC." This is inaccurate, the model does not describe song production, it just explores some properties of one nucleus involved with song production. Just one or few sentences like this is ok but there are so many sentences of this kind that the reader loses faith in the authors.

      Thank you for raising this point. We revised the manuscript to improve the precision of the writing. We replaced the first sentence of the discussion with this: "In this study, we developed a biophysically realistic neural network model to explore how intrinsic neuronal properties and local connectivity within the songbird nucleus HVC may support the generation of temporally precise activity sequences associated with zebra finch song."

    1. Author response:

      The following is the authors’ response to the previous reviews

      Reviewer #1 (Public review):

      Summary

      The authors previously published a study of RGC boutons in the dLGN in developing wild-type mice and developing mutant mice with disrupted spontaneous activity. In the current manuscript, they have broken down their analysis of RGC boutons according to the number of Homer/Bassoon puncta associated with each vGluT2 cluster.

      The authors find that, in the first post-natal week, RGC boutons with multiple active zones (mAZs) are about a third as common as boutons with a single active zone (sAZ). The size of the vGluT2 cluster associated with each bouton was proportional to the number of active zones present in each bouton. Within the authors' ability to estimate these values (n=3 per group, 95% of results expected to be within ~2.5 standard deviations), these results are consistent across groups: 1) dominant eye vs. nondominant eye, 2) wild-type mice vs. mice with activity blocked, and 3) at ages P2, P4, and P8. The authors also found that mAZs and sAZs have roughly the same number (about 1.5) of sAZs clustered around them (within 1.5 um).

      However, the authors do not interpret this consistency between groups as evidence that active zone clustering is not a specific marker or driver of activity dependent synaptic segregation. Rather, the authors perform a large number of tests for statistical significance and cite the presence or absence of statistical significance as evidence that "Eye-specific active zone clustering underlies synaptic competition in the developing visual system (title)". I don't believe this conclusion is supported by the evidence.

      We have revised the title to be descriptive: "Eye-specific differences in active zone addition during synaptic competition in the developing visual system." While our correlative approach does not establish direct causality, our findings provide important structural evidence that complements existing functional studies of activity-dependent synaptic refinement. We have carefully revised the text throughout to avoid causal language, focusing instead on the developmental patterns we observe.

      Strengths

      The source dataset is high resolution data showing the colocalization of multiple synaptic proteins across development. Added to this data is labeling that distinguishes axons from the right eye from axons from the left eye. The first order analysis of this data showing changes in synapse density and in the occurrence of multi-active zone synapses is useful information about the development of an important model for activity dependent synaptic remodeling.

      Weaknesses

      In my previous review I argued that it was not possible to determine, from their analysis, whether the differences they were reporting between groups were important to the biology of the system. The authors have made some changes to their statistics (paired t-tests) and use some less derived measures of clustering. However, they still fail to present a meaningfully quantitative argument that the observed group differences are important. The authors base most of their claims on small differences between groups. There are two big problems with this practice. First, the differences between groups appear too small to be biologically important. Second, the differences between groups that are used as evidence for how the biology works are generally smaller than the precision of the authors' sampling. That is, the differences are as likely to be false positives as true positives.

      (1) Effect size. The title claims: "Eye-specific active zone clustering underlies synaptic competition in the developing visual system". Such a claim might be supported if the authors found that mAZs are only found in dominant-eye RGCs and that eye-specific segregation doesn't begin until some threshold of mAZ frequency is reached. Instead, the behavior of mAZs is roughly the same across all conditions. For example, the clear trend in Figure 4C and D is that measures of clustering between mAZ and sAZ are as similar as could reasonably be expected by the experimental design. However, some of the comparisons of very similar values produced p-values < 0.05. The authors use this fact to argue that the negligible differences between mAZ and sAZs explain the development of the dramatic differences in the distribution of ipsilateral and contralateral RGCs.

      We have changed the title to avoid implying a causal relationship between clustering and eye-specific segregation. Our key findings in Figures 4C and 4D demonstrate effect sizes >2.0 with high statistical power (Supplemental Table S2). While the absolute magnitude of differences is modest (5-7%), these high effect sizes combined with low inter-animal variability demonstrate consistent, reproducible biological phenomena. During development, small differences during critical periods can have profound downstream consequences for synaptic refinement outcomes.

      We acknowledge that significance in Figure 4 arises due to low variance between biological replicates rather than large mean differences. We have revised the text to describe these as "slight" differences and that "WT mice show a tendency toward forming more synapses near mAZ inputs," reflecting appropriate caution in our interpretation while maintaining the statistical robustness of our findings.
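
      The point that a modest mean difference can still yield a large standardized effect size when replicates agree closely can be illustrated with a short sketch. The response does not state which effect-size definition was used, so the paired standardized mean difference below (mean of the paired differences divided by their standard deviation) is an assumption, and all numbers are invented for illustration rather than taken from the study.

```python
import statistics


def paired_effect_size(group_a, group_b):
    """Standardized mean of paired differences (assumed definition)."""
    diffs = [a - b for a, b in zip(group_a, group_b)]
    return statistics.mean(diffs) / statistics.stdev(diffs)


# Hypothetical n=3 paired measurements, same mean difference (0.06) in both cases.
baseline = [1.00, 1.00, 1.00]
low_variance = [1.06, 1.05, 1.07]   # replicates agree closely
high_variance = [1.16, 0.96, 1.06]  # replicates disagree

d_low = paired_effect_size(low_variance, baseline)    # about 6
d_high = paired_effect_size(high_variance, baseline)  # about 0.6
```

      Both hypothetical datasets differ from baseline by the same 6% on average, yet the standardized effect size differs tenfold, because it is driven by the spread of the paired differences rather than by their mean.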

      (2) Sample size. Performing a large number of significance tests and comparing p-values is not hypothesis testing and is not descriptive science. At best, with large sample sizes and controls for multiple tests, this approach could be considered exploratory. With n=3 for each group, many comparisons of many derived measures, among many groups, and no control for multiple testing, this approach constitutes a random result generator.

      The authors argue that n=3 is a large sample size for the type of high resolution / large volume data being used. It is true that many electron microscopy studies with n=1 are used to reveal the patterns of organization that are possible within an individual. However, such studies cannot control individual variation and are, therefore, not appropriate for identifying subtle differences between groups.

      In response to previous critiques along these lines, the authors argue they have dealt with this issue by limiting their analysis to within-individual paired comparisons. There are several problems with their thinking in this approach. The main problem is that they did not change the logic of their arguments, only which direction they pointed the t-tests. Instead of claiming that two groups are different because p < 0.05, they say that two groups are different because one produced p < 0.05 and the other produced p > 0.05. These arguments are not statistically valid or biologically meaningful.

      We have implemented rigorous statistical controls, applying false discovery rate (FDR) correction using the Benjamini-Hochberg method (α = 0.05) within each experimental condition (age × genotype combination). This correction strategy treats each condition as addressing a distinct experimental question: “What synaptic properties differ between left eye and right eye inputs in this specific developmental stage and genotype?” The approach appropriately controls for multiple testing while preserving power to detect biologically meaningful differences. We applied FDR correction separately to the ~20-34 measurements (varying by age and genotype) within each of the six experimental conditions, resulting in condition-specific adjusted p-values reported in updated Supplemental Table S2. This correction confirmed the robustness of our key findings. We do not base conclusions solely on comparing p-values across conditions. Our interpretations focus on effect sizes, confidence intervals, and consistent patterns within each condition, with statistical significance providing supporting evidence rather than the primary basis for biological conclusions.
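
      The correction strategy described above, Benjamini-Hochberg adjustment applied separately within each condition, can be sketched as follows. This is an illustrative implementation of the standard step-up procedure, not the authors' analysis code; the function names, condition labels, and p-values are hypothetical.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return (adjusted p-values, rejection flags) in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone adjusted values.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    rejected = [q <= alpha for q in adjusted]
    return adjusted, rejected


def correct_per_condition(p_by_condition, alpha=0.05):
    """Apply the correction independently within each condition."""
    return {name: benjamini_hochberg(ps, alpha)
            for name, ps in p_by_condition.items()}


# Hypothetical p-values for two age x genotype conditions.
example = {"P4_WT": [0.01, 0.04, 0.03, 0.20], "P4_mutant": [0.50, 0.60]}
results = correct_per_condition(example)
```

      Because each condition is corrected on its own, the number of tests m is the count of measurements within that condition, not the total across all conditions; correcting across the pooled set would be more conservative.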

      To the best of my understanding, the results are consistent with the following model:

      RGCs form mAZs at large boutons (known)

      About a quarter of week-one RGC boutons are mAZs (new observation)

      Vesicle clustering is proportional to active zone number (~new observation)

      RGC synapse density increases during the first postnatal week (known)

      Blocking activity reduces synapse density (known)

      Contralateral eye RGCs form more and larger synapses in the lateral dLGN (known)

      While mAZ formation is known in adult and juvenile dLGN, the formation of mAZ boutons during eye-specific competition represents new information with important functional implications. Synapses with multiple release sites should be stronger than single-active-zone synapses, suggesting a structural correlate for competitive advantage during refinement.

      We demonstrate distinct developmental patterns for sAZ versus mAZ contacts during the first postnatal week. Multi-active zone density favors the dominant eye, while single active-zone synapse density from the competing eye increases from P2-P4 to match dominant-eye levels. This reveals that newly formed synapses from the competing eye predominantly contain single release sites, marking P4-P8 as a critical window for understanding molecular mechanisms driving synaptic elimination.

      Our results show that altered retinal activity patterns (β2KO mice) reduce synapse density during eye-specific competition. We relied on β2 knockout mice, which retain retinal waves and spontaneous spike activity but with disrupted patterns and output levels compared to controls. We make no claims about complete activity blockade. Previous studies using different activity manipulations (epibatidine, TTX) have examined terminal morphology, but effects on synapse density during competition remain largely unknown. Achieving complete retinal activity blockade is technically challenging, making it of interest to revisit the role of activity using more precise manipulations to control spike output and relative timing.

      With n=3 and effect sizes smaller than 1 standard deviation, a statistically significant result is about as likely to be a false positive as a true positive.

      A true-positive statistically significant result is not evidence of a meaningful deviation from a biological model.

      Our conclusions are based on results with effect sizes substantially larger than 1. Key findings demonstrate effect sizes exceeding 2.0. These large effect sizes, combined with rigorous FDR correction and low inter-animal variability, provide evidence against false positive results. During critical developmental periods, consistent structural differences, even those modest in absolute magnitude, can reflect important regulatory mechanisms that influence refinement outcomes. All statistical results, effect sizes, and power analyses are reported in Supplemental Table S2, with confidence intervals in Supplemental Table S3. We have revised the text in several places where small differences are presented to reflect appropriate caution in our interpretation.

      Providing plots that show the number of active zones present in boutons across these various conditions is useful. However, I could find no compelling deviation from the above default predictions that would influence how I see the role of mAZs in activity dependent eye-specific segregation.

      Below are critiques of most of the claims of the manuscript.

      Claim (abstract): individual retinogeniculate boutons begin forming multiple nearby presynaptic active zones during the first postnatal week.

      Confirmed by data.

      Claim (abstract): the dominant-eye forms more numerous mAZ contacts,

      Misleading: The dominant-eye (by definition) forms more contacts than the nondominant eye. That includes mAZ.

      While the dominant eye forms more total contacts, the pattern depends critically on contact type and developmental stage. The dominant eye forms more mAZ contacts across all ages (Figures 2 and S1). However, for sAZ contacts, the two eyes form similar numbers at P4, with the non-dominant eye showing increased sAZ formation during this critical period. This differential pattern by synapse type represents an important aspect of how synaptic competition unfolds structurally.

      Claim (abstract): At the height of competition, the non-dominant-eye projection adds many single active zone (sAZ) synapses

      Weak: While the individual observation is strong, it is a surprising deviation based on a single n=3 experiment in a study that performed twelve such experiments (three ages, mutant/wildtype, sAZ/mAZ)

      The difference in eye-specific sAZ formation at P2 and P8 had effect sizes of ~5.3 and ~2.7, respectively (after FDR correction the difference was still significant at P2 and trending at P8). At P4, no effect was observed by paired t-test and the 5/95% confidence intervals ranged from -0.021 to 0.008 synapses/μm<sup>3</sup>. The consistency of this pattern across P2 and P8, combined with the large effect sizes, supports the reliability of this developmental finding. We report all effect sizes and power test analyses in Supplemental Table S2, and confidence intervals in Supplemental Table S3.

      Claim (abstract): Together, these findings reveal eye-specific differences in release site addition during synaptic competition in circuits essential for visual perception and behavior.

      False: This claim is unambiguously false. The above findings, even if true, do not argue for any functional significance to active zone clustering.

      Our phrasing “circuits essential for visual perception and behavior” referred to the general importance of binocular organization in the retinogeniculate system for visual processing and we did not intend to claim direct functional significance of our structural data. For clarity we have deleted the latter part of this sentence. In lines 35-37, the abstract now reads “Together, these findings reveal eye-specific differences in release site addition that correlate with axonal refinement outcomes during retinogeniculate refinement.”

      Claim (line 84): "At the peak of synaptic competition midway through the first postnatal week, the non-dominant-eye formed numerous sAZ inputs, equalizing the global synapse density between the two eyes"

      Weak: At one of twelve measures (age, bouton type, genotype) performed with 3 mice each, one density measure was about twice as high as expected.

      The difference in eye-specific sAZ formation at P2 and P8 had effect sizes of ~5.3 and ~2.7, respectively (after FDR correction the difference was still significant at P2 and trending at P8). At P4, no effect was observed by paired t-test and the 5/95% confidence intervals ranged from -0.021 to 0.008 synapses/μm<sup>3</sup>. The consistency of this pattern across P2 and P8, combined with the large effect sizes, supports the reliability of this developmental finding. We report all effect sizes and power test analyses in Supplemental Table S2, and confidence intervals in Supplemental Table S3.

      Claim (line 172): "In WT mice, both mAZ (Fig. 3A, left) and sAZ (Fig. 3B, left) inputs showed significant eye-specific volume differences at each age."

      Questionable: There appears to be a trend, but the size and consistency is unclear.

      Claim (line 175): "the median VGluT2 cluster volume in dominant-eye mAZ inputs was 3.72 fold larger than that of non-dominant-eye inputs (Fig. 3A, left)."

      Cherry picking. Twelve differences were measured with an n of 3, 3 each time. The biggest difference of the group was cited. No analysis is provided for the range of uncertainty about this measure (2.5 standard deviations) as an individual sample or as one of twelve comparisons.

      Claim (line 174): "In the middle of eye-specific competition at P4 in WT mice, the median VGluT2 cluster volume in dominant-eye mAZ inputs was 3.72 fold larger than that of non-dominant-eye inputs (Fig. 3A, left). In contrast, β2KO mice showed a smaller 1.1 fold difference at the same age (Fig. 3A, right panel). For sAZ synapses at P4, the magnitudes of eye-specific differences in VGluT2 volume were smaller: 1.35-fold in WT (Fig. 3B, left) and 0.41-fold in β2KO mice (Fig. 3B, right). Thus, both mAZ and sAZ input size favors the dominant eye, with larger eye-specific differences seen in WT mice (see Table S3)."

      No way to judge the reliability of the analysis and trivial conclusion: To analyze effect size the authors choose the median value of three measures (whatever the middle value is). They then make four comparisons at the time point where they observed the biggest difference in favor of their hypothesis. There is no way to determine how much we should trust these numbers besides spending time with the mislabeled scatter plots. The authors then claim that this analysis provides evidence that there is a difference in vGluT2 cluster volume between dominant and non-dominant RGCs and that that difference is activity dependent. The conclusion that dominant axons have bigger boutons and that mutants that lack the property that would drive segregation would show less of a difference is very consistent with the literature. Moreover, there is no context provided about what 1.35 or 1.1 fold difference means for the biology of the system.

      We focused on P4 for biological reasons rather than post-hoc selection. P4 represents the established peak of synaptic competition when eye-specific synapse densities are globally equivalent. This is a timepoint consistently highlighted throughout our manuscript and supported by previous literature. We have modified our presentation from fold changes to measured eye-specific differences in volume (mean ± standard error) and added confidence intervals in Supplemental Table S3. The effect sizes for eye-specific differences in VGluT2 volume at P4 are robust: ~2.3 and ~1.5 for mAZ and sAZ measurements in WT mice, and ~2.5 and ~1.8 in β2KO mice, with all analyses well-powered (Supplemental Table S2).

      We were unable to identify any mislabeled scatter plots and believe all figures are correctly labeled. While dominant-eye advantage in bouton size is consistent with previous literature, our study provides the first detailed analysis of how this develops specifically during the critical period of competition, with distinct patterns for single versus multi-active zone contacts. Our data show that dominant-eye inputs have larger vesicle pools that scale with active zone number. While this suggests enhanced transmission capacity, we make no direct physiological claims based on structural data alone.

      Claim (189): "This shows that vesicle docking at release sites favors the dominant-eye as we previously reported but is similar for like eye type inputs regardless of AZ number."

      Contradicts core claim of manuscript: Consistent with previous literature, there is an activity dependent relative increase in vGlut2 clustering of dominant eye RGCs. The new information is that that activity dependence is more or less the same in sAZ and mAZ. The only plausible alternative is that vGlut2 scaling only increases in mAZ which would be consistent with the claims of their paper. That is not what they found. To the extent that the analysis presented in this manuscript tests a hypothesis, this is it. The claim of the title has been refuted by figure 3.

      We report the volume of docked vesicle signal (VGluT2) near each active zone, finding that it is greater for dominant-eye synapses. Within each eye-specific synapse population, vesicle signal per active zone is similar regardless of whether the active zones are part of single- or multi-active zone contacts. This is consistent with a modular program of active zone assembly and maintenance: core molecular programs facilitate docking at each AZ similarly regardless of how many AZs are nearby.

      This finding does not contradict our main conclusions but rather provides insight into how synaptic advantages are structured. The dominant eye's advantage may arise in part from forming more multi-AZ contacts (which have proportionally more docked vesicles) rather than from enhanced vesicle loading per individual active zone. This organization may reflect how developmental competition operates through contact number and active zone addition rather than fundamental changes to individual release site properties.

      We have changed the title to be descriptive rather than mechanistic.

      Claim (line 235): "For the non-dominant eye projection, however, clustered mAZ inputs outnumbered clustered sAZ inputs at P4 (Fig. 4C, bottom left panel), the age when this eye adds sAZ synapses (Fig. 2C)."

      Misleading: The overwhelming trend across 24 comparisons is that the sAZ clustering looks like mAZ clustering. That is the objective and unambiguous result. Among these 24 underpowered tests (n=3), there were a few p-values < 0.05. The authors base their interpretation of cell behavior on these crossings.

      In Figures 4C and 4D we report significant results with high effect sizes (effect sizes all greater than 2; see Supplemental Table S2). The mean differences are modest (5-7%) and significance arises due to low variance between biological replicates. We acknowledge that clustering patterns are generally similar between mAZ and sAZ inputs across most conditions. We have revised the text to describe these as “slight” differences and that “WT mice show a tendency toward forming more synapses near mAZ inputs”, reflecting appropriate caution in our interpretation while noting the statistical consistency of these patterns.

      Claim (line 328): "The failure to add synapses reduced synaptic clustering and more inputs formed in isolation in the mutants compared to controls."

      Trivially true: Density was lower in mutant.

      We have rewritten the sentence for clarity: “The failure to add synapses could explain the observation that synaptic clustering was reduced and more inputs formed in isolation in the mutants compared to controls.”

      Claim (line 332): "While our findings support a role for spontaneous retinal activity in presynaptic release site addition and clustering..."

      Not meaningfully supported by evidence: I could not find meaningful differences between WT and mutant beside the already known dramatic difference in synapse density.

      We have changed the sentence to avoid overinterpreting the results. The new sentence in lines 415-417 reads: “While our results highlight developmental changes in presynaptic release site addition and clustering, activity-dependent postsynaptic mechanisms also influence input refinement at later stages.”

      Reviewer #2 (Public review):

      Summary:

      In this manuscript, Zhang and Speer examine changes in the spatial organization of synaptic proteins during eye specific segregation, a developmental period when axons from the two eyes initially mingle and gradually segregate into eye-specific regions of the dorsal lateral geniculate. The authors use STORM microscopy and immunostain presynaptic (VGluT2, Bassoon) and postsynaptic (Homer) proteins to identify synaptic release sites. Activity-dependent changes of this spatial organization are identified by comparing the β2KO mice to WT mice. They describe two types of synapses based on Bassoon clustering: the multiple active zone (mAZ) synapse and single active zone (sAZ) synapse. In this revision, the authors have added EM data to support the idea that mAZ synapses represent boutons with multiple release sites. They have also reanalyzed their data set with different statistical approaches.

      Strengths:

      The data presented is of good quality and provides an unprecedented view at high resolution of the presynaptic components of the retinogeniculate synapse during active developmental remodeling. This approach offers an advance over previous mouse EM studies of this synapse because the CTB label allows identification of the eye from which the presynaptic terminal arises.

      Weaknesses:

      While the interpretation of this data set is much more grounded in this second revised submission, some of the authors' conclusions/statements still lack convincing supporting evidence. In particular, the data does not support the title: "Eye-specific active zone clustering underlies synaptic competition in the developing visual system". The data show that there are fewer synapses made for both contra- and ipsi- inputs in the β2KO mice-- this fact alone can account for the differences in clustering. There is no evidence linking clustering to synaptic competition. Moreover, the findings of differences in AZ# or distance between AZs that the authors report are quite small and it is not clear whether they are functionally meaningful.

      We thank the reviewer for their helpful suggestions that improved the manuscript in this revision. We have changed the title to remove the reference to “clustering” and to avoid implying any causal relationships. The new title is descriptive: “Eye-specific differences in active zone addition during synaptic competition in the developing visual system”.

      To further address the reviewer's comments, we have removed the remaining references to activity-dependent effects on synaptic development (line 36, line 96, line 415). We have also modified the text in lines 411-413 to state that “The failure to add synapses could explain the observation that synaptic clustering was reduced and more inputs formed in isolation in the mutants compared to controls.”

      We have also updated our presentation of results for Figure 4 to ensure that we do not causally link clustering to synaptic competition. In Figures 4C and 4D we report significant results with high effect sizes (effect sizes all greater than 2; see Supplemental Table S2). The mean differences are modest (5-7%) and significance arises due to low variance between biological replicates. We acknowledge that clustering patterns are generally similar between mAZ and sAZ inputs across most conditions. We have revised the text to describe these as “slight” differences and that “WT mice show a tendency toward forming more synapses near mAZ inputs”, reflecting appropriate caution in our interpretation while noting the statistical consistency of these patterns.

      Reviewer #3 (Public review):

      This study is a follow-up to a recent study of synaptic development based on a powerful data set that combines anterograde labeling, immunofluorescence labeling of synaptic proteins, and STORM imaging (Cell Reports, 2023). Specifically, they use anti-Vglut2 label to determine the size of the presynaptic structure (which they describe as the vesicle pool size), anti-Bassoon to label active zones with the resolution to count them, and anti-Homer to identify postsynaptic densities. Their previous study compared the detailed synaptic structure across the development of synapses made with contra-projecting vs. ipsi-projecting RGCs and compared this developmental profile with a mouse model with reduced retinal waves. In this study, they produce a new detailed analysis on the same data set in which they classify synapses into "multi-active zone" vs. "single-active zone" synapses and assess the number and spacing of these synapses. The authors use measurements to make conclusions about the role of retinal waves in the generation of same-eye synaptic clusters. The authors interpret these results as providing insight into how neural activity drives synapse maturation, but the strength of their conclusions is not directly tested by their analysis.

      Strengths:

      This is a fantastic data set for describing the structural details of synapse development in a part of the brain undergoing activity-dependent synaptic rearrangements. The fact that they can differentiate the eye of origin is what makes this data set unique over previous structural work. The addition of example images from the EM dataset provides confidence in their categorization scheme.

      Weaknesses:

      Though the descriptions of single vs multi-active zone synapses are important and represent a significant advance, the authors continue to make unsupported conclusions regarding the biological processes driving these changes. Although this revision includes additional information about the populations tested and the tests conducted, the authors do not address the issue raised by previous reviews. Specifically, they provide no assessment of what effect size represents a biologically meaningful result. For example, a more appropriate title is "The distribution of eye-specific single vs multi-active zone is altered in mice with reduced spontaneous activity" rather than concluding that this difference in clustering is somehow related to synaptic competition. Of course, the authors are free to speculate, but many of the conclusions of the paper are not supported by their results.

      We appreciate the reviewer’s helpful critique. We have changed the title to be descriptive and avoid implying causal relationships. 

      We have applied false discovery rate (FDR) correction using the Benjamini-Hochberg method with α = 0.05 within each experimental condition (age × genotype combination). The FDR correction treats each condition as addressing a distinct experimental question: 'What synaptic properties differ between left eye and right eye inputs in this specific developmental stage and genotype?'

      This correction strategy is appropriate because: 1) we focus our statistical comparisons within each age/genotype; 2) each age-genotype combination represents a separate biological context where different synaptic properties between eye-of-origin may be relevant; and 3) this approach controls for multiple testing within each experimental question while maintaining statistical power to detect meaningful biological differences.

      We applied FDR correction separately to the ~20-34 measurements (varying with age and genotype) within each of the six experimental conditions (P2-WT, P2-β2, P4-WT, P4-β2, P8-WT, P8-β2), resulting in condition-specific adjusted p-values. These are reported in the updated Supplemental Table S2. Figures have also been updated to reflect the FDR-adjusted values. Selected between-genotype comparisons are presented descriptively using 5/95% confidence intervals. This correction confirmed the robustness of our key findings.
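
      The per-condition correction described above can be sketched as follows. This is a minimal, dependency-free illustration of the Benjamini-Hochberg step-up adjustment applied separately within each condition. The condition labels and raw p-values are hypothetical, and the function is not the code used for Supplemental Table S2.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted

# Hypothetical raw p-values, grouped by condition (age x genotype).
raw = {
    "P4-WT": [0.001, 0.02, 0.04, 0.30],
    "P4-b2KO": [0.01, 0.60],
}
alpha = 0.05

# Each condition is corrected on its own, so the number of tests m
# is the number of measurements within that condition only.
adjusted = {cond: benjamini_hochberg(p) for cond, p in raw.items()}
significant = {cond: [q <= alpha for q in qs] for cond, qs in adjusted.items()}
```

      In this toy example a raw p-value of 0.04 no longer passes α = 0.05 after adjustment, which shows why the figures needed to be updated to the adjusted values.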

      With regard to the biological significance of effect sizes, our key findings demonstrate effect sizes >2.0, indicating robust effects. During critical developmental periods, consistent structural differences, even those modest in absolute magnitude, can reflect important regulatory mechanisms that influence refinement outcomes. The differences in synaptic organization we observe occur during the first postnatal week when eye-specific competition is active, suggesting these patterns may be relevant to understanding how structural advantages emerge during synaptic refinement.

      Reviewer #1 (Recommendations for the authors):

      I have tried to understand the analysis and biology of this manuscript as best I can. I believe the analytical approach taken is not reliable and I have explained why in my public comments. I don't believe this manuscript is unique in taking this approach. I have recently published a paper on how common this approach is and why it doesn't work. I don't want to give the impression that the problem with the analysis was that it was not computationally sophisticated enough or that you did not jump through a specific statistical hoop. If I strip out the arguments that depend on misinterpretations of p-values and -instead- look at the scatterplots, I come up with a very different view of the data than what is described in the paper.

      The information in the plots could be translated into a rigorous statistical analysis of estimated differences between groups given the uncertainties of the experimental design. I don't really think that analysis would be useful. I think it would have been enough to publish the plots and report your estimates of the number of active zones in RGCs during development. I don't see evidence of an additional effect.

      We appreciate the reviewer’s helpful comments throughout the review process. Mean active zone numbers per mAZ contact are presented in Figure S2D/E. We look forward to further technical and computational advances that will help us increase our data acquisition throughput and sample sizes when designing future studies. 

      Reviewer #2 (Recommendations for the authors):

      The authors should modify the title and other text to be more consistent with the data. There is no evidence that active zone clustering has any direct relationship to synaptic competition.

      We appreciate the reviewer’s helpful suggestions to ensure appropriate language around causal effects. We have modified the title to accurately reflect the results: "Eye-specific differences in active zone addition during synaptic competition in the developing visual system." We have revised the text in the abstract, introduction, and results section for Figure 4 to be consistent with the data and not imply causality of synapse clustering on segregation phenotypes.

      Reviewer #3 (Recommendations for the authors):

      Change the title.

      We appreciate the reviewer’s feedback throughout the review process. We have modified the title to accurately reflect the results: "Eye-specific differences in active zone addition during synaptic competition in the developing visual system."

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      We thank the reviewer for very enthusiastic and supportive comments on our manuscript. 

      Summary:

      This manuscript presents a compelling and innovative approach that combines Track2p neuronal tracking with advanced analytical methods to investigate early postnatal brain development. The work provides a powerful framework for exploring complex developmental processes such as the emergence of sensory representations, cognitive functions, and activity-dependent circuit formation. By enabling the tracking of the same neurons over extended developmental periods, this methodology sets the stage for mechanistic insights that were previously inaccessible.

      Strengths:

      (1) Innovative Methodology:

      The integration of Track2p with longitudinal calcium imaging offers a unique capability to follow individual neurons across critical developmental windows.

      (2) High Conceptual Impact:

      The manuscript outlines a clear path for using this approach to study foundational developmental questions, such as how early neuronal activity shapes later functional properties and network assembly.

      (3) Future Experimental Potential:

      The authors convincingly argue for the feasibility of extending this tracking into adulthood and combining it with targeted manipulations, which could significantly advance our understanding of causality in developmental processes.

      (4) Broad Applicability:

      The proposed framework can be adapted to a wide range of experimental designs and questions, making it a valuable resource for the field.

      Weaknesses:

      No major weaknesses were identified by this reviewer. The manuscript is conceptually strong and methodologically sound. Future studies will need to address potential technical limitations of long-term tracking, but this does not detract from the current work's significance and clarity of vision.

      Reviewer #2 (Public review):

      Summary:

      The manuscript by Majnik and colleagues introduces "Track2p", a new tool designed to track neurons across imaging sessions of two-photon calcium imaging in developing mice. The method addresses the challenge of tracking cells in the growing brain of developing mice. The authors showed that "Track2p" successfully tracks hundreds of neurons in the barrel cortex across multiple days during the second postnatal week. This enabled the identification of the emergence of behavioral state modulation and desynchronization of spontaneous network activity around postnatal day 11.

      Strengths:

      The manuscript is well written, and the analysis pipeline is clearly described. Moreover, the dataset used for validation is of high quality, considering the technical challenges associated with longitudinal two-photon recordings in mouse pups. The authors provide a convincing comparison of both manual annotation and "CellReg" to demonstrate the tracking performance of "Track2p". Applying this tracking algorithm, Majnik and colleagues characterized hallmark developmental changes in spontaneous network activity, highlighting the impact of longitudinal imaging approaches in developmental neuroscience. Additionally, the code is available on GitHub, along with helpful documentation, which will facilitate accessibility and usability by other researchers.

      Weaknesses:

      (1) The main critique of the "Track2p" package is that, in its current implementation, it is dependent on the outputs of "Suite2p". This limits adoption by researchers who use alternative pipelines or custom code. One potential solution would be to generalize the accepted inputs beyond the fixed format of "Suite2p", for instance, by accepting NumPy arrays (e.g., ROIs, deltaF/F traces, images, etc.) from files generated by other software. Otherwise, the tool may remain more of a useful add-on to "Suite2p" (see https://github.com/MouseLand/suite2p/issues/933) rather than a fully standalone tool.

      We thank the reviewer for this excellent suggestion. 

      We have now implemented this feature, where Track2p is now compatible with ‘raw’ NumPy arrays for the three types of inputs. For more information, please check the updated documentation: https://track2p.github.io/run_inputs_and_parameters.html#raw-npy-arrays. We have also tested this feature using a custom segmentation and trace extraction pipeline using Cellpose for segmentation.

      (2) Further benchmarking would strengthen the validation of "Track2p", particularly against "CaImAn" (Giovannucci et al., eLife, 2019), which is widely used in the field and implements a distinct registration approach.

      The reviewer suggested further benchmarking of Track2p. Ideally, we would benchmark Track2p against the current state-of-the-art method. However, the field currently lacks consensus on which algorithm performs best, with multiple methods available including CaImAn, SCOUT (Johnston et al. 2022), ROICaT (Nguyen et al. 2023), ROIMatchPub (recommended by Suite2p documentation and recently used by Hasegawa et al. 2024), and custom pipelines such as those described by Sun et al. 2025. The absence of systematic benchmarking studies, particularly for custom tracking pipelines, makes it impossible to identify the current state of the art for comparison with Track2p. While comparing Track2p against all available methods would provide a comprehensive evaluation, such an analysis falls beyond the scope of this paper.

      We selected CellReg for our primary comparison because it has been validated under similar experimental conditions—specifically, 2-photon calcium imaging in developing hippocampus between P17-P25 (Wang et al. 2024)—making it the most relevant benchmark for our developmental neocortex dataset.

      That said, to support further benchmarking in mouse neocortex (P8-P14), we will publicly release our ground truth tracking dataset.

      (3) The authors might also consider evaluating performance using non-consecutive recordings (e.g., alternate days or only three time points across the week) to demonstrate utility in other experimental designs.

      Thank you for your suggestion. We performed a similar analysis prior to submission but decided against including it in the final manuscript, to keep the evaluation brief and to avoid confusing the reader with too many different evaluation methods. We have included the results in Author response images 1 and 2 below.

      To evaluate performance in experimental designs with larger time spans between recordings (>1 day), we additionally evaluated tracking from P8 to each subsequent day while omitting the intermediate days (e.g. P8 to P9, P8 to P10 … P8 to P14). The performance for the three mice from the manuscript is shown below:

      Author response image 1.

      As expected, performance drops significantly as the time difference between the two recordings increases (dropping to effectively zero for 2 out of 3 mice). This could also explain why CellReg struggles to track cells across all days, since it takes P8 as a reference and attempts to register all subsequent days to that time point before matching, instead of performing registration and matching in consecutive pairs of recordings (P8-P9, P9-P10 … P13-P14) as we do.

      Finally, for one of the three mice, we performed an additional test in which we asked whether adding one intermediate recording day could rescue the P8-P14 tracking performance. This addresses the reviewer's comment directly: if only three days of recording are possible, which additional day gives the best tracking performance?
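
      The idea of matching in consecutive pairs and then linking the results can be sketched as below. This is an illustration of the chaining logic only, under the assumption that each pairwise step yields a mapping from ROI indices in one session to ROI indices in the next. The function name and the example matches are hypothetical and do not reproduce Track2p's internal code.

```python
def chain_matches(pairwise_matches):
    """Compose per-pair ROI matches into tracks spanning all sessions.

    pairwise_matches[k] maps an ROI index in session k to its matched
    ROI index in session k + 1. ROIs without a match are simply absent.
    """
    tracks = [[roi] for roi in pairwise_matches[0]]
    for matches in pairwise_matches:
        # Keep only tracks whose latest ROI has a match in the next session.
        tracks = [t + [matches[t[-1]]] for t in tracks if t[-1] in matches]
    return tracks

# Hypothetical matches for three sessions (e.g. P8-P9 and P9-P10).
p8_to_p9 = {0: 2, 1: 0, 3: 1}
p9_to_p10 = {2: 5, 0: 4}

# ROI 3 of the first session is lost because its P9 partner has no P10 match.
tracks = chain_matches([p8_to_p9, p9_to_p10])
```

      Because each link only spans one day, every registration step deals with a small change in the FOV. The cost is that a single failed link ends the track.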

      Author response image 2.

      As can be seen from the plot, adding the P10 or P11 recording yields the largest improvement in tracking performance; however, the performance is still significantly lower than when including all days (see Fig. 4). This test suggests that including a day that is slightly skewed to earlier ages might improve the performance more than simply choosing the middle day between the two extremes. This would also be consistent with the qualitative observation that the FOV seems to show more drastic day-to-day changes at earlier ages in our recording conditions.

      Reviewer #3 (Public review):

      Summary:

      In this manuscript, Majnik et al. developed a computational algorithm to track individual developing interneurons in the rodent cortex at postnatal stages. Considerable development in cortical networks takes place during the first postnatal weeks; however, tools to study them longitudinally at a single-cell level are scarce. This paper provides a valuable approach to study both single-cell dynamics across days and state-driven network changes. The authors used Gad67Cre mice together with virally introduced TdTom to track interneurons based on their anatomical location in the FOV and AAVSynGCaMP8m to follow their activity across the second postnatal week, a period during which the cortex is known to undergo marked decorrelation in spontaneous activity. Using Track2P, the authors show the feasibility of tracking populations of neurons in the same mice, capturing with their analysis previously described developmental decorrelation and uncovering stable representations of neuronal activity, coincident with the onset of spontaneous active movement. The quality of the imaging data is compelling, and the computational analysis is thorough, providing a widely applicable tool for the analysis of emerging neuronal activity in the cortex. Below are some points for the authors to consider.

      We thank the reviewer for a constructive and positive evaluation of our MS. 

      Major points:

      (1) The authors used 20 neurons to generate a ground truth dataset. The rationale for this sample size is unclear. Figure 1 indicates the capability to track ~728 neurons. A larger ground truth data set will increase the robustness of the conclusions.

      We think this was a misunderstanding of our ground truth dataset analysis, which included 192 and not 20 neurons. Indeed, as explained in the methods section, since manually tracking all cells would require prohibitive amounts of time, we decided to generate sparse manual annotations, only tracking a subset of all cells from the first recording day onwards. To do this, we took the first recording (s0), defined a grid of 64 equidistant points over the FOV and, for each point, identified the closest ROI in terms of Euclidean distance from the median pixel of the ROI (see Fig. S3A). We then manually tracked these 64 ROIs across subsequent days. Only neurons that were detected and tracked across all sessions were taken into account and referred to as our ground truth dataset (‘GT’ in Fig. 4). This was done for 3 mice, hence 3 × 64 neurons and not 20 were used to generate our GT dataset.
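
      The sparse selection procedure can be sketched as follows. The 8 × 8 layout of the 64 points, their placement at the centres of equal grid cells, the FOV size and the ROI coordinates are all assumptions made for illustration. The response specifies only that the grid has 64 equidistant points and that distance is measured to the median pixel of each ROI.

```python
import math

def grid_points(width, height, n_side=8):
    """Centres of an n_side x n_side grid of equal cells covering the FOV."""
    step_x, step_y = width / n_side, height / n_side
    return [((i + 0.5) * step_x, (j + 0.5) * step_y)
            for j in range(n_side) for i in range(n_side)]

def closest_roi(point, roi_centres):
    """Index of the ROI whose centre is nearest to `point` (Euclidean)."""
    return min(range(len(roi_centres)),
               key=lambda k: math.dist(point, roi_centres[k]))

# Hypothetical FOV size and ROI centres (median pixel of each ROI).
points = grid_points(512, 512)
roi_centres = [(30.0, 35.0), (250.0, 260.0), (480.0, 470.0), (100.0, 400.0)]

# One candidate ROI per grid point; duplicates collapse in the set.
selected = sorted({closest_roi(p, roi_centres) for p in points})
```

      With real data there are many more ROIs than grid points, so the selection yields a sample that is spread evenly over the FOV instead of being concentrated in dense regions.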

      (2) It is unclear how movement was scored in the analysis shown in Figure 5A. Was the time that the mouse spent moving scored after visual inspection of the videos? Were whisker and muscle twitches scored as movement, or was movement quantified as the amount of time during which the treadmill was displaced?

      Movement was scored using a ‘motion energy’ metric as in Stringer et al. 2019 (V1) or Inácio et al. 2025 (S1). This metric takes each two consecutive frames of the videography recordings and computes the difference between them by summing up the square of pixelwise differences between the two images. We made the appropriate changes in the manuscript to further clarify this in the main text and methods in order to avoid confusion.
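
      As a minimal sketch of the metric described above, the following function sums the squared pixelwise differences between each pair of consecutive frames. The tiny video is hypothetical, and any preprocessing applied in the cited studies (cropping, smoothing, normalization) is not represented here.

```python
def motion_energy(frames):
    """Sum of squared pixelwise differences between consecutive frames.

    `frames` is a sequence of equally sized 2D images (lists of rows).
    Returns one value per consecutive frame pair.
    """
    energy = []
    for prev, curr in zip(frames, frames[1:]):
        total = 0
        for row_prev, row_curr in zip(prev, curr):
            total += sum((b - a) ** 2 for a, b in zip(row_prev, row_curr))
        energy.append(total)
    return energy

# Tiny hypothetical 2 x 3 grayscale video with three frames.
video = [
    [[0, 0, 0], [0, 0, 0]],
    [[0, 0, 0], [0, 0, 0]],   # identical to the first frame
    [[0, 3, 0], [4, 0, 0]],   # two pixels change
]
energy = motion_energy(video)
```

      With frames stored as a NumPy array of shape (T, H, W), the same quantity can be written as `(np.diff(frames.astype(float), axis=0) ** 2).sum(axis=(1, 2))`; the cast to float avoids wrap-around with unsigned integer pixel types.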

      Since this metric quantifies global movements, it is inherently biased to whole-body movements causing more significant changes in pixel values around the whole FOV of the camera. Slight twitches of a single limb, or the whisker pad would thus contribute much less to this metric, since these are usually slight displacements in a small region of the camera FOV. Additionally, comparing neural activity across all time points (using correlation or R²) also favours movements that last longer (such as wake movements / prolonged periods of high arousal) since each time point is treated equally.

      As we suggested in the discussion, it would be interesting in further analyses to examine the link between twitches and neural activity, but this would likely require extensive manual scoring. Movements could then be treated not as continuous across all time points but as discrete events, for example using peri-movement time histograms for different types of movements at different ages. This is, however, outside the scope of this study.

      (3) The rationale for binning the data analysis in early P11 is unclear. As the authors acknowledged, it is likely that the decoder captured active states from P11 onwards. Because active whisking begins around P14, it is unlikely to drive this change in network dynamics at P11. Does pupil dilation in the pups change during locomotor and resting states? Does the arousal state of the pups abruptly change at P11?

      We agree that P11 does not match any change in mouse behavior that we have been able to capture. However, arousal state in mice does change around postnatal day 11. This period marks a transition from immature, fragmented states to more organized and regulated sleep-wake patterns, along with increasing influence from neuromodulatory and sensory systems. All of these changes have been recently reviewed in Wu et al. 2024 (see also Martini et al. 2021). In addition, in the developing somatosensory system, before postnatal day 11 (P11), wake-related movements (reafference) are actively gated and blocked by the external cuneate nucleus (ECN, Tiriac et al. 2016 and all excellent recent work from the Blumberg lab). This gating prevents sensory feedback from wake movements from reaching the cortex, ensuring that only sleep-related twitches drive neural responses. However, around P11, this gating mechanism abruptly lifts, enabling sensory signals from wake movements to influence cortical processing, which signals a dramatic developmental shift (Wu et al. 2024).

      Reviewer #1 (Recommendations for the authors):

      This manuscript represents a significant advancement in the field of developmental neuroscience, offering a powerful and elegant framework for longitudinal cellular tracking using the Track2p method combined with robust analytical approaches. The authors convincingly demonstrate that this integrated methodology provides an invaluable template for investigating complex developmental processes, including the emergence of sensory representations and higher cognitive functions.

      A major strength of this work is its emphasis on the power of longitudinal imaging to illuminate activity-dependent development. By tracking the same neurons over time, the authors open up new possibilities to uncover how early activity patterns shape later functional outcomes and the organization of neuronal assemblies-insights that would be inaccessible using conventional cross-sectional designs.

      Importantly, the manuscript highlights the potential for this approach to be extended even further, enabling continuous tracking into adulthood and thus offering an unprecedented window into long-term developmental trajectories. The authors also underscore the exciting opportunity to incorporate targeted perturbation experiments, allowing researchers to causally link early circuit dynamics to later outcomes.

      Given the increasing recognition that early postnatal alterations can underlie the etiology of various neurodevelopmental disorders, this work is especially timely. The methods and perspectives presented here are poised to catalyze a new generation of developmental studies that can reveal mechanistic underpinnings of both typical and atypical brain development.

      In summary, this is a technically impressive and conceptually forward-looking study that sets the stage for transformative advances in developmental neuroscience.

      Thank you for the thoughtful feedback—it's greatly appreciated!

      Reviewer #2 (Recommendations for the authors):

      Minor points:

      (1) Figure 1. Consider merging or moving to Supplemental, as its rationale is well described in the text.

      We would like to retain the current figure as we believe it provides an effective visual illustration of our rationale that will capture readers' attention and could serve as a valuable reference for others seeking to justify longitudinal tracking of the developing brain. We hope the reviewer will understand our decision.

      (2) Some axis labels and panels are difficult to read due to small font sizes (e.g. smaller panels in Figures 5-7).

      Modified, thanks 

      (3) Supplementary Figures. The order of appearance in the main text is occasionally inconsistent.

      This was modified, thanks

      (4) Line 132. Add a reference to the registration toolbox used (elastix). A brief description of the affine transformation would also be helpful, either here or in the Methods section (p. 27).

      We have added reference to Ntatsis et al. 2023 and described affine transformation in the main text (lines 133-135): 

      Firstly, we estimate the spatial transformation between s0 and s1 using affine image registration (i.e. allowing shifting, rotation, scaling and shearing, see Fig. 2B, the transformation is denoted as T).
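
      To make the four components of an affine transformation concrete, the sketch below builds a 2 × 3 matrix from a shift, a rotation, a scaling and a shear, and applies it to a point. It only illustrates what an estimated transformation T does to coordinates. The parameter values are hypothetical, the order in which the components are composed is an assumption, and the estimation itself (performed by the registration toolbox) is not shown.

```python
import math

def affine_matrix(shift=(0.0, 0.0), angle=0.0, scale=(1.0, 1.0), shear=0.0):
    """2 x 3 affine matrix applying shear, then scale, then rotation, then shift."""
    c, s = math.cos(angle), math.sin(angle)
    sx, sy = scale
    # Linear part A = R @ S @ H, with R the rotation, S = diag(sx, sy)
    # and H = [[1, shear], [0, 1]].
    a11, a12 = c * sx, c * sx * shear - s * sy
    a21, a22 = s * sx, s * sx * shear + c * sy
    return [[a11, a12, shift[0]], [a21, a22, shift[1]]]

def apply_affine(matrix, point):
    """Map an (x, y) point through a 2 x 3 affine matrix."""
    x, y = point
    (a11, a12, tx), (a21, a22, ty) = matrix
    return (a11 * x + a12 * y + tx, a21 * x + a22 * y + ty)

# Hypothetical T from s0 to s1: 10% isotropic growth plus a small drift.
T = affine_matrix(shift=(5.0, -3.0), scale=(1.1, 1.1))
moved = apply_affine(T, (100.0, 200.0))
```

      Allowing scaling and shearing in addition to shift and rotation is what lets a single transformation absorb tissue growth between sessions.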

      (5) Lines 147-151. If this method is adapted from another work, please cite the source.

      Computing the intersection over union of two ROIs for tracking is a widely established and intuitive method used across numerous studies, representing standard practice rather than requiring specific citation. We have however included the reference to the paper describing the algorithm we use to solve the linear sum assignment problem used for matching neurons across a pair of consecutive days (Crouse 2016).
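
      The two ingredients mentioned above, intersection over union and a linear sum assignment, can be sketched as follows. ROIs are represented as sets of pixel coordinates, and the assignment is solved by brute force over permutations, which is a stand-in suitable only for a handful of ROIs and is not the algorithm of Crouse 2016. All ROI shapes and function names are hypothetical.

```python
from itertools import permutations

def iou(roi_a, roi_b):
    """Intersection over union of two ROIs given as sets of (row, col) pixels."""
    union = len(roi_a | roi_b)
    return len(roi_a & roi_b) / union if union else 0.0

def match_rois(rois_ref, rois_reg):
    """Assignment maximising the total IoU.

    Brute force: assumes len(rois_ref) <= len(rois_reg) and few ROIs.
    """
    scores = [[iou(a, b) for b in rois_reg] for a in rois_ref]
    best = max(permutations(range(len(rois_reg)), len(rois_ref)),
               key=lambda perm: sum(scores[i][j] for i, j in enumerate(perm)))
    return list(enumerate(best)), scores

def square(r0, c0, size=3):
    """Hypothetical square ROI with its top-left corner at (r0, c0)."""
    return {(r, c) for r in range(r0, r0 + size) for c in range(c0, c0 + size)}

# Day 1 ROIs are assumed to be already registered to day 0 coordinates.
day0 = [square(0, 0), square(10, 10)]
day1 = [square(10, 11), square(0, 1), square(20, 20)]
pairs, scores = match_rois(day0, day1)
```

      For realistic numbers of ROIs, an efficient solver such as `scipy.optimize.linear_sum_assignment(scores, maximize=True)` would replace the permutation search. Any thresholding of low-IoU pairs after the assignment is outside this sketch.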

      (6) Line 218. "classical" or automatic?

      We meant “classical” in the sense of widely used. 

      (7) Lines 220-231. Did the authors find significant variability of successfully tracked neurons across mice? While the data for successfully tracked cells is reported (Figure 5B), the proportions are not. Could differences in neuron dropout across days and mice affect the analysis of neuronal activity statistics?

      We thank the reviewer for raising this important point. We computed the fraction of successfully tracked cells in our dataset and found substantial variability:

      Cells detected on day 0: [607, 1849, 2190, 1988, 1316, 2138] 

      Proportion successfully tracked: [0.47, 0.20, 0.36, 0.37, 0.41, 0.19]

      Notably, the number of cells detected on the first day varies considerably (607–2138 cells). There appears to be a trend whereby datasets with fewer initially detected cells show higher tracking success rates, potentially because only highly active cells are identified in these cases.
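
      The trend can be checked descriptively from the two lists reported above. The sketch below derives the approximate number of tracked cells per mouse and the Pearson correlation between initial cell count and tracking proportion. With only six mice the coefficient is a description of these values, not an inferential result.

```python
import math

# Values reported in the response, one entry per mouse.
cells_day0 = [607, 1849, 2190, 1988, 1316, 2138]
prop_tracked = [0.47, 0.20, 0.36, 0.37, 0.41, 0.19]

# Approximate tracked-cell counts implied by the two lists.
n_tracked = [round(n * p) for n, p in zip(cells_day0, prop_tracked)]

def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)

# A negative value means fewer detected cells go with higher tracking success.
r = pearson_r(cells_day0, prop_tracked)
```

      The sign of the coefficient is negative for these values, in line with the trend described, although the mouse with the fewest detected cells contributes strongly to it.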

      To draw more definitive conclusions about the proportion of active cells and tracking dropout rates, we would require activity-independent cell detection methods (such as Cellpose applied to isosbestic 830 nm fluorescence, or ideally a pan-neuronal marker in a separate channel, e.g., tdTomato). We have incorporated the tracking success proportions into the revised manuscript.

      (8) Line 260. Please briefly explain, here or in the Methods, the rationale for using data from only 3 mice (rather than all 6) for evaluating tracking performance.

      We used three mice for this analysis due to the labor-intensive nature of manually annotating 64 ROIs across several days. Given the time constraints of this manual process, we determined that three subjects would provide adequate data to reliably assess tracking performance.

      (9) Line 277. Consider clarifying or rephrasing the phrase "across progressively shorter time intervals"? Do you mean across consecutive days?

      This has been rephrased as follows: 

      Additionally, to assess tracking performance over time, we quantified the proportion of reconstructed ground truth tracks over progressively longer time intervals (first two days, first three days etc. ‘Prop. correct’ in Fig. 4C-F, see Methods). This allowed us to understand how tracking accuracy depends on the number of successive sessions, as well as at which time points the algorithm might fail to successfully track cells.
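
      One plausible reading of this measure is sketched below: for each interval length k, a ground truth track counts as reconstructed if some predicted track agrees with it on the first k sessions. The exact definition is given in the Methods and may differ in detail. The tracks and the function name here are hypothetical.

```python
def prop_correct(gt_tracks, predicted_tracks, n_sessions):
    """Fraction of ground truth tracks reproduced over the first k sessions,
    for k = 2 ... n_sessions."""
    props = []
    for k in range(2, n_sessions + 1):
        predicted_prefixes = {
            tuple(t[:k]) for t in predicted_tracks if len(t) >= k
        }
        hits = sum(tuple(gt[:k]) in predicted_prefixes for gt in gt_tracks)
        props.append(hits / len(gt_tracks))
    return props

# Hypothetical tracks: one ROI index per session, four sessions.
gt = [[0, 2, 5, 1], [1, 0, 4, 3], [3, 1, 2, 0]]
pred = [[0, 2, 5, 1], [1, 0, 4, 7], [3, 6, 2, 0]]

# One value per interval: first two days, first three days, all four days.
props = prop_correct(gt, pred, n_sessions=4)
```

      The resulting sequence cannot increase with k, because a track that is wrong over the first k sessions stays wrong over any longer interval. The session at which the value drops indicates where tracking failed.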

      (10) Line 306. "we also provide additional resources and documentation". Please add a reference or link.

      Done, thanks

      Track2p  

      (11) Lines 342-344. Specify that the raster plots refer to one example mouse, not the entire sample.

      Done, thanks.

      (12) Lines 996-1002. Please confirm whether only successfully tracked neurons were used to compute the Pearson correlations between all pairs.

      Yes, this applies only to tracked neurons, since the correlation cannot be computed for non-tracked pairs.

      (13) Line 1003. Add a reference to scikit-learn.

      Reference was added to: 

      Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A., Cournapeau, D., Brucher, M., Perrot, M., & Duchesnay, E. (2011). Scikit-learn: Machine Learning in Python. Journal of Machine Learning Research, 12, 2825–2830. 

      (14) Typos. Correct spacing between numeric values and units.

      We did not find many typos regarding spacing between the numerical value and the unit symbol (degrees and percent should not be spaced, right?).

      Reviewer #3 (Recommendations for the authors):

      The font size in many of the figures is too small. For example, it is difficult to follow individual ROIs in Figure S3.

      Figure font size has been increased, thanks. In Figure S3 there might have been a misunderstanding, since the three FOV images do not correspond to the FOV of the same mouse across three days but rather to the first recording for each of the three mice used in evaluation (the ROIs can thus not be followed across images since they correspond to a different mouse). To avoid confusion we have labelled each of the FOV images with the corresponding mouse identifier (same as in Fig. 4 and 5).

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public review): 

      Summary: 

      In this manuscript, the authors explore the role of the conserved transcription factor POU4-2 in planarian maintenance and regeneration of mechanosensory neurons. The authors explore the role of this transcription factor and identify potential targets of this transcription factor. Importantly, many genes discovered in this work are deeply conserved, with roles in mechanosensation and hearing, indicating that planarians may be a useful model with which to study the roles of these key molecules. This work is important within the field of regenerative neurobiology, but also impactful for those studying the evolution of the machinery that is important for human hearing. 

      Strengths: 

      The paper is rigorous and thorough, with convincing support for the conclusions of the work. 

      Weaknesses: 

      Weaknesses are relatively minor and could be addressed with additional experiments or changes in writing.

      Reviewer #2 (Public review): 

      Summary: 

      In this manuscript, the authors investigate the role of the transcription factor Smed-pou4-2 in the maintenance, regeneration, and function of mechanosensory neurons in the freshwater planarian Schmidtea mediterranea. First, they characterize the expression of pou4-2 in mechanosensory neurons during both homeostasis and regeneration, and examine how its expression is affected by the knockdown of soxB1-2, a previously identified transcription factor essential for the maintenance and regeneration of these neurons. Second, the authors assess whether pou4-2 is functionally required for the maintenance and regeneration of mechanosensory neurons. 

      Strengths: 

      The study provides some new insights into the regulatory role of pou4-2 in the differentiation, maintenance, and regeneration of ciliated mechanosensory neurons in planarians. 

      Weaknesses: 

      The overall scope is relatively limited. The manuscript lacks clear organization, and many of the conclusions would benefit from additional experiments and more rigorous quantification to enhance their strength and impact. 

      Reviewing Editor Comments: 

      (1) Quantification of pou4-2(+) cells that express (or do not express) hmcn-1-L and/or pkd1L-2(-) is a common suggestion amongst reviewers. It is recognized that Ross et al. (2018) showed that pkd1L-2 and hmcn-1L expression is detected in separate cells by double FISH, and the analysis presented in Supplementary Figure S3 is helpful in showing that some cells expressing pou4-2 (magenta) are not labeled by the combined signal of pkd1L-2 and hmcn-1-L riboprobes (green). However, I am not sure that we can conclude that pkd1L-2 and hmcn-1-L are effectively detected when riboprobes are combined in the analysis. Therefore, quantification of labeled cells as proposed by Reviewers 1 and 2 would help.

      Combining riboprobes is a standard approach in the field, and we chose this method as a direct way to determine which cells lack expression of both genes. We agree that providing the raw quantification data would be helpful for readers, and we included this data in Supplementary File S7; the file contains the quantification information for this dFISH experiment represented in Supplementary Figure 3.

      (2) It may be helpful to comment on changes (or lack of changes) in atoh gene RNA levels in RNAseq analyses of pou4-2 animals. As mentioned by one of the reviewers, in situs that don't show signal are inconclusive in this regard. 

      We fully agree with both reviewers. Two of the planarian atonal homologs are difficult to detect and produce background signal, as we previously reported in Cowles et al., Development (2013). We conceived the reciprocal RNAi/in situ experiments out of curiosity, given the reported role of atonal in the pou4 cascade in other organisms. However, these exploratory experiments lacked a strong rationale for inclusion, particularly given that pou4-2 and the atonal homologs do not share expression patterns, co-expression, or differential expression in our RNA-seq dataset. Therefore, we decided to omit the atonal in situs following pou4-2 RNAi. We retained the experiments showing that knockdown of the atonal genes does not show robust effects on the mechanosensory neuron pattern, as expected. We thank the reviewing editor and reviewers for pinpointing the concern. We agree that additional experiments, such as qPCR, would be needed. We reasoned that while these additional experiments could be informative, they are unlikely to substantially alter the key conclusions of this study.

      (3) There seem to be typos at bottom of Figure 10 and top of page 11 when referencing to Figure 4B (should be to 5B instead): "While mechanosensory neuronal patterned expression of Eph1 was downregulated after pou4-2 and soxB1-2 inhibition, low expression in the brain branches of the ventral cephalic ganglia persisted (Figure 4B)." 

      Thank you! We have fixed those.

      (4) Typo (page 13; kernel?): "...to test to what extent the Pou4 gene regulatory kernel is conserved among these widely divergent animals." 

      Regulatory kernels are defined as the minimal sets of interacting genes that drive developmental processes and are the core circuits within a gene regulatory network, but we recognize that this might not be as well known, so we have changed the term to “network” for clarity.

      Reviewer #1 (Recommendations for the authors): 

      (1) The authors indicate that they are interested in finding out whether POU4-2 is important in the creation of mechanosensory neurons in adulthood as well as in embryogenesis (in other words, whether the mechanism is "reused during adult tissue maintenance and regeneration"). The manuscript clearly shows that planarian POU4-2 is important in adult neurogenesis in planarians, but there is no evidence presented to show that this is a recapitulation of embryogenesis. Is pou4-2 expressed in the planarian embryo? This might be possible to examine by ISH or through the evaluation of sequencing data that already exists in the literature. 

      We agree that these statements should be precise. We have clarified when we make comparisons to the role of Pou4 in sensory system development in other organisms versus its role in the adult planarian. We examined its expression using the existing database of embryonic gene expression. Thanks for hinting at this idea. We performed BLAST in Planosphere (Davies et al., 2017) to cross-reference our clone matching dd_Smed_v6_30562_0_1, which is identical to SMED30002016. The embryonic gene expression for SMED30002016 indicates this gene is expressed at the expected stages given prior knowledge of the timing of organ development in Schmidtea mediterranea (a positive trend begins at Stage 5, with a marked increase by Stage 6 that remains comparable to the asexual expression levels shown). We thank the reviewer for pointing out this oversight. We have incorporated this result in the paper as a Supplementary Figure and discuss how we can only speculate that it has a similar role as we detect in the adult asexual worms.

      (2) Can it be determined whether the punctate pou4-2+ cells outside of the stripes are progenitors or other neural cell types? Are there pou4-2+ neurons that are not mechanosensory cell types? Could there be other roles for POU4-2 in the neurogenesis of other cell types? It might help to show percentages of overlap in Figure 4A and discuss whether the two populations add up to 100% of cells. 

      These are good questions that arise in part from other statements that need clarification in the text (pointed out by Reviewer 2). We think some of the dorsal pou4-2+ cells might represent progenitor cells undergoing terminal differentiation (see Supplementary Figure 4). We attempted BrdU pulse chase experiments but were not successful in consistently detecting pou4-2 at sufficient levels with our protocol. In response to this helpful comment, we have included this question as a future direction in the revised Discussion. Finally, we have edited our description of the expression pattern. We already pointed out that there are other cells on the ventral side that are not affected when soxB1-2 is knocked down. We attempted to resolve the potential identity of those cells working with existing scRNA-seq data in collaboration with colleagues, but their low abundance made it difficult to distinguish other populations. While we acknowledge this interesting possibility, we have chosen to focus this report on the role of pou4-2 downstream of soxB1-2, as this represents the most well-supported aspect of the dataset and was positively highlighted by both the reviewer and editor.

      (3) The authors discuss many genes from their analysis that play conserved roles in mechanosensation and hearing. Were there any conserved genes that came up in the analysis of pou4-2(RNAi) planarians that have not yet been studied in human hearing and neurodevelopment? I am wondering the extent to which planarians could be used as a discovery system for mechanosensory neuron function and development, and discussion of this point might increase the impact of this paper or provide critical rationale for expanding work on planarian mechanosensation. 

      Indeed, we agree that planarians could be used to identify conserved genes with roles in mechanosensation and have included this point in the Discussion. In this study, we have focused on demonstrating the conservation of gene regulation. While this study was initially based on a graduate thesis project, we have since generated a more comprehensive dataset from isolated heads, which we are currently analyzing. This has been emphasized in the revised Discussion.

      Minor: 

      (1) For Figure 6E, the authors could consider showing data along a negative axis to indicate a decrease in length in response to vibration and to more clearly show that this decrease doesn't occur as strongly after pou4-2(RNAi). 

      We displayed this behavior as the percent change, as this is a standard way to represent this data. As the percent change is a positive value, we represent the data as these positive values.
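      As an illustration of this convention, the percent change can be computed as a positive decrease in length. This is a sketch under assumptions: the function name and the example lengths are hypothetical, and the exact formula used in the manuscript is not stated in this response.

```python
def percent_length_decrease(length_before, length_after):
    """Percent decrease in body length, reported as a positive number when
    the animal shortens in response to the stimulus."""
    return (length_before - length_after) / length_before * 100.0


# Hypothetical lengths in arbitrary units (not data from the study).
control_change = percent_length_decrease(10.0, 7.0)  # strong contraction
rnai_change = percent_length_decrease(10.0, 9.0)     # weaker contraction

print(f"control: {control_change:.1f}% decrease")
print(f"RNAi:    {rnai_change:.1f}% decrease")
```

      With this definition a weaker response appears as a smaller positive bar, which carries the same information as plotting the change on a negative axis.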

      (2) The authors should consider quantifying the decrease of pou4-2 mRNA after atonal(RNAi) conditions, either by RT-qPCR or cell quantification. Visually, the signal in the stripes after atoh8-2(RNAi) seems lower, particularly in the tail. The punctate pattern outside the stripes may also be decreased after atoh8-1(RNAi). But quantification might strengthen the argument. 

      We agree with the reviewer and acknowledge that we should have been more cautious in interpreting these results. Those two genes are difficult to detect and did not show specific patterns in Cowles et al. (2013). The reviewer is correct that additional experiments are necessary before reaching conclusions but, as discussed earlier, we do not think new experiments would provide insights for the major conclusions. These experiments were exploratory in nature and tangential to our main conclusions, especially in the absence of reciprocal evidence (e.g., shared expression patterns, co-expression, or differential expression in our RNA-seq data). Therefore, we decided to eliminate the atonal in situs following pou4-2 RNAi.

      Reviewer #2 (Recommendations for the authors): 

      A. Expression of pou4-2 in ciliated mechanosensory neurons: 

      (1) The conclusion that pou4-2 is expressed in ciliated mechanosensory neurons is primarily based on co-expression analysis using a published single-cell dataset. Although the authors later show that a subset of pou4-2 cells also express pkd1L-2 (Figure 4A), a known marker of ciliated mechanosensory neurons, this finding is not properly quantified. I recommend moving Figure 4A to earlier in the manuscript (e.g., to Figure 2) and expanding the analysis to include additional known markers of this cell type. Proper quantification of the extent of co-localization is necessary to support the claim robustly. 

      As pointed out by the reviewer, there is substantive evidence from our lab and other reports. King et al. also showed pou4-2 and pkd1L-2 ‘regulation’ by their scRNA-seq data, and this function is conserved in the acoel Hofstenia miamia (Hulett et al., PNAS 2024). Our analysis shows convincing co-localization by scRNA-seq and expression of soxB1-2 and neural markers in the respective populations. Furthermore, we included colocalization of pou4-2 with mechanosensory genes using fluorescence in situ hybridization (Figure 3B, Supplementary Figure 4, and Supplementary File S7). We are confident the data conclusively show pou4-2 regulates pkd1L-2 expression in a subset of mechanosensory neurons. Given the strength of existing observations and previously published data, we believe that additional staining experiments are not essential to support this conclusion. 

      (2) There appears to be a conceptual inconsistency in the interpretation of pou4-2 expression dynamics. On one hand, the authors suggest that delayed pou4-2 expression indicates a role in late-stage differentiation (p.6). On the other hand, they propose that pou4-2 may be expressed in undifferentiated progenitors to initiate downstream transcriptional programs (p.8). These interpretations should be reconciled. Additionally, claims regarding pou4-2 expression in progenitor populations should be supported by co-localization with established stem cell or progenitor markers, rather than inferred from signal intensity alone. 

      This is an excellent point, and we agree with the reviewer that this section requires editing. As described in response to Reviewer 1, we attempted BrdU pulse chase experiments but were not successful in consistently detecting pou4-2 at sufficient levels with our protocol. Furthermore, we could not obtain strong signals in double labeling experiments in pou4-2 in situs combined with piwi-1 or PIWI-1 antibodies. We will include those experiments as a future direction and amend our conclusions accordingly.

      (3) The expression pattern shown in Figure 1B raises questions about the precise anatomical localization of pou4-2 cells. It is unclear whether these cells reside in the subepidermal plexus or the deeper submuscular plexus, which represent distinct neuronal layers (Ross et al., 2017). The observed signals near the ventral nerve cords could suggest submuscular localization. To clarify this, higher-resolution imaging and co-staining with region-specific neural markers are recommended. 

      In Ross et al. (2018), we showed that the pkd1L-2+ cells are located submuscularly. The pkd1L-2 cells express pou4-2, thus the pou4-2+ cells are located in the same location. Based on co-expression data and co-expression with PKD genes, we are confident it is submuscular.

      B. The functional requirements of pou4-2 in the maintenance of mechanosensory neurons: 

      (1) To evaluate the functional role of pou4-2 in maintaining mechanosensory neurons, the authors performed whole-animal RNA-seq on pou4-2(RNAi) and control animals, identifying a significant downregulation of genes associated with mechanosensory neuron expression. However, the presentation of these findings is fragmented across Figures 3, 4, and 5. I recommend consolidating the RNA-seq results (Figure 3) and the subsequent validation of downregulated genes (Figures 4 and 5) into a single, cohesive figure. This would improve the logical flow and clarity of the manuscript. 

      As suggested by the reviewer, we have combined Figures 3 and 4 (new Figure 3), which we believe improves the flow. We decided to keep Figure 5 (new Figure 4) as a standalone because it focuses on the characterization of new genes revealed by RNAseq and scRNA-seq data mining that were not previously reported in Ross et al. 2018 and 2024.

      (2) In pou4-2(RNAi) animals, pkd1L-2 expression appears to be entirely lost, while hmcn-1-L shows faint expression in scattered peripheral regions. The authors suggest that an extended RNAi treatment might be necessary to fully eliminate hmcn-1-L expression. However, an alternative explanation is that pou4-2 is not essential for maintaining all hmcn-1-L cells, particularly if pou4-2 expression does not fully overlap with that of hmcn-1-L. This possibility should be acknowledged and discussed. 

      We agree and have acknowledged this point in the revised text.

      (3) On page 9, the section title claims that "Smed-pou4-2 regulates genes involved in ciliated cell structure organization, cell adhesion, and nervous system development." While some differentially expressed genes are indeed annotated with these functions based on homology, the manuscript does not provide experimental evidence supporting their roles in these biological processes in planarians. The title should be revised to avoid overstatement, and the limitations of extrapolating a function solely from gene annotation should be acknowledged. 

      Excellent point. We have edited the text to indicate that the genes were annotated or implicated.

      (4) The cilia staining presented in Figure 6B to support the claim that pou4-2 is required for ciliated cell structure organization is unconvincing. Improved imaging and more targeted analysis (e.g., co-labeling with mechanosensory markers) are needed to support this conclusion. 

      We have addressed this concern by adjusting the language to be more precise and indicate that the stereotypical banded pattern is disrupted with decreased cilia labeling along the dorsal ciliated stripe. Indeed, our conclusion overstated the observations made with the staining and imaging resolution. Thank you.

      C. The functional requirements of pou4-2 in the regeneration of mechanosensory neurons: 

      To evaluate the role of pou4-2 in the regeneration of mechanosensory neurons, the authors performed amputations on pou4-2(RNAi) and control(RNAi) animals and assessed the expression of mechanosensory markers (pkd1L-2, hmcn-1-L) alongside a functional assay. However, the results shown in Figure 4B indicate the presence of numerous pkd1L-2 and hmcn-1-L cells in the blastema of pou4-2(RNAi) animals. This observation raises the possibility that pou4-2 may not be essential for the regeneration of these mechanosensory neurons. The authors should address this alternative interpretation. 

      Our interpretation is that there were very few cells expressing the markers compared to controls. The pattern was predominantly lost, which is consistent with other experiments shown in the paper. However, we have added the additional caveat suggested by the reviewer.

      Minor points: 

      (1) On p.8, the authors wrote "every 12 hours post-irradiation". However, this is not consistent with the figure, which only shows 0, 3, 4, 4.5, 5, and 5.5 dpi. 

      We corrected this. Thank you for catching the mistake!

      (2) On p.12, the authors wrote "Analysis of pou4-2 RNAi data revealed differentially expressed genes with known roles in mechanosensory functions, such as loxhd-1, cdh23, and myo7a. Mutations in these genes can cause a loss of mechanosensation/transduction". This is misleading because, to my knowledge, the role of these genes in planarians is unknown. If the authors meant other model systems, they should clearly state this in the text and include proper references. 

      The reviewer is correct that we are referencing findings from other organisms. We have clarified this point in the revised text. The appropriate references were included and cited in the first version.

      (3) On p.7, the authors wrote, "conversely, the expression of atonal genes was unaffected in pou4-2 RNAi-treated regenerates (Supplementary Figure S2B)". However, it is unclear whether the Atoh8-1 and Atoh8-2 signals are real, as the quality of the in situ results is too low to distinguish between real signals and background noise/non-specific staining. 

      This valid concern was addressed in our response to Reviewer 1. We have adjusted the figure and the text accordingly.

      (4) On p.6 the authors wrote "pinpointed time points wherein the pou4-2 transcripts were robustly downregulated". However, the current version of the manuscript does not provide data explaining why Pou4-2 transcripts are robustly downregulated on day 12. 

      Yes, we determined the appropriate time points using qPCR for all sample extractions. As an example, see the figure for qPCR validation at day 12 showing that pou4-2 and pkd1L-2 are down.

      Author response image 1.

      In this graph, samples labeled “G” represent four biological controls of gfp(RNAi) control animals, and samples labeled “P” represent four biological controls of pou4-2(RNAi) animals at day 12 in the RNAi protocol.

      (5) On p.13, the authors wrote "collecting RNA from how animals." Is this a typo? 

      Thanks for catching the typo. It should read “whole” animals. We have corrected this.

      (6) On p.14, the authors wrote "but the expression patterns of planarian atonal genes indicated that they represent completely different cell populations from pou4-2-regulated mechanosensory neurons". However, this is unclear from the images, as the in situ staining of Atoh8-1 and Atoh8-2 are potentially failed stainings. 

      We agree. We have edited accordingly.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Reviewer #1 (Public review):

      Summary:

      This paper is a relevant overview of the currently published literature on low-intensity focused ultrasound stimulation (TUS) in humans, with a meta-analysis of this literature that explores which stimulation parameters might predict the directionality of the physiological stimulation effects.

      The pool of papers to draw from is small, which is not surprising given the nascent technology. It seems nevertheless relevant to summarize the current field in the way done here, not least to mitigate and prevent some of the mistakes that other non-invasive brain stimulation techniques have suffered from, most notably the theory- and data-free permutation of the parameter space.

      The meta-analysis concludes that there are, at best, weak trends toward specific parameters predicting the direction of the stimulation effects. The data have been incorporated into an open database that will ideally continue to be populated by the community and thereby become a helpful resource as the field moves forward.

      Strengths:

      The current state of human TUS is concisely and well summarized. The methods of the meta-analysis are appropriate. The database is a valuable resource.

      We thank the reviewer for their positive assessment of the revised manuscript and the potential importance of the resource to the TUS community. 

      Suggestions:

      The paper remains lengthy and somewhat unfocused, to the detriment of readability. One can understand that the authors wish to include as much information as possible, but this reviewer is sceptical that this will aid the use of the databank, or help broaden the readership. For one, there is a good chunk of repetition throughout. The intro is also somewhat oscillating between TMS, tDCS and TUS. While the former two help contextualizing the issue, it doesn't seem necessary. In the section on clinical applications of TUS and possible outcomes of TUS, there's an imbalance of the content across examples. That's in part because of the difference in knowledge base but some sections could probably be shortened, eg stroke. In any case, the authors may want to consider whether it is worth making some additional effort in pruning the paper.

      We thank the reviewer for these suggestions. We have checked for redundancy and made the clinical review section more balanced, although some sections cover more TUS studies than others, so some imbalance is unavoidable. For example, we have condensed the “Stroke and neuroprotection in brain injury” section (lines 624-647). This helps to improve the clarity and readability of the manuscript.

      The terms or concept of enhancement and suppression warrant a clearer definition and usage. In most cases, the authors refer to E/S of neural activity. Perhaps using terms such as "neural enhancement" etc helps distinguish these from eg behavioural or clinical effects. Crucially, how one maps onto the other is not clear. But in any case, a clear statement that the changes outlined on lines 277ff do not

      We thank the reviewer for this point and agree that it is important to distinguish neural E/S, as we had intended, from behavioral effects. In the first instance and in several places we add ‘neural’ before enhancement/suppression. Also see Lines 276-279: Probable net neural enhancement versus suppression was characterised as follows. Note that our use of the terms enhancement and suppression refers exclusively to the increase or decrease of neural activity, respectively, as measured by neurophysiological methods (EEG-ERPs, BOLD fMRI, etc.) and does not imply equivalent changes in behavioural responses.

      Please see also lines 108-116.

      Re tb-TUS (lines 382ff), it is worth acknowledging here that independent replication is very limited (eg Bao et al 2024; Fong et al bioRxiv 2024) and seems to indicate rather different effects

      We have updated this section by referencing Bao et al. and Fong et al. as examples of the limited independent replication of tbTUS results. Please see lines 392-396. “However, independent replication of these findings remains limited. For example, Bao found reduced motor cortex excitability – measured as decreased TMS-MEP amplitude in M1 -- that lasted up to 30 minutes post-sonication (Bao et al., 2024). Whereas Fong reported no significant effects between tbTUS and sham conditions in M1 excitability (Fong et al., 2024).”

      The comparison with TPS is troublesome. For one, that original study was incredibly poorly controlled and designed. Cherry-picking individual (badly conducted) proof-of-principle studies doesn't seem a great way to go about as one can find a match for any desired use or outcome. Moreover, other than the concept of "pulsed" stimulation, it is not clear why that original study would motivate the use of TUS in the way the authors propose; both types of stimulation act in very different ways (if TPS "acts" at all). But surely the cited TPS study does not "demonstrate the capability for TUS for pre-operative cognitive mapping". As an aside, why the authors feel the need to state the "potential for TPS... to enhance cognitive function" is unclear, but it is certainly a non-sequitur. This reviewer feels quite strongly that simplistic analogies such as the one here are unnecessary and misleading, and don't reflect the thoughtful discussion of the rest of the paper. In the other clinical examples, the authors build their suggestions on other TUS studies, which seems more sensible.

      This is an excellent point, and we have removed that statement replacing it with: “However, TPS effects studies remain highly limited and would require further study and comparison to effects with other TUS protocols.”. Please see lines 561-562. We thank the reviewer for the supportive comments on the rest of the review.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      The manuscript "Lifestyles shape genome size and gene content in fungal pathogens" by Fijarczyk et al. presents a comprehensive analysis of a large dataset of fungal genomes to investigate what genomic features correlate with pathogenicity and insect associations. The authors focus on a single class of fungi, due to the diversity of lifestyles and availability of genomes. They analyze a set of 12 genomic features for correlations with either pathogenicity or insect association and find that, contrary to previous assertions, repeat content does not associate with pathogenicity. They discover that the number of protein-coding genes, including the total size of non-repetitive DNA does correlate with pathogenicity. However, unique features are associated with insect associations. This work represents an important contribution to the attempts to understand what features of genomic architecture impact the evolution of pathogenicity in fungi.

      Strengths:

      The statistical methods appear to be properly employed and analyses thoroughly conducted. The manuscript is well written and the information, while dense, is generally presented in a clear manner.

      Weaknesses:

      My main concerns all involve the genomic data, how they were annotated, and the biases this could impart to the downstream analyses. The three main features I'm concerned with are sequencing technology, gene annotation, and repeat annotation.

      We thank the reviewer for all the comments. We are aware that the genome assemblies are of heterogeneous quality since they come from many sources. The goal of this study was to make the best use of the existing assemblies, with the assumption that noise introduced by the heterogeneity of sequencing methods should be overcome by the robustness of evolutionary trends and the breadth and number of analyzed assemblies. Therefore, at worst, we would expect a decrease in the power to detect existing trends. It is important to note that the only way to confidently remove all potential biases would be to sequence and analyze all species in the same way; this would require a complete study and is beyond the scope of the work presented here. Nevertheless, some biases could affect the results in a negative way, e.g., if they affect fungal lifestyles differently. We therefore made an attempt to explore the impact of sequencing technology and of the gene and repeat annotation approach among genomes of different fungal lifestyles. Details are described in Supplementary Results and below. Overall, even though the assembly size and annotations conducted with Augustus can sometimes vary compared to annotations from other resources, such as JGI Mycocosm, we do not observe a bias associated with fungal lifestyles. Comparison of annotations conducted with Augustus and the JGI Mycocosm dataset revealed variation in gene-related features that reflects biological differences rather than issues with annotation.

      The collection of genomes is diverse and includes assemblies generated from multiple sequencing technologies including both short- and long-read technologies. Not only has the impact of the sequencing method not been evaluated, but the technology is not even listed in Table S1. From the number of scaffolds it is clear that the quality of the assemblies varies dramatically. This is going to impact many of the values important for this study, including genome size, repeat content, and gene number.

      We have now added sequencing technology to Table S1 as reported in NCBI. We evaluated the impact of long-read (Nanopore, PacBio, Sanger) vs. short-read assemblies in Supplementary Results. In short, the proportions of different lifestyles (pathogenic vs. non-pathogenic, IA vs. non-IA) were the same for short- and long-read assemblies. Indeed, long-read assemblies were longer, had a higher fraction of repeats and fewer genes on average, but the differences between pathogenic and non-pathogenic (or IA and non-IA) species were in the same direction for the two sequencing technologies and in line with our results. There were some discrepancies, e.g. mean intron length was longer for pathogens with long-read assemblies but slightly shorter on average for short-read assemblies (and, to a lesser extent, GC and pseudo tRNA count), which could explain weaker or mixed results in our study for these features.

      Additionally, since some filtering was employed for small contigs, this could also bias the results.

      We set a lower contig length threshold because assemblies submitted to NCBI have varying lower-length thresholds. This is because assemblers do not output contigs below a certain length, and this threshold can be manipulated by the user. Setting a common minimum contig length was meant to remove this variation, knowing that any length cut-off will have a larger effect on short-read-based assemblies than on long-read-based assemblies. Notably, genome assemblies of corresponding species in JGI Mycocosm have a minimum contig length of 865 bp, not much lower than in our dataset. Importantly, in response to a comment from a previous reviewer, repeat content was recalculated on raw assembly lengths instead of on filtered assembly lengths.
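      The effect of a common minimum contig length on assembly length and repeat content can be illustrated with a short sketch. The contig lengths, the repeat total and the 1,000 bp threshold below are hypothetical values chosen for illustration only (the response does not state the threshold used), and the function names are not taken from the authors' pipeline.

```python
def filter_contigs(contig_lengths, min_length):
    """Keep only contigs at least `min_length` bp long."""
    return [n for n in contig_lengths if n >= min_length]


def repeat_fraction(repeat_bp, assembly_bp):
    """Fraction of the assembly annotated as repeats."""
    return repeat_bp / assembly_bp


# Hypothetical toy assembly (bp per contig) and repeat total; not real data.
contigs = [250_000, 40_000, 1_200, 900, 600, 300]
repeat_bp = 30_000
MIN_LEN = 1_000  # assumed threshold, for illustration only

raw_length = sum(contigs)
filtered_length = sum(filter_contigs(contigs, MIN_LEN))

# Repeat content relative to the raw versus the filtered assembly length.
print(f"raw length:      {raw_length} bp")
print(f"filtered length: {filtered_length} bp")
print(f"repeat fraction (raw):      {repeat_fraction(repeat_bp, raw_length):.4f}")
print(f"repeat fraction (filtered): {repeat_fraction(repeat_bp, filtered_length):.4f}")
```

      In a fragmented short-read assembly, where many contigs fall under the cut-off, the gap between the two denominators would be larger, which is why the choice of denominator matters for repeat content.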

      I have considerable worries that the gene annotation methods could impart biases that significantly affect the main conclusions. Only 5 reference training sets were used for the Sordariomycetes and these are unequally distributed across the phylogeny. Augustus obviously performed less than ideally, as the authors reported that it under-annotated the genomes by 10%. I suspect it will have performed worse with increasing phylogenetic distance from the reference genomes. None of the species used for training were insect-associated, except for those generated by the authors for this study. As this feature was used to split the data it could impact the results. Some major results rely explicitly on having good gene annotations, like exon length, adding to these concerns. Looking manually at Table S1 at Ophiostoma, it does seem to be a general trend that the genomes annotated with Magnaporthe grisea have shorter exons than those annotated with H294. I also wonder if many of the trends evident in Figure 5 are also the result of these biases. Clades H1 and G each contain a species used in the training and have an increase in genes for example.

      We applied 6 different reference training sets (instead of one) precisely to address the problem of increasing phylogenetic distance of annotated species. To further investigate the impact of the species chosen for training, we plotted five gene features (number of genes, number of introns, intron length, exon length, fraction of genes with introns) as a function of branch length distance from the species (or genus) used as a training set for annotation. We don’t see systematic biases across different training sets. However, trends are very clear for clades annotated with the fusarium training set. This set of species includes Hypocreales and Microascales, which is indeed unfortunate since Microascales is an IA group and at the same time the most distant from the Fusarium genus in this set. To clarify whether this trend reflects annotation bias or a biological trend, we compared our gene annotations with those of Mycocosm for Hypocreales Fusarium species, Hypocreales non-Fusarium species, and Microascales, and we observe exactly the same trends in all gene features.

      Similarly, among species that were annotated with magnaporthe_grisea, Ophiostomatales (another IA group) are among the most distant from the training set species. Here, however, another order, Diaporthales, is similarly distant, yet the two orders display different feature ranges. In terms of exon length, top 2 species in this training set include Ophiostoma, and they reach similar exon length as the Ophiostoma species annotated using H294 as a training set. In summary, it is possible that the choice of annotation species has some effect on feature values; however, in this dataset, these biases are likely mitigated by biological differences among lifestyles and clades. 
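      One way to put a number on the relationship between a gene feature and distance from the training species is a rank correlation computed within each training set. The sketch below is an illustrative assumption rather than the analysis described in the response, which reports plots. It implements Spearman's rank correlation in plain Python, and the distances and exon lengths are hypothetical toy values.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks, assigning tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical values for species annotated with one training set; not real data.
branch_length_distance = [0.05, 0.12, 0.30, 0.41, 0.55]
mean_exon_length = [480, 510, 470, 530, 500]

rho = spearman(branch_length_distance, mean_exon_length)
print(f"Spearman rho = {rho:.2f}")
```

      A rho near zero within a training set would be consistent with no systematic distance effect, whereas a strong correlation would need to be weighed against clade-level biology, as done above for the Fusarium-annotated clades.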

      Unfortunately, the genomes available from NCBI will vary greatly in the quality of their repeat masking. While some will have been masked using custom libraries generated with software like Repeatmodeler, others will probably have been masked with public databases like repbase. As public databases are again biased towards certain species (Fusarium is well represented in repbase for example), this could have significant impacts on estimating repeat content. Additionally, even custom libraries can be problematic as some software (like RepeatModeler) will include multicopy host genes leading to bona fide genes being masked if proper filtering is not employed. A more consistent repeat masking pipeline would add to the robustness of the conclusions.

      We have searched for the same species in JGI Mycocosm and were able to retrieve 58 genome assemblies with matching species, with 19 of them belonging to the same strain as in our dataset. Overall we found no differences in genome assembly length. Interestingly, repeat content was slightly higher for NCBI genome assemblies compared to JGI Mycocosm assemblies, perhaps due to masking of host multicopy genes, as the reviewer mentioned. By comparing pathogenic and non-pathogenic species for the same 19 strains, we observe that JGI Mycocosm annotates fewer repeats in pathogenic species than Augustus annotations (but trends are similar when taking into account 58 matching species). Given a small number of samples, it is hard to draw any strong conclusions; however, the differences that we see are in favor of our general results showing no (or negative) correlation of repeat content with pathogenicity. 

      To a lesser degree, I wonder what impact the use of representative genomes for a species has on the analyses. Some species vary greatly in genome size, repeat content, and architecture among strains. I understand that it is difficult to address in this type of analysis, but it could be discussed.

      In our case, the use of protein sequences could underestimate divergence between closely related strains of the same species. We also excluded additional strains of the same species to avoid overrepresentation of closely related strains with similar lifestyle traits. We agree that some changes in genome architecture can occur very rapidly, even within a species, though analyzing the emergence of, e.g., pathogenicity at the population level would require a slightly different approach that accounts for population-level processes.

      Reviewer #2 (Public review):

      Summary:

      In this paper, the authors report on the genomic correlates of the transition to the pathogenic lifestyle in Sordariomycetes. The pathogenic lifestyle was found to be better explained by the number of genes, and in particular effectors and tRNAs, but this was modulated by the type of interacting host (insect or not insect) and the ability to be vectored by insects.

      Strengths:

      The main strength of this study lies in the size of the dataset, and the potentially high number of lifestyle transitions in Sordariomycetes.

      Weaknesses:

      The main strength of the study is not the clarity of the conclusions.

      (1) This is due firstly to the presentation of the hypotheses. The introduction is poorly structured and contradictory in some places. It is also incomplete since, for example, fungus-insect associations are not mentioned in the introduction even though they are explicitly considered in the analyses.

      We thank the reviewer for pointing this out. We have tried to address all of the reviewer's comments and suggestions to clarify the message and remove the contradictions. We also added information about why we included the insect-association trait in our analysis.

      (2) The lack of clarity also stems from certain biases that are challenging to control in microbial comparative genomics. Indeed, defining lifestyles is complicated because many fungi exhibit different lifestyles throughout their life cycles (for instance, symbiotic phases interspersed with saprotrophic phases). In numerous fungi, the lifestyle referenced in the literature is merely the sampling substrate (such as wood or dung), which doesn't mean that this substrate is a crucial aspect of the life cycle. This issue is discussed by the authors, but they do not eliminate the underlying uncertainties.

      We agree with the reviewer that lack of certainty in the lifestyle or range of possible lifestyles of studied species is a weakness in this analysis. We are limited by the information available in the literature. We hope that our study will increase interest in collecting such data in the future.

      Reviewer #3 (Public review):

      Summary:

      This important study combines comparative genomics with other validation methods to identify the factors that mediate genome size evolution in Sordariomycetes fungi and their relationship with lifestyle. The study provides insights into genome architecture traits in this Ascomycete group, finding that, rather than transposons, the size of their genomes is often influenced by gene gain and loss. With an excellent dataset and robust statistical support, this work contributes valuable insights into genome size evolution in Sordariomycetes, a topic of interest to both the biological and bioinformatics communities.

      Strengths:

      This study is complete and well-structured.

      Bioinformatics analysis is always backed by good sampling and statistical methods. Also, the graphic part is intuitive and complementary to the text.

      Weaknesses:

      The work is great in general, I just had issues with the Figure 1B interpretation.

      I struggled a bit to find the correspondence between this sentence: "Most genomic features were correlated with genome size and with each other, with the strongest positive correlation observed between the size of the assembly excluding repeats and the number of genes (Figure 1B)." and the Figure 1B. Perhaps highlighting the key p values in the figure could help.

      We thank the reviewer for pointing out this sentence. Perhaps the misunderstanding comes from the fact that in this sentence one variable is missing. The correct version should be “Most genomic features were correlated with genome size and with each other, with the strongest positive correlation observed between the genome size, the genome size excluding repeats and the number of genes (Figure 1B)”. Also, the variable names now correspond better to those shown on the figure.

      Reviewer #1 (Recommendations for the authors):

      The authors have clearly done a lot of good work, and I think this study is worthwhile. I understand that my concerns about the underlying data could necessitate rerunning the entire analysis with better gene models, but there may be another option. JGI has a fairly standard pipeline for gene and repeat annotation. Their gene predictions are based on RNA data from the sequenced strain and should be quite good in general. One could either compare the annotations from this manuscript to those in mycocosm for genomes that are identical and see if there are systematic biases, or rerun some analyses on a subset of genomes from mycocosm. Indeed, it's possible that the large dataset used here compensates for the above concerns, but without some attempt to evaluate these issues, it's difficult to have confidence in the results.

      We very much appreciate the positive reception of our manuscript. Following the reviewer’s comments, we compared our gene annotations with those of JGI Mycocosm, even though only 58 species matched and only 19 of them were from the same strain. This dataset is not representative of Sordariomycetes diversity (most species come from one clade) and therefore will not reflect the results we obtained in this study. Of note, the reason for not choosing JGI Mycocosm in the first place was the poor representation of insect-associated species, which we found key in this study. In general, we found that assembly lengths were nearly identical, the number of genes was higher, and the repeat content was lower for the JGI Mycocosm dataset. When comparing different lifestyles (in particular pathogens vs. non-pathogens), we found the same differences for our and JGI Mycocosm annotations, with one exception being the repeat content. In the small subset (19 same-strain assemblies), our dataset showed the same level of repeats between the two lifestyles, whereas JGI Mycocosm showed lower repeat content for pathogens (but notably, for all 58 species, the trend was the same for our and JGI Mycocosm annotations). None of these observations conflict with our results, where we find no or negative association of repeat content with pathogens.

      The figures are very information-dense. While I accept that this is somewhat of a necessity for presenting this type of study, if the authors could summarize the important information in easier-to-interpret plots, that could help improve readability.

      We put a lot of effort into showing these complicated results in as approachable a manner as possible. Given that other reviewers find them intuitive, we decided to keep most of them as they are. To add more clarification, we added one supplementary figure showing distributions of genomic traits across lifestyles. Moreover, in Figure 5, a phylogenetic tree was added showing the position of selected clades, as well as a scatterplot showing distributions of mean values for genome size and number of genes for those clades. If the reviewer has any specific suggestions on what to improve and in which figure, we’re happy to consider them.

      Reviewer #2 (Recommendations for the authors):

      I have no major comments on the analyses, which have already been extensively revised. My major criticism is the presentation of the background, which is very insufficient to understand the importance or relevance of the results presented fully.

      Lines are not numbered, unfortunately, which will not help the reading of my review.

      (1) The introduction could better present the background and hypotheses:

      (a) After reading the introduction, I still didn't have a clear understanding of the specific 'genome features' the study focuses on. The introduction fails to clearly outline the current knowledge about the genetic basis of the pathogenic lifestyle: What is known, what remains unknown, what constitutes a correlation, and what has been demonstrated? This lack of clarity makes reading difficult.

      We thank the reviewer for pointing this out. We have now included in the introduction a list of genomic traits we focus on. We also tried to be more precise about demonstrated pathogenic traits and other correlated traits in the introduction. 

      (b) Page 3. « Various features of the genome have been implicated in the evolution of the pathogenic lifestyle. » The cited studies did not genuinely link genome features to lifestyle, so the authors can't use « implicated in » - correlation does not imply causation.

      This sentence also somehow contradicts the one at the end of the paragraph: « we still have limited knowledge of which genomic features are specific to pathogenic lifestyle ».

      We thank the reviewer for this comment. We added a phrase “correlated with or implicated in” and changed the last sentence of the paragraph into “Yet we still have limited knowledge of how important and frequent different genomic processes are in the evolution of pathogenicity across phylogenetically distinct groups of fungi and whether we can use genomic signatures left by some of these processes as predictors of pathogenic state.”.

      (c) Page 3: « Fungal pathogen genomes, and in particular fungal plant pathogen genomes have been often linked to large sizes with expansions of TEs, and a unique presence of a compartmentalized genome with fast and slow evolving regions or chromosomes » Do the authors really need to say « often »? Do they really know how often?

      We removed “often”.

      (d) « Such accessory genomic compartments were shown to facilitate the fast evolution of effectors (Dong, Raffaele, and Kamoun 2015) ». The cited paper doesn't « show » that genomic compartments facilitate the fast evolution of effectors. It's just an observation that there might be a correlation. It's an opinion piece, not a research manuscript.

      We changed the sentence to “Such accessory genomic compartments could facilitate the fast evolution of effectors”.

      (e) « even though such architecture can facilitate pathogen evolution, it is currently recognized more as a side effect of a species evolutionary history rather than a pathogenicity related trait ». This sentence somehow contradicts the following one: « Such accessory genomic compartments were shown to facilitate the fast evolution of effectors ».

      Here we wanted to point out that even though accessory genome compartments and TE expansions can facilitate pathogen evolution, the origin of such architecture is not linked to pathogenicity. We reformulated the sentence to “Even though such architecture can facilitate pathogen evolution, it is currently recognized that its origin is more likely a side effect of a species evolutionary history rather than being caused by pathogenicity”.

      (f) « As the number of genes is strongly correlated with fungal genome size (Stajich 2017), such expansions could be a major contributor to fungal genome size. » This sentence suggests that pathogens might have bigger genomes because they have more effectors. This is contradictory to the sentence right after « At the end of the spectrum are the endoparasites Microsporidia, which have among the smallest known fungal genomes ».

      The authors state that pathogens have bigger genomes and then they take an example of a pathogen that has a minimal genome. I know it's probably because they lost genes following the transition to endoparasitism and not related to their capacity to cause disease. I just want to point out that their writing could be more precise. I invite authors to think of young scholars who are new to the field of fungal evolutionary genomics.

      We thank the reviewer for prompting us to clarify the text. We rewrote this short extract as follows “Notably, not all pathogenic species experience genome or gene expansions, or show compartmentalized genome architecture. While gene family expansions are important for some pathogens, the contrary can be observed in others, such as Microsporidia. Due to transition to obligatory intracellular lifestyle these fungi show signatures of strong genome contractions and reduced gene repertoire (Katinka et al. 2001) without compromising their ability to induce disease in the host. This raises questions about universal genomic mechanisms of transition to pathogenic state.”

      (g) I find it strange that the authors do not cite - and do not present the major results of two other studies that use the same type of approach and ask the same type of question in Sordariomycetes, although not focusing on pathogenicity:

      Hensen et al.: https://pubmed.ncbi.nlm.nih.gov/37820761/

      Shen et al.: https://pubmed.ncbi.nlm.nih.gov/33148650/

      We thank the reviewer for pointing out this omission. We now added more information in the introduction to highlight the importance of the phylogenetic context in studying genome evolution as demonstrated by these studies. The following part was added to the introduction: “Other phylogenomic studies investigating a wide range of Ascomycete species, while not explicitly focusing on the neutral evolution hypothesis, have found strong phylogenetic signals in genome evolution, reflected in distinct genome characteristics (e.g., genome size, gene number, intron number, repeat content) across lineages or families (Shen et al. 2020; Hensen et al. 2023). Variation in genome size has been shown to correlate with the activity of the repeat-induced point mutation (RIP) mechanism (Hensen et al. 2023; Badet and Croll 2025), by which repeated DNA is targeted and mutated. RIP can potentially lead to a slower rate of emergence of new genes via duplication (Galagan et al. 2003), and hinder TE proliferation limiting genome size expansion (Badet and Croll 2025). Variation in genome dynamics across lineages has also been suggested to result from environmental context and lifestyle strategies (Shen et al. 2020), with Saccharomycotina yeast fungi showing reductive genome evolution and Pezizomycotina filamentous fungi exhibiting frequent gene family expansions. Given the strong impact of phylogenetic membership, demographic history (Ne) and host-specific adaptations of pathogens on their genomes, we reasoned that further examination of genomic sequences in groups of species with various lifestyles can generate predictions regarding the architecture of pathogenic genomes.”

      (h) Genome defense mechanisms against repeated elements, such as RIP, are not mentioned while they could have a major impact on genome size (Hensen et al cited above; Badet and Croll https://www.biorxiv.org/content/10.1101/2025.01.10.632494v1.full).

      This citation is added in the text above.

      (i) Should the reader assume that the genome features to be examined are those mentioned in the first paragraph or those in the penultimate one?

      In the last paragraph of the introduction we included the complete list of investigated genomic traits.

      (j) The insect-associated lifestyle is mentioned only in the research questions on page 4, but not earlier in the introduction. Why should we care about insect-associated fungi?

      We apologize for this omission. We added a sentence explaining how neutral evolution hypotheses can explain patterns of genome evolution in endoparasites and species with specialized vectors (traits present in insect-associated species) and added a sentence in the last paragraph that this is the reason why we have selected this trait for analysis.  

      (2) Why use concatenation to infer phylogeny?

      (a) Kapli et al. https://pubmed.ncbi.nlm.nih.gov/32424311/ « Analyses of both simulated and empirical data suggest that full likelihood methods are superior to the approximate coalescent methods and to concatenation »

      (b) It also seems that a homogeneous model was used, and not a partitioned model, while the latter are more powerful. Why?

      We thank the reviewer for the comment. When we reconstructed the phylogenetic tree, we were not aware of this publication and followed common practices from the literature, even though they are no longer regarded as optimal. In fact, in the first round of submission, we included a concatenation method, a multispecies coalescent method based on 1000 BUSCO sequences, and a concatenation method with different partitions for 250 BUSCO sequences. All three methods produced similar topologies. Since the results were concordant, we chose to omit these analyses from the manuscript to streamline the presentation and focus on the most important results.

      (3) Other comments:

      Is there a table listing lifestyles?

      Yes, lifestyles (pathogenicity and insect-association) are listed in Supplementary Table S1. 

      (4) Summary:

      (a) « seemingly similar pathogens »: meaning unclear; on what basis are they similar? why « seemingly »?

      We removed “seemingly” from the sentence.

      (b) Page 4: what's the difference between genome feature and genome trait?

      There is no difference. We apologize for the confusion. We changed “feature” to “trait” whenever it refers to the specific 13 genomic traits analyzed in this study.

      (c) Page 22: Braker, not Breaker

      Corrected.

      What do the authors mean when they write that genes were predicted with Augustus and Braker? Do they mean that the two sets of gene models were combined? Gene counts are based on Augustus (P24): why not Braker?

      We only meant here that gene annotation was performed using the Braker pipeline, which uses a particular version of Augustus. We corrected the sentence.

      (d) Figure 2B and 2C:

      'Undetermined sign' or 'Positive/Negative' would be better than « YES » or it's just impossible to understand the figure without reading the legend.

      We changed “YES” to “UNDETERMINED SIGN” as suggested by the reviewer.

    1. Author response:

      Reviewer #1 (Public review):

      Chaiyasitdhi et al. set out to investigate the detailed ultrastructure of the scolopidia in the locust Müller's organ, the geometry of the forces delivered to these scolopidia during natural stimulation, and the direction of forces that are most effective at eliciting transduction currents. To study the ultrastructure, they used the FIB-SEM technique, to study the geometry of natural stimulation, they used OCT vibrometry and high-speed light microscopy, and to study transduction currents, they used patch clamp physiology.

      Strengths:

      I believe that the ultrastructural description of the locust scolopidium is excellent and the first of its kind in any insect system. In particular, the finding of the bend in the dendritic cilium and the position of the ciliary dilation are interesting, and it would be interesting to see whether these are common features within the huge diversity of insect chordotonal organs.

      Thank you very much for your comments. We indeed plan to extend and continue our approach to exploit and understand diverse chordotonal organs in insects and crustaceans.

      I believe the use of OCT to measure organ movements is a significant strength of this paper; however, using ex vivo preparations undermines any conclusions drawn about the system's in vivo mechanics.

      Having re-read the manuscript, we realize that we failed to explicitly describe our ex vivo preparation of Müller’s organ, including key references that detail its largely retained physiological function. We have now revised this detail in the Methods section:

      “We used an excised locust ear preparation for all experiments, following a previously described dissection protocol [9]. In short, the tympanum, with Müller’s organ attached, was left intact, suspended between the cuticular rim. The cuticular rim of the tympanum was fixed into a hole in a preparation dish that allowed Müller’s organ to be submerged in extracellular saline, whilst the outside of the tympanum was dry and could be stimulated with airborne sound. This ex vivo preparation of Müller’s organ retained frequency tuning (Warren & Matheson, 2018), similar electrophysiological function to freshly dissected Müller’s organs (Hill, 1983a, 1983b; Michelsen, 1968: frequency discrimination in the locust ear by means of four groups of receptor cells), and amplitude coding (Warren & Matheson, 2018). Since Müller’s organ is backed by an air-filled trachea in vivo, the addition of saline solution in the ex vivo preparation decreased its displacements ~100 fold due to a dampening effect (Warren et al., 2020).”

      And in the last section of the introduction:

      “Here, we combined FIB-SEM to resolve the 3D ultrastructure of a scolopidium, OCT and high-speed microscopy to examine sound-evoked motion at both the organ and individual scolopidium levels, and direct mechanical stimulation of the scolopale cap, where the ciliary tip is anchored, whilst simultaneously recording transduction currents. Here, Müller’s organ and the tympanum were excised from the locust for physiological experiments. This ex vivo preparation of Müller’s organ retained frequency tuning, amplitude coding and electrophysiological function. This preparation also permitted the enzymatic isolation of individual scolopidia whilst recording transduction currents (Warren & Matheson, 2018).”

      To further clarify physiological differences between the in vivo and ex vivo operation of the tympanum and Müller’s organ, we will perform an additional experiment for the revised manuscript by quantifying the changes in the sound-evoked tonotopic travelling wave of the tympanum using Laser Doppler Vibrometry (LDV). This result will be added to the Supplementary Text.

      The choice of Group III scolopidia is also good. Research on the mechanics of locust tympana has shown that travelling waves are formed on the tympanum and waves of different frequencies show highest amplitudes at different positions on the tympanum, and therefore also on different groups of scolopidia within the Müller's organ (Windmill et al, 2005; 2008, and Malkin et al, 2013). The lowest frequency modal waves (F0) observed by Windmill et al 2008 were at about 4.4 kHz, which are slightly higher than the ~3 kHz frequencies studied in this paper but do show large deflections where these group III scolopidia attach at the styliform body (Windmill et al, 2005).

      Thank you very much. We accept that the frequencies studied in this manuscript were lower than the lowest modal wave observed by Windmill et al., 2008. Other authors, according to Jacobs et al. 1999, found broad tuning from 3.4-3.74 kHz (Michelson et al., 1971) and 2-3.5 kHz (Halex et al., 1988). We settled on tuning previously measured for Group-III neurons in the same kind of preparation as in this manuscript, which was broadly around 3 kHz (Warren & Matheson, 2018).

      This should be mentioned in the paper since the electrophysiology justification to use group III neurons is less convincing, given that Jacobs et al 1999 clearly point out that group III neurons are very variable and some of them are tuned much higher to 10 kHz, and others even higher to 20-30 kHz.

      Looking at Fig. 7 from Jacobs et al., 1999, we indeed see that the four Group-III neurons recorded in that study are broadly tuned to 3-4 kHz. These tuning curves often have additional threshold dips at higher frequencies, but with thresholds at least 20 dB higher. We settled on the most sensitive frequency that we previously measured, which also overlaps the most sensitive frequencies from several other studies.

      Weaknesses:

      Specifically, it is understandable that the authors decided to use excised ears for the light microscopy, where Müller's organ would not be accessible in situ. However, it is very likely that excision will change the system's mechanics, especially since any tension or support to Müller's organ will be ablated.

      We completely understand this criticism. We have now added descriptions in the methodology and introduction (as detailed previously). In short, the tympanum was left intact suspended on the cuticle. Müller’s organ retains all (measured) physiological properties: frequency tuning, amplitude coding and electrophysiological function. To further investigate whether this excised preparation is a representative of the in vivo conditions, we plan to measure tympanal mechanics, such as the travelling wave, as part of the revisions.

      OCT enables in vivo measurements in fully undissected systems (Mhatre et al, Biorxiv, 2021) or in systems with minimal dissection where the mechanics have not been compromised (Vavakou et al, 2021). The choice to entirely dissect out the membrane is difficult to understand here.

      The pioneering OCT work of Mhatre et al, Biorxiv, 2021 and Vavakou et al, 2021 set the new standard for in vivo measurements in the field. We fully agree with Reviewer #1 that OCT is best performed on Müller’s organ in vivo, and we attempted in vivo OCT imaging of Müller’s organ for several months. Although the OCT beam penetrates the tympanum, it does not penetrate the tracheal air sac that surrounds Müller’s organ, and therefore OCT cannot be used in vivo. Please also see our previous comment regarding the intact physiological operation of Müller’s organ in the ex vivo preparation.

      My main concern with this paper, however, is the use of light microscopy very close to the Nyquist limit to study scolopidial motion, and the fact that the OCT data contradict and do not match the light microscopy data. The light microscopy data is collected at ~8 kHz, and hence the Nyquist limit is ~4 kHz. It is possible to measure frequencies reliably this close to the limit, but the amplitude of motion is quite likely to be underestimated, given that the technique only provides 2 sample points per cycle at 4 kHz and approximately 2.66 sample points at 3 kHz. At that temporal resolution, the samples are much more likely to miss the peak of the wave than not, and therefore, amplitudes will be mis-estimated. A much more reasonable sample rate for amplitude estimation is generally about 10 samples per cycle. I do not believe the data from the microscopy is reliable for what the authors wish to use them for.

      We understand your concern that the study of sound-evoked motion of the scolopidium using light microscopy was done near the Nyquist limit (with our average sampling rate at 8.6 ± 0.3 kHz and the Nyquist limit at 4.3 kHz). We also agree that the amplitude of the motion could be underestimated at frequencies closer to the limit. However, this systematic error does not change the key result of our direct light microscopy measurements: that axial stretch of the scolopidium occurs around 3 kHz.

      To address this concern, we plan to study the scolopidial motion within Group 1 auditory neurons, which are tuned to lower frequencies (0.5-1.5 kHz). This new set of data will allow us to obtain more data points per cycle (up to ~8.6 data points at 1 kHz). We will consider adding this result into the revised Fig. 4 or its extended data.
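
      The sampling arithmetic in this exchange can be sketched as follows. This is a minimal illustration and not part of the authors' analysis: the function names are hypothetical, and the amplitude bound assumes an ideal sinusoid whose peak is read off as the largest sample within a single cycle, where the nearest sample can lie up to half a sample interval away from the true peak.

```python
import math


def samples_per_cycle(sampling_rate_hz, stimulus_hz):
    """Number of samples acquired per stimulus cycle."""
    return sampling_rate_hz / stimulus_hz


def worst_case_peak_fraction(n_samples_per_cycle):
    """Smallest fraction of the true amplitude that the largest sample
    in one cycle can report (illustrative assumption: ideal sinusoid,
    peak estimated from raw samples rather than from a fit)."""
    if n_samples_per_cycle < 2:
        raise ValueError("fewer than 2 samples per cycle: above the Nyquist limit")
    # The sample nearest the peak is at most pi / N radians away in phase.
    return max(0.0, math.cos(math.pi / n_samples_per_cycle))


if __name__ == "__main__":
    # (sampling rate, stimulus frequency) pairs taken from the text above.
    for fs, f in [(8000.0, 4000.0), (8600.0, 3000.0), (8600.0, 1000.0)]:
        n = samples_per_cycle(fs, f)
        frac = worst_case_peak_fraction(n)
        print(f"fs={fs:.0f} Hz, f={f:.0f} Hz: {n:.2f} samples/cycle, "
              f"worst-case peak fraction {frac:.2f}")
```

      This bound is pessimistic for long recordings: when the number of samples per cycle is not an integer, successive cycles are sampled at different phases, and fitting a sinusoid of known frequency uses all samples rather than only the largest one.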

      Regarding the sampling rate, we did try to achieve a higher rate (> 10 kHz). However, our camera imposes a technical limit, and there is a trade-off with other key parameters, such as the size of the region of interest (ROI) and the magnification. To increase the sampling rate, we would have to reduce the magnification or the ROI; we would then either lose the spatial resolution required to quantify the scolopidial motion or the ROI would no longer cover the whole scolopidial motion. A sampling rate of 8.6 ± 0.3 kHz was the best we could achieve.

      Using the light microscopy data, the authors claim that the strains experienced by the group III scolopidia at 3 kHz are greater along the AP axis than the ML axis (Figure 4). However, this is contradicted by the OCT data, which show very low strain along the AP axis (black traces) at and around 3 kHz (Figure 3c and extended data Figure 2f) and show some movement along the ML axis (red traces, same figures). The phase at low amplitudes of motion cannot be considered very reliable either, and hence phase variations at these frequencies in the OCT cannot be considered reliable indicators of AP motion; hence, I'm unclear whether the vector difference in the OCT is a reliable indicator of movement.

      This is our fault for not clearly explaining the orientation of the light microscopy measurement, which led to the reviewer’s concern about a contradiction between the OCT and light microscopy data. Our OCT measurements were made along the Antero-Posterior (AP) and Mesio-Lateral (ML) axes, while the axial stretch of the scolopidium occurs along the Dorso-Ventral (DV) axis. We recognise that the anatomical references in this manuscript can be confusing, and we tried to show the orientation of the scolopidium relative to Müller’s organ in Fig. 3b. To further clarify the orientation of our observations, we will add anatomical references to Fig. 4a and Fig. 5a in the revised manuscript.

      As stated in our Results section (Lines 165-167):

      “Notably, we could not resolve the Group-III scolopidia along the ventro-dorsal axis—which runs parallel to the dendrite—as the OCT beam was obstructed by either the cuticle or the elevated process”

      We did try to perform OCT measurement along the VD axis, but we could not resolve the scolopidial region along the scolopidial or ciliary axes because the OCT beam could not go through the thick cuticle at the edge of the tympanic membrane and the elevated process. For this reason, it is impossible for us to find an agreement or rule out any contradiction between the OCT and light microscopy since they are measuring motion along different axes. We plan to address this accessibility issue in a separate work using OCT measurements in combination with mirrors.

      The OCT data are significantly more reliable as they are acquired at an appropriate sampling rate of 90 kHz. The authors do not mention what microphone they use to monitor or calibrate their sound field and phase measurements in OCT, but I presume this was done since it is the norm.

      We use a condenser microphone (MK301, Microtech) and a measuring amplifier (type 2610, Brüel & Kjær) for calibration. The calibration microphone was itself calibrated beforehand using a type 4231 sound calibrator from Brüel & Kjær.

      Thus, the OCT data show that the movement within the Müller's organ is complex, probably traces an ellipse at some frequencies as observed in bushcrickets (Vavakou et al, 2021) and also thought to be the case in tree crickets based on the known attachment points of the tympanal organ (Mhatre et al, 2021). The OCT data shows relatively low AP motion at frequencies near 3 kHz, and higher ML motion, which contradicts the less reliable light microscopy data. Given that the locust membrane shows peaks in motion at ~4.5 kHz, ~11 kHz, and also at ~20 kHz (Windmill et al, 2008), I am surprised that the authors limited their OCT experiments and analyses to 5 kHz.

      We found that immediately above 5 kHz the displacements fell to undetectable magnitudes. We accept that there may be other modes of vibration at higher frequencies (>10 kHz, based on Jacobs et al., 1999) that we could have detected with OCT. However, we focused our analysis on Group-III neurons at their best frequency and at frequencies that we could cross-compare between our high-speed imaging system and OCT.

      In summary for this section, I am not convinced of the conclusion drawn by the authors that group III scolopidia receive significantly higher stimulation along the AP axis in their native configuration, if indeed they were studied in the appropriate force regime (altered due to excision).

      Again, we accept that we did not clearly display the anatomical references of the scolopidial and ciliary axes in Fig. 4 and Fig. 5. We also did not describe in sufficient detail that our ex vivo preparation largely retains its physiological properties. We will address the errors of our measurement near the Nyquist limit and provide additional information from Group 1 scolopidia, where we can achieve more data points per cycle.

      In the scolopidial patch clamp data, the authors study transduction currents in response to steady state stimulation along the AP axis and the ML axis. The responses to steady state and periodic forces may well be different, and the authors do not offer us a way to clearly relate the two and therefore, to interpret the data.

      We will revise Fig. 5a to clarify that the push-pull displacements were made along the Dorso-Ventral (DV) axis and the up-down displacements along the Antero-Posterior (AP) axis. We agree that the responses to steady-state and periodic forces may well be very different. However, valuable insight can be gained from mechanical systems when they are displaced outside their normal physiological frequency range (e.g. the transformative work on vertebrate hair bundle mechanics, Howard & Hudspeth, 1988). For the same reason, we believe artificial stimulation of the scolopidium gives us new and crucial information for understanding scolopidial mechanics. Our main finding, that stretch is the dominant stimulus under step displacement, should still hold for periodic motion, or at least provide strong support for stretch being the dominant stimulus there.

      In addition, both stimulation types, along the AP axis and the ML, elicit clear transduction responses. Stimulation along the AP axis might be slightly higher, but there is over 40% variation around the mean in one case (pull: 26.22 ± 10.99 pA) and close to 80% variation in the other (push: 10.96 ± 8.59 pA). These data are indeed from a very high displacement range (2000 nm), which is very high compared to the native displacement levels, which are in the 1-10 nm range.
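
      The relative variation quoted in this comment is the ratio of the reported spread to the mean. A minimal check, assuming the ± values are standard deviations (the text here does not state which measure of spread is reported); the function name is illustrative only:

```python
def coefficient_of_variation(mean, spread):
    """Spread expressed as a fraction of the absolute mean."""
    if mean == 0:
        raise ValueError("coefficient of variation is undefined for a zero mean")
    return spread / abs(mean)


# Transduction current values quoted in the comment above (pA).
pull_cv = coefficient_of_variation(26.22, 10.99)
push_cv = coefficient_of_variation(10.96, 8.59)

if __name__ == "__main__":
    print(f"pull: {pull_cv:.1%} of the mean")
    print(f"push: {push_cv:.1%} of the mean")
```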

      In this experiment, we wished to establish the upper limits (and plateau region) of displacement-transduction current response. However, even at 2000 nm we still did not see a plateau. Therefore, we believe that the strain on the scolopidium is still in the operating range even though our displacement is not. This discrepancy can be explained because the base of the scolopidium is not fixed. Therefore, the displacement imposed in our experiment is not equivalent to the strain on the cilium but a combination of pulling and stretching along the length of the dendrite. The force, however, remains along that particular axis, supporting our main finding.

      Another important consideration is that the cilium is surrounded by the scolopale wall. It is assumed that the scolopale wall is far stiffer than the cilium and will therefore limit the amount of ciliary strain.

      The factor change from sample to sample is not reported and is small even overall. The statistical analyses of these data are not clearly reported, and I don't see the results of the overall ANOVA in the results section.

      We reported the statistical analyses in the Fig. 5 Source Data. We will now add tables displaying these statistics in the supplementary text of the revised manuscript.

      I also find the dip in the reported transduction currents between 10 and 100 nm quite odd (Figure 5 j-m) and would like to know what the authors' interpretation of this behaviour is. It seems to me that those currents increase continuously linearly after ~50-100 nm and that the data below that range are in the noise. Thus, the transduction currents observed at the relevant displacement range (1-10 nm) may not actually be reliable. How were these small displacements achieved, and how closely were the actual levels monitored? Is it possible to reliably deliver 1-10 nm displacements using a micromanipulator?

      One interpretation is that the cilium has both sensitive and insensitive mechanically gated ion channels, a finding that is also supported by Effertz et al., 2012. We will add a sentence in the discussion highlighting this interpretation. We will also provide our calibration of displacement versus voltage delivered to the piezo in the Supplementary Text.

      What is clear, despite the difficulty in interpreting this data, is that both AP and ML stimulation evoke transduction currents, and their relative differences are small. Additionally, in Müller's organ itself, in the excised organ, the scolopidia are stimulated along both axes. Thus, in my opinion, it is not possible to say that axial stretch along the cilium is 'the key mechanical input that activates mechano-electrical transduction'.

      We confirm that the scolopidia are displaced along both axes. We also note that displacements of the scolopidium limited to the up-down axis will also produce a strain on the scolopidium along the push-pull axis. However, we tried to disentangle this complex motion by limiting the displacements to one axis during recordings of the transduction current. We found that displacement along the scolopidial axis generated the largest transduction currents. Even though there is large variation, our statistical analysis confirmed a significant difference, as stated in the Results section (Lines 283-286):

      “Additionally, the transduction current evoked by pull from the resting position was larger than displacement upward, 12.17 ± 5.37 pA (N = 11, n = 11) (Tukey's procedure, p = 1.75e-03, t = -3.83) or downward 7.28 ± 9.76 pA (N = 11, n = 11) (Tukey's procedure, p = 5.10e-06, t = -4.53).”

      The reason for large variation is that the discrete depolarisations (random depolarisations of unknown function and a common feature of chordotonal neurons so far recorded) have a similar magnitude to the transduction current produced by the step displacements. We will highlight these discrete depolarisations in Figure 4d and mention them in the results.

      Reviewer #2 (Public review):

      Summary of strengths and weaknesses:

      Using several techniques (FIB-SEM, OCT, high-speed light microscopy, and electrophysiology), Chaiyasitdhi et al. provide evidence that chordotonal receptors in the locust ear (Müller's organ) sense the stretch of the scolopale cell, primarily of its cilium. Careful measurements certainly show cell stretch, albeit with some inconsistencies regarding best frequencies and amplitudes.

      Thank you very much for acknowledging the strength of our study. Regarding the inconsistencies between best frequencies and amplitudes, we believe that this concern largely arises from our failure to clearly display the anatomical references of the scolopidial and ciliary axes in Fig. 4 and Fig. 5. As previously addressed in our response to Reviewer #1, we will add the anatomical references and revise the text to clarify the orientation of our measurements.

      The weakest argument concerns the electrophysiological recordings, because the authors do not show directly that the stimulus stretches the cells. If this latter point can be clarified, then our confidence that ciliary stretch is the proximal stimulus for mechanotransduction will be increased.

      We agree that the displacement is not solely stretching the scolopidium. However, the force is still constrained to, and acts along, the push-pull axis. For this reason, we overestimate the displacement required to open the MET channels but stand by our conclusion that stretch is the dominant stimulus. In future work, we wish to devise a technique to mechanically clamp the base of the scolopidium and measure the more physiologically relevant current-strain relationship.

      This conclusion will not come as a surprise for workers in the field, as the chordotonal organ is known as a stretch-receptor organ (e.g., Wikipedia). But it is a useful contribution to the field and allows the authors to suggest transduction mechanisms whereby ciliary stretch is transduced into channel opening.

      One of the goals of this manuscript is to highlight the lack of direct evidence for stretch-sensitivity of chordotonal organs, as this is assumed from their structure. More importantly, accepting that chordotonal organs are stretch-sensitive does not address the mechanism by which these organs work. For instance, one candidate for the MET channel, NompC, has been shown to be sensitive to compression (Wang et al., 2021). We find that a preconceived “stretch-sensitive” mechanism, without an appreciation of scolopidium mechanics, cannot explain how NompC can be opened in chordotonal organs.

      P. E. Howse wrote in his 1968 work ‘The Fine Structure and Functional Organisation of Chordotonal Organs’ (Symp. Zool. Soc. Lond. No. 23):

      “There is, however, a common tendency to refer to chordotonal organs in which scolopidia are contained in a connective tissue strand as “stretch receptor”. This is unfortunate in two senses, for firstly the implied function may not have been proved and secondly even if the organ responds to stretch the scolopidia may not.” He then proceeded to cite pioneering work on the chordotonal organs of the hermit crab by R. C. Taylor (Comp. Biochem. Physiol. 1966), showing that the scolopidia may experience flexing when the connective strand is stretched.

      This work represents the first efforts to investigate the problematic assumption of stretch-sensitivity of scolopidia since it was first highlighted 57 years ago.

      Reviewer #3 (Public review):

      Summary:

      The paper 'A stretching mechanism evokes mechano-electrical transduction in auditory chordotonal neurons' by Chaiyasitdhi et al. presents a study that aims to address the mechanical model for scolopidia in Schistocerca gregaria Müller's organ, the basic mechanosensory units in insect chordotonal organs. The authors combine high-resolution ultrastructural analysis (FIB-SEM), sound-evoked motion tracking (OCT and high-speed light microscopy), and electrophysiological recordings of transduction currents during direct mechanical stimulation of individual scolopidia. They conclude that axial stretching along the ciliary axis is an adequate mechanical stimulus for activating mechanotransduction channels.

      Strengths/Highlights:

      (1) The 3D FIB-SEM reconstruction provides high resolution of scolopidial architecture, including the newly described "scolopale lid" and the full extent of the cilium.

      (2) High-speed microscopy clearly demonstrates axial stretch as the dominant motion component in the auditory receptors, which confirms a long-standing question of what the actual motion of a stretch receptor is upon auditory stimulation.

      (3) Patch-clamp recordings directly link mechanical stretch to transduction currents, a major advance over previous indirect models.

      Weaknesses/Limitations:

      (1) The text is conceptually unclear or written in an unclear manner in some places, for example, when using the proposed model to explain the sensitivity of Nanchung-Inactive in the discussion.

      We will rephrase and clarify the context of our findings for the Nanchung-Inactive mechanism of MET in the introduction and the discussion. We will also refine and simplify unclear text throughout.

      (2) The proposed mechanistic models (direct-stretch, stretch-compression, stretch-deformation, stretch-tilt) are compelling but remain speculative without direct molecular or biophysical validation. For example, examining whether the organ is pre-stretched and identifying the mechanical components of cells (tissues), such as the extracellular matrix and cytoskeleton, would help establish the mechanical model and strengthen the conclusion.

      We agree that our four proposed hypotheses are speculative. We have, however, narrowed them down from at least ten previous hypotheses (Field and Matheson, 1998). These hypotheses will enable us, and hopefully the field, to test them and more rapidly advance our understanding of how scolopidia work. We will add a section in the discussion on the best way to test these four hypotheses experimentally (e.g., pushing directly onto the cap should elicit sensitive responses for the cap-compression hypothesis).

      (3) To some extent, the weaknesses of the paper are part of its strengths and vice versa. For example, the direct push/pull and up/down stimulations are a great experimental advance to approach an answer to the question of how the underlying cellular components are deformed and how the underlying ion channels are forced. However, as the authors clearly state, neither of their stimulations can limit all forces to only one direction, and both orthogonal forces evoke responses in the neurons. The question of which of the two orthogonal forces 'causes' the response cannot be answered with these experiments and has not been answered by this manuscript. But the study has brought the field a considerable step closer to answering the question. The answer, however, might be that both longitudinal ('stretch') and perpendicular ('compression') forces act together to open the ion channels and that both dendritic extension via stretch and bending can provide forces for ion channel gating.

      Thank you very much for your acknowledgement of our experimental advances. We agree that this study cannot identify and localise the forces on the cilium, as it is enclosed in the scolopidial unit. As previously explained, we plan to address this question in our next work by improving and expanding our experimental techniques, including modelling, to study scolopidial mechanics using patch-clamp recording in combination with direct manipulation of individual scolopidia.

      The current paper has identified major components (longitudinal stretch components) for the neurons they analysed, but these will surely have been chosen according to their accessibility, and as such, the variety of mechanical responses in Müller's organ might be greater. In light of these considerations, the authors might acknowledge such uncertainties more clearly in their paper.

      Our high-speed and OCT imaging confirms complex multi-dimensional displacements (and presumably forces) acting on the scolopidium. We agree that our mechanical stimulation cannot recapitulate such complex motions. In future work, we wish to extend our mechanical stimulation to three axes and also to pivot on the axis of the scolopidial cap.

      The paper is an impressive methodological progress and breakthrough, but it simply does not "demonstrate that axial stretch along the cilium is the adequate stimulus or the key mechanical input that activates mechano-electrical transduction" as the authors write at the start of their discussion.

      We will rephrase to clarify that stretching along the “scolopidial axis”, not “along the ciliary axis”, is the adequate stimulus. We cannot yet verify how this translates to forces acting on the cilium, hence the four speculative hypotheses. We will rewrite the discussion to make clear that we are only interpreting the forces and displacements at the level of the cilium.

      They do show that axial stretch dominates for the neurons they looked at, which is important information. The same applies to the end of the discussion: The authors write, "This relative motion within the organ then drives an axial stretch of the scolopidium, which in turn evokes the mechano-electrical transduction current." Reading the manuscript, the certainty and display of confidence are not substantiated by the data provided. But they are also not necessary. The study has paved the road to answer these questions. Instead, the authors are encouraged to make suggestions on how the remaining uncertainties could be removed (and what experiments or model might be used).

      We will moderate our conclusion in the discussion, but we are confident that our experimental repeats and statistical tests support our conclusion that stretching of the scolopidium produces the largest transduction current responses (although not at the level of the cilium). As mentioned previously, we will include a section in the discussion on the best way to test the hypotheses arising from this work.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review): 

      Summary: 

      This study shows a novel role for SCoR2 in regulating metabolic pathways in the heart to prevent injury following ischemia/reperfusion. It combines a new multi-omics method to determine SCoR2 mediated metabolic pathways in the heart. This paper would be of interest to cardiovascular researchers working on cardioprotective strategies following ischemic injury in the heart. 

      Strengths:

      (1) Use of SCoR2KO mice subjected to I/R injury. 

      (2) Identification of multiple metabolic pathways in the heart by a novel multi-omics approach.

      We thank the Reviewer for the positive review of our manuscript.

      Weaknesses:

      (1) Use of a global SCoR2KO mice is a limitation since the effects in the heart can be a combination of global loss of SCoR2. 

      (2) Lack of a cell type specific effect. 

      We agree that global KOs limit the cell type-specific mechanistic conclusions that can be drawn. Global knockouts are nonetheless informative in their own right and serve to identify phenotypes worthy of further study.

      Reviewer #2 (Public review):

      Summary: 

      This manuscript addresses the gap in knowledge related to the cardiac function of the S-denitrosylase SNO-CoA Reductase 2 (SCoR2; product of the Akr1a1 gene). Genetic variants in SCoR2 have been linked to cardiovascular disease, yet their exact role in the heart remains unclear. This paper demonstrates that mice deficient in SCoR2 show significant protection in a myocardial infarction (MI) model. SCoR2 influenced ketolytic energy production, antioxidant levels, and polyol balance through the S-nitrosylation of crucial metabolic regulators. 

      Strengths: 

      (1) Addresses a well-defined gap in knowledge related to the cardiac function of SNO-CoA Reductase 2. Besides the in-depth case for this specific player, the manuscript sheds more light on the links between S-nitrosylation and metabolic reprogramming in the heart.

      (2) Rigorous proof of requirement through the combination of gene knockout and in vivo myocardial ischemia/reperfusion. 

      (3) Identification of precise Cys residue for SNO-modification of BDH1 as SCoR2 target in cardiac ketolysis 

      We thank the Reviewer for their kind words.

      Weaknesses: 

      (1) The experiments with BDH1 stability were performed in mutant 293 cells. Was there a difference in BDH1 stability in myocardial tissue or primary cardiomyocytes from SCoR2-null vs -WT mice? The same question extends to PKM2. 

      We have not assessed BDH1 stability directly in cardiomyocytes. However, S-nitrosylation increased BDH1 stability in HEK293 cells, and BDH1 expression was increased in (injured) hearts of SCoR2KO mice, together with increased SNO-BDH1. 

      For PKM2, there is a wealth of published evidence from us and others that S-nitrosylation does not regulate protein stability but rather inhibits tetramerization required for full activity.  

      (2) In the absence of tracing experiments, the cross-sectional changes in ketolysis, glycolysis, or polyol intermediates presented in Figures 4 and 5 are suggestive at best. This needs to be stressed while describing and interpreting these results. 

      We now acknowledge this limitation in the ‘Limitations’ section of the manuscript and in edits made to the text. 

      (3) The findings from human samples with ischemic and non-ischemic cardiomyopathy do not seem immediately or linearly in line with each other and with the model proposed from the KO mice. While the correlation holds up in the non-ischemic cardiomyopathy (increased SNO-BDH1, SNO-PKM2 with decreased SCoR2 expression), how do the authors explain the decreased SNO-BDH1 with preserved SCoR2 expression in ischemic cardiomyopathy? This seems counterintuitive as activation of ketolysis is a quite established myocardial response to ischemic stress. It may help the overall message clarity to focus the human data part on only NICM patients. 

      We find it interesting and important that SNO-BDH1 is readily detected in human heart tissue and its level is correlated to disease state. Our findings suggest conservation of this mechanism in human heart failure. However, we caution against drawing further conclusions related to NICM or ICM. Our animal model (based on a single time point) cannot faithfully recapitulate patients with chronic heart disease or differences between NICM and ICM. 

      (4) This is partially linked to the point above. An important proof that is lacking at present is the proof of sufficiency for SCoR2 in S-nitrosylation of targets and cardiac remodeling. Does SCoR2 overexpression in the heart or isolated cardiomyocytes reduce S-nitrosylation of BDH1 and other targets, undermining heart function at baseline or under stress? 

      The Reviewer proposes to test the effect of SCoR2 overexpression on cardioprotection. This is an interesting experiment for future study with the following caveats. First, it presupposes that native expression of SCoR2 is insufficient to control basal steady state S-nitrosylation of SNO-BDH1 and SNO-PKM2 (this does not seem to be the case). Second, overexpressed SCoR2 may be mislocalized within cells or associated with unnatural targets. Thank you.

      Reviewer #3 (Public review): 

      Summary: 

      This manuscript demonstrates that mice lacking the denitrosylase enzyme SCoR2/AKR1A1 demonstrate a robust cardioprotection resulting from reprogramming of multiple metabolic pathways, revealing widespread, coordinated metabolic regulation by SCoR2. 

      Strengths: 

      (1) The extensive experimental evidence. 

      (2) The use of the knockout model. 

      We thank the Reviewer for identifying strengths in our work.

      Weaknesses: 

      (1) The connection of direct evidence for the mechanism. 

      We believe we have identified a novel mechanism for cardioprotection entailing coordinate reprogramming of multiple metabolic pathways and suggesting a widescale role for SCoR2 in metabolic regulation. This is the key message we convey. While genetic dissection of individual pathways may be worthwhile, these investigations will have their own limitations. 

      (2) The mouse model used is not tissue-specific. 

      Please see our response to Reviewer 1, above. 

      Reviewer #1 (Recommendations for the authors):

      In the study, titled "The denitrosylase SCoR2 controls cardioprotective metabolic reprogramming", Grimmett ZW et al., describe a role for SNO-CoA Reductase 2 (SCoR2) in promoting cardioprotection via metabolic reprogramming in the heart after I/R injury. Authors show that loss of SCoR2 coordinates multiple metabolic pathways to limit infarct size. Overall, the hypothesis is interesting, however there are some limitations as described below: 

      (1) It is unclear whether SCoR2 mice are global or cardiomyocyte specific. 

      We apologize for any confusion. These are global SCoR2<sup>-/-</sup> mice. This is now stated in the Results when first identifying the strain, as well as in the Methods.  

      (2) Can the authors clarify how divergent metabolic pathways such as Ketone oxidation, glycolysis, PPP and polyol metabolism work downstream of SCoR2 to impact cardioprotection in mice with I/R. 

      The metabolic pathways of ketone oxidation, glycolysis, PPP and polyols appear to converge to support ischemic cardioprotection in SCoR2<sup>-/-</sup> mice, as depicted in the model shown in Fig. 5L. Subsequent to SNO-PKM2 blockade of flux through glycolysis (detailed in this manuscript and in Zhou et al, 2019, PMID: 30487609, as well as by others), substrates of ketolysis and glycolysis are funneled into the PPP, producing the antioxidant NADPH and energy precursor phosphocreatine, which are well-known to be cardioprotective. This occurs more readily in SCoR2<sup>-/-</sup> mice due to elevated SNO-BDH1 (detailed in this manuscript). 

      Polyols, thought to be products of the PPP carbohydrate intermediates arabinose, ribulose, xylulose (among others), have recently been shown to be harmful to cardiovascular health in humans. These polyols are uniformly downregulated in SCoR2<sup>-/-</sup> mice. We suggest this is likely the result of S-nitrosylation of SCoR2-substrate enzymes that form polyols (SCoR2/Akr1a1 is unable to directly reduce carbohydrates to their corresponding polyols). Regulation of endogenous polyol production in humans is a new concept and the mechanisms whereby these compounds increase risk of cardiac events are a subject of active investigation. This is detailed in the final paragraph of both the Results and Discussion sections, and in Fig. 5L. 

      (3) The only functional outcomes of SCoR2 loss shown are echocardiography and measurements of apoptosis. However, it would be important to determine whether the cardioprotective effect persists. It seems cardiac function was recorded 24 hours post injury, and whether the benefit remains at later time points, such as 2 or 4 weeks, is not shown. Without such a time point, loss of SCoR2 can only be said to lead to an acute increment in function.

      Loss of SCoR2 reduced post-MI mortality at 4 hr; cardiac functional changes (plus troponin, LDH, and apoptosis) were studied in surviving animals at 24 hr post-MI. Cardiac response to acute injury and to chronic injury (weeks post-MI) are not the same metabolically. This is well elucidated in the literature and exemplified by the role of PKM2, which is protective in the chronic response to MI (28 days post-MI; PMID: 32078387), but implicated in injury at shorter timepoints post-MI (PMID: 33288902, 28964797). All that said, functional changes at 2-4 weeks will be important to determine in the future, as the Reviewer indicates. 

      Reviewer #2 (Recommendations for the authors): 

      (1) The last paragraph of the Results section should be divided into the statement related to Table S2 in the Results section, and the rest of the paragraph should be put somewhere in the Discussion. 

      Thank you for this suggestion, which we have taken. 

      (2) The number of mice alive/dead should be reported in the histogram in Figure 1G. 

      Done.

      (3) A concise Graphical Abstract will be useful to grasp the overall logic and message of the manuscript from the beginning. 

      We thank you for this suggestion and have added a graphical abstract to the manuscript.

      Reviewer #3 (Recommendations for the authors): 

      I would suggest having more evidence on the effect of metabolic reprogramming on which cell type. The use of a global knockout is a major limitation, and probably some in vitro experiments with shRNA knockdown in endothelial cells and fibroblasts would provide more insights. 

      The reviewer suggests one direction for future study. We identify a novel mechanism for cardioprotection entailing coordinate reprogramming of multiple metabolic pathways and suggesting a widescale role for SCoR2 in metabolic regulation. This is the message we wish to convey. The role of cardiomyocytes vs contributing cell types is a thoughtful direction for future study. Thank you. 

      Editor's additional comment:

      The editors wish to highlight a critical issue concerning the characterization of the SCoR2−/− mice employed in this study. 

      In the Methods section (page 20), the manuscript states that "SCoR2+/− mice were made by Deltagen, Inc. as described previously (33)." However, reference 33 does not describe SCoR2−/− mice; instead, it refers to other genetically modified strains, including Akr1a1+/−, eNOS−/−, and PKM2−/− mice, with no mention of a SCoR2-targeted model. 

      The editors fully acknowledge that the authors may be using the term "SCoR2" as a functional synonym for Akr1a1, based on its described role as a mammalian homologue of yeast SCoR. If this is the case, such equivalence should be explicitly stated in the manuscript to prevent potential confusion. Moreover, considering that the genetic deletion of Akr1a1 (i.e., SCoR2) underlies the key mechanistic findings presented, it is essential that the manuscript include a clear and comprehensive description of the generation and validation of the mouse model used. 

      We therefore ask the authors to (1) clarify the nomenclature and relationship between "SCoR2" and Akr1a1, and (2) provide full details on the generation of the knockout mice, including the targeting strategy and the genotyping procedures. This information is necessary not only to ensure transparency and reproducibility but also to allow readers to fully appreciate the biological relevance of the findings.

      Thank you for identifying this inconsistency. We have adjusted the manuscript text accordingly to clearly state that SCoR2 is a functional name for the product of the Akr1a1 gene and that these SCoR2<sup>-/-</sup> mice are the same as Akr1a1<sup>-/-</sup> mice described in Ref 33. We have augmented the Methods text to describe the generation and genotyping of these SCoR2/Akr1a1 knockout mice.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1(Public Reviews):

      Summary: 

      Here, Millet et al. consider whether the nematode C. elegans 'discounts' the value of reward due to effort in a manner similar to that shown in other species, including rodents and humans. They designed a T-maze effort choice paradigm inspired by previous literature, but manipulated how effortful the food is to consume. C. elegans worms were sensitive to this novel manipulation, exhibiting effort-discounting-like behaviour that could be shaped by varying the density of food at each alternative in order to calculate an indifference point. This discounting-like behaviour was related to worms' rates of patch leaving, which differed between the low and high effort patches in isolation. The authors also found a potential relationship to dopamine signalling, and also that this discounting behaviour was not specific to lab-based strains of C. elegans.

      Strengths: 

      The question is well-motivated, and the approach taken here is novel. The authors are careful in their approach to altering and testing the properties of the effortful, elongated bacteria. Similarly, they go to some effort to understand what exactly is driving behavioural choices in this context, both through the application of simple standard models of effort discounting and a kinetic analysis of patch leaving. The comparisons to various dopamine mutants further extend the translational potential of their findings. I also appreciate the comparison to natural isolate strains, as the question of whether this behaviour may be driven by some sort of strain-specific adaptation to the environment is not regularly addressed in mammalian counterparts. The manuscript is well-written, and the figures are clear and comprehensible. 

      Weaknesses: 

      Discounting is typically defined as the alteration of a subjective value by effort (or time, risk, etc.), which is then used to guide future decision-making. By adapting the standard t-maze task for C. elegans as a patch-leaving paradigm, the authors observe behaviour strongly consistent with discounting models, but that is likely driven by a different process, in particular by an online estimate of the type of food in the current patch, which then influences patch-leaving dynamics (Figure 3). This is fundamentally different from decision-making strategies relating to effort that have been described in the rodent and human literatures. 

      We agree that in our study worms are likely making an on-line estimate of food quality in the current patch, but we wish to point out that rodents and humans also use on-line estimates in some significant effort-discounting paradigms. With respect to rodents, we call attention to effort discounting studies involving the widely used progressive ratio task (references in Discussion). In this task, animals can either lever-press for a preferred food or consume a less preferred food that is freely available nearby. However, the number of lever presses required to obtain preferred food increases as a function of the cumulative number of lever presses until the effort-cost of obtaining preferred food becomes too high and the animal switches to a freely available food. In essence, the lever and the freely available food are patches and the animal decides whether or not to leave the “lever” patch. It seems inescapable that the progressive ratio task involves an on-line assessment of the cost/benefit relationship associated with lever pressing. With respect to humans, one highly cited study (reference in Discussion) presented participants with a series of virtual apple trees. They could see how many apples are in the current tree and how much effort (squeezing a handgrip) is required to gather them. Their task was to decide whether or not to gather apples from that tree based on the perceived cost and benefit. Thus, on-line estimation is a common strategy used by animals and humans as shown in the effort discounting literature. We now make this point in the Discussion section titled A model of effort-discounting like behavior.

      Similarly, the calculation of indifference points at the group instead of at the individual level also suggests a different underlying process and limits the translational potential of their findings. The authors do not discuss the implications of these differences or why they chose not to attempt a more analogous trial-based experiment.  

      It is not clear to us why changing the read-out –– from the individual level to the population level –– necessarily suggests that a different biological mechanism is at work. In our view, there is one mechanism and it can be seen from different perspectives (e.g., individual vs population). Furthermore, the analogous trial-based experiment, as we understand it, would be to record behavior one worm at a time in the T-maze. This design is not practical because it entails recording a large number of single worms in the T-maze for 60 min each. 

      In the case of both the dopamine and natural isolate experiments, the data are very noisy despite large (relative to other C. elegans experiments) sample sizes. In the dopamine experiment, disruption of dop-1, dop-2, and cat-2 had no statistically significant effect. There do not appear to be any corrections for multiple comparisons, and the single significant comparison, for dop-3, had a small effect size.

      An ANOVA followed by a Dunnett test was used to test differences between groups in Fig. 4 and 5. The Dunnett test is a multiple comparison test comparing experimental groups to a single control group. It is used to minimize type I error while maintaining statistical power and does not require further correction for multiple comparisons. We have clarified the use of the Dunnett test in the statistical table. The effect size for dop-3 is 0.5 (Cohen’s d), which is typically interpreted as a medium, not small, effect size (e.g., Cohen, Psychological Bulletin, 1992, Vol. 112, No. 1, 155-159).
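
      For readers less familiar with the effect-size measure cited above, the following is a minimal sketch of Cohen's d computed with a pooled standard deviation, together with the conventional small/medium/large labels from Cohen (1992). The function names `cohens_d` and `label_effect` are illustrative, and the numbers in the usage lines are made up; they are not data from the study.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(control, treatment):
    """Cohen's d using the pooled sample standard deviation."""
    n1, n2 = len(control), len(treatment)
    pooled_var = (
        (n1 - 1) * variance(control) + (n2 - 1) * variance(treatment)
    ) / (n1 + n2 - 2)
    return (mean(treatment) - mean(control)) / sqrt(pooled_var)


def label_effect(d):
    """Conventional benchmarks (Cohen, 1992): 0.2 small, 0.5 medium, 0.8 large."""
    size = abs(d)
    if size >= 0.8:
        return "large"
    if size >= 0.5:
        return "medium"
    if size >= 0.2:
        return "small"
    return "negligible"


# Hypothetical preference indices, for illustration only.
control_group = [1.0, 2.0, 3.0]
mutant_group = [2.0, 3.0, 4.0]
d = cohens_d(control_group, mutant_group)
print(d, label_effect(d))
```

      Under these conventions, a value of 0.5 falls on the boundary labelled "medium", which is the interpretation given in the response.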

      More detailed behavioural analyses on both these and the wild isolate strains, for example by applying their kinetic analysis, would likely give greater insight as to what is driving these inconsistent effects. 

      More detailed behavioral analysis could reveal why we observe a difference in effort discounting in some strains and not others. However, it is not obvious what type of behavioral analysis would be needed to differentiate between pleiotropic effects of the mutations/natural isolates and more specific effects on effort discounting. A simple kinetic analysis in particular may not be enough to reveal relevant differences between mutants/natural isolates. For this reason, we think that such experiments may be better suited for future follow up studies.

      Reviewer #2 (Public Reviews)

      Summary: 

      Millet et al. show that C. elegans systematically prefers easy-to-eat bacteria but will switch its choice when harder-to-eat bacteria are offered at higher densities, producing indifference points that fit standard economic discounting models. Detailed kinetic analysis reveals that this bias arises from unchanged patch-entry rates but significantly elevated exit rates on effortful food, and dop-3 mutants lose the preference altogether, implicating dopamine in effort sensitivity. These findings extend effort-discounting behavior to a simple nematode, pushing the phylogenetic boundary of economic cost-benefit decision-making.
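
      To make the notion of an indifference point concrete, here is a small sketch based on a hyperbolic discounting function, one commonly used form in the discounting literature. This is an assumption for illustration: the review does not state which functional form the manuscript uses, and the parameter `k`, the effort units, and the function names are hypothetical.

```python
def discounted_value(amount, effort, k):
    """Hyperbolic effort discounting: subjective value falls as effort rises."""
    return amount / (1.0 + k * effort)


def indifference_amount(easy_amount, easy_effort, hard_effort, k):
    """Amount of the effortful option whose discounted value
    equals the discounted value of the easy option."""
    target = discounted_value(easy_amount, easy_effort, k)
    return target * (1.0 + k * hard_effort)


# Hypothetical numbers: an easy patch at density 1.0 with zero effort cost,
# and an effortful patch with effort 2.0 and discount parameter k = 0.5.
needed = indifference_amount(1.0, 0.0, 2.0, 0.5)
print(needed)
```

      In this toy model, the effortful option must be offered at a higher density than the easy option before the two are valued equally, which is the qualitative pattern of preference reversal described in the summary.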

      Strengths: 

      (1) Extends the well-characterized concept of effort discounting into C. elegans, setting a new phylogenetic boundary and opening invertebrate genetics to economic-behavior studies.

      (2) Elegant use of cephalexin-elongated bacteria to manipulate "effort" without altering nutritional or olfactory cues, yielding clear preference reversals and reproducible indifference points. 

      (3) Application of standard discounting models to predict novel indifference points is both rigorous and quantitatively satisfying, reinforcing the interpretation of worm behavior in economic terms. 

      (4) The three-state patch-model cleanly separates entry and exit dynamics, showing that increased leaving rates, rather than altered re-entry, drive choice biases.

      (5) Investigates the role of dopamine in this behavior to try to establish shared mechanisms with vertebrates. 

      (6) Demonstration of discounting in wild strain (solid evidence). 

      Weaknesses: 

      (1) The kinetic model omits rich trajectory details, such as turning angles or hazard functions, that could distinguish a bona fide roaming transition from other exit behaviors.

      The overarching goal of the present paper was to develop a simple model for effort discounting in a small, genetically tractable organism. Accordingly, we focused on quantitative assays that are easy to implement and analyze. The patch-leaving assay and its associated kinetic analysis are one such assay. To keep things simple in this assay, we counted the number of transitions between the three states shown in Fig. 3A. We chose not to analyze the data in terms of turning angles or hazard functions because the metrics we developed seemed sufficient. Finally, we note that there are new modeling data showing that the presumptive transitions into the roaming state can be explained in terms of a one-state stochastic model in which there is no discrete roaming state (eLife. 2025 Jul 30;14:RP104972. doi: 10.7554/eLife.104972. PMID: 40736321).
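
      The kind of transition counting described in this response can be sketched as follows. This is a minimal illustration, not the analysis code used in the study: the state labels, the sampling interval `dt`, and the function names are hypothetical, and the example track is invented.

```python
from collections import Counter


def count_transitions(states):
    """Count transitions between consecutive, different states in a sequence."""
    return Counter((a, b) for a, b in zip(states, states[1:]) if a != b)


def exit_rate(states, state, dt=1.0):
    """Exits from `state` per unit of time spent in it (dt = time per sample)."""
    time_in_state = sum(1 for s in states[:-1] if s == state) * dt
    exits = sum(
        1 for a, b in zip(states, states[1:]) if a == state and b != state
    )
    return exits / time_in_state if time_in_state else float("nan")


# Hypothetical track of one worm sampled at regular intervals.
track = ["on", "on", "on", "off", "off", "on", "on", "off"]
print(count_transitions(track))
print(exit_rate(track, "on"))
```

      Comparing exit rates computed this way between patch types is one simple way to express the statement that leaving rates, and not entry rates, differ between easy and effortful food.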

      (2) Only dop-3 shows an effect, and the statistical validity of this result is questionable. It is not clear if the authors corrected for multiple comparisons, and the effect size is quite small and noisy, given the large number of worms tested. Other mutants do not show effects. Given these two concerns, the role of dopamine in C. elegans effort discounting was unconvincing. 

      An ANOVA followed by a Dunnett test was used to test statistical significance in figures 4 and 5 (see above for a discussion of these tests). We believe this approach is rigorous, and the use of these tests is statistically valid. We note that the effect size for this comparison was medium.

      (3) With only five wild isolates tested (and variable data quality), it's hard to conclude that effort discounting isn't a lab-strain artifact or how broadly it varies in natural populations. 

      The fact that four of the five natural isolates tested display levels of effort discounting similar to N2 (only one natural isolate does not display effort discounting) argues against effort discounting being a laboratory adaptation. We have nevertheless weakened the claim regarding natural isolates. We now say effort discounting-like behavior may not be an adaptation to the laboratory environment.

      (4) Detailed analysis of behavior beyond preference indices would strengthen the dopamine link and the claim of effort discounting in wild strains. 

      Going beyond preference in the behavioral analysis might or might not reveal new phenotypes that strengthen the link with dopamine. At present, however, we think such experiments are beyond the scope of the paper.

      (5) A few mechanistic statements (e.g., tying satiety exclusively to nutrient signals) would benefit from explicit citations or brief clarifications for non-worm specialists. 

      We are unable to identify a mechanistic statement tying satiety to nutrient signals in our manuscript.

      Reviewer #3 (Public Reviews)

      Summary: 

      The authors establish a behavioral task to explore effort discounting in C. elegans. By using bacterial food that takes longer to consume, the authors show that, for equivalent effort, as measured by pumping rate, they obtain less food, as measured by fat deposition. The authors formalize the task by applying a formal neuroeconomic decision-making model that includes value, effort, and discounting. They use this to estimate the discounting that C. elegans applies based on ingestion effort by using a population-level 2-choice T-maze. They then analyze the behavioral dynamics of individual animals transitioning between on-food and off-food states. Harder to ingest bacteria led to increased food patch leaving. Finally, they examined a set of mutants defective in different aspects of dopamine signaling, as dopamine plays a key role in discounting in vertebrates and regulates certain aspects of C. elegans foraging.

      Strengths: 

      The behavioral experiments and neuroeconomic analysis framework are compelling, interesting, and make a significant contribution to the field. While these foraging behaviors have been extensively studied, few include clearly articulated theoretical models to be tested. 

      Demonstrating that C. elegans effort discounting fits model predictions and has stable indifference points is important for establishing these tasks as a model for decision making. 

      Weaknesses: 

      The dopamine experiments are harder to interpret. The authors point out the perplexing lack of an effect of dat-1 and cat-2. dop-3 leads to general indifference. I am not sure this is the expected result if the argument is a parallel functional role to discounting in vertebrates. dop-3 causes a range of locomotor phenotypes and may affect feeding (reduced fat storage), and thus, there may be a general defect in the ability to perform the task rather than anything specific to discounting.

      That said, some of the other DA mutants also have locomotor defects and do not differ from N2. But there is no clear result here - my concern is that global mutants in such a critical pathway exhibit such pleiotropy that it's difficult to conclude there is a clear and specific role for DA in effort discounting. This would require more targeted or cell-specific approaches. 

      We agree with the reviewer that the results of the dopamine experiments are puzzling and that getting a better understanding of the role of dopamine in effort discounting will require more sensitive assays and different experimental approaches (e.g. cell-specific rescues). However, as mentioned by the reviewer, all the mutations tested have some pleiotropic effects, yet only dop-3 displays a defect in effort discounting. This, in our opinion, points to a specific role of dop-3 in effort discounting in C. elegans. This point is now made in the Discussion in the section titled Role of dopamine signaling in effort discounting-like behavior.

      Meanwhile, there are other pathways known to affect responses to food and patch leaving decisions: serotonin, pigment-dispersing factor, tyramine, etc. The paper would have benefited from a clarification about why these were not considered as promising candidates to test (in addition to or instead of dopamine). 

      We focused on DA because of its well-established effect on effort discounting in rodents. Testing other pathways is a goal for future research.

      Reviewer #1 (Recommendations for the authors):

      The current results are more a reframing of data gathered from a patch-leaving paradigm, but described in the form of economic choice modelling in which discounting is one possible explanation. One more parsimonious explanation is that worms estimate in real-time some rate of reward and leave the patch at some threshold, consistent with canonical foraging models, previous experiments in C. elegans, and the authors' own data (Figure 3). Therefore, I am wary about some of the claims made in this manuscript, such as 'decision-making strategies based on effort-cost trade-offs are evolutionarily conserved'.

      These points are now addressed in the Discussion in a revised section titled A model of effort-discounting like behavior. (i) We now call attention to the fact that our T-maze assay is a patch-leaving foraging paradigm. (ii) We now propose a revised model in which “worms make an on-line assessment of food value in the current patch which in turn alters patch-leaving dynamics, increasing the exit rates from cephalexin-treated patches as shown in Figure 3.” (iii) We now provide evidence from the rodent and human literature that the strategy of on-line assessment of reward value may be evolutionarily conserved in the case of a class of effort discounting tasks whose solution requires on-line assessments.

      If the reason the authors chose to do a patch-leaving style task rather than a traditional t-maze is because C. elegans is unable to retain the sort of information necessary to make such simultaneous decisions - e.g., if pre-training on the two options isn't possible - then this in itself suggests that mechanisms underlying these decisions in worms and mammals are unlikely to be the same. I mention this because I would like to suggest to the authors an alternative interpretation: that patch foraging is actually 'the' canonical computation that translates across species. This would, in fact, be nicely consistent with some other recent modelling work in humans, e.g., https://www.biorxiv.org/content/10.1101/2025.05.06.652482v1

      Please see the previous response.

      Reviewer #2 (Recommendations for the authors):

      Can you provide a picture of the regular and CEPH bacteria? 

      Done (see Figure 1––figure supplement 1).

      Reviewer #3 (Recommendations for the authors):

      I would recommend testing representative mutants in other pathways in the choice task. If possible, more targeted experiments with dop-3, including either cell-specific KOs or rescues, would very much strengthen this aspect of the paper. 

      While valuable, these experiments are out of scope for the present study.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Bansal et al. present a study on the fundamental blood and nectar feeding behaviors of the critical disease vector, Anopheles stephensi. The study encompasses not just the fundamental changes in blood feeding behaviors of the crucially understudied vector, but then uses a transcriptomic approach to identify candidate neuromodulation pathways which influence blood feeding behavior in this mosquito species. The authors then provide evidence through RNAi knockdown of candidate pathways that the neuromodulators sNPF and Rya modulate feeding either via their physiological activity in the brain alone or through joint physiological activity along the brain-gut axis (but critically not the gut alone). Overall, I found this study to be built on tractable, well-designed behavioral experiments.

      Their study begins with a well-structured experiment to assess how the feeding behaviors of A. stephensi change over the course of its life history and in response to its age, mating, and oviposition status. The authors are careful and validate their experimental paradigm in the more well-studied Ae. aegypti, and are able to recapitulate the results of prior studies, which show that mating is a prerequisite for blood feeding behaviors in Ae. aegypti. Here they find that A. stephensi, like other Anopheline mosquitoes, has a more nuanced regulation of its blood and nectar feeding behaviors.

      The authors then go on to show in a Y-maze olfactometer that, to some degree, changes in blood feeding status depend on behavioral modulation to host cues, and this is not likely to be a simple change to the biting behaviors alone. I was especially struck by the swap in valence of the host cues for the blood-fed and mated individuals, which had not yet oviposited. This indicates that there is a change in behavior that is not simply desensitization to host cues while navigating in flight, but something much more exciting is happening.

      The authors then use a transcriptomic approach to identify candidate genes in the blood-feeding stages of the mosquito's life cycle, arriving at a list of 9 candidates that have a role in regulating the host-seeking status of A. stephensi. Then, through investigations of gene knockdown of candidates, they identify the dual action of RYa and sNPF as candidate neuromodulators of host-seeking in this species. Overall, I found the experiments to be well-designed. I found the molecular approach to be sound. While I do not think the molecular approach is necessarily an all-encompassing mechanism identification (owing mostly to the fact that genetic resources are not yet available in A. stephensi as they are in other dipteran models), I think it sets up a rich line of research questions for the neurobiology of mosquito behavioral plasticity and comparative evolution of neuromodulator action.

      We appreciate the reviewer’s detailed summary of our work. We thank them for their positive comments and agree with them on the shortcomings of our approach.

      Strengths:

      I am especially impressed by the authors' attention to small details in the course of this article. As I read and evaluated this article, I continued to think about how many crucial details could potentially have been missed if this had not been the approach. The attention to detail paid off in spades and allowed the authors to carefully tease apart molecular candidates of blood-seeking stages. The authors' top-down approach to identifying RYamide and sNPF starting from first principles behavioral experiments is especially comprehensive. The results from both the behavioral and molecular target studies will have broad implications for the vectorial capacity of this species and comparative evolution of neural circuit modulation.

      We really appreciate that the reviewer has recognised the attention to detail we have tried to bring to this work, thank you!

      Weaknesses:

      There are a few elements of data visualizations and methodological reporting that I found confusing on a first few read-throughs. Figure 1F, for example, was initially confusing as it made it seem as though there were multiple 2-choice assays for each of the conditions. I would recommend removing the "X" marker from the x-axis to indicate the mosquitoes did not feed from either nectar, blood, or neither in order to make it clear that there was one assay in which mosquitoes had access to both food sources, and the data quantify if they took both meals, one meal, or no meals.

      We thank the reviewer for flagging the schematic in figure 1F. As suggested, we have removed the “X” markers from the x-axis and revised the axis label from “choice of food” to “choice made” to better reflect what food the mosquitoes chose in the assay. For clarity, we have now also plotted the same data as stacked graphs at the bottom of Fig. 1F, which clearly shows the proportion of mosquitoes fed on each particular choice. We avoid the stacked graph as the sole representation of this data, as it does not capture the variability in the data.

      I would also like to know more about how the authors achieved tissue-specific knockdown for RNAi experiments. I think this is an intriguing methodology, but I could not figure out from the methods why injections either had whole-body or abdomen-specific knockdown.

      The tissue-specific knockdown (abdomen only or abdomen+head) emerged from initial standardisations where we were unable to achieve knockdown in the head unless we used higher concentrations of dsRNA and did the injections in older females. We realised that this gave us the opportunity to isolate the neuronal contribution of these neuropeptides in the phenotype produced. Further optimisations revealed that injecting dsRNA into 0-10 h old females produced abdomen-specific knockdowns without affecting head expression, whereas injections into 4-day-old females resulted in knockdowns in both tissues. Moreover, head knockdowns in older females required higher dsRNA concentrations, with knockdown efficiency correlating with the amount injected. In contrast, abdominal knockdowns in younger females could be achieved even with lower dsRNA amounts.

      We have mentioned the knockdown conditions (time of injection and the amount of dsRNA injected) for tissue-specific knockdowns in the Methods, but realise now that this does not explain the approach well enough. We have now edited it to state our methodology more clearly (see lines 932-948).

      I also found some interpretations of the transcriptomic data to be overly broad for what transcriptomes can actually tell us about the organism's state. For example, the authors mention, "Interestingly, we found that after a blood meal, glucose is neither spent nor stored, and that the female brain goes into a state of metabolic 'sugar rest', while actively processing proteins (Figure S2B, S3)".

      This would require a physiological measurement to actually know. It certainly suggests that there are changes in carbohydrate metabolism, but there are too many alternative interpretations to make this broad claim from transcriptomic data alone.

      We thank the reviewer for pointing this out and agree with them. We have now edited our statement to read:

      “Instead, our data suggests altered carbohydrate metabolism  after a blood meal, with the female brain potentially entering a state of metabolic 'sugar rest' while actively processing proteins (Figure S2B, S3). However, physiological measurements of carbohydrate and protein metabolism will be required to confirm whether glucose is indeed neither spent nor stored during this period.” See lines 271-277.

      Reviewer #2 (Public review):

      Summary:

      In this manuscript, Bansal et al. examine and characterize feeding behaviour in Anopheles stephensi mosquitoes. While sharing some similarities to the well-studied Aedes aegypti mosquito, the authors demonstrate that mated females, but not unmated (virgin) females, exhibit suppression in their blood-feeding behaviour. Using brain transcriptomic analysis comparing sugar-fed, blood-fed, and starved mosquitoes, several candidate genes potentially responsible for influencing blood-feeding behaviour were identified, including two neuropeptides (short NPF and RYamide) that are known to modulate feeding behaviour in other mosquito species. Using molecular tools, including in situ hybridization, the authors map the distribution of cells producing these neuropeptides in the nervous system and in the gut. Further, by implementing systemic RNA interference (RNAi), the study suggests that both neuropeptides appear to promote blood-feeding (but do not impact sugar feeding), although the impact was observed only after both neuropeptide genes underwent knockdown.

      Strengths and/or weaknesses:

      Overall, the manuscript was well-written; however, the authors should review carefully, as some sections would benefit from restructuring to improve clarity. Some statements need to be rectified as they are factually inaccurate.

      Below are specific concerns and clarifications needed in the opinion of this reviewer:

      (1) What does "central brains" refer to in abstract and in other sections of the manuscript (including methods and results)? This term is ambiguous, and the authors should more clearly define what specific components of the central nervous system was/were used in their study.

      Central brain, or mid brain, is a commonly used term to refer to brain structures/neuropils without the optic lobes (For example: https://www.nature.com/articles/s41586-024-07686-5). In this study we have focused our analysis on the central brain circuits involved in modulating blood-feeding behaviour and have therefore excluded the optic lobes. As optic lobes account for nearly half of all the neurons in the mosquito brain (https://pmc.ncbi.nlm.nih.gov/articles/PMC8121336/), including them would have disproportionately skewed our transcriptomic data toward visual processing pathways.

      We have indicated this in figure 3A and in the methods (see lines 800-801, 812). We have now also clarified it in the results section for neuro-transcriptomics to avoid confusion (see lines 236-237).

      (2) The abstract states that two neuropeptides, sNPF and RYamide are working together, but no evidence is summarized for the latter in this section.

      We thank the reviewer for pointing this out. We have now added the statement “This occurs in the context of the action of RYa in the brain” to the end of the abstract, to give a complete summary of our proposed model.

      (3) Figure 1

      Panel A: This should include mating events in the reproductive cycle to demonstrate differences in the feeding behavior of Ae. aegypti.

      Our data suggest that mating can occur at any time between eclosion and oviposition in An. stephensi and between eclosion and blood feeding in Ae. aegypti. Adding these mating windows to the already busy Fig. 1A would cloud the purpose of the schematic, which is to indicate the time points used in the behavioural assays and transcriptomics.

      Panel F: In treatments where insects were not provided either blood or sugar, how is it that some females and males had fed? Also, it is unclear why the y-axis label is % fed when the caption indicates this is a choice assay. Also, it is interesting that sugar-starved females did not increase sugar intake. Is there any explanation for this (was it expected)?

      We apologise for the confusion. The experiment is indeed a choice assay in which sugar-starved or sugar-sated females, co-housed with males, were provided simultaneous access to both blood and sugar, and were assessed for the choice made (indicated on the x-axis): both blood and sugar, blood only, sugar only, or neither. The x-axis indicates the choice made by the mosquitoes, not the choice provided in the assay, and the y-axis indicates the percentage of males or females that made each particular choice. We have now removed the “X” markers from the x-axis and revised the axis label from “choice of food” to “choice made” to better reflect what food the mosquitoes chose to take.

      In this assay, we scored females only for the presence or absence of each meal type (blood or sugar) and are therefore unable to comment on whether sugar-starved females consumed more sugar than sugar-sated females. However, when sugar-starved, a higher proportion of females consumed both blood and sugar, while fewer fed on blood alone.

      For clarity, we have now also plotted the same data as stacked graphs at the bottom of Fig. 1F, which clearly shows the proportion of mosquitoes fed on each particular choice. We avoid the stacked graph as the sole representation of this data as it does not capture the variability in the data.

      (4) Figure 3

      In the neurotranscriptome analysis of the (central) brain involving the two types of comparisons, can the authors clarify what "excluded in males" refers to? Does this imply that only genes not expressed in males were considered in the analysis? If so, what about co-expressed genes that have a specific function in female feeding behaviour?

      This is indeed correct. We reasoned that since blood feeding is exclusive to females, we should focus our analysis on genes that were specifically upregulated in them. As the reviewer points out, it is very likely that genes commonly upregulated in males and females may also promote blood feeding and we will miss out on any such candidates based on our selection criteria.

      (5) Figure 4

      The authors state that there is more efficient knockdown in the head of unfed females; however, this is not accurate since they only get knockdown in unfed animals, and no evidence of any knockdown in fed animals (panel D). This point should be revised in the results text as well.

      Perhaps we do not understand the reviewer’s point or there has been a misunderstanding. In figure 4D, we show that while there is more robust gene knockdown in unfed females, blood-fed females also showed modest but measurable knockdowns ranging from 5-40% for RYamide and 2-21% for sNPF.

      Relatedly, blood-feeding is decreased when both neuropeptide transcripts are targeted compared to uninjected (panel C) but not compared to dsGFP injected (panel E). Why is this the case if authors showed earlier in this figure (panel B) that dsGFP does not impact blood feeding?

      We realise this concern stems from our representation of the data. Since we had earlier determined that dsGFP-injected females fed similarly to uninjected females (fig 4B), we used these controls interchangeably in subsequent experiments. To avoid confusion, we have now only used the label ‘control’ in figure 4 (and supplementary figure S9) and specified which control was used for each experiment in the legend.

      In addition to this, we wanted to clarify that fig 4C and 4E are independent experiments. 4C is the behaviour corresponding to when the neuropeptides were knocked down in both heads and abdomens.

      4E is the behaviour corresponding to when the neuropeptides were knocked down in only the abdomens. We have now added a schematic in the plots to make this clearer.

      In addition, do the uninjected and dsGFP-injected relative mRNA expression data reflect combined RYa and sNPF levels? Why is there no variation in these data,…

      In these qPCRs, we calculated relative mRNA expression using the delta-delta Ct method (see line 975). For each neuropeptide its respective control was used. For simplicity, we combined the RYa and sNPF control data into a single representation. The value of this control is invariant because this method sets the control baseline to a value of 1.
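      To illustrate why the control value is fixed at 1 under this method, here is a minimal Python sketch of the 2^-ΔΔCt calculation. The Ct values, the reference-gene normalisation, and the function name are hypothetical and are not taken from the manuscript; the sketch also assumes roughly 100% amplification efficiency.

```python
def relative_expression(ct_target_sample, ct_ref_sample,
                        ct_target_control, ct_ref_control):
    """Fold change by the 2^-ddCt method (assumes ~100% primer efficiency)."""
    # Normalise the target gene to the reference gene within each condition.
    delta_ct_sample = ct_target_sample - ct_ref_sample
    delta_ct_control = ct_target_control - ct_ref_control
    # Compare the treated sample against the control condition.
    delta_delta_ct = delta_ct_sample - delta_ct_control
    return 2 ** (-delta_delta_ct)


# Hypothetical Ct values, for illustration only.
# The control compared against itself has ddCt = 0, so its value is always 1.
control = relative_expression(24.0, 18.0, 24.0, 18.0)
# One extra cycle for the target gene corresponds to half the transcript level.
knockdown = relative_expression(25.0, 18.0, 24.0, 18.0)
```

      Because the control is, by construction, the baseline against which every sample is expressed, it carries no variation of its own in this representation.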

      …and how do transcript levels of RYa and sNPF compare in the brain versus the abdomen (the presentation of data doesn't make this relationship clear).

      The reviewer is correct in pointing out that we have not clarified this relationship in our current presentation. While we have not performed absolute mRNA quantifications, we extracted relative mRNA levels from qPCR data of 96h old unmanipulated control females. We observed that both sNPF and RYa transcripts are expressed at much lower levels in the abdomens, as compared to those in the heads, as shown in the graphs inserted below.

      Author response image 1.

      (6) As an overall comment, the figure captions are far too long and include redundant text presented in the methods and results sections.

      We thank the reviewer for flagging this and have now edited the legends to remove redundancy.

      (7) Criteria used for identifying neuropeptides promoting blood-feeding: statement that reads "all neuropeptides, since these are known to regulate feeding behaviours". This is not accurate since not all neuropeptides govern feeding behaviors, while certainly a subset do play a role.

      We agree with the reviewer that not all neuropeptides regulate feeding behaviours. Our statement refers to the screening approach we used: in our shortlist of candidates, we chose to validate all neuropeptides.

      (8) In the section beginning with "Two neuropeptides - sNPF and RYa - showed about 25% and 40% reduced mRNA levels...", the authors state that there was no change in blood-feeding and later state the opposite. The wording should be clarified as it is unclear.

      Thank you for pointing this out. We were referring to an unchanged proportion of the blood fed females. We have now edited the text to the following:

      “Two neuropeptides - sNPF and RYa - showed about 25% and 40% reduced mRNA levels in the heads but the proportion of females that took blood meals remained unchanged”. See lines 338-340.

      (9) Just before the conclusions section, the statement that "neuropeptide receptors are often ligand promiscuous" is unjustified. Indeed, many studies have shown in heterologous systems that high concentrations of structurally related peptides, which are not physiologically relevant, might cross-react and activate a receptor belonging to a different peptide family; however, the natural ligand is often many times more potent (in most cases, orders of magnitude) than structurally related peptides. This is certainly the case for various RYamide and sNPF receptors characterized in various insect species.

      We agree with the reviewer and apologise for the mistake. We have now removed the statement.

      (10) Methods

      In the dsRNA-mediated gene knockdown section, the authors could more clearly describe how much dsRNA was injected per target. At the moment, the reader must carry out calculations based on the concentrations provided and the injected volume range provided later in this section.

      We have now edited the section to reflect the amount of dsRNA injected per target. Please see lines 921-931.

      It is also unclear how tissue-specific knockdown was achieved by performing injection on different days/times. The authors need to explain/support, and justify how temporal differences in injection lead to changes in tissue-specific expression. Does the blood-brain barrier limit knockdown in the brain instead, while leaving expression in the peripheral organs susceptible?

      To achieve tissue-specific knockdowns of sNPF and RYa, we optimised both the time of injection as well as the dsRNA concentration to be injected. Injecting dsRNA into 0-10h females produced abdomen specific knockdowns without affecting head expression, whereas injections into 96h old females resulted in knockdowns in both tissues. Head knockdowns in older females required higher dsRNA concentrations, with knockdown efficiency correlating with the amount injected. In contrast, abdominal knockdowns in younger females could be achieved even with lower dsRNA amounts, reflecting the lower baseline expression of sNPF in abdomens compared to heads and the age-dependent increase in head expression (as confirmed by qPCR). It is possible that the blood-brain barrier also limits the dsRNA entering the brain, thereby requiring higher amounts to be injected for head knockdowns.

      We have now edited this section to state our methodology more clearly (see lines 932-948).

      For example, in Figure 4, the data support that knockdown in the head/brain is only effective in unfed animals compared to uninjected animals, while there is no evidence of knockdown in the brain relative to dsGFP-injected animals. Comparatively, evidence appears to show stronger evidence of abdominal knockdown mostly for the RYa transcript (>90%) while still significantly for the sNPF transcript (>60%).

      As we explained earlier, this concern likely stems from our representation of the data. Since we had earlier determined that dsGFP-injected females fed similarly to uninjected females (fig 4B), we used these controls interchangeably in subsequent experiments. To avoid confusion, we have now only used the label ‘control’ in figure 4 (and supplementary figure S9) and specified which control was used for each experiment in the legend.

      In addition to this, we wanted to clarify that fig 4C and 4E are independent experiments. 4C is the behaviour corresponding to when the neuropeptides were knocked down in both heads and abdomens. 4E is the behaviour corresponding to when the neuropeptides were knocked down in only the abdomen. We have now added a schematic in the plots to make this clearer.

      Reviewer #3 (Public review):

      Summary:

      This manuscript investigates the regulation of host-seeking behavior in Anopheles stephensi females across different life stages and mating states. Through transcriptomic profiling, the authors identify differential gene expression between "blood-hungry" and "blood-sated" states. Two neuropeptides, sNPF and RYamide, are highlighted as potential mediators of host-seeking behavior. RNAi knockdown of these peptides alters host-seeking activity, and their expression is anatomically mapped in the mosquito brain (sNPF and RYamide) and midgut (sNPF only).

      Strengths:

      (1) The study addresses an important question in mosquito biology, with relevance to vector control and disease transmission.

      (2) Transcriptomic profiling is used to uncover gene expression changes linked to behavioral states.

      (3) The identification of sNPF and RYamide as candidate regulators provides a clear focus for downstream mechanistic work.

      (4) RNAi experiments demonstrate that these neuropeptides are necessary for normal host-seeking behavior.

      (5) Anatomical localization of neuropeptide expression adds depth to the functional findings.

      Weaknesses:

      (1) The title implies that the neuropeptides promote host-seeking, but sufficiency is not demonstrated (for example, with peptide injection or overexpression experiments).

      Demonstrating sufficiency would require injecting sNPF peptide or its agonist. To date, no small-molecule agonists (or antagonists) that selectively mimic sNPF or RYa neuropeptides have been identified in insects. An NPY analogue, TM30335, has been reported to activate the Aedes aegypti NPY-like receptor 7 (NPYLR7; Duvall et al., 2019), which is also activated by sNPF peptides at higher doses (Liesch et al., 2013). Unfortunately, the compound is no longer available because its manufacturer, 7TM Pharma, has ceased operations. Synthesising the peptides is a possibility that we will explore in the future.

      (2) The proposed model regarding central versus peripheral (gut) peptide action is inconsistently presented and lacks strong experimental support.

      The best way to address this would be to conduct tissue-specific manipulations, the tools for which are not available in this species. Our approach to achieve head+abdomen and abdomen only knockdown was the closest we could get to achieving tissue specificity and allowed us to confirm that knockdown in the head was necessary for the phenotype. However, as the reviewer points out, this did not allow us to rule out any involvement of the abdomen. This point has been addressed in lines 364-371.

      (3) Some conclusions appear premature based on the current data and would benefit from additional functional validation.

      The most definitive way of demonstrating the necessity of sNPF and RYa in blood feeding would be to generate mutant lines. While we are pursuing these experiments, they lie beyond the scope of a revision. In the absence of mutants, we relied on dsRNA-mediated knockdown of the genes. We would like to posit that, despite only partial knockdown, mosquitoes do display defects in blood-feeding behaviour, while sugar feeding is unaffected. We think this reflects the importance of sNPF in promoting blood feeding.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Overall, I found this manuscript to be well-prepared; visually, the figures are great and were clearly carefully thought out and curated, and the research is impactful. It was a wonderful read from start to finish. I have the following recommendations:

      Thank you very much, we are very pleased to hear that you enjoyed reading our manuscript!

      (1) For future manuscripts, it would make things significantly easier on the reviewer side to submit a format that uses line numbers.

      We sincerely apologise for the oversight. We have now incorporated line numbers in the revised manuscript.

      (2) There are a few statements in the text that I think may need clarification or might be outside the bounds of what was actually studied here. For example, in the introduction "However, mating is dispensable in Anophelines even under conditions of nutritional satiety". I am uncertain what is meant by this statement - please clarify.

      We apologise for the lack of clarity in the statement and have now deleted it since we felt it was not necessary.

      (3) Typo/Grammatical minutiae:

      a) A small idiosyncrasy of using hyphens in compound words should also be fixed throughout. Typically, you don't hyphenate if the words are being used as a noun, as in the case: e.g. "Age affects blood feeding.". However, you would hyphenate if the two words are used as a compound adjective "Age affects blood-feeding behavior". This may not be an all-inclusive list, but here are some examples where hyphens need to either be removed or added. Some examples:

      "Nutritional state also influences other internal state outputs on blood-feeding": blood-feeding -> blood feeding

      "... the modulation of blood-feeding": blood-feeding -> blood feeding

      "For example, whether virgin females take blood-meals...": blood-meals -> blood meals

      ".... how internal and external cues shape meal-choice"-> meal choice

      "blood-meal" is often used throughout the text, but is correctly "blood meal" in the figures.

      There are many more examples throughout.

      We apologise for these errors and appreciate the reviewer’s keen eye. We have now fixed them throughout the manuscript.

      b) Figure 1 Caption has a typo: "co-housed males were accessed for sugar-feeding" should be "co-housed males were assessed for sugar feeding"

      We apologise for the typo and thank the reviewer for spotting it. We have now corrected this.

      c) It would be helpful in some other figure captions to more clearly label which statement is relevant to which part of the text. For example, in Figure 4's caption.

      "C,D. Blood-feeding and sugar-feeding behaviour of females when both RYa and sNPF are knocked down in the head (C). Relative mRNA expressions of RYa and sNPF in the heads of dsRYa+dssNPF - injected blood-fed and unfed females, as compared to that in uninjected females, analysed via qPCR (D)."

      I found re-referencing C and D at the end of their statements makes it look as though C precedes the "Relative mRNA expression" and on a first read-through, I thought the figure captions were backwards. I'd recommend reformatting here and throughout consistently to only have the figure letter precede its relevant caption information, e.g.:

      "C. Blood-feeding and sugar-feeding behaviour of females when both RYa and sNPF are knocked down in the head. D. Relative mRNA expressions of RYa and sNPF in the heads of dsRYa+dssNPF-injected blood-fed and unfed females, as compared to that in uninjected females, analysed via qPCR."

      We have now edited the legends as suggested.

      Reviewer #2 (Recommendations for the authors):

      Separately from the clarifications and limitations listed above, the authors could strengthen their study and the conclusions drawn if they could rescue the behavioural phenotype observed following knockdown of sNPF and RYamide. This could be achieved by injection of either sNPF or RYa peptide independently or combined following knockdown to validate the role of these peptides in promoting blood-feeding in An. stephensi. Additionally, the apparent (but unclear) regionalized (or tissue-specific) knockdown of sNPF and RYamide transcripts could be visualized and verified by implementing HCR in situ hyb in knockdown animals (or immunohistochemistry using antibodies specific for these two neuropeptides).

      In a follow up of this work, we are generating mutants and peptides for these candidates and are planning to conduct exactly the experiments the reviewer suggests.

      Reviewer #3 (Recommendations for the authors):

      The loss-of-function data suggest necessity but not sufficiency. Synthetic peptide injection in non-host seeking (blood-fed mated or juvenile) mosquitoes would provide direct evidence for peptide-induced behavioral activation. The lack of these experiments weakens the central claim of the paper that these neuropeptides directly promote blood feeding.

      As noted above, we plan to synthesise the peptide to test rescue in a mutant background and sufficiency.

      Some of the claims about knockdown efficiency and interpretation are conflicting; the authors dismiss Hairy and Prp as candidates due to 30-35% knockdown, yet base major conclusions on sNPF and RYamide knockdowns with comparable efficiencies (25-40%). This inconsistency should be addressed, or the justification for different thresholds should be clearly stated.

      We have not defined any specific knockdown efficacy thresholds in the manuscript, as these can vary considerably between genes, and in some cases, even modest reductions can be sufficient to produce detectable phenotypes. For example, knockdown efficiencies of even as low as about 25% - 40% gave us observable phenotypes for sNPF and RYa RNAi (Figure S9B-G).

      No such phenotypes were observed for Hairy (30%) or Prp (35%) knockdowns. Either these genes are not involved in blood feeding, or the knockdown was not sufficient for these specific genes to induce phenotypes. We cannot distinguish between these scenarios.

      The observation that knockdown animals take smaller blood meals is interesting and could reflect a downstream effect of altered host-seeking or an independent physiological change. The relationship between meal size and host-seeking behavior should be clarified.

      We agree with the reviewer that the reduced meal size observed in sNPF and RYa knockdown animals could result either from an inability to seek a host or from an independent effect on blood meal intake. Unfortunately, we did not measure host-seeking in these animals. We plan to distinguish between these possibilities using mutants in future work.

      Several figures are difficult to interpret due to cluttered labeling and poorly distinguishable color schemes. Simplifying these and improving contrast (especially for co-housed vs. virgin conditions) would enhance readability.

      We regret that the reviewer found the figures difficult to follow. We have now revised our annotations throughout the manuscript for enhanced readability. For example, “D1<sup>B</sup>” is now “D1<sup>PBM</sup>” (post-bloodmeal) and “D1<sup>O</sup>” is now “D1<sup>PO</sup>” (post-oviposition). Wherever mated females were used, we have now appended “(m)” to the annotations and consistently depicted these females with striped abdomens in all the schematics. We believe these changes will improve clarity and readability.

      The manuscript does not clearly justify the use of whole-brain RNA sequencing to identify peptides involved in metabolic or peripheral processes. Given that anticipatory feeding signals are often peripheral, the logic for brain transcriptomics should be explained.

      The reviewer is correct in pointing out that feeding signals could also emerge from peripheral tissues. Signals from these tissues – in response to changes in both nutritional and reproductive states – are then integrated by the central brain to modulate feeding choices. For example, in Drosophila, increased protein intake is mediated by central brain circuits, including those in the SEZ and central complex (Munch et al., 2022; Liu et al., 2017; Goldschmidt et al., 2023). In the context of mating, male-derived sex peptide further increases protein feeding by acting on dedicated central brain circuitry (Walker et al., 2015). We therefore focused on the central brain for our studies.

      The proposed model suggests brain-derived peptides initiate feeding, while gut peptides provide feedback. However, gut-specific knockdowns had no effect, undermining this hypothesis. Conversely, the authors also suggest abdominal involvement based on RNAi results. These contradictions need to be resolved into a consistent model.

      We thank the reviewer for raising this point and recognise their concern. Our reasons for invoking an involvement of the gut were two-fold:

      (1) We find increased sNPF transcript expression in the entero-endocrine cells of the midgut in blood-hungry females, which returns to baseline after a blood meal (Fig. 4L, M).

      (2) While the abdomen-only knockdowns did not affect blood feeding, every effective head knockdown that affected blood feeding also abolished abdominal transcript levels (Fig. S9C, F). (Achieving a head-only reduction proved impossible because (i) systemic dsRNA delivery inevitably reaches the abdomen and (ii) abdominal expression of both peptides is low, leaving little dynamic range for selective manipulation.) Consequently, we can only conclude the following: 1) that brain expression is required for the behaviour, 2) that we cannot exclude a contributory role for gut-derived sNPF. We have discussed this in lines 364-371.

      The identification of candidate receptors is promising, but the manuscript would be significantly strengthened by testing whether receptor knockdowns phenocopy peptide knockdowns. Without this, it is difficult to conclude that the identified receptors mediate the behavioral effects.

      We agree that functional validation of the receptors would strengthen the evidence for sNPF- and RYa-mediated control of blood feeding in An. stephensi. We selected these receptors based on sequence homology. A possibility remains that sNPF neuropeptides activate more than one receptor, each modulating a distinct circuit, as shown in the case of Drosophila Tachykinin (https://pmc.ncbi.nlm.nih.gov/articles/PMC10184743/). Confirming their role will therefore require systematic characterisation and knockdown of each receptor. We are planning these experiments in the future.

      The authors compared the percentage changes in sugar-fed and blood-fed animals under sugar-sated or sugar-starved conditions. Figure 1F should reflect what was discussed in the results.

      Perhaps this concern stems from our representation of the data in figure 1F? We have now edited the x-axis and revised its label from “choice of food” to “choice made” to better reflect what food the mosquitoes chose to take.

      For clarity, we have now also plotted the same data as stacked graphs at the bottom of Fig. 1F, which clearly shows the proportion of mosquitoes fed on each particular choice. We avoid the stacked graph as the sole representation of this data because it does not capture the variability in the data.

      Minor issues:

      (1) The authors used mosquitoes with belly stripes to indicate mated females. To be consistent, the post-oviposition females should also have belly stripes.

      We thank the reviewer for pointing this out. We have now edited all the figures as suggested.

      (2) In the first paragraph on the right column of the second page, the authors state, "Since females took blood-meals regardless of their prior sugar-feeding status and only sugar-feeding was selectively suppressed by prior sugar access." Just because the well-fed animals ate less than the starved animals does not mean their feeding behavior was suppressed.

      Perhaps there has been a misunderstanding in the experimental setup of figure 1F, probably stemming from our data representation. The experiment is a choice assay in which sugar-starved or sugar-sated females, co-housed with males, were provided simultaneous access to both blood and sugar, and were assessed for the choice made (indicated on the x-axis): both blood and sugar, blood only, sugar only, or neither. We scored females only for the presence or absence of each meal type (blood or sugar) and did not quantify the amount consumed.

      (3) The figure legend for Figure 1A and the naming convention for different experimental groups are difficult to follow. A simplified or consistently abbreviated scheme would help readers navigate the figures and text.

      We regret that the reviewer found the figure difficult to follow. We have now revised our annotations throughout the manuscript for enhanced readability. For example, “D1<sup>B</sup>” is now “D1<sup>PBM</sup>” (post-bloodmeal) and “D1<sup>O</sup>” is now “D1<sup>PO</sup>” (post-oviposition).

      (4) In the last paragraph of the Y-maze olfactory assay for host-seeking behaviour in An. stephensi in Methods, the authors state, "When testing blood-fed females, aged-matched sugar-fed females (blood-hungry) were included as positive controls where ever possible, with satisfactory results." The authors should explicitly describe what the criteria are for "satisfactory results".

      We apologise for the lack of clarity. We have now edited the statement to read:

      “When testing blood-fed females, age-matched sugar-fed females (blood-hungry) were included wherever possible as positive controls. These females consistently showed attraction to host cues, as expected.” See lines 786-790.

      (5) In the first paragraph of the dsRNA-mediated gene knockdown section in Methods, dsRNA against GFP is used as a negative control for the injection itself, but not for the potential off-target effect.

      We agree with the reviewer that dsGFP injections act as controls only for injection-related behavioural changes, and not for off-target effects of RNAi. We have now corrected the statement. See lines 919-920.

      To control for off-target effects, we could have designed multiple dsRNAs targeting different parts of a given gene. We regret not including these controls for potential off-target effects of dsRNAs injected.

      (6) Reference numbers 48, 89, and 90 are not complete citations.

      We thank the reviewer for spotting these. We have now corrected these citations.

    1. Author response:

      Thank you for your time and for considering our manuscript as a Reviewed Preprint. We also would like to thank Reviewer 1 for their evaluation of our manuscript.

      Here, we present a provisional response to the reviewer comments. Following their suggestions, we will make an effort to: i) strengthen the evidence for the role of dopamine in olfactory glomeruli and ii) delineate the circuit involved in mediating the observed potentiation. Below, we briefly describe the experiments that are in progress or will be performed to improve our paper.

      We will carry out immunostainings for tyrosine hydroxylase to confirm that dopamine can be released onto the genetically labelled glomerulus. There is a lack of good commercial antibodies for Xenopus (we already tried one, PA1-4679 from Thermo Fisher Scientific, and it did not work), but we will look for alternatives. In a previous set of experiments, we attempted to measure dopamine release in the glomerular layer by electroporating olfactory sensory neurons or olfactory bulb neurons with the dopamine sensors dLight1.1 (Addgene #111053) or dLight1.3 (Addgene #111056). In our hands, fluorescence signals were extremely weak, barely detectable. Similar results were obtained after electroporating the tectum or the rhombencephalon. We propose to repeat these experiments using a more sensitive sensor such as GRAB_DA2m. Other approaches, such as performing single-cell transcriptomics of olfactory sensory neurons, might be considered to confirm the expression of D2 receptors.

      We agree with the reviewer that we should obtain more lines of evidence in support of presynaptic inhibition mediated by D2 receptors. To gain insight into the bilateral circuit mediating the observed potentiation of glomerular responses, we are currently investigating the role of dorsolateral pallium neurons. In Xenopus tadpoles, the lateral pallium plays a role analogous to that of the olfactory cortex in amniotes. Preliminary observations show that neurons located in this pallial region respond to ipsilateral stimulation of the olfactory epithelium and that, if they are damaged, a contralateral potentiation of glomerular output occurs. We aim to conclude this set of experiments and include it in the paper, as we believe it clarifies the circuitry involved.

    1. Author response:

      We thank both reviewers for their thoughtful and constructive comments. To address this feedback, we plan to do the following:

      Questions/Hypotheses: We will clarify the study’s motivation, central questions, and our hypotheses, with a particular focus on the integration across learning and memory.

      Methods: To improve clarity and transparency, we will expand the Methods section and modify relevant figures to provide more explanation of the task, our decisions regarding data analysis approaches, and how they address our questions and hypotheses.

      Learning Behavioral Analysis: As suggested by reviewers, we will fit and compare mixed-effects models with the maximal random effects structure for the within-subject variables and their interactions. We may simplify this structure as the data justify (i.e., if we encounter convergence problems or the random effects explain minimal variance). In the revision, we will also directly compare the adolescent peaks in performance across the conditions to support our conclusion that adolescents outperform people of other ages in the Pavlovian-congruent conditions.

      Computational Modeling: We appreciate the reviewers’ close attention to the computational modeling methods, as it identified a small error in the reporting of the formulas we implemented. Specifically, the preprint’s softmax function had an error and should be printed as:

      This correct parameterization can be seen in the Huys, 2018 public repository on line 48 here. As such, rather than indicating random choices, the lapse rates with estimated solutions close to one represent expected goal-directed behavior. That said, we acknowledge that parameter recovery indicated potential identifiability issues for some parameters, especially those with extreme values. We appreciate the reviewer’s suggestion to examine “learners” separately from “non-learners,” as has been done in prior work with adults (Cavanagh et al., 2013; Guitart-Masip et al., 2012). In this revision, we will investigate whether behavioral differences between learners and non-learners, among other potential explanations, account for the relatively poor parameter recovery. We will also explain in more detail why we selected these RL models, including how the Pavlovian policy works and why it adequately captures participants’ behavior.
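
      To make the interpretation of such a parameter concrete, the following is a minimal Python sketch. It assumes a choice rule that mixes a softmax over action weights with a uniform distribution, p(a) = g · softmax(w)(a) + (1 − g)/n, so that g close to one gives almost purely value-driven choices and g close to zero gives random choices. This functional form, the function name `choice_probabilities`, and the example weights are assumptions made for illustration; the corrected formula and the repository code are not reproduced here.

```python
import math


def choice_probabilities(weights, g):
    """Mix a softmax over action weights with a uniform distribution.

    Assumed form (illustrative only): p = g * softmax(weights) + (1 - g) / n.
    g = 1 gives the pure softmax; g = 0 gives uniform random choice.
    """
    n = len(weights)
    # Subtract the maximum before exponentiating for numerical stability.
    w_max = max(weights)
    exps = [math.exp(w - w_max) for w in weights]
    total = sum(exps)
    return [g * e / total + (1.0 - g) / n for e in exps]


# Hypothetical action weights for a two-action (go / no-go) trial.
weights = [2.0, 0.0]
p_directed = choice_probabilities(weights, g=1.0)  # fully value-driven
p_random = choice_probabilities(weights, g=0.0)    # fully random
```

      Under this assumed form, an estimate of g near one leaves the softmax essentially unchanged, which is the sense in which values close to one would not indicate random responding.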

      Memory Behavioral Analysis: At the reviewers’ suggestion, we will expand our analysis of the learning-memory trade-off to fully explore this possible explanation. We will also explore the additional analyses that the reviewers suggested (e.g., ROC curves accounting for confidence ratings, analysis of correct vs. incorrect responses).

      We are confident that these revisions will strengthen the work, and we are grateful to the reviewers for their thorough, insightful feedback. In the coming revision, we will provide a detailed point-by-point response to all comments and questions.

      References

      Cavanagh, J. F., Eisenberg, I., Guitart-Masip, M., Huys, Q., & Frank, M. J. (2013). Frontal Theta Overrides Pavlovian Learning Biases. The Journal of Neuroscience, 33(19), 8541–8548. https://doi.org/10.1523/JNEUROSCI.5754-12.2013

      Guitart-Masip, M., Huys, Q. J. M., Fuentemilla, L., Dayan, P., Duzel, E., & Dolan, R. J. (2012). Go and no-go learning in reward and punishment: Interactions between affect and effect. NeuroImage, 62(1), 154–166. https://doi.org/10.1016/j.neuroimage.2012.04.024

      Huys, Q. J. M. (2018). Bayesian Approaches to Learning and Decision-Making. In Computational Psychiatry (pp. 247–271). Elsevier. https://doi.org/10.1016/B978-0-12-809825-7.00010-9

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews

      Reviewer #1 (Public review):

      Summary:

      The authors performed genome assemblies for two Fagaceae species and collected transcriptome data from four natural tree species every month over two years. They identified seasonal gene expression patterns and further analyzed species-specific differences.

      Strengths:

      The study of gene expression patterns in natural environments, as opposed to controlled chambers, is gaining increasing attention. The authors collected RNA-seq data monthly for two years from four tree species and analyzed seasonal expression patterns. The data are novel. The authors could revise the manuscript to emphasize seasonal expression patterns in three species (with one additional species having more limited data). Furthermore, the chromosome-scale genome assemblies for the two Fagaceae species represent valuable resources, although the authors did not cite existing assemblies from closely related species.

      Thank you for your careful assessment of our manuscript.

      Weaknesses:

      Comment: The study design has a fundamental flaw regarding the evaluation of genetic or evolutionary effects. As a basic principle in biology, phenotypes, including gene expression levels, are influenced by genetics, environmental factors, and their interaction. This principle is well-established in quantitative genetics.

      In this study, the four species were sampled from three different sites (see Materials and Methods, lines 543-546), and additionally, two species were sampled from 2019-2021, while the other two were sampled from 2021-2023 (see Figure S2). This critical detail should be clearly described in the Results and Materials and Methods. Due to these variations in sampling sites and periods, environmental conditions are not uniform across species.

      Even in studies conducted in natural environments, there are ways to design experiments that allow genetic effects to be evaluated. For example, by studying co-occurring species, or through transplant experiments, or in common gardens. To illustrate the issue, imagine an experiment where clones of a single species were sampled from three sites and two time periods, similar to the current design. RNA-seq analysis would likely detect differences that could qualitatively resemble those reported in this manuscript.

      One example is in line 197, where genus-specific expression patterns are mentioned. While it may be true that the authors' conclusions (e.g., winter synchronization, phylogenetic constraints) reflect real biological trends, these conclusions are also predictable even without empirical data, and the current dataset does not provide quantitative support.

      If the authors can present a valid method to disentangle genetic and environmental effects from their dataset, that would significantly strengthen the manuscript. However, I do not believe the current study design is suitable for this purpose.

      Unless these issues are addressed, the use of the term "evolution" is inappropriate in this context. The title should be revised, and the result sections starting from "Peak months distribution..." should be either removed or fundamentally revised. The entire Discussion section, which is based on evolutionary interpretation, should be deleted in its current form.

      If the authors still wish to explore genetic or evolutionary analyses, the pair of L. edulis and L. glaber, which were sampled at the same site and over the same period, might be used to analyze "seasonal gene expression divergence in relation to sequence divergence." Nevertheless, the manuscript would benefit from focusing on seasonal expression patterns without framing the study in evolutionary terms.

      We sincerely thank the reviewer for the detailed and thoughtful comments. We fully recognize the importance of carefully distinguishing genetic and environmental contributions in transcriptomic studies, particularly when addressing evolutionary questions. The reviewer identified two major concerns regarding our study design: (1) the use of different monitoring periods across species, and (2) the use of samples collected from different study sites. We addressed both concerns with additional analyses using 112 new samples and now present new evidence that supports the robustness of our conclusions.

      (1) Monitoring period variation does not bias our conclusions

      To address concerns about the differing monitoring periods, we added new RNA-seq data (42 bud and 42 leaf samples for L. glaber, and 14 bud and 14 leaf samples for L. edulis) collected from November 2021 to November 2022, enabling direct comparison across species within a consistent timeframe. Hierarchical clustering of this expanded dataset (Fig. S6) yielded results consistent with our original findings: winter-collected samples cluster together regardless of species identity. This strongly supports our conclusion that the seasonal synchrony observed in winter is not an artifact of the monitoring period and demonstrates the robustness of our conclusions across datasets.

      (2) Site variation is limited and does not confound our findings

      Although the study included three sites, two of them (Imajuku and Ito Campus) are only 7.3 km apart, share nearly identical temperature profiles (see Fig. S2), and are located at the edge of similar evergreen broadleaf forests. Only Q. acuta was sampled from a higher-altitude, cooler site. To assess whether the higher elevation site of Q. acuta introduced confounding environmental effects, we reanalyzed the data after excluding this species. Hierarchical clustering still revealed that winter bud samples formed a distinct cluster regardless of species identity (Fig. S7), consistent with our original finding.

      Furthermore, we recalculated the molecular phenology divergence index D (Fig. 4C) and the interspecific Pearson’s correlation coefficients (Fig. 5A) without including Q. acuta. These analyses produced results that were similar to those obtained from the full dataset (Fig. S12; Fig. S14), indicating that the observed patterns are not driven by environmental differences associated with elevation.

      (3) Justification for our approach in natural systems

      We agree with the reviewer that experimental approaches such as common gardens, reciprocal transplants, and the use of co-occurring species are valuable for disentangling genetic and environmental effects. In fact, we have previously implemented such designs in studies using the perennial herb Arabidopsis halleri (Komoto et al., 2022, https://doi.org/10.1111/pce.14716) and clonal Someiyoshino cherry trees (Miyawaki-Kuwakado et al., 2024, https://doi.org/10.1002/ppp3.10548) to examine environmental effects on gene expression. However, extending these approaches to long-lived tree species in diverse natural ecosystems poses significant logistical and biological challenges. In this study, we addressed this limitation by including three co-occurring species at the same site, which allowed us to evaluate interspecific differences under comparable environmental conditions. Importantly, even when we limited our analyses to these co-occurring species, the results remained consistent, indicating that the observed variation in transcriptomic profiles cannot be attributed to environmental factors alone and likely reflects underlying genetic influences.

      Accordingly, we added four new figures (Fig. S6, Fig. S7, Fig. S12 and Fig. S14) and revised the manuscript to clarify the limitations and strengths of our design, to tone down the evolutionary claims where appropriate, and to more explicitly define the scope of our conclusions in light of the data. We hope that these efforts sufficiently address the reviewer’s concerns and strengthen the manuscript.

      To better support the seasonal expression analysis, the early RNA-seq analysis sections should be strengthened. There is little discussion of biological replicate variation or variation among branches of the same individual. These could be important factors to analyze. In line 137, the mapping rate for two species is mentioned, but the rates for each species should be clearly reported. One RNA-seq dataset is based on a species different from the reference genome, so a lower mapping rate is expected. While this likely does not hinder downstream analysis, quantification is important.

      We thank Reviewer 1 for the helpful comment. To evaluate the variation among biological replicates, we compared the expression level of each gene across different individuals. We observed high correlation between each pair of individuals (Q. glauca (n=3): average correlation coefficient r = 0.947; Q. acuta (n=3): r = 0.948; L. glaber (n=3): r = 0.948). This result suggests that the seasonal gene expression pattern is highly synchronized across individuals within the same species. We mentioned this point in the Results section of the revised manuscript. We also calculated the mean mapping rates for each species. As the reviewer expected, the mapping rate was slightly lower in Q. acuta (88.6 ± 2.3%) and L. glaber (84.3 ± 5.4%), whose RNA-Seq data were mapped to reference genomes of related but different species, than in Q. glauca (92.6 ± 2.2%) and L. edulis (89.3 ± 2.7%). However, we minimized the impact of these differences on downstream analysis. These details have been included in the revised main text.
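
      The replicate comparison described above can be sketched in Python as follows. This is a minimal illustration, assuming that each individual is represented by a vector of expression values over the same genes and time points and that the reported average is the mean of Pearson's r over all pairs of individuals. The helper names `pearson_r` and `mean_pairwise_r`, the tree labels, and the numbers are hypothetical and are not taken from the study.

```python
import math
from itertools import combinations


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy / math.sqrt(s_xx * s_yy)


def mean_pairwise_r(profiles):
    """Average Pearson's r over every pair of individuals."""
    pairs = list(combinations(profiles.values(), 2))
    return sum(pearson_r(x, y) for x, y in pairs) / len(pairs)


# Hypothetical expression values for three individuals of one species.
profiles = {
    "tree_1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "tree_2": [1.1, 2.1, 2.9, 4.2, 5.0],
    "tree_3": [0.9, 1.8, 3.2, 3.9, 5.1],
}
mean_r = mean_pairwise_r(profiles)
```

      With three individuals there are three pairs, so the average is taken over three coefficients.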

      In Figures 2A and 2B, clustering is used to support several points discussed in the Results section (e.g., lines 175-177). However, clustering is primarily a visualization method or a hypothesis-generating tool; it cannot serve as a statistical test. Stronger conclusions would require further statistical testing.

      We thank the reviewer for the helpful comment. As noted, we acknowledge that hierarchical clustering (Fig. 2A) is primarily a visualization and hypothesis-generating method. To assess the biological relevance of the clusters identified, we conducted a Mann-Whitney U test or the Steel-Dwass test to evaluate whether the environmental temperatures at the time of sample collection differed significantly among the clusters. This analysis (Fig. 2B) revealed statistically significant differences in temperature for cluster B3 (p < 0.01), indicating that the gene expression clusters are associated with seasonal thermal variation. These results support the interpretation that the clusters reflect coordinated transcriptional responses to environmental temperature. We revised the Results section to clarify this point.
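
      As an illustration of the two-group comparison, the sketch below computes the Mann-Whitney U statistics for collection temperatures in two clusters, using midranks for tied values. It is a hedged example only: the function name `mann_whitney_u` and the temperature values are hypothetical, the p-value calculation is omitted, and a statistics package would normally be used for the full test.

```python
def mann_whitney_u(sample_a, sample_b):
    """Return (U_a, U_b) for two samples, assigning midranks to ties."""
    # Pool the observations, remembering which group each one came from.
    pooled = sorted([(v, 0) for v in sample_a] + [(v, 1) for v in sample_b])
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        # Find the run of tied values starting at position i.
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        midrank = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[k] = midrank
        i = j + 1
    rank_sum_a = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    n_a, n_b = len(sample_a), len(sample_b)
    u_a = rank_sum_a - n_a * (n_a + 1) / 2
    u_b = n_a * n_b - u_a
    return u_a, u_b


# Hypothetical temperatures (degrees C) at sampling for two clusters.
winter_cluster = [4.0, 5.5, 6.0, 7.2]
other_cluster = [12.0, 15.5, 18.0, 21.0, 24.5]
u_winter, u_other = mann_whitney_u(winter_cluster, other_cluster)
```

      A U value of zero for one group means that every observation in that group is lower than every observation in the other, which is the most extreme separation possible.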

      The quality of the genome assemblies appears adequate, but related assemblies should be cited and discussed. Several assemblies of Fagaceae species already exist, including Quercus mongolica (Ai et al., Mol Ecol Res, 2022), Q. gilva (Front Plant Sci, 2022), and Fagus sylvatica (GigaScience, 2018), among others. Is there any novelty here? Can you compare your results with these existing assemblies?

      We agree that genome assemblies of Fagaceae species are becoming increasingly available. However, our study does not aim to emphasize the novelty of the genome assemblies per se. Rather, with the increasing availability of chromosome-level genomes, we regard genome assembly as a necessary foundation for more advanced analyses. The main objective of our study is to investigate how each gene is expressed in response to seasonal environmental changes, and to link genome information with seasonal transcriptomic dynamics. To address the reviewer’s comment in line with this objective, we added a discussion on the syntenic structure of eight genome assemblies spanning four genera within the Fagaceae, including a species from the genus Fagus (Ikezaki et al. 2025, https://doi.org/10.1101/2025.07.31.667835). This addition helps to position our work more clearly within the context of existing genomic resources.

      Most importantly, Figure 1B-D shows synteny between the two genera but also indicates homology between different chromosomes. Does this suggest paleopolyploidy or another novel feature? These chromosome connections should be interpreted in the main text, even if they could be methodological artifacts.

      A previous study on genome size variation in Fagaceae suggested that, given the consistent ploidy level across the family, genome expansion likely occurred through relatively small segmental duplications rather than whole-genome duplications. Because Figure 1B-D supports this view, we cited the following reference in the revised version of the manuscript: Chen et al. (2014), https://doi.org/10.1007/s11295-014-0736-y

      In both the Results and Materials and Methods sections, descriptions of genome and RNA-seq data are unclear. In line 128, a paragraph on genome assembly suddenly introduces expression levels. RNA-seq data should be described before this. Similarly, in line 238, the sentence "we assembled high-quality reference genomes" seems disconnected from the surrounding discussion of expression studies. In line 632, Illumina short-read DNA sequencing is mentioned, but it's unclear how these data were used.

      We relocated the explanation regarding the expression levels of single-copy and multi-copy genes to the section titled “Seasonal gene expression dynamics.” Additionally, we clarified in the Materials and Methods section that short-read sequencing data were used for both genome size estimation and phylogenetic reconstruction.

      Reviewer #2 (Public review):

      Summary:

      This study explores how gene expression evolves in response to seasonal environments, using four evergreen Fagaceae species growing in similar habitats in Japan. By combining chromosome-scale genome assemblies with a two-year RNA-seq time series in leaves and buds, the authors identify seasonal rhythms in gene expression and examine both conserved and divergent patterns. A central result is that winter bud expression is highly conserved across species, likely due to shared physiological demands under cold conditions. One of the intriguing implications of this study is that seasonal cycles might play a role similar to ontogenetic stages in animals. The authors touch on this by comparing their findings to the developmental hourglass model, and indeed, the recurrence of phenological states such as winter dormancy may act as a cyclic form of developmental canalization, shaping expression evolution in a way analogous to embryogenesis in animals.

      Strengths:

      (1) The evolutionary effects of seasonal environments on gene expression are rarely studied at this scale. This paper fills that gap.

      (2) The dataset is extensive, covering two years, two tissues, and four tree species, and is well suited to the questions being asked.

      (3) Transcriptome clustering across species (Figure 2) shows strong grouping by season and tissue rather than species, suggesting that the authors effectively controlled for technical confounders such as batch effects and mapping bias.

      (4) The idea that winter imposes a shared constraint on gene expression, especially in buds, is well argued and supported by the data.

      (5) The discussion links the findings to known concepts like phenological synchrony and the developmental hourglass model, which helps frame the results.

      We are grateful to the reviewer for the detailed and thoughtful review of our manuscript.

      Weaknesses:

      (1) While the hierarchical clustering shown in Figure 2A largely supports separation by tissue type and season, one issue worth noting is that some leaf samples appear to cluster closely with bud samples. The authors do not comment on this pattern, which raises questions about possible biological overlap between tissues during certain seasonal transitions or technical artifacts such as sample contamination. Clarifying this point would improve confidence in the interpretation of tissue-specific seasonal expression patterns.

      The leaf samples that clustered with the bud samples are newly flushed leaves collected in April for Q. glauca, May for Q. acuta, May and June for L. edulis, and August and September for L. glaber. To clarify this point, we marked these newly flushed leaf samples with asterisks in the revised figure (Fig. 2A).

      (2) While the study provides compelling evidence of conserved and divergent seasonal gene expression, it does not directly examine the role of cis-regulatory elements or chromatin-level regulatory architecture. Including regulatory genomic or epigenomic data would considerably strengthen the mechanistic understanding of expression divergence.

      We thank the reviewer for this insightful comment. As noted in the Discussion section, we hypothesize that such genome-wide seasonal expression patterns, and their divergence across species, are likely mediated by cis-regulatory elements and chromatin-level mechanisms. While a direct investigation of regulatory architecture was beyond the scope of the present study, we fully agree that incorporating regulatory genomic and epigenomic data would significantly deepen the mechanistic understanding of expression divergence. In this regard, we are currently working to identify putative cis-regulatory elements in non-coding regions and are collecting epigenetic data from the same tree species using ChIP-seq. We believe the current study provides a foundation for these future investigations into the regulatory basis of seasonal transcriptome variation. We made a minor revision to the Discussion to note that an important future direction is to investigate the evolution of non-coding sequences that regulate gene expression in response to seasonal environmental changes.

      (3) The manuscript includes a thoughtful analysis of flowering-related genes and seasonal GO enrichment (e.g., Figure 3C-D), providing an initial link between gene expression timing and phenological functions. However, the analysis remains largely gene-centric, and the study does not incorporate direct measurements of phenological traits (e.g., flowering or bud break dates). As a result, the connection between molecular divergence and phenotypic variation, while suggestive, remains indirect.

      We would like to note that phenological traits have been observed in the field on a monthly basis throughout the sampling period and the phenological data were plotted together with molecular phenology (e.g. Fig. 2A, C; Fig. 3C, D). Although the temporal resolution is limited, these observations captured species-specific differences in key phenological events such as leaf flushing and flowering times. We revised the manuscript to clarify this point.

      (4) Although species were sampled from similar habitats, one species (Q. acuta) was collected at a higher elevation, and factors such as microclimate or local photoperiod conditions could influence expression patterns. These potential confounding variables are not fully accounted for, and their effects should be more thoroughly discussed or controlled in future analyses.

      We fully agree with the reviewer that local environmental conditions, including microclimate and photoperiod differences, could potentially influence gene expression patterns. To assess whether the higher elevation site of Q. acuta introduced confounding environmental effects, we reanalyzed the data after excluding this species. Hierarchical clustering still revealed that winter bud samples formed a distinct cluster regardless of species identity (Fig. S7), consistent with our original finding.

      Furthermore, we recalculated the molecular phenology divergence index D (Fig. 4C) and the interspecific Pearson’s correlation coefficients (Fig. 5A) without including Q. acuta. These analyses produced results that were qualitatively similar to those obtained from the full dataset (Fig. S12; Fig. S14), indicating that the observed patterns are not driven by environmental differences associated with elevation.

      We believe these additional analyses help to decouple the effects of environment and genetics, and support our conclusion that both seasonal synchrony and phylogenetic constraints play key roles in shaping transcriptome dynamics. We added four new figures (Fig. S6, Fig. S7, Fig. S12 and Fig. S14) and revised the text accordingly to clarify this point and to acknowledge the potential impact of site-specific environmental variation.

      (5) Statistical and Interpretive Concerns Regarding Δφ and dN/dS Correlation (Figures 5E and 5F):

      a) Statistical Inappropriateness: Δφ is a discrete ordinal variable (likely 1-11), making it unsuitable for Pearson correlation, which assumes continuous, normally distributed variables. This undermines the statistical validity of the analysis.

      We thank the reviewer for the insightful comment. We would like to clarify that the analysis presented in Figures 5E and 5F was based on linear regression, not Pearson’s correlation. Although Δφ is a discrete variable, it takes values from 0 to 6 in 0.5 increments, resulting in 13 levels. We treated it as a quasi-continuous variable for the purposes of linear regression analysis. This approach is commonly adopted in practice when a discrete variable has sufficient resolution and ordering to approximate continuity. To enhance clarity, we revised the manuscript to explicitly state that linear regression was used, and we now report the regression coefficient and associated p-value to support the interpretation of the observed trend.
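
      The treatment of Δφ as a quasi-continuous predictor can be illustrated with a minimal Python sketch of ordinary least squares on the 13 levels (0 to 6 in steps of 0.5). The function name `ols_slope_intercept` and the response values are hypothetical and chosen only to show the mechanics; they are not the study's dN/dS data, and the p-value calculation is omitted.

```python
def ols_slope_intercept(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# The 13 possible values of the peak-month difference: 0, 0.5, ..., 6.0.
levels = [i * 0.5 for i in range(13)]

# Hypothetical response values lying exactly on a line (illustration only).
response = [0.20 + 0.01 * d for d in levels]

slope, intercept = ols_slope_intercept(levels, response)
```

      In a real analysis each level would carry many genes with scattered responses, so the fitted slope summarizes the average trend across levels rather than an exact relationship.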

      b) Biological Interpretability: Even with the substantial statistical power afforded by genome-wide analysis, the observed correlations are extremely weak. This suggests that the relationship, if any, between temporal divergence in expression and protein-coding evolution is negligible.

      Taken together, these issues weaken the case for any biologically meaningful association between Δφ and dN/dS. I recommend either omitting these panels or clearly reframing them as exploratory and statistically limited observations.

      We agree with the reviewer’s comment. While we retained the original panels, we reframed our interpretation to emphasize that, despite statistical significance, the observed correlation is very weak—suggesting that coding region variation is unlikely to be the primary driver of seasonal gene expression patterns. Accordingly, we revised the “Relating seasonal gene expression divergence to sequence divergence” section in the Results, as well as the relevant part of the Discussion.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Sentences around lines 250-251 are incomplete and need revision.

      We thank the reviewer for pointing this out. We revised the sentences in the subsection “Peak month distribution of rhythmic genes and intra-genus and inter-genera comparison” in the Results section to ensure clarity and completeness. In addition, to improve the interpretability of the peak month distribution, we added arrows to indicate the major peaks in the circular histograms shown in Fig. 3C and 3D.

      Reviewer #2 (Recommendations for the authors):

      (1) In Figure 1E-G, the term Copy number or Copy number variation could be misleading, as it is commonly associated with inter-individual gene copy number variation in a population. Since the analysis here refers to orthology relationships rather than population-level variation, a more precise term, such as orthogroup classification, may be preferable.

      We thank the reviewer for this helpful suggestion. We agree that the term “copy number” could be misleading in this context. Accordingly, we updated the labeling in Fig. 1 to reflect the more precise term “orthogroup classification.”

      (2) In Figure 3A, the x-axis label Period (month) may be misleading, as it could be mistaken for calendar months rather than referring to the periodicity of gene expression cycles. A more explicit label, such as Expression periodicity (months), might improve clarity for the reader.

      We thank the reviewer for this valuable suggestion. In the original version of Fig. 3A, we used the label “Period (month),” which could indeed be misinterpreted as referring to calendar months. To clarify that this axis represents the length of gene expression cycles, we revised the label to “Period length (months).” This change also aligns with the terminology used throughout the manuscript, where “Period” refers specifically to cycle length, and “Periodicity” denotes the presence or absence of rhythmic expression.

      Other minor revisions

      We also made minor revisions to the reference list and the grant number details, and included the accession numbers for all DNA and RNA sequence data deposited in the DNA Data Bank of Japan (DDBJ) in the Data deposition and code availability section, in addition to the BioProject ID.

    1. Author Response:

      The following is the authors’ response to the current reviews.

      Reviewer #1 (Public review):

      The scale bar for fly and ovary images should be included in Figures 9, 10, and 12.

      We agree with this comment and apologize for the oversight. We have now modified Figures 9, 10, and 12 to include the scale bars for the ovary images. The fly images were acquired using a stereo microscope where scale bar calculation was not possible. However, all images were acquired at the same magnification for consistency.

      Reviewer #2 (Public review):

      A weakness of this paper is the phylogenetic analysis to investigate if there is correspondence in the phylogenetic distribution of ITP-type and Gyc76C-type genes/proteins. Unfortunately, the evidence presented is rather limited in scope. Essentially, the authors report that they only found ITP-type and Gyc76C-type genes/proteins in protostomes, but not in deuterostomes. What is needed is a more fine-grained analysis at the species level within the protostomes. However, I recognise that such a detailed analysis may extend beyond the scope of this paper, which is already rich in data.

      We thank the reviewer for their comment and the suggestion to perform a fine-grained species-level comparison of ITP and Gyc76C genes across protostomes. We are unsure of the utility of this analysis for the present study given that we have now shown that ITPa can activate Gyc76C using both an ex vivo and a heterologous assay, the latter being the gold standard in GPCR and guanylate cyclase discovery (see Huang et al. 2025 https://doi.org/10.1073/pnas.2420966122; Beets et al. 2023 https://doi.org/10.1016/j.celrep.2023.113058; Chang et al. 2009 https://doi.org/10.1073/pnas.0812593106).

      Additionally, the absence of a gene in a genome/proteome is hard to prove, especially when many/most of the protostomian datasets are not as high-quality as those of model systems (e.g. Drosophila melanogaster and Caenorhabditis elegans). Furthermore, based on previous findings in Bombyx mori (Nagai et al. 2014 https://doi.org/10.1074/jbc.m114.590646 and Nagai et al. 2016 https://doi.org/10.1371/journal.pone.0156501) and Drosophila (Xu et al. 2023 https://doi.org/10.1038/s41586-023-06833-8 and our study), it is evident that different products of the ITP gene (ITPa and ITPL) could signal via different receptor types depending on the species. Hence, we would need to explore the presence of several genes (ITP, tachykinin, pyrokinin, tachykinin receptor, pyrokinin receptor, CG30340 orphan receptor and Gyc76C) to fully understand which components of these diverse signaling systems are present in a given species and to decipher the potential for cross-talk.

      While this species-level comparison will certainly be useful in the context of ITP-Gyc76C evolution, it will not alter the conclusions of the present study – ITPa acts via Gyc76C in Drosophila. We therefore agree with the reviewer that these analyses are beyond the scope of this paper.


      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):  

      Summary:  

      In Drosophila melanogaster, ITP has functions on feeding, drinking, metabolism, excretion, and circadian rhythm. In the current study, the authors characterized and compared the expression of all three ITP isoforms (ITPa and ITPL1&2) in the CNS and peripheral tissues of Drosophila. An important finding is that they functionally characterized and identified Gyc76C as an ITPa receptor in Drosophila using both in vitro and in vivo approaches. In vitro, the authors nicely confirmed that the inhibitory function of recombinant Drosophila ITPa on MT secretion is Gyc76C-dependent (knockdown Gyc76C specifically in two types of cells abolished the anti-diuretic action of Drosophila ITPa on renal tubules). They also used a combination of multiple approaches to investigate the roles of ITPa and Gyc76C on osmotic and metabolic homeostasis modulation in vivo. They revealed that ITPa signaling to renal tubules and fat body modulates osmotic and metabolic homeostasis via Gyc76C.  

      Furthermore, they tried to identify the upstream and downstream of ITP neurons in the nervous system by using connectomics and single-cell transcriptomic analysis. I found this interesting manuscript to be well-written and described. The findings in this study are valuable to help understand how ITP signals work on systemic homeostasis regulation. Both anatomical and single-cell transcriptome analysis here should be useful to many in the field. 

      We thank this reviewer for the positive and thorough assessment of our manuscript.  

      Strengths:  

      The question that this study tries to address (what is the receptor for ITPa in Drosophila?) is important. The authors ruled out the Bombyx ITPa receptor orthologs as potential candidates. They identified a novel ITP receptor by using phylogenetic and anatomical analyses, and both in vitro and in vivo approaches. 

      The authors presented detailed anatomical data for both ITP isoforms and Gyc76C (in the main and supplementary figures), which helps readers understand the expression of the neurons studied in the manuscript.  

      They also performed connectomic and single-cell transcriptomic analyses to study the synaptic and peptidergic connectivity of ITP-expressing neurons. This provided more information for better understanding and further study of systemic homeostasis modulation.  

      Weaknesses:  

      In the discussion section, the authors raised the limitations of the current study, which I mostly agree with, such as the lack of verification of direct binding between ITPa and Gyc76C, even though they provided different data to support that ITPa-Gyc76C signaling pathway regulates systemic homeostasis in adult flies. 

      We now provide evidence of Gyc76C activation by ITPa in a heterologous system (new Figure 7 and Figure 7 Supplement 1).

      Reviewer #2 (Public Review):  

      Summary:  

      The physiology and behaviour of animals are regulated by a huge variety of neuropeptide signalling systems. In this paper, the authors focus on the neuropeptide ion transport peptide (ITP), which was first identified and named on account of its effects on the locust hindgut (Audsley et al. 1992). Using Drosophila as an experimental model, the authors have mapped the expression of three different isoforms of ITP (Figures 1, S1, and S2), all of which are encoded by the same gene.  

      The authors then investigated candidate receptors for isoforms of ITP. Firstly, Drosophila orthologs of G-protein coupled receptors (GPCRs) that have been reported to act as receptors for ITPa or ITPL in the insect Bombyx mori were investigated. Importantly, the authors report that ITPa does not act as a ligand for the GPCRs TkR99D and PK2-R1 (Figure S3). Therefore, the authors investigated other putative receptors for ITPs. Informed by a previously reported finding that ITP-type peptides cause an increase in cGMP levels in cells/tissues (Dircksen, 2009, Nagai et al., 2014), the authors investigated guanylyl cyclases as candidate receptors for ITPs. In particular, the authors suggest that Gyc76C may act as an ITP receptor in Drosophila.  

      Evidence that Gyc76C may be involved in mediating effects of ITP in Bombyx was first reported by Nagai et al. (2014) and here the authors present further evidence, based on a proposed concordance in the phylogenetic distribution of ITP-type neuropeptides and Gyc76C (Figure 2). Having performed detailed mapping of the expression of Gyc76C in Drosophila (Figures 3, S4, S5, S6), the authors then investigated if Gyc76C knockdown affects the bioactivity of ITPa in Drosophila. The inhibitory effect of ITPa on leucokinin- and diuretic hormone-31-stimulated fluid secretion from Malpighian tubules was found to be abolished when expression of Gyc76C was knocked down in stellate cells and principal cells, respectively (Figure 4). However, as discussed below, this does not provide proof that Gyc76C directly mediates the effect of ITPa by acting as its receptor. The effect of Gyc76C knockdown on the action of ITPa could be an indirect consequence of an alteration in cGMP signalling.  

      Having investigated the proposed mechanism of ITPa in Drosophila, the authors then investigated its physiological roles at a systemic level. In Figure 5 the authors present evidence that ITPa is released during desiccation and accordingly, overexpression of ITPa increases survival when animals are subjected to desiccation. Furthermore, knockdown of Gyc76C in stellate or principal cells of Malpighian tubules decreases survival when animals are subject to desiccation. However, whilst this is correlative, it does not prove that Gyc76C mediates the effects of ITPa. The authors investigated the effects of knockdown of Gyc76C in stellate or principal cells of Malpighian tubules on i) survival when animals are subject to salt stress and ii) time taken to recover from chill coma. It is not clear, however, why animals overexpressing ITPa were not also tested for its effect on i) survival when animals are subject to salt stress and ii) time taken to recover from chill coma. In Figures 6 and S8, the authors show the effects of Gyc76C knockdown in the female fat body on metabolism, feeding-associated behaviours and locomotor activity, which are interesting. Furthermore, the relevance of the phenotypes observed to potential in vivo actions of ITPa is explored in Figure 7. The authors conclude that "increased ITPa signaling results in phenotypes that largely mirror those seen following Gyc76C knockdown in the fat body, providing further support that ITPa mediates its effects via Gyc76C." Use of the term "largely mirror" seems inappropriate here because there are opposing effects, e.g. decreased starvation resistance in Figure 6A versus increased starvation resistance in Figure 7A. Furthermore, as discussed above, the results of these experiments do not prove that the effects of ITPa are mediated by Gyc76C because the effects reported here could be correlative, rather than causative. 

      We thank this reviewer for an extremely thorough and fair assessment of our manuscript. 

      We have now performed salt stress tolerance and chill coma recovery assays using flies over-expressing ITPa (new Figure 10 Supplement 1).

      We agree that the use of the term “largely mirrors” to describe the effects of ITPa overexpression and Gyc76C knockdown is not appropriate and have changed this sentence. We also agree that the experiments did not provide direct evidence that the effects of ITPa are mediated by Gyc76C. To address this, we now provide evidence of Gyc76C activation by ITPa in a heterologous system (new Figure 7 and Figure 7 Supplement 1).

      Lastly, in Figures 8, S9, and S10 the authors analyse publicly available connectomic data and single-cell transcriptomic data to identify putative inputs and outputs of ITPa-expressing neurons. These data are a valuable addition to our knowledge of ITPa-expressing neurons, but they do not address the core hypothesis of this paper, namely that Gyc76C acts as an ITPa receptor.  

      The goal of our study was to comprehensively characterize an anti-diuretic system in Drosophila. Hence, in addition to identifying the receptor via which ITPa exerts its effects, we also wanted to understand how ITPa-producing neurons are regulated. Connectomic and single-cell transcriptomic analyses are highly appropriate for this purpose. We have now updated the connectomic analyses using an improved connectome dataset that was released during the revision of this manuscript. Our new analysis shows that l-NSC<sup>ITP</sup> are connected to other endocrine cells that produce other homeostatic hormones (new Figure 13F). We also identify a pathway through which other ITP-producing neurons (LNd<sup>ITP</sup>) receive hygrosensory inputs to regulate water-seeking behavior (new Figure 13E). Moreover, we now include results showing that ITPa-producing neurons (l-NSC<sup>ITP</sup>) are active (new Figure 8A and B) and release ITPa under desiccation. Together with other analyses, these data provide a comprehensive outlook on the when, what and how of ITPa regulation of systemic homeostasis.  

      Strengths:  

      (1) The main strengths of this paper are i) the detailed analysis of the expression and actions of ITP and the phenotypic consequences of overexpression of ITPa in Drosophila. ii). the detailed analysis of the expression of Gyc76C and the phenotypic consequences of knockdown of Gyc76C expression in Drosophila.  

      (2) Furthermore, the paper is generally well-written and the figures are of good quality. 

      We thank this reviewer for highlighting the strengths of this manuscript.

      Weaknesses:  

      (1) The main weakness of this paper is that the data obtained do not prove that Gyc76C acts as a receptor for ITPa. Therefore, the following statement in the abstract is premature: "Using a phylogenetic-driven approach and the ex vivo secretion assay, we identified and functionally characterized Gyc76C, a membrane guanylate cyclase, as an elusive Drosophila ITPa receptor." Further experimental studies are needed to determine if Gyc76C acts as a receptor for ITPa. In the section of the paper headed "Limitations of the study", the authors recognise this weakness. They state "While our phylogenetic analysis, anatomical mapping, and ex vivo and in vivo functional studies all indicate that Gyc76C functions as an ITPa receptor in Drosophila, we were unable to verify that ITPa directly binds to Gyc76C. This was largely due to the lack of a robust and sensitive reporter system to monitor mGC activation." It is not clear what the authors mean by "the lack of a robust and sensitive reporter system to monitor mGC activation". The discovery of mGCs as receptors for ANP in mammals was dependent on the use of assays that measure GC activity in cells (e.g. by measuring cGMP levels in cells). Furthermore, more recently cGMP reporters have been developed. The use of such assays is needed here to investigate directly whether Gyc76C acts as a receptor for ITPa. In summary, insufficient evidence has been obtained to conclude that Gyc76C acts as a receptor for ITPa. Therefore, I think there are two ways forward, either:  

      (a) The authors obtain additional biochemical evidence that ITPa is a ligand for Gyc76C.  

      or  

      (b) The authors substantially revise the conclusions of the paper (in the title, abstract, and throughout the paper) to state that Gyc76C MAY act as a receptor for ITPa, but that additional experiments are needed to prove this. 

      We thank the reviewer for this comment and agree with the two options they propose. We had previously tried a different cGMP reporter (Promega GloSensor cGMP assay) to monitor activation of Gyc76C by ITPa in a heterologous system. Unfortunately, we were not successful in monitoring Gyc76C activation by ITPa. We have now utilized another cGMP sensor, Green cGull, to show that ITPa can indeed activate Gyc76C heterologously expressed in HEK cells (new Figure 7 and Figure 7 Supplement 1). However, we still cannot rule out the possibility that ITPa can act on additional receptors in vivo. This is based on our ex vivo Malpighian tubule assays (new Figure 6E and F). ITPa inhibits DH31- and LK-stimulated secretion and we show that this effect is abolished when Gyc76C is knocked down specifically in principal and stellate cells, respectively. Interestingly, application of ITPa alone can stimulate secretion when Gyc76C is knocked down in principal cells (new Figure 6E). This could be explained by: 1) the presence of another receptor for ITPa which results in diuretic actions and/or 2) low Gyc76C signaling activity (RNAi-based knockdown lowers signaling but does not abolish it completely), which could alter other intracellular messenger pathways that promote secretion. We have added text to indicate the possibility of other ITPa receptors. Nonetheless, our conclusions are supported by the heterologous assay results which indicate that ITPa can activate Gyc76C. Therefore, we do not alter the title. 

      (2) The authors state in the abstract that a phylogenetic-driven approach led to their identification of Gyc76C as a candidate receptor for ITPa. However, there are weaknesses in this claim. Firstly, the hypothesis that Gyc76C may be involved in mediating effects of ITPa was first proposed ten years ago by Nagai et al. 2014, so this surely was the primary basis for investigating this protein. Nevertheless, investigating if there is correspondence in the phylogenetic distribution of ITP-type and Gyc76C-type genes/proteins is a valuable approach to addressing this issue. Unfortunately, the evidence presented is rather limited in scope. Essentially, the authors report that they only found ITP-type and Gyc76C-type genes/proteins in protostomes, but not in deuterostomes. What is needed is a more fine-grained analysis at the species level within the protostomes. Thus, are there protostome species in which both ITP-type and Gyc76C-type genes/proteins have been lost? Furthermore, are there any protostome species in which an ITP-type gene is present but a Gyc76C-type gene is absent, or vice versa? If there are protostome species in which an ITP-type gene is present but a Gyc76C-type gene is absent or vice versa, this would argue against Gyc76C being a receptor for ITPa. In this regard, it is noteworthy that in Figure 2A there are two ITP-type precursors in C. elegans, but there are no Gyc76C-type proteins shown in the tree in Figure 2B. Thus, what is needed is a more detailed analysis of protostomes to investigate if there really is correspondence in the phylogenetic distribution of Gyc76C-type and ITP-type genes at the species level. 

      We thank the reviewer for this comment. While the previous study by Nagai et al had implicated Gyc76C in the ITP signaling pathway, how they narrowed down Gyc76C as a candidate was not reported. Therefore, our unbiased phylogenetic approach was necessary to ensure that we identified all suitable candidate receptors. Indeed, our phylogenetic analysis also identified Gyc32E as another candidate ITP receptor. However, we did not pursue this receptor further as our expression data (new Figure 4 Supplement 2) indicated that Gyc32E is not expressed in osmoregulatory tissues and therefore likely does not mediate the osmotic effects of ITPa. 

      We also appreciate the suggestion to perform a more detailed phylogenetic analysis for the peptide and receptor. We did not include C. elegans receptors in the phylogenetic analysis because they tend to be highly evolved and routinely cause long-branch attraction (see: Guerra and Zandawala 2024: https://doi.org/10.1093/gbe/evad108). We (specifically the senior author) have previously excluded C. elegans receptors in the phylogenetic analysis of GnRH and Corazonin receptors for similar reasons (see: Tian and Zandawala et al. 2016: 10.1038/srep28788). 

      Unfortunately, absence of a gene in a genome is hard to prove, especially when the genome is not as high-quality as those of model systems (e.g. Drosophila and mice). Moreover, given the concern of this reviewer that our physiological and behavioral data on ITPa and Gyc76C only provide correlative evidence, we decided against performing additional phylogenetic analysis, which also provides correlative evidence. Our only goal with this analysis was to identify a candidate ITPa receptor. Since we have now functionally characterized this receptor using a heterologous system, we feel that the current phylogenetic analysis was able to successfully serve its purpose.  

      (3) The manuscript would benefit from a more comprehensive overview and discussion of published literature on Gyc76C in Drosophila, both as a basis for this study and for interpretation of the findings of this study.  

      We thank the reviewer for this comment. We have now included a broader discussion of Gyc76C based on published literature.  

      Reviewer #3 (Public Review):  

      Summary:  

      The goal of this paper is to characterize an anti-diuretic signaling system in insects using Drosophila melanogaster as a model. Specifically, the authors wished to characterize a role of ion transport peptide (ITP) and its isoforms in regulating diverse aspects of physiology and metabolism. The authors combined genetic and comparative genomic approaches with classical physiological techniques and biochemical assays to provide a comprehensive analysis of ITP and its role in regulating fluid balance and metabolic homeostasis in Drosophila. The authors further characterized a previously unrecognized role for Gyc76C as a receptor for ITPa, an amidated isoform of ITP, and in mediating the effects of ITPa on fluid balance and metabolism. The evidence presented in favor of this model is very strong as it combines multiple approaches and employs ideal controls. Taken together, these findings represent an important contribution to the field of insect neuropeptides and neurohormones and have strong relevance for other animals. 

      We thank this reviewer for the positive and thorough assessment of our manuscript.

      Strengths:  

      Many approaches are used to support their model. Experiments were well-controlled, used appropriate statistical analyses, and were interpreted properly and without exaggeration.  

      Weaknesses:  

      No major weaknesses were identified by this reviewer. More evidence to support their model would be gained by using a loss-of-function approach with ITPa, and by providing more direct evidence that Gyc76C is the receptor that mediates the effects of ITPa on fat metabolism. However, these weaknesses do not detract from the overall quality of the evidence presented in this manuscript, which is very strong.  

      We agree with this reviewer regarding the need to provide additional evidence using a loss-of-function approach with ITPa. We now characterize the phenotypes following knockdown of ITP in ITP-producing cells (new Figure 9). Our results are in agreement with phenotypes observed following Gyc76C knockdown, lending further support that ITPa mediates its effects via Gyc76C. Unfortunately, we are not able to provide evidence that ITPa acts on Gyc76C in the fat body using the assay suggested by this reviewer (explained in detail below). Instead, we now provide direct evidence of Gyc76C activation by ITPa in a heterologous system (new Figure 7 and Figure 7 Supplement 1).

      Reviewer #1 (Recommendations For The Authors):  

      Here, I have several extra concerns about the work as below:  

      (1) The authors confirmed the function of ITPa in regulating both osmotic and metabolic homeostasis by specifically overexpressing ITPa driven by ITP-RC-Gal4 in adult flies (Figures 5 and 7). Have the authors ever tried to knock down ITP in ITP-RC-Gal4 neurons? What was the phenotype? Especially regarding the impact on metabolic homeostasis, does knocking down ITP in ITP neurons mimic the phenotypes of Gyc76C fat body knockdown flies? 

      We thank the reviewer for this suggestion. We now characterize the phenotypes following knockdown of ITP using ITP-RC-Gal4 (new Figure 9). Our results are in agreement with phenotypes observed following Gyc76C knockdown, lending further support that ITPa mediates its effects via Gyc76C.

      The authors mentioned that the existing ITP RNAi lines target all three isoforms. It would be interesting if the authors could overexpress ITPa in ITP-RC-Gal4>ITP-RNAi flies and confirm whether any phenotypes induced by ITP knockdown could be rescued. It will further confirm the role of ITPa in homeostasis regulation.  

      We thank the reviewer for this suggestion. Unfortunately, this experiment is not straightforward because knockdown with ITP RNAi does not completely abolish ITP expression (see Figure 9A). Hence, the rescue experiment needs to be ideally performed in an ITP mutant background. However, ITP mutation leads to developmental lethality (unpublished observation) so we cannot generate all the flies necessary for this experiment. Therefore, we cannot perform the rescue experiments at this time. In future studies, we hope to perform knockdown of specific ITP isoforms using the transgenes generated here (Xu et al 2023: 10.1038/s41586-023-06833-8).   

      (2) In Figures 5A and B, the authors nicely show the increased release of ITPa under desiccation by quantifying the ITPa immunolabelling intensity in different neuronal populations. It may be induced by the increased neuronal activity of ITPa neurons under the desiccated condition. Have the authors confirmed whether the activity of ITPa-expressing neurons is impacted by desiccation?  

      The TRIC system may be able to detect the different activity of those neurons before and after desiccation. This may further explain the reduced ITPa peptide levels during desiccation.  

      We thank the reviewer for this suggestion. We have now monitored the activity of ITPa-expressing neurons using the CaLexA system (Masuyama et al 2012: 10.3109/01677063.2011.642910). Our results indicate that ITPa neurons are indeed active under desiccation (new Figure 8A and B). These results are also in agreement with ITPa immunolabelling showing increased peptide release during desiccation (new Figure 8C and D). Together, these results show that ITPa neurons are activated and release ITPa under desiccation.  

      (3) What about the intensity of ITPa immunolabelling in other ITPa-positive neurons (e.g., VNC) under desiccation? If there is no change in other ITPa neurons, it will be a good control. 

      We thank the reviewer for this suggestion. Unfortunately, ITPa immunostaining in VNC neurons is extremely weak, preventing accurate quantification of ITPa levels under different conditions. We did hypothesize that ITPa immunolabelling in clock neurons (5<sup>th</sup>-LN<sub>v</sub> and LN<sub>d</sub><sup>ITP</sup>) would not change depending on the osmotic state of the animal. However, our results (Figure 8C and D) indicate that ITPa from these neurons is also released under desiccation. Interestingly, LNd<sup>ITP</sup>, which also coexpress Neuropeptide F (NPF), have recently been implicated in water seeking during thirst (Ramirez et al, 2025: 10.1101/2025.07.03.662850). Our new connectomic-driven analysis shows that these neurons can receive thermo/hygrosensory inputs (new Figure 13E). Hence, it is conceivable that other ITPa-expressing neurons also release ITPa during thirst/desiccation.

      (4) Adult-stage-specific overexpression of ITPa in ITP neurons does show significant phenotypes compared to controls in both osmotic and metabolic homeostasis-related assays. It would be helpful if the authors could show how much ITPa mRNA levels are increased in the fly heads with ITPa overexpression (under desiccation & starvation or not). 

      We thank the reviewer for this suggestion. We have now included immunohistochemical evidence showing an increase in ITPa peptide levels in flies with ITPa overexpression (new Figure 10A). We feel that this is a better indicator of ITPa signaling level than ITPa mRNA levels.   

      (5) Another question concerns the bloated abdomens of ITPa-overexpressing flies. Are the bloated abdomens of ITPa OE female flies (Figure 5E) due to increased ovary size (Figure 7G)? Have the authors also detected similar bloated abdomens in male flies with ITPa overexpression, given that both male and female flies show more release of ITPa during desiccation?  

      We thank the reviewer for this comment. The bloated abdomen phenotype seen in females can be attributed to increased water content since we see a similar phenotype in males (see Author response image 1 below).

      Author response image 1.

      Reviewer #2 (Recommendations For The Authors):  

      (1) Page 1 - change "Homeostasis is obtained by" to "Homeostasis is achieved by".  

      Changed

      (2) Page 1 - change "Physiological responses" to "Physiological processes". 

      Changed

      (3) Page 2 - Change "Recently, ITPL2 was also shown to mediate anti-diuretic effects via the tachykinin receptor" to "Recently, ITPL2 was also shown to exert anti-diuretic effects via the tachykinin receptor". 

      Changed

      (4) Page 9 - "(C) Adult-specific overexpression of ITPa using ITP-RC-GAL4TS (ITP-RC-T2A-GAL4 combined with temperature-sensitive tubulin-GAL80) increases desiccation" Unless I am misunderstanding Fig 5C, I think what is shown is that overexpression of ITPa prolongs survival during a period of desiccation. I am not sure what the authors mean by "increases desiccation". In the text (page 9) the authors state "ITPa overexpression improves desiccation tolerance", which is a much clearer statement than what is in the figure legend. 

      We thank the reviewer for identifying this oversight. We have now changed the caption to “increases desiccation tolerance”.  

      (5) Page 11 - The authors conclude that "increased ITPa signaling results in phenotypes that largely mirror those seen following Gyc76C knockdown in the fat body, providing further support that ITPa mediates its effects via Gyc76C." Use of the term "largely mirror" seems inappropriate here because there are opposing effects- e.g. decreased starvation resistance in Figure 6A versus increased starvation resistance in Figure 7A.  

      Perhaps there is a misunderstanding of what is meant by "mirroring" - it means the same, not the opposite. 

      We thank the reviewer for this comment. We agree that the use of the term “largely mirrors” to describe the effects of ITPa overexpression and Gyc76C knockdown is not appropriate and have changed this sentence as follows: “Taken together, the phenotypes seen following Gyc76C knockdown in the fat body largely mirror those seen following ITP knockdown in ITP-RC neurons, providing further support that ITPa mediates its effects via Gyc76C.”

      (6) Page 12 - There appear to be words missing between "neurons during desiccation, as well as their downstream" and "the recently completed FlyWire adult brain connectome" 

      We thank the reviewer for highlighting this mistake. We have changed the sentence as follows: “Having characterized the functions of ITP signaling to the renal tubules and the fat body, we wanted to identify the factors and mechanisms regulating the activity of ITP neurons during desiccation, as well as their downstream neuronal pathways. To address this, we took advantage of the recently completed FlyWire adult brain connectome (Dorkenwald et al., 2024, Schlegel et al., 2024) to identify pre- and post-synaptic partners of ITP neurons.”

      (7) Page 15 - "can release up to a staggering 8 neuropeptides" - I suggest that the word "staggering" is removed. The notion that individual neurons release many neuropeptides is now widely recognised (both in vertebrates and invertebrates) based on analysis of single-cell transcriptomic data. 

      Removed staggering.

      (8) Page 16 - "(Farwa and Jean-Paul, 2024)" - this citation needs to be added to the reference list and I think it needs to be changed to "Sajadi and Paluzzi, 2024". 

      We thank the reviewer for highlighting this oversight. The correct citation has now been added.

      (9) It is noteworthy that, based on a PubMed search, there are at least thirteen published papers that report on Gyc76C in Drosophila (PMIDs: 34988396, 32063902, 27642749, 26440503, 24284209, 23862019, 23213443,  21893139, 21350862, 16341244, 15485853, 15282266, 7706258). However, none of these papers are discussed/cited by the authors. This is surprising because the authors' hypothesis that Gyc76C acts as a receptor for ITPa surely needs to be evaluated and discussed with reference to all the published insights into the developmental/physiological roles of this protein. 

      We thank the reviewer for this comment. Some of the references mentioned above (21350862, 16341244, 15485853) mainly report on soluble guanylyl cyclases and not membrane guanylyl cyclases like Gyc76C. Based on other studies on Gyc76C and its role in immunity and development, we have now expanded the discussion on additional roles of ITPa.

      Reviewer #3 (Recommendations For The Authors):  

      I have only a few comments that will help the authors strengthen a couple of aspects of their model.  

      (1) The case for Gyc76C as a receptor for ITPa in regulating fluid homeostasis is clear, given the experiments the authors carried out where they applied ITPa to tubules and showed that the effects of ITPa on tubule secretion were blocked if Gyc76C was absent in tubules. This approach, or something similar, should be used to provide conclusive proof that ITPa's metabolic effects on the fat body go through Gyc76C.  

      At present (unless I missed it) the authors only show that gain of ITPa has the opposite phenotype to fat body-specific loss of Gyc76C. While this would be the expected result if ITPa/Gyc76C is a ligand-receptor pair, it is not quite sufficient to conclusively demonstrate that Gyc76C is definitely the fat body receptor. Ex vivo experiments such as soaking the adult fat body carcasses with and without Gyc76C in ITPa and monitoring fat content via Nile Red could be one way to address this lack of direct evidence. The authors could also make text changes to explicitly mention this lack of conclusive evidence and suggest it as a future direction.

      We thank the reviewer for this comment. We have now conclusively demonstrated that Gyc76C is activated by ITPa in a heterologous assay (new Figure 7 and Figure 7 Supplement 1). With this evidence, we can confidently claim that ITPa can mediate its actions via Gyc76C in various tissues including the Malpighian tubules and fat body. Nonetheless, we liked the suggestion by this reviewer to perform the ex vivo assay and test the effect of ITPa on the fat body. Unfortunately, it is challenging to do this because increased ITPa signaling (chronically using ITPa overexpression) results in increased lipid accumulation in the fat body in vivo. Therefore, we would likely not see the effect of ITPa addition in an ex vivo fat body preparation since lipogenesis will not occur in the absence of glucose. However, ITPa could counteract the effects of other lipolytic factors such as adipokinetic hormone (AKH). To test this hypothesis, we monitored fat content in the fat body incubated with and without AKH (see Author response image 2 below showing representative images from this experiment). Since we did not observe any differences in fat levels between these two conditions, we were unable to test the effects of ITPa on AKH-activity using this assay.

      Author response image 2.

      (2) I did not see any loss of function data for ITPa - is this possible? If so this would strengthen the case for a 1:1 relationship between loss of ligand and loss of receptor. Alternatively, the authors could suggest this as an important future direction. 

      We agree with this reviewer regarding the need to provide additional evidence using a loss-of-function approach with ITPa. We have now characterized the phenotypes following knockdown of ITP in ITP-producing cells (new Figure 9). Our results are in agreement with phenotypes observed following Gyc76C knockdown, lending further support that ITPa mediates its effects via Gyc76C.

      (3) For clarity, please include the sex of all animals in the figure legend. Even though the methods say 'females used unless otherwise indicated' it is still better for the reader to know within the figure legend what sex is displayed. 

      We thank the reviewer for this suggestion and have now included sex of the animals in the figure legends.  

      (4) Please state whether females are mated or not, as this is relevant for taste preferences and food intake. 

      We apologize for this oversight. We used mated females for all experiments. This has now been included in the methods.  

      (5) More discussion on the previous study on metabolic effects of ITP in this study compared with past studies would help readers appreciate any similarities and/or differences between this study and past work (Galikova 2018, 2022) 

      We thank the reviewer for this suggestion. Unfortunately, it is difficult to directly compare our phenotypes with the metabolic effects of ITP reported in Galikova and Klepsatel 2022 because the previous study used a ubiquitous driver (Da-GAL4) to manipulate ITP levels. Ectopically overexpressing ITPa in non-ITP producing cells can result in non-physiological phenotypes. This is evident in their metabolic measurements where both global overexpression and knockdown of ITP result in reduced glycogen and fat levels, and starvation tolerance. Moreover, ITP-RC-GAL4 used in our study to overexpress and knock down ITPa is more specific than the Da-GAL4 used previously. Da-GAL4 would include other ITP cells (e.g. ITP-RD producing cells). Since ITP is broadly expressed across the animal, it is difficult to parse out the phenotypes of ITPa and other isoforms using manipulations performed with Da-GAL4. We have mentioned this limitation in the results for ITP knockdown as follows: “A previous study employing ubiquitous ITP knockdown and overexpression suggests that Drosophila ITP also regulates feeding and metabolic homeostasis (Galikova and Klepsatel, 2022) in addition to osmotic homeostasis (Galikova et al., 2018). However, given the nature of the genetic manipulations (ectopic ITPa overexpression and knockdown of ITP in all tissues) utilized in those studies, it is difficult to parse the effects of ITP signaling from ITPa-producing neurons.”

    1. Author response:

      The following is the authors’ response to the original reviews

      We would like to thank the reviewers for taking the time to thoroughly review our work. We have considered their suggestions carefully and tried our best to respond to them point by point. Based on their recommendations, two major issues came forward: (1) the strength of our claims about the involvement of cohesin in HR-driven repair in late mitosis; and (2) the underlying mechanism that reconstitutes cohesin in late mitosis after DNA damage. In this revision, we focused on the former and left the latter out (yet it is discussed). We considered that the question of how cohesin returns in late mitosis after DNA damage is important and worthy of further research, but it is beyond the scope of this study (as is the putative role of condensin). Thus, we have focused on buttressing our main claims, as pointed out by the reviewers. What have we done to strengthen the role of cohesin in late mitotic DSB repair?

      (1) We have biologically replicated and quantified the reappearance of Scc1 after DSB generation (new Figure 1e). We have also quantified changes for the other core subunits (new Figure 1c-e).

      (2) We now show that the newly synthesized Scc1 serves to reassemble the cohesin complex (new Figure 2a and S1).

      (3) We have performed chromatin fractionation and show that cohesin binding to chromatin increases after the HO-induced DSB (new Figure 2b and S2).

      (4) We have performed ChIP assays and show that, despite the increase in the chromatin-bound fraction, the HOcs DSB does not recruit new cohesin to the locus (new Figure 2c and S3).

      (5) A key assertion in the preprint version was that depleting cohesin using the auxin degron system impairs HR-driven MAT switching. This claim was based on a direct comparison of cultures treated or not with auxin (-/+ IAA). However, during the revision process, we realized that auxin treatment itself could interfere with MAT switching. Firstly, we noticed a diminished HOcs cutting efficiency by HO in +IAA cultures (Figure S6). Secondly, the apparently dramatic delay in gene conversion to MATα could actually be related to other undesirable effects of IAA downstream in the repair process. Thus, we decided to repeat this experiment with strains that differ in their response to auxin, so that we could compare all strains in the presence of auxin. We compared four isogenic strains: SMC3; SMC3-aid*; SMC3 + OsTIR1; and SMC3-aid* + OsTIR1. As a result, we can now show that cohesin depletion does not affect MAT switching (see new Figure 4b-d).

      (6) We recently reported a negative chemical interaction between auxin and phleomycin. Auxin appears to diminish the ability of phleomycin to generate DSBs (Comm Biol 2025, doi: 10.1038/s42003-025-08416-x; see Figures S14 and S15 in that paper). While the underlying nature of this interaction is unknown to us (we are working on it), this leads us to omit the coalescence assay included in the preprint version (old Figure 4c), as the diminished coalescence upon IAA addition is actually due to this effect rather than cohesin depletion. This is also in agreement with the new data we include in the revised version, in which we observed only minor changes in cohesin reconstitution and chromatin binding after phleomycin (Figure 2a,b; S1 and S2).  

      (7) In addition to addressing these reviewers’ requests, we have better characterized the MAT switching in late mitosis by incorporating the kinetics of rad9Δ (deficient in the DNA damage checkpoint), yku70Δ (deficient in non-homologous end joining) and mre11Δ (deficient in DSB end tethering). The effect of rad52Δ (deficient in HR) has been described elsewhere (our iScience 2024, 10.1016/j.isci.2024.110250).

      As a result of these new experiments, new figure panels have been added in the main figures and as supplementary figures. To make room for these panels in the main figures and keep the short report format, the following changes have been made: (i) old figures and new panels have been combined into four main figures, (ii) some panels from the old figures have been moved to supplementary figures, and (iii) some panels have been reordered for the sake of simplicity and fluidity in the main text.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The cohesin complex maintains sister chromatid cohesion from S phase to anaphase. Beyond that, DSBs trigger cohesin recruitment and post-replication cohesion at both damage sites and globally, which was originally reported in 2004. In their recent study, Ayra-Plasencia et al reported in telophase, DSBs are repaired via HR with re-coalesced sister chromatids (Ayra-Plasencia & Machín, 2019). In this study, they show that HR occurs in a Smc3-dependent way in late mitosis.

      Strengths:

      The authors take great advantage of the yeast system: they check the DSB processing and repair of a single DSB generated by HO endonuclease, which cuts the MAT locus in chromosome III. In combination with cell synchronization, they detect the HR repair during G2/M or late mitosis, and the cohesin subunit SMC3 is critical for this repair. Beyond that, full-length Scc1 protein can be recovered upon DSBs.

      Weaknesses:

      These new results basically support their proposal although with a very limited molecular mechanistic progression, especially compared with their recent work.

      Reviewer #2 (Public Review):

      Summary:

      The manuscript "Cohesin still drives homologous recombination repair of DNA double-strand breaks in late mitosis" by Ayra-Plasencia et al. investigates regulations of HR repair in conditional cdc15 mutants, which arrests the cell cycle in late anaphase/telophase. Using a non-competitive MAT switching system of S. cerevisiae, they show that a DSB in telophase-arrested cells elicits a delayed DNA damage checkpoint response and resection. Using a degron allele of SMC3 they show that MATa-to-alpha switching requires cohesin in this context. The presence of a DSB in telophase-arrested cells leads to an increase in the kleisin subunit Scc1 and a partial rejoining of sister chromatids after they have separated in a subset of cells.

      Strengths:

      The experiments presented are well-controlled. The induction systems are clean and well thought-out.

      Weaknesses:

      The manuscript is very preliminary, and I have reservations about its physiological relevance. I also have reservations regarding the usage of MAT to make the point that inter-sister repair can occur in late mitosis.

      Regarding these two weaknesses:

      - Physiological relevance: This is something we already addressed in our previous research work (Nat Commun. 2019; 10(1):2862. doi: 10.1038/s41467-019-10742-8), and which was further discussed in a follow-up theoretical paper (Bioessays. 2020; 42(7):e2000021. doi: 10.1002/bies.202000021). In summary, this is physiologically relevant because a DSB in anaphase activates a late-mitotic checkpoint so the DSB can be repaired before cytokinesis. The fact that anaphase is quick and only a minor fraction of cells get a DSB in this cell cycle stage in an asynchronous population does not preclude its importance, since a single mis-repaired DSB in hundreds of cells is enough to mutate a population in a health- or evolution-relevant way.

      - MAT system in late mitosis: It was not our intention to use the MAT switching assay to state that inter-sister repair can occur in late-M. The purpose was to address whether HR was fully functional in this non-G2/M non-G1 stage. Having said that, it is very challenging to design a strategy based on sequence-specific DSB to tackle the inter-sister repair in late-M. Any endonuclease-generated DSB is going to cut in both sisters. This is something we also deeply discussed in our previous works (Nat Commun & Bioessays).    

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Major points:

      (1) Smc3 degradation affects Rad53 activation upon DSBs, and this may directly lead to HR repair deficiency. Smc3 also could be phosphorylated by ATM and functions in DNA damage checkpoint activation, these alternative possibilities should also be tested before addressing the bona fide role of Smc3 in this context.

      Our previous data already suggested that Rad53 hyperphosphorylation still occurs after Smc3 degradation (Figure S6). Regardless, the question of whether the DNA damage checkpoint (DDC) may play a distinct role in the MAT switching has been addressed in this revision by comparing RAD9 versus rad9Δ. Rad9 is a mediator in the DDC required for the activation of Rad53. We have seen that MAT switching in rad9Δ is as efficient as in RAD9 (new Figure S5d-f).

      On the other hand, our new results, in which we have compared four different strains with all auxin system combinations in the presence of auxin, show that cohesin depletion does not affect MAT switching. Previously, we compared minus versus plus auxin and noticed diminished HO cutting efficiency. Thus, we repeated this experiment with four isogenic strains (SMC3; SMC3-aid*; SMC3 + OsTIR1; and SMC3-aid* + OsTIR1) that differ in their response to auxin and ability to degrade cohesin, so that we could compare all strains in the presence of auxin. As a result, we can now affirm that cohesin depletion does not affect MAT switching (see new Figure 4b-d). Therefore, HR appears efficient after cohesin depletion.

      (2) The requirement of cohesin subunit Smc3 and "coincidently" recovery of Scc1 are not sufficient to claim they act as a cohesin complex in this scenario. CoIP in the chromatin fraction after DSBs to prove the cohesin complex formation is recommended. If they act as a complex, are cohesin loader Scc2/4 required?

      We have constructed a SMC3-HA SCC1-myc strain. We have purified the chromatin-bound fraction and performed the co-IP. We have found that Smc1-acSmc3-Scc1 forms a complex after Scc1 returns, and that at least a fraction of this complex binds to the chromatin in our HO model of DSBs in late anaphase (the cdc15-2 arrest). This is now shown in the new Figures 2a,b and S1,S2.

      As for the requirement of Scc2/4, we consider that the mechanisms underlying how Scc1 comes back, how a new cohesin complex is reassembled, and how it can partly bind to the chromatin in late anaphase are beyond the scope of this study and worth pursuing in a follow-up story.

      (3) Figure 3b. acetylated SMC3 was prominently detected in the absence of DSBs. During the cohesion cycle, the cohesin was released from chromatin in a separase-dependent manner at the anaphase onset. Released Smc3 was deacetylated by Hos1 subsequently. In principle, the acSMC3 level could be very low in late mitosis.

      In that figure (now renumbered as Fig S6), we did detect acetylated Smc3 for the remnant Smc3 still found in late mitosis. However, a direct comparison between the acetylated and non-acetylated pools was not performed and would require more sophisticated approaches. Note that blots are distinctly exposed until the band is detected, and that signal intensity is antibody-specific. The presence of an acSmc3 pool in the cdc15-2 arrest is now further confirmed by the new blots in Figures 2a, S1 and S2b.

      On the other hand, previous time course experiments from G1 and G2/M releases point out that Smc3 deacetylation is incomplete in anaphase, with up to 30% of acetylated Smc3 remaining (Beckouët et al, 2010 doi:10.1016/j.molcel.2010.08.008). This is consistent with the presence of acSmc3 in the cdc15-2 arrest.   

      (4) Did the author examine the acSMC3 levels returning after DSB, as Scc1's levels? If so, how about the Eco1's protein level? Chromatin fractionation could be conducted to check the chromatin-bound SMC3, acSMC3/Eco1, SCC1, SCC1 phosphorylation, and SMC1. These results will tell us whether cohesin functions in DSB repair in late M in a cohesion state.

      As stated above, we have now determined that cohesin depletion does not affect HR-driven MAT switching. As for the other questions, yes, we have performed both an assessment of acSmc3 in the pull down and chromatin fractionation, before and after DSBs (new Figures 2a, S1 and S2b). Interestingly, we have noticed a difference between the HO-generated and the phle-generated DSBs. It appears that the former leads to a better reconstituted Smc1-acSmc3-Scc1 complex and more chromatin-bound cohesin. The overall acSmc3 levels do not appear to significantly change in the whole cell extracts, although there could be further posttranslational modifications in telophase (see the changes in intensity between the two acSmc3 bands in Figure S1).

      The role of Eco1 has not been directly addressed but is discussed. The main point here is that Eco1 levels may be low after G2/M (e.g., Lyons and Morgan, 2011), but there is still a significant acSmc3 pool in anaphase as Hos1 does not deacetylate all Smc3 (Beckouët et al., 2010).

      (5) Figure 4a, the return of full-length Scc1 is based on a single experiment. What's the mechanism? Inhibition of cleavage or re-expression? How about its mRNA levels?

      We have repeated the full-length Scc1 experiment two more times. Now, an expression graph is included as a new Figure 1e. The two other subunits, Smc1 and Smc3, have been assessed as well, with no major changes in abundance (new Figure 1c and d).

      We feel that the exact molecular mechanism of how Scc1 returns is beyond the scope of this study, but we discuss that the DDC may either inactivate separase or protect Scc1 against it. Indeed, there is literature that supports both mechanisms (e.g., Heidinger-Pauli et al., 2008 doi:10.1016/j.molcel.2008.06.005; Yam et al., 2020 doi:10.1093/nar/gkaa355).   

      Minor points:

      (6) FACS data should be shown for all cell synchronization experiments.

      In our own previous work, FACS profiles added little to late-M experiments. To properly confirm late-M, microscopy is a must. FACS cannot differentiate between G2/M (metaphase-like), anaphase, telophase and the ensuing G1 (as cdc15-2 cells do not immediately split apart after re-entering G1). In all experiments, Tel samples (late-M cdc15-2 arrest) were characterized by >95% large budded binucleated cells.

      (7) Figure 1d, A loading control of Rad53-P in is missing. The "Arrest" samples should be loaded again on the right to confirm the shift of Rad53, but not due to "smiling gels".

      It is true that the blot on the right has a right-handed smile; however, the presence of the Rad53/Rad53-P pair is very clear. Because there is not a full shift from Rad53 to Rad53-P, the concern of misidentifying Rad53-P as a result of a blot smile is unfounded.

      (8) Figure 1c, After the HO cut, the resected DNA at the 726 bp site reaches to platform at about 4 hrs, while it still increases at the 5.6 kb site. Thus, it is difficult to conclude that "The time to reach half of the maximum possible resection (t1/2) was ~1 h at 0.7 Kb and ~2.5 h at 5.7 Kb from the DSB, respectively".

      We assumed that both loci reach the plateau at 0.8 (which is consistent with other studies), so the t1/2 was calculated as the time at which the resected fraction crossed 0.4.
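
      The calculation described above can be sketched as a linear interpolation between neighbouring time points. This is only an illustration under stated assumptions: the function name, the interpolation scheme and the example values are hypothetical and are not taken from the manuscript; only the assumed plateau of 0.8 (and hence the half-maximum of 0.4) comes from the response.

```python
def time_to_half_plateau(times_h, resected, plateau=0.8):
    """Estimate t1/2 as the time at which the resected fraction first
    crosses half of the assumed plateau, using linear interpolation
    between consecutive time points (an assumption of this sketch)."""
    target = plateau / 2.0  # 0.4 when the plateau is taken as 0.8
    points = list(zip(times_h, resected))
    for (t0, f0), (t1, f1) in zip(points, points[1:]):
        # Look for the first interval in which the curve rises through the target.
        if f0 <= target <= f1 and f1 > f0:
            return t0 + (target - f0) * (t1 - t0) / (f1 - f0)
    return None  # the curve never reaches half of the assumed plateau


# Hypothetical time course (hours, resected fraction); not real data.
example_times = [0, 1, 2, 4]
example_resected = [0.0, 0.2, 0.6, 0.8]
example_t_half = time_to_half_plateau(example_times, example_resected)
```

      Fixing the plateau in advance, rather than taking it from each curve, is what allows a t1/2 to be reported for a locus whose signal is still rising at the last time point.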

      (9) Figure 2b and 2c are wrongly labeled.

      We have fixed this (now Fig. 3d and e).

      (10) Figure 2d, Double check and make sure the quantitative data reflects the representative result. E.g. in Figure 2b (in fact should be 2c). For instance, in Figure 2b, the MATα signals seem to remain stable from 60' to 180', but they keep increasing in Figure 2d. In Yamaguchi & James E. Haber's paper, the signals and changes of MATa and MATα over time are way stronger compared to this study.

      We have double checked this. It is true that the sum of MATa, MATalpha and cut HOcs bands throughout the assay does not have the intensity seen for MATa before the HO induction (Tel), but MATalpha and HOcs signals cannot be established based on the equimolarity of the reaction as all band signals are probe-specific (the best indication of this can be seen in the signal comparison between MATα and MAT distal at Tel). Alternatively, some resected HOcs may remain unrepaired.

      As for the referred example (now Figure 3e), note that they are double normalized to ACT1 and MATα (Tel), and the ACT1 band gets fainter after 60’. This explains the increase in the MATalpha quantification in spite of what is apparently seen in the blot.

      (11) Typos and fonts: e.g. lines 111-112; line 76 "his link".

      We have fixed this. Thanks.

      Reviewer #2 (Recommendations For The Authors):

      Major concerns:

      (1) Physiological relevance. The authors show that HR can happen in the anaphase to telophase interval, yet does it outside of an hours-long artificial arrest upon inactivation of Cdc15? It is this reviewer's understanding that the duration of the anaphase to telophase transition is short, in the order of minutes. In fact, break signaling and resection are delayed by ~1 hour (Fig. 1), which suggests that cells avoid dealing with the damage and engaging in HR in the anaphase-telophase interval. Is there any described physiological context or checkpoint that blocks this transition for extended periods, that would make any of the findings in this paper relevant?

      This concern about the physiological relevance was addressed in our previous study (Nat Commun. 2019; 10(1):2862. doi: 10.1038/s41467-019-10742-8). In that paper’s Figure 1, we showed that G1 re-entry after a cdc15-2 release was delayed by several hours when DSBs had been previously generated at the cdc15-2 arrest. We also showed that such a delay depended on Rad9 (i.e., the DNA damage checkpoint). In addition, synchronized (not arrested) cells transiting through anaphase responded to DSB generation by slowing anaphase transition while partly regressing chromosome segregation (Figure S7 in that paper).

      (2) Methodological caveats. It is unclear why the authors chose to study DSB-repair in the context of MATa-to-alpha switching (which uses an ectopic donor on the other chromosome arm) as a model for inter-sister repair. It creates a disconnect in the claims of the paper, which means to study inter-sister repair. Studying the kinetics of DSB repair by cytology following low-dose irradiation or radiomimetic drugs would have been a better option. Phleomycin is used in Fig. 4, but the repair kinetics (e.g. Rad52 foci) is not studied.

      The MAT switching assay was used here to address how functional HR was in late-M compared to G2/M (metaphase-like). Then, it was employed to check whether cohesin depletion hampers HR in late-M. Even though this is something we already discussed in depth previously (Nat Commun. 2019; 10(1):2862. doi: 10.1038/s41467-019-10742-8; Bioessays. 2020; 42(7):e2000021. doi: 10.1002/bies.202000021), it is worth recapitulating the methodological challenges that the study of inter-sister repair has in late-M: (i) endonuclease-based DSBs are going to generate two DSBs, one per sister chromatid; (ii) the use of a homologous chromosome without the cutting site as a template is pointless because a sister of the homolog is always going to co-segregate with the broken chromatid, and the same caveat applies for any other ectopic sequence. In this context, the MATa with the HML ectopic intrachromosomal sequence is as valid as any other option, with the advantage that it is a very well-known system.

      On the other hand, most of the reviewer’s concerns about the inter-sister repair by cytology and the role of Rad52 were addressed in our previous paper (Nat Commun). Note that our new results about the role of cohesin in MAT switching show that this HR-mediated DSB repair does not depend on cohesin (new Figure 4b-d).

      (3) Preliminary work. The requirement of cohesin for MAT switching in cdc15 mutants would have warranted several additional experiments. Indeed, Cohesin has been shown to regulate homology search in multiple ways upon DNA damage checkpoint-induced metaphase-arrest (see Piazza et al. Nat Cell Biol 2021 (10.1038/s41556-021-00783-x), not cited in the current manuscript). Consequently, is the effect of cohesin observed in the MAT system specific to telophase or is it true in other cell-cycle phases? What is the mechanism behind this requirement (one may expect it not to depend on the sister since the HML donor is available within the damaged chromatid)? Does cohesin re-accumulate around the DSB site or genome-wide? How does the Esp1 activity decay from anaphase onset? Is cohesin required for the horseshoe folding of chr. III involved in MATa-to-alpha switching? Furthermore, condensin is involved in MATa-specific switching (Li et al. PLoS Genet 2019, 10.1371/journal.pgen.1008339), and condensin remains active on chromatin in cdc15 arrested cells, as shown on chr. XII (Lazar-Stefanita et al. EMBO J. 2017 10.15252/embj.201797342), which calls for determining the impact contribution of condensin in the recoil of the right ch.XII arm (Fig 4c) and on MAT switching.

      There are several points here:

      - Is the effect of cohesin observed in the MAT system specific to telophase or is it true in other cell-cycle phases?

      Our new results show that cohesin depletion does not affect MAT switching when four different strains with all auxin system combinations are compared in the presence of auxin. Previously, when we compared minus versus plus auxin, we noticed diminished HO cutting efficiency. Therefore, we repeated the experiment using four isogenic strains (SMC3, SMC3-aid*, SMC3 + OsTIR1, and SMC3-aid* + OsTIR1), which differ in their response to auxin and ability to degrade cohesin. This allowed us to compare all strains in the presence of auxin. As a result, we can now confirm that cohesin depletion does not affect MAT switching (see the new Figures 4b–d). Therefore, HR appears efficient after cohesin depletion. In agreement, the new ChIPs we have performed do not detect an increment in local cohesin after the HO DSB in telophase (but it does in cells arrested in G2/M).

      - What is the mechanism behind this requirement (one may expect it not to depend on the sister since the HML donor is available within the damaged chromatid)?

      As just said, we have changed our previous conclusion on cohesin and MAT switching. It was an effect of auxin addition rather than cohesin depletion.

      - Does cohesin re-accumulate around the DSB site or genome-wide?

      We have performed ChIP around the HOcs. We have found that cohesin does accumulate in G2/M after HO induction, but it does not in telophase (new Figures 2c and S3). As for the global binding of cohesin, our chromatin fractionation data suggest there is a ~2-fold increase in Smc1-Smc3, which also binds to the newly formed Scc1, rendering an overall increase in the chromatin-bound canonical complex (new Figures 2b and S2). Altogether, this suggests a genome-wide binding but with little role in the repair of HO DSBs.

      - How does the Esp1 activity decay from anaphase onset?

      We have not checked this here but it is an interesting question for a follow-up story.

      - Is cohesin required for the horseshoe folding of chr. III involved in MATa-to-alpha switching?

      Probably not in view of our new data in Figures 2c and 4b-d. The Piazza papers are cited and discussed.

      - Contribution of condensin in the recoil of the right ch.XII arm (Fig 4c) and on MAT switching.

      The role of condensin, which takes over some cohesin functions in late-M, as the reviewer reminds us, is indeed worth studying. However, we feel this deserves a separate, focused study. We do discuss, though, that condensin loading onto the arms in anaphase may prevent Smc1-Smc3 from loading after DSBs.

      Other points:

      (4) Is the retrograde behavior in Fig. 4c dependent on recombination?

      No, this is something we addressed in our previous paper (see Figure 4 in Nat Commun. 2019; 10(1):2862. doi: 10.1038/s41467-019-10742-8).

      (5) Fig 3c: add a scheme of the system.

      A scheme was already shown in the old Figure 2a (note that the old Fig 3c is now Fig S6).

      (6) Fig 3b: annotate as in Fig 2b.

      We have fixed this (now the referred figures are S6a and 3d, respectively).

      (7) Authors used IAA concentrations 4- to 8-fold higher than commonly used. Given the solubility of IAA in DMSO (the most commonly used solvent), it is likely that authors treated their cells with >2% DMSO. This is expected to have broad transcriptional and physiological effects on yeast. A comparison of +IAA samples with a mock (DMSO) treatment would be more appropriate than a lack of treatment.

      The IAA stock solution was 500 mM in DMSO, so the final DMSO concentration for an 8 mM IAA solution was 1.6% (v/v). Although the stock concentration was high and some precipitation was observed during preparation, we always heated, sonicated, and vigorously vortexed the stock tube before adding IAA to the cultures. Thus, we kept the uncertainty in the final IAA concentration to a minimum.
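
      The dilution arithmetic behind the 1.6% (v/v) figure can be checked with a short sketch. The function names are hypothetical, and the calculation assumes that the stock is essentially pure DMSO and that the volume contributed by dissolved IAA is negligible.

```python
def final_solvent_percent(stock_mM, final_mM):
    """Percent (v/v) of solvent carried into the culture when a stock at
    stock_mM is diluted to final_mM, assuming the stock volume is
    essentially all solvent."""
    return 100.0 * final_mM / stock_mM


def stock_needed_for_solvent_limit(final_mM, max_solvent_percent):
    """Minimum stock concentration (mM) that keeps the solvent at or
    below max_solvent_percent for a given final concentration."""
    return final_mM / (max_solvent_percent / 100.0)


# Values stated in the response: 500 mM stock, 8 mM final IAA.
dmso_percent = final_solvent_percent(stock_mM=500.0, final_mM=8.0)

# Stock concentration below which 8 mM IAA would bring in more than 2% DMSO.
stock_for_2_percent = stock_needed_for_solvent_limit(final_mM=8.0, max_solvent_percent=2.0)
```

      Under these assumptions, the DMSO fraction exceeds the 2% raised by the reviewer only if the stock is more dilute than 400 mM.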

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1:

      (1) Legionella effectors are often activated by binding to eukaryote-specific host factors, including actin. The authors should test the following: a) whether Lfat1 can fatty acylate small G-proteins in vitro; b) whether this activity is dependent on actin binding; and c) whether expression of the Y240A mutant in mammalian cells affects the fatty acylation of Rac3 (Figure 6B), or other small G-proteins.

      We were not able to express and purify the full-length recombinant Lfat1 to perform fatty acylation of small GTPases in vitro. However, the Y240A mutant overexpressed in cellulo still retained the ability to fatty acylate Rac3 and another small GTPase, RheB (see Figure 6-figure supplement 2). We postulate that under infection conditions, actin binding might be required to fatty acylate certain GTPases due to the small amount of effector protein that is secreted into the host cell.

      (2) It should be demonstrated that lysine residues on small G-proteins are indeed targeted by Lfat1. Ideally, the functional consequences of these modifications should also be investigated. For example, does fatty acylation of G-proteins affect GTPase activity or binding to downstream effectors?

      We have mutated K178 on RheB and shown that this mutation abolished its fatty acylation by Lfat1 (see Author response image 1 below). We were not able to test whether fatty acylation by Lfat1 affects downstream effector binding.

      Author response image 1.

      (3) Line 138: Can the authors clarify whether the Lfat1 ABD induces bundling of F-actin filaments or promotes actin oligomerization? Does the Lfat1 ABD form multimers that bring multiple filaments together? If Lfat1 induces actin oligomerization, this effect should be experimentally tested and reported. Additionally, the impact of Lfat1 binding on actin filament stability should be assessed. This is particularly important given the proposed use of the ABD as an actin probe.

      The ABD domain does not form oligomers, as evidenced by the gel filtration profile of the ABD domain. However, we do see F-actin bundling in our in vitro F-actin polymerization experiment when both actin and ABD are at high concentration (data not shown). At low concentrations of ABD, there is no aggregation/bundling of F-actin.

      (4) Line 180: I think it's too premature to refer to the interaction as having "high specificity and affinity." We really don't know what else it's binding to.

      We have revised the text and reworded the sentence by removing "high specificity and affinity."

      (5) The authors should reconsider the color scheme used in the structural figures, particularly in Figures 2D and S4.

      We are not sure what the concern is regarding the color scheme of the structural figures.

      (6) In Figure 3E, the WT curve fits the data poorly, possibly because the actin concentration exceeds the Kd of the interaction. It might fit better to a quadratic.

      We have performed quadratic fitting and replaced Figure 3E.
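
      For readers unfamiliar with the quadratic (tight-binding) model suggested by the reviewer, a minimal sketch follows. It shows the standard 1:1 binding equation that accounts for ligand depletion. The assignment of symbols (ABD held at a fixed total concentration, actin titrated) and all function and parameter names are assumptions made for illustration, not a description of the fitting procedure actually used for Figure 3E.

```python
import math


def fraction_bound_quadratic(actin_total, abd_total, kd):
    """Fraction of ABD bound to actin for 1:1 binding without assuming
    that free actin equals total actin (all concentrations in the same
    units). This is the physically meaningful root of the quadratic
    derived from Kd = [A_free][B_free] / [AB]."""
    s = actin_total + abd_total + kd
    complex_conc = (s - math.sqrt(s * s - 4.0 * actin_total * abd_total)) / 2.0
    return complex_conc / abd_total


def fraction_bound_hyperbolic(actin_total, kd):
    """Simple hyperbolic model, valid only when the fixed component is
    present at a concentration well below Kd."""
    return actin_total / (actin_total + kd)


# Hypothetical numbers: when the fixed ABD concentration is well above Kd,
# the hyperbolic model overestimates binding at sub-stoichiometric actin.
quad = fraction_bound_quadratic(actin_total=1.0, abd_total=2.0, kd=0.01)
hyper = fraction_bound_hyperbolic(actin_total=1.0, kd=0.01)
```

      The two models converge when the fixed component is far below Kd and diverge as its concentration approaches or exceeds Kd, which is the situation in which a hyperbolic curve fits poorly.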

      (7) The authors propose that the individual helices of the Lfat1 ABD could be expressed on separate proteins and used to target multi-component biological complexes to F-actin by genetically fusing each component to a split alpha-helix. This is an intriguing idea, but it should be tested as a proof of concept to support its feasibility and potential utility.

      It is a good suggestion. We plan to thoroughly test the feasibility of this idea as one of our future directions.

      (8) The plot in Figure S2D appears cropped on the X-axis or was generated from a ~2× binned map rather than the deposited one (pixel size ~0.83 Å, plot suggests ~1.6 Å). The reported pixel size is inconsistent between the Methods and Table 1-please clarify whether 0.83 Å refers to super-resolution.

      Yes, 0.83 Å is the super-resolution pixel size. We have updated the cryo-EM table accordingly.

      Reviewer #2:

      Weaknesses:

      (1) The authors should use biochemical reactions to analyze the KFAT of Llfat1 on one or two small GTPases shown to be modified by this effector in cellulo. Such reactions may allow them to determine the role of actin binding in its biochemical activity. This notion is particularly relevant in light of recent studies that actin is a co-factor for the activity of LnaB and Ceg14 (PMID: 39009586; PMID: 38776962; PMID: 40394005). In addition, the study should be discussed in the context of these recent findings on the role of actin in the activity of L. pneumophila effectors.

      We have new data showing that actin binding does not affect Lfat1 enzymatic activity (see response to Reviewer #1). We have added these new data as Figure S7 to the paper. Accordingly, we also revised the discussion by adding the following paragraph.

      “The discovery of Lfat1 as an F-actin–binding lysine fatty acyl transferase raised the intriguing question of whether its enzymatic activity depends on F-actin binding. Recent studies have shown that other Legionella effectors, such as LnaB and Ceg14, use actin as a co-factor to regulate their activities. For instance, LnaB binds monomeric G-actin to enhance its phosphoryl-AMPylase activity toward phosphorylated residues, resulting in unique ADPylation modifications in host proteins  (Fu et al, 2024; Wang et al, 2024). Similarly, Ceg14 is activated by host actin to convert ATP and dATP into adenosine and deoxyadenosine monophosphate, thereby modulating ATP levels in L. pneumophila–infected cells (He et al, 2025). However, this does not appear to be the case for Lfat1. We found that Lfat1 mutants defective in F-actin binding retained the ability to modify host small GTPases when expressed in cells (Figure S7). These findings suggest that, rather than serving as a co-factor, F-actin may serve to localize Lfat1 via its actin-binding domain (ABD), thereby confining its activity to regions enriched in F-actin and enabling spatial specificity in the modification of host targets.”

      (2) The development of the ABD domain of Llfat1 as an F-actin domain is a nice extension of the biochemical and structural experiments. The authors need to compare the new probe to those currently commonly used ones, such as Lifeact, in labeling of the actin cytoskeleton structure.

      We fully agree with the reviewer’s insightful suggestion. However, a direct comparison of the Lfat1 ABD domain with commonly used actin probes such as Lifeact, as well as evaluation of the split α-helix probe (as suggested by Reviewer #1), would require extensive and technically demanding experiments. These are important directions that we plan to pursue in future studies.

      For all other minor points, we have made corrections/changes in our revised text and figures.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      What are the overarching principles by which prokaryotic genomes evolve? This fundamental question motivates the investigations in this excellent piece of work. While it is still very common in this field to simply assume that prokaryotic genome evolution can be described by a standard model from mathematical population genetics, and fit the genomic data to such a model, a smaller group of researchers rightly insists that we should not have such preconceived ideas and instead try to carefully look at what the genomic data tell us about how prokaryotic genomes evolve. This is the approach taken by the authors of this work. Lacking a tight theoretical framework, the challenge of such approaches is to devise analysis methods that are robust to all our uncertainties about what the underlying evolutionary dynamics might be.

      The authors here focus on a collection of ~300 single-cell genomes from a relatively well-isolated habitat with relatively simple species composition, i.e. cyanobacteria living in hotsprings in Yellowstone National Park, and convincingly demonstrate that the relative simplicity of this habitat increases our ability to interpret what the genomic data tells us about the evolutionary dynamics.

      Using a very thorough and multi-faceted analysis of these data, the authors convincingly show that there are three main species of Synechococcus cyanobacteria living in this habitat, and that apart from very frequent recombination within each species (which is in line with insights from other recent studies) there is also a remarkably frequent occurrence of hybridization events between the different species, and with as of yet unidentified other genomes. Moreover, these hybridization events drive much of the diversity within each species. The authors also show convincing evidence that these hybridization events are not neutral but are driven by natural selection.

      Strengths:

      The great strength of this paper is that, by not making any preconceived assumptions about what the evolutionary dynamics is expected to look like, but instead devising careful analysis methods to tease apart what the data tells us about what has happened in the evolution in these genomes, highly novel and unexpected results are obtained, i.e. the major role of hybridization across the 3 main species living in this habitat.

      The analysis is very thorough and reading the detailed supplementary material it is clear that these authors took a lot of care in devising these methods and avoiding the pitfalls that unfortunately affect many other studies in this research area.

      The picture of the evolutionary dynamics of these three Synechococcus species that emerge from this analysis is highly novel and surprising. I think this study is a major stepping stone toward the development of more realistic quantitative theories of genome evolution in prokaryotes.

      The analysis methods that the authors employ are also partially novel and will no doubt be very valuable for analysis of many other datasets.

      We thank the reviewer for their appreciation of our work.

      Weaknesses:

      I feel the main weakness of this paper is that the presentation is structured such that it is extremely difficult to read. I feel readers have essentially no chance to understand the main text without first fully reading the 50-page supplement with methods and 31 supplementary materials. I think this will unfortunately strongly narrow the audience for this paper and below in the recommendations for the authors I make some suggestions as to how this might be improved.

      A very interesting observation is that a lot of hybridization events (i.e. about half) originate from species other than the alpha, beta, and gamma Synechococcus species from which the genomes that are analyzed here derive. For this to occur, these other species must presumably also be living in the same habitat and must be relatively abundant. But if they are, why are they not being captured by the sampling? I did not see a clear explanation for this very common occurrence of hybridization events from outside of these Synechococcus species. The authors raise the possibility that these other species used to live in these hot springs but are now extinct. I'm not sure how plausible this is and wonder if there would be some way to find support for this in the data (e.g. that one does not observe recent events of import from one of these unknown other species). This was one major finding that I believe went without a clear interpretation.

      We agree with the reviewer that the extent of hybridization with other species is surprising. While we do feel that our metagenome data provide convincing evidence that “X” species are not present in MS or OS, we cannot currently rule out the presence of X in other springs. In the revision we explicitly mention the alternative hypothesis (Lines 239-242).

      The core entities in the paper are groups of orthologous genes that show clear evidence of hybridization. It is thus very frustrating that exactly the methods for identifying and classifying these hybridization events were really difficult to understand (sections I and V of the supplement). Even after several readings, I was unsure of exactly how orthogroups were classified, i.e. what the difference between M and X clusters is, what a 'simple hybrid' corresponds to (as opposed to complex hybrids?), what precisely the definitions of singlet and non-singlet hybrids are, etcetera. It also seems that some numbers reported in the main text do not match what is shown in the supplement. For example, the main text talks about "around 80 genes with more than three clusters (SM, Sec. V; fig. S17).", but there is no group with around 80 genes shown in Fig S17! And similarly, it says "We found several dozen (100 in α and 84 in β) simple hybrid loci" and I also cannot match those numbers to what is shown in the supplement. I am convinced that what the authors did probably made sense. But as a reader, it is frustrating that when one tries to understand the results in detail, it is very difficult to understand what exactly is going on. I mention this example in detail because the hybrid classification is the core of this paper, but I had similar problems in other sections.

      We thank the reviewer for pointing out these issues with our original presentation. In the revision, we have redone most of the analysis to simplify the methods and check the consistency of the results. We did not find any qualitative differences in our results after reanalysis, but some of the numbers for different hybridization patterns have changed. The most notable difference is an increase in the number of alpha-gamma simple hybrids and a corresponding decrease in mixed-species clusters (now labeled mosaic hybrids). These transfers are difficult to assign because we only have access to a single gamma genome. We have added a short explanation of this point in Lines 219-222.

      To improve the presentation, we significantly expanded the “Results” section to better explain our analysis and the different steps we take. We included two additional figures (Figs. 3 and 4) that illustrate the different types of hybrids and the heterogeneity in the diversity of alpha which is discussed in the main text and is important for interpreting our results. We also included two additional figures (Figs. 2 and 6) that were previously in the Appendix but were mentioned in the main text. We believe these changes should address most of the issues raised by the reviewer and hopefully make the manuscript easier to read.

      Although I generally was quite convinced by the methods and it was clear that the authors were doing a very thorough job, there were some instances where I did not understand the analysis. For example, the way orthogroups were built is very much along the lines used by many in the field (i.e. orthoMCL on the graph of pairwise matchings, building phylogenies of connected components of the graph, splitting the phylogenies along long branches). But then to subdivide orthogroups into clusters of different species, the authors did not use the phylogenetic tree already built but instead used an ad hoc pairwise hierarchical average linkage clustering algorithm.

      The reviewer is correct that there is an unexplained discrepancy between the clustering methods we used at different steps in our pipeline. We followed previous work by using phylogenetic distances for the initial clustering of orthogroups. On these scales we expect hybridization to play a minor role and phylogenetic distances to correlate reasonably well with evolutionary divergence. However, because of the extensive hybridization we observed, the use of phylogenetic models for species clustering is more difficult to justify. We therefore chose to simply use pairwise nucleotide distances, which make fewer assumptions about the underlying evolutionary processes and should be more robust. We have briefly explained our reasoning and the details of our clustering method in the revision (Lines 182-190).
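      To make the contrast with tree-based splitting concrete, the following is a minimal sketch of average-linkage clustering on pairwise nucleotide distances. It is not the pipeline used in the manuscript: the function names, the toy sequences, and the distance cutoff are hypothetical, and the distance is a plain mismatch fraction over gap-free aligned sites.

```python
from itertools import combinations


def p_distance(a, b):
    """Fraction of mismatched sites between two aligned sequences, skipping gaps."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("sequences share no ungapped sites")
    return sum(x != y for x, y in pairs) / len(pairs)


def average_linkage_clusters(seqs, cutoff):
    """Merge the two closest clusters until the mean inter-cluster
    distance of the closest pair exceeds the cutoff."""
    names = list(seqs)
    dist = {
        frozenset((i, j)): p_distance(seqs[i], seqs[j])
        for i, j in combinations(names, 2)
    }
    clusters = [[n] for n in names]

    def mean_dist(c1, c2):
        total = sum(dist[frozenset((i, j))] for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        d, i, j = min(
            (mean_dist(clusters[i], clusters[j]), i, j)
            for i, j in combinations(range(len(clusters)), 2)
        )
        if d > cutoff:
            break
        clusters[i] = clusters[i] + clusters[j]  # i < j, so index i stays valid
        del clusters[j]
    return sorted(sorted(c) for c in clusters)


if __name__ == "__main__":
    # Toy alignment (hypothetical): two alleles from each of two divergent groups.
    toy = {
        "a1": "ACGTACGTAC",
        "a2": "ACGTACGTAT",
        "b1": "TCGAACCTAC",
        "b2": "TCGAACCTAA",
    }
    print(average_linkage_clusters(toy, cutoff=0.2))
```

      The only inputs are the pairwise distances and a single cutoff, so no substitution model or tree topology is assumed; the price is that the cutoff itself has to be chosen and justified separately.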

      Reviewer #2 (Public Review):

      Summary:

      Birzu et al. describe two sympatric hotspring cyanobacterial species ("alpha" and "beta") and infer recombination across the genome, including inter-species recombination events (hybridization) based on single-cell genome sequencing. The evidence for hybridization is strong and the authors took care to control for artefacts such as contamination during sequencing library preparation. Despite hybridization, the species remain genetically distinct from each other. The authors also present evidence for selective sweeps of genes across both species - a phenomenon which is widely observed for antibiotic resistance genes in pathogens, but rarely documented in environmental bacteria.

      Strengths:

      This manuscript describes some of the most thorough and convincing evidence to date of recombination happening within and between cohabitating bacteria in nature. Their single-cell sequencing approach allows them to sample the genetic diversity from two dominant species. Although single-cell genome sequences are incomplete, they contain much more information about genetic linkage than typical short-read shotgun metagenomes, enabling a reliable analysis of recombination. The authors also go to great lengths to quality-filter the single-cell sequencing data and to exclude contamination and read mismapping as major drivers of the signal of recombination.

      We thank the reviewer for their appreciation of our work.

      Weaknesses:

      Despite the very thorough and extensive analyses, many of the methods are bespoke and rely on reasonable but often arbitrary cutoffs (e.g. for defining gene sequence clusters etc.). Much of this is warranted, given the unique challenges of working with single-cell genome sequences, which are often quite fragmented and incomplete (30-70% of the genome covered). I think the challenges of working with this single-cell data should be addressed up-front in the main text, which would help justify the choices made for the analysis.

      We have significantly expanded the “Results” section to better justify and explain the choices we made during our analysis. We hope these changes address the reviewer’s concerns and make the manuscript more accessible to readers.

      The conclusions could also be strengthened by an analysis restricted to only a subset of the highest quality (>70% complete) genomes. Even if this results in a much smaller sample size, it could enable more standard phylogenetic methods to be applied, which could give meaningful support to the conclusions even if applied to just ~10 genomes or so from each species. By building phylogenetic trees, recombination events could be supported using bootstraps, which would add confidence to the gene sequence clustering-based analyses which rely on arbitrary cutoffs without explicit measures of support.

      It seems to us that the reviewer’s suggestion presupposes that the recombination events we find can be described as discrete events on an asexual phylogeny, similar to how rare mutations are treated in standard phylogenetic inference. Popular tools, such as ClonalFrame and its offshoots, have attempted to identify individual recombination events starting from these assumptions. But the main conclusion of both our linkage and SNP block analysis is that the ClonalFrame assumptions do not hold for our data. Under a clonal frame, the SNP blocks we observe should be perfectly linked, similar to mutations on an asexual tree. But our results in Fig. 7D show the opposite. Part of the issue may have been that in our original presentation, we only briefly discuss the results of our linkage analysis and refer readers to the Appendix for more details. To fix this issue we have added an extra figure (Fig. 2), showing rapid linkage decrease in both species and that at long distances the linkage values are essentially identical to the unlinked case, similar to sexual populations. We hope that this change will help clarify this point.
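      The comparison between observed linkage and an unlinked case can be sketched as follows. This is only an illustration: it uses the squared allele correlation r² as a stand-in linkage statistic (the statistic used in the manuscript may differ), and the function names and shuffling scheme are hypothetical.

```python
import random


def r_squared(site_a, site_b):
    """Squared allele correlation between two biallelic sites,
    each coded as one 0/1 value per genome."""
    n = len(site_a)
    p_a = sum(site_a) / n
    p_b = sum(site_b) / n
    p_ab = sum(x * y for x, y in zip(site_a, site_b)) / n
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both sites must be polymorphic")
    return (p_ab - p_a * p_b) ** 2 / denom


def unlinked_control(site_a, site_b, n_shuffles=1000, seed=0):
    """Mean r^2 after shuffling one site across genomes, which removes
    any real association while keeping allele frequencies fixed."""
    rng = random.Random(seed)
    shuffled = list(site_b)
    total = 0.0
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)
        total += r_squared(site_a, shuffled)
    return total / n_shuffles


if __name__ == "__main__":
    # Toy data (hypothetical): 20 genomes, two perfectly associated sites.
    site = [0] * 10 + [1] * 10
    print("observed r^2:", r_squared(site, site))
    print("unlinked control:", unlinked_control(site, site))
```

      Sites that sit on the same branch of an asexual tree would give r² equal to 1, whereas freely recombining sites give values close to the shuffled control, which is small but not zero because the sample is finite.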

      The manuscript closes with a cartoon (Figure 4) which outlines the broad evolutionary scenario supported by the data and analysis. I agree with the overall picture, but I do think that some of the temporal ordering of events, especially the timing of recombination events could be better supported by data. In particular, is there evidence that inter-species recombination events are increasing or decreasing over time? Are they currently at steady-state? This would help clarify whether a newly arrived species into the caldera experiences an initial burst of accepting DNA from already-present species (perhaps involving locally adaptive alleles), or whether recombination events are relatively constant over time.

      The reviewer raises some very interesting questions about the dynamics of recombination in the population, which we hope to pursue in future work. We have added this as an open question in the Discussion (Lines 365-382).

      These questions could be answered by counting recombination events that occur deeper or more recently in a phylogenetic tree.

      The reviewer here seems to presuppose that recombination is rare enough that a phylogenetic tree can reliably be inferred, which is contrary to our linkage analysis (see our response to the reviewer's earlier comment on phylogenetic methods). Perhaps the reviewer missed this point in our original manuscript since it was discussed primarily in the Appendix.

      The cartoon also shows a 'purple' species that is initially present, then donates some DNA to the 'blue' species before going extinct. In this model, 'purple' DNA should also be donated to the more recently arrived 'orange' species, in proportion to its frequency in the 'blue' genome. This is a relatively subtle detail, but it could be tested in the real data, and this may actually help discern the order of the inferred recombination events.

      We have included an extra figure in the main text (Fig. 6) that addresses the question of timing of events. A quantitative test of our cartoon model along the lines the reviewer suggested would certainly be worthwhile and we hope to do that in future work.  

      The abstract also makes a bold claim that is not well-supported by the data: "This widespread mixing is contrary to the prevailing view that ecological barriers can maintain cohesive bacterial species..." In fact, the two species are cohesive in the sense that they are identifiable based on clustering of genome-wide genetic diversity (as shown in Fig 1A). I agree that the mixing is 'widespread' in the sense that it occurs across the genome (as shown in Figure 2A) but it is clearly not sufficient to erode species boundaries. So I believe the data is consistent with a Biological Species Concept (sensu Bobay & Ochman, Genome Biology & Evolution 2017) that remains 'fuzzy' - such that there are still inter-species recombination events, just not sufficient to erode the cohesion of genomic clusters. Therefore, I think the data supports the emerging picture of most bacteria abiding by some version of a BSC, and is not particularly 'contrary' to the prevailing view.

      We have revised the phrase mentioned by the reviewer to “prevent genetic mixture between bacterial species,” which more accurately represents our conclusions. 

      The final Results paragraph begins by posing a question about epistatic interactions, but fails to provide a definitive answer to the extent of epistasis in these genomes. Quantifying epistatic effects in bacterial genomes is certainly of interest, but might be beyond the scope of this paper. This could be a Discussion point rather than an underdeveloped section of the Results.

      We agree with the reviewer that an exhaustive analysis of epistasis in the population is beyond the scope of the manuscript. Our original intention was to answer whether the SNP blocks we discovered showed evidence of strong linkage, as might be expected if only a small number of strains are present in the population. In light of the previous comments by the reviewer regarding the consistency with the clonal frame hypothesis, we believe this is especially relevant for our results. Moreover, the results we found, especially for the beta population, were quite conclusive: SNP block linkages in beta are indistinguishable from an unlinked model. To avoid misdirecting the reader about the significance of our results, we have revised the relevant paragraph (Lines 316-319).

      Recommendations For The Authors:

      Reviewer #1 (Recommendations For The Authors):

      Although I am entirely convinced of the validity of the results, methodology, and interpretations presented in this work, I must say I found the paper very hard to read. And I think I am really quite familiar with these kinds of approaches. I fear that for people other than experts on these kinds of comparative genomic analyses, this paper will be almost impossible to read. With the aim of expanding the audience for this compelling work, I think the authors might want to consider ways to improve the presentation.

      At the end of a long project, the obtained results typically form a web of mutual interconnections and dependencies and one of the key challenges in presenting the results in a paper is having to untangle this web of connected results and analysis into a linear ordered narrative so that, at any point in the narrative, understanding the next point only depends on previous points in the narrative. I frankly feel that this paper fails at this.

      The paper reads to me as if one author put together the supplement by essentially writing a report of all the analyses that were done together with supplementary figures summarizing all those analyses, and that another author then wrote the main text by using the materials in the supplement almost in the way a cook uses ingredients for a dish. Almost every other sentence in the main text refers to results in the (31!) supplementary figures and can only be understood by reading the appropriate corresponding sections in the supplementary materials. I found it essentially impossible to read the main text without having first read the entire 50-page supplement.

      I think the paper could be hugely improved by trying to restructure the presentation so as to make it more linear. The main text can be expanded to include a summary of the crucial methods and analysis results from the supplement needed to understand the narrative in the main text. For example, as it currently stands it is really challenging to understand what is shown in figures 2 and 3 of the main text without having to first read a very substantial part of the supplement. Figure 3, even after having read the relevant sections in the supplement, took me quite a while to understand and almost felt like a puzzle to decipher. Rethinking which parts of the supplement are really necessary would also help. Finally, it would also help if the terminology was kept as simple, transparent, and consistent as possible.

      I understand that my suggestion to thoroughly reorganize the presentation may feel like a big hassle, but I am afraid that in its current form, these important results are essentially rendered inaccessible to all but a small group of experts in this area. This paper deserves a wider readership.

      We thank the reviewer for these valuable suggestions. In the revision, we have significantly expanded and restructured the “Results” section to make the presentation more linear, as the reviewer suggested (see our reply to the public comment by the reviewer for details). We hope these changes will make the manuscript easier to read.

      Reviewer #2 (Recommendations For The Authors):

      I found this paper challenging to follow since the main text was so condensed and the supplementary material so extensive. Given that eLife does not impose strong limits on the length of the main text, I suggest moving some key sections from the supplement into the main text to make it easier for the reader to follow rather than flipping back and forth. Adding to the confusion, supplementary figures were referenced out of order in the main text (e.g. S23 is referenced before S1). Please check the numbering and ensure figures are mentioned in the main text in the correct order.

      We thank the reviewer for their feedback on the presentation of the results. In response to similar comments from Reviewer #1, we have significantly expanded and restructured the “Results” section to make it easier to read (see also our responses to Reviewer #1).

      Page 2: The term 'coevolution' is typically reserved for two species that mutually impose selective pressures on one another (e.g. predator-prey interactions; see Janzen, Evolution 1980). In the context of these two cyanobacterial species, it's not clear that this is the case so I would simply refer to them as 'cohabitating' or being sympatric in the same environment.

      It is true that the term “coevolution” has become associated with predator-prey interactions, as the reviewer said. However, we feel that in our case “coevolution” fairly accurately describes the continual hybridization over long time scales we observe. We have therefore chosen to keep the term.

      Page 3: The authors mention that the gamma SAG is ~70% complete, which turns out to be quite high. It would be useful to mention early in the Results the mean/median completeness across SAGs, and how this leads to some challenges in analysing the data. Some of the material from the Supplement could be moved into the Results here.

      We have added a short note on the completeness in the Results (Lines 153-154). We have also added an extra figure in Appendix 1 with the completeness of all the SAGs for interested readers.

      I was left puzzled by the sentence: "Alternatively, high rates of recombination could generate different genotypes within each genome cluster that are adapted to different temperatures, with the relative frequencies of each cluster being only a correlated and not a causal driver of temperature adaptation." This is suggesting that individual genes or alleles, rather than entire genomes, could be adapted to temperature. But figure 1B seems to imply that the entire genome is adapted to different temperatures. Anyway, this does not seem to be a key point and could probably be removed (or clarified if the authors deem this an important point, which I failed to understand).

      We have revised this section to clarify the alternative hypothesis mentioned by the reviewer (Lines 100-103).

      Page 4. 'Several dozen' hybrid genes were found, but please also specify how many genes were tested. In general, it would be good to briefly outline the sample size (SAGs or genes) considered for each analysis.

      We have added the total numbers of genes we analyzed at each step of our analysis.

      'Mosaic hybrid loci' are mentioned alongside the issue of poor alignment. Presumably, the mosaic hybrid loci are first filtered to remove the poor alignments? This should be specified, and please mention how many loci are retained before/after this filter.

      We thank the reviewer for highlighting this important point. In the revision, we have implemented a more aggressive filtering of genes with poor alignments. We have added an extra paragraph to Appendix 1 (step 5 in the pipeline analysis) briefly explaining the issue.

      Page 5. "By contrast, the diversity of mosaic loci was typical of other loci within beta, suggesting most of the beta genome has undergone hybridization." Please point to the data (figure) to support this statement.

      We have restructured our discussion of the different hybrid loci so this comment is no longer relevant. In case the reviewer is interested, the synonymous diversity within beta was 0.047, while in mosaic hybrids it was 0.064.

      Page 6. "The largest diversity trough contained 28 genes." Since this trough is discussed in detail and seems to be of interest, it would be nice to illustrate it, perhaps as an inset in Figure 2 or as a separate figure. If I understood correctly, this trough includes genes (in a nitrogen-fixation pathway) that are present in all genomes, but are exchanged by homologous recombination. So I don't think it's correct to say that the "ancestors acquired the ability to fix nitrogen." Rather, the different alleles of these same genes were present in the ancestor. So perhaps there was a selective sweep involving alleles in this region that provided adaptation to local nitrogen sources or concentrations, but not a gain of new genes. Perhaps I misunderstood, in which case clarification would be appreciated.

      The reviewer raises an interesting possibility. We agree that it is in principle possible that the ancestor contained the nitrogen fixation genes and the selective sweep simply replaced the ancestral alleles. In this particular case, there is additional evidence from gene order that the entire pathway was acquired at roughly the same time. The gene order between alpha and beta is almost entirely different, with only a few segments containing more than 2-3 genes in the same order, as shown by Bhaya et al. 2007 and confirmed by additional unpublished analysis of the SAGs. One of the few exceptions is the nitrogen fixation pathway, which has essentially the same gene order over more than 20 kbp. Thus, if the ancestor of both alpha and beta contained the nitrogen-fixation pathway, we would expect these genes to be scattered across the genome. We have revised the sentences in question to clarify this point (Lines 260-271).

      Page 6. Last paragraph on epistasis references Fig 3C, but I believe it should be Fig 3D.

      Fixed.

      Page 7. Figure 3 legend. "Note that alpha-2 is identical to gamma here." I believe it should be beta, not gamma.

      The reviewer is correct. We have fixed this error.

      Page 8. What is the evidence for "at least six independent colonizers"? I could not find the data supporting this claim.

      The statement mentioned by the reviewer was based on the maximum number of species clusters we identified in different core genes. However, during the revision, we found that only a handful of genes contained five or more clusters. We did find several tens of genes with four clusters. In addition, Rosen et al. (2018) also found additional 16S clusters at low frequency in the same springs. Based on these results we conservatively estimate that at least four independent strains colonized the caldera, but the number could be much greater. We have revised the text in question accordingly (Lines 336-339) and added Fig. 2 in Appendix 1 to support the conclusion.

      Page 9. Line 200: "acting to homogenize the population." It should be specified that the population is only homogenized at these introgressed loci, not genome-wide. Otherwise, the genome-wide species clusters seen in Fig 1 would not be maintained.

      It is true that the selective sweeps that lead to diversity troughs only homogenize the introgressed loci. But other hybrid segments could also rise to high frequency in the population during the sweep through hitchhiking. The fact that we observe SNP blocks generated through secondary recombination events of introgressed segments throughout the genome supports this view. While we do not currently fully understand the dynamics of this process, we do feel that the current evidence supports the statement that mixing is occurring throughout the genome and not just at a few loci, so we have kept the original statement.

      The final sentence (lines 221-222) is vague and uninformative. On the one hand, "investigating whether hybridization plays a major role" is what the current manuscript has already done - depending on what is meant by 'major' (how much of the genome? Or whether there are ecological implications?). It is also not clear what is meant by a predictive theory and 'possible evolutionary scenarios'. This should be elaborated upon, otherwise, it is not clear what the authors mean. Otherwise, this sentence could be cut.

      We thank the reviewer for their feedback. One possible source of confusion could be that in this sentence we were referring to detecting hybridization in other communities. We have changed “these communities” to “other communities” to make this clearer.

      Supplement.

      Broadly speaking, I appreciate the thorough and careful analysis of the single cell data. On the other hand, it is hard to evaluate whether these custom analyses are doing what is intended in many cases. Would it be possible to consider an analysis using more established methods, e.g. taking a subset of genomes with 'good' completeness and using Panaroo to find the core and accessory genome, then ClonalFrameML or Gubbins to infer a phylogeny and recombination events? Such analyses could probably be applied to a subset of the sample with relatively complete genomes. I don't want to suggest an overly time-consuming analysis, but the authors could consider what would be feasible.

      We have added a comparison between our analysis and that from two other methods, including ClonalFrameML, which was mentioned by the reviewer. One important point that we feel might have been lost in the first version is that our linkage results imply that recombination is not rare enough to be mapped onto an asexual tree, as ClonalFrameML assumes. Note that this is not simply a technical limitation caused by incomplete coverage; it is a consequence of the evolutionary dynamics of the population. Consistent with this, we found several inconsistencies in how recombination events were assigned by ClonalFrameML. We have summarized these conclusions in Appendix 7 of the revised manuscript.

      Page 8. Line 190. What is meant by 'minimal compositional bias'?

      We mean that the sample is not biased towards strains that grow in the lab. We have revised the sentence to clarify.

      Page 25. Figure S14 is not referenced in the text.

      We have added part of this figure to the main text since it illustrates one of our main results, namely that sites at long genomic distances are essentially unlinked.

      Page 26. The 'unlinked controls' (line 530) are very useful, but it would be even more informative to see if these controls also show the same decline in linkage with distance in the genome as observed in the real data. In particular, it would be good to know if the observed rapid decline in linkage with distance in the low-diversity regions is also observed in controls. Currently, it is unclear if this observation might be due to higher uncertainty in inferring linkage in low-diversity regions, which by definition have less polymorphism to include in the linkage calculation.

      We thank the reviewer for the suggestion. After further consideration, we have decided to remove the subsection on linkage decrease in the low-diversity regions. We feel such detailed quantitative analysis would be better suited for a more technical paper, which we hope to do at a later time.

      Page 26. There are some sections with missing identifiers (Sec ??).

      Fixed.

      Page 27. The information about the typical breadth of SAG coverage (~30%) would be better to include earlier in the Supplement, and also mentioned in the main text so the reader can more easily understand the nature of the dataset.

      We have added an extra figure with the SAG coverages to Appendix 1.

      Page 29. Any sensitivity analysis around the S = 0.9 value? Even if arbitrary, could the authors provide justification why they think this value is reasonable?

      We have significantly revised this section in response to earlier comments by one of the reviewers. We hope that this clarifies the details of our methods for interested readers. To answer the reviewer’s specific question, we chose this heuristic after examining the fraction of cells of each species in different species clusters. For the clusters assigned to alpha and beta, we found a sharp peak near one, and a cutoff of 0.9 captured most clusters while still being high enough to be inconsistent with a mixed cluster.
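      As an illustration only (this is not the authors' code), the heuristic described above can be sketched as follows, assuming that S is the fraction of cells in a cluster that belong to its most common species. The function name, the data layout, and the cell counts are hypothetical.

```python
def assign_species(cell_counts, cutoff=0.9):
    """Return the dominant species of a cluster, or None if the cluster looks mixed.

    cell_counts maps a species label to the number of cells in the cluster.
    The 0.9 default mirrors the cutoff quoted in the response; the exact
    definition of S used in the manuscript is an assumption here.
    """
    total = sum(cell_counts.values())
    if total == 0:
        return None
    species, count = max(cell_counts.items(), key=lambda item: item[1])
    return species if count / total >= cutoff else None


# Hypothetical clusters, for illustration only.
clusters = {
    "cluster_1": {"alpha": 48, "beta": 2},   # 96% alpha
    "cluster_2": {"alpha": 3, "beta": 57},   # 95% beta
    "cluster_3": {"alpha": 30, "beta": 20},  # 60% alpha, treated as mixed
}
assignments = {name: assign_species(counts) for name, counts in clusters.items()}
```

      Under these assumptions, a cluster is labeled only when one species clearly dominates, and clusters near an even split are left unassigned.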

      Page 30. I could not see where Fig. S17 was mentioned in the text. Also, how are 'simple hybrid genes' defined?

      We have removed this figure in the revision. The definitions of the different types of hybrid genes have been added to the main text in response to a comment from the other reviewer.

      Page 36. It is hard to see that divergence is 'high' relative to what reference. Would it be possible to include the expected value (from ref. 12) in the plot, or at least explicitly mentioned in the text?

      We have added the mean synonymous and non-synonymous divergences between alpha and beta to the figures for reference.

      Page 38. Line 770 "would be comparable to that of beta." This is not necessarily the case since beta could have a different time to its most recent common ancestor. It could have a different time to the last bottleneck or selective sweep, etc.

      We thank the reviewer for pointing out this misleading statement. Our point here was that in the first scenario the TMRCA of alpha and beta would be similar since the diversity in the high-diversity alpha genes is similar to beta. We have clarified this statement in the revision.

      Page 39. Line 793. The use of the term 'genomic backbone' implies the presence of a clonal frame, which is not what the data seems to support. Perhaps another term such as 'genetic diversity' would more appropriately capture the intended meaning here.

      We agree with the reviewer that the low-diversity regions may not be asexual. We used “genomic backbone” to distinguish from the “clonal frame,” which is usually used to mean that the backbone is asexual. We have added a note in the revision to clarify this point.

      Page 39. Lines 802-805. I found this explanation hard to follow. Could the logic be clarified?

      We simply meant that although the beta distribution is unimodal, it is not consistent with a simple Poisson distribution, unlike in alpha. We have added an extra sentence to clarify this.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #2 (Public review):

      In this valuable manuscript, Lin et al attempt to examine the role of long non coding RNAs (lncRNAs) in human evolution, through a set of population genetics and functional genomics analyses that leverage existing datasets and tools. Although the methods are incomplete and at times inadequate, the results nonetheless point towards a possible contribution of long non coding RNAs to shaping humans, and suggest clear directions for future, more rigorous study.

      Comments on revisions:

      I thank the authors for their revision and changes in response to previous rounds of comments. As it had been nearly two years since I last saw the manuscript, I reread the full text to familiarise myself again with the findings presented. While I appreciate the changes made and think they have strengthened the manuscript, I still find parts of it a bit too speculative or hyperbolic. In particular, I think claims of evolutionary acceleration and adaptation require more careful integration with existing human/chimpanzee genetics and functional genomics literature.

      We sincerely thank the reviewer for their great patience and valuable comments, which have helped us further improve the manuscript. Before responding to the comments point by point, we provide a summary here.

      (1) On parameters and cutoffs.

      Parameters and cutoffs influence data analysis. The large number of Supplementary Notes, Supplementary Figures, and Supplementary Tables indicates that we paid great attention to the influence of parameters and robustness of analyses. Specifically, here we explain the DBS sequence distance cutoff of 0.034, which determines the top 20% genes that most differentiate humans from chimpanzees and influences the gene set enrichment analysis (Figure 2). As described in the revised manuscript, we estimated this cutoff based on Song et al., verified its rationality based on Prufer et al. (Song et al. 2021; Prufer et al. 2017), and measured its influence by examining slightly different cutoff values (e.g., 0.035).

      (2) Analyses of HS TFs and HS TF DBSs.

      It is desirable to compare the contribution of HS lncRNAs and HS TFs to human evolution. Identifying HS TFs faces the challenges that different institutions (e.g., NCBI and Ensembl) annotate orthologous genes using different criteria, and that multiple human TF lists have been published by different research groups. Recently, Kirilenko et al. identified orthologous genes in hundreds of placental mammals and birds and organized different types of genes into datasets of pairwise comparisons (e.g., hg38-panTro6) using humans and mice as references (Kirilenko et al. Integrating gene annotation with orthology inference at scale. Science 2023). Based on (a) the many2zero and one2zero gene lists in the “hg38-panTro6” dataset, and (b) three human TF lists reported by two studies (Bahram et al. 2015; Lambert et al. 2018) and used in the SCENIC package, we identified HS TFs. The numbers of HS TFs and HS lncRNAs (5 vs 66) alone provide strong evidence suggesting that HS lncRNAs have contributed more significantly to human evolution than HS TFs (note that 5 is the union of three intersections between <many2zero + one2zero> and the three <human TF list>).

      TF DBS (i.e., TFBS) prediction has also been challenging because TF DBSs are very short (mostly about 10 bp) and TF-DNA binding involves many cofactors (Bianchi et al. Zincore, an atypical coregulator, binds zinc finger transcription factors to control gene expression. Science 2025). We used two programs to predict HS TF DBSs: the well-established FIMO program (whose results have been incorporated into the JASPAR database) (Rauluseviciute et al. JASPAR 2024: 20th anniversary of the open-access database of transcription factor binding profiles. NAR 2023) and the recently reported CellOracle program (Kamimoto et al. Dissecting cell identity via network inference and in silico gene perturbation. Nature 2023). Then, we performed downstream analyses and obtained two major results. One is that on average (per base), fewer selection signals are detected in HS TF DBSs (although caution is needed because TF DBSs are very short); the other is that HS TFs and HS lncRNAs contribute to human evolution in quite different ways (Supplementary Figs. 25 and 26).

      (3) On whether genes with more transcripts may appear as spurious targets of HS lncRNAs.

      Now, the results of HS TF DBSs allow us to address the question of whether genes with more transcripts may appear as spurious targets of HS lncRNAs. We note that (a) we predicted HS lncRNA DBSs and HS TF DBSs in the same promoter regions of the same 179128 Ensembl-annotated transcripts (release 79), and (b) we used the same GTEx transcript expression matrices in the analyses of HS TF DBSs and HS lncRNA DBSs (the GTEx database includes gene expression matrices and transcript expression matrices; the latter include multiple transcripts of a gene). Thus, the analyses of HS TF DBSs provide an effective control for examining whether genes with more transcripts may appear as spurious targets of HS lncRNAs and, consequently, cause the high percentages of HS lncRNA-target transcript pairs that show correlated expression in the brain (Figure 3). We find that the percentages of HS TF-target transcript pairs that show correlated expression are also high in the brain, but the whole profile in GTEx tissues is significantly different from that of HS lncRNA DBSs (Figure 3A; Supplementary Figure 25). On the other hand, for the distribution of significantly changed DBSs in GTEx tissues, the difference between HS lncRNA DBSs and HS TF DBSs is more apparent (Figure 3B; Supplementary Figure 26). Together, these suggest that the brain-enriched distribution of co-expressed HS lncRNA-target transcript pairs must arise from HS lncRNA-mediated transcriptional regulation rather than from the transcript number difference.

      (4) Additional notes on HS TFs and HS TF DBSs.

      First, the “many2zero” and “one2zero” gene lists in the “hg38-panTro6” dataset of Kirilenko et al. provide the most up-to-date, but not the most complete, data on human-specific genes because “hg38-panTro6” is a pairwise comparison. On the other hand, the Ensembl database also annotates orthologous genes, but lacks pairwise comparisons such as “hg38-panTro6”. Therefore, not all HS genes based on “hg38-panTro6” agree with orthologous genes in the Ensembl database. Second, if HS genes are identified based on both Ensembl and Kirilenko et al., HS TFs will be fewer.

      (5) On speculative or hyperbolic claims.

      First, the title “Human-specific lncRNAs contributed critically to human evolution by distinctly regulating gene expression” is now further supported by the HS TF DBS analyses. Second, we have carefully revised the entire manuscript, trying to make it more readable, accurate, logically reasonable, and biologically acceptable. Third, in the revision we avoid speculative or hyperbolic claims in results, interpretations, and discussions as far as possible. This includes toning down statements and claims, for example, using “reshape” to replace “rewire” and using “suggest” to replace “indicate”. Since the revisions are pervasive, we do not mark all of them, except those that are directly relevant to the reviewer’s comments.

      (1) Line 155: "About 5% of genes have significant sequence differences in humans and chimpanzees," This statement needs a citation, and a definition of what is meant by 'significant', especially as multiple lines below instead mention how it's not clear how many differences matter, or which of them, etc.

      Different studies give different estimates, from 1.24% (Ebersberger et al. Genomewide Comparison of DNA Sequences between Humans and Chimpanzees. Am J Hum Genet. 2002) to 5% (Britten RJ. Divergence between samples of chimpanzee and human DNA sequences is 5%, counting indels. PNAS 2002). The 5% for significant gene sequence differences arises when considering a broader range of genetic variations, particularly insertions and deletions of genetic material (indels). To provide more accurate information, we have replaced this simple statement with a more comprehensive one and cited the above two papers.

      (2) line 187: "Notably, 97.81% of the 105141 strong DBSs have counterparts in chimpanzees, suggesting that these DBSs are similar to HARs in evolution and have undergone human-specific evolution." I do not see any support for the inference here. Identifying HARs and acceleration relies on a far more thorough methodology than what's being presented here. Even generously, pairwise comparison between two taxa only cannot polarise the direction of differences; inferring human-specific change requires outgroups beyond chimpanzee.

      Here, we made an analogy rather than an inference; therefore, we used words such as “suggesting” and “similar” instead of more confirmatory words. We have revised the latter half of the sentence to read “raising the possibility that these sequences have evolved considerably during human evolution”.

      (3) line 210: "Based on a recent study that identified 5,984 genes differentially expressed between human-only and chimpanzee-only iPSC lines (Song et al., 2021), we estimated that the top 20% (4248) genes in chimpanzees may well characterize the human-chimpanzee differences". I do not agree with the rationale for this claim, and do not agree that it supports the cutoff of 0.034 used below. I also find that my previous concerns with the very disparate numbers of results across the three archaics have not been suitably addressed.

      (1) Indeed, “we estimated that the top 20% (4248) genes in chimpanzees may well characterize the human-chimpanzee differences” is an improper claim; we made this mistake due to the flawed use of English.

      (2) What we need is a gene number that (a) indicates genes that effectively differentiate humans from chimpanzees and (b) can be used to set a DBS sequence distance cutoff. Since this study is the first to systematically examine DBSs in humans and chimpanzees, we must estimate this gene number based on studies that identify differentially expressed genes in humans and chimpanzees. We chose Song et al. 2021 (Song et al. Genetic studies of human–chimpanzee divergence using stem cell fusions. PNAS 2021), which identified 5984 differentially expressed genes, including 4377 genes whose differential expression is due to trans-acting differences between humans and chimpanzees. To the best of our knowledge, this is the only published data on trans-acting differences between humans and chimpanzees, and most HS lncRNAs and their DBSs/targets have trans-acting relationships (see Supplementary Table 2). Based on these numbers, we chose a DBS sequence distance cutoff of 0.034, which corresponds to 4248 genes (the top 20%), slightly fewer than 4377.

      (3) If we chose a DBS sequence distance cutoff of 0.033 or 0.035, slightly more or fewer genes would be selected, raising the question of whether this would significantly influence the downstream gene set enrichment analysis (Figure 2). We found that 91 genes have a DBS sequence distance of 0.034. Thus, with cutoff=0.035, 4248-91=4157 genes were selected, and the influence on gene set enrichment analysis was very limited.

      (4) On the disparate numbers of results across the three archaics. Figure 1A is based on Figure 2 in Prufer et al. 2017. At first glance, our Figure 1A indicates that Altai Neanderthal is older than Denisovan (in kya), which would make our result “identified 1256, 2514, and 134 genes in Altai Neanderthals, Denisovans, and Vindija Neanderthals” seem unreasonable. However, Prufer et al. (2017) reported that “It has been suggested that Denisovans received gene flow from a hominin lineage that diverged prior to the common ancestor of modern humans, Neandertals, and Denisovans……In agreement with these studies, we find that the Denisovan genome carries fewer derived alleles that are fixed in Africans, and thus tend to be older, than the Altai Neandertal genome”. This note by Prufer et al. provides an explanation for our result, which is that more genes with large DBS sequence distances were identified in Denisovans than in Altai Neanderthals. Of course, the numbers 1256, 2514, and 134 depend on the cutoff of 0.034. If cutoff=0.035, these numbers change slightly, but their relationships remain (i.e., more genes in Denisovans). We examined multiple cutoff values and found that more genes in Denisovans have large DBS sequence distances than in Altai Neanderthals.
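      The cutoff check described in point (3) above can be sketched as follows. This is an illustration rather than the authors' code: it assumes a gene is selected when its DBS sequence distance is greater than or equal to the cutoff (which is what the 4248-91 arithmetic implies), and the gene names and toy distances are hypothetical.

```python
def genes_at_or_above(distances, cutoff):
    """Return the set of genes whose DBS sequence distance is >= cutoff.

    distances maps a gene name to one distance value per gene.
    The '>=' rule is an assumption inferred from the response's arithmetic.
    """
    return {gene for gene, distance in distances.items() if distance >= cutoff}


# Hypothetical toy distances, for illustration only.
toy_distances = {"geneA": 0.050, "geneB": 0.034, "geneC": 0.034, "geneD": 0.020}

selected_at_034 = genes_at_or_above(toy_distances, 0.034)
selected_at_035 = genes_at_or_above(toy_distances, 0.035)
dropped = selected_at_034 - selected_at_035  # genes sitting exactly at 0.034

# Counts quoted in the response: 4248 genes at cutoff 0.034,
# of which 91 have a distance of exactly 0.034.
remaining_at_035 = 4248 - 91
```

      Comparing the two selected sets directly, rather than only their sizes, shows which genes a small shift in the cutoff would remove from the enrichment input.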

      (4) I also think that there is still too much of a tendency to assume that adaptive evolutionary change is the only driving force behind the observed results in the results. As I've stated before, I do not doubt that lncRNAs contribute in some way to evolutionary divergence between these species, as do other gene regulatory mechanisms; the manuscript leans down on it being the sole, or primary force, however, and that requires much stronger supporting evidence. Examples include, but are not limited to:

      (1) Indeed, the observed results are also caused by other genomic elements and mechanisms (but it is hardly feasible to identify and differentiate them in a single study), and we do not assume that adaptive evolutionary change is the only driving force. Careful revisions have been made to avoid leaving readers the impression that we have this tendency or hold the simple assumption.

      (2) Comparing HS lncRNAs to HS TFs is critical, and we have done this.

      (5) line 230: "These results reveal when and how HS lncRNA-mediated epigenetic regulation influences human evolution." This statement is too speculative.

      We have toned down the statement, just saying “These results provide valuable insights into when and how HS lncRNA-mediated epigenetic regulation impacts human evolution”.

      Line 268: "yet the overall results agree well with features of human evolution." What does this mean? This section is too short and unclear.

      (1) First, the sentence “Selection signals in YRI may be underestimated due to fewer samples and smaller sample sizes (than CEU and CHB), yet the overall results agree well with features of human evolution” has been deleted, because CEU, CHB, and YRI samples are comparable (100, 99, and 97, respectively).

      (2) Now the sentence has been changed to “These results agree well with findings reported in previous studies, including that fewer selection signals are detected in YRI (Sabeti et al., 2007; Voight et al., 2006)”.

      (3) On “This section is too short and unclear” - To make the manuscript more readable, we adopt short sections instead of long ones. This section conveys that (a) our finding that more selection signals were detected in CEU and CHB than in YRI agrees with well-established findings (Voight et al. A Map of Recent Positive Selection in the Human Genome. PLoS Biology 2006; Sabeti et al. Genome-wide detection and characterization of positive selection in human populations. Nature 2007), and (b) in a considerable number of DBSs, selection signals were detected by multiple tests.

      Line 325: "and form 198876 HS lncRNA-DBS pairs with target transcripts in all tissues." This has not been shown in this paper - sequence based analyses simply identify the “potential” to form pairs.

      This section describes transcriptomic analysis using the GTEx data. Indeed, target transcripts of HS lncRNAs are results of sequence-based analysis, and a predicted target is not necessarily regulated by the HS lncRNA in a given tissue. Here, “pair” means an HS lncRNA-target transcript pair whose expression shows significant Pearson correlation in a GTEx tissue (note that we do not equate correlation with regulation; we identified HS lncRNA-mediated transcriptional regulation based on both the DBS-targeting relationship and the correlation relationship).
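      A minimal sketch of the two-part criterion described above (a predicted DBS plus correlated expression), written for illustration and not taken from the manuscript. The function names and the `min_abs_r` threshold are hypothetical; the response refers to statistically significant Pearson correlation, and the significance test itself is not reproduced here.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def is_candidate_pair(has_predicted_dbs, lncrna_expr, target_expr, min_abs_r=0.5):
    """Keep a pair only if a DBS is predicted AND expression is correlated.

    min_abs_r is a hypothetical stand-in for a significance criterion.
    """
    if not has_predicted_dbs:
        return False
    return abs(pearson_r(lncrna_expr, target_expr)) >= min_abs_r
```

      Requiring both conditions means that a correlated transcript without a predicted DBS, or a predicted target without correlated expression, is not counted as a pair.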

      Line 423: "Our analyses of these lncRNAs, DBSs, and target genes, including their evolution and interaction, indicate that HS lncRNAs have greatly promoted human evolution by distinctly rewiring gene expression." I do not agree that this conclusion is supported by the findings presented - this would require significant additional evidence in the form of orthogonal datasets.

      (1) As mentioned above, we have used “reshape” to replace “rewire” and used “suggest” to replace “indicate”. In addition, we have substantially revised the Discussion, in which this sentence is replaced by “our results suggest that HS lncRNAs have greatly reshaped (or even rewired) gene expression in humans”.

      (2) Multiple citations have been added, including Voight et al. 2006 (Voight et al. A Map of Recent Positive Selection in the Human Genome. PLoS Biology 2006) and Sabeti et al. 2007 (Sabeti et al. Genome-wide detection and characterization of positive selection in human populations. Nature 2007).

      (3) We have analyzed HS TF DBSs, and the obtained results also support the critical contribution of HS lncRNAs.

      I also return briefly to some of my comments before, in particular on the confounding effects of gene length and transcript/isoform number. In their rebuttal the authors argued that there was no need to control for this, but this does in fact matter. A gene with 10 transcripts that differ in the 5' end has 10 times as many chances of having a DBS than a gene with only 1 transcript, or a gene with 10 transcripts but a single annotated TSS. When the analyses are then performed at the gene level, without taking into account the number of transcripts, this could introduce a bias towards genes with more annotated isoforms. Similarly, line 246 focuses on genes with "SNP numbers in CEU, CHB, YRI are 5 times larger than the average." Is this controlled for length of the DBS? All else being equal a longer DBS will have more SNPs than a shorter one. It is therefore not surprising that the same genes that were highlighted above as having 'strong' DBS, where strength is impacted by length, show up here too.

      (1) In gene set enrichment analysis (Figure 2, which is a gene-level analysis), when determining genes differentiating humans from chimpanzees based on DBS sequence distance, if a gene has multiple transcripts/DBSs, we choose the DBS with the largest distance. That is, the input to g:Profiler is a non-redundant gene list.

      (2) In GTEx data analysis (Figure 3, which is a transcriptome-level analysis), the analyses of HS TF DBSs using the GTEx data provide evidence suggesting that different DBS/transcript numbers of genes are unlikely to cause confounding effects. As explained above, we predicted HS TF DBSs in the same promoter regions of 179128 Ensembl-annotated transcripts (release 79), but Supplementary Figures 25 and 26 are distinctly different from Figure 3AB.

      (3) In evolutionary analysis, a gene with 10 DBSs has a higher chance of having selection signals than a gene with 1 DBS. This is biologically plausible, because many conserved genes have novel transcripts whose expression is species-, tissue-, or developmental period-specific, and DBSs upstream of these novel transcripts may differ from DBSs upstream of conserved transcripts.

      (4) “line 246 focuses on genes with "SNP numbers in CEU, CHB, YRI are 5 times larger than the average." Is this controlled for the length of the DBS?” - This is a defect. We have now computed SNP numbers per base and used the new table to replace the old Supplementary Table 8. After examining the new table, we find that the major results of SNP analysis remain.

      (5) On “Is this controlled for length of the DBS? All else being equal a longer DBS will have more SNPs than a shorter one” - We do not think there are reasons to control for the length of DBSs; also, what “All else being equal” means matters. First, DBS sequences have specific features; thus, the features of a long DBS are stronger than those of a short one, making a long DBS less likely to be generated by chance in the genome and less likely to be predicted wrongly than a short one. This means that longer DBSs are less likely to be false positives (note our explanation that the chance of a DBS of 147 bp, the mean length of DBSs, being wrongly predicted is extremely low, p<8.2e-19 to 1.5e-48). Second, the difference in length suggests a difference in binding affinity, which in turn influences the regulation of the specific transcripts and the analysis of GTEx data. Third, it cannot be excluded that some SNPs may be selection signals (detecting selection signals is challenging, and many selection signals cannot be detected by statistical tests, see Grossman et al. A composite of multiple signals distinguishes causal variants in regions of positive selection. Science 2010).

      (6) On “It is therefore not surprising that the same genes that were highlighted above as having 'strong' DBS, where strength is impacted by length” - Indeed, strength is influenced by length, see the above response.
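      Two of the steps described in the points above, collapsing multiple DBSs per gene to the one with the largest distance (point 1) and expressing SNP counts per base (point 4), can be sketched as follows. This is an illustration under stated assumptions, not the authors' code; the function names and all numbers are hypothetical.

```python
def max_distance_per_gene(dbs_records):
    """Collapse (gene, distance) records to one value per gene: the largest distance.

    The result has one entry per gene, so a gene with many transcripts/DBSs
    appears only once in a downstream gene list.
    """
    best = {}
    for gene, distance in dbs_records:
        if gene not in best or distance > best[gene]:
            best[gene] = distance
    return best


def snps_per_base(snp_count, dbs_length_bp):
    """Normalize a SNP count by DBS length so long and short DBSs are comparable."""
    if dbs_length_bp <= 0:
        raise ValueError("DBS length must be positive")
    return snp_count / dbs_length_bp


# Hypothetical records: geneA has two DBSs, geneB has one.
records = [("geneA", 0.01), ("geneA", 0.04), ("geneB", 0.02)]
per_gene = max_distance_per_gene(records)

# Hypothetical DBSs: the longer one has more SNPs in total but fewer per base.
long_dbs_density = snps_per_base(6, 300)
short_dbs_density = snps_per_base(3, 100)
```

      In this toy case the raw count favors the 300 bp DBS, while the per-base density favors the 100 bp DBS, which is the length effect the reviewer asked about.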

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      Finally, figure 1 panels D and F are not legible - the font is tiny! There's also a typo in panel A, where "Homo Sapien" should be "Homo sapiens".

      (1) “Homo sapien” is changed to “Homo sapiens”.

      (2) Even if we double the font size, they are still too small. Inserting a very large panel D into Figure 1 will make Figure 1 ugly, and converting Figure 1D into an independent figure is unnecessary. Actually, panels 1D and F are illustrative figures; the full Fig.1D is Supplementary Figure 6, and the full Fig.1F is Figure 3. We have revised Fig.1’s legend to explain these.

    1. Author response:

      We would like to express our gratitude to all three reviewers for their time and valuable feedback on the manuscript. Below, we provide our point-by-point responses to their comments. Additionally, we summarize here the experiments we plan to conduct in accordance with the reviewers' suggestions:

      Revision plan 1. To further explore the mechanisms of Notch signaling in the decision of regional EE pattern.

      Our observation of EE subtype changes in Notch mutant clones revealed that Notch is required for the specification of Type II EEs, but whether it promotes the generation of Type III EEs is not yet clear. In this revision, we will complete the quantification of Type I and Type III EEs in Notch mutant clones to determine whether Notch signaling participates in the determination of these two EE subtypes. Further, we will attempt to combine Notch mutants with different manipulations of WNT and BMP gradients to investigate their interplay.

      Revision plan 2. To supplement the global pattern of WNT and BMP gradient along the whole gut.

      The levels of WNT and BMP gradients vary across gut regions, both under normal conditions and under genetic manipulation, leading to different outcomes in EE subtype composition. To further support our model, we will document the changes in WNT and BMP gradients along the whole gut after genetic manipulation, and perform semi-quantification of their levels to correlate them with EE subtype composition. Additionally, we will test the gradient levels at different time points during the pupal stage to interpret how regional identity is established during development.

      Revision plan 3. To investigate the involvement of apical-basal polarity in the determination of regional EE diversity.

      Although we have demonstrated that WNT and BMP gradients orchestrate regional EE identity, some observations cannot be fully explained by their roles, such as the asymmetric expression of CCHa2 in EE pairs from R4b. A potential mechanism is apical-basal polarity, which has been reported to determine the cell fate of ISC progeny at the pupal stage. We will specifically knock down or overexpress key genes related to apical-basal polarity in ISCs or EEs as a preliminary test of whether they are involved.

      Please find our detailed point-by-point responses below.

      Public Reviews:

      Reviewer #1 (Public review):

      This valuable study explores the regulatory mechanisms underlying the regional distribution of enteroendocrine cell subtypes in the Drosophila midgut. The regional distribution of EE cell subtypes is carefully documented, and the data convincingly show that each EE cell subtype has a unique spatial pattern. The study aims at determining how the spatial distribution of EE cell subtypes is established and maintained, and explores the roles of three pathways: Notch, WNT, and BMP. The data show evidence that Notch signaling regulates the subtype specificity, being necessary for the specification of Type II, but not Type I and III EE cell subtype specification. The immunofluorescence data in Figure 3 are convincing, but the analysis is incomplete due to a lack of quantification. How Notch signaling activity relates to the emergence of the regional EE cell patterns remains unclear.

      Indeed, the role of Notch signaling in regional EE determination was not fully characterized in this work. Because Notch activation is required for the differentiation of enterocytes, introducing Notch or Delta mutants led to rapid accumulation of ISCs and EEs, making it challenging to examine in detail how EE subtypes were generated. We will try to complete the quantification of Type I and Type III EEs in Notch mutant clones from different gut regions to determine whether Notch influences the specification of these two EE subtypes. Additionally, unlike WNT and BMP gradients, Notch signaling functions only locally and does not change significantly along the whole gut, including the Type II EE-enriched R1a and the Type I EE-enriched R4b, which implies that the function of Notch signaling may be overridden by the impact of specific combinations of WNT and BMP gradients. To test this hypothesis, we will attempt to combine Notch mutants with the activation or inhibition of WNT and BMP signaling from the pupal stage onward, and examine whether regional EE identity can be altered, especially in the R1a and R4b regions.

      As WNT and BMP are known as morphogens, the study explores their expression patterns and their roles in establishing and maintaining the subtype identities. The observed patterns of WNT and BMP are consistent with earlier studies. Manipulation of WNT and BMP pathway activities in intestinal stem cells is shown to have some region-specific effects on specific EE cell subtypes. The overall conclusion that both WNT and BMP have local effects on EE cell subtypes is based on solid evidence. However, the study falls short in achieving its main objective, i.e., to explain the regional subtype patterns by the action of WNT and BMP gradients. Despite displaying a large volume of phenotypic data in Figures 4-7, the study remains incomplete in providing sufficient evidence to support the models shown in Figures 7 M and N. The main challenge is that the reader is provided with a large volume of individual data fragments of selected regions (e.g., Figures 4 and 5) or images of whole midgut without proper quantification (Figure 7). There is not sufficient effort made to display the data in a way that allows observing changes in the global patterns of EE cell subtypes throughout the midgut and compare these patterns with the observed WNT and BMP gradients.

      Because WNT and BMP gradients vary along the whole gut, manipulating these two pathways cannot bring their activation levels into alignment across gut regions. This forced us to analyze the changes in each region separately, which makes it challenging to provide a comprehensive global overview. We will add a comprehensive profile of WNT and BMP activity under manipulation of these two signaling pathways, correlate it with the changes in EE identity, and attempt a semi-quantitative interpretation to further support the model in Figures 7M and 7N.

      Reviewer #2 (Public review):

      Summary:

      By labeling the three major enteroendocrine cell markers - AstC, Tk, and CCHa2 - the authors systematically investigated the distribution of distinct EE subtypes along the Drosophila midgut, as well as their emergence via symmetric and asymmetric divisions of enteroendocrine progenitor cells. Moreover, they dissected the molecular mechanisms underlying regional patterning by modulating Wnt and BMP signaling pathways, revealing that these compartment boundary signals play key roles in regulating EE subtype diversity.

      Strengths:

      This work establishes a solid methodological and conceptual foundation for future studies on how stem cells acquire positional identity and modulate region-specific behaviors.

      Weaknesses:

      Given that the transcriptional profiles of intestinal stem cells across different regions are highly similar, it is reasonable to hypothesize that the behavior of ISCs and enteroendocrine precursor cells may be regulated non-autonomously, potentially through interactions with enterocytes, which exhibit more distinct region-specific characteristics.

      This is quite a complicated point to discuss. The Drosophila adult midgut is established by pISCs (pupal ISCs), which arise from AMPs (adult midgut progenitors) in the larval midgut. By the mid-L3 stage, AMPs are encased by PCs (peripheral cells) to form islands scattered throughout the entire larval midgut (Mathur D. et al. Science. 2010). After pupariation, the larval midgut delaminates to become the yellow body and finally the meconium in the pupal midgut. Simultaneously, PCs break down and die, allowing AMPs to give rise to the presumptive adult epithelium (generating enterocyte precursors) and to the ISCs of the adult midgut (Jiang H, Edgar BA. Development. 2009; Micchelli CA. et al. Gene Expr Patterns. 2011). During the pupal stage, pISCs proliferate only to generate new ISCs and EE lineages, while adult enterocytes start to appear after eclosion (Takashima S. et al. Dev Biol. 2011). This rules out the possibility that interaction with enterocytes regulates regional ISC identity during the pupal stage.

      However, whether AMPs already acquire regional identity during the larval stage, and whether pISCs interact with enterocyte precursors at the pupal stage, is not clear. Our study revealed that pISCs can be influenced by WNT and BMP gradients to acquire regional identity and subsequently establish regional EE diversity. We will add data on how the WNT and BMP gradients change during metamorphosis in the revision. Moreover, the WNT and BMP ligands that regulate regional ISC identity are provided by muscles, adult enterocytes, and even other surrounding tissues, which indicates that non-autonomous mechanisms do exist.

      Reviewer #3 (Public review):

      Summary:

      The authors aimed to elucidate the mechanisms underlying the regional patterning of enteroendocrine cell (EE) subtypes along the Drosophila midgut. Through detailed immunohistochemical mapping and genetic perturbation of Notch, WNT, and BMP signaling pathways, they sought to determine how extrinsic morphogen gradients and intrinsic stem cell identity contribute to EE diversity.

      Strengths:

      A major strength of this work is the meticulous regional analysis of EE pairs and the use of multiple genetic tools to manipulate signaling pathways in a spatiotemporally controlled manner. The data robustly demonstrate that WNT and BMP signaling gradients play key roles in specifying EE subtypes and division modes across different gut regions.

      Weaknesses:

      However, the study does not fully explore the mechanistic basis for the region-specific dependence on Notch signaling. Additionally, while the authors propose that symmetric divisions occur in R1a and R4b, the observed heterogeneity in CCHa2 expression within AstC+ pairs in R4b suggests that asymmetric mechanisms may still be at play, possibly involving apical-basal polarity as previously reported.

      As mentioned above, we acknowledge that the role of Notch signaling in regional EE determination requires further exploration. We will add the quantification of Type I and Type III EEs to Figure 3 and Figure S4, and combine Notch mutants with activation or inhibition of WNT and BMP signaling to test whether these pathways interact, especially in R1a and R4b.

      Apical-basal polarity has been reported to determine the precise segregation of Pros and thereby control ISC number and cell fate at the pupal stage (Wu S. et al. Cell Rep. 2023). The generation of regional EEs is completed during this period and may therefore be affected by polarity in addition to the Notch, WNT and BMP pathways. Apical-basal polarity is thus a plausible mechanism for inducing asymmetric cell division in R4b, and we will perform experiments to test this.

      Appraisal of achievements:

      The authors successfully achieve their aims by providing a compelling model in which intercalated WNT and BMP gradients regulate EE subtype specification and EEP division modes. The genetic data strongly support the conclusion that these pathways are central to establishing regional EE diversity during pupal development.

    1. Author response:

      eLife Assessment

      This valuable study addresses the effects of selection on aggression on fitness and life-history trade-offs in Drosophila melanogaster. However, the evidence presented is incomplete and does not support the claims proposed in the study of increased survival of highly aggressive males at the expense of reproductive success and shorter mating duration. The main limitation of the study is the choice to use males from only one aggressive Drosophila line in combination with CantonS females, that do not allow disambiguation between nonaggression-related factors, such as hybrid vigor and aggression-related factors influencing mating and lifespan.

      We would like to clarify the points raised in the eLife assessment.

      The report states that we relied on a single line of hyper-aggressive males tested with CantonS females, and implies that Bully and Cs have not co-evolved. This is a misunderstanding: Bully flies were derived from the Cs population, so Bully and Cs have co-evolved. In addition to the Bully A line presented in the main figures of the manuscript, we replicated several of our findings with a second, independently selected line, Bully B. Results from courtship assays involving Bully A and Bully B pairs (selected males and females) were presented in Figure S1. We apologize for not having made this more explicit in the original manuscript, and we will correct this. These experiments should alleviate the reviewers' concerns: they demonstrate that our conclusions are supported by two independent hyper-aggressive lines, and they include assays with selected male and female flies.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This study asks how selection for male aggressiveness affects life-history and reproductive fitness traits in Drosophila melanogaster males.

      Strengths:

      Multiple comprehensive assays are used to address the question.

      We thank the reviewer for recognizing these strengths.

      Weaknesses:

      (1) The flies used for comparisons are inadequate. Behavioral assays compare Bully males mated to non-coevolved Cs females with Cs males mated to coevolved Cs females.

      We thank the reviewer for this comment, which made us realize that we had not sufficiently highlighted some of our experiments. The Bully lines used in our work were derived from Canton-S flies and thus did co-evolve with Cs. As originally described by Penn et al. (2010), highly aggressive “Bully” lines were generated through selective breeding from Canton-S males that consistently won aggressive encounters. After 34–37 generations, stable Bully lines were established. Thus, 1) Bully and Cs flies have co-evolved and 2) the selection applied was male-specific. Independent selection replicates produced distinct lines, including Bully A and Bully B. Previous studies only characterized Bully A (Penn et al., 2010; Chowdhury et al., 2017), but our work includes both Bully A and Bully B (Fig. S1).

      The rationale for pairing Bully or Cs males with Cs females (with which both male types co-evolved) follows the approach used by Dierick et al. (2006), who investigated how male-specific selection for aggression affected courtship and mating behaviors by testing selected males with standard Canton-S females. This design allows us to isolate the effects of male genotype and behavior on courtship and mating outcomes, avoiding confounding effects from female behavioral changes.

      We initially compared selected Bully pairs (Bully males × Bully females) (Fig. S1) with Cs pairs and observed similarly shortened mating durations in both Bully × Bully and Bully × Cs matings (Fig. S1, Fig. 1F and G). Thus, the reduction in mating duration arises specifically from Bully males. We therefore chose to use Cs females as a standard background to assess the consequences of male-specific selection for aggression on reproductive behaviors.

      (2) Lifespan analysis is done on male progeny of Cs females mated to either genetically more distant Bully or co-evolved Cs males; the longer lifespan and performance on the former is interpreted as a trade-off with aggressiveness, rather than a simple explanation of hybrid vigor.

      We appreciate this comment, which again stems from a poor explanation on our part of the origin of the Bully line in the original manuscript. The Bully flies were derived from the same original population as the Cs line. Hybrid vigor typically arises when crossing individuals from distinct populations, which is not the case here, as both Bully and Cs come from the same population.

      To further support our conclusions, we conducted additional experiments using progeny from within-line crosses (Bully males × Bully females), and the results revealed the same phenotype: the progeny of these flies also exhibited significantly longer lifespans than the progeny of Cs males × Cs females. This finding argues against hybrid vigor as the main explanation for the observed phenotype, since both the Bully and Cs crosses involve inbreeding, yet the Bully progeny live longer. We will include these additional longevity data (currently not included in the manuscript) to strengthen our results and reinforce our interpretation.

      (3) Differences in CHCs between Bully and Cs males and Cs females mated to those males are not shown to cause differences in measured behavioral outcomes.

      We thank the reviewer for raising this important point regarding causality. One way to establish a causal link between differences in CHCs observed in Bully and Cs flies and the corresponding behavioral outcomes would be to experimentally manipulate CHC profiles. For instance, one could perfume oenocyte-less males with the compounds found in higher abundance in Bully flies, then perform behavioral assays to assess causality. We agree that such experiments would be highly informative in determining the functional roles of specific CHCs elevated in Bully males. However, this approach is technically challenging, as the perfuming technique must be optimized to transfer precise amounts of each compound. For example, this method can be used to gradually perfume flies to assess dose–response behavioral effects, whereas matching exactly the natural concentrations found in individuals, especially given inter-individual variability, remains difficult.

      We considered conducting such experiments during our study but did not pursue them for these technical reasons. Nevertheless, we can include a statement in the Discussion acknowledging this as an important future direction to test the causal relationship between CHC variation and behavior.

      Reviewer #2 (Public review):

      Summary:

      The authors compare "Bully" lines, selected for male aggression, to Canton-S controls and find that Bully males have lower mating success, shorter mating durations, and remate sooner. Chemical analyses show Bully males have distinct cuticular hydrocarbons (CHC) signatures and transfer markedly less cVA to females, offering a plausible mechanistic link to weaker mate-guarding.

      Paradoxically, Bully males live longer and remain fertile at older ages when CS males no longer mate, indicating a shift in the reproduction-survival trade-off in aggression-selected populations.

      Importantly, the work sheds light on proximate mechanisms, demonstrating that shifts in CHCs and pheromone transfer co-occur with changes in fitness traits, thus offering new entry points for understanding life-history evolution.

      We thank the reviewer for this positive summary of our work.

      Strengths:

      The manuscript's strengths lie in its comprehensive and integrative approach framed within an evolutionary context. By combining behavioral assays, chemical profiling, and lifespan measurements, the authors reveal a coherent pattern linking aggression selection to life-history trade-offs. The direct quantification of cVA in female reproductive tracts after mating provides a particularly compelling mechanistic correlate, strengthening the link between behavior and chemical signaling. Findings on altered 5-T and 5-P levels further highlight how chemical communication shapes mating and mate-guarding strategies. Analytical approaches are largely rigorous, and the results provide valuable insights into the pleiotropic effects of selection on socially relevant traits. The study will be of interest to Drosophila biologists working on sexual selection, behavioral evolution, and aging.

      We thank the reviewer for recognizing the integrative design and mechanistic contributions of our study.

      Weaknesses:

      The weaknesses are primarily conceptual rather than procedural. The generality of the findings is uncertain, as selection appears to be represented by only one (and a second closely related) Bully line, limiting conclusions about selection responses versus line-specific drift or founder effects. The causal link between aggression selection and increased longevity is not established: the data show a correlated shift but do not identify mechanisms underlying lifespan extension. In several places, the manuscript uses causal language (e.g., that selection 'influences' longevity or mating strategy) where association would be more accurate; this should be toned down to avoid overstatement. Ecological relevance is also not addressed, since laboratory conditions may bias the balance between costs and benefits of aggression compared with variable natural environments. Addressing these points would strengthen both the impact and clarity of the study.

      (1) Generality of findings and potential line effects

      We agree that our results presented in the main figures of the manuscript relied mainly on one Bully line (Bully A). To address potential line-specific effects, we replicated key courtship experiments with another independent line, Bully B, selected in parallel from the same Canton-S stock but through distinct selection replicates. The results obtained from Bully B closely matched those from Bully A, suggesting that the observed phenotypes are consistent consequences of aggression selection rather than random drift or founder effects.

      (2) Causality versus correlation

      We concur that some sentences in the manuscript could overstate causal interpretations. We will revise the text to clearly distinguish correlation from causation and to avoid implying direct causal relationships where data only support association.

      (3) Ecological relevance

      We appreciate this point. Our experiments were performed under controlled laboratory conditions, which may not fully capture the ecological contexts shaping the costs and benefits of aggression. We will acknowledge this limitation and expand the Discussion to consider how environmental variability could modulate the fitness trade-offs associated with aggression in natural populations.

      We thank both reviewers for their constructive feedback, which will help us strengthen the rigor and clarity of the manuscript. We believe that the additional results and revisions will satisfactorily address their concerns.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This unique study reports original and extensive behavioral data collected by the authors on 21 living mammal taxa in zoo conditions (primates, tree shrew, rodents, carnivorans, and marsupials) on how descent along a vertical substrate can be done effectively and securely using gait variables. Ten morphological variables reflecting head size and limb proportions are examined in relationship to vertical descent strategies and then applied to reconstruct modes of vertical descent in fossil mammals.

      Strengths:

      This is a broad and data-rich comparative study, which requires a good understanding of the mammal groups being compared and how they are interrelated, the kinematic variables that underlie the locomotion used by the animals during vertical descent, and the morphological variables that are associated with vertical descent styles. Thankfully, the study presents data in a cogent way with clear hypotheses at the beginning, followed by results and a discussion that addresses each of those hypotheses using the relevant behavioral and morphological variables, always keeping in mind the relationships of the mammal groups under investigation. As pointed out in the study, there is a clear phylogenetic signal associated with vertical descent style. Strepsirrhine primates much prefer descending tail first, platyrrhine primates descend sideways when given a choice, whereas all other mammals (with the exception of the raccoon) descend head first. Not surprisingly, all mammals descending a vertical substrate do so in a more deliberate way, by reducing speed, and by keeping the limbs in contact for a longer period (i.e., higher duty factors).

      Weaknesses:

      The different gait patterns used by mammals during vertical descent are a bit more difficult to interpret. It is somewhat paradoxical that asymmetrical gaits such as bounds, half bounds, and gallops are more common during descent since they are associated with higher speeds and lower duty factors. Also, the arguments about the limb support polygons provided by DSDC vs. LSDC gaits apply for horizontal substrates, but perhaps not as much for vertical substrates.

      We analyzed gait patterns using methods commonly found in the literature and discussed our results accordingly. However, the analysis of limb support polygons was indeed developed specifically for studying locomotion on horizontal supports, and may not be applicable to vertical locomotion, which is in fact a type of locomotion shared by all arboreal species. In the future, it would be interesting to consider new methods for analyzing vertical gaits.

      The importance of body mass cannot be overemphasized as it affects all aspects of an animal's biology. In this case, larger mammals with larger heads avoid descending head-first. Variation in trunk/tail and limb proportions also covaries with different vertical descent strategies. For example, a lower intermembral index is associated with tail-first descent. That said, the authors are quick to acknowledge that the five lemur species of their sample are driving this correlation. There is a wide range of intermembral indices among primates, and this simple measure of forelimb over hindlimb has vital functional implications for locomotion: primates with relatively long hindlimbs tend to emphasize leaping, primates with more even limb proportions are typically pronograde quadrupeds, and primates with relatively long forelimbs tend to emphasize suspensory locomotion and brachiation. Equally important is the fact that the intermembral index has been shown to increase with body mass in many primate families as a way to keep functional equivalence for (ascending) climbing behavior (see Jungers, 1985). Therefore, the manner in which a primate descends a vertical substrate may just be a by-product of limb proportions that evolved for different locomotor purposes. Clearly, more vertical descent data within a wider array of primate intermembral indices would clarify these relationships. Similarly, vertical descent data for other primate groups with longer tails, such as arboreal cercopithecoids, and particularly atelines with very long and prehensile tails, should provide more insights into the relationship between longer tail length and tail-first descent observed in the five lemurs. The relatively longer hallux of lemurs correlates with tail-first descent, whereas the more evenly grasping autopods of platyrrhines allow for all four limbs to be used for sideways descent. In that context, the pygmy loris offers a striking contrast. 
Here is a small primate equipped with four pincer-like, highly grasping autopods and a tail reduced to a short stub. Interestingly, this primate is unique within the sample in showing the strongest preference for head-first descent, just like other non-primate mammals. Again, a wider sample of primates should go a long way in clarifying the morphological and behavioral relationships reported in this study.

      We agree with this statement. In the future, we plan to study other species, particularly large-bodied ones with varied intermembral indexes.

      Reconstruction of the ancient lifestyles, including preferred locomotor behaviors, is a formidable task that requires careful documentation of strong form-function relationships from extant species that can be used as analogs to infer behavior in extinct species. The fossil record offers challenges of its own, as complete and undistorted skulls and postcranial skeletons are rare occurrences. When more complete remains are available, the entire evidence should be considered to reconstruct the adaptive profile of a fossil species rather than a single ("magic") trait.

      We completely agree with this, and we would like to emphasize that our intention here was simply to conduct a modest inference test, the purpose of which is to provide food for thought for future studies, and whose results should be considered in light of a comprehensive evolutionary model.

      Reviewer #2 (Public review):

      Summary:

      This paper contains kinematic analyses of a large comparative sample of small to medium-sized arboreal mammals (n = 21 species) traveling on near-vertical arboreal supports of varying diameter. This data is paired with morphological measures from the extant sample to reconstruct potential behaviors in a selection of fossil euarchontoglires. This research is valuable to anyone working in mammal locomotion and primate evolution.

      Strengths:

      The experimental data collection methods align with best research practices in this field and are presented with enough detail to allow for reproducibility of the study as well as comparison with similar datasets. The four predictions in the introduction are well aligned with the design of the study to allow for hypothesis testing. Behaviors are well described and documented, and Figure 1 does an excellent job in conveying the variety of locomotor behaviors observed in this sample. I think the authors took an interesting and unique angle by considering the influence of encephalization quotient on descent and the experience of forward pitch in animals with very large heads.

      Weaknesses:

      The authors acknowledge the challenges that are inherent with working with captive animals in enclosures and how that might influence observed behaviors compared to these species' wild counterparts. The number of individuals per species in this sample is low; however, this is consistent with the majority of experimental papers in this area of research because of the difficulties in attaining larger sample sizes.

      Yes, that is indeed the main cost/benefit trade-off with this type of study. Working with captive animals allows for large comparative studies, but locomotor behavior may differ from that of individuals in the natural environment, and the dataset contains few individuals per species. That is why we plan to conduct studies in the natural environment, and encourage colleagues to do so, for comparison with these results. However, this type of study is very time-consuming and requires focusing on a single species at a time, which limits the comparative aspect.

      Figure 2 is difficult to interpret because of the large amount of information it is trying to convey.

      We agree that this figure is dense. One possible solution would be to combine species by phylogenetic groups to reduce the amount of information, as we did with Fig. 3 on the dataset relating to gaits. However, we believe that this would be unfortunate in the case of speed and duty factor because we would have to provide the complete figure in SI anyway, as the species-level information is valuable. We therefore prefer to keep this comprehensive figure here and we will enlarge the data points to improve their visibility, and provide the figure with a sufficiently high resolution to allow zooming in on the details.

    1. Author response:

      We thank the two anonymous reviewers who took the time and effort to read and evaluate our work. We look forward to submitting a revised version of the manuscript that addresses their comments.

      A major concern shared by both reviewers is our use of Bayes factors instead of p-values to measure the strength of association. In the revision, we will add a section to the Supplementary material to compare and contrast Bayes factors and p-values. Very briefly, a p-value is the tail probability under the null. Formally, it is defined as P(T > t|H<sub>0</sub>), for a test statistic T with observed value t computed from data D. But our interest is in P(H<sub>0</sub>|D) and P(H<sub>1</sub>|D), the posterior probabilities of the null and alternative models, about which the p-value says nothing. In the FDR approach, the q-value (the minimum FDR at which a null is rejected, which can be estimated from a collection of p-values) has a Bayesian interpretation as the probability that H<sub>0</sub> is true conditional on rejecting that H<sub>0</sub>. This is not quite P(H<sub>0</sub>|D), but it is nevertheless a useful probabilistic statement. For the FDR approach to work, however, the collection of tests needs to be reasonably independent, and their effect sizes need to be mixed. Both implicit assumptions can fail in cis eQTL analysis.

      With Bayes factors, on the other hand, we can compute the posterior probabilities P(H<sub>0</sub>|D) and P(H<sub>1</sub>|D) after specifying the prior odds P(H<sub>1</sub>)/P(H<sub>0</sub>) (or, equivalently, P(H<sub>1</sub>), since P(H<sub>0</sub>) + P(H<sub>1</sub>) = 1). In our manuscript, the prior odds used to determine the Bayes factor threshold are 1/1000, or about 1 cis eQTL per gene. Bayes factors also allow us to directly compare two non-nested alternative models, P(paternal effect|D) and P(maternal effect|D), which is difficult to do using p-values.
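
      As an illustration of the relation described above (a minimal sketch, not taken from the manuscript), posterior odds equal the Bayes factor times the prior odds, and P(H1|D) follows from the posterior odds. The function names below are hypothetical, and the Bayes factor values are arbitrary inputs chosen for the example rather than results from the study; only the prior odds of 1/1000 come from the text.

```python
def posterior_prob_h1(bayes_factor: float, prior_odds: float) -> float:
    """Return P(H1 | D).

    bayes_factor: P(D | H1) / P(D | H0)
    prior_odds:   P(H1) / P(H0)
    """
    if bayes_factor < 0 or prior_odds < 0:
        raise ValueError("Bayes factor and prior odds must be non-negative")
    # Bayes' rule in odds form: posterior odds = Bayes factor * prior odds.
    posterior_odds = bayes_factor * prior_odds
    # Convert odds to a probability, using P(H0 | D) + P(H1 | D) = 1.
    return posterior_odds / (1.0 + posterior_odds)


def relative_bayes_factor(bf_a: float, bf_b: float) -> float:
    """Bayes factor of alternative A over alternative B.

    Assumes both inputs were computed against the same null model,
    so that the null's marginal likelihood cancels in the ratio.
    """
    if bf_b <= 0:
        raise ValueError("bf_b must be positive")
    return bf_a / bf_b


if __name__ == "__main__":
    PRIOR_ODDS = 1 / 1000  # prior odds quoted in the response
    for bf in (10.0, 1_000.0, 100_000.0):  # illustrative values only
        p1 = posterior_prob_h1(bf, PRIOR_ODDS)
        print(f"BF = {bf:>9.0f}  P(H1|D) = {p1:.4f}  P(H0|D) = {1 - p1:.4f}")
```

      Under these assumptions, a Bayes factor equal to the reciprocal of the prior odds (here 1000) corresponds to P(H1|D) = 0.5, which is one way to read a threshold set from prior odds. The second helper shows why two non-nested alternatives, such as a paternal and a maternal effect model, can be compared directly once each has a Bayes factor against a shared null.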

      It was suggested (by reviewer 2) that POE eQTL should be defined by testing H<sub>0</sub> : θ<sub>0</sub> = θ<sub>1</sub> against H<sub>1</sub> : θ<sub>0</sub> ≠ θ<sub>1</sub>, where θ<sub>0</sub> and θ<sub>1</sub> are the maternal and paternal effects, respectively. This was indeed our initial approach, as evidenced in Table 1 (last column) and Section 4.5 in Methods. Our final approach is more stringent: taking the test for a paternal effect as an example, we test H<sub>0</sub> : β<sub>0</sub> = β<sub>1</sub> = 0 against H<sub>1</sub> : β<sub>0</sub> = 0, β<sub>1</sub> ≠ 0 (the test for a maternal effect is obtained in a similar fashion). That is, under the null we not only require that the paternal and maternal effects be the same, as suggested by the reviewer, but also require that they both be 0. This is partially motivated by an example in Table 1 (gene ZNF890P) where both β<sub>0</sub> > 0 and β<sub>1</sub> > 0, and β<sub>0</sub> ≠ β<sub>1</sub>. In other words, examples like this, where both the paternal and maternal effects are significant and their difference is also significant, were not included in our downstream classification and further analysis.

    1. Author response:

      The following is the authors’ response to the current reviews.

      Reviewer #2 had several remaining suggestions:

      In some instances, the authors face well-known limitations, for example with bath application of drugs. Blockers of Gly and GABA receptors are likely problematic when studying a network that includes a diverse set of inhibitory interneurons. Likewise, application of AMPAR and KAR blockers should impact HC function, and presumably inner retina interneuron networks. In the Discussion the authors are encouraged to address more of these concerns (e.g., Discussion line 709).

      Rather than concluding that the bath application of drugs is without complications, they can conclude that under the experimental conditions, glutamate release from these On-bipolars continues to exhibit Transient and Sustained release. This is really the key point of their study.

      This is a good suggestion.  We have added a discussion of the complications of the pharmacology starting on line 754.  

      If sustained release is indeed a reflection of higher release rates, ribbon size is what the authors point to, but there are many other possibilities, such as SV recycling, recruitment of reserve pools of SVs, fusion machinery, or Cav channel behavior. The authors could cite more literature in the Discussion.

      We added a sentence to this effect in the discussion, starting on line 866.


      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review): 

      Summary: 

      In the retina, parallel processing of cone photoreceptor output under bright light conditions dissects critical features of our visual environment and is fundamental to visual function. Cone photoreceptor signals are sampled by several types of bipolar cells and passed onto the ganglion cells. At the output of retinal processing, retinal ganglion cells send about 40 different codes of the visual scene to the brain for further processing. In this study, the authors focus on whether subtype-specific differences in the size of synaptic ribbon-associated vesicle pools of bipolar cells contribute to different retinal ganglion cell (RGC) responses. Specifically, inputs to ON alpha RGCs producing sustained versus transient kinetics (ON-S vs. ON-T, respectively) are compared. The authors first demonstrate that ON-S vs. ON-T RGCs are readily identifiable in a whole mount preparation and respond differently both to a static and to a spatially uniform, randomly fluctuating (Gaussian noise) light stimulus. Linear-nonlinear (LN) models were used to estimate the transformation between visual input and excitatory synaptic input for each RGC; these models suggested the presence of transient versus sustained kinetics already in the excitatory inputs to ON-T and ON-S RGCs. Indeed, the authors show that (glutamatergic) excitatory inputs to ON-S vs. ON-T RGCs are of distinct kinetics. The subtypes of bipolar cells providing input to ON-S are known (i.e., type 6 and 7), but the source of excitatory bipolar inputs to ON-T RGCs needed to be determined. In a tedious process, it is elegantly shown here that ON-T RGCs receive most of their excitatory inputs from type 5 and 6 bipolars. 
Interestingly, the temporal properties of light-evoked responses of type 5, 6, and 7 bipolars recorded from the somas were indistinguishable and rather sustained, suggesting that the origin of transient kinetics of excitatory inputs to ON-T RGCs suggested by the LN model might be found in the processing of visual signals at the bipolar cell axon terminal. Blocking GABA- or glycinergic inhibitory inputs did not alter the light-evoked excitatory input kinetics to ON-T and ON-S RGCs. Twophoton glutamate sensor imaging revealed significantly faster kinetics of light-evoked glutamate signals at ON-T versus ON-S RGCs. Detailed EM analysis of bipolar cell ribbon synapses onto ON-T and ON-S RGCs revealed fewer ribbon-associated vesicles at ON-T synapses, which is consistent with stronger paired-flash depression of lightevoked excitatory currents in ON-T RGCS versus ON-S RGCs. This study suggests that bipolar subtype-specific differences in the size of synaptic ribbon-associated vesicle pools contribute to transient versus sustained kinetics in RGCs. 

      Strengths: 

      The use of multiple, state-of-the-art tools and approaches to address the kinetics of bipolar to ganglion cell synapse in an identified circuit. 

      Weaknesses: 

      For the most part, the data in the paper support the conclusions, and the authors were careful to try to address questions in multiple ways. Two-photon glutamate sensor imaging experiment showing that blocking GABA- and glycinergic inhibition does not change the kinetics of light-evoked glutamate signals at ON-T RGCs would strengthen the conclusion that bipolar subtype-specific differences in the size of synaptic ribbon-associated vesicle pools contribute to transient versus sustained kinetics in RGCs. 

      Thank you for this suggestion. We have revised the text throughout to be careful not to imply that amacrine cells have no role in shaping EPSCs and spike output, but instead that the transience of the On-T responses persists without amacrine cells (see for example lines 91, 450-453, 514-518, 696-714). We have also added additional iGluSnFR experiments to the paper to further test this conclusion (new Figure 7). The new data shows that the transience of glutamate release from the On-T cells is retained when 1) spiking amacrine cell activity is suppressed by blocking voltage-gated Na<sup>+</sup> channels with TTX or 2) all amacrine cell activity is suppressed by blocking AMPA receptors with NBQX. This does provide nice additional evidence that amacrine cells are not necessary for the sustained/transient distinction.

      Reviewer #2 (Public Review): 

      Summary: 

      Goal of the study. The authors tried to pinpoint the origins of transient and sustained responses measured at retinal ganglion cells (rgcs), which is the output layer of the retina. Response characteristics of rgcs are used to group them into different types. The diversity of rgc types represents the ability of the retina to transform visual inputs into distinct output channels. They find that the physical dimensions of bipolar cell's synaptic ribbons (specialized release sites/active zones) vary across the different types of cone on-bpcs, in ways that they argue could facilitate transient or sustained release. This diversity of release output is what they argue underlies the differences in on-rgcs response characteristics, and ultimately represents a mechanism for creating parallel cone-driven channels. 

      Strengths: 

      The major strengths of the study are the anatomical approaches employed and the use of the "glutamate sniffer" to assay synaptic glutamate levels. The outline of the study is elegant and reflects the strengths of the authors. 

      Weaknesses: 

      The major weakness is that the ambitious outline is not matched with a complete set of results, and the set of physiological protocols is disjointed, not sufficient to bridge the systems-level question with the presynaptic release question. 

      Thank you for this comment as it provides an opportunity (here and in the paper) for us to clarify our main goal. We wanted to link the well-established distinction between transient and sustained retinal responses to anatomy. This required locating where this difference arises within the circuitry – which we show to be at least largely the bipolar output synapse – and then examining the structure of this synapse in detail. While we would certainly be interested in connecting our results to a biophysical description of the synapse, that was not the primary focus of our study and was not something we could add without substantial additional work.  

      Major comments on the results and suggestions. 

      The ribbon model of release has been explored for decades and needs to be further adapted to systems-level work. The study under consideration by Kuo et al. takes on this task. Unfortunately, the experimental design does not permit a level of control over presynaptic/bpc behavior that is comparable to earlier studies, nor do they manipulate release in ways that test the ribbon model (i.e., paired recordings or Ribeye-ko). Furthermore, the data needs additional evaluation, and the presentation and interpretations should draw on published biophysical and molecular studies. 

      As described above, our goal was to test several possible explanations for the difference between transient and sustained responses in OnT and OnS ganglion cells: (1) differences in the light responses of the bipolar cells that convey photoreceptor signals to the relevant ganglion cells; (2) shaping of bipolar transmitter release by presynaptic inhibition; (3) shaping of ganglion cell responses by postsynaptic inhibition or spike generation; (4) differences in feedforward bipolar synapses. We were surprised to find that the feedforward bipolar synapses play a central role in this difference, and your comment nicely prompts us to relate this to the large literature on biophysical studies of release from ribbon synapses. We have made substantial revisions in the text to do this. This includes anticipating the importance of feedforward synaptic properties in the abstract and introduction (lines 36-37 and 61-64), pointers in the results (lines 539-548), and several new paragraphs in the discussion (starting on lines 751, 773 and 787). By showing that the transient/sustained difference originates largely at feedforward bipolar synapses, we set the stage for future work that shows how biophysical properties of the synapse shape physiological signals that traverse it.

      To build a ribbon-centric context, consider recent literature that supports the assertion that ribbons play a role in forming AZ release sites and facilitating exocytosis. Reference Ribeye-ko studies. For example, ribbonless bpcs show an 80% reduction in release (Maxeiner et al EMBO J 2016), the ribbonless retina exhibits signaling deficits at the output layer (Okawa et al ...Rieke, ..Wong Nat Comm 2019), and ribbonless rods show an 80% reduction in the readily releasable pool (RRP) of SVs (Grabner Moser, eLife 2021). In addition, the authors could refer to whole-cell membrane capacitance studies on mammalian rods, cones, and bpcs, because the size of the RRP of SVs scales with the dimensions and numbers of ribbons (total ribbon footprint). For bipolars, see the review by Wan and Heidelberger 2011. For a comparison of mammalian rods and cones, see, rods: Grabner and Moser (2021 eLife), Mueller.. Regus Leidig et al. (2019; J Neurosci) and cones Grabner ...DeVries (Nat Comm 2023). A comparison of cell types shows that (1) the extent of release is proportional to the total size of the ribbon footprint, and (2) less release is witnessed when ribbons are deleted (also see photo ablation studies by Snellman.... And Mehta..Zenisek, Nat Neurosci and Neuron).

      Thank you for these pointers into the literature.  We have included much of this work in the revised Discussion (see three paragraphs starting on line 751). The revised text focuses on the evidence that larger and more numerous ribbons lead to increased release. The direct evidence from previous work for this relationship supports our (indirect) conclusions in the current paper about the role of ribbon size and associated vesicle pools in transient vs sustained responses.  

      Ribbon morphology may change in an activity-dependent manner. The rod ribbon AZ has been reported to lengthen in the dark (Dembla et al 2020), and deletion of the ribbon shortens the length of the AZ (defined by Cav1.4 or RIM2); in addition, the Ribeye-ko AZs fail to change in size with light and dark conditioning. Furthermore, EM studies on rod and cone AZs in light and dark argue that the number of SVs at the base of the ribbon increases in the dark, when PRs are depolarized (see Figure 10, Babai et al 2016 J Neurosci). Lastly, using goldfish Mb1 on-bipolars, Hull et al (2006, J Neurophysiol) correlated an increase in release efficiency with an increase in ribbon numbers, which accompanied daylight. >> When release activity is high, ribbon AZ length increases (Dembla, rods), the number of docked SVs increases (Babai, rods cones), and the number of ribbons increases (Hull, diurnal Mb1s). 

      We have extensively revised the discussion section to include more discussion of ribbons, particularly emphasizing evidence supporting the general argument that larger ribbons support higher release rates. We focused on studies that provided direct links between release rates and ribbon size or number of ribbon-associated vesicles. This includes studies that pair electrophysiology and anatomy and those that measure the consequences of ablating ribbons.

      The results under review, Kuo et al., were attained with SBF-SEM, which has the benefit of addressing large-volume questions as required here, yet it achieves lower spatial resolution than what is attained with TEM tomography and FIB-EM. Ideally, the EM description would include SV size, and the density of ribbon-tethered SVs that are docked at the plasma membrane, because this is where the SVs fuse (additional non-ribbon release sites may also exist? Mehta ... Singer 2014 J Neurosci). Studies by Graydon et al 2011 and 2014 (both in J Neurosci), and Jean ... Moser et al 2018 (eLife) are good examples of quantitative estimates of SV docking sites at ribbons. SBF-SEM does not allow for an assessment of SVs within 5 nm of the PM, but if the authors can identify the number of SVs that appear within the limit of resolution (10 to 15 nm) from the PM, then this data would be useful. Also, what dimension(s) of the large ribbons make them larger? Typically, ribbons are fixed in height (at least in the outer retina, 200 to 250 nm), but their length varies and the number of ribbons per terminal varies. Is the larger ribbon size observed in type 6 bpcs due to longer ribbons, or taller ribbons? A longer ribbon likely has more docked SVs. An additional possibility is that more SVs are located around the ribbon-PM footprint, either more densely packed and/or expanding laterally (see definitions in Jean....Moser, eLife 2018). 

      We have included an additional analysis of ribbon surface area from our 3D SBF-SEM reconstructions. As with the volume measurements included in the original submission, ribbon surface areas are distinct between type 5i and type 6 bipolar cells (Fig. S10A), ON-T RGCs on average receive input from ribbons with smaller surface area than ON-S RGCs (Fig. S10B), and ribbon surface area predicts the number of adjacent vesicles across bipolar cell types (Fig. S10C). We agree that a higher resolution view of presynaptic structures would be very helpful, but the resolution of our SBF-SEM data is limited (e.g. each pixel is 40 nm on a side). This resolution does not allow us to distinguish between vesicles at vs near the membrane. 

      In our observations, both the length and the height of the ribbons varied across individual bipolar cells, and ribbons in type 6 bipolar cells tended to be longer and/or taller than those in type 5 cells. We agree that a longer ribbon may accommodate more docked SVs. A more definitive analysis would benefit from higher-resolution, isotropic 3D reconstructions of ribbons, which would allow more precise shape analysis together with a detailed assessment of docked SVs at the ribbons.

      The ribbon literature given above makes the argument that ribbons increase exocytotic output, and morphological studies suggest that release activity enhances 1) ribbon length (Dembla) and 2) the density of SVs near the PM (Babai). These findings could lead one to propose that type 6 bpcs (inputs to On-sustained) are more active than type 5i (feed into On-transient). Here Kuo et al. show that the bpcs have similar Vm (measured from the soma) in response to light stimulation. Does Vm predict release? Not entirely as the authors acknowledge, because: Cav channel properties, SV availability, and negative feedback are all downstream of bpc Vm. The only experiment performed to test downstream factors focused on negative feedback from amacrines. The data presented in Figures 5C-F led me to conclude the opposite of what the authors concluded. My impression is that the T-ON rgc exhibits strong disinhibition when GABA-blockers are applied (the initial phase is greatly increased in amplitude and broadened with the drug), which contrasts with the S-On rgc responses that show a change in the amplitude of the initial phase but not its width (taus would be nice). Here and in many places the authors refer to changes in release kinetics, without implementing a useful description of kinetics. For instance, take the cumulative current (charge) in Figure 5C and fit the control and drug traces to arrive at taus, and their respective amplitudes, and use these values to describe kinetic phases. One final point, the summary in Figure 5D has a p: 0.06, very close to the cutoff for significance, which begs for more than an n = 5. Given that previous studies have shown that bpc output is shaped by immediate msec GABA feedback, in ways that influence kinetic phases of release (..Mb1 bipolars, see Vigh et al 2005 Neuron), more attention to this matter is needed before the authors rule out feedback inhibition in favor of ribbon size. 
If by chance, type 5i bpcs are under uniquely strong feedback inhibition, then ribbon size may result from less activity, not less output resulting from smaller ribbons.
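The reviewer's suggestion to describe kinetics through the cumulative current can be sketched as follows. This is a minimal illustration, not the analysis used in the paper: it assumes a single-exponential time course (the reviewer's request for several kinetic phases would need a multi-exponential fit), and the function names, the grid of candidate time constants, and the trace format are all hypothetical.

```python
import math


def cumulative_charge(current, dt):
    """Running trapezoidal integral of a current trace.

    current: list of samples in pA; dt: sampling interval in seconds.
    Returns the cumulative charge in pC (pA * s), starting at 0.
    """
    q = [0.0]
    for a, b in zip(current, current[1:]):
        q.append(q[-1] + 0.5 * (a + b) * dt)
    return q


def fit_single_exponential(q, dt, taus):
    """Fit q(t) = A * (1 - exp(-t / tau)) by grid search over tau.

    For each candidate tau, the best amplitude A follows from linear
    least squares; the tau with the smallest squared error is returned
    together with its amplitude.
    """
    best = None
    for tau in taus:
        basis = [1.0 - math.exp(-i * dt / tau) for i in range(len(q))]
        denom = sum(b * b for b in basis)
        if denom == 0.0:
            continue
        amp = sum(x * b for x, b in zip(q, basis)) / denom
        sse = sum((x - amp * b) ** 2 for x, b in zip(q, basis))
        if best is None or sse < best[0]:
            best = (sse, tau, amp)
    if best is None:
        raise ValueError("no usable candidate time constants")
    return best[1], best[2]
```

Applying the same fit to control and drug traces would give one time constant and one amplitude per condition, which could then be compared across cells.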

      The text surrounding Figure 5 led to some confusion, and we have revised that text and the figure for clarity.  First, the data in that figure is entirely from On-T cells (the upper and lower panels show block of GABA and glycine receptors separately).  Second, the observation that we make there is that block of inhibitory receptors increases the transience of the On-T excitatory input, rather than decreasing it as would be expected if the transience is created by presynaptic inhibition. We have added additional data and that increase in transience is now significant. Inhibitory block does substantially increase the amplitude of the postsynaptic response, and a likely origin of this change in response is inhibitory feedback to the bipolar synaptic terminal. We now indicate this in the text on page 13, lines 438-453. 

      The key result of this figure for our purposes here is that the transience of the excitatory input to the On-T cell remains with inhibitory input blocked. We have clarified throughout the text that our results indicate that inhibitory feedback is not necessary for the difference between transient release onto On-T and sustained release onto On-S. This does not mean that inhibitory feedback does not shape the responses in other ways or contribute to the transient/sustained difference - just that for the specific stimuli we use that difference is retained without presynaptic inhibition. We have also added citations to past work showing that activity of amacrine cells can modulate bipolar transmitter release. 

      Whether strong feedback inhibition limits activity and therefore limits ribbon size in an activity-dependent way is an intriguing possibility. Indeed, addressing why ribbons are larger in type 6 bipolar cells vs. other bipolar types will be an interesting avenue of further study. However, it would be surprising if ribbon sizes changed during the acute pharmacological block conditions (~10-15 minutes) we employed in our study. Our point here is that there is an interesting correlation between presynaptic ribbon size and the kinetics of glutamate release. We do not think that the two possibilities stated in the last sentence (“…ribbon size may result from less activity, not less output resulting from smaller ribbons”) are mutually exclusive.

      We have not further quantified the response kinetics in the experiments of Figure 5 as the large changes induced by the pharmacology (especially GABA receptor block) make it unclear how to interpret quantitative differences.  In other places we have quantified kinetics through the STA or specified that our focus was more qualitative (i.e. transient vs sustained kinetics). 

      As mentioned above, the behavior of Cav channels is important here. This is difficult to address with voltage clamps from the soma, especially in the Vm range relevant to this study. Given that it has previously been modeled that the rod bpc to AII pathway adapts to prolonged depolarization of rbcs through downregulating Cav channel-mediated Ca<sup>2+</sup> influx (Grimes ....Rieke 2014 Neuron), it seems important for Kuo et al to test if there is a difference in Cav regulation between type 6 and 5i bpcs. Ca<sup>2+</sup> imaging with a GCaMP strategy (Baden....Lagnado Current Biology, 2011) or filling the presynapse with Ca dyes (see inner hair cells: Ozcete and Moser, EMBO J 2020) would allow for the correlation of [Ca]intra with GluSnf signals (both local readouts).

      This is a good suggestion but is outside the scope of our current paper. Our focus was on the circuit origin of the difference in response of the OnT and OnS responses rather than the specific biophysical mechanism.  We are of course interested in the mechanism, but the additional experiments needed to pin that down would need to be a part of future experiments. The work here represents an important step in that direction as it greatly reduces the number of possible locations and mechanisms for the sustained/transient difference and hence serves to focus any future mechanistic investigations.

      Stimulation protocol and presentation of Glutamate Sniffer data in Figure 6. In all of your figures where you state steady state as a % of peak amplitude, please indicate in the figure where you estimate steady state. Alternatively, if you take the cumulative dF/F signal, then you can fit the different kinetic phases. From the appearance of the data, the Sustained Glu signals look like square waves (Figure 6B ROI1-4), without a transient at onset, which is not predicted in your ribbon model that assumes different kinetic phases (1. depletion of docked SVs, and 2. refilling and repriming). The Transient responses (Figure 6B ROI5-8) are transient and more compatible with a depressing ribbon scheme. If you take the cumulative, for all of the On-S and compare it to all of the On-T responses, my guess is the cumulative dF/F will be 10- to 20-fold larger for the S-On. Would you conclude that bpc inputs to On-S (type 6) release 20-fold more SVs per 4 seconds on a per ribbon basis, and does the surface area of the type 6 bpcs account for this difference? From Figures 8B and D, the volume of the ribbon is ~2 fold greater for type 6 vs 5i, but the Surface Area (both faces of ribbon) is more relevant to your model that claims ribbon size is the pivotal factor. If making cumulative traces, and comparisons on an absolute scale is unfounded, then we need to know how to compare different observations. The classic ribbon models always have a conversion factor such as the capacitance of an SV or q size that is used to derive SV numbers from total dCm or Qcontent. See Kim ....et al von Gersdorff, 2023, Cell Reports. Why not use the Gaussian noise stimulus in Fig 6 as in Figure 1 and 2? 

      For iGluSnFR recordings, steady-state responses were measured from the mean fluorescence over the last 1 sec of the light step (2 sec duration) response. We have included this information in the figure caption and in the Methods. 
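The steady-state measure described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' analysis code: the function name, the baseline-subtracted dF/F input, and the use of the maximum during the step as the peak are assumptions; only the 2 sec step and the 1 sec averaging window come from the text.

```python
def steady_state_fraction(trace, dt, step_onset, step_duration=2.0, window=1.0):
    """Return the steady-state response as a fraction of the peak response.

    trace: baseline-subtracted dF/F samples (list of floats)
    dt: sampling interval in seconds
    step_onset: time in seconds at which the light step begins
    step_duration: length of the light step in seconds
    window: length in seconds of the final segment averaged as steady state
    """
    start = int(round(step_onset / dt))
    stop = int(round((step_onset + step_duration) / dt))
    step = trace[start:stop]
    if not step:
        raise ValueError("light step lies outside the trace")
    peak = max(step)
    if peak <= 0:
        raise ValueError("no positive response during the step")
    # Average the last `window` seconds of the step (at least one sample).
    n_win = max(1, int(round(window / dt)))
    tail = step[-n_win:]
    steady = sum(tail) / len(tail)
    return steady / peak
```

A square-wave response returns a value near 1, whereas a response that decays to a low plateau returns a value near 0, so the same number can be compared across ROIs.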

      There is a good deal of variability in the iGluSnFR responses from one ROI to another, and the ROIs shown in the original submission had a less prominent transient component than many other ROIs. We have replaced this figure with another that is more representative of the average behavior across ROIs. The full range of behavior is captured in Figure 6C; it is clear across ROIs that glutamate release near ON-S dendrites shows both sustained and transient components. The new experiments in which we block amacrine cell activity also include a few more example ROIs from ON-S cells, and those also show both transient and sustained components.

      Your suggestion to integrate the iGluSnFR signals to compare to our structural analysis of ribbons is interesting. However, we are hesitant to make a quantitative comparison between the two without further experiments to validate how the iGluSnFR signals we measure relate to release of single vesicles. For example, a quantitative measure of release based on the iGluSnFR experiments would require accounting for possible differences in the expression of the indicator - which could differ both in overall level and/or location relative to release sites. 

      This comment and one above highlight the importance of measures of ribbon surface area, which we now provide (Figure S10).

      Figure 7. What is the recovery time for mammalian cones derived from ribbon-based models? There are estimates from membrane capacitance studies. Ground squirrel cones take 0.7 to 1 sec to recover the ultrafast, primed pool of SVs when probed with a paired-pulse protocol (Grabner ...DeVries 2016, Neuron). Their off-bpcs take anywhere from under 0.2 sec to a second to recover, which is a combination of many synaptic factors (Grabner ...DeVries Nat Comm 2023). Rod On bpcs take over a second (Singer Diamond 2006, reviewed Wan and Heidelberger 2011). In Figure 7B, the recovery time is ~150 ms for the responses measured at rgcs. This brief recovery time is incompatible with existing ribbon models of release. Whole-cell membrane capacitance measurements would be helpful here.

      Thanks for drawing our attention to this issue. Indeed, we see a relatively rapid recovery in the paired-flash experiments. We now discuss this recovery time in the context of past measurements of recovery of responses in cones and bipolar cells (paragraph starting on line 773). There are many factors that could contribute to the relatively rapid recovery we observe - including synaptic factors such as those highlighted by Grabner et al., (2016) either at the cone-to-bipolar synapses or the bipolar-to-RGC synapses. We are certainly interested in a more detailed understanding of this issue, but the additional experiments are outside the scope of this paper.  

      Experimental Suggestion: Add GABA blockers and see if type 5i bpc responds with more release (GluSniff) and prolonged [Ca2+] intra (GCaMP). Compare this to type 6 bpc behavior with GABA/gly blockers. This will rule in or out whether feedback inhibition is involved. 

      Figure 7 in the revised manuscript includes two new experiments examining glutamate release (without the simultaneous measurement of bipolar cell intracellular calcium) while blocking (1) all/most amacrine cell-mediated inhibition via inclusion of NBQX in the bath solution, and (2) blocking spiking amacrine cells via inclusion of TTX in the bath solution. The transient vs sustained difference in light-evoked glutamate release around ON-T and ON-S RGC dendrites remained with amacrine activity suppressed. These new results are consistent with the anatomical and pharmacological data that were included in the initial submission of the manuscript (Fig. 5) that indicate presynaptic inhibition does not have a major role in shaping release kinetics at these synapses. 

      Reviewer #3 (Public Review): 

      Summary: 

      Different types of retinal ganglion cell (RGC) have different temporal properties - most prominently a distinction between sustained vs. transient responses to contrast. This has been well established in multiple species, including mice. In general, RGCs with dendrites that stratify close to the ganglion cell layer (GCL) are sustained; whereas those that stratify near the middle of the inner plexiform layer (IPL) are transient. This difference in RGC spiking responses aligns with similar differences in excitatory synaptic currents as well as with differences in glutamate release in the respective layers - shown previously and here, with a glutamate sensor (iGluSnFR) expressed in the RGCs of interest. Differences in glutamate release were not explained by differences in the distinct presynaptic bipolar cells' voltage responses, which were quite similar to one another. Rather, the difference in transient vs. sustained responses seems to emerge at the bipolar cell axon terminals in the form of glutamate release. This difference in the temporal pattern of glutamate release was correlated with differences in the size of synaptic ribbons (larger in the bipolar cells with more sustained responses), which also correlated with a greater number of vesicles in the vicinity of the larger ribbons. 

      The main conclusion of the study relates to a correlation (because it is difficult to manipulate ribbon size or vesicle density experimentally): the bipolar cells with increased ribbon size/vesicle number would have a greater possibility of sustained release, which would be reflected in the postsynaptic RGC synaptic currents and RGC firing rates. This model proposes a mechanism for temporal channels that is independent of synaptic inhibition. Indeed, some experiments in the paper suggest that inhibition cannot explain the transient nature of glutamate release onto one of the RGC types. Still, it is surprising that such a diverse set of inhibitory interneurons in the retina would not play some role in diversifying the temporal properties of RGC responses. 

      Strengths: 

      (1) The study uses a systematic approach to evaluating temporal properties of retinal ganglion cell (RGC) spiking outputs, excitatory synaptic inputs, presynaptic voltage responses, and presynaptic glutamate release. The combination of these experiments demonstrates an important step in the conversion from voltage to glutamate release in shaping response dynamics in RGCs. 

      (2) The study uses a combination of electrophysiology, two-photon imaging, and scanning block-face EM to build a quantitative and coherent story about specific retinal circuits and their functional properties. 

      Weaknesses: 

      (1) There were some interesting aspects of the study that were not completely resolved, and resolving some of these issues may go beyond the current study. For example, it was interesting that different extracellular media (Ames medium vs. ACSF) generated different degrees of transient vs. sustained responses in RGCs, but it was unclear how these media might have impacted ion channels at different levels of the circuit that could explain the effects on temporal tuning.

      We do not have an explanation for the quantitative differences in response kinetics we observed in Ames’ medium vs. ACSF. There are modest differences in calcium and magnesium concentration and a larger difference in potassium (2.5 mM in ACSF vs 3.6 mM in Ames). It would be interesting to test which of these (or other) differences accounts for the difference in response kinetics.

      (2) It was surprising that inhibition played such a small role in generating temporal tuning. At the same time, there were some gaps in the investigation of inhibition (e.g., IPSCs were not measured in either of the RGC types; pharmacology was used to investigate responses only in the transient RGCs).

      We were also surprised at this result. We have included additional data on inhibition in the revised manuscript. Figure S3 shows light-evoked IPSC data from both RGC types, and Fig. 7 shows additional iGluSnFR measurements around both ON-T and ON-S RGC dendrites with inhibition blocked via bath application of NBQX and, separately, with inhibition from spiking amacrine cells blocked with TTX. These experiments provide additional evidence for the small role of inhibition. We attempted to measure the kinetics of excitatory input to ON-S cells with inhibition blocked, but we found that the excitatory input showed strong spontaneous oscillations under these conditions and the light responses were changed so drastically that we did not feel we could make a clear comparison with control conditions.

      (3) There could be additional discussion and references to the literature describing several topics, including: temporal dynamics of glutamate release at different levels of the IPL; previous evidence that release sites from a single presynaptic neuron can differ in their temporal properties depending on the postsynaptic target; previous investigations of the role of inhibition in temporal tuning within retinal circuitry. 

      Thanks, we have included more discussion and references to the relevant literature as you have suggested in the recommendations to authors.

      Reviewer #1 (Recommendations For The Authors): 

      The presented raw data of the pharmacological experiments show that SR95531 and TPMPA robustly increased both the amplitude and duration of the transient component of the light step-evoked excitatory currents, with slight, if any, enhancement of the sustained component in ON-T RGCs (Figure 5C). Statistical analysis of the population data (n=5) with Wilcoxon signed rank test yielded no significant difference (ln 363). However, reanalyzing the data extracted from the graph (Figure 5D) revealed that the difference between the paired observations is normally distributed (Shapiro-Wilk normality test, P=0.48), allowing parametric statistics to be used, which provides higher statistical power. Accordingly, reanalyzing the presented data with a paired Student's t-test revealed significant differences (P=0.01) in the steady-state amplitude normalized to that of the peak, recorded in the presence of SR95531 and TPMPA. In other words, based on the (rough) analysis of the presented pharmacology data, GABAergic feedback inhibition significantly contributes to shaping the transient portion of the light-evoked excitatory currents in ON-T RGCs, by making it more transient. I believe a similar analysis based on the actual data is necessary, and the results should be communicated either way. However, if warranted, two-photon glutamate sensor imaging experiments showing that blocking GABA- and glycinergic inhibition does not change the kinetics of light-evoked glutamate signals at ON-T RGCs should also be performed, as these would be critical in drawing a conclusion regarding the effect of feedback inhibition on glutamate release from bipolar cells.

      Thanks for this feedback. We have added another cell to the data set in Fig. 5D. With this addition, SR95531/TPMPA application significantly increases the response transience of excitatory currents measured in ON-T RGCs compared to control. This enhanced transience in GABA<sub>A/C</sub> receptor blockers is due to an increase in the amplitude of the initial peak component of the response (control peak amplitude: -833.7±103.3 pA; SR95531+TPMPA peak amplitude: 2023±372.7pA; p=0.03, Wilcoxon signed rank test), with no change to the later sustained component (control plateau amplitude: -200.7±14.71pA; SR95531+TPMPA plateau amplitude: -290.9±43.69pA; p=0.15, Wilcoxon signed rank test).

      We should clarify that this result indicates that GABAergic inhibition makes the excitatory inputs to ON-T RGCs less transient. Block of GABA receptors increased transience, thus intact GABAergic transmission appears to limit the initial peak of the response and therefore make excitatory currents more sustained. We unfortunately were not able to examine whether sustained excitatory currents in ON-S RGCs would become more transient using the same approach. In our hands, bath application of SR95531+TPMPA led to the generation of large-amplitude (>1nA) oscillatory bursts of excitatory input that developed within 5 minutes and persisted for the duration of the incubation (up to ~30 min) in drugs. Further, presentation of light steps tended to induce variable amplitude responses, likely dependent on the presence of spontaneous bursts; when large amplitude responses were evoked, these typically oscillated for several seconds after the step.

      To examine a potential role for presynaptic inhibition in transient vs. sustained bipolar cell output, we therefore chose to eliminate amacrine cell-mediated inhibition by bath application of the AMPA/kainate receptor antagonist NBQX in additional iGluSnFR measurements. This manipulation should leave ON bipolar cell responses intact while eliminating most amacrine cell-mediated responses (and OFF bipolar cell driven responses). In separate experiments, we also eliminated inhibition from spiking amacrine cells by bath application of TTX. As shown in new Fig. 7, sustained and transient responses persisted in distal versus proximal RGC dendrites, respectively. Compared to SR95531/TPMPA, bath application of NBQX was not associated with spontaneous bursts of glutamate release around ON-S dendrites. These results show that amacrine cell-mediated inhibition is not required for either sustained or transient glutamate release from bipolar cells that provide input to ON-S and ON-T RGCs.

      Small points: 

      (1) The legend of Figure 1 (D) refers to shaded areas to show ± SEM, but no shade is visible (at least in my printout).

      The SEM shading is there in Fig. 1D but is mostly obscured by the mean lines for the respective RGC types. We have added this to the figure caption.

      (2) I found the reported Vrest for the ON bipolar cells somewhat depolarized. Perhaps due to the uncompensated junction potentials? 

      These measurements are indeed not corrected for the liquid junction potential (which is approximately -10.8 mV between K-gluconate internal and Ames’ solution). We did not apply this correction since the appropriate value is not clear in perforated patch recordings as the intracellular chloride concentration is unknown (and can differ from that in the pipette solution). We have clarified this in the results text where we describe the Vrest values (lines 335-338).

      (3) It is Wilcoxon signed rank test, not Wilcoxan. 

      Thanks for catching this. This has been corrected in the revised manuscript.

      Reviewer #2 (Recommendations For The Authors): 

      Some amacrines express vesicular Glut-3 transporter and are reported to release glutamate (Marshak, Vis Neurosci 2016). Are Amacrine vGlut3 signals postsynaptic (within ~0.5 um) to cone bpc ribbons?

      We did not characterize VgluT3-expressing amacrine cells in our SEM datasets. A recent study by Friedrichson et al. (Nat. Comm. 2024; PMID 38580652) using 3D SEM reconstructions found that Vglut3-amacrines are postsynaptic to both type 5i and type 6 bipolar cells, as well as other type 5/xbc bipolar cells (and receive >50% of their input from type 3a OFF bipolar cells).

      How far apart are the postsynaptic targets from the ribbon release sites? The ribbons at type 5i bpc/On-T input appear separated from the dendrites of On-T rgcs (Figure 8C). At least further away than the type 6 bpc ribbons are from On-S rgc dendrites (Figure 8C). Distance may create a thresholding phenomenon, whereby only multivesicular bouts at the onset of depolarization are able to elevate synaptic Glu to levels needed to activate On-T GluRs. See Grabner et al Nat Comm 2023 for such scenarios in the outer retina.

      This is an intriguing possibility, but we should point out that the presynaptic ribbons in Fig. 9C (former Fig. 8C) are similar distances (within the resolution of our reconstructions) from the ON-T and ON-S dendrites. We have increased the brightness of the dendrite segments for both RGC types in the resubmission figure; note that ON-T RGCs have spine-like protrusions that may not have been as apparent in the previously submitted version of our manuscript.

      In Figures 1 and 2, Sustained responses look like the derivative of Transient responses, minus the negative going inflection. In addition, the sustained responses appear to have a lower threshold of activation than the transient On rgcs, because there are more bouts of action potentials (and membrane depol in V-clamp) with earlier onset in sustained than transients traces. It would be great if the GLuSniff data captured these differences. Take cumulative dF/F and see what the onset time is, or an initial tau if possible.

      This is a good suggestion. However, we are reluctant to make detailed quantitative comparisons such as this without further validation of how the kinetics of the iGluSnFR signals relate to kinetics of glutamate release.  A specific concern is that differences in the location and amount of iGluSnFR expression could impact any such comparisons.

      A recent study by Kim et al von Gersdorff (Cell Reports, 2023) presents interesting phases of release in response to light flashes, measured from AIIs, and complementary results from pairs of rbcs-AIIs. The findings highlight the complexity of SV pools under well-controlled experiments. Could their results be explained as variations in rbc ribbon size through development, and possibly between rbcs or within an rbc? 

      This certainly seems possible and would be consistent with the dependence of release on ribbon size that our results support.  It would be interesting to see if there are clear anatomical correlates of that change in release properties.  

      Figure 5 is a pivotal point in the study, but my review has identified numerous weaknesses. The feedback inhibition onto bipolar cell terminals is likely to sculpt glutamate release, and the results do not convincingly rule out this possibility. The suggestions for improvements range from the data needing to be reanalyzed with regard to statistical tests, and/or adding a few more data points (n = 5) before concluding a p: 0.06 is insignificant. 

      We have added an additional recording to this data set. With n = 6 cells, there is now a statistically significant difference between ON-T RGC excitatory currents measured in control conditions versus during GABA<sub>A/C</sub> receptor blockade. Please note that all the recordings shown in Figure 5C-F are from ON-T RGCs (the two panels separately show block of GABAergic and glycinergic receptors). We did not make it sufficiently clear that the original trend (now statistically significant) is opposite to that expected if presynaptic GABAergic inhibition contributes to response transience in ON-T RGCs. What we see is that excitatory synaptic inputs to ON-T RGCs become more transient (rather than more sustained) during GABA<sub>A/C</sub> receptor blockade. We have revised the text in that section to make this point more clearly.
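
      The effect of sample size on this test can be illustrated with a minimal sketch (not the analysis code used for the manuscript; the function names are hypothetical). An exact two-sided Wilcoxon signed rank test on n untied, non-zero paired differences cannot return a p-value below 2/2^n, which is what the all-same-sign case yields. That floor is consistent with p of about 0.06 at n = 5 and p = 0.03 at n = 6, assuming every cell changed in the same direction.

```python
from itertools import product


def exact_wilcoxon_two_sided_p(diffs):
    """Exact two-sided Wilcoxon signed-rank p-value for paired differences.

    Illustrative only: assumes no zero differences and no ties in |diff|.
    """
    n = len(diffs)
    # Rank the absolute differences from 1 (smallest) to n (largest).
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0] * n
    for rank, idx in enumerate(order, start=1):
        ranks[idx] = rank
    # Observed statistic: sum of ranks belonging to positive differences.
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    total = n * (n + 1) // 2
    # The smaller of W+ and W- is the usual two-sided test statistic.
    w_obs = min(w_plus, total - w_plus)
    # Under the null every sign pattern is equally likely: enumerate all 2^n.
    as_extreme = 0
    for signs in product((0, 1), repeat=n):
        w = sum(r for r, s in zip(ranks, signs) if s)
        if min(w, total - w) <= w_obs:
            as_extreme += 1
    return as_extreme / 2 ** n


def smallest_possible_p(n):
    """p-value for the most extreme outcome: all n differences share one sign."""
    return exact_wilcoxon_two_sided_p([-(i + 1) for i in range(n)])


if __name__ == "__main__":
    for n in (5, 6):
        print(n, smallest_possible_p(n))  # n = 5 -> 0.0625, n = 6 -> 0.03125
```

      With five cells the test therefore cannot reach p < 0.05 however consistent the effect is, whereas a sixth cell lowers the floor to about 0.03.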

      We have also included new data from iGluSnFR measurements showing that bath application of NBQX does not affect light step-evoked glutamate release kinetics at proximal (sustained) or distal (transient) RGC dendrites (control: steady-state amp. as % of peak amp. 13 ± 10; mean ± S.D.; n = 189 ROIs/4 FOVs for ON-T dendrites vs 40 ± 12; mean ± S.D.; n = 287 ROIs/8 FOVs for ON-S dendrites; NBQX: 6 ± 3; mean ± S.D.; n = 112 ROIs/1 FOV for ON-T dendrites vs 23 ± 9; mean ± S.D.; n = 97 ROIs/2 FOVs for ON-S dendrites; *p<0.001). By blocking glutamate receptors on amacrine cells, NBQX (AMPA/KAR antagonist) eliminates all/most amacrine cell-mediated signaling in the retina and should therefore abolish presynaptic inhibitory input to bipolar cell terminals across the IPL. Taken together, our results indicate that presynaptic inhibition does not play a critical role in establishing transient versus sustained kinetics for the stimulus conditions we employed in our study.
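
      For readers unfamiliar with the metric, the "steady-state amplitude as % of peak amplitude" measure can be sketched as follows. This is a hedged illustration under stated assumptions, not the authors' analysis code: the function name, the step window indices, and the choice to average the final quarter of the step as "steady state" are all hypothetical.

```python
def steady_state_percent_of_peak(trace, step_start, step_end, tail_fraction=0.25):
    """Return steady-state amplitude as a percentage of peak amplitude.

    trace: baseline-subtracted response samples (e.g. dF/F), one per frame.
    step_start, step_end: sample indices bounding the light step (hypothetical).
    tail_fraction: final fraction of the step averaged as "steady state" (assumed).
    """
    step = trace[step_start:step_end]
    if not step:
        raise ValueError("light-step window is empty")
    peak = max(step)
    if peak <= 0:
        raise ValueError("no positive response during the step")
    # Average the last part of the step to estimate the sustained component.
    n_tail = max(1, int(len(step) * tail_fraction))
    tail = step[-n_tail:]
    steady = sum(tail) / len(tail)
    return 100.0 * steady / peak


if __name__ == "__main__":
    # Made-up traces for illustration only (not data from the study).
    transient = [0, 0, 1.0, 0.5, 0.2, 0.1, 0.1, 0.1, 0.1, 0.1, 0, 0]
    sustained = [0, 0, 1.0, 0.9, 0.8, 0.8, 0.8, 0.8, 0.8, 0.8, 0, 0]
    print(steady_state_percent_of_peak(transient, 2, 10))
    print(steady_state_percent_of_peak(sustained, 2, 10))
```

      Lower values indicate a more transient response; higher values indicate a more sustained one.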

      There is a need to cite more recent literature on bipolar cell ribbons (e.g. see Wakeham et al., Front. Cell. Neurosci., 2023), in order to support experimental design and interpretation of the results. The authors should discuss their Ribeye-KO data from Okawa et al 2019 Nat Comm, Figure 7, in the context of their new iGluSnFR results. 

      Thank you for prompting us on this issue. We have expanded the discussion regarding ribbons and included more citations to the ribbon literature. That is largely in the three paragraphs starting on line 727.

      One point deserves emphasis because it is central to the authors' ribbon model but not consistent with their data. The ribbon model as they put it, and as commonly stated, holds that a transient phase of release at the onset of depolarization indicates the depletion of the primed SVs, and the subsequent slower rate of release (steady state release in the authors' terms) reflects recruiting, priming, and release of new SVs. The On-transient dendrite GluSnf responses agree with this multiphasic process, but the sustained responses show only an elevation in glutamate without a pronounced initial peak, creating a square-wave-shaped response (Figure 6B). This does not agree with the simple ribbon-based release model. I would expect the signals from the T- and S-on dendrites to have a comparable initial phase, while the sustained phase should be greater in amplitude for the S-on dendrites. More discussion may clarify possible mechanisms.

      Thanks for pointing this out. The example iGluSnFR traces we originally included in the manuscript were not entirely representative in that they did not show much initial transient phase. Note there is a distribution of steady-state amplitudes for proximal dendrites in Fig. 6C; the examples are from ROIs from the upper end of the distribution. In the new Figure 7, we have included some additional examples that show both a clear transient and sustained component. The summary data in Figure 6C shows the distribution of sustained/transient ratios across ROIs.  

      Reviewer #3 (Recommendations For The Authors): 

      (1) It would be interesting to understand the differences in IPSCs in the two RGC types. Perhaps they are small in both types, which would explain their apparent lack of impact on temporal tuning. The authors may already have these data.

      We did make measurements of noise-evoked IPSCs (as well as EPSCs) in a subset of ON-T and ON-S recordings. We have now included these data as Figure S3. There are slight differences in the kinetics of inhibition between RGC types (Fig. S3C) and there is a trend towards stronger inhibition (relative to excitation) in ON-T RGCs compared to ON-S RGCs (Fig. S3E), although the difference is not statistically significant. In both cases excitatory synaptic currents are as large as or larger than inhibitory currents, and this does not include the difference in driving force near spike threshold, which will favor excitatory input by a factor of 2-3. Hence, our data suggest that postsynaptic inhibition does not play a major role in generating the differential temporal spiking responses of ON-T and ON-S RGCs. However, additional experiments examining the relative contribution of excitation and inhibition to spiking output in these RGCs would be needed to reach a firm conclusion.

      The pharmacological experiments in which we blocked inhibition (Fig. 5C-F, new Fig. 7) were designed to test the effect of presynaptic inhibition on bipolar cell output (voltage-clamp isolation of excitatory currents in Fig. 5; iGluSnFR measurements of glutamate release in Fig. 7). We do not mean to suggest that postsynaptic inhibition does not have any role in shaping the spiking behavior of these RGC types, but that transient vs. sustained kinetics are already present in the bipolar cell output and that presynaptic inhibition of bipolar cell terminals does not appear to account for this difference.  We have revised the text throughout to be clearer on this point.

      (2) It could be convincing to show transient/sustained differences between RGC types in dim light, where the response would depend on the rod bipolar/AII circuit. In this case, any difference in temporal properties would presumably be explained by differences that localize to the cone bipolar cell axon terminals. Indeed, is that the result in Figure 1B? This seems to be a dim stimulus presented on darkness, which may be driven through the rod bipolar pathway. The authors could then discuss the interpretation of this data in terms of the rod bipolar circuit. 

      Yes, Figure 1B is a dim light step (~30 R*/rod/s) presented from darkness, and the distinction between cells remains clear at still lower light levels that more effectively isolate signaling through the rod bipolar pathway. Thanks for making the point that the observation of distinct temporal responses under scotopic conditions, where signals are carried by the rod bipolar pathway, suggests these differences must arise at and/or downstream of cone bipolar cell output. We have included additional text (lines 361-365) in the results describing bipolar cell responses that raises this point.

      (3) Glutamate release was already measured across the full IPL depth by Borghuis et al. (2013) and Franke et al. (2017). It would be appropriate to better motivate the current study based on these existing measurements.

      We have clarified that these studies provided important motivation for measuring excitatory synaptic input to ON-T vs. ON-S RGCs (lines 165-169).

      (4) Line 212/213. It would be appropriate to add to the list of papers showing the different stratification of transient vs. sustained responses: Borghuis et al. (2013) and Beaudoin et al. (2019).

      Thank you - these references have been added.  

      (5) Line 635-638. It would be useful to discuss papers by Pottackal et al. (2020, 2021), which suggested that a single presynaptic cell (starburst) can signal with different temporal properties depending on the postsynaptic target (other starburst vs. DSGCs). The mechanism was not completely resolved (i.e., it was not explained by differences in presynaptic Ca channels at the two synapse types), but it at least shows that neurotransmitter release can show different filtering depending on the postsynaptic target from the same presynaptic neuron. (This could also be at play for the type 6 bipolar cell inputs to ON-S vs. ON-T RGCs in the present study.)

      We have added a reference to Pottackal et al 2021 in this section.

      (6) Line 714. Should describe the procedure for embedding the tissue in agarose. 

      We have added more detail regarding agarose embedding for preparation of retinal slices in the methods.

      (7) Line 775. Need a better description of the virus (not the construct), what serotype? Provide the Addgene number if available. 

      This has been added to the methods.

      (8) Line 808. Was the SD for the gaussian really 50%? That would cut off a lot of the distribution, i.e., it would get clipped at 0. 

      Yes, the SD for Gaussian noise was 50%. This high contrast stimulus was used in part to achieve measurable signals from bipolar cells. You are correct that some of the distribution was clipped at 0 (it was also clipped at twice the mean to make sure that the distribution remained symmetrical). The clipping was accounted for during our LN analyses.
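
      To give a sense of how much of the distribution is affected, here is a minimal sketch of such a stimulus, assuming intensities are drawn from a Gaussian with SD equal to 50% of the mean and clipped to the range from 0 to twice the mean. It is not the stimulus code used in the study; the function names, the normalized mean of 1.0, and the seed are hypothetical.

```python
import math
import random


def clipped_gaussian_noise(n_samples, mean=1.0, sd_fraction=0.5, seed=0):
    """Draw Gaussian intensities (SD = sd_fraction * mean), clipped to [0, 2 * mean]."""
    rng = random.Random(seed)
    sd = sd_fraction * mean
    # Symmetric clipping keeps the distribution centered on the mean.
    return [min(max(rng.gauss(mean, sd), 0.0), 2.0 * mean) for _ in range(n_samples)]


def expected_clipped_fraction(sd_fraction=0.5):
    """Fraction of an unclipped Gaussian that falls outside [0, 2 * mean]."""
    z = 1.0 / sd_fraction  # distance from the mean to each bound, in SD units
    one_tail = 0.5 * math.erfc(z / math.sqrt(2.0))
    return 2.0 * one_tail


if __name__ == "__main__":
    samples = clipped_gaussian_noise(20000)
    print(min(samples), max(samples), sum(samples) / len(samples))
    print(expected_clipped_fraction())
```

      With an SD of 50% each bound sits two SDs from the mean, so about 4.6% of samples are clipped in total (roughly 2.3% at each bound).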

      (9) The paper should discuss Swygart et al. (2024) results showing different spatial surround properties of neighboring synapses from a type 6 bipolar cell. Based on this result, it would seem very likely that amacrine cells could play a role in shaping the temporal processing of bipolar cell glutamate release as well. Indeed, spatial and temporal processing will not be completely independent in a typical experiment. For example, with the spot stimulus used in the present study, bipolar cells within the center versus the edge of the spot will have different balances of center/surround activation, which could potentially influence their temporal processing.

      We have included discussion of results from Swygart et al 2024 in the section of the Discussion in which we point out differences in surround inhibition between ON-S and ON-T RGCs (lines 710-714). We agree that spatial and temporal processing are not completely independent. Our results with SR95531/TPMPA indicate ON-T RGCs receive stronger GABAergic surround inhibition than ON-S RGCs (Fig. S8). However, our results in Fig. 5C-D show GABAergic surround inhibition makes ON-T excitation more sustained rather than more transient. So even though bipolar cells presynaptic to ON-T RGCs receive stronger surround inhibition (Fig. S8), this inhibition does not establish the transient kinetics of glutamate release from these bipolar cells (in fact, it works to make release more sustained). Additional iGluSnFR experiments where we used NBQX to block all/most amacrine cell-mediated responses also suggest presynaptic inhibition does not have an important role in establishing differential glutamate release kinetics onto ON-S vs. ON-T RGC dendrites (Fig. 7).

      (10) Cui et al. 2016 described ON-S Alpha as having a divisive suppression mechanism that explained the temporal properties of white-noise response better than a standard LN model. Do the authors think the divisive suppression reflects a property of the excitatory synapses independent of inhibition?

      This is an interesting question, but one for which we don’t have a good answer for now. As mentioned in some of the above responses and as we have tried to clarify in the manuscript, we do not mean to imply that there is no role for presynaptic inhibition in modulating bipolar cell output, including for the divisive suppression described by Cui et al. Rather, our point is that the distinction between transient and sustained excitatory input to ON-T and ON-S RGCs does not require presynaptic inhibition and is more likely an intrinsic property of the bipolar cell synapses. 

      (11) Do the authors mean to imply that the pool size at bipolar cell ribbon synapses could depend on the use of Ames vs. ACSF? 

      For now, we do not have a good answer as to why there are quantitative differences in response kinetics between Ames and ACSF. We have not done any experiments to investigate whether ribbon sizes or ribbon pools are different in the different solutions.

      (12) More generally, different mean luminance levels could drive different levels of baseline glutamate release, which could alter the available pool of vesicles at bipolar cell ribbon synapses. Can we explain varying degrees of transient/sustained in the same cell at different levels of mean luminance based on this mechanism (e.g., Grimes et al., 2014)?

      Yes, the emergence of a transient component of excitatory input to ON-S RGCs at ~100 R*/rod/s versus at scotopic levels (0.5 R*/rod/s) in Grimes et al. (2014) could be due to differences in the number of releasable vesicles (due to different type 6 bipolar cell axon terminal membrane potentials and hence differences in spontaneous release rates) at the different light levels.

      We should note that although ON-T and ON-S RGCs exhibit some changes in transient/sustained kinetics across different light levels, the relative differences between these RGC types are preserved across light levels. We have included a statement about this in the text (lines 361-367).

      (13) Figure 1. Have the authors considered performing the LN analysis of the firing responses, to compare the degree of rectification between the two RGC types?

      This is a good suggestion. From an LN analysis of spiking responses, we do not observe a clear difference between the static nonlinearity component of the model for ON-T and ON-S RGCs. Both RGC types are strongly rectified under our experimental conditions.

      (14) Figure 5. Do the authors have the pharmacology data for the ON-S cells? There are examples of sustained EPSCs in amacrine cells that become more transient after blocking inhibition, which at least suggests that inhibition can play some role in the transient/sustained nature of glutamate release (Park et al., 2015, Figure 3). Perhaps ON-S cells likewise become more transient with inhibition blocked. 

      (The colored symbols in A were not visible in a printout. It would be useful to indicate the cell type (ON-T) in C and E). 

      As described above in the response to reviewer 1’s recommendation for authors, we were not able to use SR95531/TPMPA for recordings from ON-S RGCs. Bath application of these drugs led to oscillatory bursts of excitatory input to ON-S RGCs. However, the lack of effect of bath-applied NBQX on the kinetics of glutamate release around either ON-T or ON-S RGC dendrites (new Fig. 7) suggests that presynaptic inhibition does not contribute to generating sustained excitation to ON-S RGCs (or transient excitation to ON-T RGCs).  

      We have corrected Fig. 5A to include the referenced colored symbols and have also edited Fig 5C and E to clarify that measurements in Fig. 5C-F are from ON-T RGCs.

      (15) Figure 6 legend. Should be Kcng4-Cre, not KCNG-Cre. Also, it should make clear that this is cre-dependent expression of iGluSnFR. For C, were the statistics based on the number of FOVs? 

      Thanks for catching this, we have corrected Figure 6 legend. The methods section includes a description of how we achieved iGluSnFR expression on alpha RGC dendrites via a cre-dependent viral strategy in Kcng4-Cre mice.  We have also clarified that the statistics are based on ROIs in Figure 6C.

      (16) Figure 7, Flashes were apparently 400% contrast on a dim background. What was the background? Is there a rod component to the response in this case? 

      In Figure 7 (now Figure 8), the same background (~3300 R*/rod/s; 2000 P*/Scone/s) was used as in the Gaussian noise and step response experiments. At this light level, the response should primarily be mediated by cones.

      (17) Figure S1. The colors here differ from those in previous figures (Here, ON-T, magenta; ON-S, cyan). Is something mislabeled? 

      Thanks for catching this. We mistakenly swapped the labels in the legend for Fig. S1. The figure colors were correct, but we have corrected the legend in the revised manuscript.

      (18) Figure S2. For the LN model for RGC synaptic currents, the ON-S are more rectified than some previous recordings (Cui et al., 2016). Is this perhaps explained by different light levels?

      We aren’t sure why ON-S excitatory currents are more strongly rectified in our recordings compared to Cui et al., 2016. Cui et al. used an ~20-fold higher background light intensity (~40,000 P*/cone/s vs. ~2000 P*/cone/s in our study), so different light levels may be a factor, although we should point out that rectification increases in these RGCs from scotopic to low photopic light levels (see Grimes et al., 2014 and Kuo et al., 2016).

      (19) The study is apparently comparing PV1 and PV2 described in Farrow et al. (2013; see Supplementary information for stratification analysis), which should be cited.

      Thanks, we have corrected this oversight in the revised manuscript. We now cite Farrow et al and mention the connection to PV1 and PV2 in the first paragraph of Results (lines 104-108).

    1. Author response:

      Reviewer #1 (Public review):

      Summary:

      The manuscript "Adapting Clinical Chemistry Plasma as a Source for Liquid Biopsies" addresses a timely and practical question: whether residual plasma from heparin separator tubes can serve as a source of cfDNA for molecular profiling. This idea is attractive, since such samples are routinely generated in clinical chemistry labs and would represent a vast and accessible resource for liquid biopsy applications. The preliminary results are encouraging, but in its current form, the study feels incomplete and requires additional work.

      We thank the reviewer for the encouragement and for recognizing the potential of clinical chemistry plasma as an accessible source for cfDNA-based analyses. We look forward to addressing the gaps described below.

      My major concerns/suggestions are as follows:

      (1) Context and literature

      The introduction provides only limited background on prior attempts to use heparinized plasma for cfDNA work. It is well known that heparin can inhibit PCR and sequencing library preparation, which has historically discouraged its use. The authors should summarize the relevant literature more comprehensively and explain clearly why this approach has not been widely adopted until now, and how their work differs from or overcomes these earlier challenges.

      We thank the reviewer for their valuable comments and agree that the review of prior work needs to be more thorough, with the gaps clearly identified. In the revised manuscript, we will expand the introduction to include a more comprehensive summary of prior studies. Some of the material was in the Discussion, but we will move it to the introduction in the revision. In general, we will comment briefly here about the novelty of this work and the previous gap in the literature:

      (1) Previous pre-analytical studies used DNA fluorometry and qPCR, which cannot distinguish between genomic DNA contamination (from cells) and cfDNA. In contrast, our study uses adapter-based NGS with DNA spike-ins, which can exclude genomic DNA contamination and enable precise quantification of cfDNA input and measurement of fragment lengths. In Figure 5b-c, we demonstrate that our paired-sample results matched only with our NGS-based measurements, not with the earlier approaches. Note that the current Fig. 5 captions b and c should be swapped; this will be corrected in the revision.

      (2) As the reviewer has astutely mentioned, heparin is a well-recognized inhibitor of PCR, and heparinized specimens are historically contraindicated for molecular testing. However, most modern cfDNA assays now use NGS, which includes multiple purification steps before PCR amplification, minimizing the impact of heparin interference.

      (3) Previous clinical chemistry tests used serum tubes, which are known to generate background gDNA during clotting and are therefore unsuitable for cfDNA-based analyses. In recent years, modern hospital chemistry laboratories, especially those supporting emergency departments, have gradually transitioned to heparin separator tubes for faster turnaround. Hence, residual plasma from heparin separator tubes is a more recent option, one that was not widely available when key pre-analytical studies on cfDNA were performed.

      (2) Genome-wide coverage

      The analyses focus on correlations in methylation patterns and fragmentation metrics, but there is no evaluation of sequencing coverage across the genome. For both WGS and WMS, it would be important to demonstrate whether cfDNA from heparin plasma provides unbiased coverage, or whether certain genomic regions are systematically under-represented. A comparison against coverage profiles from cell-derived DNA (e.g., PBMC genomic DNA) would help to put the results in context and assess whether the material is suitable for whole-genome applications.

      Thank you for the insightful comment. We agree that evaluating sequencing coverage across the genome is important for assessing the suitability of cfDNA from heparin separators. In response, we are performing additional, in-depth runs to compare genome-wide coverage profiles in the Hospital Cohort. The results of these analyses will be included in the revised version of the manuscript.

      (3) Viral detection sensitivity

      The study shows strong concordance in viral detection between EDTA and heparin samples, but the sensitivity analysis is lacking. For clinical relevance, it is critical to demonstrate how well heparin-derived plasma performs in low viral load cases. A quantitative comparison of viral read counts and genome coverage across tube types would strengthen the conclusions.

      We agree that evaluating analytical sensitivity in cases with low viral loads is important for understanding clinical performance. To address this point, we plan to include additional paired cases with viral loads below 1,000 IU/mL and examine the correlation of viral read counts between EDTA and heparin separators in this subset.

      Reviewer #2 (Public review):

      Summary:

      The authors propose that leftover heparin plasma can serve as a source for cfDNA extraction, which could then be used for downstream genomic analyses such as methylation profiling, CNV detection, metagenomics, and fragmentomics. While the study is potentially of interest, several major limitations reduce its impact; for example, the study does not adequately address key methodological concerns, particularly cfDNA degradation, sequencing depth limitations, statistical rigor, and the breadth of relevant applications.

      We thank the reviewer for the insightful comments and will work to clarify and address the mentioned issues. We do not find the residual plasma from the heparin separator to be a replacement for gold standard methods. Instead, we take it as a practical and complementary resource that may help broaden the accessibility of samples. Comparable cfDNA metrics highlight its potential to serve as an additional source for biobanking and research applications.

      Strengths:

      The paper provides a cheap method to extract cfDNA, which has broad application if the method is solid.

      We thank the reviewer for the encouraging comment. While cost-effectiveness is a practical advantage, we believe the greater strength of this approach lies in the accessibility of sampling. Residual plasma from routine clinical tests offers an opportunity to include patients or time points that would otherwise be difficult to capture, such as those with severe illness or those sampled before treatment.

      Weaknesses:

      (1) The introduction lacks a sufficient review of prior work. The authors do not adequately summarize existing studies on cfDNA extraction, particularly those comparing heparin plasma and EDTA plasma. This omission weakens the rationale for their study and overlooks important context.

      We thank both reviewers for this comment. See above under Reviewer 1’s responses for our provisional perspective on the background literature and gap. We will expand the Introduction to provide a more comprehensive summary of prior studies.

      (2) The evaluation of cfDNA degradation from heparin plasma is incomplete. The authors did not compare cfDNA integrity with that extracted from EDTA plasma under realistic sample handling conditions. Their analysis (lines 90-93) focuses only on immediate extraction, which is not representative of clinical workflows where delays are common. This is in direct conflict with findings from Barra et al. (2025, LabMed), who showed that cfDNA from heparin plasma is substantially more degraded than that from EDTA plasma. A systematic comparison of cfDNA yields and fragment sizes under delayed extraction conditions would be necessary to validate the feasibility of their proposed approach.

      We appreciate this thoughtful comment, which highlights reasonable concerns about cfDNA degradation in heparin. We would like to clarify that the Hospital Cohort, which only used leftover plasma in the clinical lab, was designed to reflect real-world clinical workflows, where unavoidable delays before plasma processing are already incorporated. In the Healthy Cohort, a subset of samples is also processed after controlled delays, as shown in Supplementary Figure 2.

      Regarding the differing results in Barra et al. (2025, LabMed), where heparin tubes showed 85% cfDNA degradation, it is important to note that samples were incubated at 37 °C for 24 hours. We anticipate that endogenous nucleases would be active at 37 °C and would cause cfDNA degradation. However, this condition differs markedly from the relevant clinical workflows we describe here. In routine hospital settings, blood samples are typically kept at room temperature for up to 60 minutes during transport and waiting. The outpatient setting can be more variable, but samples there are supposed to be refrigerated during transportation. They are then processed in high-throughput, fully automated systems that comply with nationally standardized quality regulations in the United States (CLIA). The resultant plasma will be physically separated from cellular components because of the gel in the heparin separators. The processed tubes are subsequently transferred to refrigerated storage at 4 °C. Under these conditions, samples do not experience prolonged exposure to elevated temperatures such as 37 °C, and refrigeration usually occurs within two hours of collection. We will incorporate these details in the revised manuscript.

      Also, as we mentioned in our reply to Reviewer 1, Barra et al. used qPCR like most cfDNA pre-analytical studies, but qPCR is not a perfect DNA quantification method for NGS-based downstream analyses because it measures both cfDNA and contaminating genomic DNA. The latter can be excluded by most NGS assays. By using constant spike-in internal controls, our approach directly quantifies the amount of sequenceable cfDNA, providing a more accurate estimate of input DNA (Figure 5c). In one possible future experiment, the same sample in the Healthy Cohort can be delayed by 1-2 hours prior to processing (centrifugation and refrigeration) and kept at room temperature rather than 4 °C to mimic real-world delays. Outputs would be cfDNA yields and fragment sizes, and we would use constant spike-ins to quantify the amount of sequenceable DNA.
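
      The spike-in normalization described above can be sketched as follows. This is a minimal illustration, not the pipeline used in the study: the function name, read counts, and spike-in mass are hypothetical, and the sketch assumes that spike-in and cfDNA molecules are recovered and sequenced with equal efficiency, so that mass scales with read count.

```python
def sequenceable_cfdna_ng(cfdna_reads: int, spikein_reads: int, spikein_ng: float) -> float:
    """Estimate sequenceable cfDNA mass from the cfDNA-to-spike-in read ratio.

    Hypothetical helper: assumes a constant, known spike-in mass was added to
    every sample and that both molecule types are sequenced equally well.
    """
    if spikein_reads <= 0:
        raise ValueError("no spike-in reads recovered; cannot normalize")
    return spikein_ng * cfdna_reads / spikein_reads


# Illustrative numbers only (not data from the study).
estimate = sequenceable_cfdna_ng(cfdna_reads=9_000_000,
                                 spikein_reads=1_000_000,
                                 spikein_ng=0.5)
print(f"estimated sequenceable cfDNA: {estimate:.2f} ng")
```

      Because the spike-in amount is held constant across samples, the read ratio tracks only molecules that actually reach the sequencer, which is the quantity of interest for NGS-based assays.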

      (3) The comparison of methylation profiles suffers from the same limitation. The authors do not account for cfDNA degradation and the resulting reduced input material, which in turn affects sequencing depth and data quality. As shown by Barra et al., quantifying cfDNA yield and displaying these data in a figure would strengthen the analysis. Moreover, the statistical method applied is inappropriate: the authors use Pearson correlation when Spearman correlation would be more robust to outliers and thus more suitable for methylation and other genomic comparisons.

      We appreciate the reasonable concerns regarding cfDNA degradation and agree that the methylation profile is not an adequate metric for degradation. To evaluate for degradation, we will focus on NGS-derived length profiles (WGS data) and constant spike-in DNA. We appreciate the reviewer’s suggestion to use the Spearman correlation, and this will be incorporated.
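
      To illustrate why the choice of correlation matters when outliers are present, the sketch below compares the two coefficients on toy data. The values are invented for illustration and are not from the study; the implementation uses only the Python standard library and computes Spearman's coefficient as Pearson's coefficient on average ranks.

```python
from statistics import mean


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Toy data: a monotonic relationship with one extreme value in y.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.1, 1.9, 3.2, 3.9, 5.1, 60.0]

print("Pearson :", round(pearson(x, y), 3))
print("Spearman:", round(spearman(x, y), 3))
```

      In this toy case the single extreme value pulls the Pearson coefficient well below 1, whereas the rank order is unchanged and the Spearman coefficient stays at 1.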

      (4) The CNV analysis also raises concerns. With low-coverage WGS (~5X) from heparin-derived cfDNA, only large CNVs (>100 kb) are reliably detectable. The authors used a 500 kb bin size for CNV calling, but they did not acknowledge this as a limitation. Evaluating CNV detection at multiple bin sizes (e.g., 1 kb, 10 kb, 50 kb, 100 kb, 250 kb) would provide a more complete picture. In addition, Figure 3 presents CNV results from only one sample, which risks bias. Similar bias would exist for illustrations of CNVs from other samples in the supplementary figures provided by the authors. Again, Spearman correlation should be applied in Figure 3c, where clear outliers are visible.

      We appreciate the reviewer’s constructive comments regarding the CNV analysis. We agree that the use of low-coverage WGS (~5×) limits the reliable detection of small CNVs, and we will acknowledge this as a limitation in the revised manuscript. To address this point, we will perform additional analyses using a 50 kb bin size. To reduce potential bias from single-sample representation, we will show the aggregated CNV plots for all CNA-positive cases along with their log₂ copy ratio correlations, and Spearman’s correlation will be applied as suggested.

      (5) It is important to point out that depth-based CNV calling is just one of the CNV calling methods. Other CNV calling software using SNVs, pair-reads, split-reads, and coverage depth for calling CNV, such as the software Conserting, would be severely affected by the low-quality WGS data. The authors need to evaluate at least two different software with specific algorithms for CNV calling based on current WGS data.

      Thank you for this suggestion. We will evaluate CNV profiles using alternative informatics methods.

      (6) The authors omit an important application of cfDNA: somatic mutation detection. Degraded cfDNA and reduced sequencing depth could substantially impact SNV calling accuracy in terms of both recall and precision. Assessing this aspect with their current dataset would provide a more comprehensive evaluation of heparin plasma-derived cfDNA for genomic analyses.

      We thank the reviewer for emphasizing SNVs as an important application of cfDNA. We agree that the limited volume of residual plasma is a constraint. Routine chemistry tests leave ~1–2 mL of plasma, and this limited volume places an upper limit on performing SNV analysis. We will expand the discussion of this limitation in the paper. Our approach is not intended to replace specialized tubes for large-volume cfDNA collection but rather to complement them by enabling the use of residual material.

    1. Author response:

      Reviewer #1:

      We agree with the reviewer that a limitation of our study is its focus on cell-based assays rather than in vivo experiments. We did consider evaluating the effects of statins on B cell responses in vivo; however, this approach is complicated by findings that statins can influence antigen presentation by dendritic cells, thereby impacting antibody responses (Xia et al, 2018). One possible solution would be to use B cell-specific conditional knockout models to study the roles of the identified proteins in an in vivo context. However, we currently do not have access to these models and were therefore unable to include such experiments within a feasible timeframe. We will revise the discussion section to acknowledge these points.

      The reviewer also noted that our study assessed the roles of HMGCR, SQLE, and prenylation in B cell activation using pharmacological inhibitors and genetic knockdown/out approaches. Loss-of-function techniques such as RNAi, siRNA, and CRISPR can be challenging to apply to primary B cells, but we are exploring their feasibility for future revisions. While we acknowledge the limitations of using pharmacological inhibitors, we have taken several steps to mitigate these, including targeting multiple steps in the cholesterol biosynthetic pathway using structurally distinct inhibitors and conducting rescue experiments by supplementing downstream metabolites. To further investigate potential off-target effects of statins, we have recently performed proteomic analysis of B cells treated with and without fluvastatin. The data suggest that fluvastatin primarily affects cholesterol metabolism and does not cause widespread off-target effects. We will include this new data in the revised manuscript.

      Reviewer #2:

      The reviewer suggested that the study would be strengthened by determining whether the observed changes are specific to LPS + IL-4 stimulation or represent a more general B cell response to mitogenic signals.

      A complementary study by James et al. (James et al, 2024) investigated murine B cells stimulated via the B cell receptor (BCR) and CD40, using anti-IgM and anti-CD40 antibodies alongside IL-4. Their proteomic analysis showed that such co-stimulation induces a fivefold increase in total cellular protein mass within 24 hours, mirroring our findings with LPS + IL-4. They also reported upregulation of proteins associated with cell cycle progression, ribosome biogenesis, and amino acid transport. Furthermore, by using SLC7A5 knockout mice, they demonstrated that this transporter is required for B cell activation. We will expand our discussion to include these findings. We will also expand on the final figure in our paper showing that the effects of statins are not limited to LPS.

      References:

      James O, Sinclair LV, Lefter N, Salerno F, Brenes A & Howden AJM (2024) A proteomic map of B cell activation and its shaping by mTORC1, MYC and iron. bioRxiv 2024.12.19.629506 doi:10.1101/2024.12.19.629506 [PREPRINT]

      Xia Y, Xie Y, Yu Z, Xiao H, Jiang G, Zhou X, Yang Y, Li X, Zhao M, Li L, et al (2018) The Mevalonate Pathway Is a Druggable Target for Vaccine Adjuvant Discovery. Cell 175: 1059-1073.e21

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer 1 (Public review):

      This research investigates how the cellular protein quality control machinery influences the effectiveness of cystic fibrosis (CF) treatments across different genetic variants. CF is caused by mutations in the CFTR gene, with over 1,700 known disease-causing variants that primarily work through protein misfolding mechanisms. While corrector drugs like those in Trikafta therapy can stabilize some misfolded CFTR proteins, the reasons why certain variants respond to treatment while others don't remain unclear. The authors hypothesized that the cellular proteostasis network (the machinery that manages protein folding and quality control) plays a crucial role in determining drug responsiveness across different CFTR variants. The researchers focused on calnexin (CANX), a key chaperone protein that recognizes misfolded glycosylated proteins. Using CRISPR-Cas9 gene editing combined with deep mutational scanning, they systematically analyzed how CANX affects the expression and corrector drug response of 234 clinically relevant CF variants in HEK293 cells.

      In terms of findings, this study revealed that CANX is generally required for robust plasma membrane expression of CFTR proteins, and CANX disproportionately affects variants with mutations in the C-terminal domains of CFTR and modulates later stages of protein assembly. Without CANX, many variants that would normally respond to corrector drugs lose their therapeutic responsiveness. Furthermore, loss of CANX caused broad changes in how CF variants interact with other cellular proteins, though these effects were largely separate from changes in CFTR channel activity. 

      This study has some limitations: the research was conducted in HEK293 cells rather than lung epithelial cells, which may not fully reflect the physiological context of CF. Additionally, the study only examined known disease-causing variants and used methodological approaches that could potentially introduce bias in the data analysis.

      We agree that the approaches employed here are not fully physiological, though we would remind the reviewer that we previously benchmarked the results generated by this experimental platform against a variety of other published datasets (PMID: 37253358). Regarding the issue of bias, we outline several pieces of evidence suggesting we retain robust and near-uniform sampling of these variants across these experimental conditions. We hope our comments below address all of these concerns. Overall, we believe deep mutational scanning is actually remarkably unbiased relative to other approaches due to the fact that all measurements are taken from a single dish of cells that is processed in parallel. Moreover, we show the trends are highly reproducible across replicates and users (see Figure S1). 

      How cellular quality control mechanisms influence the therapeutic landscape of genetic diseases is an emerging field. Overall, this work provides important cellular context for understanding CF mutation severity and suggests that the proteostasis network significantly shapes how different CFTR variants respond to corrector therapies. The findings could pave the way for more personalized CF treatments tailored to patients' specific genetic variants and cellular contexts. 

      Strengths: 

      (1) This work makes an important contribution to the field of variant effect prediction by advancing our understanding of how genetic variants impact protein function. 

      (2) The study provides valuable cellular context for CFTR mutation severity, which may pave the way for improved CFTR therapies that are customized to patient-specific cellular contexts. 

      (3) The research provides further insight into the biological mechanisms underlying approved CFTR therapies, enhancing our understanding of how these treatments work. 

      (4) The authors conducted a comprehensive and quantitative analysis, and they made their raw and processed data as well as analysis scripts publicly available, enabling closer examination and validation by the broader scientific community. 

      We are grateful for this broad perspective on the general relevance of this work.

      Weaknesses: 

      (1) The study only considers known disease-causing variants, which limits the scope of findings and may miss important insights from variants of uncertain significance. 

      We agree with this caveat. A more comprehensive library of CFTR variants will undoubtedly be useful for assigning variants of uncertain significance, though we note that such a large library would involve trade-offs in depth/coverage that will compromise the sensitivity/precision of the measurements. This will, in turn, make it challenging to compare the effects of CFTR modulators across the spectrum of clinical variants. For this reason, we believe the current library will remain a useful tool for CF variant theratyping.

      (2) The cellular context of HEK293 cells is quite removed from lung epithelia, the primary tissue affected in cystic fibrosis, potentially limiting the clinical relevance of the findings. 

      We concede this limitation, but note that we did carry out functional measurements in FRT monolayers, which are a prevailing model that closely mimics pharmacological outcomes in the clinic (see Fig. 6). 

      (3) Methodological choices, such as the expansion of sorted cell populations before genetic analysis, may introduce possible skew or bias in the data that could affect interpretation. 

      We respectfully disagree with this point. The recombination system we employ in these studies generates millions of recombinant cells per transfection, which corresponds to tens of thousands of clones per variant. Moreover, our sequencing data contain exhaustive coverage of every variant characterized herein within each of the final data sets. Generally, we do not see any evidence to suggest certain variants are lost from the population. We note that, while HEK293T cells are not the most physiologically relevant system, they robustly and uniformly express these variants in a manner that provides a precise comparison of their effects and/or response to CFTR modulators. To address this concern, we added Document S1 to the revised draft, which shows the total number of reads for each variant within each fraction and each experiment.

      (4) While the impact on surface trafficking is convincingly demonstrated, how cellular proteostasis affects CFTR function requires further study, likely within a lung-specific cellular context to be more clinically relevant.

      We agree with this caveat.

      Reviewer 1 (Recommendations for the authors):

      Major Issues

      Cell Growth Bias? After sorting cell populations into quartiles, cells were expanded before genetic analysis - if CFTR variants affect cell doubling time (e.g., severely misfolded variants causing cellular stress), this could skew variant abundance within sorted quartiles and bias results.

      Based on several observations, we do not believe this to be a significant issue. First, we note that we previously benchmarked the quantitative outputs of these experiments against a variety of other investigations and found very good agreement with previous variant classifications and expression levels (PMID: 37253358). If there were significant bias, we believe this would have come up in our efforts to benchmark the assay. Second, we note that we typically create recombinant cell lines that express WT or ΔF508 CFTR only alongside each recombinant cellular library. Importantly, we have never observed any difference in the growth rate of cultures expressing different CFTR variants. Third, even if cells expressing certain variants grow slower, it seems likely this slow growth would consistently occur in the context of each sorted subpopulation. Given that scores are derived from the relative amount of identifications across each subpopulation, we do not suspect this should impact the scoring. Overall, we believe the robustness of this cell line is a key feature that allows us to avoid any such issues related to proteostatic toxicity.

      (1) Please add methodological detail. The data analysis pipeline lacks adequate description beyond referencing prior studies - essential details about what the Plasma Membrane Expression (PME) values represent (fold enrichment vs input library) and calculation methods must be provided.

      We thank the reviewer for this helpful comment. We have added the text below to the revised manuscript in order to provide more detail to the reader:

      “Briefly, low-quality reads that likely contain more than one error were first removed from the demultiplexed sequencing data. Unique molecular identifier sequences within the remaining reads were then counted within each sample to track the relative abundance of each variant. To compare read counts across fractions, the collection of reads within each population was then randomly down-sampled to ensure a consistent total read count across each sub-population. The surface immunostaining of each variant was then estimated by calculating the weighted-average immunostaining intensity for each variant using the following equation:

      $$\langle I \rangle_{\mathrm{variant}} = \frac{\sum_{i} \langle F \rangle_{i} N_{i}}{\sum_{i} N_{i}}$$

      where ⟨I⟩<sub>variant</sub> is the weighted-average fluorescence intensity of a given variant, ⟨F⟩<sub>i</sub> is the mean fluorescence intensity associated with cells from the i<sup>th</sup> FACS quartile, and N<sub>i</sub> is the number of variant reads in the i<sup>th</sup> FACS quartile. Variant intensities from each replicate were normalized relative to one another using the mean surface immunostaining intensity of the entire recombinant cell population for each experiment to account for small variations in laser power and/or detector voltage. Finally, to filter out any noisy scores arising from insufficient sampling, we repeated the down-sampling and scoring process then rejected any variant measurements that exhibit more than X% variation in their intensity scores across the two replicate analyses. The reported intensity values represent the average normalized intensity values from two independent down-sampling iterations across three biological replicates.”
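
      The down-sampling and scoring steps described in this passage can be sketched roughly as follows. This is an illustrative sketch only, not the published analysis scripts: the function names, the toy read counts, and the quartile fluorescence values are hypothetical, and down-sampling is implemented here as simple random sampling without replacement.

```python
import random


def downsample(counts: dict, target_total: int, rng: random.Random) -> dict:
    """Randomly draw `target_total` reads (without replacement) from per-variant counts."""
    pool = [variant for variant, n in counts.items() for _ in range(n)]
    sampled = rng.sample(pool, target_total)
    result = {variant: 0 for variant in counts}
    for variant in sampled:
        result[variant] += 1
    return result


def weighted_intensity(reads_per_quartile, mean_fluorescence):
    """Weighted-average intensity: sum(F_i * N_i) / sum(N_i) over the FACS quartiles."""
    total_reads = sum(reads_per_quartile)
    if total_reads == 0:
        raise ValueError("variant has no reads in any quartile")
    weighted_sum = sum(f * n for f, n in zip(mean_fluorescence, reads_per_quartile))
    return weighted_sum / total_reads


# Hypothetical mean fluorescence of the four sorted quartiles (arbitrary units).
quartile_fluorescence = [100.0, 200.0, 400.0, 800.0]

# Hypothetical reads for one variant in each quartile after down-sampling.
variant_reads = [10, 20, 30, 40]
print(weighted_intensity(variant_reads, quartile_fluorescence))

# Equalizing the depth of one sorted fraction to a fixed total read count.
rng = random.Random(0)
fraction_counts = {"WT": 500, "variant_A": 300, "variant_B": 200}
equalized = downsample(fraction_counts, target_total=400, rng=rng)
print(sum(equalized.values()))
```

      A variant whose reads are concentrated in the brightest quartile receives a score close to that quartile's mean fluorescence, while a variant spread evenly across the quartiles receives an intermediate score.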

      (3) Add detail on library composition. The distribution of CFTR variants within the parental HEK293T library after landing pad insertion needs documentation, including any variant dropout or overrepresentation issues.

      As noted in our previous work (PMID: 37253358), our CF variant library is quite uniform, with each mutant contributing, on average, 0.43% of the library with a standard deviation of +/- 0.16%. This corresponds to an average read depth of over 40K reads per variant, per experimental condition in the final analyses. Indeed, the most abundant variant in the pool was ΔF508 (1.67% of total reads). In contrast, the least sampled variant, S549R (1647T>G), was still sampled an average of 3,688 times per replicate, which corresponds to 0.09% of the total reads. See Doc S1.

      (4) Documentation of CFTR variant overlap between parental and CANX KO HEK293T libraries is needed, including whether every variant was present at equivalent input abundance in both libraries.

      We thank the reviewer for this suggestion. Though there are small deviations in the composition of recombinant parental and knockout cell lines, the relative abundances of individual variants within the recombinant populations differ by an average of only 18.5% between the parental and knockout lines. There are no cases in which we observe a single variant increasing by more than 50% in the knockout line relative to the parent. However, there is a single variant, Y563N, that exhibits a 96% decrease in its abundance in the context of the knockout cell line. Nevertheless, even this variant was sampled over 1,000 times, and its final score passed all quality control metrics. In the revised draft, we have provided a complete table containing the total number of reads and percent of total reads for each variant for each cell line and condition (see Doc. S1).

      (5) The section reporting CANX impact on functional rescue of CF variants requires clearer logic flow - the conclusion about higher specific activity of CFTR assembled without CANX appears misleading, given later discussion about CANX allowing suboptimally folded CFTR to traffic to the surface.

      We apologize for any confusion. We invoked the term “specific activity” in the enzymological sense, which is to say the proportion of active enzyme (i.e. channel) at the plasma membrane differs in the knockout line. The logic is simple: if protein levels are lower while ion conductance remains the same in the knockout cells, then a higher proportion of the mature channels must be inactive in the parental cell line. Thus, we suspect fewer of the channels at the plasma membrane are active in the context of the parental cell line containing CANX. We considered modifications to the text in the discussion, but ultimately feel the current text strikes a reasonable balance between nuance and simplicity.
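
      The argument can be made concrete with a toy calculation. The sketch below is an idealization under stated assumptions: it uses arbitrary units, takes the roughly 30% reduction in plasma membrane expression quoted in the public review, and assumes conductance is exactly unchanged, whereas the measured functional changes were described only as modest. The function name is hypothetical.

```python
def specific_activity(conductance: float, surface_expression: float) -> float:
    """Activity per unit of surface-expressed channel (arbitrary units)."""
    if surface_expression <= 0:
        raise ValueError("surface expression must be positive")
    return conductance / surface_expression


# Idealized inputs: equal conductance, ~30% lower surface expression in the knockout.
parental = specific_activity(conductance=1.0, surface_expression=1.0)
knockout = specific_activity(conductance=1.0, surface_expression=0.7)

fold_change = knockout / parental
print(f"specific activity, knockout vs parental: {fold_change:.2f}-fold")
```

      Under these assumptions, each surface channel in the knockout line would need to carry about 1.4 times the average activity of a surface channel in the parental line, which is equivalent to saying that a larger fraction of the parental surface pool is inactive.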

      (6) In your discussion, consider that HEK293T cellular context differs significantly from lung epithelia, and the hYFP quenching assay may have insufficient dynamic range or high noise for detecting relevant functional differences.

      We modified the following sentence in the discussion to introduce this possibility:

      “While these discrepancies could stem from differences in the dynamic range of the functional assays, they may also suggest the stringency of QC is more finely tuned to ion channel biosynthesis in epithelial monolayers.”

      Minor Issues

      (1) Include immunostaining quartiles as a supplementary figure overlaid on Figure 1A, and clarify whether quartiles were consistent across experiments or adjusted for each sort.

      We added a new figure to demonstrate the gating approach in the revised manuscript (see Fig. S10). We have also added the following text to the Methods section:

      “Sorting gates for surface immunostaining were independently set for each biological replicate and in each condition to ensure that the population was evenly divided into four equal subpopulations.”
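
      One way to picture this gating scheme is to place the three gate boundaries at the quartile cut points of each replicate's own intensity distribution. The sketch below is a hypothetical illustration with synthetic intensities, not the sorter configuration used in the study; it relies only on `statistics.quantiles` and `bisect` from the Python standard library.

```python
from bisect import bisect_right
from collections import Counter
from statistics import quantiles


def quartile_gates(intensities):
    """Return the three cut points that split one replicate into four equal gates."""
    return quantiles(intensities, n=4)


def assign_gate(intensity, gates):
    """Gate index 0 (dimmest) to 3 (brightest) for a single cell."""
    return bisect_right(gates, intensity)


# Synthetic surface-immunostaining intensities for one replicate (arbitrary units).
replicate = [float(i) for i in range(1, 1001)]

gates = quartile_gates(replicate)  # computed separately for every replicate and condition
gate_counts = Counter(assign_gate(value, gates) for value in replicate)
print(gates)
print(sorted(gate_counts.items()))
```

      Because the cut points are recomputed per replicate and condition, each sorted subpopulation contains the same share of cells even when the overall intensity distribution shifts, for example after corrector treatment.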

      (2) Figure 2C improvements. Flip the figure 180 degrees to position MSD1 and NBD1 on the left, replace the blue-to-red color scale with yellow-to-blue or monochromatic scaling for better intermediate value differentiation.

      Respectfully, we prefer not to do this so that our figures can be easily compared across our previous and forthcoming publications. We chose this rendering because this view depicts certain trends in variant response more clearly. 

      (3) Indicate the location of ECL4 on the protein structure shown in Figure 2C for better reference.

      We appreciate the suggestion. However, most of ECL4 is missing from the experimental cryo-EM models of CFTR due to a lack of density. For this reason, we did not modify the figure. 

      Reviewer 2 (Public review):

      In this work, the authors use deep mutational scanning (DMS) to examine the effect of the endogenous chaperone calnexin (CANX) on the plasma membrane expression (PME) and potential pharmacological stabilization of cystic fibrosis disease variants. This is important because there are over 1,700 loss-of-function mutations that can lead to the disease Cystic Fibrosis (CF), and some of these variants can be pharmacologically rescued by small-molecule "correctors," which stabilize the CFTR protein and prevent its degradation. This study expands on previous work to specifically identify which mutations affect sensitivity to CFTR modulators, and further develops the work by examining the effect of a known CFTR interactor, CANX, on PME and corrector response.

      Overall, this approach provides a useful atlas of CF variants and their downstream effects, both at a basal level as well as in the context of a perturbed proteostasis. Knockout of CANX leads to an overall reduced plasma membrane expression of CFTR, with CF variants located at the C-terminal domains of CFTR seemingly more affected than the others. This study then repeats their DMS approach, using PME as a readout, to probe the effect of either VX-445 or VX-445 + VX-661, which are two clinically relevant CFTR pharmacological modulators. I found this section particularly interesting for the community because the exact molecular features that confer drug resistance/sensitivity are not clear. When CANX is knocked out, cells that normally respond to VX-445 are no longer able to be rescued, and the DMS data show that these non-responders are CF variants that lie in the VX-445 binding site. Based on computational data, the authors speculate that NBD2 assembly is compromised, but that remains to be experimentally examined. Cells lacking CANX were also resistant to combinatorial treatment of VX-445 + VX-661, showing that these two correctors were unable to compensate for the lack of this critical chaperone.

      One major strength of this manuscript is the mass spectrometry data, in which 4 CF variants were profiled in parental and CANX KO cells. This analysis provides some explanatory power to the observation that the delF508 variant is resistant to correctors in CANX KO cells, which is because correctors were found not to affect protein degradation interactions in this context. Findings such as this provide potential insights into intriguing new hypotheses, such as whether the addition of another proteostasis regulator, such as a proteasome inhibitor, would facilitate a successful rescue. Taken together, the data provided can be generative to researchers in the field and may be useful in rationalizing some of the observed phenotypes conferred by the various CF variants, as well as the impact of CANX on those effects.

      To complete their analysis of CF variants in CANX KO cells, the researchers also attempted to relate their data, primarily based on PME, to functional relevance. They observed that, although CANX KO results in a large reduction in PME (~30% reduction), changes in the actual activation of CFTR (and resultant quenching of their hYFP sensor) were "quite modest." This is an important experiment and caveat to the PME data presented above since changes in CFTR activity do not strictly require changes in PME. In addition, small molecule correctors also do not drastically alter CFTR function in the context of CANX KO. The authors reason that this difference is due to a sort of compensatory mechanism in which the functionally active CFTR molecules that are successfully assembled in an unbalanced proteostasis system (CANX KO) are more active than those that are assembled with the assistance of CANX. While I generally agree with this statement, it is not directly tested and would be challenging to actually test.

      The selected model for all the above experiments was HEK293T cells. The authors then demonstrate some of their major findings in Fischer rat thyroid cell monolayers. Specifically, cells lacking CANX are less sensitive to rescue by CFTR modulators than the WT. This highlights the importance of CANX in supporting the maturation of CFTR and the dependence of chemical correctors on the chaperone. Although this is demonstrated specifically for CANX in this manuscript, I imagine a more general claim can be made that chemical correctors depend on a functional/balanced proteostasis system, which is supported by the manuscript data. I am surprised by the discordance between HEK293T PME levels and CFTR activity. The authors offer a reasonable explanation about the increase in specific activity of the mature CFTR protein following CANX loss.

      For the conclusions and claims relevant to CANX and CF variant surveying of PME/function, I find the manuscript to provide solid evidence to achieve this aim. The manuscript generates a rich portrait of the influence of CF mutations both in WT and CANX KO cells. While the focus of this study is a specific chaperone, CANX, this manuscript has the potential to impact many researchers in the broad field of proteostasis.

      We thank the reviewer for their thoughtful and comprehensive perspectives on the scope and relevance of this work.

      Reviewer 2 (Recommendations for the authors):

      While I did not identify any major weaknesses in this manuscript, I offer some suggestions below, as well as some conclusions to consider:

      (1) Missing period at the end of line 51.

      We thank the reviewer for catching this grammatical error and have added proper punctuation.

      (2) Figure S1 "repre-sent"??

      We have corrected this punctuation error.

      (3) Figure S2 missing parentheses A)

      We have corrected the punctuation error.

      (4) Figure S5, "B) The total ΔRMSD of the active conformation of NBD2 is shown for variants bound to VX-445. Red bars show increasing deviations from the native NBD2 conformation in the mutant models, and blue bars show how much VX-445 suppresses these conformational defects in NBD2."

      VX-445 should not bind/stabilize the G85E variant according to the calculations in Figure S5A. As a confirmation, it would be nice to see the calculated hypothetical effect of VX-445 in the G85E variant as performed for L1077P and N1303K. I also want to point out that G85E is referred to as being non-responsive in S5A, but then in S5D, N1303K is referred to as non-responsive, but this variant falls pretty far below the stabilized region calculated in S5A, right?

      We agree that it would be insightful to examine the RMSD changes in a non-responsive variant such as G85E. We added the G85E NBD2 ∆RMSD to Supplemental Figure S5B and a G85E ∆RMSD structure map as an additional subpanel at Supplemental Figure S5C. As the reviewer expected, VX-445 fails to confer any stability to G85E as shown by a lack of significant change in NBD2 ∆RMSD or any visible ∆RMSD throughout the structure. Finally, we acknowledge that N1303K falls below the stabilized region as calculated in S5A. However, we note that the binding energy only suggests it is likely to interact with the protein; this does not necessarily mean that binding will allosterically suppress conformational defects in NBD2. Moreover, this is simply an in silico calculation that does not necessarily capture all of the nuanced interactions in the cell (or lack thereof). We have corrected this in the Figure S5 caption, which reads as follows:

      “Maps of the change in RMSD between N1303K modeled with and without VX-445 show that few structural regions are stabilized by VX-445 for N1303K, which responds poorly to VX-445 in vitro.”

      (5) "stan-dard" standard?

      We have corrected this punctuation error.

      (6) Line 270, "these variants" is written twice

      We have corrected this typographical error.

      (7) Figure 6 B. What is being compared? The text writes "there are prominent differences in the activity of these variants [those with CANX] (two-way ANOVA, p = 3.8 × 10⁻²⁷)." Does this mean WT vs. delF508, P5L, V232D, T1036N, and I1366N combined? I have not seen a set of 5 variables compared to a single variable. Usually, it would be WT vs. DelF508, WT vs. P5L, WT vs. V232D...right? Maybe this is normal in this specific field. The same goes for the CANX knockout comparison "(two-way ANOVA, p = 0.06)."

      In this instance, the two-way ANOVA test evaluates whether there are differences in the half-lives of individual variants and/or systematic differences across the variant measurements in the knockout line relative to the parental cells. The test gives independent p-values for these two variables (variant and cell line). We chose this test because it makes it clear that, when the trends are considered together, one variable has a significant effect while the other does not.
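
      To illustrate how a two-way ANOVA separates the two variables, here is a minimal sketch. It assumes one measurement per variant and cell line (no replicates) and uses made-up half-life values; the function name `two_way_anova_f` is hypothetical, and this is not the analysis code used for the manuscript. It returns one F-statistic per factor; converting these to p-values would additionally require the F-distribution (for example `scipy.stats.f.sf`).

```python
def two_way_anova_f(table):
    """F-statistics for a two-way ANOVA without replication.

    table[i][j] holds the single measurement for row level i
    (e.g. a variant) and column level j (e.g. a cell line).
    """
    n_rows, n_cols = len(table), len(table[0])
    grand = sum(sum(row) for row in table) / (n_rows * n_cols)
    row_means = [sum(row) / n_cols for row in table]
    col_means = [sum(table[i][j] for i in range(n_rows)) / n_rows
                 for j in range(n_cols)]

    # Partition the total sum of squares into row, column and residual parts.
    ss_rows = n_cols * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n_rows * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in table for x in row)
    ss_err = ss_total - ss_rows - ss_cols

    df_rows, df_cols = n_rows - 1, n_cols - 1
    df_err = df_rows * df_cols
    ms_err = ss_err / df_err
    return {
        "F_rows": (ss_rows / df_rows) / ms_err,
        "F_cols": (ss_cols / df_cols) / ms_err,
        "df": (df_rows, df_cols, df_err),
    }


# Hypothetical half-lives: rows are variants, columns are (parental, knockout).
half_lives = [
    [10.0, 9.0],
    [20.0, 21.0],
    [30.0, 29.0],
]
result = two_way_anova_f(half_lives)
print(result)
```

      In this toy table the variant factor yields a much larger F-statistic than the cell-line factor, which is the kind of pattern described above: one variable has a clear effect while the other does not.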

      (8) Why don't the CFTR modulators rescue CFTR activity in the WT FRT monolayers?

      We thank the reviewer for this inquiry. Please note that compared to DMSO, VX-661 does significantly enhance the forskolin-mediated response of WT-CFTR (red asterisk). Treatments with VX-445 alone, VX-661+VX-445, or VX-661+VX-445+VX-770 showed no significant forskolin stimulation of WT-CFTR. These observations could be attributable to the brief period in which WT-CFTR cDNA is transiently transfected. However, it is not necessarily anticipated that modulators would enhance WT-CFTR function. Correctors and potentiators are designed to rescue processing and gating abnormalities, respectively. WT-CFTR channels do not exhibit such defects.

      In both constitutive overexpression systems and primary human airway epithelia, published literature demonstrates that prolonged exposure to CFTR modulators has resulted in variable consequences on WT-CFTR activity. For example, forskolin-mediated responsiveness of WT-CFTR is not altered by chronic application of VX-445 (PMID: 34615919) nor VX-770 (PMID: 28575328, 27402691, 37014818). In contrast, short-circuit current measurements show that forskolin stimulation of WT-CFTR is augmented by chronic treatment with VX-809 (PMID: 28575328), an analog of VX-661. Thus, our findings are congruent with observations reported by other groups.

      (9) General comment: As someone not familiar with the field, it would be nice to see the structures of VX-445 and VX-661 somewhere in the figures or at least in the SI.

      We appreciate this suggestion, but do not feel that we include enough structural analyses to justify a stand-alone figure for these purposes. The structures of these compounds are easily referenced on a variety of internet-based resources.

      (10) Weakness: As an ensemble, the data points CANX as required for plasma membrane expression, particularly those that lie in the C-terminal domain, but when considering individual CF variants, there is no clear trend. Similarly, when looking at the effect of the pharmacological correctors on PME, no variant strays from the linear trend.

      We generally agree that the predominant trend is a uniform decrease in CFTR PME across all variants and that individual variant effects are hard to generalize. Indeed, this latter point has been widely appreciated in the CF community for several decades. Our approach exposes this variability in detail, but we concede that we cannot yet fully interpret the full complexity of the trends.

      (11) Something to consider: Knockout of calnexin, a central ER chaperone, is going to set off the UPR, which in turn will activate the ISR and attenuate translation. From what I can tell, in general, all CF variant PME is decreased. Is this simply because less CF protein is being synthesized?

      The reviewer raises an excellent point. To investigate this possibility, we compared whole-cell proteomic data for the parental and knockout cell lines. Our analysis suggests there is no significant upregulation of proteins associated with UPR activation, as shown in Author response image 1. In fact, only proteins associated with the PERK branch of the UPR exhibit any statistically significant changes between these two cell lines across three biological replicates. Based on this consideration, we suspect any wider changes in ER proteostasis must be relatively subtle.

      Author response image 1.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      However, some methodological choices, such as the use of a 5-year sliding window to compute trend values, are insufficiently justified and under-explained. The paper also does not fully address disparities in data coverage across disciplines and time, which may affect the reliability of historical comparisons. Finally, minor issues in grammar and clarity reduce the overall polish of the manuscript.

      We thank the reviewer for pointing out the weaknesses of the manuscript. We addressed these comments in our response to Recommendations A and B. Minor grammar and clarity issues have also been addressed.

      Reviewer #2 (Public review):

      The first thing that comes to mind is the epistemic mechanism of the study. Why should there be a joint discussion combining internationalism and interdisciplinarity? While internationalism is the tendency to form multinational research teams to work on research projects, interdisciplinarity refers to the scope and focus of papers that draw inspiration from multiple fields. These concepts may both fall into the realm of diversity, but it remains unclear if there is any conceptual interplay that underlies the dynamics of their increase in research journals.

      We thank the reviewer for pointing out the lack of clarity in our decision to conduct a joint discussion of interdisciplinarity and internationalization.

      It is a well-known fact that team science has increased in importance over time. An important question then is whether teams have only grown in size and frequency or whether they have changed in other aspects. Interdisciplinarity and internationalization are two aspects in which teams could have changed.

      We revised the Introduction (Lines 68–70 of the revised manuscript) to address this matter.

      It is also unclear why internationalization is increasing. Although the authors have provided a few prominent examples in physics, such as CERN and LAGO, which are complex and expensive experimental facilities that demand collective efforts and investments from the global scientific community, whether some similar concerns or factors drive the growth of internationalism in other fields remains unknown. I can imagine that these concerns do not always apply in many fields, and the authors need to come up with some case studies in diverse fields with some sociological theory to support their empirical findings.

      We thank the reviewer for requesting further evidence concerning why our findings may be correct. Physics is an area where the need for extraordinary resources has naturally led to large international collaborative efforts. As we discuss in line 255 of the revised manuscript, this is actually also the case for biology. The Human Genome Project and subsequent projects have also required massive investments, leading to further internationalization.

      We believe that the drive toward internationalization in medicine has to do with the need to establish robust results that are not specific to a single country or medical system. Additionally, the impact of global epidemics such as Acquired Immunodeficiency Syndrome (AIDS) and Severe Acute Respiratory Syndrome (SARS) has also increased the need to involve researchers from around the world.

      The case for increased internationalization in the social sciences is, we believe, related to the desire to identify phenomena that extend beyond the Western, educated, industrialized, rich and democratic (WEIRD) societies.

      We have expanded the discussion around these points in lines 274–283 of the revised manuscript.

      The authors use Shannon entropy as a measure of diversity for both internationalism and interdisciplinarity. However, entropy may fail to account for the uneven correlations between fields, and the range of values changes when the number of categories changes. The science of science and scientometrics community has proposed a range of diversity indicators, such as the Rao-Stirling index and its derivatives. One obvious advantage of the RS index is that it explicitly accounts for the heterogeneous connections between fields, and the value ranges from 0 to 1. Using more state-of-the-art metrics to quantify interdisciplinarity may help strengthen the data analytics.

      We thank the reviewer for pointing out the need to provide a deeper discussion of the impact of different metrics on how disciplinary diversity is calculated. We chose Shannon’s entropy because it accounts for both richness (the number of distinct fields) and evenness (the balance of representation across fields). While measures such as the Rao-Stirling index can be very useful when considering disciplines at different levels of aggregation, we consider only level 0 Field-of-Study (FoS) tags, so that problem is not as much of a concern for our analysis.

      We have added a further clarification in lines 145–151 of the revised manuscript.
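
      The difference between the two measures can be sketched as follows. This is an illustrative example only: the field labels, the `distance` function, and both function names are hypothetical, and the Rao-Stirling form used here (the sum of p_i · p_j · d_ij over ordered pairs of distinct fields) is one common convention rather than the manuscript's definition.

```python
import math
from collections import Counter


def shannon_entropy(labels):
    """Shannon entropy (natural log) of the share of each label."""
    counts = Counter(labels)
    total = sum(counts.values())
    return -sum((c / total) * math.log(c / total) for c in counts.values())


def rao_stirling(labels, distance):
    """Sum of p_i * p_j * d_ij over ordered pairs of distinct labels."""
    counts = Counter(labels)
    total = sum(counts.values())
    shares = {label: c / total for label, c in counts.items()}
    return sum(shares[a] * shares[b] * distance(a, b)
               for a in shares for b in shares if a != b)


# Hypothetical level-0 field tags for the papers of three journals.
single_field = ["Physics"] * 4
even_split = ["Physics", "Biology", "Physics", "Biology"]
uneven_split = ["Physics", "Physics", "Physics", "Biology"]

print(shannon_entropy(single_field))   # no diversity
print(shannon_entropy(even_split))     # two fields, balanced
print(shannon_entropy(uneven_split))   # two fields, unbalanced

# With every pair of distinct fields treated as equally distant,
# Rao-Stirling depends only on the shares, just as entropy does.
print(rao_stirling(even_split, lambda a, b: 1.0))
```

      Entropy rises with both the number of fields (richness) and the balance among them (evenness). Rao-Stirling adds information only when the distances between fields differ, which is why it matters less when all tags sit at a single level of aggregation.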

      Reviewer #1 (Recommendations for the authors)

      Ambiguity in the Trend Calculation Methodology in Figure 4 and 5

      The manuscript uses a 5-year sliding window to calculate recent trends in interdisciplinarity (I<sub>d</sub>) and internationalization (I<sub>n</sub>), but the method is not clearly described. Could the authors clarify whether the trend is calculated by (1) performing linear regression on the index values over the past 5 years, (2) using the regression slope as the trend value, and (3) interpreting the sign and magnitude of the slope to indicate increasing, decreasing, or stable trends? Additionally, the rationale for choosing a 5-year window over other durations (e.g., 10 or 15 years) is not discussed. Given that different time windows could yield different insights, a brief justification or sensitivity check would strengthen the methodological transparency.

      Thank you for pointing out the lack of clarity in our description. To increase clarity, we added a specific case study illustrating the use of the 5-year trend to the section "Estimation of tendency" of the Supplementary Information (Lines 691–704 of the revised manuscript).

      Specifically, imagine we want to calculate the trend of the Interdisciplinarity Index for 2010 for Annalen der Physik. We would perform an ordinary least squares linear fit to the 6 data points for the Index in years 2005–2010.
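
      The calculation described above can be sketched as follows. This is a minimal illustration, not the code used for the manuscript: the function names and the index values are hypothetical, and the trend is taken to be the slope of the ordinary least squares line through the 6 yearly values.

```python
def ols_slope(xs, ys):
    """Slope of the ordinary least squares line through (xs, ys)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def trend_at(index_by_year, year, window=5):
    """Trend for `year`: OLS slope over the years year-window .. year."""
    years = list(range(year - window, year + 1))  # 6 points when window=5
    values = [index_by_year[y] for y in years]
    return ols_slope(years, values)


# Hypothetical index values for one journal, rising by 0.02 per year.
index_by_year = {2005 + k: 0.50 + 0.02 * k for k in range(6)}
trend_2010 = trend_at(index_by_year, 2010)
print(trend_2010)
```

      A positive slope indicates an increasing index over the window, a negative slope a decreasing one, and a slope near zero a stable one.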

      The reason to focus on a 5-year window is twofold. First, a longer time period would, as suggested by the data in Figure S10, likely aggregate over multiple trends. Second, a shorter time period would result in too great an uncertainty in the estimation of the trend.

      This is the reason why we did not implement a sensitivity analysis. Reasonable time windows that consider the two reasons expressed above would be too narrow to provide a worthwhile analysis.

      Lack of Discussion on Temporal Coverage Disparities Across Disciplines

      The study spans publications from 1900 to 2021, but the completeness and representativeness of the data – especially in earlier decades – may differ significantly across disciplines. For instance, OpenAlex has limited coverage for publications before the mid-20th century, and disciplines such as Medicine and Political Science may have adopted journal-based publishing at different historical periods compared to Physics or Chemistry. These temporal disparities could bias cross-disciplinary comparisons of long-term trends in interdisciplinarity and internationalization. I recommend that the authors briefly discuss this limitation and, if possible, report when coverage becomes reliable for each discipline. A sensitivity analysis starting from a common baseline year (e.g., 1950 or 1970) could also help assess whether the observed disciplinary differences are driven in part by unequal temporal data availability.

      We thank the reviewer for requesting further clarification on this matter. We completely agree that “completeness and representativeness of the data – especially in earlier decades – may differ significantly across disciplines”. That is exactly the reason why we made the analysis choices described in the manuscript.

      Indeed, we consider only three journals for the analysis of the entire 1900–2021 period. Those 3 journals, Nature, PNAS and Science, are ones that we know to be well recorded.

      When conducting the disciplinary analysis, we focus on the period 1960–2021. While we know that the coverage for the social sciences is less robust until the 1990s, we address this concern by implementing several safeguards:

      Manual selection of representative journals in each discipline to ensure that their publications are well represented in OpenAlex.

      Decade-by-decade analysis of interdisciplinarity and internationalization so that changes over time can be identified and potential issues with data coverage are restricted to only some aspects of the analysis.

      We also acknowledge the potential coverage disparities in earlier years of the data source (Lines 319-326 of the revised manuscript).

      The authors use both interdisciplinarity and multidisciplinarity. While these concepts offer similar definitions of diversity, it may help the reader if there is some explanation to clarify their subtle differences. (Reviewer #2)

      It is a well-known fact that team science has increased in importance over time. An important question then is whether teams have only grown in size and frequency or whether they have changed in other aspects. Interdisciplinarity and internationalization are two aspects in which teams could have changed.

      We revised the Introduction (Lines 68–70 of the revised manuscript) to address this matter.

      Minor Comments

      Several sentences and phrases throughout the manuscript could be improved for grammatical accuracy, clarity, and stylistic appropriateness:

      (1) Line 11: The phrase “authors form multiple countries” contains a typographical error. The word “form” should be corrected to “from” so that the sentence reads: “authors from multiple countries.”

      (2) Line 63: The clause “these expansion is well described by a logistic model” contains a subject-verb agreement error. “These” should be replaced by the singular demonstrative pronoun “this”, resulting in: “This expansion is well described by a logistic model.”

      (3) Line 89: The phrase “were quickly overcame” misuses the verb form. “Overcame” is a past tense form and should be replaced with the past participle “overcome” to match the passive construction. Suggested revision: “were quickly overcome.”

      (4) Line 106: The verb “refered” is misspelled. It should be corrected to “referred” for proper past tense. The corrected phrase should read: “we referred to...”

      (5) Line 127: The phrase “sing discipline papers” contains a typographical error. “Sing” should be “single”, yielding: “single discipline papers.”

      (6) Lines 238–239: The sentence “An exception to this pattern are the two mega open-access journals: PLOS One and Scientific Reports, which have internationalization indices as high the the most internationalized Physics journals.” contains multiple grammatical issues.

      First, the subject “An exception” is singular, but the verb “are” is plural; this results in a subject-verb agreement error.

      Second, the phrase “the the” includes a typographical repetition.

      Third, the comparative construction is incomplete; “as high the the...” is ungrammatical and should use “as high as.”

      Suggested revision: “An exception to this pattern is the pair of mega open-access journals— PLOS One and Scientific Reports—which have internationalization indices as high as those of the most internationalized Physics journals.”

      (7) Line 254: The sentence “biological research been revolutionized...” lacks an auxiliary verb. To be grammatically correct, it should read: “biological research has been revolutionized...”

      (8) Line 258: The phrase “need global spread of...” is syntactically awkward. Depending on the intended meaning, it could be revised to either “the global spread of...” or “the global need for the spread of...” for clarity.

      (9) Figure S2 Caption: The term “Microsofe Academic Graph” is a typographical error and should be corrected to “Microsoft Academic Graph.”

      (10) Reference [40]: The link “ttps://doi.org/10.1038/nature02168” is missing the “h” in “https.” The corrected version is: “https://doi.org/10.1038/nature02168.”

      We appreciate your comments on the grammar and clarity of the manuscript. We have thoroughly reviewed and corrected these issues to improve the overall clarity of the text.

      Line 11: We changed the typo “form” to “from”.

      Line 63: We changed the sentence to “There has been a significant expansion in the number of countries where scientists are publishing in selective journals”.

      Line 89 (Line 93 of the revised manuscript): We revised the sentence as suggested, and the revised sentence becomes “Even the significant impacts on publication rates of the two World Wars were quickly overcome, and exponential growth resumed. ”

      Line 106 (Line 110 of the revised manuscript): We changed the typo “refered” to “referred”.

      Line 127 (Line 131 of the revised manuscript): We changed the typo “Sing” to “single”.

      Lines 238-239 (Lines 245-247 of the revised manuscript): We thank the reviewer for pointing out these issues. We adopted the reviewer’s suggested version and changed the original sentence to “An exception to this pattern is the pair of mega open-access journals — PLOS One and Scientific Reports — which have internationalization indices as high as those of the most internationalized Physics journals”.

      Line 254 (Line 262 of the revised manuscript): We added the auxiliary verb to the sentence, and the sentence now becomes “biological research has been revolutionized”

      Line 258 (Line 266 of the revised manuscript): We changed the phrase to “the global need for the spread of”.

      Figure S2 Caption: We corrected the typo to “Microsoft Academic Graph”.

      Reference [40]: We corrected the URL of the reference.

      Reviewer #2 (Recommendations for author):

      Some typos:

      (1) Page 2: On page 2, “contributions from a multiple disciplines” and “these expansion is well described”.

      (2) Page 4: “World Wars were quickly overcame”.

      (3) Page 5: “to quantify the the internationalization of a journal”.

      (4) Page 10: “indices as high the the most internationalized Physics journals”

      (5) Page 10: The sentence “indices as high the the most internationalized Physics journals” contains multiple issues. The phrase “the the” is a typographical error, and the comparative construction is incomplete. It should be revised to: “indices as high as those of the most internationalized Physics journals.”

      We corrected the typographical errors on pages 2, 4, 5, and 10 pointed out by the reviewer. We sincerely thank the reviewer for the critical examination of the manuscript’s syntax.

      Page 2: We removed “a” so now the sentence reads: “contributions from multiple disciplines.”

      Page 2: We changed the sentence to “There has been a significant expansion in the number of countries where scientists are publishing in selective journals”.

      Page 4: We replaced “overcame” with the past participle “overcome”, resulting in: “World Wars were quickly overcome.”

      Page 5: The phrase “to quantify the the internationalization of a journal” contains a typographical repetition. We changed it to: “to quantify the internationalization of a journal.”

      Page 10: For the sentence “indices as high the the most internationalized Physics journals”, we removed the duplicated “the” and revised the sentence to: “indices as high as those of the most internationalized Physics journals.”

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      (1) The authors should provide a detailed description of the pathogenesis of Haemorrhagic Fever with Renal Syndrome (HFRS) and elaborate on the crucial role of IgG proteins in the disease's progression (line 65).

      As suggested, we have now provided a detailed description of the pathogenesis of HFRS and elaborated on the crucial role of IgG proteins in the disease's progression:

      "Hantaviruses are tri-segmented, single-stranded, negative-sense RNA viruses, whose genomes consist of three regions: large (L), medium (M), and small (S). The glycoproteins Gn and Gc, encoded by the M segment, can infect target cells - primarily vascular endothelial cells - via β3 integrin receptors (Pizarro et al., 2019). Simultaneously, they could also infect other cell types, such as mononuclear macrophages and dendritic cells, leading to systemic viral infection. Although hantavirus replication is thought to occur primarily in the vascular endothelium without direct cytopathic effects, a plethora of innate immune cells mediate host antiviral defenses. These include natural killer cells, neutrophils, monocytes, and macrophages, together with pattern recognition receptors (PRRs), interferons (IFNs), antiviral proteins, and complement activation, e.g., via the pentraxin 3 (PTX3) pathway, which can exacerbate HFRS disease progression leading to immunopathological damage through cytokine/chemokine production, cytoskeletal rearrangements in endothelial cells, ultimately amplifying vascular dysfunction (Tariq & Kim, 2022). Rapid and effective humoral immune responses, however, such as neutralizing antibody responses targeting the glycoproteins Gn/Gc, contribute to rapid recovery from HFRS and are critical for protection from severe disease (Engdahl & Crowe, 2020; Li et al., 2020)." Please see the Introduction (Page 4, lines 65-81).

      (2) An additional discussion on the significance of glycosylation, particularly IgG N-glycosylation, in viral infections should be included in the Introduction section.

      Thank you for the suggestion. We have added an additional discussion on the significance of glycosylation in viral infections to the revised Introduction section.

      "Immunoglobulin G (IgG) N-linked glycosylation mediates critical functions modulating antiviral immunity during viral infection. Changes in the conserved N-linked glycan at Asn297 in the Fc region of IgG, typically by fucosylation, galactosylation, or sialylation, can alter antibody effector function. A reduction in core fucosylation increases IgG binding to NK cell FcγRIIIa and promotes antibody-dependent cellular cytotoxicity (ADCC) necessary for clearance of viruses, including SARS-CoV-2, dengue and HIV-1, whereas sialylation can attenuate immune responses, resulting in immune evasion (Ash et al., 2022; Haslund-Gourley et al., 2024; Hou et al., 2021; Wang et al., 2017). Changes in IgG and other protein N-linked glycosylation profiles therefore shape virus-host interactions and disease progression." (Page 4, lines 82-91).

      (3) In the abstract section, the authors state that HTNV-specific IgG antibody titers were detected and IgG N-glycosylation was analyzed. However, the analysis of plasma IgG N-glycans is described in the Methods section. Therefore, the authors should clarify the glycome analysis process. Was the specific IgG glycome profile similar to the total IgG N-glycome? Given the biological relevance of specific IgG in immunological diseases, characterizing the specific IgG N-glycome profile would be more significant than analyzing the total plasma IgG.

      We are grateful to the reviewer for the comments. Previous studies on viral infections have revealed that the pattern of virus-specific IgG N-glycans may be similar to that of the total IgG N-glycome, and we therefore analyzed the total plasma IgG glycosylation profile in the HFRS patients. We have discussed this limitation in the Discussion section.

      "Despite establishing a well-characterized patient cohort and performing systematic IgG glycosylation profiling based on HTNV NP antibody status, this study has several noteworthy limitations. Most notably, while preliminary comparisons suggested similar patterns between virus-specific and total IgG N-glycome, our total plasma IgG analysis may have introduced confounding factors in the observed associations. This methodological constraint could potentially affect the interpretation of certain disease-specific glycosylation signatures." Please see the Discussion (Page 12, lines 274-280). 

      References

      (1) Larsen MD, de Graaf EL, Sonneveld ME, et al. Afucosylated IgG characterizes enveloped viral responses and correlates with COVID-19 severity. Science. 2021 Feb 26;371(6532):eabc8378.

      (2) Chakraborty S, Gonzalez J, Edwards K, et al. Proinflammatory IgG Fc structures in patients with severe COVID-19. Nat Immunol. 2021 Jan;22(1):67-73.

      (3) Petrović T, Vijay A, Vučković F, et al. IgG N-glycome changes during the course of severe COVID-19: An observational study. EBioMedicine. 2022 Jul;81:104101.

      (4) Hou H, Yang H, Liu P, et al. Profile of Immunoglobulin G N-Glycome in COVID-19 Patients: A Case-Control Study. Front Immunol. 2021 Sep 23;12:748566.

      (4) Further details regarding the N-glycome analysis should be provided, including the quantity of IgG protein used and the methodology employed for analyzing IgG N-glycans (lines 286-287).

      As suggested, we have provided further details regarding the N-glycome analysis in the Methods section.

      "Briefly, the diluted plasma samples were transferred onto a 96-well protein G monolithic plate (BIA Separations, Slovenia) for the isolation of IgG. The isolated IgG was eluted with 1 mL of 0.1 M formic acid and was immediately neutralized with 170 µL of 1 M ammonium bicarbonate.

      The released N-glycans were labelled with 2-aminobenzamide (2-AB) and were then purified from a mixture of 100% acetonitrile and ultrapure water in a 1:1 ratio (v/v). This was then analyzed by hydrophilic interaction liquid chromatography using ultra-performance liquid chromatography (HILIC-UPLC; Waters Corporation, Milford, MA) (Hou et al., 2019). As previously reported, the chromatograms were separated into 24 IgG glycan peaks (GPs) (Menni et al., 2018)." Please see the Methods section (Page 15, lines 346-355).

      (5) Additional statistical analyses should be performed, including multiple comparisons with p-value adjustment, false discovery rate (FDR) control, and Pearson correlation (line 291).

      As suggested, we have performed additional statistical analyses and mentioned the results in the revised manuscript.

      "Positive correlations were observed between the ASM subsets and both galactosylation (p=0.017, r<sub>s</sub>=0.418) and sialylation (p=0.008, r<sub>s</sub>=0.458) in the antibody Fc region, as well as between the PB subsets and sialylation (p=0.036, r<sub>s</sub>=0.372) (Figure 4A-C)." (Page 8, lines 180-183).

      "The Benjamini-Hochberg (BH) method was used to adjust the raw p-values from DEG analysis, controlling the false discovery rate (FDR)." Please see the Materials and Methods (Page 16, lines 369-371).

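      For readers unfamiliar with the Benjamini-Hochberg adjustment, here is a minimal sketch of the standard step-up procedure. The function name and the raw p-values are hypothetical; this is not the code used for the DEG analysis, which was presumably run with an existing statistics package.

```python
def benjamini_hochberg(p_values):
    """Return BH-adjusted p-values in the same order as the input."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values for four genes.
raw = [0.01, 0.04, 0.03, 0.005]
adjusted = benjamini_hochberg(raw)
print(adjusted)
```

      Each raw p-value is scaled by m/rank, where m is the number of tests and rank is its position in ascending order, and the running minimum keeps the adjusted values non-decreasing with the raw ones. Genes whose adjusted value falls below the chosen FDR threshold are reported as significant.
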
      (6) Quality control should be conducted prior to the IgG N-glycome analysis. Additionally, both biological and technical replicates are essential to assess the reproducibility and robustness of the methods.

      Thank you for the suggestion. We have added descriptions of the biological and technical replicates to the Methods section.

      "Our study incorporated both biological and technical replicates to ensure a robust glycomic profiling analysis. Specifically, we analyzed paired acute/convalescent-phase samples from 65 confirmed HFRS patients to assess inter-individual biological variability, while technical reproducibility was validated through comparison with standard chromatographic peak plots (Vučković et al., 2016). This dual-replicate strategy enabled a comprehensive evaluation of both biological heterogeneity and assay precision." (Page 15, lines 356-362).

      (7) Multiple regression analysis should be conducted to evaluate the influence of genetic and environmental factors on the IgG N-glycome.

      As suggested, we have conducted multiple regression analysis to evaluate the influence of genetic and environmental factors on the IgG N-glycome. These results have been provided in the revised Results section.

      "Multivariate linear regression was employed to mitigate potential confounding by genetic and environmental factors in the glycomics analysis. While no significant associations were observed for most glycan models (fucosylation, p=0.526; bisecting GlcNAc, p=0.069; and sialylation, p=0.058), we discovered sex showed a potentially influential effect on galactosylation (p=0.001) (Supplementary files 5-8). These results suggest that while most glycan features appear unaffected by the examined covariates, galactosylation may be subject to sex-specific biological regulation." (Page 7, lines 153-160).

      (8) Line 196. Additional discussions should be included, focusing on the underlying correlation between the differential expression of B-cell glycogenes and the dysregulated IgG N-glycome profile, as well as the potential molecular mechanisms of IgG N-glycosylation in the development of HFRS.

      Thank you for your suggestions. We have added this content to the Discussion section.

      "Antibody-related glycogenes are significantly activated following Hantaan virus infection. We noted that ribophorin I and II (RPN1 and RPN2) were significantly upregulated in the ASM/IM/PB/RM subsets after Hantaan virus infection, which linked the high mannose oligosaccharides with asparagine residues found in the Asn-X-Ser/Thr consensus motif (Hwang et al., 2025). We speculate that they continuously attach the synthesized glycan chains to the constant region of antibodies during antibody synthesis. Similarly, fucosyltransferase 8 (FUT8) in the ASM subset, catalyzing the alpha1-2, alpha1-3, and alpha1-4 fucose addition (Wang & Ravetch, 2019; Yang et al., 2015), was downregulated in the mRNA translation, and the levels of fucosylated antibodies were naturally lower in the acute HFRS patients. Meanwhile, the beta-1,4-galactosyltransferase (beta4GalT) gene expression was significantly elevated in the ASM subpopulation during the acute phase, which also correlated with increased levels of galactosylated antibodies in serum (Wang & Ravetch, 2019). However, we did not observe significant upward changes in sialyltransferase mRNA expression in the acute HFRS patients, similar with the finding from severe COVID-19 cohorts (Haslund-Gourley et al., 2024). The neuraminidase 1 (NEU1) gene is strikingly upregulated and may potentially explain the decreased sialylation on the secreted HTNV-specific IgG antibodies during convalescence. Overall, the glycosylation of immunoglobulin G is regulated by a large network of B-cell glycogenes during HTNV infection." Please see the Discussion (Page 11, lines 254-273).

      Reviewer #2 (Public review):

      (1) While it is great to reference prior publications in the Materials and Methods section, the current level of detail is insufficient to clearly understand the study design and experimental procedures performed. Readers should not be expected to consult multiple previous papers to grasp the core methodological aspects of the present paper. For instance, the categorization of HFRS patients into different clinical subtypes/courses, and the methods for measuring Fc glycosylation should be explicitly described in the Materials and Methods section of this manuscript.

      Many thanks for your comments. We have added more details regarding the study design and experimental procedures in the Materials and Methods section. "Clinical specimens were collected from HFRS patients who were hospitalized in Baoji Central Hospital between October 2019 and January 2022. Patients were categorized into four clinical subtypes (mild, moderate, severe, and critical) based on the diagnostic criteria for HFRS issued by the Ministry of Health (Ma et al., 2015). This study was approved by the ethics committee of the Shandong First Medical University & Shandong Academy of Medical Sciences (R201937). Written informed consent was obtained from each participant or their guardians.

      The clinical course of HFRS is grouped into acute (febrile, hypotensive, and oliguric stages) and convalescent (diuretic and convalescent stages) phases. The acute phase was defined as within 12 days of illness onset, and the convalescent phase was defined as a period of illness lasting 13 days or longer (Tang et al., 2019; Zhang et al., 2022). The earliest sample was selected if there were multiple blood samples available in the acute phase and the last available sample before discharge was selected if there were multiple blood samples in the convalescent phase.

      Briefly, the diluted plasma samples were transferred onto a 96-well protein G monolithic plate (BIA Separations, Slovenia) for the isolation of IgG. The isolated IgG was eluted with 1 mL of 0.1 M formic acid and was immediately neutralized with 170 µL of 1 M ammonium bicarbonate.

      The released N-glycans were labelled with 2-aminobenzamide (2-AB) and were then purified from a mixture of 100% acetonitrile and ultrapure water in a 1:1 ratio (v/v). This was then analyzed by hydrophilic interaction liquid chromatography using ultra-performance liquid chromatography (HILIC-UPLC; Waters Corporation, Milford, MA) (Hou et al., 2019). As previously reported, the chromatograms were separated into 24 IgG glycan peaks (GPs) (Menni et al., 2018)." Please see the Materials and Methods (Page 13, lines 290-303, and Page 15, lines 346-355).
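
      The phase definition and sample-selection rule quoted above (acute: within 12 days of illness onset; convalescent: 13 days or longer; earliest acute sample and last convalescent sample retained) can be sketched as follows. This is only an illustration: the `Sample` record and its field names are hypothetical, and it assumes every listed sample was drawn before discharge.

```python
from dataclasses import dataclass

ACUTE_MAX_DAY = 12  # acute phase: within 12 days of illness onset (as stated in the text)


@dataclass
class Sample:
    patient_id: str       # hypothetical identifier
    day_of_illness: int   # days since illness onset (hypothetical field)


def phase(day_of_illness: int) -> str:
    """Classify a sampling day as 'acute' or 'convalescent'."""
    return "acute" if day_of_illness <= ACUTE_MAX_DAY else "convalescent"


def select_samples(samples):
    """Keep, per patient, the earliest acute and the latest convalescent sample."""
    chosen = {}
    for s in samples:
        per_patient = chosen.setdefault(s.patient_id, {})
        p = phase(s.day_of_illness)
        current = per_patient.get(p)
        if current is None:
            per_patient[p] = s
        elif p == "acute" and s.day_of_illness < current.day_of_illness:
            per_patient[p] = s  # an earlier acute sample replaces a later one
        elif p == "convalescent" and s.day_of_illness > current.day_of_illness:
            per_patient[p] = s  # a later convalescent sample replaces an earlier one
    return chosen
```

      For a patient sampled on days 3, 9, 14, and 20, this keeps the day-3 sample for the acute phase and the day-20 sample for the convalescent phase.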

      (2) The authors should explain the nature of their cohort in a bit more detail. While it appears that HFRS cases were identified based on IgM ELISA and/or PCR, these are indicators of the Hantaan virus infection. My understanding is that not all Hantaan virus infections progress to HFRS. Thus, it is unclear whether all patients in the HFRS group actually had hemorrhagic fever. This distinction is critical for interpreting how the results observed relate to disease severity.

      We are sincerely grateful for this valuable suggestion. We have carefully revised Figure 1 and the texts (Page 5, lines 104-107) in the revised manuscript.

      "To characterize the humoral immune profiles in HFRS patients, we enrolled 166 suspected HTNV-infected patients who were admitted to Baoji Central Hospital in Shaanxi Province, China, between October 2019 and January 2022. Among them, 65 met the inclusion criteria and were included in the study (Figure 1)."

      (3) The authors state that: "A 4-fold or greater increase in HTNV-NP-specific antibody titers usually indicates a protective humoral immune response during the acute phase", but they do not cite any references or provide any context that supports this claim. Given that in their own words, one of the most significant findings in the study is changes in glycosylation coinciding with this 4-fold increase, it is important to ground this claim in evidence. Without this, the use of a 4-fold threshold appears arbitrary and weakens the rationale for using this immune state as a proxy for protective immunity.

      Thank you for the suggestion and we have provided relevant references in the Results section (Page 8, lines 171-173).

      According to the Expert Consensus on Prevention and Treatment of Hemorrhagic Fever with Renal Syndrome (HFRS) (https://ts-cms.jundaodsj.com/file/163823638693909.pdf), a confirmed diagnosis requires, based on a suspected or clinical diagnosis, one of the following: positive serum-specific IgM antibodies, detection of Hantavirus RNA in patient specimens, a four-fold or greater rise in titer of serum-specific IgG antibodies in the convalescent phase compared to the acute phase, or isolation of Hantavirus from patient specimens. A four-fold or greater rise in titer of convalescent serum-specific IgG antibodies compared to the acute phase not only suggests a recent Hantaan virus infection, but also the production of antibodies helping to combat the viral infection. In addition, the antibody glycosylation modifications may thus play a significant role in the antiviral immune response.
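
      The four-fold criterion amounts to a simple ratio check on paired titers. The sketch below assumes titers are expressed as reciprocal dilutions (for example, 1:400 is entered as 400); the function names are illustrative and are not taken from the consensus document.

```python
def fold_rise(acute_titer: float, convalescent_titer: float) -> float:
    """Ratio of convalescent to acute titer (both as reciprocal dilutions)."""
    if acute_titer <= 0:
        raise ValueError("acute titer must be positive")
    return convalescent_titer / acute_titer


def is_fourfold_rise(acute_titer: float, convalescent_titer: float,
                     threshold: float = 4.0) -> bool:
    """True when the convalescent titer is at least `threshold` times the acute titer."""
    return fold_rise(acute_titer, convalescent_titer) >= threshold
```

      A rise from 1:100 to 1:400 meets the criterion, whereas a rise from 1:100 to 1:200 does not.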

      (4) The authors also claim that changes in Fc glycosylation influence recovery from HFRS - a point even emphasized in the manuscript title. However, this conclusion is not well supported by the data for two main reasons. First, the authors appear to measure bulk IgG Fc glycans, not Fc glycans of Hantaan virus-specific antibodies. While reasonable, this is something that should be communicated in the manuscript. Hantaan virus-specific antibodies are likely a very small fraction of total circulating IgG antibodies (perhaps ~1%), even during acute infection. As a result, changes in bulk Fc glycosylation may (or may not) accurately reflect the glycosylation state of Hantaan virus-specific antibodies. Second, even if the bulk Fc glycan shifts do mirror those of Hantaan virus-specific antibodies, it remains unclear whether these changes causally drive recovery or are merely a consequence of the infection being resolved. Thus, while the differences in Fc glycosylation observed are interesting - and it is tempting to speculate on their functional significance - the manuscript treats the observed correlations as causal mechanistic insight without sufficient data or justification.

      Thank you for your valuable comments. This study measured bulk IgG Fc glycans, not Fc glycans of Hantaan virus-specific antibodies. We have described this limitation in the Discussion section (Page 12, lines 274-280). As reported in previous studies (references provided below), the changed pattern of virus-specific IgG N-glycans may reflect the total IgG N-glycome. Nevertheless, more studies are clearly needed to directly measure virus-specific IgGs and to clarify the causal mechanistic insights.

      References

      (1) Larsen MD, de Graaf EL, Sonneveld ME, et al. Afucosylated IgG characterizes enveloped viral responses and correlates with COVID-19 severity. Science. 2021 Feb 26;371(6532):eabc8378.

      (2) Chakraborty S, Gonzalez J, Edwards K, et al. Proinflammatory IgG Fc structures in patients with severe COVID-19. Nat Immunol. 2021 Jan;22(1):67-73.

      (3) Petrović T, Vijay A, Vučković F, et al. IgG N-glycome changes during the course of severe COVID-19: An observational study. EBioMedicine. 2022 Jul;81:104101.

      (4) Hou H, Yang H, Liu P, et al. Profile of Immunoglobulin G N-Glycome in COVID-19 Patients: A Case-Control Study. Front Immunol. 2021 Sep 23;12:748566.

      (5) Fc glycosylation is known to be influenced by covariates such as age and sex. While it is helpful that the authors stratified the patients by age group and looked for significant differences in glycosylation across them, a more robust approach would be to directly control for these covariates in the statistical analysis - such as by using a linear mixed effects model, in which disease state (e.g., acute vs. convalescent), age, and sex are treated as fixed effects, and subject ID is included as a random effect to account for repeated measures. This would allow the authors to assess whether observed differences in Fc glycosylation remain significant after accounting for potential confounders. This could be important given that some of the reported differences are quite small, for example, 94.29% vs. 94.89% fucosylation.

      Thank you for your valuable suggestion. As suggested, we have conducted a multiple regression analysis to evaluate the influence of genetic and environmental factors on the IgG N-glycome, and have provided these results in the revised Results section.

      "Multivariate linear regression was employed to mitigate potential confounding by genetic and environmental factors in the glycomics analysis. While no significant associations were observed for most glycan models (fucosylation, p=0.526; bisecting GlcNAc, p=0.069; and sialylation, p=0.058), we discovered sex showed a potentially influential effect on galactosylation (p=0.001) (Supplementary files 5-8). These results suggest that while most glycan features appear unaffected by the examined covariates, galactosylation may be subject to sex-specific biological regulation." (Page 7, lines 153-160).

      (6) The manuscript states that there are limited studies on antibody glycosylation in the context of HFRS, but does not cite any relevant literature. If prior work exists, it should be cited to contextualize the current study. If no prior studies have been conducted/reported, to the author's knowledge, that should be stated explicitly to show the novelty of the work.

      Thank you for your suggestion. To our knowledge, there have been no prior reports regarding the regulation of IgG glycosylation in HFRS, particularly in relation to seroconversion. We have reworded this sentence in the revised manuscript. "Importantly, there have not been prior studies specifically examining plasma IgG N-glycome profiles derived from chromatographic peak data in HFRS patients, particularly in relation to seroconversion status. This gap in our knowledge motivated our systematic investigation of both total and virus-specific IgG glycosylation dynamics during acute infection." Please see the Introduction (Page 5, lines 92-96).

      Reviewer #2 (Recommendations for the authors):

      Minor points:

      (1) Line 47, 78: The use of the word 'However' appears to be an incorrect expression.

      We have made this correction.

      (2) Line 127: The term 'glycome' should be replaced with 'N-glycome,' and all relevant expressions should be corrected accordingly, such as 'N-glycosylation'.

      We have made this correction.

      (3) Line 84-87: The sentence 'A total of 166 HFRS patients...' contains a grammatical error.

      We have made this correction (Page 5, lines 99-101).

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review): 

      Summary: 

      In this manuscript, the authors describe a good-quality ancient maize genome from 15th-century Bolivia and try to link the genome characteristics to Inca influence. Overall, the manuscript is below the standard in the field. In particular, the geographic origin of the sample and its archaeological context is not well evidenced. While dating of the sample and the authentication of ancient DNA have been evidenced robustly, the downstream genetic analyses do not support the conclusion that genomic changes can be attributed to Inca influence. Furthermore, sections of the manuscript are written incoherently and with logical mistakes. In its current form, this paper is not robust and possibly of very narrow interest. 

      Strengths: 

      Technical data related to the maize sample are robust. Radiocarbon dating strongly evidenced the sample age, estimated to be around 1474 AD. Authentication of ancient DNA has been done robustly. Spontaneous C-to-T substitutions, which are present in all ancient DNA, are visible in the reported sample with the expected pattern. Despite a low fraction of C-to-T at the 1st base, this number could be consistent with the cool and dry climate in which the sample was preserved. The distribution of DNA fragment sizes is consistent with expectations for a sample of this age. 

      Weaknesses: 

      Thank you for all your thoughtful comments. See below for comments on each.

      (1) Archaeological context for the maize sample is weakly supported by speculation about the origin and has unreasonable claims weighing on it. Perhaps those findings would be more convincing if the authors were to present evidence that supports their conclusions: i) a map of all known tombs near La Paz, ii) evidence supporting the stone tomb origins of this assemblage, and iii) evidence supporting non-Inca provenance of the tomb. 

      We believe we are clear about what information we have about context. First, the intake records from the MSU Museum from 1890 are not as detailed as we would like, but we cannot enhance them. The mummified girl and her accoutrements, including the maize, came from a stone tower or chullpa south of La Paz, in what is now Bolivia. We do not know which stone chullpa, so a map would be of limited use. The mortuary group is identified as Inca, but, as we note, the accoutrements do not appear to be of high status, so it is possible that she is not an elite. Mud tombs are normally attributed to the local population, and stone towers to Inca or elites. We have clarified at multiple places in the text that the maize is from the period of Inca incursion in this part of Bolivia and have modified the text to reflect greater uncertainty about Inca or local origin, while maintaining that selection for environmentally favorable characteristics had taken place. Regardless, there are three 15th-century CE AMS ages: on the maize, a Cucurbita rind, and a camelid fiber. The maize is almost certainly mid to late 15th century CE.

      (2) Dismissal of the admixture in the reported samples is not evidenced correctly. Population f3 statistic with an outgroup is indeed one of the most robust metrics for sample relatedness; however, it should not be used as a test of admixture. For an admixture test, the population f3 statistic should be used in the form: i) target population, ii) one possible parental population, iii) another possible parental population. This is typically done iteratively with all combinations of possible parental populations. Even in such a form, the population f3 statistic is not very sensitive to admixture in cases of strong genetic drift, and instead population f4 statistic (with an outgroup) is a recommended test for admixture. 

      We have removed “Our admixture f3-statistics test results suggest aBM is not admixed” from our revised manuscript. Because our goal here is to identify which group(s) have the highest relatedness with aBM, the population f3 statistic with an outgroup is the most robust metric for this test and for supporting our conclusion.
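
      The distinction between the two uses of the f3 statistic can be illustrated with a minimal sketch. It assumes per-SNP allele frequencies are already available for each population as equal-length lists, and it deliberately omits the finite-sample bias correction and the block-jackknife standard errors described by Patterson et al. (2012); the function name is illustrative only.

```python
def f3(target, pop_a, pop_b):
    """Naive f3(target; A, B): mean over SNPs of (t - a) * (t - b).

    With an outgroup as `target`, larger values indicate more shared drift
    between A and B (the relatedness use). With a candidate admixed
    population as `target`, a clearly negative value is evidence of
    admixture between sources related to A and B (the admixture-test use).
    """
    if not target or not (len(target) == len(pop_a) == len(pop_b)):
        raise ValueError("allele-frequency lists must be non-empty and of equal length")
    products = [(t - a) * (t - b) for t, a, b in zip(target, pop_a, pop_b)]
    return sum(products) / len(products)
```

      For example, a target with frequencies [0.5, 0.5] lying between sources [0.0, 1.0] and [1.0, 0.0] gives a negative value (-0.25), the admixture-like signature, while an outgroup fixed for the other allele relative to two identical populations gives a large positive value.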

      (3) The geographic placement of the sample based on genetic data is not robust. To make use of the method correctly, it would be necessary to validate that genetic samples in this region follow the assumption of the 'isolation-by-distance' with dense sampling, which has not been done. Additionally, the authors posit that "This suggests that aBM might not only be genetically related to the archaeological maize from ancient Peru, but also in the possible geographic location." The method used to infer the location is based on pure genetic estimation. The above conclusion is not supported by this method, and it directly contradicts the authors' suggestion that the sample comes from Bolivia.  

      We understand that it is necessary to validate the 'isolation-by-distance' assumption with dense sampling. We did not do so because: 1) the ancient maize samples range in age from ~5000 BP to ~100 BP and were found in very different countries at different times; 2) isolation-by-distance is a population genetic concept that is often used to test whether populations that are geographically farther apart are also more genetically different. Considering that we only have 17 ancient samples in total, our sample size is not sufficient for a large population-level test.

      For “It directly contradicts the authors' suggestion that the sample comes from Bolivia”, as we described in our manuscript, “Given the provenience of the aBM and its age, it is possible the samples were local or alternatively were introduced into western highland Bolivia from the Inca core area – modern Peru.” The sample records show that the aBM sample was found in Bolivia, but we do not know where aBM originally came from before it was found there. To answer this question, we used locator.py to predict the potential geographic location that aBM may have originally come from, and our results showed that the predicted location is inside modern Peru and is also very close to archaeological Peruvian maize.

      Therefore, our conclusion that “This suggests that aBM might not only be genetically related to the archaeological maize from ancient Peru, but also in the possible geographic location” does not contradict the fact that the sample was found in Bolivia.

      (4) The conclusion that Ancient Andean maize is genetically similar to European varieties and hence shares a similar evolutionary history is not well supported. The PCA plot in Figure 4 merely represents sample similarity based on two components (jointly responsible for about 20% of the variation explained), and European samples could be very distant based on other components. Indeed, the direct test using the outgroup f3 statistic does not support that European varieties are particularly closely related to ancient Andean maize. Perhaps these are more closely related to Brazil? We do not know, as this has not been measured. 

      Our conclusion is “We also found that a few types of maize from Europe have a much closer distance to the archaeological maize cluster compared to other modern maize, which indicates maize from Europe might expectedly share certain traits or evolutionary characteristics with ancient maize. It is also consistent with the historical fact that maize spread to Europe after Christopher Columbus's late 15th century voyages to the Americas. But as shown, maize also has diversity inside the European maize cluster. It is possible that European farmers and merchants may have favored different phenotypic traits, and the subsequent spread of specific varieties followed the new global geopolitical maps of the Colonial era”.

      We understand your concern that the two components explain only about 20% of the variation. However, as can be seen in Figure 2b of Grzybowski, M.W. et al. (2023), “the first principal component (PC1) of variation for genetic marker data roughly corresponded to the division between domesticated maize and maize wild relatives”, with PC1 accounting for only 1.3% of the variation. This shows that a low proportion of explained variation is quite common in maize, especially when the datasets include landraces, hybrids, and wild relatives. For our maize dataset, we have archaeological maize data ranging from ~5,000 BP to ~100 BP, and we also have modern maize, which makes the genetic structure of our data more complicated. Therefore, we think our two components are the best explanation currently possible. We also included a PCA plot based on components 1 and 3 in Fig4_PCA13.pdf. It does not show that the European samples are very distant.

      For “Perhaps these are more closely related to Brazil?”, thank you for this very good question, but we apologize that we cannot answer this question from our current study because our study focuses on identifying the location where aBM originally came from, establishing and explaining patterns of genetic variability of maize, with a specific focus on maize strains that are related to our current aBM. Thus, we will not explore the story between maize from Brazil and European maize in our current study.

      (5) The conclusion that long branches in the phylogenetic tree are due to selection under local adaptation has no evidence. Long branches could be the result of missing data, nucleotide misincorporations, genetic drift, or simply due to the inability of phylogenetic trees to model complex population-level relationships such as admixture or incomplete lineage sorting. Additionally, captions to Figure S3, do not explain colour-coding.  

      We have removed “aBM tends to have long branches compare to tropicalis maize, which can be explained by adaption for specific local environment by time.” in our revised manuscript.

      We have added the color-coding information under Fig. S3 in our revised manuscript.

      (6) The conclusion that selection detected in the aBM sample is due to Inca influence has no support. Firstly, a selection signature can be due to environmental or other factors. To disentangle those, the authors would need to generate the data for a large number of samples from similar cultural contexts and from a wide-ranging environmental context, followed by a formal statistical test. Secondly, allele frequency increase can be attributed to selection or demographic processes, and alone is not sufficient evidence for selection. The presented XP-EHH method seems more suitable. Overall, methods used in this paper raise some concerns: i) how accurate are allele-frequency tests of selection when only a single individual is used as a proxy for a whole population, ii) the significance threshold has been arbitrarily fixed to an absolute number based on other studies, but the standard is to use, for example, the top fifth percentile. Finally, linking selection to particular GO terms is not strong evidence, as correlation does not imply causation, and links are unclear anyway.

      In sum, this manuscript presents new data that seems to be of high quality, but the analyses are frequently inappropriate and/or over-interpreted. 

      Regarding your suggestion to generate data “from similar cultural contexts and from a wide-ranging environmental context, followed by a formal statistical test”, we apologize that this cannot be done in our current study because we could not find other archaeological maize samples/datasets that are from similar cultural contexts.

      For “Secondly, allele frequency increase can be attributed to selection or demographic processes, and alone is not sufficient evidence for selection.” Yes, we agree, and that’s why we said it “inferred” the conclusion instead of “indicated”. Furthermore, we revised the whole manuscript following all reviewers’ comments and reorganized and reduced the part on selection on aBM.

      For “The presented XP-EHH method seems more suitable”, we do not think XP-EHH is the best method that could be used here because we only have one aBM sample, but XP-EHH is more suitable for a population analysis.

      For “Finally, linking selection to particular GO terms is not strong evidence, as correlation does not imply causation, and links are unclear anyway.”, as we described in our manuscript, our results “inferred” instead of “indicated” the conclusion.
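
      The reviewer's remark about percentile-based significance thresholds can be illustrated with a short sketch of an empirical cutoff. This is not the thresholding used in the manuscript; it assumes a hypothetical list of per-SNP (or per-window) scores in which larger values indicate stronger evidence, and the function names are illustrative.

```python
import math


def top_percentile_threshold(scores, top_fraction=0.05):
    """Return the score at the boundary of the top `top_fraction` of values."""
    if not scores:
        raise ValueError("scores must be non-empty")
    if not 0 < top_fraction <= 1:
        raise ValueError("top_fraction must be in (0, 1]")
    ordered = sorted(scores, reverse=True)
    k = max(1, math.ceil(top_fraction * len(ordered)))  # number of values kept
    return ordered[k - 1]


def empirical_outliers(scores, top_fraction=0.05):
    """Indices of scores at or above the empirical top-percentile cutoff."""
    cutoff = top_percentile_threshold(scores, top_fraction)
    return [i for i, s in enumerate(scores) if s >= cutoff]
```

      Unlike a fixed absolute cutoff borrowed from other studies, such a threshold adapts to the score distribution of the dataset at hand, although it always flags the same fraction of loci regardless of whether selection is present.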

      Reviewer #2 (Public review): 

      Summary: 

      The manuscript presents valuable new datasets from two ancient maize seeds that contribute to our growing understanding of the maize evolution and biodiversity landscape in pre-colonial South America. Some of the analyses are robust, but the selection elements are not supported. 

      Strengths: 

      The data collection is robust, and the data appear to be of sufficiently high quality to carry out some interesting analytical procedures. The central finding that aBM maize is closely related to maize from the core Inca region is well supported, although the directionality of dispersal is not supported. 

      Weaknesses: 

      Thank you for your comments and suggestions. See below for responses and explanations.

      The selection results are not justified, see examples in the detailed comments below. 

      (1) The manuscript mentions cultural and natural selection (line 76), but then only gives a couple of examples of selecting for culinary/use traits. There are many examples of selection to tolerate diverse environments that could be relevant for this discussion, if desired. 

      We have added related examples with references supported in our revised manuscript.  

      (2) I would be extremely cautious about interpreting the observations of a Spanish colonizer (lines 95-99) without very significant caveats. Indigenous agriculture and food ways would have been far more nuanced than what could be captured in this context, and the genocidal activities of the Europeans would have impacted food production activities to a degree, and any contemporaneous accounts need to be understood through that lens.  

      We agree with the first part of this comment and have softened our use of this particular textual material such that it is far less central to interpretation. While of interest, we cannot evaluate the impact of colonial European activities or observational bias for purposes of this analysis.

      (3) The f3 stats presented in Figure 2 are not set up to test any specific admixture scenarios, so it is unsupported to conclude that the aBM maize is not admixed on this basis (lines 201-202). The original f3 publication (Patterson et al, 2012) describes some scenarios where f3 characteristics associate with admixture, but in general, there are many caveats to this approach, and it's not the ideal tool for admixture testing, compared with e.g., f4 and D (abba-baba) statistics.  

      You make an important point that f3 statistics are not the ideal tool for admixture testing. Because our goal here is to identify which group(s) have the highest relatedness with aBM, the population f3 statistic with an outgroup is the most robust metric with which to do the test and to support our conclusion. We have removed “Our admixture f3-statistics test results suggest aBM is not admixed” from our revised manuscript.

      (4) I'm a little bit skeptical that the Locator method adds value here, given the small training sample size and the wide geographic spread and genetic diversity of the ancient samples that include Central America. The paper describing that method (Battey et al 2020 eLife) uses much larger datasets, and while the authors do not specifically advise on sample sizes, they caution about small sample size issues. We have already seen that the ancient Peruvian maize has the most shared drift with aBM maize on the basis of the f3 stats, and the Locator analysis seems to just be reiterating that. I would advise against putting any additional weight on the Locator results as far as geographic origins, and personally I would skip this analysis in this case.  

      As we described in our manuscript, we have 17 archaeological samples in total. Please find more detailed information from the “geographical location prediction” section.

      We cannot add more ancient samples because these are all that we could find in previous publications. We would still like to keep this analysis because the f3 statistics indicate genomic similarity, whereas the purpose of the locator.py analysis is to predict the location of origin of a genetic sample by comparing it to a set of samples of known geographic origin.

      (5) The overlap in PCA should not be used to confirm that aBM is authentically ancient, because with proper data handling, PCA placement should be agnostic to modern/ancient status (see lines 224-226). It is somewhat unexpected that the ancient Tehuacan maize (with a major teosinte genomic component) falls near the ancient South American maize, but this could be an artifact of sampling throughout the PCA and the lack of teosinte samples that might attract that individual.  

      We have removed “which supports the authenticity of aBM as archaeological maize” in our revised manuscript. The PCA was only applied for all maize samples, so we did not include any teosinte samples in the analysis.

      (6) What has been established (lines 250-251) is genetic similarity to the Inca core area, not necessarily the directionality. Might aBM have been part of a cultural region supplying maize to the Inca core region, for example? Without a specific test of dispersal directionality, which I don't think is possible with the data at hand, this is somewhat speculative. 

      We added this and re-wrote this part in our revised manuscript.

      (7) Singleton SNPs are not a typical criterion for identifying selection; this method needs some citations supporting the exact approach and validation against neutral expectations (line 278). Without Datasets S2 and S3, which are not included with this submission, it is difficult to assess this result further. However, it is very unexpected that ~18,000 out of ~49,000 SNPs would be unique to the aBM lineage. This most likely reflects some data artifact (unaccounted damage, paralogs not treated for high coverage, which are extremely prevalent in maize, etc). I'm confused about unique SNPs in this context. How can they be unique to the aBM lineage if the SNPs used overlap the Grzybowski set? The GO results do not include any details of the exact method used or a statistical assessment of the results. It is not clear if the GO terms noted are statistically enriched.  

      We have added references 53 and 54 in our revised manuscript, and we also uploaded the Datasets S2 and S3.

      For “I'm confused about unique SNPs in this context. How can they be unique to the aBM lineage if the SNPs used overlap the Grzybowski set?”, as we described in our Materials and Methods section, “To achieve potential unique selection on aBM, we calculated the allele frequency for each SNPs between aBM and other archaeological maize, resulting in allele frequency data for 49,896 SNPs. Of these, 18,668 SNPs were unique to aBM.” Thus, the unique SNPs for aBM came from the comparison of aBM with other archaeological maize, and we did not use any modern maize data from the Grzybowski set.

      For “The GO results do not include any details of the exact method used or a statistical assessment of the results. It is not clear if the GO terms noted are statistically enriched.” We did not perform GO term enrichment, so there are no statistical assessments of the results. Instead, we retained the GO term information for each gene by checking its biological process in MaizeGDB, and then summarized the results in Dataset S4.
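
      For context, the kind of statistical assessment the reviewer asks about is commonly a one-sided hypergeometric (Fisher-type) over-representation test. The sketch below shows that calculation only as an illustration; it was not performed in this study, and the parameter names and the choice of background gene set are assumptions.

```python
from math import comb


def enrichment_p_value(n_background, n_background_with_term,
                       n_selected, n_selected_with_term):
    """One-sided hypergeometric P(X >= observed) for GO term over-representation.

    n_background           : genes in the background set (assumed annotated genes)
    n_background_with_term : background genes annotated with the GO term
    n_selected             : genes in the candidate (e.g. selection-linked) set
    n_selected_with_term   : candidate genes annotated with the GO term
    """
    N, K, n, k = n_background, n_background_with_term, n_selected, n_selected_with_term
    if not (0 <= K <= N and 0 <= n <= N and 0 <= k <= min(K, n)):
        raise ValueError("inconsistent gene counts")
    # Sum the probabilities of drawing k, k+1, ..., min(K, n) annotated genes.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return tail / comb(N, n)
```

      In practice such p-values would also need correction for the number of GO terms tested.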

      (8) The use of XP-EHH with pseudo haplotype variant calls is not viable (line 293). It is not clear what exact implementation of XP-EHH was used, but this method generally relies on phased or sometimes unphased diploid genotype calls to observe shared haplotypes, and some minimum population size to derive statistical power. No implementation of XP-EHH to my knowledge is appropriate for application to this kind of dataset. 

      We used the same XP-EHH method as in the publication “Sabeti, P.C. et al. Genome-wide detection and characterization of positive selection in human populations. Nature 449, 913-918 (2007).” Specifically, in our analysis, the SNP information of modern maize was compared with that of ancient maize. The code is available at https://doi.org/10.5061/dryad.w6m905qtd.

      XP-EHH is a statistical method used in population genetics to detect recent positive selection in one population compared to another, and it has often been applied to large modern maize populations in previous research. In our study, we wanted to detect recent positive selection in modern maize compared to ancient maize; thus, we applied XP-EHH here. Although the population size of ancient maize is not big, it is the best method that we can apply to our dataset to detect recent selection on modern maize.

      Reviewer #3 (Public review): 

      Summary: 

      The authors seek to place archaeological maize samples (2 kernels) from Bolivia into genetic and geographical context and to assess signatures of selection. The kernels were dated to the end of the Incan empire, just prior to European colonization. Genetic data and analyses were used to characterize the distance from other ancient and modern maize samples and to predict the origin of the sample, which was discovered in a tomb near La Paz, Bolivia. Given the conquest of this region by the Incan empire, it is possible that the sample could be genetically similar to populations of maize in Peru, the center of the Incan empire. Signatures of selection in the sample could help reveal various environmental variables and cultural preferences that shaped maize genetic diversity in this region at that time. 

      Strengths: 

      The authors have generated substantial genetic data from these archaeological samples and have assembled a data set of published archaeological and modern maize samples that should help to place these samples in context. The samples are dated to an interesting time in the history of South America during a period of expansion of the Incan empire and just prior to European colonization. Much could be learned from even this small set of samples. 

      Weaknesses: 

      Many thanks for your comments and suggestions. We have addressed these below and provided further explanation.

      (1) Sample preparation and sequencing: 

      Details of the quality of the samples, including the percentage of endogenous DNA are missing from the methods. The low percentage of mapped reads suggests endogenous DNA was low, and this would be useful to characterize more fully. Morphological assessment of the samples and comparison to morphological data from other maize varieties is also missing. It appears that the two kernels were ground separately and that DNA was isolated separately, but data were ultimately pooled across these genetically distinct individuals for analysis. Pooling would violate assumptions of downstream analysis, which included genetic comparison to single archaeological and modern individuals. 

      We did not perform a morphological assessment of the samples or compare them with morphological data from other maize varieties, because we have only 2 aBM kernels and no other archaeological samples that could be used for comparison.

      Regarding “It appears that the two kernels were ground separately and that DNA was isolated separately, but data were ultimately pooled across these genetically distinct individuals for analysis”: as stated in our Materials and Methods section (“Whole kernels were crushed in a mortar and pestle”), these two kernels were ground together before sequencing.

      While morphological assessment of the sample would be interesting, most morphological data reported for maize are from microremains (starch, phytoliths, pollen) and this is beyond the scope of our study. Most studies of macrobotanical remains do not appear to focus solely on individual kernels, but instead on (or in combination with) cob and ear shape, which were not available in the assemblage.

      (2) Genetic comparison to other samples: 

      The authors did not meaningfully address the varying ages of the other archaeological samples and modern maize when comparing the genetic distance of their samples. The archaeological samples were as old as >5000 BP to as young as 70 BP and therefore have experienced varying extents of genetic drift from ancestral allele frequencies. For this reason, age should explicitly be included in their analysis of genetic relatedness. 

      We have revised the related part of our manuscript.

      (3) Assessment of selection in their ancient Bolivian sample: 

      This analysis relied on the identification of alleles that were unique to the ancient sample and inferred selection based on a large number of unique SNPs in two genes related to internode length. This could be a technical artifact due to poor alignment of sequence data, evidence supporting pseudogenization, or within an expected range of genetic differentiation based on population structure and the age of the samples. More rigor is needed to indicate that these genetic patterns are consistent with selection. This analysis may also be affected by the pooling of the Bolivian archaeological samples.  

      We do not think this is due to poor alignment of the sequence data, since we used BWA v0.7.17 with disabled seed (-l 1024) and 0 mismatch alignment. Therefore, there are no SNPs that could come from poor alignment. Please see our detailed methods description here: “For the archaeological maize samples, adapters were removed and paired reads were merged using AdapterRemoval60 with parameters --minquality 20 --minlength 30. All 5′ thymine and 3′ adenine residues within 5nt of the two ends were hard-masked, where deamination was most concentrated. Reads were then mapped to soft-masked B73 v5 reference genome using BWA v0.7.17 with disabled seed (-l 1024 -o 0 -E 3) and a quality control threshold (-q 20) based on the recommended parameter61 to improve ancient DNA mapping”.

      Regarding “More rigor is needed to indicate that these genetic patterns are consistent with selection”, could you please be more specific about which method or approach we should use here? For example, are there methods from specific publications that could be referenced, or a specific tool that could be used?
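      The hard-masking rule quoted in the methods description can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' own script: the function name hard_mask_termini, the choice of N as the mask character, and the example read are hypothetical.

```python
def hard_mask_termini(read, window=5, mask="N"):
    """Mask T within `window` bases of the 5' end and A within `window` bases of the 3' end."""
    bases = list(read)
    n = len(bases)
    # 5' end: C->T deamination shows up as excess thymine.
    for i in range(min(window, n)):
        if bases[i] in "Tt":
            bases[i] = mask
    # 3' end: G->A on the complementary strand shows up as excess adenine.
    for i in range(max(n - window, 0), n):
        if bases[i] in "Aa":
            bases[i] = mask
    return "".join(bases)


# Hypothetical 16-nt read; real reads here would be at least 30 nt (--minlength 30).
masked = hard_mask_termini("TTACGTACGTACGATA")
```

      Only terminal T and A residues are replaced; interior bases and terminal bases of other types are left untouched, so the read length and alignment coordinates are preserved.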

      “This analysis may also be affected by the pooling of the Bolivian archaeological samples.” As we could not prove these two seeds came from two different individual plants, we do not think this analysis was affected by the pooling of the Bolivian archaeological samples.

      (4) Evidence of selection in modern vs. ancient maize: In this analysis, samples were pooled into modern and ancient samples and compared using the XP-EHH statistic. One gene related to ovule development was identified as being targeted by selection, likely during modern improvement. Once again, ancient samples span many millennia and both South, Central, and North America. These, and the modern samples included, do not represent meaningfully cohesive populations, likely explaining the extremely small number of loci differentiating the groups. This analysis is also complicated by the pooling of the Bolivian archaeological samples. 

      Yes, it is possible that ovule development might be a modern improvement. We re-wrote this part in our revised manuscript.

      Reviewer #1 (Recommendations for the authors): 

      My suggestion is to address the comments that outline why the methods used or results obtained are not sufficient to support your conclusions. Overall, I suggest limiting the narrative of Inca influence and framing it as speculation in the discussion section. Presenting conclusions of Inca influence in the title and abstract is not appropriate, given the very questionable evidence. 

      We agree and have changed the title to “Fifteenth century CE Bolivian maize reveals genetic affinities with ancient Peruvian maize”.

      Reviewer #2 (Recommendations for the authors): 

      (1) Line 74: Mexicana is another subspecies of teosinte; the distinction is between ssp. mexicana and ssp. parviglumis (Balsas teosinte), not mexicana and teosinte. 

      We have corrected this in our revised manuscript.

      (2) Line 100-102: This is a bit confusing, it cannot have been a symbol of empire "since its first introduction", since its introduction long predates the formation of imperial politics in the region. Reference 17 only treats the late precolonial Inca context, while ref 22 (which cites maize cultivation at 2450 BC, not 3000 BC) makes no reference to ritual/feasting contexts; it simply documents early phytolith evidence for maize cultivation. As such, this statement is not supported by the references offered.

      Lines 100-102: This point is well taken; the original prose was poor on our part. We have revised this passage to resolve the confusing statement, and we have corrected our mistake in the age given for reference 22. The associated prose has been modified accordingly.

      We have revised these passages to read “Indeed, in the Andes, previous research showed that under the Inca empire, maize was fulfilled multiple contextual roles. In some cases, it operated as a sacred crop” and “…since its first introduction to the region around 2500 BC”.

      (3) Line 161: IntCal is likely not the appropriate calibration curve for this region; dates should probably be calibrated using SHCal.  

      We greatly appreciate this important (and correct) observation. We have completely recalibrated the maize AMS result based on the Southern Hemisphere calibration curve, discussed the new calibrations, and drawn on two other AMS dates on associated material, likewise calibrated with the Southern Hemisphere curve, for comparison. We are confident in a 15th century AD/CE age for the maize, most likely mid- to late 15th century.

      (4) Lines 167-169: The increase of G and A residues shown in Supplementary Figure S1a is just before the 5' end of the read within the reference genome context, and is related to fragmentation bias - a different process from postmortem deamination. Deamination leads to 5' C->T and 3' G->A, resulting in increased T at 5' ends and increased A at 3' ends, and the diagnostic damage curve. The reduction of C/T just before reads begin is not a result of deamination. 

      We have removed the sentence “Both features are indicative of postmortem deamination patterns” from our revised manuscript.

      (5) Lines 187-196 This section presents a lot of important external information establishing hypotheses, and needs some references.  

      We have added the related references here.

      (6) Line 421: This makes it sound like damage masking was done BEFORE read mapping. However, this conflicts with the previous paragraph about map Damage, and Supplementary Figure 1 still shows a slight but perceptible damage curve, which is impossible if all terminal Ts and As are hard-masked. This should be reconciled.  

      Supplementary Figure 1 shows the raw ancient maize DNA data before damage masking. Specifically, in Step 1, we used mapDamage to check and estimate whether damage was present, and produced Supplementary Figure 1. In Step 2, we used our own code to hard-mask the damaged bases and then performed read mapping.

      The purpose of Supplementary Figure 1 is to show the authenticity of aBM as archaeological maize. Therefore, it should show a slight but perceptible damage curve.

      (7) Line 460: PCA method is not given (just the LD pruning and the plotting).  

      The merged dataset of SNPs for archaeological and modern maize was used for PCA with “plink --pca”.

      (8) "tropicalis" maize is not common usage, it is not clear to me what this refers to. 

      We have changed all instances of “tropicalis maize” to “tropical maize” in our revised manuscript.

      (9) The Figure 4 color palette is not accessible for colorblind/color-deficient vision.  

      We have changed the colors of Figure 4. Please find the new colors in our uploaded Figure 4.

      (10) Datasets S2 and S3 are not included with this submission. 

      Thank you for letting us know. We have included Datasets S2 and S3 here.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      In Causal associations between plasma proteins and prostate cancer: a Proteome-Wide Mendelian Randomization, the authors present a manuscript which seeks to identify novel markers for prostate cancer through analysis of large biobank-based datasets and to extend this analysis to potential therapeutic targets for drugs. This is an area that is already extensively researched, but remains important, due to the high burden and mortality of prostate cancer globally.

      Strengths:

      The main strengths of the manuscript are the identification and use of large biobank data assets, which provide large numbers of cases and controls, essential for achieving statistical power. The databases used (deCODE, FinnGen, and the UK Biobank) allow for robust numbers of cases and controls. The analytical method chosen, Mendelian Randomization, is appropriate to the problem. Another strength is the integration of multi-omic datasets, here using protein data as well as GWAS sources to integrate genomic and proteomic data.

      Thank you for your positive feedback regarding the overall quality of our work. We greatly appreciate the time and effort you took in reviewing our manuscript.

      Weaknesses:

      The main weaknesses of the manuscript relate to the following areas:

      (1) The failure of the study to analyse the data in the context of other closely related conditions such as benign prostatic hyperplasia (BPH) or lower urinary tract symptoms (LUTS), which have some pathways and biomarkers in common, such as inflammatory pathways (including complement) and specific markers such as KLK3. As a consequence, it is not possible for readers to know whether the findings are specific to prostate cancer or whether they are generic to prostate dysfunction. Given the prevalence of prostate dysfunction (half of men reaching their sixth decade), the potential for false positives and overtreatment from non-specific biomarkers is a major problem, resulting in the evidence presented in this manuscript being weak. Other researchers have addressed this issue using the same data sources as presented here, for example, in this paper, looking at BPH in the UK Biobank population. https://www.nature.com/articles/s41467-018-06920-9

      Thank you for your valuable comment. We fully agree that biomarker development must prioritize specificity to avoid overtreatment. Our study is intended as a foundational step toward identifying potential therapeutic targets or complementary biomarkers for prostate cancer, not as a direct endorsement of these proteins for standalone clinical diagnosis. Mendelian randomization analysis strengthens causal inference by design, and we further ensured robustness through sensitivity analyses (e.g., MR-Egger regression for pleiotropy, Bonferroni correction for multiple testing). These methods help distinguish true causal effects from nonspecific associations. Importantly, while PSA’s lack of specificity is widely recognized, its role in reducing PCa mortality underscores the value of biomarker-driven screening. Our findings align with the need to integrate multiple markers (e.g., combining a novel protein with PSA) to improve diagnostic precision. Translating these causal insights into clinical tools remains challenging but represents a necessary next step, and we emphasize that this work provides a rigorous starting point for future validation studies.

      (2) There is no discussion of Gleason scores with regard to either biomarkers or therapies, and a general lack of discussion around indolent disease as compared with more aggressive variants. These are crucial issues with regard to the triage and identification of genomically aggressive localized prostate cancers. See, for example, the work set out in: https://doi.org/10.1038/nature20788

      Thank you for pointing this out. We acknowledge that our original analysis did not directly address this critical issue due to a key data limitation: the publicly available GWAS summary statistics for PCa (from openGWAS and FinnGen) do not provide genetic associations stratified by phenotypic severity or molecular subtypes. This limitation precluded MR analysis of proteins specifically linked to aggressive disease. To partially bridge this gap, we have integrated evidence from recent studies into the revised Discussion section to explore the relevance of potential biomarkers to aggressive PCa.

      (3) An additional issue is that the field of PCa research is fast-moving. The manuscript cites ~80 references, but too few of these are from recent studies, and many important and relevant papers are not included. The manuscript would be much stronger if it compared and contrasted its findings with more recent studies of PCa biomarkers and targets, especially those concerned with multi-omics and those including BPH.

      Thank you for your professional comments. We have rigorously updated the manuscript to include more recent publications and we systematically compare and contrast our findings with these recent studies in the revised Discussion section.

      (4) The Methods section provides no information on how the Controls were selected. There is no Table providing cohort data to allow the reader to know whether there were differences in age, BMI, ethnic grouping, social status or deprivation, or smoking status, between the Cases and Controls. These types of data are generally recorded in Biobank data, so this sort of analysis should be possible, or if not, the authors' inability to construct an appropriately matched set of Controls should be discussed as a Limitation.

      We thank the reviewer for raising this important methodological concern. We have expanded the Limitations section to state it.

      “Lastly, our analysis relied exclusively on publicly available GWAS summary statistics from openGWAS and FinnGen, which did not provide individual-level data on covariates, resulting in no direct assessment of demographic or clinical differences between cases and controls.”

      Reviewer #2 (Public review):

      This is potentially interesting work, but the analyses are attempted in a rather scattergun way, with little evident critical thought. The structure of the work (Results before Methods) can work in some manuscripts, but it is not ideal here. The authors discuss results before we know anything about the underlying data that the results come from. It gives the impression that the authors regard data as a resource to be exploited, without really caring where the data comes from. The methods can provide meaningful insights if correctly used, but while I don't have reasons to doubt that the analyses were conducted correctly, findings are presented with little discussion or interpretation. No follow-up analyses are performed.

      In summary, there are likely some gems here, but the whole manuscript is essentially the output from an analytic pipeline.

      We thank the reviewer for the thoughtful evaluation of our work. In response to the concerns regarding manuscript structure and interpretative depth, we have restructured the manuscript to present the Methods section before Results, ensuring transparency in data sources and analytical workflows. Additionally, the Discussion section has been substantially revised to provide mechanistic explanations for key findings (e.g., associated phenotype, causal proteins, druggable targets), contextualize results within recent multi-omics studies and highlight clinical implications. These revisions aim to transform the work from a pipeline-driven analysis to a biologically grounded investigation, offering actionable insights into prostate cancer pathogenesis and therapeutic development.

      Taking the researchers' aims in turn:

      (1) Meta-GWAS - while combining two datasets together can provide additional insights, the contribution of this analysis above existing GWAS is not clear. The PRACTICAL consortium has already reported the GWAS of 70% of these data. What additional value does this analysis provide? (Likely some, but it's not clear from the text.) Also, the presentation of results is unclear - authors state that only 5 gene regions contained variants at p<5x10-8, but Figure 1 shows dozens of hits above 5x10-8. Also, the red line in Figure 1 (supposedly at 5x10-8) is misplaced.

      Thank you very much for your feedback. Although the PRACTICAL consortium contributed the majority of the PCa GWAS data, our meta-analysis integrating FinnGen data enhanced statistical power, enabling more robust detection of low-frequency variants. Moreover, FinnGen's Finnish ancestry (a genetic isolate) helps distinguish population-specific effects. In presenting the results, we reported the top 5 gene regions containing variants at p < 5×10⁻⁸. We apologize for not noticing that the red line was not displayed correctly in the original figures included in the manuscript. We have updated it in the revised manuscript.

      (2) Cross-phenotype analysis. It is not really clear what this analysis is, or why it is done. What is the iCPAGdb? A database? A statistical method? Why would we want to know cross-phenotype associations? What even are these? It seems that the authors have taken data from an online resource and have written a paragraph based on this existing data with little added value.

      We appreciate the opportunity to clarify this analysis. The cross-phenotype analysis was designed to systematically identify phenotypic traits that share genetic or molecular pathways with prostate cancer, thereby uncovering pleiotropic mechanisms or shared risk factors. Here, iCPAGdb (integrated Cross-Phenotype Association Genetics Database) is a curated repository that aggregates GWAS summary statistics and evaluates genetic pleiotropy using LD-proxy associations from the NHGRI-EBI GWAS Catalog. Prostate carcinogenesis involves multisystem interactions, spanning endocrine dysregulation, immune microenvironment remodeling and metabolic reprogramming, rather than isolated molecular pathway disruptions. Cross-phenotype analysis is therefore indispensable for discriminating primary pathogenic drivers from secondary compensatory responses, ultimately informing the development of precision therapeutic strategies.

      In response to your concerns, we have revised the Results section to explicitly define the rationale and methodology of cross-phenotype analysis and restructured the Discussion to interpret phenotype-PCa associations within unified biological frameworks (e.g., metabolic dysregulation, androgen signaling), rather than presenting them as isolated findings.

      (3) PW-MR. I can see the value of this work, but many details are unclear. Was this a two-sample MR using PRACTICAL + FinnGen data for the outcome? How many variants were used in key analyses? Again, the description of results is sparse and gives little added value.

      We thank you for raising this issue. Two-sample MR refers to an analytical design where genetic instruments for the exposure (plasma proteins) and genetic associations with the outcome (PCa) are derived from non-overlapping populations. This ensures complete sample independence between exposure and outcome datasets to avoid confounding biases, regardless of whether the outcome data originate from single or multiple cohorts. The meta-analysis of PRACTICAL and FinnGen GWAS generated 27,210 quality-controlled variants (p < 5×10⁻⁸, MAF ≥ 1%, LD-clumped r² < 0.1) used in key analyses. Regarding the concern about sparse interpretation, we have substantially expanded the Discussion by comparing significant protein findings (e.g., MSMB, SERPINA3) with results from existing functional studies and multi-omics datasets and by drawing out new insights.

      (4) Colocalization - seems clear to me.

      (5) Additional post-GWAS analyses (pathway + druggability) - again, the analyses seem to be performed appropriately, although little additional insight other than the reporting of output from the methods.

      The post-MR druggability and pathway analyses serve two primary scientific purposes: (1) therapeutic prioritization - systematically evaluating which MR-identified proteins represent tractable drug targets (either through existing FDA-approved agents or compounds in clinical development) with direct relevance to cancer or PCa management, and (2) mechanistic hypothesis generation - mapping these candidate proteins to coherent biological pathways to guide future functional validation studies investigating their causal roles in prostate carcinogenesis. In response to your feedback, we have restructured the Discussion section under the subheading “Biological Mechanisms and Druggable Targets” to synthesize these findings, explicitly linking biological pathways to therapeutic targets.

      Minor points:

      (6) The stated motivation for this work is "early detection". But causality isn't necessary for early detection. If the authors are interested in early detection, other analysis approaches are more appropriate.

      We appreciate your insightful feedback. Early detection is one motivation for this work; our goal is also to identify causally implicated proteins that may serve as intervention targets for PCa prevention or therapy. Establishing causality is critical for distinguishing biomarkers that drive disease pathogenesis from those that are secondary to disease progression, as the former hold greater specificity for early detection and prioritization of therapeutic targets. While we acknowledge that validation for early detection may require additional methodologies, MR analysis provides a foundational step by prioritizing candidate proteins with causal links to disease. This approach ensures that downstream efforts focus on biomarkers and targets with the greatest potential to alter disease trajectories, rather than merely correlative markers.

      (7) The authors state "193 proteins were associated with PCa risk", but they are looking at MR results - these analyses test for disease associations of genetically-predicted levels of proteins, not proteins themselves.

      True, in MR, the exposure of interest is the lifelong effect of genetically predicted protein levels. This approach is designed to infer causality while avoiding confounding and reverse causation, as genetic variants are fixed at conception and unaffected by disease processes. When we state “193 proteins were associated with PCa risk,” we specifically refer to proteins whose genetically predicted levels (based on instrument SNPs from protein QTLs) show causal links to PCa. Importantly, MR does not measure the direct association between observed protein concentrations and disease. Instead, it estimates the lifelong causal effect of protein levels predicted by genetics. This distinction is critical for disentangling cause from consequence. For example, a protein elevated due to tumor progression would not be identified as causal in MR if its genetic predictors are unrelated to PCa risk.

      We acknowledge that clinical translation requires further validation of these proteins in observational studies measuring actual protein levels. However, MR provides a robust first step by prioritizing candidates with causal roles, thereby reducing the risk of investing in biomarkers confounded by disease processes.
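      To illustrate how MR works from genetically predicted exposure levels rather than measured protein concentrations, the following is a minimal sketch of a fixed-effect inverse-variance weighted (IVW) estimate built from per-variant summary statistics. The response does not state which MR estimator was used, so IVW is an assumption here, and the function name and the numbers are purely illustrative.

```python
import math


def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect IVW causal estimate and its standard error from summary statistics."""
    if not (len(beta_exposure) == len(beta_outcome) == len(se_outcome)):
        raise ValueError("all inputs must have the same length")
    # Each variant contributes a Wald ratio (by / bx), weighted by bx^2 / se_y^2.
    numerator = sum(bx * by / se ** 2
                    for bx, by, se in zip(beta_exposure, beta_outcome, se_outcome))
    denominator = sum(bx ** 2 / se ** 2
                      for bx, se in zip(beta_exposure, se_outcome))
    if denominator == 0:
        raise ValueError("no informative instruments")
    return numerator / denominator, math.sqrt(1.0 / denominator)


# Illustrative values only: two pQTL instruments for one protein.
beta_protein = [0.1, 0.2]   # SNP effect on protein level
beta_pca = [0.05, 0.1]      # SNP effect on PCa (log odds)
se_pca = [0.01, 0.01]       # standard error of the SNP-PCa effect
estimate, std_error = ivw_estimate(beta_protein, beta_pca, se_pca)
```

      The inputs are SNP-level effects only, which is why the estimate reflects genetically predicted protein levels and requires no individual-level protein measurements.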

      Reviewer #1 (Recommendations for the authors):

      As outlined above, the major weakness of the manuscript is its failure to consider BPH / LUTS, and whether the markers and targets are specific to PCa or not. Specific improvements that the authors could consider might include a literature review of the features identified for their 20 high-risk proteins, and ideally also analyze whether these proteins are upregulated or downregulated in the databases they have analysed (for example it will be easy to analyze whether these proteins are dysregulated in BPH patients as these are specifically identified in the UK Biobank).

      The authors may be able to gain context for this approach by looking at papers analyzing BPH and the complement cascade and other proteins from the authors' top 10 or top 20, for example: https://doi.org/10.1002/pros.24639

      Other sources can be identified by examining the literature for recent omics papers analysing BPH, especially those that analyse and compare BPH / PCa specifically.

      Thank you for highlighting the critical need to distinguish PCa-specific biomarkers from those shared with BPH. In response, we conducted a literature review of multi-omics datasets and prospective cohort studies, systematically evaluating the specificity of prioritized proteins by comparing their expression trends in PCa and BPH or benign prostate tissues. These findings are now integrated into the revised Discussion section under the subheading "Plasma Proteins Causal Links to Prostate Cancer".

      In the Discussion, the paragraph (line 288) on PSA is extremely weak. The authors state that further research is needed, and yet only reference four articles (from 2008, 2010, 2012, 2014), none of which are from the last decade. Considerable amounts of research from the last ten years have been published on PSA, for example, see this article from 2018, which specifically analyses PSA in the context of the UK Biobank. This section should be made more up-to-date with the latest literature findings. https://doi.org/10.1038/s41467-018-06920-9

      Thank you very much for your feedback. We acknowledge the need to strengthen the discussion on PSA by incorporating recent literature. In the revised manuscript, we have expanded the PSA discussion to integrate contemporary research on the prognostic role of PSA in the progression of PCa and its limitations in cancer screening, ensuring that our discussion reflects the current consensus and controversies.

      Also in the Discussion, the analysis of phenotypic indicators is insufficiently comprehensive and should reference other recent research. For example, this recent UK Biobank study dealt with a wide range of conditions, including prostate cancer, and identified similar factors to those identified in this paper. The authors should compare and contrast their phenotypic findings with the existing literature. https://doi.org/10.1038/s41588-024-01898-1

      Thank you for raising the comprehensiveness of the phenotypic analysis. We have reviewed the recent large-scale phenome-wide analyses (linked in your feedback) that explore multi-omics biomarkers and their associations with a range of diseases. We have compared and contrasted our phenotypic findings with the existing literature and revised the Discussion section to interpret phenotype-PCa associations, emphasizing both shared pathways and disease-specific signals.

      Under Methods, there is too little information on how Controls were selected, whether any matching process was conducted, or whether there are fundamental differences between the cases and controls (such as smoking status, BMI, comorbidities). The authors use R, and a library such as MatchIt could be used to ensure that the Controls cohort is appropriately matched to the Cases.

      As outlined above, we acknowledge that our original analysis did not directly address this critical issue due to a key data limitation. The publicly available GWAS summary statistics for PCa (from openGWAS and FinnGen) do not provide individual-level data on covariates, resulting in no direct assessment of demographic or clinical differences between cases and controls.

      An important final point is that, as far as I can tell, no UK Biobank Application Number has been specified in the manuscript. This is vital to establish that there was an original hypothesis being investigated (as opposed to data dredging of open access resources), especially in light of the largely mechanistic flow of the manuscript and lack of PCa and relevant confounder-specific discussion. The authors may be aware of the work of Stender et al (2024) regarding formulaic papers using Mendelian randomization, especially that "[All] combinations of exposure and outcome results based on data available in IEU openGWAS (https://gwas.mrcieu.ac.uk/) can be browsed online on epigraphDB.org. In other words, these results are, in effect, already published. Reporting them again in a scientific paper adds nothing to what can be looked up online in minutes." The authors may wish to address this issue directly.

      Stender, S., Gellert-Kristensen, H. & Smith, G.D. Reclaiming Mendelian randomization from the deluge of papers and misleading findings. Lipids Health Dis 23, 286 (2024). https://doi.org/10.1186/s12944-024-02284-w

      We confirm that all data used in this study were obtained from publicly available GWAS summary statistics (e.g., PRACTICAL consortium, FinnGen) and proteomic datasets (deCODE, UKB-PPP). Our research was guided by a predefined hypothesis to investigate causal plasma protein biomarkers for prostate cancer, rather than exploratory data mining. The analytical pipelines and integrative approaches (e.g., colocalization, druggability assessment) were specifically designed to address this hypothesis, aligning with the ethical use of open-access resources.

      Reviewer #2 (Recommendations for the authors):

      There are several specific recommendations in the public review (e.g., clarify the contribution of the GWAS). Otherwise, there is nothing clearly incorrect, but translational insight is missing - the analyses are not clearly connected to the scientific literature. This is a limitation rather than a flaw - the manuscript will likely still be useful to readers.

      We thank you for highlighting the need to strengthen translational insights and contextualize our findings within existing literature. In the revised manuscript, we have expanded the Discussion section to systematically compare our results with prior mechanistic and clinical studies, including the pathways shared by associated phenotypes and the potential of the significant proteins as biomarkers and therapeutic targets. These revisions ensure our analyses are firmly rooted in the scientific literature.

    1. Author response:

      The following is the authors’ response to the current reviews.

      Public Reviews:

      We thank the Reviewers for their thorough attention to our paper and the interesting discussion about the findings. Before responding to more specific comments, here are some general points we would like to clarify:

      (1) Ecological niche models are indeed correlative models, and we used them to highlight environmental factors associated with HPAI outbreaks within two host groups. We will further revise the terminology that could still unintentionally suggest causal inference. The few remaining ambiguities were mainly in the Discussion section, where our intent was to interpret the results in light of the broader scientific literature. Particularly, we will change the following expressions:

      -  “Which factors can explain…” to  “Which factors are associated with…” (line 75);

      -  “the environmental and anthropogenic factors influencing” to “the environmental and anthropogenic factors that are correlated with” (line 273);

      -  “underscoring the influence” to “underscoring the strong association” (line 282).

      (2) We respectfully disagree with the suggestion that an ecological niche modelling (ENM) approach is not appropriate for this work and the research question addressed therein. Ecological niche models are specifically designed to estimate the spatial distribution of the environmental suitability of species and pathogens, making them well suited to our research questions. In our study, we have also explicitly detailed the known limitations of ecological niche models in the Discussion section, in line with prior literature, to ensure their appropriate interpretation in the context of HPAI.

      (3) The environmental layers used in our models were restricted to those available at a global scale, as listed in Supplementary Information Resources S1 (https://github.com/sdellicour/h5nx_risk_mapping/blob/master/Scripts_%26_data/SI_Resource_S1.xlsx). Naturally, not all potentially relevant environmental factors could be included, but the selected layers are explicitly documented and only these were assessed for their importance. Despite this limitation, the performance metrics indicate that the models performed well, suggesting that the chosen covariates capture meaningful associations with HPAI occurrence at a global scale.

      Reviewer #1 (Public review):

      The authors aim to predict ecological suitability for transmission of highly pathogenic avian influenza (HPAI) using ecological niche models. This class of models identifies correlations between the locations of species or disease detections and the environment. These correlations are then used to predict habitat suitability (in this work, ecological suitability for disease transmission) in locations where surveillance of the species or disease has not been conducted. The authors fit separate models for HPAI detections in wild birds and farmed birds, for two strains of HPAI (H5N1 and H5Nx) and for two time periods, pre- and post-2020. The authors also validate models fitted to disease occurrence data from pre-2020 using post-2020 occurrence data. I thank the authors for taking the time to respond to my initial review and I provide some follow-up below.

      Detailed comments:

      In my review, I asked the authors to clarify the meaning of "spillover" within the HPAI transmission cycle. This term is still not entirely clear: at lines 409-410, the authors use the term with reference to transmission between wild birds and farmed birds, as distinct to transmission between farmed birds. It is implied but not explicitly stated that "spillover" is relevant to the transmission cycle in farmed birds only. The sentence, "we developed separate ecological niche models for wild and domestic bird HPAI occurrences ..." could have been supported by a clear sentence describing the transmission cycle, to prime the reader for why two separate models were necessary.

      We respectfully disagree that the term “spillover” is unclear in the manuscript. In both the Methods and Discussion sections (lines 387-391 and 409-414), we explicitly define “spillover” as the introduction of HPAI viruses from wild birds into domestic poultry, and we distinguish this from secondary farm-to-farm transmission. Our use of separate ecological niche models for wild and domestic outbreaks reflects not only the distinction between primary spillover and secondary transmission, but also the fundamentally different ecological processes, surveillance systems, and management implications that shape outbreaks in these two groups. We will clarify this choice in the revised manuscript when introducing the separate models. Furthermore, on line 83, we will add “as these two groups are influenced by different ecological processes, surveillance biases, and management contexts”.

      I also queried the importance of (dead-end) mammalian infections to a model of the HPAI transmission risk, to which the authors responded: "While spillover events of HPAI into mammals have been documented, these detections are generally considered dead-end infections and do not currently represent sustained transmission chains. As such, they fall outside the scope of our study, which focuses on avian hosts and models ecological suitability for outbreaks in wild and domestic birds." I would argue that any infections, whether they are in dead-end or competent hosts, represent the presence of environmental conditions to support transmission so are certainly relevant to a niche model and therefore within scope. It is certainly understandable if the authors have not been able to access data of mammalian infections, but it is an oversight to dismiss these infections as irrelevant.

      We understand the Reviewer’s point, but our study was designed to model HPAI occurrence in avian hosts only. We therefore restricted our analysis to wild birds and domestic poultry, which represent the primary hosts for HPAI circulation and the focus of surveillance and control measures. While mammalian detections have been reported, they are outside the scope of this work.

      Correlative ecological niche models, including BRTs, learn relationships between occurrence data and covariate data to make predictions, irrespective of correlations between covariates. I am not convinced that the authors can make any "interpretation" (line 298) that the covariates that are most informative to their models have any "influence" (line 282) on their response variable. Indeed, the observation that "land-use and climatic predictors do not play an important role in the niche ecological models" (line 286), while "intensive chicken population density emerges as a significant predictor" (line 282) begs the question: from an operational perspective, is the best (e.g., most interpretable and quickest to generate) model of HPAI risk a map of poultry farming intensity?

      We agree that poultry density may partly reflect reporting bias, but we also consider it a meaningful predictor of HPAI risk. Its importance in our models is therefore expected. Importantly, our BRT framework does more than reproduce poultry distribution: it captures non-linear relationships and interactions with other covariates, allowing a more nuanced characterisation of risk than a simple poultry density map. Note also that our models distinguish intensive chicken density, extensive chicken density, and duck density. Therefore, the output is not a “map of poultry farming intensity”.

      At line 282, we used the word “influence” while fully recognising that correlative models cannot establish causality. Indeed, in our analyses, “relative influence” refers to the importance metric produced by the BRT algorithm (Ridgeway, 2020), which measures correlative associations between environmental factors and outbreak occurrences. These scores are interpreted in light of the broader scientific literature; our interpretations therefore build on both our results and existing evidence, rather than on our models alone. However, in the next version of the paper, we will revise the sentence as: “underscoring the strong association of poultry farming practices with HPAI spread (Dhingra et al., 2016)”.

      I have more significant concerns about the authors' treatment of sampling bias: "We agree with the Reviewer's comment that poultry density could have potentially been considered to guide the sampling effort of the pseudo-absences to consider when training domestic bird models. We however prefer to keep using a human population density layer as a proxy for surveillance bias to define the relative probability to sample pseudo-absence points in the different pixels of the background area considered when training our ecological niche models. Indeed, given that poultry density is precisely one of the predictors that we aim to test, considering this environmental layer for defining the relative probability to sample pseudo-absences would introduce a certain level of circularity in our analytical procedure, e.g. by artificially increasing the influence of that particular variable in our models." The authors have elected to ignore a fundamental feature of distribution modelling with occurrence-only data: if we include a source of sampling bias as a covariate and do not include it when we sample background data, then that covariate would appear to be correlated with presence. They acknowledge this later in their response to my review: "...assuming a sampling bias correlated with poultry density would result in reducing its effect as a risk factor." In other words, the apparent predictive capacity of poultry density is a function of how the authors have constructed the sampling bias for their models. A reader of the manuscript can reasonably ask the question: to what degree is the model a model of HPAI transmission risk, and to what degree is the model a model of the observation process?
The sentence at lines 474-477 is a helpful addition, however the preceding sentence, "Another approach to sampling pseudo-absences would have been to distribute them according to the density of domestic poultry," (line 474) is included without acknowledgement of the flow-on consequence to one of the key findings of the manuscript, that "...intensive chicken population density emerges as a significant predictor..." (line 282). The additional context on the EMPRES-i dataset at line 475-476 ("the locations of outbreaks ... are often georeferenced using place name nomenclatures") is in conflict with the description of the dataset at line 407 ("precise location coordinates"). Ultimately, the choices that the authors have made are entirely defensible through a clear, concise description of model features and assumptions, and precise language to guide the reader through interpretation of results. I am not satisfied that this is provided in the revised manuscript.

      We thank the Reviewer for this important point. To address it, we compared model predictive performance and covariate relative influences obtained when pseudo-absences were weighted by poultry density versus human population density (Author response table 1). The results show that differences between the two approaches are marginal, both in predictive performance (ΔAUC ranging from -0.013 to +0.002) and in the ranking of key predictors (see below Author response images 1 and 2). For instance, intensive chicken density consistently emerged as an important predictor regardless of the bias layer used.

      Note: the comparison was conducted using a simplified BRT configuration for computational efficiency (fewer trees, fixed 5-fold random cross-validation, and standardised parameters). Therefore, absolute values of AUC and variable importance may differ slightly from those in the manuscript, but the relative ranking of predictors and the overall conclusions remain consistent.

      Given these small differences, we retained the approach using human population density. We agree that poultry density partly reflects surveillance bias as well as true epidemiological risk, and we will clarify this in the revised manuscript by noting that the predictive role of poultry density reflects both biological processes and surveillance systems. Furthermore, on line 289, we will add “We note, however, that intensive poultry density may reflect both surveillance intensity and epidemiological risk, and its predictive role in our models should be interpreted in light of both processes”.
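
      The weighting of pseudo-absences by a bias layer, as discussed above, can be illustrated with a minimal Python sketch. It is not the authors' pipeline (their scripts are in the repository linked earlier); the function name, the pixel identifiers, and the dictionary representation of the bias layer are all hypothetical, and sampling is done with replacement for simplicity. The only idea shown is that each background pixel is drawn with probability proportional to its bias-layer value, so swapping human population density for poultry density changes where pseudo-absences fall.

```python
import random


def sample_pseudo_absences(bias_layer, n_points, seed=0):
    """Draw pixel ids with probability proportional to a bias-layer value.

    bias_layer: dict mapping a (hypothetical) pixel id to a non-negative
    weight, e.g. human population density or poultry density in that pixel.
    """
    pixels = list(bias_layer)
    weights = [bias_layer[p] for p in pixels]
    if any(w < 0 for w in weights) or sum(weights) <= 0:
        raise ValueError("weights must be non-negative with a positive sum")
    rng = random.Random(seed)
    # Sampling with replacement; pixels with zero weight are never drawn.
    return rng.choices(pixels, weights=weights, k=n_points)


# Toy bias layer (illustrative values only).
human_density = {"px1": 0.0, "px2": 10.0, "px3": 30.0}
pseudo_absences = sample_pseudo_absences(human_density, n_points=5, seed=42)
```

      Under this scheme, a covariate that also serves as the bias layer is over-represented among pseudo-absences, which is why the choice of layer can change the apparent importance of that covariate.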

      Author response table 1.

      Comparison of model predictive performance (AUC) between models in which pseudo-absences were weighted by poultry density and models in which they were weighted by human population density, across host groups, virus types, and time periods. Differences in AUC values are shown as the value for poultry-weighted minus human-weighted pseudo-absences.

      Author response image 1.

      Comparison of variable relative influence (%) between models trained with pseudo-absences weighted by poultry density (red) and human population density (blue) for domestic bird outbreaks. Results are shown for four datasets: H5N1 (<2020), H5N1 (>2020), H5Nx (<2020), and H5Nx (>2020).

      Author response image 2.

      Comparison of variable relative influence (%) between models trained with pseudo-absences weighted by poultry density (red) and human population density (blue) for wild bird outbreaks. Results are shown for three datasets: H5N1 (>2020), H5Nx (<2020), and H5Nx (>2020).

      The authors have slightly misunderstood my comment on "extrapolation": I referred to "environmental extrapolation" in my review without being particularly explicit about my meaning. By "environmental extrapolation", I meant to ask whether the models were predicting to environments that are outside the extent of environments included in the occurrence data used in the manuscript. The authors appear to have understood this to be a comment on geographic extrapolation, or predicting to areas outside the geographic extent included in occurrence data, e.g.: "For H5Nx post-2020, areas of high predicted ecological suitability, such as Brazil, Bolivia, the Caribbean islands, and Jilin province in China, likely result from extrapolations, as these regions reported few or no outbreaks in the training data" (lines 195-197). Is the model extrapolating in environmental space in these regions? This is unclear. I do not suggest that the authors should carry out further analysis, but the multivariate environmental similarity surface (MESS; see Elith et al., 2010) is a useful tool to visualise environmental extrapolation and aid model interpretation.
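
      For readers unfamiliar with the MESS mentioned above, the following Python sketch shows how such a score could be computed for a single location. It follows the definition in Elith et al. (2010) as commonly described, and is an illustration under that assumption rather than a reference implementation; the function names and the dictionary-based data layout are hypothetical. For each covariate, the location's value is compared with the values at the reference (training) points, and the MESS is the minimum similarity across covariates. Negative values indicate that at least one covariate lies outside the range of the reference data, i.e. environmental extrapolation.

```python
def variable_similarity(value, reference):
    """Similarity of one value to the reference values of one covariate."""
    lo, hi = min(reference), max(reference)
    if hi == lo:
        raise ValueError("reference values must not all be equal")
    # Percentage of reference values strictly below the queried value.
    f = 100.0 * sum(r < value for r in reference) / len(reference)
    if f == 0:
        # At or below the reference minimum: zero or negative similarity.
        return 100.0 * (value - lo) / (hi - lo)
    if f <= 50:
        return 2.0 * f
    if f < 100:
        return 2.0 * (100.0 - f)
    # Above the reference maximum: negative similarity.
    return 100.0 * (hi - value) / (hi - lo)


def mess(point, reference_points):
    """MESS of one location: the minimum similarity across covariates."""
    return min(
        variable_similarity(point[name], [ref[name] for ref in reference_points])
        for name in point
    )


# Toy reference data with two hypothetical covariates.
reference_points = [
    {"temperature": 10, "rainfall": 100},
    {"temperature": 20, "rainfall": 200},
    {"temperature": 30, "rainfall": 300},
    {"temperature": 40, "rainfall": 400},
]
inside = mess({"temperature": 25, "rainfall": 250}, reference_points)
outside = mess({"temperature": 55, "rainfall": 250}, reference_points)
```

      In this toy example, `inside` is positive because both covariates fall within the reference range, while `outside` is negative because the temperature exceeds every reference value.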

      On the subject of "extrapolation", I am also concerned by the additions at lines 362-370: "...our models extrapolate environmental suitability for H5Nx in wild birds in areas where few or no outbreaks have been reported. This discrepancy may be explained by limited surveillance or underreporting in those regions." The "discrepancy" cited here is a feature of the input dataset, a function of the observation distribution that should be captured in pseudo-absence data. The authors state that Kazakhstan and Central Asia are areas of interest, and that the environments in this region are outside the extent of environments captured in the occurrence dataset, although it is unclear whether "extrapolation" is informed by a quantitative tool like a MESS or judged by some other qualitative test. The authors then cite Australia as an example of a region with some predicted suitability but no HPAI outbreaks to date, however this discussion point is not linked to the idea that the presence of environmental conditions to support transmission need not imply the occurrence of transmission (as in the addition, "...spatial isolation may imply a lower risk of actual occurrences..." at line 214). Ultimately, the authors have not added any clear comment on model uncertainty (e.g., variation between replicated BRTs) as I suggested might be helpful to support their description of model predictions.

      Many thanks for the clarification. Indeed, we interpreted your previous comments in terms of geographic extrapolations. We thank the Reviewer for these observations. We will adjust the wording to further clarify that predictions of ecological suitability in areas with few or no reported outbreaks (e.g., Central Asia, Australia) are not model errors but expected extrapolations, since ecological suitability does not imply confirmed transmission (for instance, on Line 362: “our models extrapolate environmental suitability” will be changed to “Interestingly, our models extrapolate geographical”). These predictions indicate potential environments favorable to circulation if the virus were introduced.

      In our study, model uncertainty is formally assessed when comparing the predictive performances of our models (Fig. S3, Table S1), the relative influence (Table S3) and response curves (Fig. 2) associated with each environmental factor (Table S2). All these results confirm good convergence between replicates. Finally, we indeed did not use a quantitative tool such as a MESS to assess extrapolation but did rely on qualitative interpretation of model outputs.

      All of my criticisms are, of course, applied with the understanding that niche modelling is imperfect for a disease like HPAI, and that data may be biased/incomplete, etc.: these caveats are common across the niche modelling literature. However, if language around the transmission cycle, the niche, and the interpretation of any of the models is imprecise, which I find it to be in the revised manuscript, it undermines all of the science that is presented in this work.

      We respectfully disagree with this comment. The scope of our study and the methods employed are clearly defined in the manuscript, and the limitations of ecological niche modelling in this context are explicitly acknowledged in the Discussion section. While we appreciate the Reviewer’s concern, the comment does not provide specific examples of unclear or imprecise language regarding the transmission cycle, niche, or interpretation of the models. Without such examples, it is difficult to identify further revisions that would improve clarity.

      Reviewer #2 (Public review):

      The geographic range of highly pathogenic avian influenza cases changed substantially around the period 2020, and there is much interest in understanding why. Since 2020 the pathogen irrupted in the Americas and the distribution in Asia changed dramatically. This study aimed to determine which spatial factors (environmental, agronomic and socio-economic) explain the change in numbers and locations of cases reported since 2020 (2020--2023). That's a causal question which they address by applying a correlative environmental niche modelling (ENM) approach to the avian influenza case data before (2015--2020) and after 2020 (2020--2023) and separately for confirmed cases in wild and domestic birds. To address their questions they compare the outputs of the respective models, and those of the first global model of the HPAI niche published by Dhingra et al 2016.

      We do not agree with this comment. In the manuscript, it is well established that we are quantitatively assessing factors that are associated with occurrence data before and after 2020. We do not claim to determine causality. One sentence of the Introduction section (lines 75-76) could be confusing, so we intend to modify it in the final revision of our manuscript.

      ENM is a correlative approach useful for extrapolating understandings based on sparse geographically referenced observational data over un- or under-sampled areas with similar environmental characteristics in the form of a continuous map. In this case, because the selected covariates about land cover, use, population and environment are broadly available over the entire world, modelled associations between the response and those covariates can be projected (predicted) back to space in the form of a continuous map of the HPAI niche for the entire world.

      We fully agree with this assessment of ENM approaches.

      Strengths:

      The authors are clear about expected bias in the detection of cases, such as geographic variation in surveillance effort (testing of symptomatic or dead wildlife, testing domestic flocks) and in general more detections near areas of higher human population density (because if a tree falls in a forest and there is no-one there, etc), and take steps to ameliorate those. The authors use boosted regression trees to implement the ENM, which typically feature among the best performing models for this application (also known as habitat suitability models). They ran replicate sets of the analysis for each of their model targets (wild/domestic x pathogen variant), which can help produce stable predictions. Their code and data are provided, though I did not verify that the work was reproducible.

      The paper can be read as a partial update to the first global model of H5Nx transmission by Dhingra and others published in 2016 and explicitly follows many methodological elements. Because they use the same covariate sets as used by Dhingra et al 2016 (including the comparisons of the performance of the sets in spatial cross-validation) and for both time periods of interest in the current work, comparison of model outputs is possible. The authors further facilitate those comparisons with clear graphics and supplementary analyses and presentation. The models can also be explored interactively at a weblink provided in text, though it would be good to see the model training data there too.

      The authors' comparison of ENM model outputs generated from the distinct HPAI case datasets is interesting and worthwhile, though for me, only as a response to differently framed research questions.

      Weaknesses:

      This well-presented and technically well-executed paper has one major weakness to my mind. I don't believe that ENM models were an appropriate tool to address their stated goal, which was to identify the factors that "explain" changing HPAI epidemiology.

      Here is how I understand and unpack that weakness:

      (1) Because of their fundamentally correlative nature, ENMs are not a strong candidate for exploring or inferring causal relationships.

      (2) Generating ENMs for a species whose distribution is undergoing broad scale range change is complicated and requires particular caution and nuance in interpretation (e.g., Elith et al., 2010). An important general assumption of environmental niche models is that the target species is at some kind of distributional equilibrium (at time scales relevant to the model application). In practice that means the species has had an opportunity to reach all suitable habitats and therefore its absence from some can be interpreted as either unfavourable environment or interactions with other species. Here data sets for the response (H5N1 or H5Nx case data in domestic or wild birds) were divided into two periods, 2015--2020 and 2020--2023, based on the rationale that the geographic locations and host-species profile of cases detected in the latter period were suggestive of changed epidemiology. In comparing outputs from multiple ENMs for the same target from distinct time periods the authors are expertly working in, or even dancing around, what is a known grey area, and they need to make the necessary assumptions and caveats obvious to readers.

      We thank the Reviewer for this observation. First, we constrained pseudo-absence sampling to countries and regions where outbreaks had been reported, reducing the risk of interpreting non-affected areas as environmentally unsuitable. Second, we deliberately split the outbreak data into two periods (2015-2020 and 2020-2023) because we do not assume a single stable equilibrium across the full study timeframe. This division reflects known epidemiological changes around 2020 and allows each period to be modeled independently. Within each period, ENM outputs are interpreted as associations between outbreaks and covariates, not as equilibrium distributions. Finally, by testing prediction across periods, we assessed both niche stability and potential niche shifts. These clarifications will be added to the manuscript to make our assumptions and limitations explicit.

      Line 66, we will add: “Ecological niche model outputs for range-shifting pathogens must therefore be interpreted with caution (Elith et al., 2010). Despite this limitation, correlative ecological niche models remain useful for identifying broad-scale associations and potential shifts in distribution. To account for this, we analysed two distinct time periods (2015-2020 and 2020-2023).”

      Line 123, we will revise “These findings underscore the ability of pre-2020 models in forecasting the recent geographic distribution of ecological suitability for H5Nx and H5N1 occurrences” to “These results suggest that pre-2020 models captured broad patterns of suitability for H5Nx and H5N1 outbreaks, while post-2020 models provided a closer fit to the more recent epidemiological situation”.

      (3) To generate global prediction maps via ENM, only variables that exist at appropriate resolution over the desired area can be supplied as covariates. What processes could influence changing epidemiology of a pathogen and are there covariates that represent them? Introduction to a new geographic area (continent) with naive population, immunity in previously exposed populations, control measures to limit spread such as vaccination or destruction of vulnerable populations or flocks? Might those control measures be more or less likely depending on the country as a function of its resources and governance? There aren't globally available datasets that speak to those factors, so the question is not why were they omitted but rather was the authors' decision to choose ENMs given their question justified? How valuable are insights based on patterns of correlation change when considering different temporal sets of HPAI cases in relation to a common and somewhat anachronistic set of covariates?

      We agree that the ecological niche models trained in our study are limited to environmental and host factors, as described in the Methods section with the selection of predictors. While such models cannot capture causality or represent processes such as immunity, control measures, or governance, they remain a useful tool for identifying broad associations between outbreak occurrence and environmental context. Our study cannot infer the full mechanisms driving changes in HPAI epidemiology, but it does provide a globally consistent framework to examine how associations with available covariates vary across time periods.

      (4) In general the study is somewhat incoherent with respect to time. Though the case data come from different time periods, each response dataset was modelled separately using exactly the same covariate dataset that predated both sets. That decision should be understood as a strong assumption on the part of the authors that conditions the interpretation: the world (as represented by the covariate set) is immutable, so the model has to return different correlative associations between the case data and the covariates to explain the new data. While the world represented by the selected covariates *may* be relatively stable (could be statistically confirmed), what about the world not represented by the covariates (see point 3)?

      We used the same covariate layers for both periods, which indeed assumes that these environmental and host factors are relatively stable at the global scale over the short timeframe considered. We believe this assumption is reasonable, as poultry density, land cover, and climate baselines do not change drastically between 2015 and 2023 at the resolution of our analysis. We agree, however, that unmeasured processes such as control measures, immunity, or governance may have changed during this time and are not captured by our covariates.

      Recommendations for the Authors:

      Reviewer #1 (Recommendations for the authors):

      - Line 400-401: "over the 2003-2016 periods" has an extra "s"; "two host species" (with reference to wild and domestic birds) would be more precise as "two host groups".

      - Remove comma line 404

      Many thanks for these comments, we have modified the text accordingly.

      Reviewer #2 (Recommendations for the authors):

      Most of my work this round is encapsulated in the public part of the review.

      The authors responded positively to the review efforts from the previous round, but I was underwhelmed with the changes to the text that resulted. Particularly in regard to limiting assumptions - the way that they augmented the text to refer to limitations raised in review downplayed the importance of the assumptions they've made. So they acknowledge the significance of the limitation in their rejoinder, but in the amended text merely note the limitation without giving any sense of what it means for their interpretation of the findings of this study.

      The abstract and findings are essentially unchanged from the previous draft.

      I still feel the near causal statements of interpretation about the covariates are concerning. These models really are not a good candidate for supporting the inference that they are making and there seem to be very strong arguments in favour of adding covariates that are not globally available.

      We never claimed causal interpretation, and we have consistently framed our analyses in terms of associations rather than mechanisms. We acknowledge that one phrasing in the research questions (“Which factors can explain…”) could be misinterpreted, and we are correcting this in the revised version to read “Which factors are associated with…”. Our approach follows standard ecological niche modelling practice, which identifies statistical associations between occurrence data and covariates. As noted in the Discussion section, these associations should not be interpreted as direct causal mechanisms. Finally, all interpretive points in the manuscript are supported by published literature, and we consider this framing both appropriate and consistent with best practice in ecological niche modelling (ENM) studies.

      We assessed predictor contributions using the “relative influence” metric, the terminology reported by the R package “gbm” (Ridgeway, 2020). This metric quantifies the contribution of each variable to model fit across all trees, rescaled to sum to 100%, and should be interpreted as an association rather than a causal effect.
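
      The rescaling described above can be sketched in a few lines of Python. This is not the code of the R package “gbm”; it is a simplified illustration that assumes each split in each tree has already been summarised as a (variable, improvement) pair, where the improvement is the reduction in the loss attributed to that split. The function name and the toy values are hypothetical. The improvements are summed per variable across all trees and then rescaled so that the scores sum to 100%.

```python
from collections import defaultdict


def relative_influence(splits):
    """Sum split improvements per variable and rescale to sum to 100.

    splits: iterable of (variable_name, improvement) pairs pooled over
    all trees of a boosted model (a hypothetical, simplified input).
    """
    totals = defaultdict(float)
    for variable, improvement in splits:
        totals[variable] += improvement
    grand_total = sum(totals.values())
    if grand_total <= 0:
        raise ValueError("total improvement must be positive")
    return {name: 100.0 * total / grand_total for name, total in totals.items()}


# Toy splits pooled over several trees (illustrative values only).
toy_splits = [
    ("intensive_chicken_density", 6.0),
    ("duck_density", 1.0),
    ("intensive_chicken_density", 2.0),
    ("temperature", 1.0),
]
scores = relative_influence(toy_splits)
```

      Because the score only reflects how much each variable contributed to model fit, a high value indicates a strong statistical association with the response and carries no information about mechanism.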

      L65-66 The general difficulty of interpreting ENM output with range-shifting species should be cited here to alert readers that they should not blithely attempt what follows at home.

      I believe that their analysis is interesting and technically very well executed, so it has been a disappointment and hard work to write this assessment. My rough-cut last paragraph of a reframed intro would go something like - there are many reasons in the literature not to do what we are about to do, but here's why we think it can be instructive and informative, within certain guardrails.

      To acknowledge this comment and the previous one, we revised lines 65-66 to: “However, recent outbreaks raise questions about whether earlier ecological niche models still accurately predict the current distribution of areas ecologically suitable for the local circulation of HPAI H5 viruses. Ecological niche model outputs for range-shifting pathogens must therefore be interpreted with caution (Elith et al., 2010). Despite this limitation, correlative ecological niche models remain useful for identifying broad-scale associations and potential shifts in distribution.”

      We respectfully disagree with the Reviewer’s statement that “there are many reasons in the literature not to do what we are about to do”. All modeling approaches, including mechanistic ones, have limitations, and the literature is clear on both the strengths and constraints of ecological niche models. Our manuscript openly acknowledges these limits and frames our findings accordingly. We therefore believe that our use of an ENM approach is justified and contributes valuable insights within these well-defined boundaries.

      Reference: Ridgeway, G. (2007). Generalized Boosted Models: A guide to the gbm package. Update, 1(1).


      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      I am concerned by the authors' conceptualisation of "niche" within the manuscript. Is the "niche" we are modelling the niche of the pathogen itself? The niche of the (wild) bird host species as a group? The niche of HPAI transmission within (wild) bird host species (i.e., an intersection of pathogen and bird niches)? Or the niche of HPAI transmission in poultry? The precise niche being modelled should be clarified in the Introduction or early in the Methods of the manuscript. The first two definitions of niche listed above are relevant, but separate from the niche modelled in the manuscript - this should be acknowledged.

      We acknowledge that these concepts were probably not defined clearly enough in the previous version of our manuscript, and we have now included an explicit definition in the fourth paragraph of the Introduction section: “We developed separate ecological niche models for wild and domestic bird HPAI occurrences, these models thus predicting the ecological suitability for the risk of local viral circulation leading to the detection of HPAI occurrences within each host group (rather than the niche of the virus or the host species alone).”

      The authors should consider the precise transmission cycle involved in each HPAI case: "index cases" in farmed poultry, caused by "spillover" from wild birds, are relevant to the wildlife transmission cycle, while the ecological conditions coinciding with subsequent transmission in farmed poultry are likely to be fundamentally different. (For example, subsequent transmission is not conditional on the presence of wild birds.) Modelling these two separate, but linked, transmission cycles together may omit important nuances from the modelling framework.

      We thank the Reviewer for highlighting the distinction between primary (wild-to-domestic) and secondary (farm-to-farm) transmission cycles. Our modelling framework was designed to assess the ecological suitability of HPAI occurrences in wild and domestic birds separately. In the domestic poultry models, the response variable is based on confirmed outbreak records, which do not distinguish between cases resulting from primary and secondary infections.

      One of the aims of the study is to evaluate the spatial distribution of areas ecologically suitable for local H5N1/x circulation either leading to domestic or wild bird cases, i.e. to identify environmental conditions where the virus may have persisted or spread, whether as a result of introduction by wild birds or farm-to-farm transmission. Introducing mechanistic distinctions in the response variable would not necessarily improve or affect the ecological suitability maps, since each type of transmission is likely to be associated with different covariates that are included in the models.

      Also, the EMPRES-i database does not indicate whether each record corresponds to an index case or a secondary transmission event, so in practice it would not be possible to produce two different models. However, we agree that distinguishing between types of transmission is an interesting perspective for future research. This could be explored, for example, by mapping interfaces between wild and domestic bird populations or by inferring outbreak transmission trees using genomic data when available.

      To avoid confusion, we now explicitly clarify this aspect in the Materials and Methods section: “It is important to note that the EMPRES-i database does not distinguish between index cases (e.g., primary spillover from wild birds) and secondary farm-to-farm transmissions. As such, our ecological niche models are trained on confirmed HPAI outbreaks in poultry that may result from different transmission dynamics — including both initial introduction events influenced by environmental factors and subsequent spread within poultry systems.”

      We now also address this limitation in the Discussion section: “Finally, our models for domestic poultry do not distinguish between primary introduction events (e.g., spillover from wild birds) and secondary transmission between farms due to limitations in the available surveillance data. While environmental factors likely influence the risk of initial spillover events, secondary spread is more often driven by anthropogenic factors such as biosecurity practices and poultry trade, which are not included in our current modelling framework.”

      The authors should clarify the meaning of "spillover" within the HPAI transmission cycle: if spillover transmission is from wild birds to farmed poultry, then subsequent transmission in poultry is separate from the wildlife transmission cycle. This is particularly relevant to the Discussion paragraph beginning at line 244: does "farm to farm transmission" have a distinct ecological niche to transmission between wild birds, and transmission between wild birds and farmed birds? And while there has been a spillover of HPAI to mammals, could the authors clarify that these detections are dead-end? And not represented in the dataset? Dhingra et al., 2016 comment on the contrast between models of "directly transmitted" pathogens, such as HPAI, and vector-borne diseases: for vector-borne diseases, "clear eco-climatic boundaries of vectors can be mapped", whereas "HPAI is probably not as strongly environmentally constrained". This is an important piece of nuance in their Discussion and a comment to a similar effect may be of use in this manuscript.

      Following the Reviewer’s previous comment, we have now added clarifications in the Methods and Discussion sections defining spillover as the transmission of HPAI viruses from wild birds to domestic poultry (index cases), and secondary transmission as onward spread between farms. As mentioned in our answer above, we now emphasise that our models do not distinguish these dynamics, which are likely to be influenced by different drivers — ecological in the case of spillover, and often anthropogenic (e.g., poultry trade movement, biosecurity) in the case of farm-to-farm transmission.

      The discussion regarding farm-to-farm transmission and spillovers is indeed an interpretation derived from the covariates analysis (see the second paragraph in the Discussion section). Specifically, we observed a stronger association between HPAI occurrences and domestic bird density after 2020, which may suggest that secondary infections (e.g., farm-to-farm transmission) became more prominent or more frequently reported. We however acknowledge that our data do not allow us to distinguish primary introductions from secondary transmission events, and we have added a sentence to explicitly clarify this: “However, this remains an interpretation, as the available data do not allow us to distinguish between index cases and secondary transmission events.”

      We thank the Reviewer for raising the point of mammalian infections. While spillover events of HPAI into mammals have been documented, these detections are generally considered dead-end infections and do not currently represent sustained transmission chains. As such, they fall outside the scope of our study, which focuses on avian hosts and models ecological suitability for outbreaks in wild and domestic birds. However, we agree that future work could explore the spatial overlap between mammalian outbreak detections and ecological suitability maps for wild birds to assess whether such spillovers may be linked to localised avian transmission dynamics.

      Finally, we have added a comment about the differences between pathogens that are strongly constrained by the environment and HPAI: “This suggests that HPAI H5Nx is not as strongly environmentally constrained as vector-borne pathogens, for which clear eco-climatic boundaries (e.g., vector borne diseases) can be mapped (Dhingra et al., 2016).” This aligns with the interpretation provided by Dhingra and colleagues (2016) and helps contextualise the predictive limitations of ecological niche models for directly transmitted pathogens like HPAI.

      There are several places where some simple clarification of language could answer my questions related to ecological niches. For example, on line 74, "the ecological niche" should be followed by "of the pathogen", or "of HPAI transmission in wild birds", or some other qualifier that is most appropriate to the Authors' conceptualisation of the niche modelled in the manuscript. Similarly, in the following sentence, "areas at risk" could be followed by "of transmission in wild birds", to make the transmission cycle that is the subject of modelling clear to the reader. On line 83, it is not clear who or what is the owner of "their ecological niches": is this "poultry and wild birds", or the pathogen?

      We agree with that suggestion and have now modified the related part of the text accordingly (e.g., “areas at risk for local HPAI circulation” and “of HPAI in wild or domestic birds”).

      I am concerned by the authors' treatment of sampling bias in their BRT modelling framework. If we are modelling the niche of HPAI transmission, we would expect places that are more likely to be subject to disease surveillance to be represented in the set of locations where the disease has been detected. I do not agree that pseudo-absence points are sampled "to account for the lack of virus detection in some areas" - this description is misleading and does not match the following sentence ("pseudo-absence points sampled ... to reflect the greater surveillance efforts ..."). The distribution of pseudo-absences should aim to capture the distribution of probable disease surveillance, as these data act as a stand-in for missing negative surveillance records. It is sensible that pseudo-absences for disease detection in wild birds are sampled proportionately to human population density, as the disease is detected in dead wild birds, which are more likely to be identified close to areas of human occupation (as stated on line 163). However, I do not agree that the same applies to poultry - the density of farmed poultry is likely to be a better proxy for surveillance intensity in farmed birds. Human population density and farmed poultry density may be somewhat correlated (i.e., both are low in remote areas), but poultry density is likely to be higher in rural areas, which are assumed to have relatively lower surveillance intensity under the current approach. The authors allude to this in the Discussion: "monitoring areas with high intensive chicken densities ... remains crucial for the early detection and management of HPAI outbreaks".

      We agree with the Reviewer that poultry density could have been used to guide the sampling of pseudo-absences when training the domestic bird models. However, we prefer to keep using a human population density layer as a proxy for surveillance bias, i.e. to define the relative probability of sampling pseudo-absence points in each pixel of the background area considered when training our ecological niche models. Indeed, poultry density is precisely one of the predictors that we aim to test, so using this layer to define the sampling probability of pseudo-absences would introduce a degree of circularity into our analytical procedure, e.g. by artificially increasing the influence of that variable in our models.

      Furthermore, it is also worth noting that, to better account for variations in surveillance intensity, we also adjusted the sampling effort by allocating pseudo-absences in proportion to the number of confirmed outbreaks per administrative unit (country or sub-national regions for Russia and China). This approach aimed to reduce bias caused by uneven reporting and surveillance efforts between regions. Additionally, we restricted model training to countries or regions with a minimum surveillance threshold (at least five confirmed outbreaks per administrative unit). Therefore, both presence and pseudo-absence points originated from areas with more consistent surveillance data.
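      The sampling logic described in the two paragraphs above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the code used in the study: the function names, the sampling with replacement, the rounding of per-unit allocations, and all example values are hypothetical. Only the threshold of five confirmed outbreaks, the allocation in proportion to outbreak counts, and the weighting by human population density come from the text.

```python
import random

MIN_OUTBREAKS = 5  # minimum confirmed outbreaks per administrative unit (from the text)


def allocate_pseudo_absences(outbreaks_per_unit, total_points):
    """Share total_points among units in proportion to their outbreak counts.

    Units below MIN_OUTBREAKS are dropped. Allocations are rounded, so their
    sum may differ slightly from total_points (an assumption of this sketch).
    """
    kept = {u: n for u, n in outbreaks_per_unit.items() if n >= MIN_OUTBREAKS}
    total = sum(kept.values())
    if total == 0:
        return {}
    return {u: round(total_points * n / total) for u, n in kept.items()}


def sample_pixels(pixel_density, n_points, rng):
    """Draw pixels with probability proportional to human population density.

    Sampling is done with replacement here (an assumption of this sketch).
    """
    pixels = list(pixel_density)
    weights = [pixel_density[p] for p in pixels]
    return rng.choices(pixels, weights=weights, k=n_points)


# Hypothetical outbreak counts per administrative unit.
allocation = allocate_pseudo_absences({"A": 30, "B": 10, "C": 2}, total_points=100)

# Hypothetical human population density per background pixel in unit "A".
rng = random.Random(0)
points_a = sample_pixels({"p1": 0.0, "p2": 5.0, "p3": 15.0}, allocation["A"], rng)
```

      In this sketch, unit "C" receives no pseudo-absences because it falls below the threshold, and pixel "p1" can never be drawn because its density is zero.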

      We acknowledge in the Materials and Methods section that the approach proposed by the Reviewer could have been used: “Another approach to sampling pseudo-absences would have been to distribute them according to the density of domestic poultry.” Finally, our approach is also justified in our response to the next comment of the Reviewer.

      Having written my review, including the paragraph above, I briefly scanned Dhingra et al., and found that they provide justification for the use of human population density to sample pseudoabsences in farmed birds: "the Empres-i database compiles outbreak locations data from very heterogeneous sources and in the absence of explicit GPS location data, the geo-referencing of individual cases is often through the use of place name gazetteers that will tend to force the outbreak location populated place, rather in the exact location of the farm where the disease was found, which would introduce a bias correlated with human population density." This context is entirely missing from the manuscript under review, however, I maintain the comment in the paragraph above - have the Authors trialled sampling pseudo-absences from poultry density layers?

      We agree with the Reviewer’s comment and have now added this precision in the Materials and Methods section (in the third paragraph dedicated to ecological niche modelling): “However, as pointed out by Dhingra and colleagues (2016), the locations of outbreaks in the EMPRES-i database are often georeferenced using place name nomenclatures due to a lack of accurate GPS data, which could introduce a spatial bias towards populated areas.”

      The authors indirectly acknowledge the role of sampling bias in model predictions at line 163, however, this point could be clearer: there is sampling bias in the set of locations where HPAI has been observed and failure to adequately replicate this sampling bias in pseudo-absence data could lead covariates that are correlated with the observation distribution to appear to be correlated with the target distribution. This point is alluded to but should be clearly acknowledged to allow the reader to appropriately interpret your results. I understand the point being made on line 163 is that surveillance of HPAI in wild birds has become more structured and less opportunistic over time - if this is the case, a statement to this effect could replace "which could influence earlier data sets", which is a little ambiguous. The Authors acknowledge the role of sampling bias in lines 241-242 - this may be a good place to remind the reader that they have attempted to incorporate sampling bias through the selection of their pseudoabsence dataset, particularly for wild bird models.

      We thank the Reviewer for this comment. We have now clarified in the text that observed data on HPAI occurrence are inherently influenced by heterogeneous surveillance efforts and that failure to replicate this bias in pseudo-absence sampling could effectively lead to misleading correlations with covariates associated with surveillance effort rather than true ecological suitability. We have now rephrased the related sentence as follows: “This decline may indicate a reduced bias in observation data: typically, dead wild birds are more frequently found near human-populated areas due to opportunistic detections, whereas more recent surveillance efforts have become increasingly proactive (Giacinti et al., 2024).”

      Dhingra et al. aimed to account for the effect of mass vaccination of birds in China. This does not appear to be included in the updated models - is this a relevant covariate to consider in updated models? Are the models trained on pre-2020 data predicting to post-2020 given the same presence dataset as previous models? It may be helpful to provide a comment on this if we consider the pre-2020 models in this work to be representative of pre-2020 models as a cohort. Given the framing of the manuscript as an update to Dhingra et al., it may be useful for the authors to briefly summarise any differences between the existing models and updated models. Dhingra et al., also examine spatial extrapolation, which is not addressed here. Environmental extrapolation may be a useful metric to consider: are there areas where models are extrapolating that are predicted to be at high risk of HPAI transmission? Finally, they also provide some inset panels on global maps of model predictions - something similar here may also be useful.

      We thank the Reviewer for these comments. Vaccination coverage is indeed a relevant covariate for HPAI suitability in domestic birds. However, we did not include this variable in our updated models for two reasons. First, comprehensive vaccination data were only available for China, so it is not possible to include this variable in a global model. Second, available data were outdated and vaccination strategies can vary substantially over time.

      However, we agree with the Reviewer that the Materials and Methods section did not clearly describe the differences with Dhingra et al. (2016), and we now detail these differences at the beginning of the Materials and Methods section: “Our approach is similar to the one implemented by Dhingra and colleagues (2016). While Dhingra et al. (2016) developed their models only for domestic birds over the 2003-2016 periods, our models were developed for two host species separately (wild and domestic birds) and for two time periods (2016-2020 and 2020-2023).”

      We also detail the main difference concerning the pseudo-absences sampling: Dhingra and colleagues (2016) used human population density to sample pseudo-absences to reflect potential surveillance bias and also account for spatial filtering (min/max distances from presence). We adopted a similar strategy but also incorporated outbreak count per country or province (in the case of China and Russia) into the pseudo-absence sampling process to further account for within-country surveillance heterogeneity. We have now added these specifications in the Materials and Methods section: “To account for heterogeneity in AIV surveillance and minimise the risk of sampling pseudo-absences in poorly monitored regions, we restricted our analysis to countries (or administrative level 1 units in China and Russia) with at least five confirmed outbreaks. Unlike Dhingra et al. (2016), who sampled pseudoabsences across a broader global extent, our sampling was limited to regions with demonstrated surveillance activity. In addition, we adjusted the density of pseudo-absence points according to the number of reported outbreaks in each country or admin-1 unit, as a proxy for surveillance effort — an approach not implemented in this previous study.”

      We have now also provided a comparison between the different outputs, particularly in the Results section: “Our findings were overall consistent with those previously reported by Dhingra and colleagues (Dhingra et al., 2016), who used data from January 2004 to March 2015 for domestic poultry. However, some differences were noted: their maps identified higher ecological suitability for H5 occurrences before 2016 in North America, West Africa, eastern Europe, and Bangladesh, while our maps mainly highlight ecologically suitable regions in China, South-East Asia, and Europe (Fig. S5). In India, analyses consistently identified high ecologically suitable areas for the risk of local H5Nx and H5N1 circulation for the three time periods (pre-2016, 2016-2020, and post-2020). Similar to the results reported by Dhingra and colleagues, we observed an increase in the ecological suitability estimated for H5N1 occurrence in South America's domestic bird populations post-2020. Finally, Dhingra and colleagues identified high suitability areas for H5Nx occurrence in North America, which are predicted to be associated with a low ecological suitability in the 2016-2020 models.”

      We acknowledge that some regions predicted as highly suitable correspond to areas where extrapolation likely occurs due to limited or no recorded outbreaks. We have now added these specifications when discussing the resulting suitability maps obtained for domestic birds: “For H5Nx post-2020, areas of high predicted ecological suitability, such as Brazil, Bolivia, the Caribbean islands, and Jilin province in China, likely result from extrapolations, as these regions reported few or no outbreaks in the training data”, and, for wild birds: “Some of the areas with high predicted ecological suitability reflect the result of extrapolations. This is particularly the case in coastal regions of West and North Africa, the Nile Basin, Central Asia (Kyrgyzstan, Tajikistan, Uzbekistan), Brazil (including the Amazon and coastal areas), southern Australia, and the Caribbean, where ecological conditions are similar to those in areas where outbreaks are known to occur but where records of outbreaks are still rare.”

      For wild birds (H5Nx, post-2020), high ecological suitability was predicted along the West and North African coasts, the Nile basin, Central Asia (e.g., Kyrgyzstan, Tajikistan, Uzbekistan), the Brazilian coast and Amazon region, Caribbean islands, southern Australia, and parts of Southeast Asia. Ecological suitability estimated in these regions may directly result from extrapolations and should therefore be interpreted cautiously.

      We also added a discussion of the extrapolation for wild birds (in the Discussion section): “Interestingly, our models extrapolate environmental suitability for H5Nx in wild birds in areas where few or no outbreaks have been reported. This discrepancy may be explained by limited surveillance or underreporting in those regions. For instance, there is significant evidence that Kazakhstan and Central Asia play a role as a centre for the transmission of avian influenza viruses through migratory birds (Amirgazin et al., 2022; FAO, 2005; Sultankulova et al., 2024). However, very few wild bird cases are reported in EMPRES-i. In contrast, Australia appears environmentally suitable in our models, yet no incursion of HPAI H5N1 2.3.4.4b has occurred despite the arrival of millions of migratory shorebirds and seabirds from Asia and North America. Extensive surveillance in 2022 and 2023 found no active infections nor evidence of prior exposure to the 2.3.4.4b lineage (Wille et al., 2024; Wille and Klaassen, 2023).”

      We agree that inset panels can be helpful for visualising global patterns. However, all resulting maps are available on the MOOD platform (https://app.mood-h2020.eu/core), which provides an interactive interface allowing users to zoom in and out, identify specific locations using a background map, and explore the results in greater detail. This resource is referenced in the manuscript to guide readers to the platform.

      Related to my review of the manuscript's conceptualisation above, there are several inconsistencies in terminology in the manuscript - clearing these up may help to make the methods and their justification clearer to the reader. The "signal" that the models are estimating is variously described as "susceptibility" and "risk" (lines 179-180), "HPAI H5 ecological suitability" (line 78), "likelihood of HPAI occurrences" (line 139), "risk of HPAI circulation" (line 187), "distribution of occurrence data" (line 428). Each of these quantities has slightly different meanings and it is confusing to the reader that all of these descriptors are used for model output. "Likelihood of HPAI occurrences" is particularly misleading: ecological niche models predict high suitability for a species in areas that are similar to environments where it has previously been identified, without imposing constraints on species movement. It is intuitively far more likely that there will be HPAI occurrences in areas where the disease is already established than in areas where an introduction event is required, however, the niche models in this work do not include spatial relationships in their predictions.

      We agree with the Reviewer’s comments. We have now modified the text so that the Results section refers to ecological suitability when describing the outputs of the models. In the Discussion section, we then interpret this ecological suitability in terms of risk, with areas of high ecological suitability being more likely to support local HPAI outbreaks.

      I also caution the authors in their interpretation of the results of BRTs, which are correlative models, so therefore do not tell us what causes a response variable, but rather what is correlated with it. On Line 31, "correlated with" may be more appropriate than "influenced by". On Line 82, "correlated with" is more appropriate than "driving". This is particularly true given the authors' treatment of sampling bias.

      We agree with the Reviewer’s comment and have now rephrased these sentences as follows: “The spatial distribution of HPAI H5 occurrences in wild birds appears to be primarily correlated with urban areas and open water regions” and “Our results provide a better understanding of HPAI dynamics by identifying key environmental factors correlated with the increase in H5Nx and H5N1 cases in poultry and wild birds, investigating potential shifts in their ecological niches, and improving the prediction of at-risk areas.”

      The following sentences in line 201 are ambiguous: "For both H5Nx and H5N1, however, isolated areas on the risk map should be interpreted with caution. These isolated areas may result from sparse data, model limitations, or local environmental conditions that may not accurately reflect true ecological suitability." By "isolated", do the authors mean remote? Or ecologically dissimilar from the set of locations where HPAI has been detected? Or ecologically dissimilar from the set of locations in the joint set of HPAI detection locations and pseudo-absences? Or ecologically similar to the set of locations where HPAI has been detected but spatially isolated? These four descriptors are each slightly different and change the meaning of the sentences. "Model limitations" are also ambiguous - could the authors clarify which specific model limitations they are referring to here? Ultimately, the point being made is probably that a model may predict high ecological suitability for HPAI transmission in areas where the disease has not yet been identified, or where a model is extrapolating in environmental space, however, uncertainty in these predictions may be greater than uncertainty in predictions in areas that are represented in surveillance data. A clear comment on model uncertainty and how it is related to the surveillance dataset and the covariate dataset is currently missing from the manuscript and would be appropriate in this paragraph.

      We understand the Reviewer’s concerns regarding these potential ambiguities, and have now rephrased these sentences as follows: “For both H5Nx and H5N1, certain areas of predicted high ecological suitability appear spatially isolated, i.e. surrounded by regions of low predicted ecological suitability. These areas likely meet the environmental conditions associated with past HPAI occurrences, but their spatial isolation may imply a lower risk of actual occurrences, particularly in the absence of nearby outbreaks or relevant wild bird movements.”

      I am concerned by the wording of the following sentence: "The risk maps reveal that high-risk areas have expanded after 2020" (line 203). This statement could be supported by an acknowledgement of the assumptions the models make of the HPAI niche: are we saying that the niche is unchanged in environmental space and that there are now more geographic areas accessible to the pathogen, or that the niche has shifted or expanded, and that there are now more geographic areas accessible to the pathogen? The authors should review the sentence beginning on line 117: if models trained on data from the old timepoint predicting to the new timepoint are almost as good as models trained on data from the new timepoint predicting to the new timepoint, doesn't this indicate that the niche, as the models are able to capture it, has not changed too much?

      We thank the Reviewer for this comment. The statement that "high-risk areas have expanded after 2020" indeed refers to an increase in the geographic extent of areas predicted to have high ecological suitability in models trained on post-2020 data. This expansion likely reflects new outbreak data from regions that had not previously reported cases, which in turn influenced model training.

      However, models trained on pre-2020 data retain reasonable predictive performance when applied to post-2020 data (see the AUC results reported in Table S1). The models therefore point to an expansion of the areas predicted to be ecologically suitable, but do not provide definitive evidence of a shift in the ecological niche. We have now added a statement at the end of this paragraph to clarify this point: “However, models trained on pre-2020 data maintained reasonable predictive performance when tested on post-2020 data, suggesting that the overall ecological niche of HPAI did not drastically shift over time.”
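      To make the AUC comparison concrete, the sketch below computes the AUC as the probability that a randomly chosen presence point receives a higher predicted suitability than a randomly chosen pseudo-absence point, with ties counted as one half. It is a minimal Python illustration and not the evaluation code used for Table S1: the function name and the suitability values are hypothetical, and the study's exact validation procedure is not reproduced here.

```python
def auc(scores_presence, scores_absence):
    """Rank-based AUC: share of presence/pseudo-absence pairs ranked correctly."""
    if not scores_presence or not scores_absence:
        raise ValueError("both score lists must be non-empty")
    wins = 0.0
    for p in scores_presence:
        for a in scores_absence:
            if p > a:
                wins += 1.0
            elif p == a:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(scores_presence) * len(scores_absence))


# Hypothetical suitability values predicted by a model trained on the earlier
# period, evaluated at presence and pseudo-absence points from the later period.
later_presence = [0.9, 0.4]
later_absence = [0.6, 0.1]
transfer_auc = auc(later_presence, later_absence)
```

      An AUC near 0.5 would indicate that the earlier model ranks later presences no better than chance, whereas a value close to that of a model trained on the later period is consistent with a niche that has not shifted much.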

      The final two paragraphs of the Results might be more helpful to include at the beginning of the Results, as the data discussed there are inputs to the models. Is it possible that the "rise in Shannon index for sea birds" that "suggests a broadening of species diversity within this category from 2020 onwards" is caused by the increasingly structured surveillance of HPAI in wild birds alluded to earlier in the Results? Is the "prevalence" discussed in line 226 the frequency of the families Laridae and Sulidae being represented in HPAI detection data? Or the abundance of the bird species themselves? The language here is a little ambiguous. Discussion of particular values of Shannon/Simpson indices is slightly out of context as the meanings of the indices are in the Methods - perhaps a brief explanation of the uses of Shannon/Simpson indices may be helpful to the reader here. It may also be helpful to readers who are not acquainted with avian taxonomy to provide common names next to formal names (for example, in brackets) in the body of the text, as this manuscript is published in an interdisciplinary journal.

      We thank the Reviewer for these comments. First, we acknowledge that the paragraphs on species diversity and Shannon/Simpson indices describe important data, but we have chosen to present them after the main modelling results in order to maintain a logical narrative flow. Our manuscript first presents the ecological niche models and their predictive performance, followed by interpretations of the observed patterns, including changes in avian host diversity. Diversity indices were used primarily to support and contextualise the patterns observed in the modelling results.

      For clarity, we have revised the relevant paragraphs in the Results (i) to briefly remind readers of the interpretation of the Shannon and Simpson indices (“Note that these indices reflect the diversity of bird species detected in outbreak records, not necessarily their abundance in the wild”) and (ii) to clarify that “prevalence” refers to the frequency of HPAI detection in wild bird species of the Laridae (gulls) and Sulidae (boobies and gannets) families, and not their total abundance. A bird family includes several species, so the “common name” of a family can sometimes also refer to species from other families. We have now added the common names for each family in the manuscript (although we acknowledge that “penguins” can be ambiguous).
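      As an illustration of what the two diversity indices measure, the following minimal sketch computes them from per-species detection counts. It assumes the natural-log Shannon index and the Gini-Simpson form (1 minus the sum of squared proportions); the manuscript's Methods may define the Simpson index differently, and the function names are hypothetical.

```python
import math


def shannon_index(counts):
    """Shannon index H = -sum(p_i * ln p_i) over species with nonzero counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def simpson_index(counts):
    """Gini-Simpson index 1 - sum(p_i^2); an assumed form, not confirmed by the source."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)


if __name__ == "__main__":
    # Hypothetical detection counts per species (placeholders, not study data).
    even_detections = [5, 5, 5, 5]
    skewed_detections = [17, 1, 1, 1]
    for label, counts in (("even", even_detections), ("skewed", skewed_detections)):
        print(label, round(shannon_index(counts), 3), round(simpson_index(counts), 3))
```

      Both indices are computed from detection records only, so they rise when detections are spread across more species, whether that spread comes from a broader host range or from broader sampling.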

      In the Methods, it is stated: "To address the heterogeneity of AIV surveillance efforts and to avoid misclassifying low-surveillance areas as unsuitable for virus circulation, we trained the ecological niche models only considering countries in which five or more cases have been confirmed." However, it is not clear how this processing step prevents low-surveillance areas from being misclassified. If pseudo-absences are appropriately sampled, low-surveillance areas should be less represented in the pseudo-absence dataset, which should lead the models to be uncertain in their predictions of these areas. Perhaps "To address the heterogeneity of AIV surveillance efforts and to avoid sampling pseudo-absence data in realistically low-surveillance areas" is a more accurate introduction to the paragraph. I am not entirely convinced that it is appropriate to remove detection data where the national number of cases is low. This may introduce further sampling bias into the dataset.

      We take the opportunity of the Reviewer’s comment to further clarify this important step aiming to mitigate bias associated with countries with substantial uncertainty in reporting and/or potentially insufficient HPAI surveillance data. While we indeed acknowledge that this procedure may exclude countries that had effective surveillance but low virus detection, we argue that it constitutes a relevant conservative approach to minimising the risk of sampling a significant number of pseudo-absence points in areas associated with relatively high yet undetected local HPAI circulation due to insufficient surveillance. Furthermore, given that five cases over two decades is a relatively low threshold — particularly for a highly transmissible virus such as AIV — non-detection or non-reporting remains a more plausible explanation than true absence.

      To improve clarity, we have now revised the related sentence as follows: “To account for heterogeneity in AIV surveillance and minimise the risk of sampling pseudo-absences in poorly monitored regions, we restricted our analysis to countries (or administrative level 1 units in China and Russia) with at least five confirmed outbreaks.”

      The reporting of spatial and temporal resolution of data in the manuscript could be significantly clearer. Is there a reason why human population density is downscaled to 5 arcminutes (~10km at the equator) while environmental covariate data has a resolution of 1km? The projection used is not reported. The authors should clarify the time period/resolution of the covariate data assigned to the occurrence dataset, for example, does "day LST annual mean" represent a particular year pre- or post-2020? Or an average over a number of years? Given that disease detections are associated with observation and reporting dates, and that there may be seasonal patterns in HPAI occurrence, it would be helpful to the reader to include this information when the eco-climatic indices are described. It would also be helpful to the reader to summarise the source, spatial and temporal resolution of all covariates in a table, as in Dhingra et al. Could the Authors clarify whether the duck density layer is farmed ducks or wild ducks?

      The projection is WGS 84 (EPSG:4326) and the resolution of the output maps is around 0.0833 x 0.0833 decimal degrees (i.e. 5 arcmin, or approximately 10 km at the equator). We have now added these specifications in the text: “All maps are in a WGS84 projection with a spatial resolution of 0.0833 decimal degrees (i.e. 5 arcmin, or approximately 10 km at the equator).” In addition, we have now specified in the text that duck refers to domestic duck for clarity. 
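      The resolution figures quoted above can be checked with a short calculation. The sketch below converts arcminutes to decimal degrees and estimates the east-west width of a grid cell, assuming a spherical Earth with the WGS 84 equatorial radius; the function names are hypothetical and the spherical shortcut is only an approximation.

```python
import math

# WGS 84 semi-major axis (equatorial radius), in kilometres.
EARTH_EQUATORIAL_RADIUS_KM = 6378.137


def arcmin_to_degrees(arcmin):
    """Convert arcminutes to decimal degrees (60 arcmin per degree)."""
    return arcmin / 60.0


def cell_width_km(cell_size_deg, latitude_deg=0.0):
    """Approximate east-west width of a grid cell at a given latitude.

    Uses a sphere of equatorial radius, so values away from the equator
    are rough estimates only.
    """
    km_per_degree = 2.0 * math.pi * EARTH_EQUATORIAL_RADIUS_KM / 360.0
    return cell_size_deg * km_per_degree * math.cos(math.radians(latitude_deg))


if __name__ == "__main__":
    size_deg = arcmin_to_degrees(5)
    print(f"5 arcmin = {size_deg:.4f} decimal degrees")
    print(f"cell width at the equator: {cell_width_km(size_deg):.2f} km")
    print(f"cell width at 60 degrees latitude: {cell_width_km(size_deg, 60.0):.2f} km")
```

      Because the grid is defined in degrees, the ground distance covered by one cell shrinks towards the poles, which is why the stated value of about 10 km applies at the equator only.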

      The environmental variables retrieved for our analyses were available as values averaged over distinct periods of time (for further detail, see Supplementary Information Resources S1, which gives the description and source of each environmental variable included in the original sets of variables and is available at https://github.com/sdellicour/h5nx_risk_mapping). In future work, it would indeed be interesting to associate each occurrence with a specific season and with the correspondingly matched variables, especially for viruses such as HPAI whose occurrence has been found to correlate with season. However, we did not conduct this type of analysis in the present study; occurrences are here associated with averaged values of environmental data only.

      In line 407, the authors state a number of pseudo-absence points used in modelling, relative to the number of presence points, without clear justification. Note that relative weights can be assigned to occurrence data in most ECN software (e.g., R package gbm), to allow many pseudo-absence points to be sampled to represent the full extent of probable surveillance effort and subsequently down-weighted.

      We thank the Reviewer for this suggestion. We acknowledge that alternative approaches such as down-weighting pseudo-absence points could offer a certain degree of flexibility in representing surveillance effort. However, we opted for a fixed 1:3 ratio of pseudo-absences to presence points within each administrative unit to ensure a consistent and conservative sampling distribution. This approach aimed to limit overrepresentation of pseudo-absences in areas with sparse presence data, while still reflecting areas of likely surveillance.

      There are a number of typographical errors and phrasing issues in the manuscript. A nonexhaustive list is provided below.

      - Line 21: "its" should be "their" - Line 25: "HPAI cases"

      Modifications have been done.

      - Line 63: sentence beginning "However" is somewhat out of context - what is it (briefly) about recent outbreaks that challenge existing models?

      We have now edited that sentence as follows: “However, recent outbreaks raise questions about whether earlier ecological niche models still accurately predict the current distribution of areas ecologically suitable for the local circulation of HPAI H5 viruses.”

      - Lines 71 and 390: "AIV" is not defined in the text - Line 73: "do" ("are" and "what" are not capitalised)

      Modifications have been done.

      - Line 115: "predictability" should be "predictive capacity"

      We have now replaced “predictability” by “predictive performance”.

      - Line 180: omit "pinpointing"

      - Line 192 sentence beginning "In India," should be re-worded: is the point that there are detections of HPAI here and the model predicts high ecological suitability?

      - Line 195 sentence beginning "Finally," phrasing could be clearer: Dhingra et al. find high suitability areas for H5Nx in North America which are predicted to be low suitability in the new model.

      - Line 237: omit "the" in "with the those"

      - Line 374: missing "."

      - Line 375: "and" should be "to" (the same goes for line 421)

      - Line 448: Rephrase "Simpson index goes" to "The Simpson index ranges"

      Modifications have been done.

      Reviewer #2 (Public Review):

      What is the justification for separating the dataset at 2020? Is it just the gap in-between the avian influenza outbreaks?

      We chose 2020 as a cut-off based on a well-documented shift in HPAI epidemiology, notably the emergence and global spread of clade 2.3.4.4b, which may affect host dynamics and geographic patterns. We have now added this precision in the Materials and Methods section: “We selected 2020 as a cut-off point to reflect a well-documented shift in HPAI epidemiology, notably the emergence and global spread of clade 2.3.4.4b. This event marked a turning point in viral dynamics, influencing both the range of susceptible hosts and the geographical distribution of outbreaks.”

      If the analysis aims to look at changing case numbers and distribution over time, surely the covariate datasets should be contemporaneous with the response?

      Thank you for raising this important point. While we acknowledge that, ideally, covariates should match the response temporally, such high-resolution spatiotemporal environmental data were not available for most environmental factors considered in our ecological niche modelling analyses. We used predictors (e.g., land-use variables, poultry density) that reflect long-term ecological suitability, and we acknowledge that considering short-term seasonal variation instead could be an interesting perspective for future work. This is now explicitly stated in the Discussion section: “In addition, aligning outbreak occurrences with seasonally matched environmental variables could further refine predictions of HPAI risk linked to migratory dynamics.”

      I would expect quite different immunity dynamics between domestic and wild birds as a function of lifespan and birth rates - though no obvious sign of that in the raw data. A statement on assumptions in that respect would be good.

      Thank you for the comment. We agree that domestic and wild birds likely exhibit different immunity dynamics due to differences in lifespan, turnover rates, and exposure. However, our analyses did not explicitly model immunity processes, and the data did not show a clear signal of these differences.

      Decisions and analytical tactics from Dhingra et al are adopted here in a way that doesn't quite convey the rationale, or justify its use here.

      We thank the Reviewer for this observation. However, we do not agree with the notion that the rationale for using Dhingra et al.’s analytical framework is insufficiently conveyed. We adapted key components of their ecological niche modelling approach — such as the use of a boosted regression tree methodology and pseudo-absences sampling procedure — to ensure comparability with their previous findings, while also extending the analysis to additional time periods and host categories (wild vs. domestic birds). This framework aligns with the main objective of our study, which is to assess shifts in ecological suitability for HPAI over time and across host species, in light of changing viral dynamics.  

      Please go over the manuscript and harmonise the language about the model target - it is usually referred to as cases, but sometimes the pathogen, and others the wild and domestic birds where the cases were discovered.

      We agree and we have now modified the text to only use the “cases” or “occurrences” terminology when referring to the model inputs.

      Is the reporting of your BRT implementation correct? The text suggests that only 10 trees were run per replicate (of which there were 10 per response (domestic/wild x H5N1 / H5Nx) x distinct covariate set), but this would suggest that the authors were scarcely benefiting from the 'boosting' part of the BRTs that allow them to accurately estimate curvilinear functions. As additional trees are added, they should still be improving the loss function, and dramatically so in the early stages. The authors seem heavily guided by Elith et al's excellent paper[1] explaining BRTs and the companion tutorial piece, but in that work, the recommended approach is to run an initial model with a relatively quick learning rate that achieves the best fit to the held-out data at somewhere over 1000 trees, and then to refine the model to that number of trees with a slower learning rate. If the authors did indeed run only 10 trees I think that should be explained.

      For each model, we used the “gbm.step” function to fit boosted regression trees, initiating the process with 10 trees and allowing up to 10,000 trees in steps of 5. The optimal number of trees was automatically determined by minimising the cross-validated deviance, following the recommended approach of Elith and colleagues (2008, J. Anim. Ecol.). This setup allows the boosting algorithm to iteratively improve model performance while avoiding overfitting. These aspects are now further clarified in the Materials and Methods section: “All BRT analyses were run and averaged over 10 cross-validated replicates, with a tree complexity of 4, a learning rate of 0.01, a tolerance parameter of 0.001, and while considering 5 spatial folds. Each model was initiated with 10 trees, and additional trees were incrementally added (in steps of 5) up to a maximum of 10,000, with the optimal number selected based on cross-validation tests.”
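      The tree-number schedule described above can be sketched as follows. This is a simplified illustration, not the “gbm.step” implementation: it scans every candidate tree count (10, 15, ..., 10,000) and keeps the one with the lowest cross-validated deviance. The deviance curve is a hypothetical stand-in for a real cross-validation routine, and all function names are assumptions.

```python
def tree_schedule(initial=10, step=5, max_trees=10_000):
    """Candidate tree counts: start at `initial`, add `step` trees at a time."""
    return range(initial, max_trees + 1, step)


def select_n_trees(cv_deviance, initial=10, step=5, max_trees=10_000):
    """Return the candidate tree count that minimises the cross-validated deviance.

    `cv_deviance` is any callable mapping a tree count to a held-out deviance.
    """
    return min(tree_schedule(initial, step, max_trees), key=cv_deviance)


def toy_cv_deviance(n_trees, best=1200):
    """Hypothetical convex deviance curve with its minimum at `best` trees."""
    return 1.0 + ((n_trees - best) / best) ** 2


if __name__ == "__main__":
    schedule = tree_schedule()
    print("first candidates:", list(schedule[:3]))
    print("number of candidates:", len(schedule))
    print("selected number of trees:", select_n_trees(toy_cv_deviance))
```

      In this reading, the value of 10 is only the starting size of the ensemble, and the final model can contain far more trees, depending on where the held-out deviance stops improving.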

      I'm uncomfortable with the strong interpretation of changes in indices such as those for diversity in the case of bird species with detected cases of avian influenza, and the relative influence of covariates in the environmental niche models. In the former case, if surveillance effort is increasing it might be expected that more species will be found to be infected. In the latter, I'm just not convinced that these fundamentally correlative models can support the interpretation of changing epidemiology as asserted by authors. This strikes me as particularly problematic in light of static and in some cases anachronistic predictor sets.

      We thank the Reviewer for drawing attention to how changes in surveillance intensity might influence our diversity estimates. We have now integrated a new analysis to evaluate the increase in the number of wild birds tested and discussed the potential impact of this increase on the comparison of the bird species diversity metrics presented in our study, which is now interpreted with more caution: “To evaluate whether the post-2020 increase in species diversity estimated for infected wild birds could result from an increase in the number of tests performed on wild birds, we compared European annual surveillance test counts (EFSA et al., 2025, 2019) before and after 2020 using a Wilcoxon rank-sum test. We relied on European data because it was readily accessible and offered standardised and systematically collected metrics across multiple years, making it suitable for a comparative analysis. Although borderline significant (p-value = 0.063), the Wilcoxon rank-sum test indeed highlighted a recent increase in the number of wild bird tests (on average >11,000/year pre-2020 and >22,000 post-2020), which indicates that the comparison of bird species diversity metrics should be interpreted with caution. However, such an increase in the number of tests conducted in the context of a passive surveillance framework would thus also be in line with an increase in the number of wild birds found dead and thus tested. Therefore, while the increase in the number of tests could indeed impact species diversity metrics such as the Shannon index, it can also reflect an absolute higher wild bird mortality in line with a broadened range of infected bird species.”
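      For readers unfamiliar with the Wilcoxon rank-sum test, the sketch below shows the mechanics on two small samples. It assumes midranks for ties and a normal approximation without continuity correction, so its p-values will generally differ from those of an exact test on small samples; it is not the authors' analysis, and the input values are toy placeholders rather than surveillance counts.

```python
import math


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum (Mann-Whitney) test, normal approximation.

    Returns (U statistic for sample x, z score, two-sided p-value).
    Assumes the pooled values are not all identical.
    """
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    n = len(pooled)
    ranks = [0.0] * n
    tie_term = 0
    i = 0
    while i < n:
        j = i
        # Extend j over the run of values tied with pooled[i].
        while j + 1 < n and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        midrank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[k] = midrank
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    n1, n2 = len(x), len(y)
    rank_sum_x = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u = rank_sum_x - n1 * (n1 + 1) / 2.0
    mean_u = n1 * n2 / 2.0
    var_u = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    z = (u - mean_u) / math.sqrt(var_u)
    p_two_sided = math.erfc(abs(z) / math.sqrt(2.0))
    return u, z, p_two_sided


if __name__ == "__main__":
    # Toy values only: every "before" value is smaller than every "after" value.
    before = [1, 2, 3]
    after = [4, 5, 6]
    u, z, p = rank_sum_test(before, after)
    print(f"U = {u}, z = {z:.3f}, two-sided p = {p:.4f}")
```

      With only a handful of annual values per period, even complete separation of the two groups yields a p-value close to conventional thresholds, which is one reason a borderline result is expected in this kind of comparison.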

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      (1) The authors devote significant effort to characterizing the physical interaction between Bicc1 and Pkd2. However, the study does not examine or discuss how this interaction relates to Bicc1's well-established role in posttranscriptional regulation of Pkd2 mRNA stability and translation efficiency.

      The reviewer is correct that the present study has not addressed the downstream consequences of this interaction, considering that Bicc1 is a posttranscriptional regulator of Pkd2 (and potentially Pkd1). We think that the complex of Bicc1/Pkd1/Pkd2 retains Bicc1 in the cytoplasm and thus restricts its activity in posttranscriptional regulation (see Author response image 1). We, however, do not yet have data to support this and thus have not included this model in the manuscript. Yet, we have updated the discussion of the manuscript to further elaborate on the potential mechanism of the Bicc1/Pkd1/Pkd2 complex.

      We have updated the discussion to include a discussion on the potential consequences on posttranscriptional regulation by Bicc1.

      Author response image 1.

      Model of BICC1, PC1 and PC2 self-regulation. In this model Bicc1 acts as a positive regulator of PKD gene expression. In the presence of ‘sufficient’ amounts of PC1/PC2 complex, it is tethered to the complex and remains biologically inactive (Fig. 1A). However, once the levels of the PC1/PC2 complex are reduced, Bicc1 is now present in the cytoplasm to promote expression of the PKD proteins, thereby raising their levels (Fig. 4B), which then in turn will ‘shutdown’ Bicc1 activity by again tethering it to the plasma membrane.

      (2) Bicc1 inactivation appears to downregulate Pkd1 expression, yet it remains unclear whether Bicc1 regulates Pkd1 through direct interaction or by antagonizing miR-17, as observed in Pkd2 regulation. This should be further examined or discussed.

      This is a very interesting comment. Vishal Patel published that PKD1 is regulated by a mir-17 binding site in its 3’UTR (PMID: 35965273). We, however, have not evaluated whether BICC1 participates in this regulation. A definitive answer would require utilization of the mice described in above reference, which is beyond the scope of this manuscript. We, however, have revised the discussion to elaborate on this potential mechanism. 

      We have updated the discussion to include a statement on the potential direct regulation of Pkd1 mRNA by Bicc1.

      (3) The evidence supporting Bicc1 and ADPKD gene cooperativity, particularly with Pkd1, in mouse models is not entirely convincing, likely due to substantial variability and the aggressive nature of Bpk/Bpk mice. Increasing the number of animals or using a milder Bicc1 strain, such as jcpk heterozygotes, could help substantiate the genetic interaction.

      We initially performed the analysis using the complete Bicc1 knockout that we previously reported (PMID 20215348), focusing on compound heterozygotes. Yet, similar to the Pkd1/Pkd2 compound heterozygotes (PMID 12140187), no cyst development was observed even when we sacrificed the mice as late as P21. Our strain is similar to the above-mentioned jcpk, which is characterized by a short, abnormal transcript thought to result in a null allele (PMID: 12682776). We thank the reviewer for pointing us to the reference showing that heterozygous mice exhibit glomerular cysts as adults (PMID: 7723240). This suggestion is an interesting idea that we will investigate. In general, we agree with the reviewer that a better understanding of the contribution of Bicc1 to the adult PKD phenotype will be critical. To this end, we are currently generating a floxed allele of Bicc1 that will allow us to address the cooperativity in the adult kidney, e.g. when crossed to the Pkd1<sup>RC/RC</sup> mice. Yet, these experiments are beyond the timeframe for this revision.

      No changes were made in the revised manuscript. 

      Reviewer #2 (Public review):

      (1) These results are potentially interesting, despite the limitation, also recognized by the authors, that BICC1 mutations seem exceedingly rare in PKD patients and may not "significantly contribute to the mutational load in ADPKD or ARPKD". The manuscript has several intrinsic limitations that must be addressed. 

      As mentioned above, the study was designed to explore whether there is an interaction between BICC1 and the PKD1/PKD2 and whether this interaction is functionally important. How this translates into the clinical relevance will require additional studies (and we have addressed this in the discussion of the manuscript).

      (2) The manuscript contains factual errors, imprecisions, and language ambiguities. This has the effect of making this reviewer wonder how thorough the research reported and analyses have been. 

      We respectfully disagree with the reviewer on the latter interpretation. The study was performed with rigor. We have carefully assessed the critiques raised by the reviewer. As presented below, most of the criticisms raised by the reviewer have been easily addressed in the revised version of the manuscript. Yet, none of the critiques seems to directly impact the overall interpretation of the data. 

      Reviewer #1 (Recommendations for the authors):

      (1) The manuscript requires further editing. For example, figure panels and legends are mismatched in Figure 1

      We have corrected the labeling of Figure 1. 

      (2) Y-axis units and values are inconsistent in Figures 4b-4g, Supplementary Figures S2e and S2f are not referenced in the text, genotypes are missing in Supplementary Figure S3f, and numerous typographical errors are present.

      With respect to the y-axis in Figure 4b-g, the scale is different for each panel. This is intentional, as the differences would be lost if all panels were scaled identically, and we have now mentioned this in the figure legend to make the reader aware of it. With respect to Supplemental Figure S2e,f, we included the panels in the description of the mutant BICC1 lines, but unfortunately forgot to reference them. This has now been done.

      We have updated the labeling of the Y-axis for the cystic indices adding “[%]” as the unit and updated the figure legend of Figure 4. We have included the genotypes in Supplementary Figure S3f. The Supplementary Figure S2e,f is now mentioned in the supplemental material (page 9, 2<sup>nd</sup> paragraph). 

      Reviewer #2 (Recommendations for the authors):

      (1) "Previous data from mouse, Xenopus, and zebrafish suggest a crucial role for the RNA-binding protein Bicc1 in the pathogenesis of PKD, although BICC1 mutations in human PKD have not been previously reported." The cited sources (and others that were not cited) link Bicc1 mutations to renal cysts, similar to a report by Kraus (PMID: 21922595) that the authors cite later. However, a more direct link to PKD was reported by Lian and colleagues using whole Pkd1 mice (PMID: 20219263) and by Gamberi and colleagues using Pkd1 kidneys and human microarrays (PMID: 28406902). Although relevant, neither is cited here, and only the former is cited later in the manuscript.

      Thanks for pointing this out. We have added these three citations.

      We have added these three citations (PMID: 21922595, PMID: 20219263 and PMID: 28406902) in the indicated sentence.

      (2) In Figure 1B, the lanes do not seem to correspond among panels, particularly evident in the panel with myc-mBicc1. Hence, it is difficult to agree with the presented conclusions.

      We have corrected the labeling of the lanes in Figure 1b.

      (3) In the Figure 1 legend: "(g) Western blot analysis following co-IP experiments, using an anti-mouse Bicc1 or anti-goat PC2 antibody as bait, identified protein interactions between endogenous PC2 and BICC1 in UCL93 cells. Non-immune goat and mouse IgG were included as a negative control." There is no mention of panel H, although this reviewer can imagine what the authors meant. The capitalization differs in the figure and legend. More troublingly, in panel G, a non-defined star indicates a strong band present in both immune and non-immune control.

      We have corrected the figure legend of Figure 1 and clarified the non-specific band in the figure legend.

      (4) In Figure 4, the authors do not show the matched control for the Bicc1 Pkd1 interaction in panel d, nor do they show a scale bar in either a) or d). Thus, the phenotypic severity cannot be properly assessed.

      Thanks for pointing out the missing scale bars, which have now been added. In respect to the two kidneys shown in Figure 4d, the two kidneys shown are from littermates to illustrate the kidney size in agreement with the cumulative data shown in Figure 4e. Unfortunately, this litter did not have a wildtype control. As the data analysis in Figure 4e is based on littermates, mixing and matching kidneys of different litters does not seem appropriate. Thus, we have omitted showing a wildtype control in this panel. However, the size of the wildtype kidney can be seen in Figure 4a.

      We have added the scale bar to both panels and have updated the figure legend to emphasize that the kidneys shown are from littermates and that no wildtype littermate was present in this litter.

      (5) "Surprisingly, an 8-fold stronger interaction was observed between full-length PC1 and myc-mBicc1-ΔKH compared to myc-mBicc1 or myc-mBicc1-ΔSAM." Assuming all the controls for protein folding and expression levels have been carried out and not shown/mentioned, this sentence seems to contradict the previous statement that Bicc1deltaSAM reduced the interaction with PC1 by 55%. Because the full length and SAM deletion have different interaction strengths, the latter sentence makes no sense.

      The reduction in PC1 binding for myc-mBicc1-ΔSAM compared to wildtype myc-mBicc1 was not significant. We have clarified this in the text.

      We have corrected the sentence and modified the Figure accordingly. 

      (6) Imprecise statements make a reader wonder how to interpret the data: "More than three independent experiments were analyzed." Stating the sample size or including it in the figure would save space and improve confidence in the data presented.

      We have stated the exact number of animals per conditions above each of the bars.

      (7) "Next, we performed a similar mouse study for Pkd1 by reducing the gene dose of Pkd1 postnatally in the collecting ducts using a Pkhd1-Cre as previously described40" What did the authors mean?

      The reference was included to cite the mouse strain, but we realize that the sentence can be misinterpreted as stating that the exact experiment has been performed previously. We have clarified this in the text.

      We have reworded the sentence to avoid misinterpretation. 

      (8) The authors examined the additive effects of knocking down Bicc1, Pkd1, and Pkd2 with morpholinos in Xenopus and, genetically, in mice. While the Bicc1[+/-] Pkd1 or 2[+/-] double heterozygote mice did not show phenotypes, the authors report that the Bicc1[-/-] Pkd1 or 2 [+/-] did instead show enlarged kidneys. What is the phenotype of a Bicc1[+/-] Pkd1 or 2 [-/-]? What we learn from the author's findings among the PKD population suggests that the latter situation would be potentially translationally relevant.

      The mouse experiments were designed to address a cooperativity between Bicc1 and either Pkd1 or Pkd2 and whether removal of one copy of Pkd1 or Pkd2 would further worsen the Bicc1 cystic kidney phenotype. Thus, the parental crosses were chosen to maximize the number of animals obtained for these genotypes. Unfortunately, these crosses did not yield the genotypes requested by the reviewer. To address the contribution of Bicc1 towards the PKD population, we will need to perform a different cross, where we eliminate Pkd1 or Pkd2 in a floxed background of Bicc1 postnatally in adult mice. While we are gearing up to perform such an experiment, this is timewise beyond the scope of the manuscript. In addition, please note that we have addressed the question about the translation towards the PKD population already in the discussion of the original submission (page 13/14, last/first paragraph).

      No changes have been made to the revised version of the manuscript.

      (9) How do the authors interpret the milder effects of the Bicc1[-/-] Pkd1[+/-] compared to Bicc1[-/-] Pkd2[+/-] relative to the respective protein-protein interactions?

      The milder effects are due to the nature of the crosses. While the Pkd2 mutant is a germline mutation, the Pkd1 mutant is a conditional allele eliminating Pkd1 only in the collecting ducts of the kidney. As such, we spare other nephron segments such as the proximal tubules, which also significantly contribute to the cyst load. As such these mouse data support the interaction between Pkd1 and Pkd2 with Bicc1, but do not allow us to directly compare the outcomes. While this was mentioned in the previous version of the manuscript, we have expanded on this in the revised version of the manuscript.

      We have expanded the results section in the revised version of the manuscript highlighting that the two different approaches cannot be directly compared.

      (10) How do the authors interpret that the strong Bicc1[Bpk] Pkd1 or Pkd2 double heterozygote mice did not have defects and "kidneys from Bicc1+/-:Pkd2+/- did not exhibit cysts (data not shown)", when the VEO PKD patients and - although not a genetic reduction - also the morpholino-treated Xenopus did?

      VEO PKD patients are characterized by a loss of function of PKD1 or PKD2 and, as we propose in this manuscript, BICC1 further aggravates the phenotype. Yet, we do not address in either the mouse or the Xenopus experiments whether BICC1 is a genetic modifier. We are simply addressing whether the two genes show a genetic interaction. In the mouse studies, we eliminate one copy of Pkd1 or Pkd2 in the background of a hypomorphic allele of Bicc1. Similarly, in the Xenopus experiments, we employ suboptimal doses of the morpholino oligomers, i.e., concentrations that did not yield a phenotypic change on their own, and then ask whether knocking down both together shows cooperativity. It is important to state that this is based on a biological readout and is not defined by the amount of protein. While we have described this already in the original manuscript (page 7, first paragraph), we have amended our description of the Xenopus experiment to make this even clearer.

      Finally, we agree with the reviewer that if we were to address whether Bicc1 is a modifier of the PKD phenotype in mouse, we would need to reduce Bicc1 function in a Pkd1 or Pkd2 mutants. Yet, we have recognized this already in the initial version of the manuscript in the discussion (page 14, first paragraph).

      We have expanded the results section when discussing the suboptimal amounts of the morpholino oligos (Page 6, 1<sup>st</sup> paragraph).

      (11) Unclear: "While variants in BICC1 are very rare, we could identify two patients with BICC1 variants harboring an additional PKD2 or PKD1 variant in trans, respectively." Shortly after, the authors state in apparent contradiction that "the patients had no other variants in any of other PKD genes or genes which phenocopy PKD including PKD1, PKD2, PKHD1, HNF1s, GANAB, IFT140, DZIP1L, CYS1, DNAJB11, ALG5, ALG8, ALG9, LRP5, NEK8, OFD1, or PMM2."

      The reviewer is correct. This should have been phrased differently. We have now added “Besides the variants reported below” to clarify this more adequately.

      The sentence was changed to start with “Besides the variants reported below, […].”

      (12) "The demonstrated interaction of BICC1, PC1, and PC2 now provides a molecular mechanism that can explain some of the phenotypic variability in these families." How do the authors reconcile this statement with their reported ultra-rare occurrence of the BICC1 mutations?

      As mentioned in the manuscript and also in response to the other two reviewers, Bicc1 has been shown to regulate Pkd2 gene expression in mice and frogs via an interaction with the miR-17 family of microRNAs. Moreover, the miR-17 family has been demonstrated to be critical in PKD (PMID: 30760828, PMID: 35965273, PMID: 31515477). In fact, both other reviewers have pointed out that we should stress this more since Bicc1 is part of this regulatory pathway. Future experiments are needed to address whether Bicc1 contributes to the variability in ADPKD onset/severity. Yet, this is beyond the scope of this study.

      Based on the comments of the two other reviewers we have further addressed the Bicc1/miR-17 interaction.

      (13) The manuscript should use correct genetic conventions of italicization and capitalization. This is an issue affecting the entire manuscript. Some exemplary instances are listed below.

      (a) "We also demonstrate that Pkd1 and Pkd2 modifies the cystic phenotype in Bicc1 mice in a dose-dependent manner and that Bicc1 functionally interacts with Pkd1, Pkd2 and Pkhd1 in the pronephros of Xenopus embryos." Genes? Proteins?

      The data presented in this section show that a hypomorphic allele of Bicc1 in mouse and a knockdown in Xenopus yields this. As both affect the proteins, the spelling should reflect the proteins.

      No changes have been made in the revised manuscript.

      (b) The sentence seems to use both the human and mouse genetic capitalization, although it refers to experiments in the mouse system “to define the Bicc1 interacting domains for PC2 (Fig. 2d,e). Full-length PC2 (PC2-HA) interacted with full-length myc-mBICC1.”

      We agree with the reviewer that stating the species of the molecules used is critical. We have adopted a spelling of Bicc1 in which BICC1 is the human homologue, mBicc1 is the mouse homologue and xBicc1 is the Xenopus one.

      We have highlighted the species spelling in the methods section and labeled the species accordingly throughout the manuscript and figures. 

      (14) “Together these data supported our biochemical interaction data and demonstrated that BICC1 cooperated with PKD1 and PKD2.” Are the authors implying that these results in mice will translate to the human protein?

      We agree that we have not formally shown that the same applies to the human proteins. Thus, we have changed the spelling accordingly.

      We have revised the capitalization of the proteins. 

      (15) The text is often unclear, terse, or inconsistent.

      (a) “These results suggested that the interaction between PC1 and Bicc1 involves the SAM but not the KH/KHL domains (or the first 132 amino acids of Bicc1). It also suggests that the N-terminus could have an inhibitory effect on PC1-BICC1 association.” How do the authors define the N-terminus? The first 132 aa? KH/KHL domains?

      This was illustrated in the original Figure 2A. The ΔKH constructs lack the first 351 amino acids.

      To make this more evident, we have specified this in the text as well.

      (b) Similarly, the authors state below, "Unlike PC1, PC2 interacted with myc-mBICC1-ΔSAM, but not myc-mBICC1-ΔKH suggesting that PC2 binding is dependent on the N-terminal domains but not the SAM domain." It is unclear if the authors refer to the KH/KHL domains or others. Whatever the reference to the N-terminal region, it should also be consistent with the section above.

      This is now specified in the text.

      (c) Unclear: "We have previously demonstrated that Pkd2 levels are reduced in a complete Bicc1 null mice,22 performing qRT-PCR of P4 kidneys (i.e. before the onset of a strong cystic phenotype), revealed that Bicc1, Pkd1 and Pkd2 were statistically significantly down-regulated (Fig. 4h-j)".

      We have changed the text to clarify this. 

      (d) “Utilizing recombinant GST domains of PC1 and PC2, we demonstrated that BICC1 binds to both proteins in GST-pulldown assays (Fig. 1a, b)." GST-tagged domains? Fusions?

      We have changed the text to clarify this. 

      (e) "To study the interaction between BICC1, PKD1 and PKD2 we combined biochemical approaches, knockout studies in mice and Xenopus, genetic engineered human kidney cells" > genetically engineered.

      We have changed the text to clarify this.

      (f) Capitalization (e.g., see Figure S3, ref. the Bpk allele) and annotation (e.g., Gly821Glu and G821E) are inconsistent.

      We have homogenized the labeling of the capitalization and annotations throughout the manuscript. 

      (g) What do the authors mean by "homozygous evolutionarily well-conserved missense variant"?

      We have changed this in the revised version of the manuscript.

      Reviewer #3 (Public review/Recommendations to the authors):

      (1) A further study in HUREC cells investigating the critical regulatory role of BICC1 and potential interaction with mir-17 may yet lead to a modifiable therapeutic target.

      (2) This study should ideally include experiments in HUREC material obtained from patients/families with BICC1 mutations and studying its effects on the PKD1/2 complex in primary cell lines.

      This is an excellent suggestion. We agree with the reviewer that it would have been interesting to analyze HUREC material from the affected patients. Unfortunately, besides DNA and the phenotypic analysis described in the manuscript, neither human tissue nor primary patient-derived cells were collected once the two patients with the BICC1 p.Ser240Pro variant passed away.

      No changes to the revised manuscript have been made to address this point.

      (3) Please remove repeated words in the following sentence in paragraph 2 of the introduction: "BICC1 encodes an evolutionarily conserved protein that is characterized by 3 K-homology (KH) and 2 KH-like (KHL) RNA-binding domains at the N-terminus and a SAM domain at the C-terminus, which are separated by a by a disordered intervening sequence (IVS).23-28".

      This has been changed.

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public Review):

      Summary:

      In this study, the authors identified and described the transcriptional trajectories leading to CMs during early mouse development, and characterized the epigenetic landscapes that underlie early mesodermal lineage specification.

      The authors identified two transcriptomic trajectories from a mesodermal population to cardiomyocytes, the MJH and PSH trajectories. These trajectories are relevant to the current model for the First Heart Field (FHF) and the Second Heart Field (SHF) differentiation. Then, the authors characterized both gene expression and enhancer activity of the MJH and PSH trajectories, using a multiomics analysis. They highlighted the role of Gata4, Hand1, Foxf1, and Tead4 in the specification of the MJH trajectory. Finally, they performed a focused analysis of the role of Hand1 and Foxf1 in the MJH trajectory, showing their mutual regulation and their requirement for cardiac lineage specification.

      Strengths:

      The authors performed an extensive transcriptional and epigenetic analysis of early cardiac lineage specification and differentiation which will be of interest to investigators in the field of cardiac development and congenital heart disease. The authors considered the impact of the loss of Hand1 and Foxf1 in-vitro and Hand1 in-vivo.

      Weaknesses:

      The authors used previously published scRNA-seq data to generate two described transcriptomic trajectories.

      We agree that a two-route cardiac development model has been described, which is consistent with our analyses. However, the developmental origins and key events of early lineage specification remain unclear. Our study provides new insights in the following respects:

      a) Computational analyses inferred the earliest cardiac fate segregation by E6.75-7.0.

      b) Provided newly generated E7.0 multi-omics data, which revealed the transcriptomic and chromatin accessibility landscape.

      c) Utilized multi-omics and ChIP-seq data to construct a core regulatory network underlying the JCF lineage specification.

      d) Applied in vitro and in vivo analyses, which elucidated the synergistic and different roles of key transcription factors, HAND1 and FOXF1.

      Q1R1: Details of the re-analysis step should be added, including a careful characterization of the different clusters and marker genes, more details on the WOT analysis, and details on the time stamp distribution along the different pseudotimes. These details would be important to allow readers to gain confidence that the two major trajectories identified are realistic interpretations of the input data.

      R1R1: Thank you for the valuable suggestion. In the previous version, we characterized the two major trajectories by identifying their common or specific gene sets, and by profiling the expression dynamics along pseudotime (Figure 1F). However, we realized that a careful description was not provided. In the revised manuscript, we have made the following improvements:

      a) Provided marker gene analyses based on cell types as well as developmental lineages to support the E7.0 progenitor clusters (Figure S1F).

      b) For Figure 1F: revised the text and introduced characteristic genes for the two trajectories.

      c) For WOT analysis: provided more details in the first paragraph of the ‘Results’ section.

      Q2R1: The authors have also renamed the cardiac trajectories/lineages, departing from the convention applied in hundreds of papers, making the interpretation of their results challenging.

      R2R1: Agreed. We have renamed the MJH trajectory as the JCF lineage and the PSH trajectory as the SHF lineage.

      Q3R1: The concept of "reverse reasoning" applied to the Waddington-OT package for directional mass transfer is not adequately explained. While the authors correctly acknowledged Waddington-OT's ability to model cell transitions from ancestors to descendants (using optimal transport theory), the justification for using a "reverse reasoning" approach is missing. Clarifying the rationale behind this strategy would be beneficial.

      R3R1: Thank you for pointing out the unclear explanation. As mentioned in R1R1, we have clarified the rationale in the revised manuscript. 

      We would like to provide some additional details: WOT is designed for time-series scRNA-seq data where the time/stage of each single cell is given. For any adjacent time points t<sub>i</sub> and t<sub>i+1</sub>, WOT estimates the transition probability from all cells at t<sub>i</sub> to all cells at t<sub>i+1</sub>. One can select a cell set of interest at any time point t<sub>i</sub> and infer its ancestors at t<sub>i-1</sub> or its descendants at t<sub>i+1</sub> by summing the transition probabilities. As introduced in the original paper, WOT allows for both ‘forward’ and ‘reverse’ inference (DOI: 10.1016/j.cell.2019.01.006).
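
      The forward and reverse inference described above can be illustrated with a toy sketch. It assumes a hypothetical 3 x 3 transport matrix between cells at t<sub>i</sub> (rows) and cells at t<sub>i+1</sub> (columns); the function names and values are illustrative only and are neither the WOT package API nor the analysis code used in the manuscript.

```python
# Toy illustration of 'forward' and 'reverse' inference through a transport
# matrix. transport[i][j] is the (hypothetical) mass moved from cell i at
# time t_i to cell j at time t_{i+1}.

def push_forward(transport, source_weights):
    """Descendant distribution at t_{i+1} for a weighted cell set at t_i."""
    n_cols = len(transport[0])
    mass = [
        sum(source_weights[i] * transport[i][j] for i in range(len(transport)))
        for j in range(n_cols)
    ]
    total = sum(mass)
    return [m / total for m in mass]


def pull_back(transport, target_weights):
    """Ancestor distribution at t_i for a weighted cell set at t_{i+1}."""
    mass = [
        sum(row[j] * target_weights[j] for j in range(len(row)))
        for row in transport
    ]
    total = sum(mass)
    return [m / total for m in mass]


# Hypothetical values chosen only for illustration.
transport = [
    [0.30, 0.10, 0.00],
    [0.05, 0.25, 0.10],
    [0.00, 0.05, 0.15],
]

# Reverse inference: which cells at t_i are likely ancestors of cell 2 at t_{i+1}?
ancestors = pull_back(transport, [0.0, 0.0, 1.0])

# Forward inference: where does cell 0 at t_i go at t_{i+1}?
descendants = push_forward(transport, [1.0, 0.0, 0.0])
```

      Chaining `pull_back` across successive time points gives the kind of computational reverse lineage tracing referred to here, under the stated toy assumptions.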

      Q3R1: As the authors used the EEM cell cluster as a starting point to build the MJH trajectory, it's unclear whether this trajectory truly represents the cardiac differentiation trajectory of the FHF progenitors:

      - This strategy infers that the FHF progenitors are mixed in the same cluster as the extra-embryonic mesoderm, but no specific characterization of potential different cell populations included in this cluster was performed to confirm this.

      To build the MJH trajectory, we performed a two-step analysis:

      (1) Firstly, we used E8.5 CM cells as a starting point to perform WOT computational reverse lineage tracing and identify CM progenitors at each time point.

      (2) Secondly, we selected EEM cells from the E7.5 CM progenitor pool as a starting point to perform WOT analysis. Cells along this trajectory constitute the JCF lineage (Figure 1B).

      We chose this subset of E7.5 EEM cells because of its purity. It is distinct from the SHF lineage, as suggested by their separation in the UMAP. It is also different from FHF cells, as no FHF/CM markers were detected by E7.5.

      We acknowledge that 100% purity is infeasible in this single-cell omics analysis, but we believe the current strategy of defining the JCF lineage is reasonable. The distinct gene expression dynamics (Figure 1F) and spatial mapping results (Figure 1C) of the JCF and SHF lineages also support our conclusion.

      - The authors identified the EEM cluster as a Juxta-cardiac field, without showing the expression of the principal marker Mab21l2 per cluster and/or on UMAPs.

      Thank you for your suggestion. We have added Mab21l2 expression plots in the ICA layout (new Figure S1D), showing its transient expression dynamics, consistent with Tyser et al (DOI: 10.1126/science.abb2986).

      - As the FHF progenitors arise earlier than the Juxta-cardiac field cells, it must be possible to identify an early FHF progenitor population (Nkx2-5+; Mab21l2-) using the time stamp. It would be more accurate to use this FHF cluster as a starting point than the EEM cluster to infer the FHF cardiac differentiation trajectory.

      We appreciate your insights. We used the early FHF progenitor population (E7.75 Nkx2-5+; Mab21l2- CM cells) as the starting point and identified its progenitor cells by E7.0 (Figure S2A). Results suggest both JCF and SHF lineages contribute to the early FHF progenitor population, consistent with live imaging-based single cell tracing by Dominguez et al (DOI: 10.1016/j.cell.2023.01.001).

      These concerns call into question the overall veracity of the trajectory analysis, and in fact, the discrepancies with prior published heart field trajectories are noted but the authors fail to validate their new interpretation. Because their trajectories are followed for the remainder of the paper, many of the interpretations and claims in the paper may be misleading. For example, these trajectories are used subsequently for annotation of the multiomic data, but any errors in the initial trajectories could result in errors in multiomic annotation, etc, etc.

      Thank you for your valuable comments. In the revised manuscript, we have added details about the trajectory analysis including the procedure of WOT lineage inference, marker gene expression and early FHF lineage tracing. We also renamed the two trajectories to avoid confusion with prior published heart field trajectories. Generally, our trajectories are consistent with the published evidence about two major lineages contributing to the linear heart tube:

      a) Clonal analysis: two trajectories exist which demonstrate differential contribution to the E8.5 cardiac tube (Meilhac et al, DOI: 10.1016/s1534-5807(04)00133-9).

      b) Live imaging: JCF cells contribute to the forming heart (Tyser et al, DOI: 10.1126/science.abb2986; Dominguez et al, DOI: 10.1016/j.cell.2023.01.001).

      c) Genetic labelling based lineage tracing: early Hand1+ mesodermal cells differentiate and contribute to the cardiac crescent (Zhang et al, DOI: 10.1161/CIRCRESAHA.121.318943).

      The molecular events at the initial segregation of the two lineages, which are the main focus of our paper, had not been characterized before. Our analyses suggest that the JCF lineage segregates earlier from the nascent/mixed mesoderm state, which is also consistent with the clonal analysis (Meilhac et al, DOI: 10.1016/s1534-5807(04)00133-9).

      Q4R1: As mentioned in the discussion, the authors identified the MJH and PSH trajectories as nonoverlapping. But, the authors did not discuss major previously published data showing that both FHF and SHF arise from a common transcriptomic progenitor state in the primitive streak (DOI: 10.1126/science.aao4174; DOI: 10.1007/s11886-022-01681-w). The authors should consider and discuss the specifics of why they obtained two completely separate trajectories from the beginning, how these observations conflict with prior published work, and what efforts they have made at validation.

      R4R1: Thank you for the important question. For trajectory analysis, we assigned each cell to the trajectory with the higher fate probability, resulting in ‘non-overlapping’ cell sets. However, the statement of ‘two non-overlapping trajectories’ is inaccurate. We performed an analysis of fate divergence between the two trajectories (which was not shown in the first version), which suggests that, before E7.0, mesodermal cells have similar probabilities of choosing either trajectory (Figure S1E). We agree with you and with previously published data that the JCF and SHF arise from a common progenitor pool. A correction has been made in the revised manuscript.
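
      To illustrate why hard assignment yields non-overlapping sets even when fates are not yet resolved, consider the following minimal sketch. The cell names and fate probabilities are hypothetical, and the absolute difference used as a margin is an assumed stand-in, not the fate divergence measure used in the manuscript.

```python
# Hard assignment by higher fate probability versus the underlying margin.
# All cell names and probabilities below are hypothetical.

def assign_trajectory(fate_probs):
    """Assign each cell to the trajectory with the higher fate probability."""
    return {cell: max(probs, key=probs.get) for cell, probs in fate_probs.items()}


def fate_margin(fate_probs):
    """Absolute difference between the two fate probabilities per cell."""
    return {
        cell: abs(probs["JCF"] - probs["SHF"]) for cell, probs in fate_probs.items()
    }


fate_probs = {
    "early_cell": {"JCF": 0.51, "SHF": 0.49},  # nearly undecided
    "late_cell_a": {"JCF": 0.95, "SHF": 0.05},  # clearly committed
    "late_cell_b": {"JCF": 0.10, "SHF": 0.90},  # clearly committed
}

labels = assign_trajectory(fate_probs)
margins = fate_margin(fate_probs)
```

      In this toy case `early_cell` receives a JCF label although its margin is only about 0.02, which is why the labels alone should not be read as evidence of two separate trajectories from the outset.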

      Q5R1: Figures 1D and E are confusing, as it's unclear why the authors selected only cells at E7.0. Also, panels 1D 'Trajectory' and 'Pseudotime' suggest that the CM trajectory moves from the PSH cells to the MJH. This result is confusing, and the authors should explain this observation.

      R5R1: Thank you for pointing out the confusion. As mentioned in R4R1, trajectory analysis indicates JCF/SHF fate segregation by E7.0, and we used Figures 1D and E to characterize the cellular state. By E7.0, JCF progenitors are at the EEM or MM state, while SHF progenitors are still at the earlier differentiation stage (NM). This result is consistent with previous clonal analysis (Meilhac et al, DOI: 10.1016/s1534-5807(04)00133-9), which demonstrates an apparent earlier segregation of the first lineage. Our interpretation of the pseudotime analysis is that it represents different levels of differentiation rather than developmental direction.

      Q6R1: Regarding the PSH trajectory, it's unclear how the authors can obtain a full cardiac differentiation trajectory from the SHF progenitors as the SHF-derived cardiomyocytes are just starting to invade the heart tube at E8.5 (DOI: 10.7554/eLife.30668).

      R6R1.1: We agree with your opinion. Our trajectory analysis covers E8.5 SHF-derived CM cells and progenitors. Cells that differentiate as CM cells after E8.5 were missed.

      The above notes some of the discrepancies between the author's trajectory analysis and the historical cardiac development literature. Overall, the discrepancies between the author's trajectory analysis and the historical cardiac development literature are glossed over and not adequately validated.

      R6R1.2: The historical cardiac development literature provides evidence, obtained with multiple techniques, that supports the existence of two cardiac lineages with common progenitors at the beginning and overlapping contributions to the four-chamber heart. Our trajectory analysis is in agreement with this model and provides more detailed molecular insights about lineage segregation by E7.0. Thank you for pointing out our mistakes in describing the observations. We have corrected the text and provided additional data (Figure S1D-F and S2), aiming to resolve the confusion.

      Q7R1: The authors mention analyzing "activated/inhibited genes" from Peng et al. 2019 but didn't specify when Peng's data was collected. Is it temporally relevant to the current study? How can "later stage" pathway enrichment be interpreted in the context of early-stage gene expression?

      R7R1: The gene sets of "activated/inhibited genes" were collected from several published perturbation datasets (Gene Expression Omnibus accession numbers GSE48092, GSE41260, GSE17879, GSE69669, GSE15268 and GSE31544) using mouse ES cells or embryos. For a specific pathway, the gene set is fixed but the gene expression levels, which change over time, reflect the pathway enrichment. This explains the differential pathway enrichment between early and late stages.

      Q8R1: Motif enrichment: cluster-specific DAEs were analyzed for motifs, but the authors list specific TFs rather than TF families, which is all that motif enrichment can provide. The authors should either list TF families or state clearly that the specific TFs they list were not validated beyond motifs.

      R8R1: Thank you for your comment. For the DAE motif analysis, we first inferred the motif and TF families, then tested which specific TFs are expressed in the corresponding cell cluster. We have added this information to the legend of Figure 2D.
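
      The two-step logic (family-level motif enrichment, then filtering family members by expression) can be sketched as follows. The expression values, the threshold and the function name are hypothetical and serve only to show the filtering step; they do not reproduce the actual pipeline or its parameters.

```python
# Sketch: motif enrichment points to TF families; specific TFs are then kept
# only if they are expressed in the corresponding cell cluster.
# Expression values and the threshold are hypothetical.

def expressed_tfs(enriched_families, family_members, cluster_expression, min_expr):
    """Return, per enriched family, the member TFs expressed in the cluster."""
    result = {}
    for family in enriched_families:
        kept = [
            tf
            for tf in family_members.get(family, [])
            if cluster_expression.get(tf, 0.0) >= min_expr
        ]
        if kept:
            result[family] = kept
    return result


family_members = {
    "GATA": ["Gata1", "Gata4", "Gata6"],
    "TEA": ["Tead1", "Tead4"],
    "Forkhead": ["Foxa2", "Foxf1"],
}

# Hypothetical mean expression of each TF in one cell cluster.
cluster_expression = {
    "Gata1": 0.0,
    "Gata4": 2.3,
    "Gata6": 1.1,
    "Tead1": 0.2,
    "Tead4": 1.8,
    "Foxa2": 0.1,
    "Foxf1": 2.9,
}

candidates = expressed_tfs(
    ["GATA", "TEA", "Forkhead"], family_members, cluster_expression, min_expr=1.0
)
```

      TFs retained this way remain candidates suggested by motif and expression evidence; the filter itself provides no functional validation.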

      Q9R1: The core regulatory network is purely predictive. The authors again should refrain from language implying that the TFs in the CRN have any validated role.

      R9R1: Thank you for your kind suggestion. We have revised the manuscript to avoid any misleading implications, as follows:

      “Through single-cell multi-omics analysis, a predicted core regulatory network (CRN) in JCF is identified, consisting of transcription factors (TFs) GATA4, TEAD4, HAND1 and FOXF1.”

      Q10R1: Regarding the in vivo analysis of Hand1 CKO embryos, Figures 6 and 7:

      How can the authors explain the presence of a heart tube in the E9.5 Hand1 CKO embryos (Figure 6B) if, following the authors' model, the FHF/Juxta-cardiac field trajectory is disrupted by Hand1 CKO? A more detailed analysis of the cardiac phenotype of Hand1 CKO embryos would help to assess this question.

      R10R1: Thank you for your valuable suggestion. In the revised manuscript, we have added a detailed analysis of the cardiac phenotype of Hand1 CKO embryos (Figure S8C). The data suggest that by E8.5, when heart looping initiates in the control group (14/17), the hearts of Hand1 CKO embryos (3/3) still show a linear tube morphology. By E9.5, when the atrium and ventricle become distinct in WT embryos, heart looping of Hand1 CKO embryos is abnormal. The cardiac defects of our Mesp1-Cre-driven Hand1 conditional KO are consistent with those of Hand1-null mutant mice (DOI: 10.1038/ng0398-266; DOI: 10.1038/ng0398-271).

      Author response image 1.

      Bright-field images of E8.5-E9.5 Ctrl and Hand1 CKO mouse embryos. Arrows indicate the embryonic heart (h) and head folds (hf). Scale bars (E8.5): 200 μm; scale bars (E9.5): 500 μm.

      Q11R1: The cell proportion differences observed between Ctrl and Hand1 CKO in Figure 6D need to be replicated and an appropriate statistical analysis must be performed to definitely conclude the impact of Hand1 CKO on cell proportions.

      R11R1: We appreciate your valuable suggestion. As Figure 6D is based on an scRNA-seq experiment in which replicates were merged into a single sequencing library, statistical analysis is infeasible. To address potential concerns about cell proportions, we added IF staining experiments for the EEM marker gene Vim in serial embryo sections (Figure S8D). Statistical analysis indicates a significant decrease in the proportion of VIM+ EEM cells in Hand1 CKO embryos.

      Q12R1: The in-vitro cell differentiations are unlikely to recapitulate the complexity of the heart fields in vivo, but they are analyzed and interpreted as if they do.

      R12R1: We agree with your opinion. In the revised manuscript, we toned down the interpretation of the in vitro cell differentiation data.

      Previous version:

      I.  “The analysis indicated that HAND1 and FOXF1 could dually regulate MJH specification through directly activating the MJH specific genes and inhibiting the PSH specific genes.”

      II. “Together, our data indicated that mutual regulation between HAND1 and FOXF1 could play a key role in MJH cardiac progenitor specification.”

      III. “Thus, our data further supported the specific and synergistic roles of HAND1 and FOXF1 in MJH cardiac progenitor specification.”

      Revised version:

      I.  “The analysis indicated that HAND1 and FOXF1 were able to directly activate the JCF specific genes.”

      II. “Together, our in vitro experimental data indicated that mutual regulation between HAND1 and FOXF1 could play a key role in activation of JCF specific genes.”

      III. “These results suggest that HAND1 and FOXF1 may cooperatively regulate early cardiac lineage specification by promoting JCF-associated gene expression and suppressing alternative mesodermal programs.”

      Q13R1: The schematic summary of Figure 7F is confusing and should be adjusted based on the following considerations:

      (a) the 'Wild-type' side presents 3 main trajectories (SHF, Early HT and JCF), but uses a 2-color code and the authors described only two trajectories everywhere else in the article (aka MJH and PSH). It's unclear how the SHF trajectory (blue line) can contribute to the Early HT, when the Early HT is supposed to be FHF-associated only (DOI: 10.7554/eLife.30668). As mentioned previously in Major comment 3., this model suggests a distinction between FHF and JCF trajectories, which is not investigated in the article.

      R13R1(a): Thank you for your great insights. The paper you mentioned used Nkx2.5cre/+; Rosa26tdtomato+/- and Nkx2.5eGFP embryos to reconstruct the cardiac morphologies between E7.5 and E8.2. Their 3D models clearly demonstrate the transition from yolk sac to FHF and then SHF (Figure 2A’ and A’’). The location of the yolk sac is defined as JCF in later literature (DOI: 10.1126/science.abb2986). However, as Nkx2.5 mainly marks cells after their entry into the heart tube, it is unable to reflect the lineage contribution by JCF or SHF. As in R3R1, a growing body of evidence supports the contribution of both lineages to the Early HT, which is discussed in a recent review paper (DOI: 10.1016/j.devcel.2023.01.010).

      (b) the color code suggests that the MJH (FHF-related) trajectory will give rise to the right ventricle and outflow tract (green line), which is contrary to current knowledge.

      R13R1(b): Thank you for pointing out the confusion. The coloring of outflow tract is not an indication of JCF lineage contribution. We have changed the color of JCF/SHF trajectory in the revised model.

      Minor comments:

      Q14R1: How genes were selected to generate Figure 1F? Is this a list of top differentially expressed genes over each pseudotime and/or between pseudotimes?

      R14R1: For each trajectory, we ranked genes by the correlation between expression levels and pseudotime. The top 1000 genes for each group were selected.
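
      A minimal sketch of this kind of ranking is given below. It assumes Pearson correlation and ranking by absolute value, neither of which is specified above; the gene names and expression values are hypothetical.

```python
# Rank genes by the correlation between expression and pseudotime, then keep
# the top n. Pearson correlation and ranking by absolute value are
# assumptions made for this illustration.
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient; returns 0.0 for constant input."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def top_genes(expression, pseudotime, n_top):
    """Return the n_top genes with the largest |correlation| to pseudotime."""
    scores = {gene: pearson(values, pseudotime) for gene, values in expression.items()}
    return sorted(scores, key=lambda gene: abs(scores[gene]), reverse=True)[:n_top]


# Hypothetical cells ordered along one trajectory.
pseudotime = [0.0, 0.25, 0.5, 0.75, 1.0]
expression = {
    "gene_up": [0.1, 0.9, 2.1, 3.0, 4.2],
    "gene_down": [5.0, 4.0, 3.0, 2.0, 1.0],
    "gene_flat": [1.0, 1.2, 0.9, 1.1, 1.0],
}

selected = top_genes(expression, pseudotime, n_top=2)
```

      With `n_top=1000` and a full expression matrix, the same function would return a per-trajectory gene list of the size described above.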

      Q15R1: Regarding Figure 1G, it's unclear how inhibited signaling can have an increased expression of underlying genes over pseudotimes. Can the authors give more details about this analysis and results?

      R15R1: The increased expression of ‘inhibited genes’ could be explained as an indication of decreasing signaling levels or of a compensation effect by other signaling pathways. We appreciate your kind suggestion. Details about this analysis have been added to the Methods section.

      Q16R1: How do the authors explain the visible Hand1 expression in Hand1 CKO in Figure S7C 'EEM markers'? Is this an expected expression in terms of RNA which is not converted into proteins?

      R16R1: We attribute the visible Hand1 expression to the imperfect knockout efficiency of the Mesp1-Cre-driven system.

      Q17R1: The authors do not address the potential presence of doublets (merged cells) within their newly generated dataset. While they mention using "SCTransform" for normalization and artifact removal, it's unclear if doublet removal was explicitly performed.

      R17R1: We appreciate your kind reminder. Doublet removal was performed using R package ‘DoubletFinder’ (DOI: 10.1016/j.cels.2019.03.003). We have added this information in the revised manuscript.

      Reviewer #2 (Public review):

      Summary of goals:

      The aims of the study were to identify new lineage trajectories for the cardiac lineages of the heart, and to use computational and cell and animal studies to identify and validate new gene regulatory mechanisms involved in these trajectories.

      Strengths:

      The study addresses the long-standing yet still not fully answered questions of what drives the earliest specification mechanisms of the heart lineages. The introduction demonstrates a good understanding of the relevant lineage trajectories that have been previously established, and the significance of the work is well described. The study takes advantage of several recently published data sets and attempts to use these in combination to uncover any new mechanisms underlying early mesoderm/cardiac specification mechanisms. A strength of the study is the use of an in vitro model system (mESCs) to assess the functional relevance of the key players identified in the computational analysis, including innovative technology such as CRISPR-guided enhancer modulations. Lastly, the study generates mesoderm-specific Hand1 LOF embryos and assesses the differentiation trajectories in these animals, which represents a strong complementary approach to the in vitro and computational analysis earlier in the paper. The manuscript is clearly written and the methods section is detailed and comprehensive.

      Comments and Weaknesses:

      Overall: The computational analysis presented here integrates a large number of published data sets with one new data point (E7.0 single cell ATAC and RNA sequencing). This represents an elegant approach to identifying new information using available data. However, the data presentation at times becomes rather confusing, and relatively strong statements and conclusions are made based on trajectory analysis or other inferred mechanisms while jumping from one data set to another. The cell and in vivo work on Hand1 and Foxf1 is an important part of the study. Some additional experiments in both of these model systems could strongly support the novel aspects that were identified by the computational studies leading into the work.

      We appreciate your positive comments and insightful suggestions. In the revised manuscript, we have incorporated additional analyses and experimental validations to address the concerns raised. Specifically, we added RNA velocity analysis to independently support the identification of the MJH and PSH trajectories, performed immunofluorescence staining of mesodermal and cardiac markers in Hand1 and Foxf1 knockout models, and included Vim staining-based quantification in Hand1 CKO embryos to assess developmental outcomes in vivo. Furthermore, we revised potentially overinterpreted conclusions and clarified methodological details of the WOT analysis. These revisions have strengthened both the rigor and clarity of the manuscript.

      Q1R2: Definition of MJH and PSH trajectory:

      The study uses previously published data sets to identify two main new differentiation trajectories: the MJH and the PSH trajectory (Figure 1). A large majority of subsequent conclusions are based on in-depth analysis of these two trajectories. For this reason, the method used to identify these trajectories (WOT, which seems a highly biased analysis with many manually chosen set points) should be supported by other commonly used methods such as for example RNA velocity analysis. This would inspire some additional confidence that the MJH and PSH trajectories were chosen as unbiased and rigorous as possible and that any follow-up analysis is biologically relevant.

      R1R2: We appreciate your valuable comments. We fully agree that other commonly used methods would help strengthen our conclusion about the two main trajectories. To this end, we performed RNA velocity analysis for the cardiac specification. The results support the contribution to CM along the MJH and PSH routes.

      Author response image 2.

      UMAP layout is colored by cell types. Developmental directions, shown as arrows, are inferred by RNA-velocity analysis.

      Actually, several recent studies indicated a convergence cardiac developing model where progenitors reach a myocardial state along two trajectories (DOI: 10.1016/j.devcel.2023.01.010). However, when and how specification between the two routes were unclear. Our data and analysis revealed a clear fate separation by E7.0 from transcriptomic and epigenetic perspectives, where unbiased RNA velocity analysis was performed (Figure 2C).

      We would like to clarify how we performed the WOT analysis (DOI: 10.1016/j.cell.2019.01.006): the only manually chosen cell set was the starting set for the computational reverse lineage tracing, which comprised all cardiomyocytes at E8.5. The ancestor cells were predicted in an unbiased manner among all mesodermal cells.
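
      The reverse tracing described above can be illustrated with a minimal sketch. It is not the WOT package API and not the authors' pipeline: the function names, the toy coupling matrices, and the plain normalization are assumptions made for illustration only. The idea shown is that a chosen end-point set is pulled back through time-point-to-time-point transport couplings to give ancestor weights over earlier cells.

```python
# Illustrative sketch only: this is NOT the WOT package API.
# coupling[i][j] holds the mass transported from early cell i to late cell j.


def pull_back(coupling, late_weights):
    """Return normalized ancestor weights over early cells for a weighted late-cell set."""
    scores = [
        sum(t_ij * w_j for t_ij, w_j in zip(row, late_weights))
        for row in coupling
    ]
    total = sum(scores)
    if total == 0:
        raise ValueError("the late cell set receives no mass from any early cell")
    return [s / total for s in scores]


def trace_ancestors(couplings, end_weights):
    """Chain pull-backs from the last time point back to the first.

    couplings[k] maps time point k to time point k + 1.
    Returns one weight vector per time point, earliest first.
    """
    weights = [end_weights]
    for coupling in reversed(couplings):
        weights.append(pull_back(coupling, weights[-1]))
    return list(reversed(weights))


# Toy data (hypothetical): 3 cells at t0, 2 cells at t1, 2 cells at t2.
coupling_t0_t1 = [[0.30, 0.00], [0.10, 0.20], [0.00, 0.40]]
coupling_t1_t2 = [[0.40, 0.10], [0.00, 0.50]]
# The only manual choice: the end-point set (here, the first t2 cell).
end_point_set = [1.0, 0.0]

ancestors = trace_ancestors([coupling_t0_t1, coupling_t1_t2], end_point_set)
```

      In this toy case only the end-point set is specified by hand; the weights over earlier cells follow from the couplings alone.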

      Q2R2.1: Identification of MJH and PSH trajectory progenitors:

      The study defines various mesoderm populations from the published data set (Figure 1A-E), including nascent mesoderm, mixed mesoderm, and extraembryonic mesoderm. It further assigns these mesoderm populations to the newly identified MJH/PSH trajectories. Based on the trajectory definition in Figure 1A it appears that both trajectories include all 3 mesoderm populations, albeit at different proportions and it seems thus challenging to assign these as unique progenitor populations for a distinct trajectory, as is done in the epigenetic study by comparing clusters 8 (MJH) and 2 (PSH)(Figure 2). 

      R2R2.1: According to our model, the most significant difference between the two trajectories is their enrichment of EEM and PM cell types (Figure 1B), which represent the middle stages of cardiac development. Both trajectories begin as Mesp1+ nascent mesoderm cells (Figure 1F), which is supported by Mesp1 lineage tracing (DOI: 10.1161/CIRCRESAHA.121.318943), and end as cardiomyocytes. Our epigenetic analysis focused on the E7.0 stage, when the two trajectories could be clearly separated and when the JCF and SHF lineages were at the mixed mesoderm and nascent mesoderm states, respectively. However, the SHF lineage was predicted to bypass the mixed mesoderm state later on.

      Q2R2.2: Along similar lines, the epigenetic analysis of clusters 2 and 8 did not reveal any distinct differences in H3K4m1, H3K27ac, or H3K4me3 at any of the time points analyzed (Figure 2F). While conceptually very interesting, the data presented do not seem to identify any distinct temporal patterns or differences in clones 2 and 8 (Figure 2H), and thus don't support the conclusion as stated: "the combined transcriptome and chromatin accessibility analysis further supported the early lineage segregation of MJH and the epigenetic priming at gastrulation stage for early cardiac genes".

      R2R2.2: In the epigenetic analysis, we delineated the temporal dynamics of E7.0 cluster-specific DAEs by selecting earlier (E6.5) and later (E7.5) time points. DAEs of C8 and C2 represent regulatory elements for the JCF and SHF lineages, respectively. We also included C1 DAEs as a reference to demonstrate the relative activity of C8 and C2. The overall temporal pattern suggests activation of C8 & C2, as their H3K4me1 and H3K27ac levels surpass C1 over time. Between C8 and C2, the following distinctions could be observed:

      a) H3K4me1 levels of C8 are higher by E6.5 and E7.0, with low H3K27ac levels, indicating early priming of C8 DAEs.

      b) By E7.5, H3K4me1 levels of C2 catch up with those of C8 in the anterior mesoderm (E7.5_AM, Figure 2F column 3), where cardiac mesoderm is located.

      c) H3K4me1 and H3K27ac levels of C8 are similar to those of C1 in the posterior mesoderm (E7.5_P, Figure 2F column 4) and much higher than those of C2.

      d) From the perspective of chromatin accessibility, hundreds of characteristic DAEs were identified for C2 and C8 (Figure 2D), exemplified by the primed and active enhancers which were predicted to interact with cluster-specific genes (Figure 2H).

      Together with the transcriptomic analyses (Figure 2C), these data are consistent with our conclusion about early lineage segregation and epigenetic priming.

      Q3R2: Function of Hand1 and Foxf1 during early cardiac differentiation:

      The study incorporated some functional studies by generating Hand1 and Foxf1 KO mESCs and differentiated them into mesoderm cells for RNA sequencing. These lines would present relevant tools to assess the role of Hand1 and Foxf1 in mesoderm formation, and a number of experiments would further support the conclusions, which are made for the most part on transcriptional analysis. For example, the study would benefit from quantification of mesoderm cells and subsequent cardiomyocytes during differentiation (via IF, or more quantitatively, via flow cytometry analysis). These data would help interpret any of the findings in the bulk RNAseq data, and help to assess the function of Hand1 and Foxf1 in generating the cardiac lineages. Conclusions such as "the analysis indicated that HAND1 and FOXF1 could dually regulate MJH specification through directly activating the MJH specific genes and inhibiting PSH specific genes" seem rather strong given the data currently provided.

      R3R2: Thank you for your kind suggestions. We added IF staining of mesodermal (Zic3), JCF (Hand1) and cardiac markers (Tnnt2), followed by cell quantification. Results indicate that Hand1 and Foxf1 knockout leads to reduced commitment to the JCF lineage, evidenced by the loss of Hand1 expression, accumulation of undifferentiated Zic3+ mesoderm, and impaired cardiomyocyte formation (Tnnt2+), consistent with the up-regulation of JCF lineage specific genes and the downregulation of SHF lineage specific genes.

      We also revised the conclusion as “These results suggest that HAND1 and FOXF1 may cooperatively regulate early cardiac lineage specification by promoting JCF-associated gene expression and suppressing alternative mesodermal programs.”

      Q4R2: Analysis of Hand1 cKO embryos:

      Adding a mouse model to support the computational analysis is a strong way to conclude the study. Given the availability of these early embryos, some of the findings could be strengthened by performing a similar analysis to Figure 7B&C and by including some of the specific EEM markers found to be differentially regulated to complement the structural analysis of the embryos.

      R4R2: Thank you for your positive comments and help. In the revised manuscript, we performed IF staining of the EEM marker Vim in a similar fashion to Figure 7B&C (Figure S8D). Compared with control embryos, the Hand1 CKO embryos had significantly fewer Vim+ cells, further strengthening the conclusion that Hand1 CKO blocked developmental progression in the JCF direction.

      Q5R2: Current findings in the context of previous findings:

      The introduction carefully introduces the concept of lineage specification and different progenitor pools. Given the enormous amount of knowledge already available on Hand1 and Foxf1, and their role in specific lineages of the early heart, some of this information should be added, ideally to the discussion where it can be put into context of what the present findings add to the existing understanding of these transcription factors and their role in early cardiac specification.

      R5R2: We appreciate your positive comments and kind reminder. We have added discussion about how our study could be put into the body of findings on Hand1 and Foxf1. Although these two genes have been validated to be functionally important for heart development, it is unclear when and how they affect this process. Using in-vivo and in-vitro models and single cell multi-omics analyses, we provided evidence to fill the gaps from multiple aspects, including cell state temporal dynamics, regulatory network, and epigenetic regulation underlying the very early cardiac lineage specification.

      Reviewer #3 (Public review):

      Q1R3: In Figure 1A, could the authors justify using E8.5 CMs as the endpoint for the second lineage and better clarify the chamber identities of the E8.5 CMs analysed? Why are the atrial genes in Figure 1C of the PSH trajectory not present in Table S1.1, which lists pseudotime-dependent genes for the MJH/PSH trajectories from Figure 1F?

      R1R3: Thank you for your comments. We used E8.5 CMs as the endpoint of the second (SHF) lineage because this stage represents a critical point where SHF-derived cardiomyocytes have begun distinct differentiation, allowing us to capture terminal lineage states reliably. The chamber identities of E8.5 CMs were determined based on known marker genes (DOI: 10.1186/s13059-025-03633-3). The atrial genes shown in Figure 1C reflect cluster-specific markers that may not meet the strict pseudotime-dependency criteria used to generate Table S1.1, which lists genes dynamically changing along the MJH/PSH trajectories.

      Q2R3: Could the authors increase the resolution of their trajectory and genomic analyses to distinguish between the FHF (Tbx5+ HCN4+) and the JCF (Mab21l2+/ Hand1+) within the MJH lineage? Also, clarify if the early extraembryonic mesoderm contributes to the FHF.

      R2R3: Thank you for your great suggestions. To distinguish between the FHF and JCF trajectories, we used early FHF progenitor population (E7.75 Nkx2-5+; Mab21l2- CM cells) as the starting point and performed WOT lineage inference (Figure S2A). Results suggest that both JCF and SHF progenitors contribute to the FHF, consistent with live imaging-based single cell tracing by Dominguez et al (DOI: 10.1016/j.cell.2023.01.001) and lineage tracing results by Zhang et al (DOI: 10.1161/CIRCRESAHA.121.318943). We also analyzed the expression levels of FHF marker genes (Tbx5, Hcn4) and observed their activation along both trajectories (Figure S2B).

      Q3R3: The authors strongly assume that the juxta-cardiac field (JCF), defined by Mab21l2 expression at E7.5 in the extraembryonic mesoderm, contributes to CMs. Could the authors explain the evidence for this? Could the authors identify Mab21l2 expression in the left ventricle (LV) myocardium and septum transversum at E8.5 (see Saito et al., 2013, Biol Open, 2(8): 779-788)? If such a JCF contribution to CMs exists, the extent to which it influences heart development should be clarified or discussed.

      R3R3: Thank you for the important question. For the JCF contribution to the heart tube, several lines of evidence have been published in recent years using micro-dissection of the mouse embryonic heart (DOI: 10.1126/science.abb2986), live imaging (DOI: 10.1016/j.cell.2023.01.001) and lineage tracing approaches (DOI: 10.1161/CIRCRESAHA.121.318943). According to Tyser et al (DOI: 10.1126/science.abb2986), Mab21l2 expression is detected in the septum transversum at E8.5 and the Mab21l2+ lineage contributes to the LV, broadly consistent with the literature you mentioned (Saito et al., 2013, Biol Open, 2(8): 779-788). Our lineage inference analyses further support the model and suggest earlier specification of the JCF. However, the focus of our work is the transcriptional and epigenetic regulation underlying the JCF developmental trajectory.

      Q4R3: Could the authors distinguish the Hand1+ pericardium from JCF progenitors in their single-cell data and explain why they excluded other cell types, such as the endocardium/endothelium and pericardium, or even the endoderm, as endpoints of their trajectory analysis? At the NM and MM mesoderm stages, how did the authors distinguish the earliest cardiac cells from the surrounding developing mesoderm?

      R4R3: We appreciate your insightful question. In our other study (DOI: 10.1186/s13059-025-03633-3), we tried to further divide the CM cells into subclusters, and their differences appear to be driven mainly by the segmentation of the heart tube (e.g. LV, RV, OFT etc.). At the E8.5 stage, we were unable to identify a Hand1+ pericardium cluster.

      Also, it seems infeasible to distinguish endocardium from other endothelial cells using single-cell data alone; high-resolution spatial transcriptome data would be required. As an alternative, we analyzed the E7.0 mesodermal lineages and identified C5/6 as hematoendothelial progenitors. Marker gene analysis indicates that their lineage segregation has started by this stage (Figure S4C and Author response image 3).

      Author response image 3.

      UMAP layout, using scRNA-seq (Reference data) and snRNA-seq (Multiome data), is colored by cell types (left). Expression of hematoendothelial progenitor marker genes is shown (right).

      We did observe a difference between the earliest cardiac cells and the surrounding developing mesoderm. As shown in Figure 1D, cells belonging to the JCF lineage (Hand1 high/Lefty2 low) were clustered at the EEM/MM end, in contrast to the NM cells.

      Q5R3: Could the authors contrast their trajectory analysis with those of Lescroart et al. (2018), Zhang et al., Tyser et al., and Krup et al.?

      R5R3: Thank you for the valuable suggestion. We compared our model with the suggested ones and summarize the comparison as follows:

      (1) Lescroart et al: The JCF and SHF progenitor cells match their DCT2 (Bmp4+) and DCT3 (Foxc2+) clusters, respectively.

      (2) Zhang et al: The JCF lineage matches their EEM-DC (developing CM)-CM trajectory. The SHF lineage is consistent with their NM-LPM (lateral plate mesoderm)-DC (developing CM)-CM trajectory. Notably, their EEM-DC-CM also expressed FHF marker (Tbx5) at later stages.

      (3) Tyser et al: We performed data integration analysis and found that JCF progenitors (EEM cells from the cardiac trajectory) correspond to their Me5, and SHF progenitors (PM cells from the cardiac trajectory) correspond to their Me7. In their model, both Me5 and Me7 contribute to Me4 (representing the FHF), consistent with our results (see Tyser et al., 2021 and Pijuan-Sala et al., 2019).

      (4) Krup et al also performed URD lineage inference, providing a model with CM (12) and Cardiac mesoderm (29) as cardiac end points. Their model did not seem to suggest distinct trajectories between the JCF and SHF lineages, as both JCF (Hand1) and SHF (Isl1) markers were co-expressed in CM.

      Q6R3: Previous studies suggest that Mesp2 expression starts at E8 in the presomitic mesoderm (Saga et al., 1997). Could the authors provide in situ hybridization or HCR staining to confirm the early E7 Mesp2 expression suggested by the pseudo-time analysis of the second lineage.

      R6R3: We validated the expression of E7 Mesp2 using Geo-seq spatial transcriptome data (Author response image 4, upper). Results suggest the high spatial enrichment of Mesp2 expression in primitive streak (T+) and/or nascent mesoderm (Mesp1+) cells, which correspond to the progenitors of the second lineage.

      In situ hybridization data (PMID: 17360776) also supports the early expression of Mesp2 by E7 (Author response image 4, lower).

      Author response image 4.

      (Upper) E7 Geo-seq data for selected genes: T, Mesp1, and Mesp2. (Lower) Mesp2 expression during early development; image acquired from Morimoto et al. (PMID: 17360776).

      Q7R3: Could the authors also confirm the complementary Hand1 and Lefty2 expression patterns at E7 using HCR or in situ hybridization? Hand1 expression in the first lineage is plausible, considering lineage tracing results from Zhang et al.

      R7R3: Thank you for your great suggestion. We observed spatially complementary expression patterns of Hand1 and Lefty2 in the Geo-seq spatial transcriptomic data. In the mesoderm layer, Hand1 is highly expressed at the proximal end, while Lefty2+ cells show a preference for the distal direction.

      Author response image 5.

      E7 Geo-seq data for selected genes: Hand1 and Lefty2.

      Q8R3: Could the authors explain why Hand1 and Lefty2+ cells are more likely to be multipotent progenitors, as mentioned in the text?

      R8R3: Thank you for your question. We observed that E7.0 Mesp1+ and Lefty2+ nascent mesodermal cells were assigned to both the JCF and SHF lineages (Figure 1D), indicating their multipotency. We also found low expression of the JCF markers Hand1 and Msx2 at the early stage of the SHF trajectory (Figure 1F). Thus, we concluded that both Hand1+ and Lefty2+ E7.0 mesodermal cells are likely to be multipotent.

      Q9R3: Could the authors comment on the low Mesp1 expression in the mesodermal cells (MM) of the MJH trajectory at E7 (Figure 1D)? Is Mesp1 transiently expressed early in MJH progenitors and then turned off by E7? Have all FHF/JCF/SHF cells expressed Mesp1?

      R9R3: Thank you for the insightful questions. Zhang et al. (PMID: 34162224) performed scRNA-seq analysis of Mesp1 lineage-traced cells, which indicates the contribution of Mesp1+ cells to the FHF, JCF, and SHF. This is also supported by Dominguez et al. using live imaging approaches (PMID: 36736300). Our temporal dynamics analysis suggests that, along the JCF trajectory, Mesp1 is turned off as JCF characteristic genes are upregulated (Figure 1F and S1D).

      Q10R3: Could the authors clarify if their analysis at E7 comprises a mixture of embryonic stages or a precisely defined embryonic stage for both the trajectory and epigenetic analyses? How do the authors know that cells of the second lineage are readily present in the E7 mesoderm they analysed (clusters 0, 1, and 2 for the multiomic analysis)?

      R10R3: Thank you for your questions. Although embryos were collected at E7.0, their developmental stages could be variable. As exemplified in Karl Theiler’s book, “The House Mouse: Atlas of Embryonic Development”, mesoderm was visible in some E7.0 egg cylinders but not in others. To test whether cells of the second lineage are present in the E7.0 mesoderm, we analyzed the WOT lineage tracing results and the cell type composition at E7.0 (Author response image 6, left panel). Most cells belong to the nascent mesoderm (NM) or mixed mesoderm (MM), while almost no cells were assigned to the primitive streak (PS). To exclude the possibility that the E7.0 embryos represented later stages, we also analyzed the E6.75 cells of the second lineage (Author response image 6, middle panel). Results suggest that NM cells were still the dominant contributors to the second lineage, although ~22.6% of cells were assigned to the PS. The above analyses were performed using the scRNA-seq data. The embryos of the E7.0 single-cell multi-omics data represent similar developmental stages to those of the scRNA-seq data, as suggested by the well-aligned UMAPs (Figure S1D, right panel). Thus, we conclude that for the multi-omics data, the cells of the second lineage are also readily present in the mesoderm.

      Author response image 6.

      (Left and middle) Lineage inference and cell type composition at E7.0 and E6.75. (Right) UMAPs of E7.0 multi-omics and scRNA-seq data.
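
      For readers who want to reproduce this kind of summary, the cell type composition of an inferred lineage can be computed as below. This is a hedged sketch, not the authors' code: the function name and the toy labels are hypothetical, and it assumes each cell assigned to the lineage carries exactly one cell type label.

```python
from collections import Counter


def cell_type_composition(cell_type_labels):
    """Return the fraction of cells carrying each cell type label."""
    counts = Counter(cell_type_labels)
    n_cells = sum(counts.values())
    return {cell_type: n / n_cells for cell_type, n in counts.items()}


# Toy labels (hypothetical) for cells assigned to one lineage at one stage.
labels = ["NM"] * 6 + ["PS"] * 2 + ["MM"] * 2
composition = cell_type_composition(labels)
```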

      Q11R3: Could the authors further comment on the active Notch signaling observed in the first and second lineages, considering that Notch's role in the early steps of endocardial lineage commitment, but not of CMs, during gastrulation has been previously described by Lescroart et al. (2018)?

      R11R3: We appreciate your kind suggestion. As reported by Lescroart et al. (2018), using Notch1CreERT2/Rosa-tdTomato mice and tamoxifen administration at E6.5, early expression of Notch1 mostly marked endocardial cells (ECs, 76.9-83.9%), with minor contributions to the cardiomyocytes (6.0-16.6%) and to the epicardial cells (EPs, 6.0-6.5%). The lineage specificity of Notch1 is consistent with our E7.0 multi-omics data, where its expression was mainly observed in the NM and hematoendothelial progenitors (Author response image 7). Interestingly, other NOTCH receptor genes (Notch2 and Notch3) and ligand genes (Dll1 and Dll3) were expressed in the CM lineages. Notch3 shows higher expression in the first lineage, while Dll1 and Dll3 were highly expressed in the second lineage. The study by Lescroart et al. (2018) emphasized the role of Notch1 as an EC lineage marker, while our analyses focused on the activity of the NOTCH pathway.

      Author response image 7.

      Expression of representative NOTCH genes at E7.0 (multi-omics data).

      Q12R3: In cluster 8, Figure 2D, it seems that levels of accessibility in cluster 8 are relatively high for genes associated with endothelium/endocardium development in addition to MJH genes. Could the authors comment and/or provide further analysis?

      R12R3: Thank you for raising this interesting point. To confirm the association of these genes with endothelium (EC) and/or MJH, we analyzed their expression levels at E7.0 (progenitor stage) and E8.0 (differentiated stage) (Author response image 8). Among the target genes of MJH-specific DAEs (cluster 3/7/8 in Figure 2D), Pmp22, Mest, Npr1, Pkp2, and Pdgfb were expressed in the hematoendothelial progenitors. The Nrp1 gene and PDGF pathway play critical roles in endothelial development by modulating cell migration (PMID: 15920019 and 28167492), which is also important for MJH cells. In addition, we observed common ATAC-seq peaks in both hematoendothelial and MJH clusters (Author response image 9), indicating shared regulatory elements. Interestingly, although Pdgfb is not expressed by CM in vivo, it is actively expressed in the CM of the in vitro system (Author response image 9). These results indicate regulatory and functional closeness between the hematoendothelial and MJH cell groups at early stages of lineage establishment.

      Author response image 8.

      Regulatory connection between MJH and endothelial cells (ECs).

      Author response image 9.

      Representative genome browser snapshots of scATAC-seq (aggregated gene expression and chromatin accessibility for each cluster) and RNA-seq at the Pdgfb locus.

      Q13R3: Can the authors clarify why they state that cluster 8 DAEs are primed before the full activation of their target genes, considering that Bmp4 and Hand1 peak activities seem to coincide with their gene expression in Figure 2G?

      R13R3: Thanks for your great question. The overall analyses indicate low to medium levels of H3K4me1 and H3K27ac by E6.5-7.0 at cluster 8 DAEs, which were fully activated by E7.5 (Figure 2F). Further inspections suggest different epigenetic status of individual DAEs (Figure 3H), which could be active (K4me1+/K27ac+), primed (K4me1+/K27ac-), or inactive (K4me1-/K27ac-). Thus, we concluded that many DAEs could be primed before full activation. The coincidence of enhancer peak activities and gene expression was observed by aggregating single cell clusters at a single stage E7.0, which does not rule out the possibility that these enhancers are epigenetically primed at earlier stages.
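
      The three enhancer states named above can be made concrete with a small sketch. It is an illustration under stated assumptions rather than the procedure used in the manuscript: the function name, the single shared threshold, and the toy signal values are hypothetical, and the K4me1-/K27ac+ combination, which the text does not define, is labeled "other".

```python
from collections import Counter


def classify_enhancer(h3k4me1, h3k27ac, threshold=1.0):
    """Assign an enhancer state from two histone-mark signals.

    The threshold is a hypothetical cut-off on normalized signal.
    """
    k4me1_positive = h3k4me1 >= threshold
    k27ac_positive = h3k27ac >= threshold
    if k4me1_positive and k27ac_positive:
        return "active"    # K4me1+/K27ac+
    if k4me1_positive:
        return "primed"    # K4me1+/K27ac-
    if not k27ac_positive:
        return "inactive"  # K4me1-/K27ac-
    return "other"         # K4me1-/K27ac+, not defined in the text


# Toy (H3K4me1, H3K27ac) signals for three DAEs at two stages (hypothetical).
signals_by_stage = {
    "E6.5": [(1.8, 0.3), (0.4, 0.2), (1.2, 0.5)],
    "E7.5": [(2.5, 2.1), (1.6, 1.4), (1.9, 0.6)],
}
state_counts = {
    stage: Counter(classify_enhancer(k4, k27) for k4, k27 in pairs)
    for stage, pairs in signals_by_stage.items()
}
```

      In the toy data, two of three DAEs are primed at the earlier stage and become active at the later stage, which is the pattern the response describes as priming before full activation.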

      Q14R3: Did the authors extend the multiomic analysis to Nanog+ epiblast cells at E7 and investigate if cardiac/mesodermal priming exists before mesodermal induction (defined by T/Mesp1 onset of expression)?

      R14R3: We appreciate your kind suggestion. We observed low levels of T/Mesp1 expression in the E7.0 Nanog+ epiblast cells (Author response image 10). Interestingly, the T+/Mesp1+ cells were not clustered toward any specific differentiation direction in the UMAP. We also analyzed DAE activities in each single cell by averaging over the C1/C2/C8 DAE sets. The C2 and C8 DAEs were clearly less active than the C1 DAEs, but C2/C8-DAE active cells were observed among the E7.0 Nanog+ epiblast cells. These data indicate that early priming exists in epiblast cells before the commitment to cardiac/mesodermal differentiation.

      Author response image 10.

      Gene expression and DAE activity levels of E7.0 Nanog+ epiblast cells shown in UMAP layout.
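
      The per-cell averaging over DAE sets mentioned in R14R3 can be sketched as follows. This is a minimal illustration, not the authors' implementation: the data layout (nested dictionaries), the peak identifiers, and the choice to count missing peaks as zero are all assumptions.

```python
def dae_set_activity(accessibility, dae_sets):
    """Average per-cell accessibility over each DAE set.

    accessibility: dict mapping cell id -> dict of peak id -> accessibility value.
    dae_sets: dict mapping set name (e.g. "C1") -> list of peak ids.
    Peaks missing from a cell are counted as 0 (an assumption of this sketch).
    """
    activity = {}
    for cell, peaks in accessibility.items():
        activity[cell] = {
            name: sum(peaks.get(peak, 0.0) for peak in members) / len(members)
            for name, members in dae_sets.items()
        }
    return activity


# Toy inputs (hypothetical peak ids and values).
dae_sets = {"C1": ["p1", "p2"], "C2": ["p3", "p4"], "C8": ["p5", "p6"]}
accessibility = {
    "epiblast_1": {"p1": 2.0, "p2": 1.0, "p3": 0.0, "p4": 0.0, "p5": 1.0},
    "epiblast_2": {"p1": 1.0, "p2": 1.0, "p3": 1.0, "p4": 0.0},
}
activity = dae_set_activity(accessibility, dae_sets)
```

      Each cell then carries one score per DAE set, which can be overlaid on a UMAP in the manner of Author response image 10.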

      Q15R3: In the absence of duplicates, it is impossible to statistically compare the proportions of mesodermal cell populations in Hand1 wild-type and knockout (KO) embryos or to assess for abnormal accumulation of PS, NM, and MM cells. Could the authors analyse the proportions of cells by careful imaging of Hand1 wild-type and KO embryos instead?

      R15R3: Thank you for your important question. To assess the proportions of mesodermal cell populations in E7.25 wild-type and Hand1-CKO embryos, we analyzed the serial coronal sections of the extraembryonic portions and performed staining of the Vim gene, which marks the extra-embryonic mesodermal (EEM) cells (Figure S8D). We then counted the numbers of mesodermal/Vim+ EEM cells and calculated the relative proportion of Vim+ EEM cells in each section. The proportion of Vim+ EEM cells was statistically lower in the Hand1-CKO embryo, consistent with our model that Hand1 deletion led to blocked MJH specification.
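
      The per-section quantification described in R15R3 can be sketched as below. The counts are invented toy values, and the exact permutation test is a stand-in chosen for illustration; the response does not state which statistical test was used. Treating individual sections as the units of comparison is also an assumption of this sketch.

```python
from itertools import combinations


def proportions(section_counts):
    """Convert (Vim+ EEM cells, total mesodermal cells) per section into proportions."""
    return [vim_positive / total for vim_positive, total in section_counts]


def mean(values):
    return sum(values) / len(values)


def exact_permutation_p(group_a, group_b):
    """Two-sided exact permutation test on the difference of group means."""
    pooled = group_a + group_b
    observed = abs(mean(group_a) - mean(group_b))
    n_extreme = 0
    n_total = 0
    for indices in combinations(range(len(pooled)), len(group_a)):
        chosen = set(indices)
        a = [pooled[i] for i in chosen]
        b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        # Small tolerance guards against floating-point ties.
        if abs(mean(a) - mean(b)) >= observed - 1e-12:
            n_extreme += 1
        n_total += 1
    return n_extreme / n_total


# Toy counts per section (hypothetical): (Vim+ EEM cells, all mesodermal cells).
control_sections = [(30, 100), (28, 90), (35, 110), (32, 100)]
cko_sections = [(10, 100), (12, 95), (8, 90), (11, 105)]

control_props = proportions(control_sections)
cko_props = proportions(cko_sections)
p_value = exact_permutation_p(control_props, cko_props)
```

      With four sections per group, the smallest attainable two-sided p-value is 2/70, so more sections or embryos would be needed for a finer-grained comparison.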

      Q16R3: Could the authors provide high-resolution images for Figure 7 B-C-D as they are currently hard to interpret?

      R16R3: Thank you for your suggestion. We have replaced Figure 7B-C-D with high-resolution images.

      Recommendations for the authors:  

      Reviewing Editor Comments:

      Discussions among reviewers emphasize the importance of better addressing and validating the trajectory analysis by using more common and alternative bioinformatics and spatial approaches. Further discussion on whether there is a common transcriptional progenitor between the two trajectories is also required to enhance the significance of the study. For functional analysis, further validations are needed as the current data only partially support the claims. Please see public reviews for details.

      Reviewer #2 (Recommendations For The Authors):

      Beyond the suggestions made in the public review, below are some minor aspects for consideration:

      The manuscript is well written overall but may benefit from a thorough read-through and editing of some minor grammatical errors.

      We have carefully read through the manuscript and corrected minor grammatical errors to improve clarity and readability.

      Figure 2C: RNA velocity information gets largely lost due to the color choice of EEM and MM (black) on which the direction of arrows can't be appreciated.

      We have updated the color scheme in Figure 2C.

      Figure 6D: sample information is partially cut off in the graph.

      Sample information is completely shown now.

      The last paragraph of the discussion has some formatting issues with the references.

      We have corrected the formatting issues with the references.

      The methods and results section does not comment on if, or how many embryos were pooled for the sequencing analysis performed for this study.

      We have added the numbers of embryos for sequencing analyses in the methods section.

      Reviewer #3 (Recommendations For The Authors):

      Minor:

      In the discussion, authors could reconsider the sentence: "The process of cardiac lineage segregation is a complex one that may involve TF regulatory networks and signaling pathways," as it is not informative.

      We have re-written the sentence as: “Thus, additional regulation must exist and instructs the process of JCF-SHF lineage segregation.”

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      This manuscript reports a descriptive study of changes in gene expression after knockdown of the nuclear envelope proteins lamin A/C and Nesprin2/SYNE2 in human U2OS cells. The readout is RNA-seq, which is analyzed at the level of gene ontology and focused investigation of isoform variants and non-coding RNAs. In addition, the mobility of telomeres is studied after these knockdowns, although the rationale in relation to the RNA-seq analyses is rather unclear.

      We sincerely thank the reviewer for the thoughtful summary and valuable feedback. Regarding the telomere mobility analyses, our intention was to provide additional evidence supporting the hypothesis that knockdown of lamins and nesprins disrupts nuclear architecture. Although the connection to the RNA-seq data was not explicitly detailed, we believe that the increased telomere mobility may reflect broader changes in chromatin organization, which could contribute to the observed differential gene expression. We have revised the manuscript to clarify this rationale and improve the integration between the two analyses.

      RNA-seq after knockdown of lamin proteins has been reported many times, and the current study does not provide significant new insights that help us to understand how lamins control gene expression. This is particularly because the vast majority of the observed effects on gene expression appear to occur in regions that are not bound by lamin A. It seems likely that these effects are indirect. There is also virtually no overlap between genes affected by laminA/C and by SYNE2, which remains unexplained; for example, it would be good to know whether laminA/C and SYNE2 bind to different genomic regions. The claim in the Title and Abstract that LMNA governs gene expression / acts through chromatin organization appears to be based only on an enrichment of gene ontology terms "DNA conformation change" and "covalent chromatin conformation" in the RNA-seq data. This is a gross over-interpretation, as no experimental data on chromatin conformation are shown in this study. The analyses of transcript isoform switching and ncRNA expression are potentially interesting but lack a mechanistic rationale: why and how would these nuclear envelope proteins regulate these aspects of RNA expression? The effects of lamin A on telomere movements have been reported before; the effects of SYNE2 on telomere mobility are novel (to my knowledge), but should be discussed in the light of previously documented effects of SUN1/2 on the dynamics of dysfunctional telomeres (Lottersberger et al, Cell 2015).

      We sincerely thank the reviewer for this thoughtful and detailed critique. We agree that RNA-seq following knockdown of lamin proteins has been previously reported and appreciate the concern regarding the novelty and mechanistic interpretation of our findings. However, our study reveals distinct isoform switching and lncRNA expression changes upon knockdown of lamins and nesprins, which have not been reported in previous studies. Furthermore, we show that not only lamin A but also nesprin-2 can affect chromatin mobility.

      Regarding the analysis of LMNA ChIP-seq data from human fibroblasts (Kohta Ikegami, 2021), their data revealed that Lamin A/C modulates gene expression through interactions with enhancers. The pathogenesis of disorders associated with LMNA mutations may stem primarily from disruptions in this gene regulatory function, rather than from impaired tethering of chromatin to LADs.

      We acknowledge the reviewer’s concern that gene ontology enrichment related to chromatin conformation alone is insufficient to support claims about chromatin structural changes. We have therefore revised the “Title” and “Abstract” to avoid overstating conclusions and to more accurately reflect the scope of our data.

      Regarding telomere dynamics, while Lamin A's role has indeed been previously documented, our study provides evidence that SYNE2/Nesprin-2 also regulates telomere mobility. We have now expanded the discussion to include prior work, particularly the findings of Lottersberger et al. (Cell, 2015), to better contextualize our results and distinguish the contributions of SYNE2.

      Finally, we appreciate the reviewer’s suggestion about transcript isoform and noncoding RNA expression. While our study primarily provides descriptive data, we agree that further mechanistic investigation is warranted. We have clarified this point in the “Discussion” and framed our findings as a foundation for future studies exploring the broader regulatory roles of nuclear envelope proteins.

      We are grateful for the reviewer’s comments, which have helped us improve the clarity and rigor of our manuscript. Please see the revised highlights in our revised manuscript.

      As indicated below, I have substantial concerns about the experimental design of the knockdown experiments.

      Altogether, the results presented here are primarily descriptive and do not offer a significant advance in our understanding of the roles of LaminA and SYNE2 in gene regulation or chromatin biology, because the results remain unexplained mechanistically and functionally. Furthermore, the RNAseq datasets should be interpreted with caution until off-target effects of the shRNAs can be ruled out.

      We fully acknowledge that the original version of our manuscript lacked sufficient mechanistic insight. In response, we have revised the manuscript to include additional analyses and explanations that clarify the potential functional relevance of our findings. For example, we added the following text: “These findings further underscore the functional relevance of lamin A in coordinating transcriptional programs through modulation of nuclear architecture. In contrast, LMNA knockdown led to differential expression of genes enriched in pathways related to chromatin organization, suggesting potential disruptions in chromatin regulatory networks. Although direct measurements of chromatin conformation were not performed, these transcriptional changes indicate that LMNA may contribute to maintaining nuclear architecture and genomic stability, which aligns with its established involvement in laminopathies and genome integrity disorders.” Further analyses can be found in the main text.

      Regarding the concern about off-target effects of the shRNA-based knockdowns, we agree that this is an important consideration. While shRNA approaches inherently carry the risk of off-target effects, we have now performed additional analyses that help address this issue. These analyses support the specificity of our observations and suggest that the majority of gene expression changes are likely to be directly related to the targeted knockdown. Nonetheless, we have clearly stated the limitations of the approach in the revised discussion and emphasized the need for future validation using complementary methods.

      We hope that these revisions strengthen the overall impact and interpretability of our study.

      Specific comments:

      (1) Knockdowns were only monitored by qPCR. Efficiency at the protein level (e.g., Western blots) needs to be determined.

      We agree that complementary protein-level validation (e.g., by Western blot) would strengthen the findings, and we are in the process of obtaining suitable reagents to address this point in future experiments. We have now clarified this limitation in the revised manuscript.

      (2) For each knockdown, only a single shRNA was used. shRNAs are infamous for off-target effects; therefore, multiple shRNAs for each protein, or an alternative method such as CRISPR deletion or degron technology, must be tested to rule out such off-target effects.

      We fully acknowledge the concern regarding the use of only a single shRNA per knockdown and agree that shRNAs are prone to off-target effects. We recognize the importance of validating our findings using multiple independent shRNAs or alternative knockdown strategies, such as CRISPR deletion or degron-based approaches, to ensure specificity. To address this concern, we have used qPCR to confirm the knockdown of the targets observed in the RNA-seq data, further supporting the validity of our data. In line with this, we are currently optimizing an auxin-inducible degron system (AtAFB2) for targeted and controlled depletion of lamin C. Our preliminary results indicate a knockdown efficiency of approximately 40% after 16 hours of auxin induction, highlighting the need for further optimization of the system (Author response image 1). Future experiments will combine this improved degron technology with multiple independent approaches to rigorously address concerns about off-target effects, thereby enhancing the robustness and reproducibility of our data.

      Author response image 1.

      FACS analysis of the lamin C degron system at 0, 1, 3, and 16 hours post-induction with 500 μM indole-3-acetic acid (IAA) (Sigma).

      (3) It is not clear whether the replicate experiments are true biological replicates (i.e., done on different days) or simply parallel dishes of cells done in a single experiment (= technical replicates). The extremely small standard deviations in the RT-qPCR data suggest the latter, which would not be adequate.

      We appreciate the reviewer’s insightful comment regarding the nature of our replicates. The RT-qPCR experiments were indeed performed as true biological replicates, with samples collected on different days and from independently cultured cell batches. We have added this information to the Methods section of the manuscript. While we observed some variability in the Scramble control group, the low standard deviations in the shRNA-treated samples likely reflect the consistent and efficient knockdown of target genes.

      For the RNA-seq experiments, samples were collected as two batches during RNA extraction and library preparation. The samples still represent biological replicates, as they were derived from independently prepared cultures in separate experimental setups. This approach was chosen to strike a balance between biological variation and technical consistency, thereby improving the reliability of the RNA-seq results.

      Reviewer #2 (Public review):

      Summary:

      This study focused on the roles of the nuclear envelope proteins lamin A and C, as well as nesprin-2, encoded by the LMNA and SYNE2 genes, respectively, on gene expression and chromatin mobility. It is motivated by the established role of lamins in tethering heterochromatin to the nuclear periphery in lamina-associated domains (LADs) and modulating chromatin organization. The authors show that depletion of lamin A, lamin A and C, or nesprin-2 results in differential effects on mRNA and lncRNA expression, primarily affecting genes outside established LADs. In addition, the authors used fluorescent dCas9 labeling of telomeric genomic regions combined with live-cell imaging to demonstrate that depletion of either lamin A, lamin A/C, or nesprin-2 increased the mobility of chromatin, suggesting an important role of lamins and nesprin-2 in chromatin dynamics.

      We sincerely appreciate the reviewer’s thoughtful summary of our study and the key findings. Our work is indeed motivated by the well-established roles of lamin A/C in chromatin tethering at the nuclear periphery and the emerging understanding of their broader influence on chromatin organization and gene regulation. In our study, we aimed to further explore these roles by examining the consequences of depleting lamin A, lamin A/C, and nesprin-2 (SYNE2) on both gene expression and chromatin mobility.

      As the reviewer accurately notes, we observed differential effects on mRNA and lncRNA expression, with many changes occurring outside of previously defined LADs. This finding suggests that lamins and nesprin-2 may also influence transcriptional regulation through mechanisms beyond direct LAD association. Furthermore, using live-cell imaging of fluorescently labeled telomeric regions, we demonstrated that loss of these nuclear envelope components leads to increased chromatin mobility, supporting their role in maintaining chromatin stability and nuclear architecture.

      We thank the reviewer for highlighting these aspects, which we believe contribute to a more nuanced understanding of how nuclear envelope proteins modulate chromatin behavior and gene regulation.

      Strengths:

      The major strength of this study is the detailed characterization of changes in transcript levels and isoforms resulting from depletion of either lamin A, lamin A/C, or nesprin-2 in human osteosarcoma (U2OS) cells. The authors use a variety of advanced tools to demonstrate the effect of protein depletion on specific gene isoforms and to compare the effects on mRNA and lncRNA levels.

      The TIRF imaging of dCas9-labeled telomeres allows for high-resolution tracking of multiple telomeres per cell, thus enabling the authors to obtain detailed measurements of the mobility of telomeres within living cells and the effect of lamin A/C or nesprin-2 depletion.

      We are grateful that the reviewer recognized the comprehensive analysis of transcript and isoform changes upon depletion of lamin A, lamin A/C, or nesprin-2 in U2OS cells. We also thank the reviewer for acknowledging our use of advanced tools to investigate isoform-specific effects and to distinguish between changes in mRNA and lncRNA expression.

      Furthermore, we are pleased that the reviewer highlighted the strength of our TIRF imaging approach using dCas9-labeled telomeres. This technique enabled us to capture high-resolution, multi-locus dynamics within single living cells, and we agree that it is instrumental in revealing the impact of lamin A/C and nesprin-2 depletion on telomere mobility.

      Weaknesses:

      Although the findings presented by the authors overall confirm existing knowledge about the ability of lamins A/C and nesprin to broadly affect gene expression, chromatin organization, and chromatin dynamics, the specific interpretation and the conclusions drawn from the data presented in this manuscript are limited by several technical and conceptual challenges.

      One major limitation is that the authors only assess the knockdown of their target genes on the mRNA level, where they observe reductions of around 70%. Given that lamins A and C have long half-lives, the effect at the protein level might be even lower. This incomplete and poorly characterized depletion on the protein level makes interpretation of the results difficult. The description for the shRNA targeting the LMNA gene encoding lamins A and C given by the authors is at times difficult to follow and might confuse some readers, as the authors do not clearly indicate which regions of the gene are targeted by the shRNA, and they do not make it obvious that lamin A and C result from alternative splicing of the same LMNA gene. Based on the shRNA sequences provided in the manuscript, one can conclude that the shLaminA shRNA targets the 3' UTR region of the LMNA gene specific to prelamin A (which undergoes posttranslational processing in the cell to yield lamin A). In contrast, the shRNA described by the authors as 'shLMNA' targets a region within the coding sequence of the LMNA gene that is common to both lamin A and C, i.e., the region corresponding to amino acids 122-129 (KKEGDLIA) of lamin A and C. The authors confirm the isoform-specific effect of the shLaminA isoform, although they seem somewhat surprised by it, but do not confirm the effect of the shLMNA construct. Assessing the effect of the knockdown on the protein level would provide more detailed information both on the extent of the actual protein depletion and the effect on specific lamin isoforms. Similarly, given that nesprin-2 has numerous isoforms resulting from alternative splicing and transcription initiation, it remains unclear in the current form of the manuscript which specific nesprin-2 isoforms were depleted, and to what extent (on the protein level).

      We have revised the Methods section to include a clearer and more detailed description of the shRNA design, including the specific regions of the LMNA gene targeted by each construct, as well as the relationship between lamin A and C isoforms resulting from alternative splicing. We agree that this clarification will help prevent confusion for readers.

      Regarding the shLMNA construct, we acknowledge the importance of confirming the knockdown at the protein level, especially given the long half-lives of lamin proteins. In our revised manuscript, we now refer to Supplementary Figure S2, which demonstrates that the shLMNA construct effectively reduces both lamin A and lamin C transcript levels. While we initially focused on mRNA quantification, we recognize that additional protein-level validation is valuable and have accordingly emphasized this point in the revised discussion.

      We also appreciate the comment on nesprin-2 isoforms. Given the complexity of nesprin-2 splicing, we are currently working to further characterize the specific isoforms affected and will aim to include protein-level data in a future study. 

      Another substantial limitation of the manuscript is that the current analysis, with the exception of the chromatin mobility measurements, is exclusively based on transcriptomic measurements by RNA-seq and qRT-PCR, without any experimental validation of the predicted protein levels or proposed functional consequences. As such, conclusions about the importance of lamin A/C on RNA synthesis and other functions are derived entirely from gene ontology terms and are not sufficiently supported by experimental data. Thus, the true functional consequences of lamin A/C or nesprin depletion remain unclear. Statements included in the manuscript such as "our findings reveal that lamin A is essential for RNA synthesis, ..." (Lines 79-80) are thus either inaccurate or misleading, as the current data do not show that lamin A is ESSENTIAL for RNA synthesis, and lamin A/C and lamin A deficient cells and mice are viable, suggesting that they are capable of RNA synthesis.

      We agree that our current data do not support the claim that lamin A is essential for RNA synthesis, and we acknowledge the importance of distinguishing between correlation and causal relations in our conclusions. In light of this, we have revised the statement in the manuscript to more accurately reflect our findings:

      “Our findings suggest that lamin A contributes to RNA synthesis, supports chromatin spatial organization through LMNA, and that SYNE2 influences chromatin modifications as reflected in transcript levels.”

      We hope this revision better aligns with the limitations of our dataset and addresses the reviewer’s concerns regarding the interpretation of functional consequences based solely on transcriptomic data.

      Another substantial weakness is that the data and analysis presented in the manuscript raise some concerns about the robustness of the findings. Given that the 'shLMNA' construct is expected to deplete both lamin A and C, i.e., its effect encompasses the depletion of lamin A, which is achieved by the 'shLaminA' construct, one would expect a substantial overlap between the DEGs in the shLMNA and shLaminA conditions, with the shLMNA depletion producing a broader effect as it targets both lamin A and C. However, the Venn Diagram in Figure 4a, the genomic loci distribution in Figure 4b, and the correlation analysis in Supplementary Figure S2 show little overlap between the shLMNA and shLaminA conditions, which is quite surprising. In the mapping of the DEGs shown in Figure 4b, it is also surprising not to see the gene targeted by the shRNA, LMNA, found on chromosome 1, in the results for the shLMNA and shLaminA depletion.

      We have added the following discussion to the revised manuscript: “Interestingly, although both shLMNA and shLaminA constructs target lamin A, with shLMNA additionally depleting lamin C, the DEGs identified under these two conditions show limited overlap. This unexpected finding suggests that depletion of lamin C in the shLMNA condition may trigger distinct or compensatory transcriptional responses that are not elicited by lamin A knockdown alone. Furthermore, variation in shRNA efficiency or off-target effects may contribute to these differences. Notably, despite directly targeting LMNA, the overlap in DEGs between the two conditions remained limited under our stringent threshold criteria. Together, these observations highlight the complex and non-linear regulatory roles of lamin isoforms in gene expression and underscore the need for further mechanistic studies to dissect their individual and combined contributions [28,29].”
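      To make the notion of "limited overlap" concrete, the overlap between two DEG lists can be summarized with simple set operations and a Jaccard index. The following is a minimal sketch only: the gene names are hypothetical placeholders, not data from the study, and the function name `overlap_summary` is an assumption introduced for illustration rather than part of the authors' pipeline.

```python
def overlap_summary(degs_a, degs_b):
    """Summarize the overlap between two lists of differentially expressed genes."""
    a, b = set(degs_a), set(degs_b)
    shared = a & b
    union = a | b
    # Jaccard index: shared genes divided by all genes seen in either list.
    jaccard = len(shared) / len(union) if union else 0.0
    return {
        "shared": sorted(shared),
        "only_a": sorted(a - b),
        "only_b": sorted(b - a),
        "jaccard": jaccard,
    }


# Hypothetical gene lists for illustration only (not results from the study).
sh_lmna_degs = ["GENE1", "GENE2", "GENE3", "GENE4"]
sh_lamin_a_degs = ["GENE3", "GENE4", "GENE5"]

summary = overlap_summary(sh_lmna_degs, sh_lamin_a_degs)
print(summary)
```

      A Jaccard index near 0 indicates nearly disjoint DEG sets, while a value near 1 indicates nearly identical sets; the value depends on the significance thresholds used to define each list.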

      The correlation analysis in Supplementary Figure S2 raises further questions. The authors use dox-inducible shRNA constructs to target lamin A (shLaminA), lamin A/C (shLMNA), or nesprin-2 (shSYNE2). Thus, the no-dox control (Ctr) for each of these constructs would be expected to be very similar to the non-target scrambled controls (Ctrl.shScramble and Dox.shScramble). However, in the correlation matrix, each of the no-dox controls clusters more closely with the corresponding dox-induced shRNA condition than with the Ctrl.shScramble or Dox.shScramble conditions, suggesting either a very leaky dox-inducible system, strong effects from clonal selection, or substantial batch effects in the processing. Either of these scenarios could substantially affect the interpretation of the findings. For example, differences between different clonal cell lines used for the studies, independent of the targeted gene, could explain the limited overlap between the different shRNA constructs and result in apparent differences when comparing these clones to the scrambled controls, which were derived from different clones.

      We thank the reviewer for this thoughtful observation. We would like to clarify that the samples shown in Supplementary Figure S2 were processed and sequenced in two separate batches, and the data presented in the correlation matrix are unnormalized. As such, batch effects are indeed present and likely contribute to the clustering pattern observed, particularly the closer similarity between the dox-induced and no-dox samples for each individual shRNA construct.

      Importantly, our analyses focus on within-construct comparisons (i.e., doxycycline-treated vs untreated samples for the same shRNA), rather than direct comparisons across different constructs or scrambled controls. Each experimental pair (dox vs no-dox) was processed in parallel within its respective batch to ensure internal consistency. Thus, while the global clustering pattern may reflect batch-related differences or baseline variations between independently derived cell lines, these factors do not affect the main conclusions drawn from the within-construct differential expression analysis.
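      The reasoning behind within-construct comparisons can be illustrated with a toy calculation. The sketch below assumes, purely for illustration, that a batch or clonal effect acts as an additive offset on log2 expression and is shared by both members of a dox/no-dox pair; all numbers and names (`batch_offset`, `true_effect`, `within_pair_difference`) are hypothetical and are not taken from the study. Under that assumption the offset cancels in the within-pair difference but not in a comparison across constructs. The sketch does not address leakiness of the inducible system, which would reduce the within-pair difference itself.

```python
def within_pair_difference(log_expr, construct):
    """Log2 fold change of dox vs no-dox for a single construct."""
    return log_expr[(construct, "dox")] - log_expr[(construct, "no_dox")]


# Hypothetical log2 expression values for one gene (illustration only).
baseline = 8.0
batch_offset = {"shLMNA": 1.5, "shScramble": 0.0}   # assumed additive batch/clone offset
true_effect = {"shLMNA": -2.0, "shScramble": 0.0}   # assumed effect of dox induction

log_expr = {}
for construct in batch_offset:
    untreated = baseline + batch_offset[construct]
    log_expr[(construct, "no_dox")] = untreated
    log_expr[(construct, "dox")] = untreated + true_effect[construct]

# Within-construct comparison: the shared offset cancels.
within = within_pair_difference(log_expr, "shLMNA")

# Across-construct comparison: the offset is confounded with the effect.
across = log_expr[("shLMNA", "dox")] - log_expr[("shScramble", "dox")]

print("within-construct log2FC:", within)
print("across-construct log2FC:", across)
```

      In this toy setting the within-construct estimate recovers the assumed effect, whereas the across-construct estimate mixes it with the offset. Whether the additive, pair-shared assumption holds for real samples would need to be checked, for example by including batch as a covariate in the differential expression model.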

      The manuscript also contains several factually inaccurate or incorrect statements or depictions. For example, the depiction of the nuclear envelope in Figure 1 shows a single bilipid layer, instead of the actual double bi-lipid layer of the inner and outer nuclear membranes that span the nuclear lumen. The depiction further lacks SUN domain proteins, which, together with nesprins, form the LINC complex essential to transmit forces across the nuclear envelope. The statement in line 214 that "Linker of nucleoskeleton and cytoskeleton (LINC) complex component nesprin-2 locates in the nuclear envelope to link the actin cytoskeleton and the nuclear lamina" is not quite accurate, as nesprin-2 also links to microtubules via dynein and kinesin.

      We sincerely thank the reviewer for pointing out these important inaccuracies. In response, we have revised Figure 1 to accurately depict the nuclear envelope as a double bi-lipid membrane and included SUN domain proteins to better reflect the structural components of the LINC complex. Additionally, we have updated the statement and citations.

      The revised text incorporated in the manuscript reads: “The linker of nucleoskeleton and cytoskeleton (LINC) complex component nesprin-2 is a nuclear envelope protein that connects the nucleus to the cytoskeleton by interacting not only with actin filaments but also with microtubules through motor proteins such as dynein and kinesin. This structural linkage contributes to cellular architecture and facilitates mechanotransduction between the nuclear interior and the extracellular matrix (ECM) [8,21].”

      We appreciate the reviewer’s insights, which have helped improve the accuracy and clarity of our manuscript.

      The statement that "Our data show that Lamin A knockdown specifically reduced the usage of its primary isoform, suggesting a potential role in chromatin architecture regulation, while other LMNA isoforms remained unaffected, highlighting a selective effect" (lines 407-409) is confusing, as the 'shLaminA' shRNA specifically targets the 3' UTR of lamin A that is not present in the other isoforms. Thus, the observed effect is entirely consistent with the shRNA-mediated depletion, independent of any effects on chromatin architecture.

      We have rephrased the statement as follows: “Our data show that knockdown with shLaminA, which specifically targets the 3' UTR unique to the lamin A isoform, selectively reduced lamin A expression without affecting other LMNA isoforms.”

      The premise of the authors that lamins would only affect peripheral chromatin and genes at LADs neglects the fact that lamins A and C are also found in the nuclear interior, where they form stable structures and influence chromatin organization, and the fact that lamins A and C and nesprins additionally interact with numerous transcriptional regulators such as Rb, c-Fos, and beta-catenins, which could further modulate gene expression when lamins or nesprins are depleted.

      Based on the reviewer’s comment, we have added the following statement to the Discussion: “Beyond their well-established role in tethering heterochromatin at the nuclear periphery through lamina-associated domains (LADs), A-type lamins (lamins A and C) also localize to the nuclear interior, where they contribute to chromatin organization and gene regulation independently of LADs [27,28]. Nuclear lamins can form intranuclear foci that associate with active chromatin and are implicated in supporting transcriptional activity. Additionally, both lamins and nesprins participate in diverse protein-protein interactions that may influence transcriptional regulation. For example, lamin A/C interacts with the retinoblastoma protein (Rb) to modulate E2F-dependent transcription [29], and with c-Fos to regulate its nuclear retention and activity [30]. Similarly, the role of β-catenin as a co-activator in Wnt signaling relies on nuclear translocation and interaction with transcriptional complexes, and evidence suggests that nuclear architecture and envelope components, including nesprins, can influence this process [31]. Therefore, the observed gene expression changes following depletion of lamins or nesprins are likely not restricted to genes located within lamina-associated domains (LADs), but may also result from broader perturbations in nuclear architecture and transcriptional regulatory networks. This is consistent with our findings that lamins and nesprins influence gene expression in distal, non-LAD regions.”

      The comparison of the identified DEGs to genes contained in LADs might be confounded by the fact that the authors relied on the identification of LADs from a previous study (ref #28), which used a different human cell type (human skin fibroblasts) instead of the U2OS osteosarcoma cells used in the present study. As LADs are often highly cell-type specific, the use of the fibroblast data set could lead to substantial differences in LADs.

      DamID in various mammalian cell types has shown that some LADs are cell-type invariant (constitutive LADs [cLADs]), while others interact with the NL in only certain cell types (facultative LADs [fLADs]) (Bas van Steensel, 2017). We agree that fLADs, which comprise approximately half of all LADs, are often highly cell-type specific, and we acknowledge that this specificity may influence the interpretation of our findings. At present, publicly available LAD datasets for U2OS cells are limited to those associated with LMNB. We concur that generating LMNA-specific LAD maps in U2OS cells would enhance the accuracy and relevance of our analyses, and we view this as an important direction for future research.

      Another limitation of the current manuscript is that, in the current form, some of the figures and results depicted in the figures are difficult to interpret for a reader not deeply familiar with the techniques, based in part on the insufficient labeling and figure legends. This applies, for example, to the isoform use analysis shown in Figure 3d or the GenometriCorr analysis quantifying spatial distance between LADs and DEGs shown in Figure 4c.

      For Figure 3, we added text to the caption to make the figure more readable: “Isoform switching analysis reveals differential expression of alternative transcript variants between conditions, highlighting a shift in predominant isoform usage.” For Figure 4c, we added the following text to the caption: “GenometriCorr analysis was used to quantify the spatial relationship between LADs and DEGs, evaluating whether the observed genomic proximity deviates from random expectation through empirical distribution-based statistical testing of pairwise distances between genomic intervals.” We also added this description to the “Methods”.
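      The idea of testing whether observed genomic proximity deviates from random expectation can be sketched with a simple permutation test. The code below is not the GenometriCorr implementation (GenometriCorr is an R package that provides several distinct tests); it is a simplified illustration on a single hypothetical chromosome, and the coordinates, the function names, and the uniform-placement null model are all assumptions introduced here.

```python
import random


def distance_to_nearest_interval(pos, intervals):
    """Distance from a position to the closest interval (0 if inside one)."""
    best = None
    for start, end in intervals:
        if start <= pos <= end:
            return 0
        d = start - pos if pos < start else pos - end
        if best is None or d < best:
            best = d
    return best


def mean_distance(positions, intervals):
    return sum(distance_to_nearest_interval(p, intervals) for p in positions) / len(positions)


def proximity_p_value(query_positions, intervals, chrom_length, n_perm=1000, seed=0):
    """Empirical p-value that the query positions lie closer to the intervals
    than uniformly random positions on the same chromosome would."""
    rng = random.Random(seed)
    observed = mean_distance(query_positions, intervals)
    at_least_as_close = 0
    for _ in range(n_perm):
        null_positions = [rng.randrange(chrom_length) for _ in query_positions]
        if mean_distance(null_positions, intervals) <= observed:
            at_least_as_close += 1
    # Add-one correction keeps the empirical p-value strictly above zero.
    return observed, (at_least_as_close + 1) / (n_perm + 1)


# Hypothetical coordinates for illustration only (not data from the study).
lads = [(100, 200), (500, 600)]
deg_positions = [150, 550, 210]
observed, p_value = proximity_p_value(deg_positions, lads, chrom_length=1000)
print("mean distance to nearest LAD:", observed)
print("empirical p-value:", p_value)
```

      A small p-value would indicate that the query positions sit closer to the intervals than expected by chance under this null model; a realistic analysis would also need to account for chromosome structure, gene density, and interval lengths.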

      Overall appraisal and context:

      Despite its limitations, the present study further illustrates the important roles the nuclear envelope proteins lamin A, lamin C, and nesprin-2 have in chromatin organization, dynamics, and gene expression. It thus confirms results from previous studies (not always fully acknowledged in the current manuscript) previously reported for lamin A/C depletion. For example, the effect of lamin A/C depletion on increasing mobility of chromatin had already been demonstrated by several other groups, such as Bronshtein et al. Nature Comm 2015 (PMID: 26299252) and Ranade et al. BMC Mol Cel Biol 2019 (PMID: 31117946). Additionally, the effect of lamin A/C depletion on gene and protein expression has already been extensively studied in a variety of other cell lines and model systems, including detailed proteomic studies (PMIDs 23990565 and 35896617).

      We have added further discussion as follows: “Our findings reinforce the pivotal roles of nuclear envelope proteins lamin A, LMNA and nesprin-2 in regulating chromatin organization, chromatin mobility, and gene expression. These results are consistent with and extend prior studies investigating the consequences of lamin depletion. For instance, increased chromatin mobility following the loss of lamin A/C has been previously demonstrated using live-cell imaging approaches [26,35], supporting our observations of nuclear structural relaxation and chromatin redistribution. Additionally, proteomic profiling following lamin A depletion has been extensively documented across both cellular and mouse models, providing valuable insights into the molecular consequences of nuclear envelope disruption [36,37]. While these earlier studies provide a strong foundation, our work contributes novel insights by integrating isoform-specific perturbations with spatial chromatin measurements. This approach emphasizes context-dependent regulatory mechanisms that involve not only lamina-associated regions but also nesprin-associated domains and distal genomic loci, thereby expanding the current understanding of nuclear envelope protein function in gene regulation.”

      The finding that lamin A/C or nesprin depletion not only affects genes at the nuclear periphery but also the nuclear interior is not particularly surprising given the previous studies and the fact that lamins A and C are also found within the nuclear interior, where they affect chromatin organization and dynamics, and that lamins A/C and nesprins directly interact with numerous transcriptional regulators that could further affect gene expression independent from their role in chromatin organization.

      We have added the following statement to the Discussion: “Beyond their well-established role in tethering heterochromatin at the nuclear periphery through lamina-associated domains (LADs), A-type lamins (lamins A and C) also localize to the nuclear interior, where they contribute to chromatin organization and gene regulation independently of LADs [27,28]. Nuclear lamins can form intranuclear foci that associate with active chromatin and are implicated in supporting transcriptional activity. Additionally, both lamins and nesprins participate in diverse protein-protein interactions that may influence transcriptional regulation. For example, lamin A/C interacts with the retinoblastoma protein (Rb) to modulate E2F-dependent transcription [29], and with c-Fos to regulate its nuclear retention and activity [30]. Similarly, the role of β-catenin as a co-activator in Wnt signaling relies on nuclear translocation and interaction with transcriptional complexes, and evidence suggests that nuclear architecture and envelope components, including nesprins, can influence this process [31]. Therefore, the observed gene expression changes following depletion of lamins or nesprins are likely not restricted to genes located within lamina-associated domains (LADs), but may also result from broader perturbations in nuclear architecture and transcriptional regulatory networks. This is consistent with our findings that lamins and nesprins influence gene expression in distal, non-LAD regions.”

      The authors provide a detailed analysis of isoform switching in response to lamin A/C or nesprin depletion, but the underlying mechanism remains unclear. Similarly, their analysis of the genomic location of the observed DEGs shows the wide-ranging effects of lamin A/C or nesprin depletion, but lets the reader wonder how these effects are mediated. A more in-depth analysis of predicted regulator factors and their potential interaction with lamins A/C or nesprin would be beneficial in gaining more mechanistic insights.

      We agree that the current findings, while highlighting the broad impact of lamin A/C or nesprin depletion on isoform usage and gene expression, do not fully elucidate the underlying regulatory mechanisms. We acknowledge the importance of identifying upstream regulators and understanding their potential interactions with lamins and nesprins. Future investigations integrating epigenetic approaches, such as ChIP-seq for transcription factors and chromatin-associated proteins, will be essential to clarify how lamins and nesprins contribute to isoform switching and to uncover the mechanistic basis of these regulatory effects.

      Reviewer #3 (Public review):

      Summary:

      This manuscript describes DOX inducible RNAi KD of Lamin A, LMNA coded isoforms as a group, and the LINC component SYNE2. The authors report on differentially expressed genes, on differentially expressed isoforms, on the large numbers of differentially expressed genes that are in iLADs rather than LADs, and on telomere mobility changes induced by 2 of the 3 knockdowns.

      Strengths:

      Overall, the manuscript might be useful as a description for reference data sets that could be of value to the community.

      We acknowledge that the initial version of our manuscript lacked comprehensive comparisons with previous studies. In our revised manuscript, we have included more detailed discussions highlighting how our findings complement and extend existing knowledge. Specifically, our study presents novel insights into the role of lamins and nesprins in regulating non-coding RNAs and isoform switching, areas that have not been extensively explored in the prior literature. We hope these additions will clarify the contribution of our work and demonstrate its potential value to the field.

      Weaknesses:

      The results are presented as a type of data description without formulation of models or explanations of the questions being asked and without follow-up. Thus, conceptually, the manuscript doesn't appear to break new ground.

      In our study, we proposed a conceptual model in which gene expression changes are linked to RNA synthesis, chromatin conformation alterations, and chromatin modifications, potentially mediated by lamin A, LMNA, and nesprin-2 at the transcriptional level. However, we acknowledge that this model remains preliminary and largely unexplored. We agree that additional mechanistic insights and identification of specific regulatory factors are needed to strengthen this framework. Future studies will aim to experimentally validate these hypotheses and clarify the pathways and regulators involved.

      Not discussed is the previous extensive work by others on the nucleoplasmic forms of LMNA isoforms. Also not discussed are similar experiments, for instance, gene expression changes others have seen after lamin A knockdowns or knockouts, or the effect of lamina on chromatin mobility, including telomere mobility; see, for example, a review by Roland Foisner (doi.org/10.1242/jcs.203430) on nucleoplasmic lamina. The authors need to do a thorough search of the literature and compare their results as much as possible with previous work.

      We sincerely thank the reviewer for pointing out the important body of previous work on the nucleoplasmic forms of LMNA isoforms and the impact of lamin A depletion on gene expression and chromatin mobility. In the revised version, we have now included relevant citations. Please see the highlights in the Discussion.

      The authors don't seem to make any attempt to explore the correlation of their findings with any of the previous data or correlate their observed differential gene expression with other epigenetic and chromatin features. There is no attempt to explore the direction of changes in gene expression with changes in nuclear positioning or to ask whether the genes affected are those that interact with nucleoplasmic pools of LMNA isoforms. The authors speculate that the DEG might be related to changing mechanical properties of the cells, but do not develop that further.

      We sincerely appreciate the reviewer’s insightful comments. In our revised manuscript, we have addressed this concern by comparing our telomere mobility results with previously published data (Bronshtein et al., 2015), and we observe consistent findings showing that lamin A depletion leads to increased telomere motility. Furthermore, our study provides novel evidence that nesprin-2 depletion similarly enhances telomere migration, suggesting a broader role for nuclear envelope components in chromatin dynamics.

      We acknowledge the importance of integrating gene expression data with epigenetic and chromatin features. However, to our knowledge, such datasets are currently limited for U2OS cells, particularly in the context of lamin and nesprin perturbation. We agree that understanding the correlation between differentially expressed genes and nuclear positioning or interactions with nucleoplasmic pools of LMNA isoforms is a promising direction. We are actively planning future studies that include chromatin profiling and mechanical perturbation assays to further explore these mechanisms.

      The technical concerns include: 1) Use of only one shRNA per target. Use of additional shRNAs would have reduced concern about possible off-target knockdown of other genes; 2) Use of only one cell clone per inducible shRNA construct. Here, the concern is that some of the observed changes with shRNA KDs might show clonal effects, particularly given that the cell line used is aneuploid. 3) Use of a single, "scrambled" control shRNA rather than a true scrambled shRNA for each target shRNA.

      (1) Regarding the use of a single shRNA per target, we agree that utilizing multiple independent shRNAs would strengthen the conclusions. In our study, we selected validated shRNA sequences with minimal predicted off-targets and confirmed knockdown efficiency at the mRNA level (by qPCR).

      (2) As for the use of a single cell clone per inducible construct, we understand the concern that clonal variability, particularly in an aneuploid cell line, could influence the observed phenotypes. To clarify this, we have revised the manuscript to state: “Multiple independent clones per shRNA were screened for knockdown efficiency using reverse transcription quantitative real-time PCR (RT-qPCR). Three clones demonstrating robust and consistent knockdown were selected and expanded. These clones were subsequently pooled to minimize clonal variability and used for downstream analyses, including RNA-seq.” To further mitigate clonal effects, we ensured consistent results across biological replicates and used inducible systems to reduce variability introduced by random integration.

      (3) We also acknowledge that the use of a single scrambled shRNA control, rather than matched scrambled controls for each construct, is a limitation. While we used a standard non-targeting scrambled shRNA commonly applied in similar studies, we understand that distinct scrambled sequences might better control for construct-specific effects.

      Reviewer #1 (Recommendations for the authors):

      Please make the processed RNA-seq data available for each individual experiment, not only the raw reads and averaged data.

      In response to your suggestion, we have now included the raw count data for each individual experiment in Supplementary Table S5 to enhance transparency and reproducibility.   

      Reviewer #2 (Recommendations for the authors):

      The current text contains numerous typos, and some of the text could benefit from additional editing for clarity and conciseness. In addition, several statements, particularly in the section encompassing lines 321-329, lack supporting references.

      In our revised version, we have carefully edited the text for clarity and conciseness.

      We have included related citations for lines 321-329: “The majority of genes located within LADs tend to be transcriptionally repressed or expressed at low levels. This is because LADs are associated with heterochromatin, a tightly packed form of DNA that is generally inaccessible to the cellular machinery required for gene expression 12,23. Lamin mutations and levels have been shown to disrupt LAD organization and gene expression that have been implicated in various diseases, including cancer and laminopathies 24,25.”

      The figures would benefit from better labeling, including a clear schematic of which specific regions of the LMNA and SYNE2 genes are targeted by the different shRNA constructs, and by labeling the different isoforms in Figure S1 with the common names. Furthermore, note that lamin A arises from posttranslational processing of prelamin A, not from a different transcript. Likely, the "different LMNA genes" shown in Supplementary Figure S1 are just different annotations, with the exceptions of the splice isoforms lamin C and lamin delta10.

      In the Methods, we have described the design of the corresponding shRNAs as suggested: “The shRNA designated as shLMNA targets a region within the coding sequence of LMNA that is shared by both lamin A and lamin C, corresponding to amino acids 122–129 (KKEGDLIA) of lamin A/C (RefSeq: NM_001406985.1). The shRNA against SYNE2 (shSYNE2) targets a sequence encoding amino acids 5133–5140 (KRYERTEF) of the SYNE2 protein (RefSeq: NM_182914.3).”

      For Figure S1, we have added common isoform names to the figure and caption: “lamin A (ENST00000368300.9), LMNA 227 (ENST00000675431.1), pre-lamin A/C (ENST00000676385.2), and lamin C (ENST00000677389.1).”

      Several statements about the novelty of the findings or approach are inaccurate. For example, the authors state in the introduction that "However, whether lamins and nesprins actively govern chromatin remodeling and isoform switching beyond their well-characterized functions in mechanotransduction remains an open question", as several previous studies have provided detailed characterization of lamin A/C depletion or mutations on chromatin organization, mobility, and gene expression. The authors should revise these statements and better acknowledge the previous work.

      We have added the citations of previous works and revised the text “While significant progress has been made in understanding the role of lamins in genome organization, the precise mechanisms by which lamins and nesprins regulate gene expression through distal chromatin interactions remain incompletely understood [10,11]. Notably, recent evidence suggests a reciprocal interplay between transcription and chromatin conformation, where gene activity can influence chromatin folding and vice versa [12]. However, whether lamins and nesprins actively govern chromatin remodeling and isoform switching beyond their well-characterized functions in mechanotransduction remains an open question.”

      Reviewer #3 (Recommendations for the authors):

      Overall, the manuscript might be useful as a description for reference data sets that could be of value to the community. Otherwise, I did not derive meaningful biological insights from the manuscript. It was not clear to me also how much might be repeating previous work already reported in the literature (see below). For example, I cited a review on nucleoplasmic lamins by Roland Foisner at the end of the specific comments - scanning it very quickly shows that there are already papers on increased chromatin mobility after lamin perturbations, including telomeres. I know there have also been studies of changes in gene expression after lamin A and B KD. The authors need to do a thorough search of the literature and compare their results as much as possible with previous work.

      We acknowledge that the roles of lamins in regulating chromatin dynamics and gene expression, including the effects of lamin perturbations on chromatin mobility and telomere behavior, have been previously reported. In response, we have revised the manuscript to incorporate relevant citations and to better contextualize our results within the existing literature. Importantly, to our knowledge, the finding that nesprin-2 influences telomere mobility has not been previously reported, and we have highlighted this novel observation in the revised text.

      In response, we have now conducted a more comprehensive literature review and revised the manuscript accordingly to better contextualize our findings. Specifically, we have added comparisons to prior studies reporting chromatin mobility changes following lamin A/C depletion. We also now emphasize the novel aspects of our study, such as the isoform-specific perturbations and the integration of spatial chromatin organization with transcriptomic outcomes.

      We hope these revisions strengthen the manuscript’s contribution as both a useful resource and a mechanistic investigation.

      Not even acknowledged is the previous extensive work on the nucleoplasmic forms of LMNA isoforms - I know Robert Goldman published extensively on this, implicating lamin A, for example, on DNA replication in the nuclear interior as well as transcription. More recently, Roland Foisner worked on this, including with molecular approaches. For example, a 2017 review mentions previous ChIP-seq mapping of lamin A binding to iLAD genes and also describes previous work on chromatin mobility, including telomere mobility. Yet the entire writing in the manuscript seems to only discuss the role of LMNA isoforms in the nuclear lamina per se, explaining the surprise in seeing many iLAD genes differentially expressed after KD.

      We have added related studies as suggested by the reviewer and added the following statement: “Nucleoplasmic lamins bind to chromatin and have been indicated to regulate chromatin accessibility and spatial chromatin organization [24]. Lamins in the nuclear interior regulate gene expression by dynamically binding to heterochromatic and euchromatic regions, influencing epigenetic pathways and chromatin accessibility. They also contribute to chromatin organization and may mediate mechanosignaling [25]. However, the contribution of nesprins and lamins to isoform switch and chromatin dynamics has not been fully understood [7,10,26].”

      Overall, I found a surprising lack of review and citation of previous work (see Specific comments below), including the lack of citations for various declarative statements about previous conclusions in the field about lamin A.

      (1) Introduction:

      "However, the contribution of nesprins and lamins to gene expression has not been fully understood."

      There is a literature about changes in gene expression- at least for lamin KD and KO- both in vitro and in vivo- that the authors could and should review and summarize here.

      To address this, we have now revised the manuscript to include a more comprehensive discussion of the relevant literature and added appropriate citations in the corresponding section. We hope this addition provides better context for our current findings and clarifies the contribution of lamins and nesprins to gene regulation.

      (2) Results:

      "A fragment of shRNA that targeting 3' untranslated region (UTR) in LMNA genes was chosen to knockdown lamin A (shLaminA). A fragment of shRNA that targeting coding sequence (CDS) region in LMNA genes was chosen to knockdown LMNA (shLMNA)". The authors should explain more - does one KD both lamin A and C (shLMNA), versus the other being specific to lamin A but not lamin C? It appears so from later text, but the authors should explicitly explain their targeting strategy right at the beginning to make this clear.

      To make the method clearer, we have added the text “The shRNA against lamin A (shLaminA) targets the 3′ untranslated region (UTR) of the LMNA gene, specific to prelamin A, which is post-translationally processed into mature lamin A. The shRNA designated as shLMNA targets a region within the coding sequence of LMNA that is shared by both lamin A and lamin C, corresponding to amino acids 122–129 (KKEGDLIA) of lamin A/C (RefSeq: NM_001406985.1). The shRNA against SYNE2 (shSYNE2) targets a sequence encoding amino acids 5133–5140 (KRYERTEF) of the SYNE2 protein (RefSeq: NM_182914.3).”

      But more importantly, the convention with RNAi is to demonstrate consistent results with at least two different small RNAs. This is to rule out that a physiological result is due to the KD of a non-target gene(s) rather than the target gene. The scrambled shRNA controls are not sufficient for this as they test a general effect of the shRNA culture conditions, including transfection and dox treatment, etc, rather than a specific KD of a different gene(s) than the target due to off-target RNAi.

      We fully acknowledge the concern regarding the use of only a single shRNA per knockdown and agree that shRNAs are prone to off-target effects. However, we have conducted qPCR confirmation of key RNA-seq findings, which strongly supports the specificity and validity of our observed results. Additionally, we recognize the importance of validating our findings using multiple independent shRNAs or alternative knockdown strategies, such as CRISPR deletion or degron-based approaches. To address this rigorously, we are currently optimizing an auxin-inducible degron system (AtAFB2) for targeted depletion of lamin C. Our preliminary data indicate approximately 40% knockdown efficiency after 16 hours of auxin induction, highlighting ongoing optimization efforts (Author response image 1). Future experiments will integrate this improved degron system and multiple independent shRNAs to further substantiate our results and definitively rule out potential off-target effects, thereby enhancing the robustness and reproducibility of our data.

      (3) "Single-cell clones were subsequently isolated and expanded in the presence of 2 μg ml⁻¹ puromycin to establish doxycycline-inducible shRNA-knockdown stable cell lines."

      The authors need to describe explicitly in the Results how exactly they did these experiments. Did they do their analysis using a single clone from each lentivirus shRNA transduction? Did they do analysis - ie RNA-seq- on several clones from the same shRNA transduction and compare? Did they pool clones together?

      In our study, single-cell clones were isolated following lentiviral transduction with doxycycline-inducible shRNA constructs and selected with 2 μg/ml puromycin. For each shRNA, we screened multiple clones for knockdown efficiency and selected three clones exhibiting robust knockdown. These three clones were pooled, and all functional analyses, including RNA-seq, were performed on the pooled clones. We have now revised the Methods section to explicitly describe this experimental design: “Multiple independent clones per shRNA were screened for knockdown efficiency using reverse transcription quantitative real-time PCR (RT-qPCR). Three clones demonstrating robust and consistent knockdown were selected and expanded. These clones were subsequently pooled to minimize clonal variability and used for downstream analyses, including RNA-seq.”

      One confounding problem is that there are clonal differences among cells cloned from a single cell line. This is particularly true for aneuploid cell lines like U2OS. Ideally, they would use mixed clones, but if not, they should at least explain what they did.

      We added the following text to the Methods: “Three single-cell clones exhibiting robust knockdown efficiency were individually expanded and subsequently pooled. The pooled clones were maintained in medium containing 2 µg ml⁻¹ puromycin to establish stable cell lines with doxycycline-inducible shRNA expression. Multiple independent clones per shRNA were screened for knockdown efficiency using reverse transcription quantitative real-time PCR (RT-qPCR). Three clones demonstrating robust and consistent knockdown were selected and expanded. These clones were subsequently pooled to minimize clonal variability and used for downstream analyses, including RNA-seq.”

      (4) I am confused by their shScramble control. This is typically done for each shRNA- ie, a separate scrambled control for each of the different target shRNAs. This is because there are nucleotide composition effects, so the scrambled idea is to keep the nucleotide composition the same.

      However, looking at STable 1 and SFig. 2- shows they used a single scrambled control, thus not controlling for different nucleotide composition among the three shRNAs that they used.

      In our study, we used a single non-targeting shRNA (shScramble) as a control to account for potential effects of the shRNA vector and delivery system. This approach is commonly accepted in the field when the scrambled sequence is validated as non-targeting and does not share significant homology with the genes of interest. While we acknowledge that using separate scrambled controls matched in nucleotide composition for each targeting shRNA can further minimize sequence-dependent effects, we believe that the use of a single validated scramble control is appropriate for the scope of this study.

      (5) In Figure 2 - what is on the x-axis? Number of DEG? Please state this explicitly in the figure legend.

      We have added “Counts” to the figure legend and added the following sentence to the caption: “Gene counts are displayed on the x-axis.”

      (6) More importantly, in Figure 2 they only show pathway analysis of DEG. They should show more: a) Fold-change of DEG displayed for all DEG; b) Same for genes in LADs vs iLADs. More explicitly, are the DEG primarily in LADs or iLADs, or a mix? Are the DEGs in LADs biased towards increased expression, as might be expected for LAD derepression? Conversely, what about iLADs - is there a bias towards increased or decreased expression?

      We agree that a more detailed characterization of the differentially expressed genes (DEGs) will strengthen the conclusions. In response, we have revised the manuscript as follows: “Furthermore, differential expression analysis revealed that the majority of DEGs following depletion of lamins and nesprins were located outside lamina-associated domains (non-LADs). Specifically, for shLaminA knockdown, 8 DEGs within LADs were downregulated and 8 were upregulated, whereas 59 non-LAD DEGs were downregulated and 79 were upregulated. For shLMNA, 7 LAD-associated DEGs were downregulated and 15 were upregulated, with 88 downregulated and 140 upregulated DEGs in non-LAD regions. In the case of shSYNE2 knockdown, 161 LAD DEGs were downregulated and 108 were upregulated, while 2,009 non-LAD DEGs were downregulated and 1,851 were upregulated (Figure 2d). These results indicate that the transcriptional changes resulting from the loss of lamins or nesprins predominantly occur at non-LAD genomic regions.”
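
      As a quick arithmetic check on the counts quoted above, the following minimal sketch computes, for each knockdown, the total number of DEGs, the share that falls in LADs, and the upregulated share within each region. The dictionary layout and function name are illustrative only. Note that this does not address whether LAD genes are affected at a different rate than non-LAD genes, because the total number of expressed genes in each region is not given in the response.

```python
# DEG counts quoted in the response, stored as (downregulated, upregulated).
deg_counts = {
    "shLaminA": {"LAD": (8, 8), "non-LAD": (59, 79)},
    "shLMNA": {"LAD": (7, 15), "non-LAD": (88, 140)},
    "shSYNE2": {"LAD": (161, 108), "non-LAD": (2009, 1851)},
}


def summarize(counts):
    """Return total DEGs, the LAD share of DEGs, and the upregulated share per region."""
    totals = {region: down + up for region, (down, up) in counts.items()}
    all_degs = sum(totals.values())
    return {
        "total_degs": all_degs,
        "lad_fraction": totals["LAD"] / all_degs,
        "up_fraction": {
            region: up / (down + up) for region, (down, up) in counts.items()
        },
    }


for name, counts in deg_counts.items():
    s = summarize(counts)
    print(f"{name}: {s['total_degs']} DEGs, {s['lad_fraction']:.1%} in LADs")
```

      A proper enrichment test would additionally need the number of non-DE genes in LADs and non-LADs, for example to build a 2x2 contingency table.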

      We appreciate the reviewer’s comments, which helped improve the clarity and depth of our analysis.

      (7) Is there a scientific rationale for the authors' focus on DE of isoforms? Is this somehow biologically meaningful and different from the overall DE of all genes? The authors should explain in the Results section what their motivation was in deciding to do this analysis.

      We have added the following statement in response to the reviewer: “To uncover transcript-specific regulatory changes, we performed isoform-level differential expression analysis. Many genes produce functionally distinct isoforms, and shifts in their usage can occur without changes in total gene expression, making isoform-level analysis essential for detecting subtle but meaningful transcriptional regulation. Our analysis demonstrated that depletion of lamins and nesprins induced significant alterations in specific transcript isoforms, indicating regulatory changes in alternative splicing or transcription initiation that are not captured by gene-level differential expression analysis.”
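
      The point that isoform usage can shift while total gene expression stays constant can be illustrated with a small sketch. The expression values and isoform names below are hypothetical and are not taken from the manuscript; the function simply divides each isoform's abundance by the gene total.

```python
def isoform_fractions(isoform_expr):
    """Return each isoform's share of total gene expression."""
    total = sum(isoform_expr.values())
    if total == 0:
        return {name: 0.0 for name in isoform_expr}
    return {name: value / total for name, value in isoform_expr.items()}


# Hypothetical abundances (arbitrary units) for one gene with two isoforms.
control = {"isoform_1": 80.0, "isoform_2": 20.0}
knockdown = {"isoform_1": 30.0, "isoform_2": 70.0}

gene_total_control = sum(control.values())
gene_total_knockdown = sum(knockdown.values())

frac_control = isoform_fractions(control)
frac_knockdown = isoform_fractions(knockdown)

# Change in isoform usage between conditions (knockdown minus control).
usage_shift = {name: frac_knockdown[name] - frac_control[name] for name in control}

print("gene totals:", gene_total_control, gene_total_knockdown)
print("usage shift:", usage_shift)
```

      In this toy case a gene-level test sees no change, because both totals are equal, while the usage of each isoform shifts by half of the gene's output.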

      (8) "Expectedly, the DEGs from depletion of lamin A, LMNA, and SYNE2 seldom intersected with genes in LADs (Figure 4a)."

      Why was this expected? The authors have only cited one review paper. Others have seen significant numbers of genes in LADs that are DE after KD of lamina proteins. What was the fold cutoff used for DE? Was there a cutoff for the level of expression prior to KD? The authors should cite relevant primary literature showing that there are active genes in LADs and that some perturbations of the lamina proteins do result in DE of genes in LADs.

      We acknowledge the reviewer's concerns regarding our statement: "Expectedly, the DEGs from depletion of lamin A, LMNA, and SYNE2 seldom intersected with genes in LADs (Figure 4a)." To clarify, this expectation stems from previous observations that LAD-associated genes are typically transcriptionally silent or expressed at very low levels (Guelen et al., 2008). However, dynamic changes in LADs and gene expression status do occur during cellular differentiation (Peric-Hupkes et al., 2010), and some LAD-resident genes can become active and transcriptionally responsive under specific conditions, such as T cell activation. We applied specific fold-change and baseline expression level thresholds in our analysis, as detailed in the Methods section. We added the following text to the Methods: “Differential gene expression analysis was performed using thresholds of baseMean > 50, absolute log fold change > 0.5, and p-value < 0.05.” We agree that additional relevant primary literature demonstrating active gene expression changes within LADs upon perturbation of lamina proteins should be cited, and we have added the following statement:

      “LADs exhibit dynamic reorganization and changes in gene expression during cellular differentiation [30]. Although genes within LADs are generally transcriptionally silent or expressed at low levels [31], some LAD-resident genes remain active and can be transcriptionally modulated in response to specific stimuli, such as T cell activation [32].”

      (9) "Expectedly, the DEGs from depletion of lamin A, LMNA, and SYNE2 were seldomly intersected with genes in LADs (Figure 4a)." I disagree with the wording of "seldom" which by definition means rarely. I don't see that this applies to the significant number of genes that are in LADs that are DE as shown in the Venn diagram, Fig. 4a. For example, this includes 57 genes for the shLamin A and ~400 genes for the shSYNE2.

      Is there anything of note about which genes are DE within LADs?

      We have rephrased the text as follows: “The Venn diagram analysis revealed limited overlap between DEGs resulting from knockdown of lamin A (shLaminA), LMNA (shLMNA), or SYNE2 (shSYNE2) and genes located within lamina-associated domains (LADs). Specifically, only a small subset of DEGs intersected with LAD-associated genes across all three knockdowns, suggesting that the majority of transcriptional changes occur outside LAD regions.” The DEGs in LADs and non-LADs are shown in Supplementary Table S4.

      (10) "The relative distance from DE genes (query features) to LADs (reference feature) is plotted by GenometriCorr package (v 1.1.24). The color depicting deviation from the expected distribution and the line indicating the density of the data at relative distance are shown." The authors should explicitly describe what the reference "expected distribution" was based on. This is all very cryptic right now, so we can't assess the possible biological significance. Third, they should clearly explain what is plotted on the x and y axes of Figure 4C. I really don't have a clue. I assume the x-axis is some measure of "relative distance" but what on earth does that mean? I really don't understand this plot, which is crucial to the whole story. What is on the y-axis? Density of DEGs? What? And they need to explain not only what is plotted on the x and y axes but also provide units.

      We have revised the text to clarify that the GenometriCorr analysis (v1.1.24) was used to assess the spatial association between differentially expressed genes (DEGs, query features) and lamina-associated domains (LADs, reference features). Specifically, this method evaluates whether the observed distances between query and reference genomic intervals significantly deviate from a null distribution generated by random permutation of query features across the genome, while preserving size and chromosomal context.

      In the revised figure legend and main text, we now clarify that the x-axis represents the relative genomic distance between each differentially expressed gene (DEG) and the nearest LAD, scaled between –1 and 1, where values near 0 indicate close proximity, and values approaching –1 or 1 reflect greater distances on either side of the LADs. The y-axis denotes the density (or proportion) of query features (DEGs) at each relative distance bin. The color gradient overlays the plot to indicate deviation from the expected null distribution (based on randomized query positions): red indicates enrichment (closer than expected), while blue indicates depletion (further than expected).

      “GenometriCorr analysis (v1.1.24) was used to assess the spatial relationship between DEGs (query) and LADs (reference) [48]. The x-axis shows the relative genomic distance between each DEG and the nearest LAD, scaled from –1 (far upstream) to 1 (far downstream), with 0 indicating closest proximity. The y-axis represents the density of DEGs at each distance bin. A color gradient indicates deviation from a randomized null distribution: red signifies enrichment (closer than expected), and blue signifies depletion. Statistical significance was determined using the Jaccard test (p < 0.05).”

      Second, to correlate with other features and to give more meaning, the authors should show the chromosome location of the DEGs and scale this by the actual DNA sequence distances. This will be needed to correlate with other features from other studies.

      The genomic positions of DEGs have now been displayed in Figure 4b, with distances shown in base pairs to facilitate cross-reference with other features in future studies.

      Third, they should attempt some kind of analysis themselves to try to understand what might correlate with the DEGs. To begin with, they might try to correlate with lamin A ChiP-seq or other molecular proximity assays. Others in fact have shown that lamin A interacts with 5' regulatory regions of a subset of genes- presumably this is the diffuse nucleoplasmic pool of lamin A that has been studied by others in the past.

      We agree that understanding potential regulatory mechanisms underlying DEG distribution is essential. In response, we have expanded our analysis (Figure 2d) to highlight that a substantial portion of DEGs are located outside of LADs, suggesting potential regulation by the nucleoplasmic pool of lamin A. This is consistent with previous studies showing lamin A interaction with regulatory elements such as 5′ UTRs and enhancers, independent of LAD localization. We have now cited relevant literature to support this hypothesis.

      Fourth, in the table, they should go beyond just giving the fold change in expression. Particularly for genes that are expressed at very low levels, this is not particularly meaningful as it is very sensitive to noise. They should provide a metric related to levels of expression both before and after the KD.

      We acknowledge the reviewer’s concern regarding fold-change interpretation for low-abundance transcripts. To improve clarity and interpretability, we have now included Supplementary Table S4, which provides the raw counts and baseMean values (average normalized expression across all samples) for all DEGs. Additionally, we note that in our differential expression analysis, genes with baseMean < 50 or absolute log₂ fold change < 0.5 were filtered out to reduce potential noise from low-expression genes.
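
      The filtering rule quoted from the Methods (baseMean > 50, absolute log fold change > 0.5, p-value < 0.05) can be sketched as a simple predicate. This is an illustration rather than the authors' pipeline: the record type, field names, and example rows are hypothetical, and the fold change is assumed to be on a log₂ scale.

```python
from dataclasses import dataclass


@dataclass
class GeneResult:
    gene: str                # gene identifier (hypothetical examples below)
    base_mean: float         # mean normalized expression across all samples
    log2_fold_change: float  # assumed log2 scale
    p_value: float


def is_deg(result, min_base_mean=50.0, min_abs_lfc=0.5, max_p=0.05):
    """Apply the three thresholds quoted in the response; all must pass."""
    return (
        result.base_mean > min_base_mean
        and abs(result.log2_fold_change) > min_abs_lfc
        and result.p_value < max_p
    )


# Hypothetical rows, chosen so that each failing row violates one threshold.
results = [
    GeneResult("gene_a", 500.0, 1.2, 0.001),   # passes all three
    GeneResult("gene_b", 10.0, 3.0, 0.001),    # fails: low expression
    GeneResult("gene_c", 800.0, -0.2, 0.001),  # fails: small fold change
    GeneResult("gene_d", 300.0, -0.9, 0.20),   # fails: p-value
]

degs = [r.gene for r in results if is_deg(r)]
print(degs)
```

      The low-expression row shows why a baseMean cutoff matters: a large fold change on a handful of reads is easily produced by noise.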

      (11) The figure legend and description in the Results section were completely inadequate. I had little understanding of what was being plotted. It is not sufficient to simply state the name of some software package that they used to measure "XYZ" and to show the results. It has no meaning for the average reader.

      Without some type of explanation of rationale, questions being asked, and conclusions made of biological relevance, this section made zero impact on me.

      Yes- details can be provided in the Methods. But the conceptual underpinnings of the approach, the question being asked, the rationale for the approach, and the significance of the results need to be developed in the Results section.

      In response, we have revised the “Results” section to better articulate the rationale behind the analysis, the specific biological questions we aimed to address, and the conceptual relevance of the method used. We have also clarified the meaning of the plotted data and how it supports our conclusions.

      While technical details remain in the “Methods” section, we now provide a more accessible narrative in the Results to guide the reader through the approach and highlight the biological significance of our findings. We hope these revisions make the section more informative and impactful.

      (12) The telomere movement part of the manuscript seems to come out of nowhere. Why telomeres? Where are telomeres normally positioned, particularly relative to the nuclear lamina? Does this change with the KDs - particularly for those that increase motion? The MSD for SYNE2 appears unconstrained- they should explore longer delta time periods to see if it reaches a point of constrained movement.

      If the telomeres are simply tethered at the nuclear lamina, then is that the explanation- that they become untethered? But if they are not typically at the periphery, then where are they relative to other nuclear compartments? And why is their mobility changing? Is it related to the loss of nuclear lamina positioning of adjacent LAD regions to the telomeres? Is it an indirect, secondary effect? What would they see after an acute KD? What about other chromosome regions? Again, there is little explanation for the rationale for these observations. It is one of many possible experiments they could have done. Why did they do this one?

      We added the following explanation “Although telomeres are not uniformly tethered to the nuclear lamina, they can transiently associate with the nuclear periphery, particularly during post-mitotic nuclear reassembly, through interactions involving SUN1 and RAP1 36. Given that lamins and nesprins are key components of the nuclear envelope that regulate chromatin organization and mechanics 37,38, we examined telomere dynamics as a proxy for changes in nuclear architecture. Using EGFP-tagged dCas9 to label telomeric regions in live U2OS cells, we assessed whether knockdown of these proteins leads to increased telomere mobility, reflecting a loss of structural constraint or altered chromatin–nuclear envelope interactions 17.” And “To probe how nuclear envelope components regulate chromatin dynamics, we tracked telomeres as a representative genomic locus whose mobility reflects changes in nuclear mechanics and chromatin organization. Although telomeres are not stably tethered to the nuclear lamina, their motion can be influenced by nuclear architecture and transient peripheral associations [36]. Upon depletion of lamin A, LMNA, or SYNE2, we observed significantly increased telomere mobility and nuclear area explored, quantified by mean square displacement and net displacement (Figure 6b–c, Supplementary Movie S1). These changes likely reflect altered chromatin–lamina interactions or disrupted nuclear mechanical constraints, consistent with prior studies showing that lamins modulate chromatin dynamics and nuclear stiffness [37,38,39]. Thus, our findings support a role for lamins and nesprins in constraining chromatin motion through nuclear structural integrity.”
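
      For readers unfamiliar with the mobility metric mentioned above, a time-averaged mean square displacement (MSD) for a single tracked locus can be sketched as follows. This is a generic textbook calculation under assumed inputs (a 2D track sampled at uniform frame intervals), not the authors' tracking pipeline; the function name and the toy tracks are illustrative.

```python
def mean_square_displacement(track, lag):
    """Time-averaged MSD of one 2D track at an integer frame lag.

    track: list of (x, y) positions sampled at uniform time intervals.
    lag:   number of frames separating the two positions being compared.
    """
    if lag < 1 or lag >= len(track):
        raise ValueError("lag must be between 1 and len(track) - 1")
    squared = [
        (x2 - x1) ** 2 + (y2 - y1) ** 2
        for (x1, y1), (x2, y2) in zip(track, track[lag:])
    ]
    return sum(squared) / len(squared)


# Toy tracks in arbitrary units: one moving steadily, one hopping back and forth.
directed = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
confined = [(0.0, 0.0), (1.0, 0.0), (0.0, 0.0), (1.0, 0.0)]

for name, track in (("directed", directed), ("confined", confined)):
    curve = [mean_square_displacement(track, lag) for lag in (1, 2)]
    print(name, curve)
```

      An MSD curve that keeps growing with lag indicates unconstrained motion, while a curve that levels off indicates confinement, which is why the reviewer asks for longer time lags in the SYNE2 knockdown.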

      (13) "Notably, Lamin A depletion led to enrichment of pathways associated with RNA biosynthesis, supporting its previously suggested role in transcriptional activation and ribonucleotide metabolism."

      There is a literature on this. Say more and cite the references.

      Notably, lamin A depletion led to enrichment of pathways associated with RNA biosynthesis, supporting its previously suggested role in transcriptional activation and ribonucleotide metabolism 45.  

      (14) "This aligns with prior studies indicating that Lamin A contributes to chromatin accessibility and RNA polymerase activity." Again, there is a literature on this. Say more and cite the references.

      This aligns with prior studies indicating that lamin A contributes to chromatin accessibility and RNA polymerase activity 46. These findings further underscore the functional relevance of lamin A in coordinating transcriptional programs through modulation of nuclear architecture.

      (15) "In contrast, LMNA knockdown was linked to alterations in chromatin conformation." No. The authors show gene ontology and implicate perturbed RNA levels for genes implicated in "chromatin conformation". That is not the same thing as measuring chromatin conformation, which is not done, and showing changes in conformation.

      Based on the reviewer’s comment, we have revised the text as follows: “In contrast, LMNA knockdown led to differential expression of genes enriched in pathways related to chromatin organization, suggesting potential disruptions in chromatin regulatory networks. Although direct measurements of chromatin conformation were not performed, these transcriptional changes indicate that LMNA may contribute to maintaining nuclear architecture and genomic stability, which aligns with its established involvement in laminopathies and genome integrity disorders.”

      (16) "The findings that DEGs are predominantly located in non-LAD regions highlight a unique regulatory aspect of lamins and nesprins, emphasizing their spatial specificity in gene expression". Is this novel? Can the authors separate direct from indirect effects? Is the percentage of genes in LADs that are altered in expression different from the percentage of genes in iLADs that are altered in expression? There are many more active genes in iLADs, so one expects more DEGs in iLADs even if this is random. Also - how does this correlate with lamin A binding near 5' regulatory regions detected by ChIP-seq? See the following review for references to this question and also previous work on lamin A versus chromatin mobility, including telomeres. J Cell Sci (2017) 130 (13): 2087-2096. https://doi.org/10.1242/jcs.203430

      We appreciate the reviewer’s valuable comments and feedback. To address them, we have revised the manuscript as follows: “Furthermore, differential expression analysis revealed that the majority of DEGs following depletion of lamins and nesprins were located outside lamina-associated domains (non-LADs). Specifically, for shLaminA knockdown, 8 DEGs within LADs were downregulated and 8 were upregulated, whereas 59 non-LAD DEGs were downregulated and 79 were upregulated. For shLMNA, 7 LAD-associated DEGs were downregulated and 15 were upregulated, with 88 downregulated and 140 upregulated DEGs in non-LAD regions. In the case of shSYNE2 knockdown, 161 LAD DEGs were downregulated and 108 were upregulated, while 2,009 non-LAD DEGs were downregulated and 1,851 were upregulated (Figure 2d, Supplementary Table S4). These results indicate that the transcriptional changes resulting from the loss of lamins or nesprins predominantly occur at non-LAD genomic regions.

      The percentage of DEGs was consistently higher in non-LADs, which are gene rich and transcriptionally active, whereas LADs, known to be enriched for silent or lowly expressed genes, showed fewer expression changes. These findings are consistent with previous studies demonstrating that active genes are more prevalent in non-LADs and that LAD associated genes are generally repressed or less responsive to perturbation [27,28]. Together, these results support a model in which lamins and nesprins influence gene expression through both structural organization and promoter proximal interactions, particularly within euchromatic nuclear regions [10,26,29].”
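      The DEG counts quoted above can be tabulated into the share of DEGs that fall within LADs for each knockdown. The sketch below uses only the numbers stated in the response; the variable and function names are illustrative. Note that it computes the fraction of DEGs located in LADs, not the fraction of LAD genes that are differentially expressed: the latter, which is what the reviewer asks about, would require the total number of expressed genes in LADs and non-LADs, which is not given here.

```python
# DEG counts quoted in the response, as (downregulated, upregulated).
deg_counts = {
    "shLaminA": {"LAD": (8, 8), "non-LAD": (59, 79)},
    "shLMNA": {"LAD": (7, 15), "non-LAD": (88, 140)},
    "shSYNE2": {"LAD": (161, 108), "non-LAD": (2009, 1851)},
}


def lad_share(counts):
    """Fraction of all DEGs (up + down) that lie within LADs."""
    lad = sum(counts["LAD"])
    total = lad + sum(counts["non-LAD"])
    return lad / total


# Percentage of DEGs in LADs per knockdown, rounded to one decimal place.
shares = {kd: round(100 * lad_share(c), 1) for kd, c in deg_counts.items()}
for knockdown, pct in shares.items():
    print(f"{knockdown}: {pct}% of DEGs are in LADs")
```

      Testing whether these shares differ from a random expectation would additionally need the background gene counts per compartment (for example, in a Fisher's exact test), so the sketch stops at the descriptive percentages.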

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Weaknesses:

      While scRNA-seq data clearly revealed different subsets of microglia, macrophages, and DCs in the brain, it remains somewhat challenging to distinguish DC-like cells from P2ry12- macrophages by immunohistochemistry or flow cytometry.

      Indeed, in flow cytometry analyses of adult brain samples, the p2ry12<sup>-</sup>; mpeg1<sup>+</sup> fraction could, in theory, encompass not only DC-like cells but also other macrophage subsets, as well as B cells, since B cells have been reported to express mpeg1 in zebrafish (Ferrero et al., 2020; Moyse et al., 2020). Nevertheless, our data strongly indicate that within the brain parenchyma, DC-like cells represent the predominant component of this population. This conclusion is supported by the pronounced reduction of p2ry12<sup>-</sup>; mpeg1<sup>+</sup> cells in brain sections from batf3 mutants, in which DC development is impaired. Currently, further phenotypic resolution is constrained by the limited availability of zebrafish-specific antibodies and the restricted palette of fluorescent reporter lines capable of distinguishing MNP subsets. We anticipate that future efforts, including the generation of novel transgenic lines informed by our dataset (initiatives already underway in our group), will enable more precise discrimination among these distinct subsets.

      Reviewer #2 (Public Review):

      A weakness of this study is that it is mainly based on FACS sorting, which might modify the proportion of different subtypes.

      We agree that reliance solely on FACS could potentially introduce biases in the proportions of different subtypes. To minimize this concern, we complemented our flow cytometry data with quantification performed directly on brain sections using immunohistochemistry. This approach allowed us to validate cell population distributions in situ, thereby confirming that the trends observed by FACS accurately reflect the cellular composition of microglia and DC-like cells within the brain parenchyma.

      Reviewer #3 (Public Review):

      A weakness is the lack of specific reporters or labeling of this dendritic cell population using specific genes found in their single-cell dataset. Additionally, it is difficult to remove the meningeal layers from the brain samples and thus can lead to confounding conclusions. Overall, I believe this study should be accepted contingent on sufficient labeling of this population and addressing comments.

      While the generation of DC-like specific transgenic lines is indeed a promising direction (and such efforts are currently underway in our group), creating and validating these lines is time-consuming. Importantly, although these additional tools will be valuable for future functional investigations, we believe they would not impact the main conclusions or core message of our current work, where we already provide detailed spatial information on DC-like cells, and we demonstrated their lineage identity through the use of our newly generated batf3 mutant line. 

      Recommendations for the authors:

      Major Comments: 

      The authors should discuss another recent report demonstrating DCs in the zebrafish brain, which also developed independently of Csf1ra, and compare the two datasets (Zhou et al. Cell reports, 2023).

      Thank you for highlighting the study by Zhou et al., which offers complementary insight into the dendritic cell population in the zebrafish brain. We note that in this work, the authors reclassify ccl34b.1<sup>-</sup> mpeg1<sup>+</sup> brain-resident cells as conventional DCs, thus revising their earlier interpretation of these cells as microglia (Wu et al., 2020). This shift in interpretation is based on their transcriptional comparison between the previously characterized ccl34b.1<sup>-</sup> mpeg1<sup>+</sup> population and a new dataset of brain mpeg1<sup>+</sup> cells. This updated classification aligns closely with our findings. Given that our data already demonstrate the equivalence between the DC-like cells described in our study and the ccl34b.1<sup>-</sup> mpeg1<sup>+</sup> population, repeating a direct transcriptional comparison would be redundant. We have now included a discussion of this work in the revised manuscript. Specifically, we have added the following sentences in the discussion: “Importantly, since the submission of our manuscript, the Wen lab published an independent study in which they now reclassify the ccl34b.1<sup>-</sup> mpeg1<sup>+</sup> cells in the zebrafish brain as cDCs, revising their earlier interpretation of these cells as microglia (Zhou et al., 2023)”.

      Data reported in Figure 5 should be quantified (cell numbers, how many brains analyzed). 

      Thank you for this comment. We would like to clarify that the primary purpose of Figure 5 (and Figure 5 supplement 1) is to provide an initial qualitative overview of the different MNP subsets present in the adult brain, using the currently available transgenic and immunohistochemical tools. These descriptive analyses were instrumental in identifying the most reliable combination, namely the Tg(p2ry12:p2ry12GFP; mpeg1.1:mCherry) double transgenic line in conjunction with L-plastin immunostaining, to distinguish microglia from other parenchymal MNPs. Quantitative analyses using this optimized strategy are presented in Figure 7 (Figure 7 supplement 1), where we systematically enumerate the different MNPs. We therefore believe that performing additional quantification in Figure 5 would be redundant with the more robust data already shown in Figure 7. As requested, we have now included in the Figure 5 legend that images are representative of brain tissue sections from 2-3 fish. 

      The title mentions an "atlas", but there is no searchable database or website associated with the paper. Please provide one.

      We agree and fully support the importance of data accessibility. To facilitate use of our dataset by the scientific community, we have developed a user-friendly, searchable web interface that allows users to explore gene expression patterns within our dataset. This website is available at https://scrna-analysis zebrafish.shinyapps.io/scatlas/

      This information has now been included in the “Data availability statement” section of the manuscript.  

      Reviewer #1 (Recommendations For The Authors): 

      Specific comments: 

      The authors should discuss another recent report demonstrating DCs in the zebrafish brain, which also developed independently of Csf1ra, and compare the two datasets (Zhou et al. Cell reports, 2023). 

      Thank you for this suggestion. Please refer to our response in the major comments section, where we address this point in detail.

      Within macrophages, the authors identified 5 clusters including 4 microglia clusters and 1 MF cluster (Figure 4). Does the latter relate to 'BAMs' and express markers previously described in murine BAMs, including Lyve1, CD206, etc.? Or to monocytes? By flow cytometry, monocytes were detected (Figure 1B), but not by scRNA-seq.

      You have raised an important point here. As described in lines 197-202 (“results” section), the cells in the MF cluster exhibit a macrophage identity, based on their expression of classical macrophage markers such as marco, mfap4 or csf1ra. However, we were unable to confidently annotate this cluster more specifically. We also considered whether this population might resemble mammalian BAMs or monocytes, cell types that, to our knowledge, have not yet been clearly identified in zebrafish. However, orthologous markers typically associated with murine BAMs were not detected (lyve1) or not specifically enriched (mrc1a/mrc1b) in the MF cluster (see below). Based on these findings, we can only cautiously propose that this cluster may represent blood-derived macrophages and / or monocytes.

      To further address your suggestion, we performed a cell type enrichment analysis using the marker genes of the MF cluster, following the same strategy as for the microglia and DC-like clusters presented in Figure 4 supplement 2 C,D. This analysis revealed significant enrichment for “monocytes” and “macrophages”, further supporting a general monocytic/macrophage identity (see below). At present, further characterization of this cluster is limited by the lack of zebrafish-specific antibodies and the restricted palette of fluorescent reporter lines that distinguish among MNP subsets. We anticipate that future studies, including the development of new transgenic lines guided by our dataset, will allow for a more precise analysis of this distinct population.

      Author response image 1.

      Do all 4 DC clusters identified by scRNA-seq represent cDC1s? or are there also cDC2s, and cDC3s present?  

      In our analyses, the four dendritic cell clusters identified by scRNA-seq (DC1-DC4) exhibit transcriptional profiles consistent with a conventional type 1 dendritic cell (cDC1) identity. These clusters uniformly express hallmark cDC1-associated genes, while lacking expression of markers typically associated with mammalian cDC2 or plasmacytoid dendritic cells (pDCs). For instance, irf4, a key transcription factor required for cDC2 development, is not detected in our dataset. Similarly, we do not observe expression of genes characteristic of pDCs. 

      That said, the absence of cDC2 or pDC-like signatures in our dataset does not rule out the presence of these populations in zebrafish.  

      While they show that DC-like cells did not express Csf1rb (Figure 4D) or other macrophage/microglia genes, DC-like cells were affected in the Csf1rb mutants and in double mutants, demonstrating that their development depends on Csf1rb signaling, as known for macrophages but not DCs. Can the authors discuss this in more detail with regard to DC differentiation/precursors? 

      Thank you for pointing this out. As previously demonstrated, CSF1R signaling in zebrafish is more complex than in mammals, due to the presence of two paralogs, csf1ra and csf1rb, which exhibit partially non-overlapping functions (Ferrero et al., 2021). We and others have shown that csf1rb signaling is implicated in the regulation of definitive hematopoiesis, particularly in the regulation of hematopoietic stem cell (HSC)-derived myelopoiesis. Although the developmental origin of zebrafish brain DC-like cells remains uncharacterized, their reduced numbers in the csf1rb mutant, despite their lack of csf1rb expression, support the current model in which csf1rb acts at the progenitor level, promoting myeloid lineage commitment. According to this model, csf1rb disruption affects the differentiation of multiple myeloid subsets, which likely include DC-like cells. We have developed this point in the discussion section (lines 502-506).

      Do the DCs express Csf1ra? 

      Csf1ra transcripts are not found in DCs in our dataset. As shown below, csf1ra expression is restricted to the microglia and macrophage clusters. These observations are in line with those made by Zhou et al., 2023.

      Author response image 2.

      Fig. 5, the number of brains analyzed should be added, and also quantifications of cell numbers included. It is mentioned (line 260) that P2ry12GFP+mpeg1mCherry+ microglia are abundant across brain regions while P2ry12GFP- mpeg1mCherry+ cells particularly localize in the ventral part of the posterior brain parenchyma. It would be nice if images of the different brain regions were provided. 

      Regarding the quantification, we refer to our response in the major comments section, where we explain that detailed quantification of microglia and other MNP subsets is provided in Figure 7, using a more refined strategy for distinguishing cell types.

      As requested, we have now included representative sections from the forebrain, midbrain and hindbrain of adult Tg(mhc2dab:GFP; cd45:DsRed) fish. These images illustrate the spatial distribution of DC-like cells across brain regions. Notably, DC-like cells are most abundant in the ventral areas of the midbrain and hindbrain, and are also present in the posterior telencephalon, particularly concentrated in the region of the commissura anterior. This regional annotation is based on the zebrafish brain atlas by Wullimann et al., 1996 (Neuroanatomy of the zebrafish brain, https://doi.org/10.1007/978-3-0348-8979-7).

      These additional images have been included in Figure 5 Supplement 1 (A-E).

      It is sometimes not evident whether the P2ry12- cells included DC-like cells and macrophages, which should be discussed.

      Thank you for bringing this to our attention. Upon review, we agree this point required clearer explanation throughout the text, particularly beginning with the description of putative DC-like cells in Figure 5. We have now revised the manuscript to improve clarity and better guide readers through the phenotypic identification of DC-like cells using the Tg(p2ry12:p2ry12-GFP;mpeg1:mCherry) line. Specifically, we have modified the titles in the results section from page 5 to page 9, so that readers can more easily follow the step-by-step approach we used to distinguish DC-like cells from microglia.

      To directly address your comment: the p2ry12<sup>-</sup>; mpeg1<sup>+</sup> fraction may, in theory, include not only DC-like cells but also other macrophage subsets and B cells, as B cells have been shown to express mpeg1 in zebrafish (Ferrero et al., 2020; Moyse et al., 2020). Nevertheless, our data strongly indicate that within the brain parenchyma, DC-like cells represent the predominant component of this population. This conclusion is supported by the pronounced reduction of p2ry12<sup>-</sup>; mpeg1<sup>+</sup> cells in brain sections from batf3 mutants, in which DC development is impaired.

      We have revised the text accordingly to clarify this point in the results section of the manuscript (line 355).

      For example, the DC-like cell population in Figure 6C appears to include two populations of cells. Thus, it is unclear whether the sorted mhc2dab:GFP+;CD45:DsRedhi population for bulk-seq also contains the MF population identified in Fig. 2. 

      Thank you for this thoughtful observation. During the course of this study, we indeed considered how best to isolate non-microglial macrophages in order to specifically recover the MF population identified in our scRNA-seq analysis. However, with the current repertoire of fluorescent transgenic zebrafish lines, it remains technically challenging to selectively isolate non-microglial macrophages from the adult brain. As a result, the mhc2dab:GFP<sup>+</sup>; cd45:DsRed<sup>hi</sup> sorted population used for bulk RNA-seq may indeed include a mixture of DC-like cells and other mononuclear phagocytes, potentially including the MF population. In contrast, our data demonstrate that the Tg(p2ry12:p2ry12-GFP) line provides a more selective tool for isolating microglia, minimizing contamination from other mononuclear phagocyte subsets.

      In Figure 7, a reduction of GFP-mpeg+ cells can be seen in batf3 mutants. Could the remaining cells be the (non-microglia) macrophages? Or in Figure 8, could the remaining P2ry12GFP-Lcp1+ cells in Irf8 mutants be macrophages?

      Indeed, we believe it is likely that the remaining mpeg1<sup>+</sup> cells observed in batf3 mutants include non-microglial macrophages and/or B cells, as we and others previously showed that zebrafish B cells express mpeg1.1 transcripts and are labeled in the mpeg1.1 reporters (Ferrero et al., 2020). This interpretation is further supported by the observation that the reduction in mpeg1<sup>+</sup> cells is more pronounced in brain sections than in flow cytometry samples, where non-parenchymal mpeg1<sup>+</sup> cells, such as peripheral macrophages or B cells, are likely enriched. To explore this possibility, we attempted to assess the expression of MF- and B cell-specific markers in the remaining mpeg1<sup>+</sup> population isolated from batf3 mutants. However, due to the very low numbers of cells recovered per animal, we were limited to analyzing only a few markers. Despite multiple attempts, qPCR analyses proved inconclusive, likely due to low transcript abundance. We thank you for your understanding of the technical limitations that currently prevent a more definitive characterization of these remaining cells.

      Regarding the irf8 mutants (Figure 8), irf8 is a well-established master regulator of mononuclear phagocyte development. In mice, its deficiency results in developmental defects and functional impairments across multiple myeloid lineages, including microglia, which exhibit reduced density (Kierdorf et al., 2013) and an immature phenotype (Van Hove et al., 2019). Similarly, in zebrafish, irf8 mutants show abnormal macrophage development, with an accumulation of immature and apoptotic cells during embryonic and larval stages (Shiau et al., 2014). Based on these findings, it is plausible that the residual p2ry12:GFP<sup>-</sup> Lcp1<sup>+</sup> cells observed in the irf8 mutant brains represent immature or arrested mononuclear phagocytes, possibly including both microglia and DC-like cells. This is supported by their distinct morphology and specific localization along the ventricle borders. However, as previously noted, our current tools do not permit conclusive identification of these cells.

      Reviewer #2 (Recommendations For The Authors): 

      A few sentences are not easy to understand for a "non zebrafish specialist". 

      (1) Page 3 line 111 The sentence "Interestingly, analyses of brain cell suspensions from double transgenics showed p2ry12:GFP+ microglia accounted for half of cd45:DsRed+ cells (50.9 % ± 2.9; n=4) (Figure 1D,E). Considering that mpeg1:GFP+ cells comprised ~75% of all leukocytes, these results indicated that approximately 25% of brain mononuclear phagocytes do not express the microglial p2ry12:GFP+ transgene." is not clear. This point is significant and deserves a more detailed explanation.

      We apologize for the lack of clarity in this section. The quantification presented in Figure 1 refers specifically to cd45:DsRed<sup>+</sup> leukocytes, meaning that the reported percentages of p2ry12:GFP<sup>+</sup> and mpeg1:GFP<sup>+</sup> cells are calculated relative to the total cd45<sup>+</sup> population (defined as 100%). Specifically, we observed that approximately 51% of all cd45<sup>+</sup> cells were p2ry12:GFP<sup>+</sup> microglia, while around 75% were mpeg1:GFP<sup>+</sup>. From these values, we infer that about 25% of cd45<sup>+</sup> leukocytes are mpeg1:GFP<sup>+</sup> cells that do not express the p2ry12:GFP transgene and therefore likely represent non-microglial mononuclear phagocytes. We agree that this distinction is important and have revised the text accordingly to clarify the interpretation for readers who may be less familiar with zebrafish transgenic lines or gating strategies. See page 3, lines 107-117.
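      The arithmetic behind the roughly 25% figure can be sketched as follows. It rests on two assumptions that should be read as such: both percentages share the cd45:DsRed<sup>+</sup> population as their denominator, as the response states, and p2ry12:GFP<sup>+</sup> microglia form a subset of the mpeg1:GFP<sup>+</sup> cells. Because both reporters are GFP-based, the two values presumably come from separate double-transgenic lines, so the subtraction is an approximation. Variable names are illustrative.

```python
# Values quoted in the reviewed sentence, both relative to cd45:DsRed+ cells.
p2ry12_pct = 50.9  # p2ry12:GFP+ microglia (mean of n=4)
mpeg1_pct = 75.0   # mpeg1:GFP+ mononuclear phagocytes (reported as "~75%")

# Assumption: every p2ry12:GFP+ cell is also mpeg1:GFP+, so the difference
# estimates the mpeg1+ cells that lack the microglial reporter.
non_microglial_pct = round(mpeg1_pct - p2ry12_pct, 1)

print(f"mpeg1+ p2ry12- cells: ~{non_microglial_pct}% of cd45+ leukocytes")
```

      The result, about 24% of cd45<sup>+</sup> leukocytes, matches the "approximately 25%" stated in the manuscript once the approximate nature of the 75% input is taken into account.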

      (2) Line 522; Like human and mouse ILC2s, "these cells do not express the T cell receptor cd4-1" is confusing (T cell receptor should be reserved to the ag specific TCR). Also, was TCR isotypes expression analyzed (and how was genome annotation used in this case ?) 

      Thank you for this insightful comment. We agree that the term “T cell receptor” should be used specifically to refer to antigen-specific TCRs, and we have revised the discussion accordingly to avoid any confusion. Regarding your question on the analysis of TCR isotype expression and the use of genome annotation: due to technical limitations, we did not pursue TCR isotype-level analysis in this study. Instead, we relied on established markers such as cd4-1 and cd8a to distinguish T cell populations, acknowledging that cd4-1 is not expressed by ILC2-like cells in our dataset. We have clarified these points in the relevant sections of the manuscript (see lines 168 and 535).

      The analysis of single-cell data might be more detailed, with more explanation about possible doublet identification and normalization procedures. 

      Thank you for highlighting the need for additional clarity regarding our scRNA-seq analysis.

      As noted in the Seurat tutorial, “cell doublets or multiplets often exhibit abnormally high gene count” (https://satijalab.org/seurat/archive/v3.0/pbmc3k_tutorial). To evaluate this, we performed a dedicated doublet detection analysis using the scDblFinder R package (https://rdrr.io/bioc/scDblFinder/f/vignettes/2_scDblFinder.Rmd). Our results indicated that the proportion of predicted doublets is low (see Figure below), and when present, these doublets are distributed among the different clusters. This contrasts with the typical clustering of doublets into discrete groups and indicates that our single-cell sequencing workflow was sufficiently robust to predominantly capture singlets.
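      The check described here, whether predicted doublets are spread across clusters rather than concentrated in one, can be sketched as a per-cluster doublet fraction. This is an illustration only and not the authors' analysis, which was run in R with scDblFinder: the function name, cluster labels, and doublet calls below are hypothetical toy values, standing in for the per-cell cluster assignments and singlet/doublet calls that a doublet-detection tool would return.

```python
from collections import Counter


def doublet_fraction_by_cluster(clusters, is_doublet):
    """Fraction of cells called as doublets within each cluster."""
    totals = Counter(clusters)
    doublets = Counter(c for c, d in zip(clusters, is_doublet) if d)
    # Counter returns 0 for clusters without any doublet call.
    return {c: doublets[c] / totals[c] for c in sorted(totals)}


# Hypothetical toy input: one cluster label and one doublet call per cell.
toy_clusters = ["A", "A", "A", "A", "B", "B", "B", "B", "C", "C"]
toy_calls = [False, False, False, True, False, False, False, True, False, False]

fractions = doublet_fraction_by_cluster(toy_clusters, toy_calls)
print(fractions)
```

      Similar low fractions across clusters would be consistent with the pattern the response describes, whereas a single cluster with a markedly higher fraction would suggest a doublet-driven cluster.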

      Regarding normalization, we have clarified this in the manuscript. Briefly, single-cell data were normalized using Seurat’s SCTransform method with the following custom parameters: “variable.features.n=4000 and return.only.var.genes=F”. These settings are now clearly described to ensure reproducibility.

      Author response image 3.

      Reviewer #3 (Recommendations For The Authors):

      Major issues

      Though batf3 mutants were generated, the manuscript will greatly benefit from in situ labeling by RNAscope or the generation of transgenic reporters to conclusively localize this dendritic cell population and address any potential contamination issues.

      We thank you for this constructive suggestion. We agree that in situ labeling approaches such as RNAscope would offer valuable complementary insights. In our current study, however, we already provide detailed spatial information on DC-like cells, and we demonstrated their lineage identity through the use of our newly generated batf3 mutant line. 

      To address concerns regarding potential contamination, we have carefully analyzed more than two dozen adult brains to date and consistently observed abundant DC-like cells within the brain parenchyma, exhibiting a reproducible and specific spatial distribution, as described in the manuscript. This consistent localization across multiple samples strongly supports the genuine presence of these cells in the brain rather than artifactual contamination.

      While the generation of DC-like specific transgenic lines is indeed a promising direction (and such efforts are currently underway in our group) we note that creating and validating these lines is time-consuming and falls beyond the scope of the present study. Importantly, although these additional tools will be valuable for future functional investigations, we believe they would not impact the main conclusions or core message of our current work. 

      The morphological characterization of CD45:DsRed+ macrophages stained with May-Grunwald-Giemsa has been previously reported in the paper, "Characterization of the mononuclear phagocyte system in the zebrafish" Wittamer et al., 2011."Morphologic analyses revealed that the majority of cells exhibited the characteristics of monocytes/macrophages namely low nuclear to cytoplasm ratios and a high number of cytoplasmic vacuoles (Figure 3B). 

      We thank you for pointing out the reference to Wittamer et al., 2011. In that study, we indeed provided the first morphological characterization of mononuclear phagocytes (MNPs) in various adult zebrafish organs using the cd45:DsRed line in combination with the mhc2dab:GFP reporter. The focus was primarily on MNPs across peripheral tissues. In the current study, our aim is broader: we investigate the full diversity of brain immune cells, using cd45 as a general marker for leukocytes. As part of this comprehensive characterization, we applied MGG staining, a widely accepted cytological technique, to gain morphological insight into the sorted CD45:DsRed+ population. This method remains a valuable and rapid approach to visually assess cell type heterogeneity, especially when evaluating samples where multiple immune cell lineages may be present. 

      While there is some overlap with the methodology used in Wittamer et al., the context, scope, and tissue examined differ substantially. Thus, the inclusion of MGG staining in this study serves to complement our broader transcriptomic analyses by providing supporting morphological evidence specific to brain-resident immune cells.

      We have now clarified this distinction in the revised manuscript to better differentiate the current work from our previous findings (see line 85).

      Figure 5 data should be quantified.

      Please refer to our response in the major comments section, where we address this question in detail.

      Figure 7- Figure Supplement 1. J, K has no CD45:DsRed positive cells in batf3 mutants, which is counterintuitive because CD45:DsRed should capture all hematopoietic cells and is not specific to dendritic cells.

      It is correct that cd45 is a general leukocyte marker, labeling all immune cells, including dendritic cells. In this Figure, we used the Tg(cd45:DsRed) transgenic line to visualize the phenotype because it offers an alternative to IHC, with the advantage of strong endogenous fluorescence and easier screening of vibratome sections. However, this technique has limitations: due to fixation, only cells with high fluorescence (e.g. cd45<sup>high</sup> dendritic cells) are captured, while those with medium/low expression (e.g. cd45<sup>low</sup> microglia) are often not visible. This explains why fewer cells are observed in both wild-type and batf3 mutant brains (Figure 5K-N, Figure 7 Supplement 1J,K). While this approach is quicker and allows for thicker sections, IHC remains the preferred method for the rest of the analyses, including the use of additional markers to identify all relevant cell populations.

      Thank you for bringing this point of confusion to our attention. To improve clarity, we have amended the text in the relevant sections (see lines 704-706, and legend of Figure 7 Supplement 1).

      Minor issues: 

      The terms in the title, "A single-cell transcriptomic atlas..." are used. What is meant by "atlas"? A searchable database or website is not provided.

      Please refer to our response in the major comments section, where we explain that we have made our dataset accessible through a searchable web interface (https://scrna-analysiszebrafish.shinyapps.io/scatlas/) which is now referenced in the Data Availability Statement.

      This reviewer considers that it is offensive to use terminology such as "poorly characterized" in reference to others' work. 

      Thank you for pointing this out. We understand the concern and have revised the wording to ensure it remains respectful and neutral when referring to previous work. The changes are reflected in lines 20 and 49.

      The introduction of this manuscript should consider restructuring and editing. Example: Lines 51-57 introduce the importance of immune cells in zebrafish regeneration studies. However, this study does not investigate such processes. Additionally, the authors focus on the concept of immune heterogeneity in the brain throughout the text however, these studies have been conducted previously by others (Silva et al., 2021) at single-cell level.

      The novelty of this manuscript is the identification of "dendritic-like cells" and yet the introduction and text are limited to 68-71 lines. The introduction would benefit by introducing this cell type "dendritic-like cells" and differences between vertebrates. 

      Thank you for these valuable comments. In response, we have revised the introduction to better align with the focus of the study (see edited text in page 2). We now emphasize that, while macrophages have been extensively studied in zebrafish, dendritic cells remain much less well characterized in this model.  Also, while we acknowledge that Silva et al. addressed aspects of immune heterogeneity in the zebrafish brain, their study primarily focused on mononuclear phagocytes. In contrast, our work provides a broader and more detailed characterization of the brain immune landscape, integrating transcriptomic data with multiple fluorescent reporter lines and hematopoietic mutants to strengthen cell identity assignments. Importantly, we note that Silva et al. classified DC-like cells within the microglial compartment, whereas our findings support that these cells represent a distinct population. While our data challenge this specific aspect of their conclusions, we believe both studies offer complementary insights that collectively advance our understanding of zebrafish brain immunity. 

      Though Figure 6 is a great confirmation of scRNA sequencing, it seems redundant and should be supplemental data.

      We respectfully disagree with the reviewer’s suggestion. We believe that presenting the data in Figure 6 as the main figure enhances its visibility and impact, particularly highlighting the distinction between microglia and DC-like cells, an aspect we consider highly valuable information for the zebrafish research community. This is especially important given that our conclusions challenge two previous independent reports, further underscoring the relevance of these findings to the field.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The study by Li and coworkers addresses the important and fundamental question of replication initiation in Escherichia coli, which remains open, despite many classic and recent works. It leverages single-cell mRNA-FISH experiments in strains with titratable DnaA and novel DnaA activity reporters to monitor DnaA activity peaks versus size. The authors find oscillations in DnaA activity and show that their peaks correlate well with the estimated population-average replication initiation volume across conditions and imposed dnaA transcription levels. The study also proposes a novel extrusion model where DNA-binding proteins regulate free DnaA availability in response to biomass-DNA imbalance. Experimental perturbations of H-NS support the model validity, addressing key gaps in current replication control frameworks.

      Strengths:

      I find the study interesting and well conducted, and I think its main strong points are:

      (1) the novel reporters obtained with systematic synthetic biology methods, and combined with a titratable dnaA strain.

      (2) the interesting perturbations (titration, production arrest, and H-NS).

      (3) the use of single-cell mRNA FISH to monitor transcripts directly.

      The proposed extrusion model is also interesting, though not fully validated, and I think it will contribute positively to the future debate.

      We thank the reviewer for acknowledging the strengths of our study.

      Weaknesses and Limitations:

      (1) A relevant limitation in novelty is that DnaA activity and concentration oscillations have been reported by the cited Iuliani and coworkers previously by dynamic microscopy, and to a smaller extent by the other cited study by Pountain and coworkers using mRNA FISH.

      (2) An important limitation is that the study is not dynamic. While monitoring mRNA is interesting and relevant, the current study is based on concentrations and not time variations (or nascent mRNA). Conversely, the study by Iuliani and coworkers, while having the drawback of monitoring proteins, can directly assess production rates. It would be interesting for future studies or revisions to monitor the strains and reporters dynamically, as well as using (as a control) the technique of this study on the chromosomal reporters used by Iuliani et al.

      We acknowledge the value of dynamic measurements and clarify our methodological rationale.

      While Iuliani et al. provided valuable temporal resolution through protein dynamics, our mRNA FISH approach achieves direct decoupling of transcriptional vs. post-translational regulation (Fig 4F-H), and condition flexibility across 7 growth rates (30-66 min doubling times). This trade-off sacrifices temporal resolution for enhanced population-scale resolution and perturbation flexibility. To directly address temporal coupling, future work will implement dual-color live imaging of DnaA activity concurrent with replication initiation events.

      (3) Regarding the mathematical models, a lot of details are missing regarding the definitions and the use of such models, which are only presented briefly in the Methods section. The reader is not given any tools to understand the predictions of different models, and no analytical estimates are used. The falsification procedures are not clear. More transparency and depth in the analysis are needed, unless the models are just used as a heuristic tool for qualitative arguments (but this would weaken the claims). The Berger model, for example, has many parameters and many regimes and behaviors. When models are compared to data (e.g., in Figure 2G), it is not clear which parameters were used, how they were fixed, and whether and how the model prediction depends on parameters.

      We agree that model transparency is essential for quantitative validation. To address this, all model parameters (DnaA synthesis rate, activation/deactivation rates, etc.) are explicitly tabulated in Supplementary Information Table S6. For the titration (Hansen et al. 1991) and extrusion models, we derive analytical expressions for initiation mass (IM) sensitivity to DnaA expression in Supplementary Note 1. For Figure 2G/S6, we used published parameters (Berger & Wolde 2022 SI Table 2) with the experimental growth conditions (μ = 1.54 h<sup>-1</sup>).

      The extrusion model's validation relies primarily on its ability to resolve paradoxical initiation events under dnaA shutdown (Fig 6C), a test where other models fail categorically. While the Berger titration-switch hybrid can fit steady-state IM trends (Fig S6A), it cannot reproduce post-shutdown dynamics without ad hoc modifications (Fig S6B). We acknowledge that comprehensive analysis of all model regimes exceeds this study's scope but provide full simulation code for independent verification: https://github.com/BaiYangBqdq/dynamics_of_biomass_DNA_coordination

      (4) Importantly, the main statement about tight correlations of peak volumes and average estimated initiation volume does not establish coincidence, and some of the claims by the authors are unclear in these respects (e.g., when they say "we resolve a 1:1 coupling between DnaA activity thresholds and replication initiation", the statement could be correct but is ambiguous). Crucially, the data rely on average initiation volumes (on which there seems to be an eternally open debate, also involving the authors), and the estimate procedure relies on assumptions that could lead to biases and uncertainties added to the population variability (in any case, error bars are not provided).

      We acknowledge the limitations of population-level inference and have refined our claims: “Replication initiation volume scales proportionally with peak DnaA activity volume with a slope of 1.0 (R<sup>2</sup>=0.98, Fig 7G), indicating predictive correspondence rather than absolute coincidence. While population-level 𝑉<sub>𝑖</sub> estimation cannot resolve single-cell stochasticity, the consistent 𝑉*: 𝑉<sub>𝑖</sub> relationship across 20 conditions suggests DnaA activity thresholds predict initiation timing within physiological error margins”. Future work will simultaneously monitor DnaA activity and replication forks using microfluidic single-cell tracking.

      (5) The delays observed by the authors (in both directions) between the peaks of DnaA activity conditional averages with respect to volume and the average estimated initiation volumes are not incompatible with those observed dynamically by Iuliani and coworkers. The direct experiment to prove the authors' point would be to use a direct proxy of replication initiation, such as SeqA or DnaN, and monitor initiations and quantify DnaA activity peaks jointly, with dynamic measurements.

      We acknowledge the observed temporal deviations between DnaA activity peaks (𝑉*) and population-derived volumes at initiation (𝑉<sub>𝑖</sub>) in certain conditions, in line with the findings of Iuliani et al. These deviations might be mechanistically consistent with the time required for orisome assembly or oriC sequestration. They do not contradict our core finding that initiation occurs at a defined DnaA activity threshold (slope=1.0, R<sup>2</sup>=0.98 in 𝑉*: 𝑉<sub>𝑖</sub> correlation).

      (6) While not being an expert, I had some doubt that the fact that the reporters are on plasmid (despite a normalization control that seems very sensible) might affect the measurements. Also, I did not understand how the authors validated the assumptions that the reporters are sensitive to DnaA-ATP specifically. It seems this assumption is validated by previous studies only.

      We employed a plasmid-based reporter system to circumvent the significant confounding effects of chromosomal position on promoter activity, as extensively documented by Pountain et al., where local genomic context (e.g., nucleoid occlusion, supercoiling gradients, and neighboring operons) introduces uncontrolled variability. By housing the P<sub>syn66</sub> test promoter and P<sub>con</sub> normalization control in identical low-copy pSC101 vectors (<8 copies/cell, Peterson & Phillips, Plasmid 2008), we ensured they experience equivalent physical and biochemical environments. This ratiometric design, where DnaA activity is calculated from the ratio of the two reporter signals, actively corrects for global fluctuations in RNA polymerase availability, nucleotide pools, and plasmid copy number. Critically, P<sub>syn66</sub>’s architecture emulates natural DnaA-responsive elements: its strong DnaA boxes report free DnaA concentration, while its weak box is preferentially bound by DnaA-ATP (Speck et al., EMBO Journal 1999), mirroring the nucleotide-state sensitivity of oriC and the native dnaA promoter. This system was indispensable for our central finding, as it uniquely enabled the decoupling of DnaA activity oscillations from transcriptional feedback (Fig. 4F-H), an experiment fundamentally impossible with chromosomally integrated reporters due to autoregulatory interference.
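
      The copy-number correction described above can be illustrated with a minimal sketch. It assumes that both reporter signals scale linearly with plasmid copy number and that the readout is the control signal divided by the test signal; the direction of the ratio, the function name, and all numbers are assumptions made here for illustration and are not taken from the manuscript.

```python
# Hypothetical ratiometric readout; names and numbers are illustrative only.
def ratiometric_activity(test_signal: float, control_signal: float) -> float:
    """Return control/test, which rises as the test promoter is repressed."""
    if test_signal <= 0 or control_signal <= 0:
        raise ValueError("signals must be positive")
    return control_signal / test_signal


# Per-plasmid transcript output in arbitrary units (made-up values).
per_copy_test, per_copy_control = 2.0, 6.0

for copy_number in (4, 6, 8):
    test = per_copy_test * copy_number        # P_syn66-driven transcripts
    control = per_copy_control * copy_number  # P_con-driven transcripts
    # The copy-number factor cancels in the ratio.
    print(copy_number, ratiometric_activity(test, control))
```

      Any global factor that multiplies both signals equally (copy number, polymerase availability) cancels in the ratio, which is the property the ratiometric design relies on.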

      Overall Appraisal:

      In summary, this appears as a very interesting study, providing valuable data and a novel hypothesis, the extrusion model, open to future explorations. However, given several limitations, some of the claims appear overstated. Finally, the text contains some self-evaluations, such as "our findings redefine the paradigm for replication control", etc., that appear exaggerated.

      We thank the reviewer for highlighting the need for precise language in framing our conclusions. We have implemented the following substantive revisions throughout the manuscript to ensure claims align strictly with empirical evidence:

      (1) Changed "redefine the paradigm for replication control" into "advance the paradigm for replication control" (Introduction)

      (2) Changed "redefine bacterial cell cycle control" into "refine bacterial cell cycle control as a dynamic interplay..." (Discussion)

      (3) Removed the term "spatial" from the Discussion's description of DnaA-chromosome interactions (Discussion, first paragraph).

      (4) Changed "provides a blueprint" into "provides a valuable tool for dissecting spatial regulation..." (Discussion, final paragraph)

      (5) Scrutinized all superlatives (e.g., "critical feat" into "important capability"; "fundamental principle of cellular organization" into "potential organizational strategy")

      (6) Replaced the instances of "robust" with evidence-backed descriptors (e.g., "sensitive," "consistent")

      (7) We agree that the extrusion model requires further validation and have emphasized this in Discussion: "While H-NS perturbation supports extrusion mechanism, future work should identify the full extruder interactome and elucidate how metabolic signals modulate their activity" (final paragraph)

      This calibrated language more accurately represents our study as a conceptual advance with testable mechanisms, not a complete paradigm shift.

      Reviewer #2 (Public review):

      Summary:

      The authors show that in E. coli, the initiator protein DnaA oscillates post-translationally: its activity rises and peaks exactly when DNA replication begins, even if dnaA transcription is held constant. To explain this, they propose an "extrusion" mechanism in which nucleoid-associated proteins such as H-NS, whose amount grows with cell volume, dislodge DnaA from chromosomal binding sites; modelling and H-NS perturbations reproduce the observed drop in initiation mass and extra initiations seen after dnaA shut-down. Together, the data and model link biomass growth to replication timing through chromosome-driven, post-translational control of DnaA, filling gaps left by classic titration and ATP/ADP-switch models.

      Strengths:

      (1) Introduces an "extrusion" model that adds a new post-translational layer to replication control and explains data unexplained by classic titration or ATP/ADP-switch frameworks.

      (2) A major asset of the study is that it bridges the longstanding gap between DnaA oscillations and DNA-replication initiation, providing direct single-cell evidence that pulses of DnaA activity peak exactly at the moment of initiation across multiple growth conditions and genetic perturbations.

      (3) A tunable dnaA strain and targeted H-NS manipulations shift initiation mass exactly as the model predicts, giving model-driven validation across growth conditions.

      (4) A purpose-built Psyn66 reporter combined with mRNA-FISH captures DnaA-activity pulses with cell-cycle resolution, providing direct, compelling data.

      We thank the reviewer for acknowledging the strengths of our study.

      Weaknesses:

      (1) What happens to the (C+D) period and initiation time as the dnaA mRNA level changes? This is not discussed in the text or figure and should be addressed.

      We thank the reviewer for this important observation. Our data demonstrate that increased dnaA mRNA levels induce two compensatory changes in cell cycle progression:

      (1) Earlier replication initiation, manifested as a reduced initiation mass: the initiation mass decreased from 5.6 to 2.6 (OD<sub>600</sub>·ml per 10<sup>10</sup> cells) as the relative dnaA mRNA level increased from 0.2 to 7.2 (normalized to the wild-type level) (Fig. 2F, red).

      (2) Prolonged C+D period: Increased by approximately 60% (from 1.05 to 1.66 hours, Fig. 2F blue).

      The complete quantitative relationship is now explicitly described in the Results section: “Concurrently, the initiation mass was reduced by 50%, and the period from initiation to division (C+D) was increased by ~60% (Fig. 2F)”
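
      As a quick arithmetic check of the two changes quoted above, the following sketch recomputes the relative changes from the endpoint values given in this response (5.6 to 2.6 for initiation mass, 1.05 to 1.66 hours for C+D). The helper name is hypothetical.

```python
def percent_change(before: float, after: float) -> float:
    """Relative change from `before` to `after`, in percent."""
    return 100.0 * (after - before) / before


# Endpoint values quoted in the response (Fig. 2F).
initiation_mass_change = percent_change(5.6, 2.6)  # OD600·ml per 10^10 cells
c_plus_d_change = percent_change(1.05, 1.66)       # hours

print(f"initiation mass: {initiation_mass_change:.1f}%")
print(f"C+D period:      {c_plus_d_change:.1f}%")
```

      The endpoints give a decrease of about 54% in initiation mass and an increase of about 58% in C+D, in line with the rounded figures of 50% and ~60% stated in the text.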

      (2) It is unclear what is meant by "relative dnaA mRNA level." Relative to what? Wild-type expression? Maximum expression? This should be explicitly defined.

      The relative dnaA mRNA level was obtained by normalizing to that in wild-type MG1655 cells grown in the same medium. To clarify this point, we have now marked the wild-type level in Fig. 1B, and a clear description of this has also been included in the figure caption.

      (3) It would be helpful to provide some intuition for why an increase in dnaA mRNA level leads to a decrease in initiation mass per ori and an increase in oriC copy number.

      Thank you for your valuable suggestion. Increased dnaA mRNA accelerates DnaA accumulation, causing cells to reach the initiation threshold at a smaller cell size (reducing initiation mass, Fig. 2F red). This earlier initiation increases oriC copies per cell at the population level (Fig. 2E). This mechanistic interpretation now appears in the Results: “As the DnaA expression level increases, DnaA activity reaches the initiation threshold earlier. Given that cell mass remained nearly unchanged, this earlier initiation led to an increase in population-averaged cellular oriC numbers (Fig. 2E).”

      (4) The titration and switch models do not explicitly include dnaA mRNA in the dynamics of DnaA protein. Yet, in Figure 2G, initiation mass is shown to decrease linearly with dnaA mRNA level in these models. How was dnaA mRNA level represented or approximated in these simulations?

      All models presented in this article omit explicit modeling of dnaA mRNA dynamics for simplicity. However, at steady state, the relative level of dnaA mRNA can be approximated by the relative expression rate of DnaA protein, as both reflect the expression level of DnaA. This detail is now clarified in the caption of Figure 2G.

      (5) Is Schaechter's law (i.e., exponential scaling of average cell size with growth rate) still valid under the different dnaA mRNA expression conditions tested?

      Schaechter's law describes the exponential scaling of average cell size with growth rate in bacteria. In our prior work (Zheng et al., Nature Microbiology 2020), we demonstrated that Schaechter's law fails in slow-growth regimes. However, in the current study, growth rate remained constant across different dnaA expression levels (Fig. 2C), and cell mass showed no significant change (Fig. 2D). Since Schaechter's law specifically addresses how cell size scales with growth rate, it does not apply here: growth rate was invariant in our perturbations, which selectively alter replication initiation dynamics, not growth rate or size scaling.

      (6) The manuscript should explain more explicitly how the extrusion model implements posttranslational control of DnaA and, in particular, how this yields the nonlinear drop in relative initiation mass versus dnaA mRNA seen in Figure 6E. Please provide the governing equation that links total DnaA, the volume-dependent "extruder" pool, and the threshold of free DnaA at initiation, and show - briefly but quantitatively - how this equation produces the observed concave curve.

      The governing equations linking initiation mass and DnaA expression level are now provided in Supplementary Note S1 for both the titration and the extrusion models. In general, the dependence of initiation mass (𝑉<sub>𝐼</sub>) on dnaA expression level (𝛼<sub>𝐴</sub>) takes an inverse proportionality form: 𝑉<sub>𝐼</sub> ∝ 1/𝛼<sub>𝐴</sub>. In the extrusion model, the incorporated extruder protein is assumed to have synthesis dynamics similar to those of DnaA and can release DnaA from DnaA boxes. Denoting the synthesis rate of the extruder as 𝛼<sub>𝐻</sub>, the combined effect of DnaA and the extruder on replication initiation can be briefly described as: 𝑉<sub>𝐼</sub> ∝ 1/(𝛼<sub>𝐴</sub> + 𝛼<sub>𝐻</sub>). The additive contribution of 𝛼<sub>𝐻</sub> then dampens the sensitivity of initiation mass to changes in 𝛼<sub>𝐴</sub>, resulting in a significantly flattened curve. As a result, the predicted 𝑉<sub>𝐼</sub> − 𝛼<sub>𝐴</sub> relationship has a concave shape in the semi-log plots.
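
      The dampening effect described above can be made concrete with a small numerical sketch. It assumes, purely for illustration, that initiation mass is proportional to 1/𝛼<sub>𝐴</sub> in the titration case and to 1/(𝛼<sub>𝐴</sub> + 𝛼<sub>𝐻</sub>) in the extrusion case; these simplified forms, the function names, and the value of 𝛼<sub>𝐻</sub> are assumptions and are not the governing equations of Supplementary Note S1.

```python
import math


def initiation_mass_titration(alpha_a: float, k: float = 1.0) -> float:
    """Assumed inverse-proportional form: V_I = k / alpha_A."""
    return k / alpha_a


def initiation_mass_extrusion(alpha_a: float, alpha_h: float, k: float = 1.0) -> float:
    """Assumed additive form: V_I = k / (alpha_A + alpha_H)."""
    return k / (alpha_a + alpha_h)


def log_sensitivity(model, alpha_a: float, eps: float = 1e-6) -> float:
    """Numerical estimate of d ln(V_I) / d ln(alpha_A) by central difference."""
    lo, hi = alpha_a * (1 - eps), alpha_a * (1 + eps)
    return (math.log(model(hi)) - math.log(model(lo))) / (math.log(hi) - math.log(lo))


alpha_h = 1.0  # hypothetical extruder synthesis rate, same units as alpha_A

for alpha_a in (0.2, 1.0, 7.2):  # relative dnaA expression levels
    s_tit = log_sensitivity(initiation_mass_titration, alpha_a)
    s_ext = log_sensitivity(lambda a: initiation_mass_extrusion(a, alpha_h), alpha_a)
    print(f"alpha_A={alpha_a:>4}: titration {s_tit:+.3f}, extrusion {s_ext:+.3f}")
```

      Under these assumed forms the titration sensitivity is always -1, whereas the extrusion sensitivity is -𝛼<sub>𝐴</sub>/(𝛼<sub>𝐴</sub> + 𝛼<sub>𝐻</sub>), which has magnitude below 1 and approaches zero when 𝛼<sub>𝐴</sub> is small relative to 𝛼<sub>𝐻</sub>.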

      (7) Does this extrusion model give the well-known adder per origin, i.e., is initiation to initiation an adder?

      Yes, the extrusion model reproduces the initiation-to-initiation adder phenomenon; this is shown in Fig. S3C.

      (8) DnaA protein or activity is never measured; mRNA is treated as a linear proxy. Yet the authors' own narrative stresses post-translational (not transcriptional) control of DnaA. Without parallel immunoblots or activity readouts, it is impossible to know whether a sixfold mRNA increase truly yields a proportional rise in active DnaA.

      We acknowledge the reviewer's valid concern regarding the indirect nature of our DnaA activity measurements. While mRNA levels alone cannot resolve active DnaA dynamics, our approach integrates functional replication outcomes with a validated synthetic reporter to infer activity. Crucially, elevated dnaA mRNA causes demonstrable biological effects: earlier replication initiation (Fig. 2F) and increased oriC copies (Fig. 2E), directly confirming enhanced functional DnaA activity at the oriC locus. The P<sub>syn66</sub> reporter, engineered with DnaA boxes mirroring oriC's architecture, provides orthogonal validation, showing progressive repression in response to dnaA induction (Fig. 3C). Our operational metric, based on P<sub>syn66</sub>, responds sensitively to DnaA-chromosome interactions within its characterized 8-fold dynamic range (Fig. 3C). Immunoblots would be inadequate here, as they cannot distinguish functionally critical pools: free versus chromosome-bound DnaA, or DnaA-ATP versus DnaA-ADP, precisely the post-translational states our study implicates in regulation. We therefore prioritize functional readouts (initiation timing) and the P<sub>syn66</sub> reporter, which probes the biologically active fraction relevant to replication control.

      (9) Figure 2 infers both initiation mass and oriC copy number from bulk measurements (OD<sub>600</sub> per cell and rifampicin-cephalexin run-out) instead of measuring them directly in single cells. Any DnaA-dependent changes in cell size, shape, or antibiotic permeability could skew these bulk proxies, so the plotted relationships may not accurately reflect true initiation events.

      We acknowledge the reviewer's valid methodological concern and clarify that while bulk measurements carry inherent limitations, our approach is grounded in established techniques with demonstrated reliability. Cell mass was inferred from OD600/cell, which correlates strongly with direct dry weight measurements and microscopic cell volumes across diverse growth conditions, as validated in our prior work (Zheng et al., Nature Microbiology 2020). Crucially, cell mass remained invariant across dnaA expression levels (Fig. 2D).

      Regarding oriC quantification, the rifampicin-cephalexin run-out assay is widely applied in replication initiation studies. Our data show the expected 2<sup>n</sup> oriC distributions without abnormal ploidy (as shown below). While single-cell methods offer superior resolution, our bulk approach provides accurate population-level trends.

      Author response image 1.

      Recommendations for the authors:

      Reviewing Editor Comments:

      The reviewers felt that the mathematical modeling was not adequately explained in the paper, and that this affected the readability of the manuscript. The authors are encouraged to elaborate on this aspect of the paper (in addition to strengthening other claims, if possible, per the reviewers' comments).

      We thank the editor and reviewers for their constructive feedback. We have comprehensively strengthened the mathematical modeling framework to enhance clarity and rigor.

      Reviewer #1 (Recommendations for the authors):

      The only revision I would do is a recalibration of the claims and a major effort to clarify the modeling part (including a detailed SI appendix), without necessarily performing additional work.

      To enhance the transparency of the mathematical modeling, we have completed the model description in the Methods section and added a parameter table with literature-sourced values in Supplementary Information Table S6. Moreover, analytical derivations of initiation mass dependencies are presented in Supplementary Information Note S1.

      Of course, there are extra experiments (mentioned in the public review) that would help support some of the big claims, but that can be considered a different project.

      Thank you for your suggestion. This will be addressed in our future work.

      Minor suggestion: please put signposts or plot jointly to compare the maxima/minima in Figures 4D, E, G, and H.

      We added dashed lines in Figures 4D and 4E to align the visualization of DnaA activity peaks and transcriptional minima across panels, facilitating direct biological comparisons.

      Reviewer #2 (Recommendations for the authors):

      (1) Should define what DnaA activity is.

      We have explicitly defined DnaA activity in the Introduction as “the capacity to initiate replication…” and noted that it is “governed by free DnaA concentration, DnaA-ATP/-ADP ratio, and orisome assembly competence”.

      (2) Word repetition - “...grown in in Luria-Bertani (LB) medium...”.

      Corrected.

      (3) Typographical error - “FISH ... was preformed" should be "performed”.

      Corrected.

      (4) The manuscript alternates between “ng ml<sup>-1</sup>” and “ng·ml<sup>-1</sup>”; choose one style and apply it uniformly.

      Standardized the units to ng·ml<sup>-1</sup> throughout.

      (5) Reference duplicates - Some citations appear twice in the bibliography (e.g., "Bintu et al., 2005a/b" and "Bintu et al., 2005b" listed again later).

      The studies by Bintu et al. (2005a, 2005b) represent separate works: 2005a details applications, and 2005b develops models.

    1. Author response:

      Response to Reviewer #1:

      We plan to extend the discussion section to discuss the clinical implications of this new function. We will note the algorithm's applicability to broader genetic counseling contexts beyond cancer risk assessment.

      Response to Reviewer #2:

      We will clarify the four points raised:

      (1) "Close-to-optimal" definition: We will explain that in multiple-mating cases, finding the global optimum is NP-hard (equivalent to the Weighted Feedback Vertex Set problem). We will clarify that our greedy algorithm provides practically efficient solutions suitable for clinical use, though without theoretical optimality guarantees.

      (2) Example clarity: We will improve Figure 1's caption to explain the cost calculations and note that with equal weights, both shown solutions are equivalent.

      (3) Non-optimal examples: We will describe scenarios where the greedy algorithm may not achieve the global optimum, particularly in multiple-mating cases with heterogeneous weights.

      (4) Warning message: The current version does not provide a warning when the solution might be non-optimal. This may be added to the function in the future.
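
      To illustrate what a greedy approach to the Weighted Feedback Vertex Set problem can look like, here is a minimal sketch of one common heuristic: repeatedly remove the vertex with the lowest weight-to-degree ratio until no cycle remains. This is a generic textbook-style heuristic on an undirected simple graph, not the algorithm implemented in the function under review; all names and the example graph are hypothetical.

```python
from typing import Dict, Hashable, Set

Graph = Dict[Hashable, Set[Hashable]]


def greedy_feedback_vertex_set(graph: Graph, weight: Dict[Hashable, float]) -> Set[Hashable]:
    """Greedy heuristic for an undirected simple graph (no self-loops).

    Returns a set of vertices whose removal leaves the graph acyclic.
    The result is not guaranteed to have minimum total weight.
    """
    # Work on a copy so the caller's graph is left untouched.
    adj = {v: set(nbrs) for v, nbrs in graph.items()}
    chosen: Set[Hashable] = set()

    def remove(v: Hashable) -> None:
        for u in adj.pop(v):
            adj[u].discard(v)

    def prune() -> None:
        # Vertices of degree < 2 cannot lie on a cycle; strip them repeatedly.
        stack = [v for v in adj if len(adj[v]) < 2]
        while stack:
            v = stack.pop()
            if v not in adj:
                continue
            neighbours = list(adj[v])
            remove(v)
            stack.extend(u for u in neighbours if len(adj[u]) < 2)

    prune()
    while adj:  # all remaining vertices have degree >= 2, so a cycle exists
        v = min(adj, key=lambda x: weight[x] / len(adj[x]))
        chosen.add(v)
        remove(v)
        prune()
    return chosen


# Two triangles (a-b-c and c-d-e) sharing vertex "c".
graph = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d", "e"},
    "d": {"c", "e"},
    "e": {"c", "d"},
}
equal_weights = {v: 1.0 for v in graph}
heavy_c = dict(equal_weights, c=10.0)

print(greedy_feedback_vertex_set(graph, equal_weights))
print(greedy_feedback_vertex_set(graph, heavy_c))
```

      With equal weights the heuristic breaks both loops by removing the shared vertex; when that vertex is made expensive it removes one cheap vertex from each triangle instead. Because each choice is local, such a heuristic carries no optimality guarantee in general.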

      We appreciate your feedback and suggestions to help improve the manuscript.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Reviewer #1 (Public review):

      This work provides a new Python toolkit for combining generative modeling of neural dynamics and inversion methods to infer likely model parameters that explain empirical neuroimaging data. The authors provided tests to show the toolkit's broad applicability, accuracy, and robustness; hence, it will be very useful for people interested in using computational approaches to better understand the brain.

      Strengths:

      The work's primary strength is the tool's integrative nature, which seamlessly combines forward modelling with backward inference. This is important as available tools in the literature can only do one and not the other, which limits their accessibility to neuroscientists with limited computational expertise. Another strength of the paper is the demonstration of how the tool can be applied to a broad range of computational models popularly used in the field to interrogate diverse neuroimaging data, ensuring that the methodology is not optimal to only one model. Moreover, through extensive in-silico testing, the work provided evidence that the tool can accurately infer ground-truth parameters even in the presence of noise, which is important to ensure results from future hypothesis testing are meaningful.

      We appreciate the positive feedback on our open-source tool that delivers rapid forward simulations and flexible Bayesian model inversion for a broad range of whole-brain models, with extensive in-silico validation, including scenarios with dynamical/additive noise.

      Weaknesses

      The paper still lacks appropriate quantitative benchmarking relative to non-Bayesian-based inference tools, especially with respect to performance accuracy and computational complexity and efficiency. Without this benchmarking, it is difficult to fully comprehend the power of the software or its ability to be extended to contexts beyond large-scale computational brain modelling.

      Non-Bayesian inference methods were beyond the scope of this study, as we focused on full posterior estimation to enable uncertainty quantification and detection of degeneracy. Their advantages and disadvantages are briefly discussed in the Introduction and Discussion sections.

      Reviewer #2 (Public review):

      Whole-brain network modeling is a common type of dynamical systems-based method to create individualized models of brain activity incorporating subject-specific structural connectome inferred from diffusion imaging data. This type of model has often been used to infer biophysical parameters of the individual brain that cannot be directly measured using neuroimaging but may be relevant to specific cognitive functions or diseases. Here, Ziaeemehr et al introduce a new toolkit, named "Virtual Brain Inference" (VBI), offering a new computational approach for estimating these parameters using Bayesian inference powered by artificial neural networks. The basic idea is to use simulated data, given known parameters, to train artificial neural networks to solve the inverse problem, namely, to infer the posterior distribution over the parameter space given data-derived features. The authors have demonstrated the utility of the toolkit using simulated data from several commonly used whole-brain network models in case studies.

      Strength:

      Model inversion is an important problem in whole-brain network modeling. The toolkit presents a significant methodological step up from common practices, with the potential to broadly impact how the community infers model parameters.

      Notably, the method allows the estimation of the posterior distribution of parameters instead of a point estimation, which provides information about the uncertainty of the estimation, which is generally lacking in existing methods.

      The case studies were able to demonstrate the detection of degeneracy in the parameters, which is important. Degeneracy is quite common in this type of model. If not handled mindfully, it may lead to spurious or unstable parameter estimation. Thus, the toolkit can potentially be used to improve feature selection or to simply indicate the uncertainty.

      In principle, the posterior distribution can be directly computed given new data without doing any additional simulation, which could improve the efficiency of parameter inference once the artificial neural network is well-trained.

      We thank the reviewer for the careful consideration of important aspects of the VBI tool, such as uncertainty quantification rather than point estimation, degeneracy detection, feature selection, parallelization, and amortization strategy.

      Weaknesses:

      The z-scores used to measure prediction error are generally between 1-3, which seems quite large to me. It would give readers a better sense of the utility of the method if comparisons to simpler methods, such as k-nearest neighbor methods, are provided in terms of accuracy.

      A lot of simulations are required to train the posterior estimator, which is computationally more expensive than existing approaches. Inferring from Figure S1, at the required order of magnitudes of the number of simulations, the simulation time could range from days to years, depending on the hardware. The payoff is that once the estimator is well-trained, the parameter inversion will be very fast given new data. However, it is not clear to me how often such use cases would be encountered. It would be very helpful if the authors could provide a few more concrete examples of using trained models for hypothesis testing, e.g., in various disease conditions.

      We agree with the reviewer that for some parameters the z-score is large, which could be due to the limited number of simulations, the informativeness of the data features, or non-identifiability, and we do address these possible limitations in the Discussion. In line with our previous study, we stick to Bayesian metrics such as posterior z-scores and shrinkage. The application of an amortized strategy needs to be demonstrated in future work, for example in anonymized personalization of virtual brain twins (Baldy et al., 2025).

      Ref: Baldy N, Woodman MM, Jirsa VK. Amortizing personalization in virtual brain twins. arXiv preprint arXiv:2506.21155.
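      The two Bayesian metrics named in the response above can be sketched as follows. This is a minimal illustration, assuming the commonly used definitions (posterior z-score as the distance between posterior mean and ground truth in units of posterior standard deviation; shrinkage as one minus the ratio of posterior to prior variance). The function names and the example numbers are hypothetical and are not taken from the VBI code base.

```python
import numpy as np


def posterior_zscore(posterior_samples, true_value):
    """Distance between posterior mean and ground truth, in posterior SDs.

    Assumed definition: z = |mean_post - theta_true| / sd_post.
    """
    samples = np.asarray(posterior_samples, dtype=float)
    return abs(samples.mean() - true_value) / samples.std(ddof=1)


def posterior_shrinkage(posterior_samples, prior_variance):
    """Contraction of the posterior relative to the prior.

    Assumed definition: s = 1 - var_post / var_prior (close to 1 = well constrained).
    """
    samples = np.asarray(posterior_samples, dtype=float)
    return 1.0 - samples.var(ddof=1) / prior_variance


# Hypothetical example: uniform prior on [0, 2], ground truth at 1.0,
# and synthetic "posterior" samples standing in for a trained estimator's output.
rng = np.random.default_rng(0)
prior_low, prior_high = 0.0, 2.0
prior_variance = (prior_high - prior_low) ** 2 / 12.0  # variance of a uniform prior
posterior_samples = rng.normal(loc=1.05, scale=0.1, size=5000)

z = posterior_zscore(posterior_samples, true_value=1.0)
s = posterior_shrinkage(posterior_samples, prior_variance)
print(f"z-score = {z:.2f}, shrinkage = {s:.2f}")
```

      Under these definitions, a small z-score together with a shrinkage near 1 indicates an accurate and well-constrained estimate, whereas a large z-score with high shrinkage points to a confident but biased posterior.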

      Reviewer #1 (Recommendations for the authors):

      (1) The authors want to keep the term "spatio-temporal" data features to make it consistent with the language they use in their code, even though they only refer to statistical and temporal features of the time series. I stand by my previous comment that this is misleading and should be avoided as much as possible because it doesn't take into account the actual spatial characteristics of the data. At the very least, the authors should recognize this in the text.

      We have now recognized this point.

      (2) There are still some things that need further clarification and/or explanation:

      (a) It remains unclear why PCA needs to be applied to the FC/FCD matrices. It was also unclear how many PCs were kept as data features.

      We aim to use as many features as possible as a battery of metrics to reduce the number of simulations. The role of each feature can be investigated in future studies.  For instance, PCA is used in the LEiDA approach (Cabral et al., 2017) to enhance robustness to high-frequency noise, thereby overcoming a limitation common to all quasi-instantaneous measures of FC. In this work, the default setting was two PCA components. 

      Ref:  Cabral J, Vidaurre D, Marques P, Magalhães R, Silva Moreira P, Miguel Soares J, Deco G, Sousa N, Kringelbach ML. Cognitive performance in healthy older adults relates to spontaneous switching between states of functional connectivity during rest. Scientific reports. 2017 Jul 11;7(1):5135.
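      As an illustration of how PCA can reduce an FC matrix to a handful of data features, the sketch below computes a static FC matrix from a regional time series and keeps the explained-variance ratios of the first two principal components. It is only a sketch under stated assumptions: the function name is hypothetical, the choice of explained-variance ratios as the retained quantity is an assumption, and only the default of two components comes from the response above. It does not reproduce the VBI implementation.

```python
import numpy as np


def fc_pca_features(timeseries, n_components=2):
    """Summarize a functional connectivity (FC) matrix with PCA.

    timeseries: array of shape (n_regions, n_timepoints).
    Returns the explained-variance ratios of the leading components
    (an assumed choice of summary feature, for illustration only).
    """
    fc = np.corrcoef(timeseries)                       # (n_regions, n_regions)
    centered = fc - fc.mean(axis=0, keepdims=True)     # center each column
    # PCA via SVD: squared singular values are proportional to component variances.
    _, singular_values, _ = np.linalg.svd(centered, full_matrices=False)
    variances = singular_values ** 2
    return variances[:n_components] / variances.sum()


# Hypothetical input: 10 regions, 500 time points of synthetic data.
rng = np.random.default_rng(1)
timeseries = rng.standard_normal((10, 500))
features = fc_pca_features(timeseries, n_components=2)
print(features)
```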

      (b) It was also unclear which features were used for each model. This is important for reproducibility and to make the users of the software aware of which features are most likely to work best for each model.

      We have done our best to indicate the class of features used in each case. This is illustrated more clearly in the notebook examples provided in the repository.

      Reviewer #2 (Recommendations for the authors):

      Thanks for responding to my suggestions. Here is only one remaining point:

      Section 2.1: Please mention the atlas used to parcellate the brain; without this information, readers won't know what area 88 is in Figure 1, for example. 

      We have now mentioned this point. In this study, we used the AAL atlas.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review): 

      In recent years, our understanding of the nuclear steps of the HIV-1 life cycle has made significant advances. It has emerged that HIV-1 completes reverse transcription in the nucleus and that the host factor CPSF6 forms condensates around the viral capsid. The precise function of these CPSF6 condensates is under investigation, but it is clear that the HIV-1 capsid protein is required for their formation. This study by Tomasini et al. investigates the genesis of the CPSF6 condensates induced by HIV-1 capsid, what other co-factors may be required, and their relationship with nuclear speckles (NS). The authors show that disruption of the condensates by the drug PF74, added post-nuclear entry, blocks HIV-1 infection, which supports their functional role. They generated CPSF6 KO THP-1 cell lines, in which they expressed exogenous CPSF6 constructs to map, by microscopy and pull-down assays, the regions critical for the formation of condensates. This approach revealed that the LCR region of CPSF6 is required for capsid binding but not for condensates whereas the FG region is essential for both. Using SON and SRRM2 as markers of NS, the authors show that CPSF6 condensates precede their merging with NS but that depletion of SRRM2, or SRRM2 lacking the IDR domain, delays the genesis of condensates, which are also smaller. 

      The study is interesting and well conducted and defines some characteristics of the CPSF6-HIV-1 condensates. Their results on the NS are valuable. The data presented are convincing. 

      I have two main concerns. Firstly, the functional outcome of the various protein mutants and KOs is not evaluated. Although Figure 1 shows that disruption of the CPSF6 puncta by PF74 impairs HIV-1 infection, it is not clear if HIV-1 infection is at all affected by expression of the mutant CPSF6 forms (and SRRM2 mutants) or KO/KD of the various host factors. The cell lines are available, so it should be possible to measure HIV-1 infection and reverse transcription. Secondly, the authors have not assessed if the effects observed on the NS impact HIV-1 gene expression, which would be interesting to know given that NS are sites of highly active gene transcription. With the reagents at hand, it should be possible to investigate this too. 

      We thank the reviewer for her/his valuable feedback on our manuscript. We are pleased to see her/his appreciation of our results, and we did our utmost to address the highlighted points to further improve our work.

      To correctly perform the infectivity assay, we generated stable cell clones—a process that required considerable time, particularly during the selection of clones expressing protein levels comparable to wild-type (WT) cells. To accurately measure infectivity, it was essential to use stable clones expressing the most important deletion mutant, ∆FG CPSF6, at levels similar to those of CPSF6 in WT cells (new Fig.5 A-B). Importantly, we assessed the reproducibility of our experiments by freezing and thawing these clones.

      Regarding SRRM2, in THP-1 cells we were only able to achieve a knockdown, which still retains residual SRRM2 protein, albeit at much lower levels. Due to the essential role of SRRM2 in cell survival, obtaining a complete knockout in this cell line is not feasible, making it difficult to draw definitive conclusions from these experiments.

      In contrast, 293T cells carrying the endogenous SRRM2 deletion mutant (ΔIDR) cannot be infected with replication-competent HIV-1, as they lack expression of CD4 and either CXCR4 or CCR5. These cells were instead used to monitor the dynamics of CPSF6 puncta assembly within nuclear speckles. However, they are not a suitable model for studying the impact of SRRM2 depletion on viral infection.

      Thus, we performed infectivity assays in a more relevant cell line for HIV-1 infection, THP-1 macrophage-like cells, using both a single-round virus and a replication-competent virus. The new results, shown in Figure 5 C-D, indicate that complete depletion of CPSF6 reduces infectivity, as measured by luciferase expression in a single-round infection (KO: ~65%; ΔFG: ~74%; compared to WT: 100% on average). Notably, a more pronounced defect in viral particle production was observed when WT virus was used for infection (KO: ~21%; ΔFG: ~16%; compared to WT: 100% on average). These findings support the referee’s insightful suggestion that the absence of CPSF6 could also impair HIV-1 gene expression. 

      Reviewer #2 (Public review): 

      Summary: 

      HIV-1 infection induces CPSF6 aggregates in the nucleus that contain the viral protein CA. The study of the functions and composition of these nuclear aggregates has raised considerable interest in the field, and they have emerged as sites in which reverse transcription is completed and in the proximity of which viral DNA becomes integrated. In this work, the authors have mutated several regions of the CPSF6 protein to identify the domains important for nuclear aggregation, in addition to the already known FG region; they have characterized the kinetics of fusion between CPSF6 aggregates and SC35 nuclear speckles and have determined the role of two nuclear speckle components in this process (SRRM2, SUN2). 

      Strengths: 

      The work examines systematically the domains of CPSF6 of importance for nuclear aggregate formation in an elegant manner in which these mutants complement an otherwise CPSF6-KO cell line. In addition, this work evidences a novel role for the protein SRRM2 in HIV-induced aggregate formation, overall advancing our comprehension of the components required for their formation and regulation. 

      Weaknesses: 

      Some of the results presented in this manuscript, in particular the kinetics of fusion between CPSF6 aggregates and SC35 speckles, have been published before (PMID: 32665593; 32997983). 

      The observations of the different effects of CPSF6 mutants, as well as SRRM2/SUN2 silencing experiments are not complemented by infection data which would have linked morphological changes in nuclear aggregates to function during viral infection. More importantly, these functional data could have helped stratify otherwise similar morphological appearances in CPSF6 aggregates. 

      Overall, the results could be presented in a more concise and ordered manner to help focus the attention of the reader on the most important issues. Most of the figures extend to 3-4 different pages and some information could be clearly either aggregated or moved to supplementary data. 

      First, we thank the reviewer for her/his appreciation of our study and for giving us the opportunity to better explain our results and improve our manuscript. We will do our best to address her/his concerns. In the meantime, we would like to clarify the focus of our study. Our research does not aim to demonstrate an association between CPSF6 condensates and nuclear speckles. (We use the term "condensates" rather than "aggregates" because aggregates are generally non-dynamic (Alberti & Hyman, 2021; Banani et al., 2017; Scoca et al., JMCB 2022), whereas our work specifically examines the dynamic behavior of CPSF6 puncta formed during infection.) The association between CPSF6 puncta and NS has already been established in previous studies, as noted in the manuscript (PMID: 32665593; 32997983). Those studies showed that CPSF6 puncta colocalize with SC35 upon HIV infection; in the submitted study, we examine their kinetics.

      About the point highlighted by the reviewer: "Kinetics of fusion between CPSF6-aggregates and SC35 speckles have been published before."  

      Our study differs from prior work (PMID: 32665593) because we use a full-length HIV genome and do not follow the fluorescence of integrase (IN) supplied in trans and its association with CPSF6. Instead, we specifically assess whether CPSF6 clusters form in the nucleus independently of NS factors and subsequently fuse with them. In the current study, we evaluated the dynamics of formation of CPSF6/NS puncta, which have not been explored before. Given this focus, we believe that our work offers a novel perspective on the molecular interactions that facilitate HIV/CPSF6-NS fusion.

      We calculated that 27% of CPSF6 clusters were independent from NS at 6 h post-infection, compared to only 9% at 30 h. This likely reflects a reduction in individual clusters as more become fused with nuclear speckles over time. At the same time, these data suggest that the fusion process can begin even earlier. Indeed, it has been reported that in macrophages, the peak of viral nuclear import occurs before 6 h post-infection (doi: 10.1038/s41564-020-0735-8).

      In addition, we have incorporated new experiments assessing viral infectivity in the absence of CPSF6, or in CPSF6-knockout cells expressing either a CPSF6 mutant lacking the FG peptide or the WT protein. As shown in our new Figure 5, these results demonstrate that the FG peptide is critical for viral replication in THP-1 cells.

      For better clarity, we would like to specify that our study focuses on the role of SON, a scaffold factor of nuclear speckles, rather than SUN2 (SUN domain-containing protein 2), which is a component of the LINC (Linker of Nucleoskeleton and Cytoskeleton) complex.

      As suggested by the reviewer, we have revised the text and combined figures to improve clarity and facilitate reader comprehension. We appreciate the constructive comment of the reviewer.

      Reviewer #3 (Public review): 

      In this study, the authors investigate the requirements for the formation of CPSF6 puncta induced by HIV-1 under a high multiplicity of infection conditions. Not surprisingly, they observe that mutation of the Phe-Gly (FG) repeat responsible for CPSF6 binding to the incoming HIV-1 capsid abrogates CPSF6 punctum formation. Perhaps more interestingly, they show that the removal of other domains of CPSF6, including the mixed-charge domain (MCD), does not affect the formation of HIV-1-induced CPSF6 puncta. The authors also present data suggesting that CPSF6 puncta form individually before fusing with nuclear speckles (NSs) and that the fusion of CPSF6 puncta to NSs requires the intrinsically disordered region (IDR) of the NS component SRRM2. While the study presents some interesting findings, there are some technical issues that need to be addressed and the amount of new information is somewhat limited. Also, the authors' finding that deletion of the CPSF6 MCD does not affect the formation of HIV-1-induced CPSF6 puncta contradicts recent findings of Jang et al. (doi.org/10.1093/nar/gkae769). 

      We thank the reviewer for her/his thoughtful feedback and the opportunity to elaborate on why our findings provide a distinct perspective compared to those of Jang et al. (doi.org/10.1093/nar/gkae769).

      One potential reason for the differences between our findings and those of Jang et al. could be the choice of experimental systems. Jang et al. conducted their study in HEK293T cells with CPSF6 knockouts, as described in Sowd et al., 2016 (doi.org/10.1073/pnas.1524213113). In contrast, our work focused on macrophage-like THP-1 cells, which share closer characteristics with HIV-1’s natural target cells. 

      Our approach utilized a complete CPSF6 knockout in THP-1 cells, enabling us to reintroduce untagged versions of CPSF6, such as wild-type and deletion mutants, to avoid potential artifacts from tagging. Jang et al. employed HA-tagged CPSF6 constructs, which may lead to subtle differences in experimental outcomes due to the presence of the tag.

      Finally, our investigation into the IDR of SRRM2 relied on CRISPR-PAINT to generate targeted deletions directly in the endogenous gene (Lester et al., 2021, DOI: 10.1016/j.neuron.2021.03.026). This approach provided a native context for studying SRRM2’s role.

      We will incorporate these clarifications into the discussion section of the revised manuscript.  

      Reviewer #1 (Recommendations for the authors): 

      (1) Figure 2E: The statistical analysis should be extended to the comparison between the "+HIV" samples. 

      We now show the statistics for the comparison between HIV+ samples only (new Fig. 2D).

      (2) Figure 4A top panel is out of focus. 

      We modified the figure (now Figure 6A).

      Reviewer #2 (Recommendations for the authors): 

      (1) Some of the sentences could be rewritten for the sake of simplicity, also taking care to avoid overstatement. 

      We modified the sentences as best as we could.

      (2) For instance: There is no evidence that "viral genomes in nuclear niches may be contributing to the formation of viral reservoirs" (lines 33-35). 

      We changed the sentence as follows: “Despite antiretroviral treatment, viral genomes can persist in these nuclear niches and reactivate upon treatment interruption, raising the possibility that they could play a role in the establishment of viral reservoirs.”

      (3) Line 53: unclear sentence. "The initial stages of the viral life cycle have been understood....." The authors certainly mean reverse transcription, but as formulated this is not clear. The authors should also bear in mind that reverse transcription starts already in budding/just released virions. 

      We clarified the concept as follows: “the initial stages of the viral life cycle, such as the reverse transcription (the conversion of the viral RNA in DNA) and the uncoating (loss of the capsid), have been understood to mainly occur within the host cytoplasm.”

      (4) Line 124: the results in Figure 1 are not at all explained in the text. PF74 does not act on CPSF6, it acts on CA and this in turn leads to CPSF6 puncta disappearance. 

      PF74 binds the same hydrophobic pocket of the viral core as CPSF6. However, when viral cores are located within CPSF6 puncta, treatment with a high dose of PF74 leads to a rapid disassembly of these puncta, while viral cores remain detectable up to 2 hours post-treatment (Ay et al., EMBO J. 2024). Here, we simply describe what we observed by confocal microscopy. That said, HIV-induced CPSF6 puncta include both CPSF6 proteins and viral cores, as we have now specified.

      (5) Line 130; 'hinges into two key ...' should be 'hinges on'. 

      Thanks, we modified it.

      (6) Supplementary Figures are not cited sequentially in the text. 

      We have now modified the numbers of the supplementary figures according to their appearance in the text.

      (7) Line 44: define FG. 

      We defined it.

      Reviewer #3 (Recommendations for the authors): 

      Specific comments that the authors should address are outlined below. 

      (1) As mentioned in the summary above, the authors' findings seem to be in direct contradiction with recent work published by Alan Engelman's lab in NAR. The authors should address the possible reason(s) for this discrepancy. 

      We mention the potential reasons for the differences in the results between our study and the Engelman lab's study in the discussion.

      (2) The major finding here that deletion of the CPSF6 FG repeat prevents the formation of CPSF6 puncta is unsurprising, as the FG repeat is responsible for capsid binding. This has been reported previously and such mutants have been used as controls in other studies. 

      Our study demonstrates that the FG domain is the sole region responsible for the formation of CPSF6 puncta, rather than the LCR or MCD domains. The unique role of the FG domain in promoting the formation of CPSF6 puncta during viral infection, without the help of the other IDRs, is a particularly novel finding, as it has not yet been reported in the literature.

      (3) Line 339, the authors state: "incoming viral RNA has been observed to be sequestered in nuclear niches in cells treated with the reversible reverse transcriptase inhibitor, NEV. When macrophage-like cells are infected in the presence of NEV, the incoming viral RNA is held within the nucleus (Rensen et al., 2021; Scoca et al., 2023). This scenario is comparable to what is observed in patients undergoing antiretroviral therapy". In what way is this comparable to what is observed in individuals on ART? I see no basis for this statement. Sequestration of viral RNA in the nucleus is not the basis for maintaining the viral reservoir in individuals on therapy. 

      Thanks, we rephrased the sentence.

      (4) General comment: analyzing single-cell-derived KO clones is very risky because of random clonal variability between individual cells in the population. If single-cell-derived clones are used, phenotypes could be confirmed with multiple, independent clones. 

      We used a clone that is a complete CPSF6 KO mainly to investigate the role of a specific domain in condensate formation, and it is unlikely that clone selection introduced artifacts in this context. Other available clones retain residual endogenous protein, which prevents us from accurately assessing CPSF6 cluster formation in the various deletion mutants. A complete CPSF6 knockout is essential for studying puncta formation, as it eliminates potential artifacts arising from protein tags that could alter the phase separation properties of the protein under investigation.

      (5) Line 214. "It is predicted to form two short α helices and a β strand, arranged as: α helix - FG - β strand - α helix". What is this based on? No citation is provided and no data are shown. 

      The statement "It is predicted to form two short α helices and a β strand, arranged as: α helix - FG - β strand - α helix" is based on the PSIPRED prediction shown in Figure 4E. 

      (6) Figure 1B. "Luciferase values were normalized by total proteins revealed with the Bradford kit". What does this mean? I couldn't find anything explaining how the viral inputs were normalized. 

      The amount of virus used is the same for all samples: we used an MOI of 10, as described in the legend of Figure 1. It is important to normalize the RLU (luciferase assay) to the total amount of protein to ensure that we are comparing similar numbers of cells. The cells were plated at the same density in each well, so in our case the normalization is simply an additional control.
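      The normalization described above amounts to simple arithmetic, sketched here for clarity. All numbers are hypothetical placeholders (they are not data from the study), and the function names are illustrative only.

```python
def normalize_rlu(rlu, total_protein_ug):
    """Luciferase signal per microgram of total protein (e.g. from a Bradford assay)."""
    return rlu / total_protein_ug


def percent_of_wt(sample_value, wt_value):
    """Express a normalized value as a percentage of the wild-type condition."""
    return 100.0 * sample_value / wt_value


# Hypothetical readings: raw RLU and total protein (micrograms) per well.
wt_norm = normalize_rlu(rlu=1000.0, total_protein_ug=20.0)
ko_norm = normalize_rlu(rlu=585.0, total_protein_ug=18.0)

print(f"WT: {wt_norm:.1f} RLU/ug, KO: {ko_norm:.1f} RLU/ug "
      f"({percent_of_wt(ko_norm, wt_norm):.0f}% of WT)")
```

      Dividing by total protein corrects for well-to-well differences in cell number, so that a lower signal reflects reduced infection rather than fewer cells.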

      (7) I can't interpret what is being shown in the movies. 

      We updated Movie 1B, rephrased the movie legends, and added a new Suppl. Fig. 4B.

      (8) Figure 5B. The differences seen are very small and of questionable significance. The data suggest that by 6 hpi, around 75% of HIV-induced CPSF6 puncta are already fused with NSs. 

      We calculated that 27% of CPSF6 clusters were independent from NS at 6 h post-infection, compared to only 9% at 30 h. This likely reflects a reduction in individual clusters as more become fused with nuclear speckles over time. At the same time, these data suggest that the fusion process can begin even earlier. Indeed, it has been reported that in macrophages, the peak of viral nuclear import occurs before 6 h post-infection (doi: 10.1038/s41564-020-0735-8).

      (9) Figure 6. Immunofluorescence is not a good method for quantifying KD efficiency. The authors should perform western blotting to measure KD efficiency. This is an important point, because the effect sizes are small, quite likely due to incomplete KD. 

      We performed WB and quantified the results, which correlated with the IF data and their imaging analysis. These new findings have been incorporated into Figure 8A. Of note, deletion of the IDR of SRRM2 does not affect the number of SON puncta (Fig.8C), but significantly reduces the number of CPSF6 puncta in infected cells compared to those expressing full-length SRRM2 (Fig.8D).

      (10) There are a variety of issues with the text that should be corrected. 

      The authors use "RT" to mean both the enzyme (reverse transcriptase) and the process (reverse transcription). This is incorrect and will confuse the reader. RT refers to the enzyme (noun, not verb). 

      The commonly used abbreviation for nevirapine is NVP, not NEV. 

      In line 60, it is stated that the capsid contains 250 hexamers. This number is variable, depending on the size and shape of the capsid. By contrast, the capsid has exactly 12 pentamers. 

      Line 75. Typo: "nuclear niches containing, such as like". 

      Line 82. Typo: "the mechanism behinds". 

      Line 102. Typo: "we aim to elucidate how these HIV-induced CPSF6 form". 

      Line 107. Typo: "CPSF6 is responsible for tracking the viral core" ("trafficking the viral core"?). 

      Thanks, we corrected all of them.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review): 

      Zhu and colleagues used high-density Neuropixels probes to perform laminar recordings in V1 while presenting either small stimuli that stimulated the classical receptive field (CRF) or large stimuli whose border straddled the RF to provide nonclassical RF (nCRF) stimulation. Their main question was to understand the relative contribution of feedforward (FF), feedback (FB), and horizontal circuits to border ownership (Bown), which they addressed by measuring cross-correlation across layers. They found differences in cross-correlation between feedback/horizontal (FH) and input layers during CRF and nCRF stimulation. 

      Although the data looks high quality and analyses look mostly fine, I had a lot of difficulty understanding the logic in many places. Examples of my concerns are written below. 

      (1) What is the main question? The authors refer to nCRF stimulation emerging from either feedback from higher areas or horizontal connections from within the same area (e.g. lines 136 to 138 and again lines 223-232). I initially thought that the study would aim to distinguish between the two. However, the way the authors have clubbed the layers in 3D, the main question seems to be whether Bown is FF or FH (i.e., feedback and horizontal are clubbed). Is this correct? If so, I don't see the logic, since I can't imagine Bown to be purely FF. Thus, just showing differences between CRF stimulation (which is mainly expected to be FF) and nCRF stimulation is not surprising to me. 

      We thank the reviewer for their thoughtful comments. As explained in the discussion, we grouped cortical layers to reduce uncertainty in precisely assigning laminar boundaries and to increase statistical power. Consequently, this limits our ability to distinguish the relative contributions of feedback inputs, primarily targeting layers 1 and 6, and horizontal connections, mainly within layers 2/3 and 5. Nevertheless, previous findings, especially regarding the rapid emergence of B<sub>own</sub> signals, suggest that feedback is more biologically plausible than horizontal-based mechanisms.

      Importantly, the emergence of B<sub>own</sub> signals in the primate brain should not be taken for granted. Direct physiological evidence that distinguishes feedforward from feedback/horizontal mechanisms has been lacking. While we agree it is unlikely that B<sub>own</sub> is mediated solely by feedforward processing, we felt it was necessary to test this empirically, particularly using high-resolution laminar recordings.

      As discussed, feedforward models of B<sub>own</sub> have been proposed (e.g., Super, Romeo, and Keil, 2010; Sakai and Nishimura, 2006). These could, in theory, be supported by more general nCRF modulations arising through early feedforward inhibition, such as that observed in the retinogeniculate pathway (e.g., Webb, Tinsley, Vincent and Derrington, 2005; Blitz and Regehr, 2005; Alitto and Usrey, 2008). However, most B<sub>own</sub> models rely heavily on response latency, yet very few studies have recorded across layers or areas simultaneously to address this directly. Notably, recent findings in area V4 show that B<sub>own</sub> signals emerge earlier in deep layers than in granular (input) layers, suggesting a non-feedforward origin (Franken and Reynolds, 2021).

      Furthermore, although previous studies have shown that the nCRF can modulate firing rates and the timing of neuronal firing across layers, our findings go beyond these effects. We provide clear evidence that nCRF modulation also alters precise spike timing relationships and interlaminar coordination, and that the magnitude of nCRF modulation depends on these interlaminar interactions. This supports the idea that B<sub>own</sub>, or more general nCRF modulation, involves more than local rate changes, reflecting layer-specific network dynamics consistent with feedback or lateral integration.

      (2) Choice of layers for cross-correlation analysis: In the Introduction, and also in Figure 3C, it is mentioned that FF inputs arrive in 4C and 6, while FB/Horizontal inputs arrive at "superficial" and "deep", which I take as layer 2/3 and 5. So it is not clear to me why (i) layer 4A/B is chosen for analysis for Figure 3D (I would have thought layer 6 should have been chosen instead) and (ii) why Layers 5 and 6 are clubbed. 

      We thank the reviewer for raising this important point. The confusion likely stems from our use of the terms “superficial” and “deep” layers when describing the targets of feedback/horizontal inputs. To clarify, by “superficial” and “deep,” we specifically refer to layers 1–3 and layers 5–6, respectively, as illustrated in Figure 3C. Feedback and horizontal inputs largely avoid the entirety of layer 4, including both 4C and 4A/B.

      We also emphasize that the classification of layers as feedforward or feedback/horizontal recipients is relative rather than absolute. For example, although layer 6 receives both feedforward and feedback/horizontal inputs, it contains a higher proportion of feedback/horizontal inputs compared to layers 4C and 4A/B. 

      We had addressed this rationale in the Discussion, but recognize it may not have been sufficiently emphasized. We have revised the main text accordingly to clarify this point for readers in the final manuscript version.

      (3) Addressing the main question using cross-correlation analysis: I think the nice peaks observed in Figure 3B for some pairs show how spiking in one neuron affects the spiking in another one, with the delay in cross-correlation function arising from the conduction delay. This is shown nicely during CRF stimulation in Figure 3D between 4C -> 2/3, for example. However, the delay (positive or negative) is constrained by anatomical connectivity. For example, unless there are projections from 2/3 back to 4C which causes firing in a 2/3 layer neuron to cause a spike in a layer 4 neuron, we cannot expect to get a negative delay no matter what kind of stimulation (CRF versus nCRF) is used. 

      We thank the reviewer for the insightful comment. The observation that neurons within FH<sub>i</sub> laminar compartments (layers 2/3, 5/6) can lead those in layer 4 (4C, 4A/B) during nCRF stimulation may indeed seem unexpected. However, several anatomical pathways could mediate the propagation of B<sub>own</sub> signals from FH<sub>i</sub> compartments to layer 4. We have revised the Discussion section in the final version of the manuscript to address this point explicitly.

      In Macaque V1, projections from layers 2/3 to 4A/B have been documented (Blasdel et al., 1985; Callaway and Wiser, 1996), and neurons in 4A/B often extend apical dendrites into layers 2/3 (Lund, 1988; Yoshioka et al., 1994). Although direct projections from layers 2/3 to 4C are generally sparse (Callaway, 1998), a subset of neurons in the lower part of layer 3 can give off collateral axons to 4C (Lund and Yoshioka, 1991). Additionally, some 4C neurons extend dendrites into 4B, enabling potential dendritic integration of inputs from more superficial layers (Somogyi and Cowey, 1981; Mates and Lund, 1983; Yabuta and Callaway, 1998). Sparse connections from 2/3 to layer 4 have also been reported in cat V1 (Binzegger, Douglas and Martin, 2004). Moreover, layers 2/3 may influence 4C neurons disynaptically, without requiring dense monosynaptic connections. 

      Importantly, while CCGs can suggest possible circuit arrangements, functional connectivity may arise through mechanisms not fully captured by traditional anatomical tracing. Indeed, the apparent discrepancy between anatomical and functional data is not uncommon. For example, although 4B is known to receive anatomical input primarily from 4Cα, but not 4Cβ, photostimulation experiments have shown that 4B neurons can also be functionally driven by 4Cβ (Sawatari and Callaway, 1996). Our observation of functional inputs from layers 2/3 to layer 4 is also consistent with prior findings in rodent V1, where CCG analysis (e.g., Figure 7 in Senzai, Fernandez-Ruiz and Buzsaki, 2019) or photostimulation (Xu et al., 2016) revealed similar pathways. 

      Layers 5/6 provide dense projections to layers 4A/B (Lund, 1988; Callaway, 1998). In particular, layer 6 pyramidal neurons, especially the subset classified as Type 1 cells, project substantially to layer 4C (Wiser and Callaway, 1996; Fitzpatrick et al., 1985). 

      Reviewer #2 (Public review): 

      Summary: 

      The authors present a study of how modulatory activity from outside the classical receptive field (cRF) differs from cRF stimulation. They study neural activity across the different layers of V1 in two anesthetized monkeys using Neuropixels probes. The monkeys are presented with drifting gratings and border-ownership tuning stimuli. They find that border-ownership tuning is organized into columns within V1, which is unexpected and exciting, and that the flow of activity from cell-to-cell (as judged by cross-correlograms between single units) is influenced by the type of visual stimulus: border-ownership tuning stimuli vs. drifting-grating stimuli.

      Strengths: 

      The questions addressed by the study are of high interest, and the use of Neuropixels probes yields extremely high numbers of single-units and cross-correlation histograms (CCHs) which makes the results robust. The study is well-described. 

      Weaknesses: 

      The weaknesses of the study are (a) the use of anesthetized animals, which raises questions about the nature of the modulatory signal being measured and the underlying logic of why a change in visual stimulus would produce a reversal in information flow through the cortical microcircuit and (b) the choice of visual stimuli, which do not uniquely isolate feedforward from feedback influences. 

      (1) The modulation latency seems quite short in Figure 2C. Have the authors measured the latency of the effect in the manuscript and how it compares to the onset of the visually driven response? It would be surprising if the latency was much shorter than 70ms given previous measurements of BO and figure-ground modulation latency in V2 and V1. On the same note, it might be revealing to make laminar profiles of the modulation (i.e. preferred - non-preferred border orientation) as it develops over time. Does the modulation start in feedback recipient layers? 

      (2) Can the authors show the average time course of the response elicited by preferred and non-preferred border ownership stimuli across all significant neurons?

      We thank the reviewer for the insightful comment—this is indeed an important and often overlooked point. As noted in the Discussion, B<sub>own</sub> modulation differs from other forms of figure-ground modulation (e.g., Lamme et al., 1998) in that it can emerge very rapidly in early visual cortex—within ~10–35 ms after response onset (Zhou et al., 2000; Sugihara et al., 2011). This rapid emergence has been interpreted as evidence for the involvement of fast feedback inputs, which can propagate up to ten times faster than horizontal connections (Girard et al., 2001). Moreover, interlaminar interactions via monosynaptic or disynaptic connections can occur on very short timescales (a few milliseconds), further complicating efforts to disentangle feedback influences based solely on latency.

      Thus, while the early onset of modulation in our data may appear surprising, it is consistent with prior B<sub>own</sub> findings, and likely reflects a combination of fast feedback and rapid interlaminar processing. This makes it challenging to use conventional latency measurements to resolve laminar differences in B<sub>own</sub> modulation. Latency comparisons are well known to be susceptible to confounds such as variability in response onset, luminance, contrast, stimulus size, and other sensory parameters. 

      Although we did not explicitly quantify the latency of B<sub>own</sub> modulation in this manuscript, our cross-correlation analysis provides a more sensitive and temporally resolved measure of interlaminar information flow. We therefore focused on this approach rather than laminar modulation profiles, as it more directly addresses our primary research question.

      (3) The logic of assuming that cRF stimulation should produce the opposite signal flow to border-ownership tuning stimuli is worth discussing. I suspect the key difference between stimuli is that they used drifting gratings as the cRF stimulus: the movement of the stimulus continually refreshes the retinal image, leading to continuous feedforward dominance of the signals in V1. Had they used a static grating, the spiking during the sustained portion of the response might also show more influence of feedback/horizontal connections. Do the initial spikes fired in response to the border-ownership tuning stimuli show the feedforward pattern of responses? The authors state that they did not look at cross-correlations during the initial response, but if they do, do they see the feedforward-dominated pattern? The jitter CCH analysis might suffice in correcting for the response transient.

      We thank the reviewer for the insightful comment. As noted in the final Results section, our CRF and nCRF stimulation paradigms differ in respects beyond the presence or absence of nonclassical modulation, including stimulus properties within the CRF.

      We agree with the reviewer’s speculation that drifting gratings may continually refresh the retinal image, promoting sustained feedforward dominance in V1, whereas static gratings might allow greater influence from feedback/horizontal inputs during the sustained response. Likewise, the initial response to the B<sub>own</sub> stimulus could be dominated by feedforward activity before feedback/horizontal influences arrive. 

      This contrast was a central motivation for our experimental design: we deliberately used two stimulus conditions — drifting gratings to emphasize feedforward processing, and B<sub>own</sub> stimuli, which are known to engage feedback modulation — to test whether these two conditions yield different patterns of interlaminar information flow. Our results confirm that they do. While we did not separately analyze the very initial spike period, our focus is on interlaminar information flow during the sustained response, which serves as the primary measure of feedback/horizontal engagement in this study.

      Finally, beyond this direct comparison, we show in Figure 5 that under nCRF stimulation alone, the direction and strength of interlaminar information flow correlate with the magnitude of B<sub>own</sub> modulation, further supporting the idea that our cross-correlation approach reveals functionally meaningful differences in cortical processing.

      (4) The term "nCRF stimulation" is not appropriate because the CRF is stimulated by the light/dark edge. 

      We thank the reviewer for the comment. As noted in the Introduction, nCRF effects described in the literature invariably involve stimulation both inside and outside the CRF. Our use of the term “nCRF stimulation” refers to this experimental paradigm, rather than suggesting that the CRF itself is unstimulated. We hope this clarifies our use of the term.

      Reviewer #3 (Public review): 

      Summary: 

      The paper by Zhu et al is on an important topic in visual neuroscience, the emergence in the visual cortex of signals about figures and ground. This topic also goes by the name border ownership. The paper utilizes modern recording techniques very skillfully to extend what is known about border ownership. It offers new evidence about the prevalence of border ownership signals across different cortical layers in V1 cortex. Also, it uses pairwise cross-correlation to study signal flow under different conditions of visual stimulation that include the border ownership paradigm. 

      Strengths: 

      The paper's strengths are its use of multi-electrode probes to study border ownership in many neurons simultaneously across the cortical layers in V1, and its innovation of using cross-correlation between cortical neurons -- when they are viewing border-ownership patterns or instead are viewing grating patterns restricted to the classical receptive field (CRF).

      Weaknesses: 

      The paper's weaknesses are its largely incremental approach to the study of border ownership and the lack of a critical analysis of the cross-correlation data. The paper as it is now does not advance our understanding of border ownership; it mainly confirms prior work, and it does not challenge or revise consensus beliefs about mechanisms. However, it is possible that, in the rich dataset the authors have obtained, they do possess data that could be added to the paper to make it much stronger. 

      Critique: 

      The border ownership data on V1 offered in the paper replicates experimental results obtained by Zhou and von der Heydt (2000) and confirms the earlier results using the same analysis methods as Zhou. The incremental addition is that the authors found border ownership in all cortical layers, extending Zhou's results that were only about layer 2/3.

      The cross-correlation results show that the pattern of the cross-correlogram (CCG) is influenced by the visual pattern being presented. However, the results are not analyzed mechanistically, and the interpretation is unclear. For instance, the authors show in Figure 3 (and in Figure S2) that the peak of the CCG can indicate layer 2/3 excites layer 4C when the visual stimulus is the border ownership test pattern, a large square 8 deg on a side. But how can layer 2/3 excite layer 4C? The authors do not raise or offer an answer to this question. Similar questions arise when considering the CCG of layer 4A/B with layer 2/3. What is the proposed pathway for layer 2/3 to excite 4A/B? Other similar questions arise for all the interlaminar CCG data that are presented. What known functional connections would account for the measured CCGs? 

      We thank the reviewer for raising this important point. As noted in our response to a previous comment, several anatomical pathways could mediate apparent functional inputs from layers 2/3 to 4C and 4A/B. In macaque V1, projections from layers 2/3 to 4A/B have been documented (Blasdel et al., 1985; Callaway and Wiser, 1996), and neurons in 4A/B often extend apical dendrites into layers 2/3 (Lund, 1988; Yoshioka et al., 1994). Although direct projections from layers 2/3 to 4C are generally sparse (Callaway, 1998), a subset of lower layer 3 neurons can give off collateral axons to 4C (Lund and Yoshioka, 1991). Some 4C neurons also extend dendrites into 4B, potentially allowing dendritic integration of inputs from more superficial layers (Somogyi and Cowey, 1981; Mates and Lund, 1983; Yabuta and Callaway, 1998). Sparse connections from 2/3 to layer 4 have also been reported in cat V1 (Binzegger et al., 2004).

      Moreover, layers 2/3 may influence 4C neurons disynaptically, without requiring dense monosynaptic connections. While CCGs suggest possible circuit arrangements, functional connectivity may arise through mechanisms not fully captured by anatomical tracing, and apparent discrepancies between anatomical and functional data are not uncommon. For example, although 4B is known to receive anatomical input primarily from 4Cα, 4B neurons can also be functionally driven by 4Cβ using photostimulation (Sawatari and Callaway, 1996). Our observation of functional inputs from layers 2/3 to layer 4 is also consistent with prior findings in rodent V1, where CCG analysis (e.g., Figure 7 in Senzai, Fernandez-Ruiz and Buzsaki, 2019) or photostimulation (Xu et al., 2016) revealed similar pathways. 

      Layers 5/6 also provide dense projections to layers 4A/B (Lund, 1988; Callaway, 1998). In particular, layer 6 pyramidal neurons, especially the subset classified as Type 1 cells, project substantially to layer 4C (Wiser and Callaway, 1996; Fitzpatrick et al., 1985). 

      We have revised the Discussion section to explicitly address these points and clarify the potential anatomical and functional pathways underlying the measured interlaminar CCGs, highlighting how inputs from layers 2/3 and 5/6 to layer 4 can be mediated via both direct and indirect connections.

      The problems in understanding the CCG data are indirectly caused by the lack of a critical analysis of what is happening in the responses that reveal the border ownership signals, as in Figure 2. Let's put it bluntly - are border ownership signals excitatory or inhibitory? The reason I raise this question is that the present authors insightfully place border ownership as examples of the action of the non-classical receptive field (nCRF) of cortical cells. Most previous work on the nCRF (many papers cited by the authors) reveals the nCRF to be inhibitory or suppressive. In order to know whether nCRF signals are excitatory or inhibitory, one needs a baseline response from the CRF, so that when you introduce nCRF signals you can tell whether the change with respect to the CRF is up or down. As far as I know, prior work on border ownership has not addressed this question, and the present paper doesn't either. This is where the rich dataset that the present authors possess might be used to establish a fundamental property of border ownership.

      Then we must go back to consider what the consequences of knowing the sign of the border ownership signal would mean for interpreting the CCG data. If the border ownership signals from extrastriate feedback or, alternatively, from horizontal intrinsic connections, are excitatory, they might provide a shared excitatory input to pairs of cells that would show up in the CCG as a peak at 0 delay. However, if the border ownership signals are inhibitory, they might work by exciting only inhibitory neurons in V1. This could have complicated consequences for the CCG. The interpretation of the CCG data in the present version of the manuscript is unclear (see above). Perhaps a clearer interpretation could be developed once the authors know better what the border ownership signals are.

      We thank the reviewer for raising this fundamental and thought-provoking question. As noted, B<sub>own</sub> signals arise from nCRF, which has often been associated with suppressive effects. However, Zhang and von der Heydt (2010) provided important insight into this issue by systematically varying the placement of figure fragments outside the CRF while keeping an edge centered within the CRF. They found that contextual fragments on the preferred side of B<sub>own</sub> produce facilitation, while those on the non-preferred side produce suppression. Thus, the nCRF contribution to B<sub>own</sub> reflects both excitatory and inhibitory modulation, depending on the spatial configuration of the figure.

      These effects were well explained by their model in which feedback from grouping cells in higher areas selectively enhances or suppresses V1/V2 neuron responses, depending on their B<sub>own</sub> preference. In this framework, the B<sub>own</sub> signal itself is not inherently excitatory or inhibitory; rather, it results from the net effect of feedback, which can be either facilitative or suppressive. Importantly, it is the input that is modulated — not that the receiving neurons are necessarily inhibitory themselves.

      In the current study, our analysis focused on CCGs showing excessive coincident spiking, i.e., positive peaks, which are typically interpreted as evidence for shared excitatory input or excitatory connections. Due to the limited number of connections, we did not analyze inhibitory interactions, such as anti-correlations or delayed suppression in the CCGs, which would be expected if the reference neuron were inhibitory. Therefore, the CCGs we report here likely reflect the excitatory component of the B<sub>own</sub> signal, and possibly its upstream drive via feedback. While a full separation of excitatory and inhibitory components remains an important goal for future work, our data suggest that B<sub>own</sub> modulation is at least partially mediated through excitatory feedback input.

      My critique of the CCG analysis applies to Figure 5 also. I cannot comprehend the point of showing a very weak correlation of CCG asymmetry with Border Ownership Index, especially when what CCG asymmetry means is unclear mechanistically. Figure 5 does not make the paper stronger in my opinion. 

      We thank the reviewer for this comment. As described in the Results section for Figure 5, the observation that interlaminar information flow correlates with B<sub>own</sub> modulation is important because it demonstrates that these flow patterns are specifically related to the magnitude of B<sub>own</sub> signals, independent of the comparisons between CRF and nCRF stimulation. 

      In Figure 3, the authors show two CCGs that involve 4C--4C pairs. It would be nice to know more about such pairs. If there are any 6--6 pairs, what they look like also would be interesting. The authors also in Figure 3 show CCGs of two 4C--4A/B pairs and it would be quite interesting to know how such CCGs behave when CRF and nCRF stimuli are compared. In other words, the authors have shown us they have many data but have chosen not to analyze them further or to explain why they chose not to analyze them. It might help the paper if the authors would present all the CCG types they have. This suggestion would be helpful when the authors know more about the sign of border ownership signals, as discussed at length above.

      We thank the reviewer for the insightful comment. The rationale for selecting specific laminar pairs is described in the Results section after Figure 3C and further discussed in the Discussion. In brief, we focused on CCGs computed from pairs in which one neuron resided in laminar compartments receiving feedback/horizontal inputs (layers 2/3 and 5/6) and the other within compartments relatively devoid of these inputs (layers 4C and 4A/B).

      To mitigate uncertainty in defining exact laminar boundaries and to maximize statistical power, we combined some anatomical layers into distinct laminar compartments. This approach allowed us to compare the relative spike timing between neuronal pairs during CRF and nCRF stimulation. If feedback/horizontal inputs contribute more during nCRF than CRF stimulation, we expect this to be reflected in the lead-lag relationships of the CCGs. While other pairs (e.g., 5/6–5/6 or 4C–4A/B) could in principle be analyzed, the hypothesized patterns for these pairs are less clear, and thus they were not the focus of our study. Nonetheless, these additional pairs represent interesting directions for future work.
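
      The lead-lag reading of a CCG described above can be illustrated with a minimal sketch. It is not the analysis pipeline used in the manuscript: the function names, the 10 ms window, the 1 ms bins, and the asymmetry index are all assumptions chosen for illustration, and no jitter correction is applied.

```python
from bisect import bisect_left


def cross_correlogram(ref_spikes, target_spikes, max_lag_ms=10, bin_ms=1):
    """Count target-minus-reference spike-time differences in fixed lag bins.

    Spike times are in milliseconds. A positive lag means the target neuron
    fired after the reference neuron, i.e. the reference leads.
    (Hypothetical helper; window and bin width are illustrative defaults.)
    """
    target = sorted(target_spikes)
    n_bins = int(round(2 * max_lag_ms / bin_ms))
    counts = [0] * n_bins
    for t in ref_spikes:
        # Half-open window [t - max_lag, t + max_lag) around each reference spike.
        lo = bisect_left(target, t - max_lag_ms)
        hi = bisect_left(target, t + max_lag_ms)
        for s in target[lo:hi]:
            idx = int((s - t + max_lag_ms) // bin_ms)
            idx = min(max(idx, 0), n_bins - 1)  # guard against rounding at the edges
            counts[idx] += 1
    lags = [-max_lag_ms + (i + 0.5) * bin_ms for i in range(n_bins)]  # bin centres
    return lags, counts


def lead_lag_asymmetry(lags, counts):
    """(right - left) / (right + left); positive when the reference tends to lead.

    This index is an assumption for illustration, not the manuscript's metric.
    """
    right = sum(c for lag, c in zip(lags, counts) if lag > 0)
    left = sum(c for lag, c in zip(lags, counts) if lag < 0)
    total = right + left
    return (right - left) / total if total else 0.0


# Synthetic pair: the target fires 2.5 ms after every reference spike.
ref = [100.0 * i for i in range(50)]
target = [t + 2.5 for t in ref]
lags, counts = cross_correlogram(ref, target)
peak_lag = lags[counts.index(max(counts))]
print(peak_lag, lead_lag_asymmetry(lags, counts))
```

      In this toy case the peak sits at a positive lag, which would be read as the reference neuron leading the target. Real data would additionally require correction for stimulus-locked and slow co-fluctuations.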

    1. Author response:

      The following is the authors’ response to the original reviews

      We thank all the reviewers for their constructive comments. We have carefully considered your feedback and revised the manuscript accordingly. The major concern raised was the applicability of SegPore to the RNA004 dataset. To address this, we compared SegPore with f5c and Uncalled4 on RNA004, and found that SegPore demonstrated improved performance, as shown in Table 2 of the revised manuscript.

      Following the reviewers’ recommendations, we updated Figures 3 and 4. Additionally, we added one table and three supplementary figures to the revised manuscript:

      · Table 2: Segmentation benchmark on RNA004 data

      · Supplementary Figure S4: RNA translocation hypothesis illustrated on RNA004 data

      · Supplementary Figure S5: Illustration of Nanopolish raw signal segmentation with eventalign results

      · Supplementary Figure S6: Running time of SegPore on datasets of varying sizes

      Below, we provide a point-by-point response to your comments.

      Reviewer #1 (Public review):

      Summary:

      In this manuscript, the authors describe a new computational method (SegPore), which segments the raw signal from nanopore-direct RNA-Seq data to improve the identification of RNA modifications. In addition to signal segmentation, SegPore includes a Gaussian Mixture Model approach to differentiate modified and unmodified bases. SegPore uses Nanopolish to define a first segmentation, which is then refined into base and transition blocks. SegPore also includes a modification prediction model that is included in the output. The authors evaluate the segmentation in comparison to Nanopolish and Tombo, and they evaluate the impact on m6A RNA modification detection using data with known m6A sites. In comparison to existing methods, SegPore appears to improve the ability to detect m6A, suggesting that this approach could be used to improve the analysis of direct RNA-Seq data.

      Strengths:

      SegPore addresses an important problem (signal data segmentation). By refining the signal into transition and base blocks, noise appears to be reduced, leading to improved m6A identification at the site level as well as for single-read predictions. The authors provide a fully documented implementation, including a GPU version that reduces run time. The authors provide a detailed methods description, and the approach to refine segments appears to be new.

      Weaknesses:

      In addition to Nanopolish and Tombo, f5c and Uncalled4 can also be used for segmentation; however, a comparison to these methods is not shown.

      The method was only applied to data from the RNA002 direct RNA-Sequencing version, which is no longer available; it currently remains unclear whether the method still works on RNA004.

      Thank you for your comments.

      To clarify the background, there are two kits for Nanopore direct RNA sequencing: RNA002 (the older version) and RNA004 (the newer version). Oxford Nanopore Technologies (ONT) introduced the RNA004 kit in early 2024 and has since discontinued RNA002. Consequently, most public datasets are based on RNA002, with relatively few available for RNA004 (as of 30 June 2025).

      Nanopolish and Tombo were developed for raw signal segmentation and alignment using RNA002 data, whereas f5c and Uncalled4 are the only two tools that support RNA004 data. Since the development of SegPore began in January 2022, we initially focused on RNA002 due to its data availability. Accordingly, our original comparisons were made against Nanopolish and Tombo using RNA002 data.

      We have now updated SegPore to support RNA004 and compared its performance against f5c and Uncalled4 on three public RNA004 datasets.

      As shown in Table 2 of the revised manuscript, SegPore outperforms both f5c and Uncalled4 in raw signal segmentation. Moreover, the jiggling translocation hypothesis underlying SegPore is further supported, as shown in Supplementary Figure S4.

      The overall improvement in accuracy appears to be relatively small.

      Thank you for the comment.

      We understand that the improvements shown in Tables 1 and 2 may appear modest at first glance due to the small differences in the reported standard deviation (std) values. However, even small absolute changes in std can correspond to substantial relative reductions in noise, especially when the total variance is low.

      To better quantify the improvement, we assume that approximately 20% of the std for Nanopolish, Tombo, f5c, and Uncalled4 arises from noise. Using this assumption, we calculate the relative noise reduction rate of SegPore as follows:

      Noise reduction rate = (baseline std − SegPore std) / (0.2 × baseline std)

      Based on this formula, the average noise reduction rates across all datasets are:

      - SegPore vs Nanopolish: 49.52%

      - SegPore vs Tombo: 167.80%

      - SegPore vs f5c: 9.44%

      - SegPore vs Uncalled4: 136.70%

      These results indicate that, under the assumed 20% noise level, SegPore reduces noise by at least 9% relative to every baseline, which we consider a meaningful improvement for downstream tasks such as base modification detection and signal interpretation. Rates above 100%, as observed against Tombo and Uncalled4, mean that the absolute reduction in std exceeds 20% of the baseline std, which suggests that the actual noise proportion for these tools is higher than our 20% assumption.

      We acknowledge that this 20% noise level assumption is an approximation. Our intention is to illustrate that SegPore provides measurable improvements in relative terms, even when absolute differences appear small.
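
      As a worked illustration of the formula above, the following sketch evaluates the noise reduction rate for made-up std values. The numbers are hypothetical and are not taken from Tables 1 or 2; the function name and the `noise_fraction` parameter are likewise assumptions introduced here.

```python
def noise_reduction_rate(baseline_std, segpore_std, noise_fraction=0.2):
    """Relative noise reduction, assuming `noise_fraction` of the baseline std is noise.

    Implements: (baseline std - SegPore std) / (noise_fraction * baseline std).
    """
    if baseline_std <= 0:
        raise ValueError("baseline_std must be positive")
    if not 0 < noise_fraction <= 1:
        raise ValueError("noise_fraction must be in (0, 1]")
    return (baseline_std - segpore_std) / (noise_fraction * baseline_std)


# Hypothetical values: a 5% drop in std corresponds to a 25% noise reduction
# when only 20% of the baseline std is attributed to noise.
print(f"{noise_reduction_rate(4.0, 3.8):.2%}")

# A drop larger than the assumed noise share yields a rate above 100%,
# signalling that the 20% assumption is too low for that baseline.
print(f"{noise_reduction_rate(4.0, 3.0):.2%}")
```

      The second case shows why rates above 100% are read as evidence that the assumed noise share underestimates the true one.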

      The run time and resources that are required to run SegPore are not shown; however, it appears that the GPU version is essential, which could limit the application of this method in practice.

      Thank you for your comment.

      Detailed instructions for running SegPore are provided on GitHub (https://github.com/guangzhaocs/SegPore). Regarding computational resources, SegPore currently requires one CPU core and one NVIDIA GPU to perform the segmentation task efficiently.

      We present SegPore’s runtime for typical datasets in Supplementary Figure S6 in the revised manuscript. For a typical 1 GB fast5 file, the segmentation takes approximately 9.4 hours using a single NVIDIA DGX‑1 V100 GPU and one CPU core.

      Currently, GPU acceleration is essential to achieve practical runtimes with SegPore. We acknowledge that this requirement may limit accessibility in some environments. To address this, we are actively working on a full C++ implementation of SegPore that will support CPU-only execution. While development is ongoing, we aim to release this version in a future update.

      Reviewer #2 (Public review):

      Summary:

      The work seeks to improve the detection of RNA m6A modifications using Nanopore sequencing through improvements in raw data analysis. These improvements are said to be in the segmentation of the raw data, although the work appears to position the alignment of raw data to the reference sequence and some further processing as part of the segmentation, and result statistics are mostly shown on the 'data-assigned-to-kmer' level.

      As such, the title, abstract, and introduction stating the improvement of just the 'segmentation' does not seem to match the work the manuscript actually presents, as the wording seems a bit too limited for the work involved.

      The work itself shows minor improvements in m6Anet when replacing Nanopolish eventalign with this new approach, but clear improvements in the distributions of data assigned per kmer. However, these assignments were improved well enough to enable m6A calling from them directly, both at site-level and at read-level.

      Strengths:

      A large part of the improvements shown appear to stem from the addition of extra, non-base/kmer specific, states in the segmentation/assignment of the raw data, removing a significant portion of what can be considered technical noise for further analysis. Previous methods enforced the assignment of all raw data, forcing a technically optimal alignment that may lead to suboptimal results in downstream processing as data points could be assigned to neighbouring kmers instead, while random noise that is assigned to the correct kmer may also lead to errors in modification detection.

      For an optimal alignment between the raw signal and the reference sequence, this approach may yield improvements for downstream processing using other tools.

      Additionally, the GMM used for calling the m6A modifications provides a useful, simple, and understandable logic to explain the reason a modification was called, as opposed to the black-box models that are nowadays often employed for these types of tasks.
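
      The interpretability of a two-component Gaussian mixture can be seen in a minimal sketch such as the one below. It is not SegPore's implementation: the function names, the component parameters, and the equal mixing weight are hypothetical, and the sketch only shows how a call reduces to comparing two weighted Gaussian densities at the observed signal level.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of a normal distribution N(mean, std^2) at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def modified_posterior(x, unmod, mod, mod_weight=0.5):
    """P(modified | x) under a two-component Gaussian mixture.

    `unmod` and `mod` are (mean, std) tuples for the two components;
    `mod_weight` is the mixing weight of the modified component.
    All values are illustrative placeholders.
    """
    p_mod = mod_weight * gaussian_pdf(x, *mod)
    p_unmod = (1.0 - mod_weight) * gaussian_pdf(x, *unmod)
    total = p_mod + p_unmod
    return p_mod / total if total > 0 else 0.5


# Hypothetical k-mer: unmodified current centred at 100, modified at 110.
unmod_params = (100.0, 2.0)
mod_params = (110.0, 2.0)
for level in (101.0, 105.0, 109.0):
    print(level, round(modified_posterior(level, unmod_params, mod_params), 4))
```

      A call can then be explained directly: the observed level lies closer, in units of standard deviation, to one component than to the other.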

      Weaknesses:

      The work seems limited in applicability largely due to the focus on the R9's 5mer models. The R9 flow cells are phased out and not available to buy anymore. Instead, the R10 flow cells with larger kmer models are the new standard, and the applicability of this tool on such data is not shown. We may expect similar behaviour from the raw sequencing data where the noise and transition states are still helpful, but the increased kmer size introduces a large amount of extra computing required to process data and without knowledge of how SegPore scales, it is difficult to tell how useful it will really be. The discussion suggests possible accuracy improvements moving to 7mers or 9mers, but gives no reason why this was not attempted.

      Thank you for pointing out this important limitation. Please refer to our response to Point 1 of Reviewer 1 for SegPore’s performance on RNA004 data. Notably, the jiggling behavior is also observed in RNA004 data, and SegPore achieves better performance than both f5c and Uncalled4.

      The increased k-mer size in RNA004 affects only the training phase of SegPore (refer to Supplementary Note 1, Figure 5 for details on the training and testing phases). Once the baseline means and standard deviations for each k-mer are established, applying SegPore to RNA004 data proceeds similarly to RNA002. This is because each k-mer in the reference sequence has, at most, two states (modified and unmodified). While the larger k-mer size increases the size of the parameter table, it does not increase the computational complexity during segmentation. Although estimating the initial k-mer parameter table requires significant time and effort on our part, it does not affect the runtime for end users applying SegPore to RNA004 data.
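
      The scaling argument above can be illustrated with a small sketch. The table layout, the function names, and the placeholder (mean, std) values are assumptions made for illustration and do not describe SegPore's internal data structures. The point is only that the table grows as 4^k, whereas the number of lookups for a given reference depends on its length.

```python
from itertools import product


def build_kmer_table(k, default=(0.0, 1.0)):
    """Map every k-mer over A/C/G/U to placeholder (mean, std) parameters."""
    return {"".join(p): default for p in product("ACGU", repeat=k)}


def lookup_reference_kmers(reference, table, k):
    """One dictionary lookup per reference position, independent of table size."""
    return [table[reference[i:i + k]] for i in range(len(reference) - k + 1)]


reference = "ACGUACGUACGUACGU"  # hypothetical 16-nt reference
for k in (5, 7):
    table = build_kmer_table(k)
    params = lookup_reference_kmers(reference, table, k)
    print(k, len(table), len(params))

# Table size for a 9-mer model, computed without building the table.
print(9, 4 ** 9)
```

      Moving from 5-mers to larger k-mers therefore mainly increases memory for the parameter table and the effort needed to estimate its entries, in line with the statement that the cost falls on training and not on segmentation.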

      Extending SegPore from 5-mers to 7-mers or 9-mers for RNA002 data would require substantial effort to retrain the model and generate sufficient training data. Additionally, such an extension would make SegPore’s output incompatible with widely used upstream and downstream tools such as Nanopolish and m6Anet, complicating integration and comparison. For these reasons, we leave this extension for future work.

      The manuscript suggests the eventalign results are improved compared to Nanopolish. While this is believably shown to be true (Table 1), the effect on the use case presented, downstream differentiation between modified and unmodified status on a base/kmer, is likely limited as during actual modification calling the noisy distributions are usually 'good enough', and not skewed significantly in one direction to really affect the results too terribly.

      Thank you for your comment. While current state-of-the-art (SOTA) methods perform well on benchmark datasets, there remains significant room for improvement. Most SOTA evaluations are based on limited datasets, primarily covering DRACH motifs in human and mouse transcriptomes. However, m6A modifications can also occur in non-DRACH motifs, where current models may underperform. Additionally, other RNA modifications—such as pseudouridine, inosine, and m5C—are less studied, and their detection may benefit from improved signal modeling.

      We would also like to emphasize that raw signal segmentation and RNA modification detection are distinct tasks. SegPore focuses on the former, providing a cleaner, more interpretable signal that can serve as a foundation for downstream tasks. Improved segmentation may facilitate the development of more accurate RNA modification detection algorithms by the community.

      Scientific progress often builds incrementally through targeted improvements to foundational components. We believe that enhancing signal segmentation, as SegPore does, contributes meaningfully to the broader field—the full impact will become clearer as the tool is adopted into more complex workflows.

      Furthermore, looking at alternative approaches where this kind of segmentation could be applied, Nanopolish uses the main segmentation+alignment for a first alignment and follows up with a form of targeted local realignment/HMM test for modification calling (and for training too), decreasing the need for the near-perfect segmentation+alignment this work attempts to provide. Any tool applying a similar strategy probably largely negates the problems this manuscript aims to improve upon.

      We thank the reviewer for this insightful comment.

      To clarify, Nanopolish provides three independent commands: polya, eventalign, and call-methylation.

      - The polya command identifies the adapter, poly(A) tail, and transcript region in the raw signal.

      - The eventalign command aligns the raw signal to a reference sequence, assigning a signal segment to individual k-mers in the reference.

      - The call-methylation command detects methylated bases from DNA sequencing data.

      The eventalign command corresponds to “the main segmentation+alignment for a first alignment,” while call-methylation corresponds to “a form of targeted local realignment/HMM test for modification calling,” as mentioned in the reviewer’s comment. SegPore’s segmentation is similar in purpose to Nanopolish’s eventalign, while its RNA modification estimation component is similar in concept to Nanopolish’s call-methylation.

      We agree the general idea may appear similar, but the implementations are entirely different. Importantly, Nanopolish’s call-methylation is designed for DNA sequencing data, and its models are not trained to recognize RNA modifications. This means they address distinct research questions and cannot be directly compared on the same RNA modification estimation task. However, it is valid to compare them on the segmentation task, where SegPore exhibits better performance (Table 1).

      We infer the reviewer may suggest that because m6Anet is a deep neural network capable of learning from noisy input, the benefit of more accurate segmentation (such as that provided by SegPore) might be limited. This concern may arise from the limited improvement of SegPore+m6Anet over Nanopolish+m6Anet in bulk analysis (Figure 3). Several factors may contribute to this observation:

      (i) For reads aligned to the same gene in the in vivo data, alignment may be inaccurate due to pseudogenes or transcript isoforms.

      (ii) The in vivo benchmark data are inherently more complex than in vitro datasets and may contain additional modifications (e.g., m5C, m7G), which can confound m6A calling by altering the signal baselines of k-mers.

      (iii) m6Anet is trained on events produced by Nanopolish and may not be optimal for SegPore-derived events.

      (iv) The benchmark dataset lacks a modification-free (IVT) control sample, making it difficult to establish a true baseline for each k-mer.

      In the IVT data (Figure 4), SegPore shows a clear improvement in single-molecule m6A identification, with a 3~4% gain in both ROC-AUC and PR-AUC. This demonstrates SegPore’s practical benefit for applications requiring higher sensitivity at the molecule level.

      As noted earlier, SegPore’s contribution lies in denoising and improving the accuracy of raw signal segmentation, which is a foundational step in many downstream analyses. While it may not yet lead to a dramatic improvement in all applications, it already provides valuable insights into the sequencing process (e.g., cleaner signal profiles in Figure 4) and enables measurable gains in modification detection at the single-read level. We believe SegPore lays the groundwork for developing more accurate and generalizable RNA modification detection tools beyond m6A.

      We have also added the following sentence in the discussion to highlight SegPore’s limited performance in bulk analysis:

      “The limited improvement of SegPore combined with m6Anet over Nanopolish+m6Anet in bulk in vivo analysis (Figure 3) may be explained by several factors: potential alignment inaccuracies due to pseudogenes or transcript isoforms, the complexity of in vivo datasets containing additional RNA modifications (e.g., m5C, m7G) affecting signal baselines, and the fact that m6Anet is specifically trained on events produced by Nanopolish rather than SegPore. Additionally, the lack of a modification-free control (in vitro transcribed) sample in the benchmark dataset makes it difficult to establish true baselines for each k-mer. Despite these limitations, SegPore demonstrates clear improvement in single-molecule m6A identification in IVT data (Figure 4), suggesting it is particularly well suited for in vitro transcription data analysis.”

      Finally, in the segmentation/alignment comparison to Nanopolish, the latter was not fitted(/trained) on the same data but appears to use the pre-trained model it comes with. For the sake of comparing segmentation/alignment quality directly, fitting Nanopolish on the same data used for SegPore could remove the influences of using different training datasets and focus on differences stemming from the algorithm itself.

      In the segmentation benchmark (Table 1), SegPore uses the fixed 5-mer parameter table provided by ONT. The hyperparameters of the HHMM are also fixed and not estimated from the raw signal data being segmented. Only in the m6A modification task does SegPore re-estimate the baselines for the modified and unmodified states of k-mers. Therefore, the comparison with Nanopolish is fair, as both tools rely on pre-defined models during segmentation.

      Appraisal:

      The authors have shown their method's ability to identify noise in the raw signal and remove the noisy values from the segmentation and alignment, reducing their influence on further analyses. Figures directly comparing the values per kmer do show a visibly improved assignment of raw data per kmer. As a replacement for Nanopolish eventalign it seems to have a rather limited, but positive, effect on m6Anet results. For modification calling at the single-read level, this work does appear to improve upon CHEUI.

      Impact:

      With the current developments for Nanopore-based modification largely focusing on Artificial Intelligence, Neural Networks, and the like, improvements made in interpretable approaches provide an important alternative that enables a deeper understanding of the data rather than providing a tool that plainly answers the question of whether a base is modified or not, without further explanation. The work presented is best viewed in the context of a workflow where one aims to get an optimal alignment between raw signal data and the reference base sequence for further processing. For example, as presented, as a possible replacement for Nanopolish eventalign. Here it might enable data exploration and downstream modification calling without the need for local realignments or other approaches that re-consider the distribution of raw data around the target motif, such as a 'local' Hidden Markov Model or Neural Networks. These possibilities are useful for a deeper understanding of the data and further tool development for modification detection works beyond m6A calling.

      Reviewer #3 (Public review):

      Summary:

      Nucleotide modifications are important regulators of biological function; however, until recently, their study has been limited by the availability of appropriate analytical methods. Oxford Nanopore direct RNA sequencing preserves nucleotide modifications, permitting their study; however, many different nucleotide modifications lack an available base-caller to accurately identify them. Furthermore, existing tools are computationally intensive, and their results can be difficult to interpret.

      Cheng et al. present SegPore, a method designed to improve the segmentation of direct RNA sequencing data and boost the accuracy of modified base detection.

      Strengths:

      This method is well-described and has been benchmarked against a range of publicly available base callers that have been designed to detect modified nucleotides.

      Weaknesses:

      However, the manuscript has a significant drawback in its current version. The most recent nanopore RNA base callers can distinguish between different ribonucleotide modifications; however, SegPore has not been benchmarked against these models.

      I recommend re-submission of the manuscript with benchmarking against the rna004_130bps_hac@v5.1.0 and rna004_130bps_sup@v5.1.0 dorado models, which are reported to detect m5C, m6A_DRACH, inosine_m6A and PseU. A clear demonstration that SegPore also outperforms the newer RNA base caller models will confirm the utility of this method.

      Thank you for highlighting this important limitation. While Dorado, the new ONT basecaller, is publicly available and supports modification-aware basecalling, suitable public datasets for benchmarking m5C, inosine, m6A, and PseU detection on RNA004 are currently lacking. Dorado’s modification-aware models are trained on ONT’s internal data, which is not publicly released. Therefore, it is not currently feasible to evaluate or directly compare SegPore’s performance against Dorado for m5C, inosine, m6A, and PseU detection.

      We would also like to emphasize that SegPore’s main contribution lies in raw signal segmentation, which is an upstream task in the RNA modification detection pipeline. To assess its performance in this context, we benchmarked SegPore against f5c and Uncalled4 on public RNA004 datasets for segmentation quality. Please refer to our response to Point 1 of Reviewer 1 for details.

      Our results show that the characteristic “jiggling” behavior is also observed in RNA004 data (Supplementary Figure S4), and SegPore achieves better segmentation performance than both f5c and Uncalled4 (Table 2).

      Recommendations for the authors:

      Reviewing Editor:

      Please note that we also received the following comments on the submission, which we encourage you to take into account:

      I took a look at the work and from what I saw it only mentions/uses RNA002 chemistry, which is deprecated, effectively making this software unusable by anyone any more, as RNA002 is not commercially available. While the results seem promising, the authors need to show that it would work for RNA004. Notably, there is an alternative software for resquiggling for RNA004 (not Tombo or Nanopolish, but the GPU-accelerated version of Nanopolish, f5C), which does support RNA004. Therefore, they need to show that SegPore works for RNA004, because otherwise it is pointless to see that this method works better than others if it does not support current sequencing chemistries and only works for deprecated chemistries, and people will keep using f5C because it's the only one that currently works for RNA004. Alternatively, if there were biological insights gained from the method, one could justify not implementing it in RNA004, but in this case, RNA002 has been deprecated since March 2024, and the paper is purely methodological.

      Thank you for the comment. We agree that support for current sequencing chemistries is essential for practical utility. While SegPore was initially developed and benchmarked on RNA002 due to the availability of public data, we have now extended SegPore to support RNA004 chemistry.

      To address this concern, we performed a benchmark comparison using public RNA004 datasets against tools specifically designed for RNA004, including f5c and Uncalled4. Please refer to our response to Point 1 of Reviewer 1 for details. The results show that SegPore consistently outperforms f5c and Uncalled4 in segmentation accuracy on RNA004 data.

      Reviewer #2 (Recommendations for the authors):

      Various statements are made throughout the text that require further explanation, which might actually be defined in more detail elsewhere sometimes but are simply hard to find in the current form.

      (1) Page 2, “In this technique, five nucleotides (5mers) reside in the nanopore at a time, and each 5mer generates a characteristic current signal based on its unique sequence and chemical properties (16).”

      5mer? Still on R9 or just ignoring longer range influences, relevant? It is indeed a R9.4 model from ONT.

      Thank you for the observation. We apologize for the confusion and have clarified the relevant paragraph to indicate that the method is developed for RNA002 data by default. Specifically, we have added the following sentence:

      “Two versions of the direct RNA sequencing (DRS) kits are available: RNA002 and RNA004. Unless otherwise specified, this study focuses on RNA002 data.”

      (2) Page 3, “Employ models like Hidden Markov Models (HMM) to segment the signal, but they are prone to noise and inaccuracies.”

      That's the alignment/calling part, not the segmentation?

      Thank you for the comment. We apologize for the confusion. To clarify the distinction between segmentation and alignment, we added a new paragraph before the one in question to explain the general workflow of Nanopore DRS data analysis and to clearly define the task of segmentation. The added text reads:

      “The general workflow of Nanopore direct RNA sequencing (DRS) data analysis is as follows. First, the raw electrical signal from a read is basecalled using tools such as Guppy or Dorado, which produce the nucleotide sequence of the RNA molecule. However, these basecalled sequences do not include the precise start and end positions of each ribonucleotide (or k-mer) in the signal. Because basecalling errors are common, the sequences are typically mapped to a reference genome or transcriptome using minimap2 to recover the correct reference sequence. Next, tools such as Nanopolish and Tombo align the raw signal to the reference sequence to determine which portion of the signal corresponds to each k-mer. We define this process as the segmentation task, referred to as "eventalign" in Nanopolish. Based on this alignment, Nanopolish extracts various features—such as the start and end positions, mean, and standard deviation of the signal segment corresponding to a k-mer. This signal segment or its derived features is referred to as an "event" in Nanopolish.”

      We also revised the following paragraph describing SegPore to more clearly contrast its approach:

      “In SegPore, we first segment the raw signal into small fragments using a Hierarchical Hidden Markov Model (HHMM), where each fragment corresponds to a sub-state of a k-mer. Unlike Nanopolish and Tombo, which directly align the raw signal to the reference sequence, SegPore aligns the mean values of these small fragments to the reference. After alignment, we concatenate all fragments that map to the same k-mer into a larger segment, analogous to the "eventalign" output in Nanopolish. For RNA modification estimation, we use only the mean signal value of each reconstructed event.”

      We hope this revision clarifies the difference between segmentation and alignment in the context of our method and resolves the reviewer’s concern.

      (3) Page 4, Figure 1, “These segments are then aligned with the 5mer list of the reference sequence fragment using a full/partial alignment algorithm, based on a 5mer parameter table. For example, 𝐴𝑗 denotes the base "A" at the j-th position on the reference.”

      I think I do understand the meaning, but I do not understand the relevance of the Aj bit in the last sentence. What is it used for?

      When aligning the segments (output from Step 2) to the reference sequence in Step 3, it is possible for multiple segments to align to the same k-mer. This can occur particularly when the reference contains consecutive identical bases, such as multiple adenines (A). For example, as shown in Fig. 1A, Step 3, the first two segments (μ₁ and μ₂) are aligned to the first 'A' in the reference sequence, while the third segment is aligned to the second 'A'. In this case, the reference sequence is AACTGGTTTC...GTC, which contains exactly two consecutive 'A's at the start. The position subscript in 𝐴𝑗 therefore helps to disambiguate segment alignment in regions with repeated bases.

      Additionally, this figure and its subscript include mapping with Guppy and Minimap2 but do not mention Nanopolish at all, while that seems an equally important step in the preprocessing (pg5). As such it is difficult to understand the role Nanopolish exactly plays. It's also not mentioned explicitly in the SegPore Workflow on pg15, perhaps it's part of step 1 there?

      We thank the reviewer for pointing this out. We apologize for the confusion. As mentioned in the public response to point 3 of Reviewer 2, SegPore uses Nanopolish to identify the poly(A) tail and transcript regions from the raw signal. SegPore then performs segmentation and alignment on the transcript portion only. This step is indeed part of Step 1 in the preprocessing workflow, as described in Supplementary Note 1, Section 3.

      To clarify this in the main text, we have updated the preprocessing paragraph on page 6 to explicitly describe the role of Nanopolish:

      “We begin by performing basecalling on the input fast5 file using Guppy, which converts the raw signal data into ribonucleotide sequences. Next, we align the basecalled sequences to the reference genome using Minimap2, generating a mapping between the reads and the reference sequences. Nanopolish provides two independent commands: "polya" and "eventalign". The "polya" command identifies the adapter, poly(A) tail, and transcript region in the raw signal, which we refer to as the poly(A) detection results. The raw signal segment corresponding to the poly(A) tail is used to standardize the raw signal for each read. The "eventalign" command aligns the raw signal to a reference sequence, assigning a signal segment to individual k-mers in the reference. It also computes summary statistics (e.g., mean, standard deviation) from the signal segment for each k-mer. Each k-mer together with its corresponding signal features is termed an event. These event features are then passed into downstream tools such as m6Anet and CHEUI for RNA modification detection. For full transcriptome analysis (Figure 3), we extract the aligned raw signal segment and reference sequence segment from Nanopolish's events for each read by using the first and last events as start and end points. For in vitro transcription (IVT) data with a known reference sequence (Figure 4), we extract the raw signal segment corresponding to the transcript region for each input read based on Nanopolish’s poly(A) detection results.”
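
      The extraction step for full transcriptome analysis described in the quoted passage can be sketched as follows. This is only an illustration, not SegPore's code: the function name `transcript_signal_span` and the representation of an event as a `(start_idx, end_idx)` pair of raw-signal indices are assumptions. The sketch takes the minimum start and maximum end so that the result does not depend on the order in which events are listed.

```python
from typing import Iterable, Tuple


def transcript_signal_span(events: Iterable[Tuple[int, int]]) -> Tuple[int, int]:
    """Return the raw-signal span covered by all events of one read.

    Each event is a hypothetical (start_idx, end_idx) pair of indices into
    the read's raw signal, with end_idx exclusive.
    """
    events = list(events)
    if not events:
        raise ValueError("a read needs at least one event")
    # The outermost events bound the chunk of signal to extract.
    start = min(s for s, _ in events)
    end = max(e for _, e in events)
    return start, end


# Example: three events of one read, listed in arbitrary order.
span = transcript_signal_span([(500, 520), (480, 500), (450, 480)])
print(span)
```

      The returned pair can then be used to slice the read's raw signal before any further segmentation.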

      Additionally, we revised the legend of Figure 1A to explicitly include Nanopolish in step 1 as follows:

      “The raw current signal fragments are paired with the corresponding reference RNA sequence fragments using Nanopolish.”

      (4) Page 5, “The output of Step 3 is the "eventalign," which is analogous to the output generated by the Nanopolish "eventalign" command.”

      Naming the function of Nanopolish, the output file, and later on (pg9) the alignment of the newly introduced methods the exact same "eventalign" is very confusing.

      Thank you for the helpful comment. We acknowledge the potential confusion caused by using the term “eventalign” in multiple contexts. To improve clarity, we now consistently use the term “events” to refer to the output of both Nanopolish and SegPore, rather than using "eventalign" as a noun. We also added the following sentence to Step 3 (page 6) to clearly define what an “event” refers to in our manuscript:

      “An "event" refers to a segment of the raw signal that is aligned to a specific k-mer on a read, along with its associated features such as start and end positions, mean current, standard deviation, and other relevant statistics.”
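
      As an illustration of this definition, an event can be represented as a small record built from a slice of the raw signal. The class name `Event`, its field names, and the helper `make_event` are hypothetical and chosen only for this sketch; the population standard deviation is used here as one possible convention.

```python
from dataclasses import dataclass
from statistics import fmean, pstdev
from typing import Sequence


@dataclass(frozen=True)
class Event:
    """One k-mer on a read plus summary features of its signal segment."""
    kmer: str
    start: int    # index of the first signal point (inclusive)
    end: int      # index one past the last signal point (exclusive)
    mean: float   # mean current of the segment
    stdv: float   # standard deviation of the segment


def make_event(kmer: str, signal: Sequence[float], start: int, end: int) -> Event:
    """Summarize signal[start:end] as an event for the given k-mer."""
    segment = signal[start:end]
    if len(segment) == 0:
        raise ValueError("an event needs at least one signal point")
    return Event(kmer, start, end, fmean(segment), pstdev(segment))


# Example: the first three signal points are assigned to the 5-mer GGACT.
event = make_event("GGACT", [100.0, 102.0, 98.0, 120.0], 0, 3)
print(event)
```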

      We have revised the text throughout the manuscript accordingly to reduce ambiguity and ensure consistent terminology.

      (5) Page 5, “Once aligned, we use Nanopolish's eventalign to obtain paired raw current signal segments and the corresponding fragments of the reference sequence, providing a precise association between the raw signals and the nucleotide sequence.”

      I thought the new method's HHMM was supposed to output an 'eventalign' formatted file. As this is not clearly mentioned elsewhere, is this a mistake in writing? Is this workflow dependent on Nanopolish 'eventalign' function and output or not?

      We apologize for the confusion. To clarify, SegPore is not dependent on Nanopolish’s eventalign function for generating the final segmentation results. As described in our response to your comment point 2 and elaborated in the revised text on page 4, SegPore uses its own HHMM-based segmentation model to divide the raw signal into small fragments, each corresponding to a sub-state of a k-mer. These fragments are then aligned to the reference sequence based on their mean current values.

      As explained in the revised manuscript:

      “In SegPore, we first segment the raw signal into small fragments using a Hierarchical Hidden Markov Model (HHMM), where each fragment corresponds to a sub-state of a k-mer. Unlike Nanopolish and Tombo, which directly align the raw signal to the reference sequence, SegPore aligns the mean values of these small fragments to the reference. After alignment, we concatenate all fragments that map to the same k-mer into a larger segment, analogous to the "eventalign" output in Nanopolish. For RNA modification estimation, we use only the mean signal value of each reconstructed event.”

      To avoid ambiguity, we have also revised the sentence on page 5 to more clearly distinguish the roles of Nanopolish and SegPore in the workflow. The updated sentence now reads:

      “Nanopolish provides two independent commands: "polya" and "eventalign". The "polya" command identifies the adapter, poly(A) tail, and transcript region in the raw signal, which we refer to as the poly(A) detection results. The raw signal segment corresponding to the poly(A) tail is used to standardize the raw signal for each read. The "eventalign" command aligns the raw signal to a reference sequence, assigning a signal segment to individual k-mers in the reference. It also computes summary statistics (e.g., mean, standard deviation) from the signal segment for each k-mer. Each k-mer together with its corresponding signal features is termed an event. These event features are then passed into downstream tools such as m6Anet and CHEUI for RNA modification detection. For full transcriptome analysis (Figure 3), we extract the aligned raw signal segment and reference sequence segment from Nanopolish's events for each read by using the first and last events as start and end points. For in vitro transcription (IVT) data with a known reference sequence (Figure 4), we extract the raw signal segment corresponding to the transcript region for each input read based on Nanopolish’s poly(A) detection results.”

      (6) Page 5, “Since the polyA tail provides a stable reference, we normalize the raw current signals across reads, ensuring that the mean and standard deviation of the polyA tail are consistent across all reads.”

      Perhaps I misread this statement: I interpret it as using the PolyA tail to do the normalization, rather than using the rest of the signal to do the normalization, and that results in consistent PolyA tails across all reads.

      If it's the latter, this should be clarified, and a little detail on how the normalization is done should be added, but if my first interpretation is correct:

      I'm not sure if its standard deviation is consistent across reads. The (true) value spread in this section of a read should be fairly limited compared to the rest of the signal in the read, so the noise would influence the scale quite quickly, and such noise might be introduced by pores wearing down and other technical influences. Is this really better than using the non-PolyA tail part of the read's signal, using Median Absolute Deviation to scale for a first alignment round, then re-fitting the signal scaling using Theil Sen on the resulting alignments (assigned read signal vs reference expected signal), as Tombo/Nanopolish (can) do?

      Additionally, this kind of normalization should have been part of the Nanopolish eventalign already, can this not be re-used? If it's done differently it may result in different distributions than the ONT kmer table obtained for the next step.

      Thank you for this detailed and thoughtful comment. We apologize for the confusion. The poly(A) tail–based normalization is indeed explained in Supplementary Note 1, Section 3, but we agree that the motivation needed to be clarified in the main text.

      We have now added the following sentence in the revised manuscript (before the original statement on page 5) to provide clearer context:

      “Due to inherent variability between nanopores in the sequencing device, the baseline levels and standard deviations of k-mer signals can differ across reads, even for the same transcript. To standardize the signal for downstream analyses, we extract the raw current signal segments corresponding to the poly(A) tail of each read. Since the poly(A) tail provides a stable reference, we normalize the raw current signals across reads, ensuring that the mean and standard deviation of the poly(A) tail are consistent across all reads. This step is crucial for reducing…..”

      We chose to use the poly(A) tail for normalization because it is sequence-invariant—i.e., all poly(A) tails consist of identical k-mers, unlike transcript sequences which vary in composition. In contrast, using the transcript region for normalization can introduce biases: for instance, reads with more diverse k-mers (having inherently broader signal distributions) would be forced to match the variance of reads with more uniform k-mers, potentially distorting the baseline across k-mers.
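
      A minimal sketch of this kind of poly(A)-based standardization is given below. It assumes that the poly(A) region of a read is available as a pair of raw-signal indices, and that every read is shifted and scaled so that its poly(A) segment has a common target mean and standard deviation. The function name and the target values (0 and 1) are placeholders for illustration; the actual procedure is the one described in Supplementary Note 1, Section 3.

```python
from statistics import fmean, pstdev
from typing import List, Sequence


def normalize_by_polya(
    signal: Sequence[float],
    polya_start: int,
    polya_end: int,
    target_mean: float = 0.0,   # placeholder target, not a value from the paper
    target_stdv: float = 1.0,   # placeholder target, not a value from the paper
) -> List[float]:
    """Rescale a whole read so its poly(A) segment has the target mean and stdv."""
    polya = signal[polya_start:polya_end]
    if len(polya) < 2:
        raise ValueError("poly(A) segment is too short to estimate a spread")
    mu = fmean(polya)
    sigma = pstdev(polya)
    if sigma == 0:
        raise ValueError("poly(A) segment has zero variance")
    scale = target_stdv / sigma
    # The same shift and scale are applied to every point of the read.
    return [(x - mu) * scale + target_mean for x in signal]


# Example: the first three points are the poly(A) tail of this toy read.
normalized = normalize_by_polya([10.0, 12.0, 14.0, 20.0, 30.0], 0, 3)
print(normalized)
```

      Because only the poly(A) segment determines the shift and scale, the k-mer composition of the transcript region does not influence the normalization.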

      In our newly added RNA004 benchmark experiment, we used the default normalization provided by f5c, which does not include poly(A) tail normalization. Despite this, SegPore was still able to mask out noise and outperform both f5c and Uncalled4, demonstrating that our segmentation method is robust to different normalization strategies.

      (7) Page 7, “The initialization of the 5mer parameter table is a critical step in SegPore's workflow. By leveraging ONT's established kmer models, we ensure that the initial estimates for unmodified 5mers are grounded in empirical data.”

      It looks like the method uses Nanopolish for a first alignment, then improves the segmentation matching the reference sequence/expected 5mer values. I thought the Nanopolish model/tables are based on the same data, or similarly obtained. If they are different, then why the switch of kmer model? Now the original alignment may have been based on other values, and thus the alignment may seem off with the expected kmer values of this table.

      Thank you for this insightful question. To clarify, SegPore uses Nanopolish only to identify the poly(A) tail and transcript regions from the raw signal. In the bulk in vivo data analysis, we use Nanopolish’s first event as the start and the last event as the end to extract the aligned raw signal chunk and its corresponding reference sequence. Since SegPore relies on Nanopolish solely to delineate the transcript region for each read, it independently aligns the raw signals to the reference sequence without refining or adjusting Nanopolish’s segmentation results.

      While SegPore's 5-mer parameter table is initially seeded using ONT’s published unmodified k-mer models, we acknowledge that empirical signal values may deviate from these reference models due to run-specific technical variation and the presence of RNA modifications. For this reason, SegPore includes a parameter re-estimation step to refine the mean and standard deviation values of each k-mer based on the current dataset.

      The re-estimation process consists of two layers. In the outer layer, we select a set of 5mers that exhibit both modified and unmodified states based on the GMM results (Section 6 of Supplementary Note 1), while the remaining 5mers are assumed to have only unmodified states. In the inner layer, we align the raw signals to the reference sequences using the 5mer parameter table estimated in the outer layer (Section 5 of Supplementary Note 1). Based on the alignment results, we update the 5mer parameter table in the outer layer. This two-layer process is generally repeated for 3~5 iterations until the 5mer parameter table converges. This re-estimation ensures that:

      (1) The adjusted 5mer signal baselines remain close to the ONT reference (for consistency);

      (2) The alignment score between the observed signal and the reference sequence is optimized (as detailed in Equation 11, Section 5 of Supplementary Note 1);

      (3) Only 5mers that show a clear difference between the modified and unmodified components in the GMM are considered subject to modification.

      By doing so, SegPore achieves more accurate signal alignment independent of Nanopolish’s models, and the alignment is directly tuned to the data under analysis.
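
      The control flow of this iterative re-estimation can be outlined as below. This is a schematic under stated assumptions, not SegPore's implementation: the alignment and table-update steps are passed in as hypothetical callables (`align`, `update`), the GMM-based selection of modifiable 5mers is folded into `update`, and the convergence test (largest change in any 5mer mean below a tolerance) is one plausible choice that the response does not specify.

```python
from typing import Any, Callable, Dict, Tuple

# 5-mer -> (mean, stdv); a hypothetical in-memory form of the parameter table.
Table = Dict[str, Tuple[float, float]]


def max_mean_shift(old: Table, new: Table) -> float:
    """Largest absolute change in any 5-mer mean between two tables."""
    return max((abs(new[k][0] - old[k][0]) for k in old), default=0.0)


def reestimate(
    table: Table,
    align: Callable[[Table], Any],           # inner layer: align signals with the table
    update: Callable[[Table, Any], Table],   # outer layer: refit the table from alignments
    max_iter: int = 5,
    tol: float = 0.01,                       # assumed convergence tolerance
) -> Tuple[Table, int]:
    """Alternate alignment and table updates until the table stops changing."""
    if max_iter < 1:
        raise ValueError("max_iter must be at least 1")
    iterations_run = 0
    for _ in range(max_iter):
        alignments = align(table)
        new_table = update(table, alignments)
        shift = max_mean_shift(table, new_table)
        table = new_table
        iterations_run += 1
        if shift < tol:
            break
    return table, iterations_run
```

      In practice `align` would wrap the alignment algorithm of Section 5 and `update` the GMM fitting of Section 6 of Supplementary Note 1; here they are left abstract on purpose.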

      (8) Page 9, “The output of the alignment algorithm is an eventalign, which pairs the base blocks with the 5mers from the reference sequence for each read (Fig. 1C).”

      “Modification prediction

      After obtaining the eventalign results, we estimate the modification state of each motif using the 5mer parameter table.”

      This wording seems to have been introduced on page 5 but (also there) reads a bit confusingly as the name of the output format, file, and function are now named the exact same "eventalign". I assume the obtained eventalign results now refer to the output of your HHMM, and not the original Nanopolish eventalign results, based on context only, but I'd rather have a clear naming that enables more differentiation.

      We apologize for the confusion. We have revised the sentence as follows for clarity:

      “A detailed description of both alignment algorithms is provided in Supplementary Note 1. The output of the alignment algorithm is an alignment that pairs the base blocks with the 5mers from the reference sequence for each read (Fig. 1C). Base blocks aligned to the same 5-mer are concatenated into a single raw signal segment (referred to as an “event”), from which various features—such as start and end positions, mean current, and standard deviation—are extracted. Detailed derivation of the mean and standard deviation is provided in Section 5.3 in Supplementary Note 1. In the remainder of this paper, we refer to these resulting events as the output of eventalign analysis or the segmentation task. ”

      (9) Page 9, “Since a single 5mer can be aligned with multiple base blocks, we merge all aligned base blocks by calculating a weighted mean. This weighted mean represents the single base block mean aligned with the given 5mer, allowing us to estimate the modification state for each site of a read.”

      I assume the weights depend on the length of the segment but I don't think it is explicitly stated while it should be.

      Thank you for the helpful observation. To improve clarity, we have moved this explanation to the last paragraph of the previous section (see response to point 8), where we describe the segmentation process in more detail.

      Additionally, a complete explanation of how the weighted mean is computed is provided in Section 5.3 of Supplementary Note 1. It is derived from signal points that are assigned to a given 5mer.
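
      One natural reading, consistent with the reviewer's assumption, is that each base block is weighted by the number of signal points it contains, so that the merged mean equals the mean over all points assigned to the 5mer. The sketch below illustrates that reading only; the exact derivation is the one in Section 5.3 of Supplementary Note 1, and the function name and the `(mean, n_points)` representation of a base block are assumptions.

```python
from typing import Sequence, Tuple


def merge_base_blocks(blocks: Sequence[Tuple[float, int]]) -> Tuple[float, int]:
    """Merge base blocks aligned to one 5-mer into a single mean.

    Each block is a hypothetical (mean, n_points) pair. Weighting by n_points
    makes the result equal to the mean of all underlying signal points.
    """
    total_points = sum(n for _, n in blocks)
    if total_points <= 0:
        raise ValueError("base blocks must contain at least one signal point")
    weighted_mean = sum(mean * n for mean, n in blocks) / total_points
    return weighted_mean, total_points


# Example: a short block at 100 pA and a longer block at 110 pA.
merged_mean, n_points = merge_base_blocks([(100.0, 10), (110.0, 30)])
print(merged_mean, n_points)
```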

      (10) Page 10, “Afterward, we manually adjust the 5mer parameter table using heuristics to ensure that the modified 5mer distribution is significantly distinct from the unmodified distribution.”

      Using what heuristics? If this is explained in the supplementary notes then please refer to the exact section.

      Thank you for pointing this out. The heuristics used to manually adjust the 5mer parameter table are indeed explained in detail in Section 7 of Supplementary Note 1.

      To clarify this in the manuscript, we have revised the sentence as follows:

      “Afterward, we manually adjust the 5mer parameter table using heuristics to ensure that the modified 5mer distribution is significantly distinct from the unmodified distribution (see details in Section 7 of Supplementary Note 1).”

      (11) Page 10, “Once the table is fixed, it is used for RNA modification estimation in the test data without further updates.”

      By what tool/algorithm? Perhaps it is your own implementation, but with the next section going into segmentation benchmarking and using Nanopolish before this seems undefined.

      Thank you for pointing this out. We use our own implementation. See Algorithm 3 in Section 6 of Supplementary Note 1.

      We have revised the sentence for clarity:

      “Once a stabilized 5mer parameter table is estimated from the training data, it is used for RNA modification estimation in the test data without further updates. A more detailed description of the GMM re-estimation process is provided in Section 6 of Supplementary Note 1.”

      (12) Page 11, “A 5mer was considered significantly modified if its read coverage exceeded 1,500 and the distance between the means of the two Gaussian components in the GMM was greater than 5.”

      Considering the scaling done before also not being very detailed in what range to expect, this cutoff doesn't provide any useful information. Is this a pA value?

      Thank you for the observation. Yes, the value refers to the current difference measured in picoamperes (pA). To clarify this, we have revised the sentence in the manuscript to include the unit explicitly:

      “A 5mer was considered significantly modified if its read coverage exceeded 1,500 and the distance between the means of the two Gaussian components in the GMM was greater than 5 picoamperes (pA).”
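
      The criterion in the revised sentence can be written as a short predicate. This is only a sketch: the function and parameter names are hypothetical, and "exceeded" and "greater than" are read as strict inequalities.

```python
def is_significantly_modified(coverage, mean_a_pa, mean_b_pa,
                              min_coverage=1500, min_gap_pa=5.0):
    """Return True if a 5mer passes both thresholds quoted in the text.

    coverage:   number of reads covering the 5mer
    mean_a_pa,
    mean_b_pa:  means of the two Gaussian components of the GMM, in pA
    """
    gap_pa = abs(mean_a_pa - mean_b_pa)
    return coverage > min_coverage and gap_pa > min_gap_pa
```

      For example, a 5mer with 2,000 reads and component means of 125 pA and 118 pA would pass, while the same means at a coverage of exactly 1,500 would not.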

      (13) Page 13, “The raw current signals, as shown in Figure 1B.”

      Wrong figure? Figure 2B seems logical.

      Thank you for catching this. You are correct—the reference should be to Figure 2B, not Figure 1B. We have corrected this in the revised manuscript.

      (14) Page 14, Figure 2A, these figures supposedly support the jiggle hypothesis but the examples seem to match only half the explanation. Any of these jiggles seem to be followed shortly by another in the opposite direction, and the amplitude seems to match better within each such pair than the next or previous segments. Perhaps there is a better explanation still, and this behaviour can be modelled as such instead.

      Thank you for your comment. We acknowledge that the observed signal patterns may appear ambiguous and could potentially suggest alternative explanations. However, as shown in Figure 2A, the red dots tend to align closely with the baseline of the previous state, while the blue dots align more closely with the baseline of the next state. We interpret this as evidence for the "jiggling" hypothesis, where the k-mer temporarily oscillates between adjacent states during translocation.

      That said, we agree that more sophisticated models could be explored to better capture this behavior, and we welcome suggestions or references to alternative models. We will consider this direction in future work.

      (15) Page 15, “This occurs because subtle transitions within a base block may be mistaken for transitions between blocks, leading to inflated transition counts.”

      Is it really a "subtle transition" if it happens within a base block? It seems this is not a transition and thus shouldn't be named as such.

      Thank you for pointing this out. We agree that the term “subtle transition” may be misleading in this context. We revised the sentence to clarify the potential underlying cause of the inflated transition counts:

      “This may be due to a base block actually corresponding to a sub-state of a single 5mer, rather than each base block corresponding to a full 5mer, leading to inflated transition counts. To address this issue, SegPore’s alignment algorithm was refined to merge multiple base blocks (which may represent sub-states of the same 5mer) into a single 5mer, thereby facilitating further analysis.”

      (16) Page 15, “The SegPore "eventalign" output is similar to Nanopolish's "eventalign" command.”

      To the output of that command, I presume, not to the command itself.

      Thank you for pointing out the ambiguity. We have revised the sentence for clarity:

      “The final outputs of SegPore are the events and modification state predictions. SegPore’s events are similar to the outputs of Nanopolish’s "eventalign" command, in that they pair raw current signal segments with the corresponding RNA reference 5-mers. Each 5-mer is associated with various features — such as start and end positions, mean current, and standard deviation — derived from the paired signal segment.”

      (17) Page 15, “For selected 5mers, SegPore also provides the modification rate for each site and the modification state of that site on individual reads.”

      What selection? Just all kmers with a possible modified base or a more specific subset?

      We revised the sentence to clarify the selection criteria:

      “For selected 5mers that exhibit both a clearly unmodified and a clearly modified signal component, SegPore reports the modification rate at each site, as well as the modification state of that site on individual reads.”
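
      To make the relationship between read-level modification states and the site-level modification rate concrete, here is a minimal sketch using a generic two-component Gaussian mixture with fixed parameters. It is not SegPore's Algorithm 3 (Section 6 of Supplementary Note 1). The function names, the equal mixture weight, and the 0.5 posterior threshold are all assumptions made for illustration.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of a normal distribution at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def modified_posterior(x, unmod, mod, mod_weight=0.5):
    """Posterior probability that an event mean x (pA) is 'modified'.

    unmod, mod: (mean, std) pairs taken from a fixed 5mer parameter table.
    mod_weight: assumed prior weight of the modified component.
    """
    p_mod = mod_weight * gaussian_pdf(x, *mod)
    p_unmod = (1.0 - mod_weight) * gaussian_pdf(x, *unmod)
    total = p_mod + p_unmod
    if total == 0.0:
        # Both densities underflowed; fall back to the prior.
        return mod_weight
    return p_mod / total


def modification_rate(event_means, unmod, mod, mod_weight=0.5, threshold=0.5):
    """Fraction of reads at one site whose posterior exceeds the threshold."""
    if not event_means:
        raise ValueError("no reads at this site")
    calls = [modified_posterior(x, unmod, mod, mod_weight) > threshold
             for x in event_means]
    return sum(calls) / len(calls)
```

      Because the parameters are held fixed, the same table can be applied to test data without re-estimation; only the per-read posteriors and the resulting rate change from site to site.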

      (18) Page 16, “A key component of SegPore is the 5mer parameter table, which specifies the mean and standard deviation for each 5mer in both modified and unmodified states (Figure 2A).”

      Wrong figure?

      Thank you for pointing this out. You are correct—it should be Figure 1A, not Figure 2A. We intended to visually illustrate the structure of the 5mer parameter table in Figure 1A, and we have corrected this reference in the revised manuscript.

      (19) Page 16, Table 1, I can't quite tell but I assume this is based on all kmers in the table, not just a m6A modified subset. A short added statement to make this clearer would help.

      Yes, you are right—it is averaged over all 5mers. We have revised the sentence for clarity as follows:

      “As shown in Table 1, SegPore consistently achieved the best performance averaged on all 5mers across all datasets…”

      (20) Page 16, “Since the peaks (representing modified and unmodified states) are separable for only a subset of 5mers, SegPore can provide modification parameters for these specific 5mers. For other 5mers, modification state predictions are unavailable.”

      Can this be improved using some heuristics rather than the 'distance of 5' cutoff as described before? How small or big is this subset, compared to how many there should be to cover all cases?

      We agree that more sophisticated strategies could potentially improve performance. In this study, we adopted a relatively conservative approach to minimize false positives by using a heuristic cutoff of 5 picoamperes. This value was selected empirically and we did not explore alternative cutoffs. Future work could investigate more refined or data-driven thresholding strategies.

      (21) Page 16, “Tombo used the "resquiggle" method to segment the raw signals, and we standardized the segments using the polyA tail to ensure a fair comparison.”

      I don't know what or how something is "standardized" here.

      ‘Standardized’ refers to the poly(A) tail–based signal normalization described in our response to point 6. We applied this normalization to Tombo’s output to ensure a fair comparison across methods. Without this standardization, Tombo’s performance was notably worse. We revised the sentence as follows:

      “Tombo used the "resquiggle" method to segment the raw signals, and we standardized the segments using the poly(A) tail to ensure a fair comparison (See preprocessing section in Materials and Methods).”

      (22) Page 16, “To benchmark segmentation performance, we used two key metrics: (1) the log-likelihood of the segment mean, which measures how closely the segment matches ONT's 5mer parameter table (used as ground truth), and (2) the standard deviation (std) of the segment, where a lower std indicates reduced noise and better segmentation quality. If the raw signal segment aligns correctly with the corresponding 5mer, its mean should closely match ONT's reference, yielding a high log-likelihood. A lower std of the segment reflects less noise and better performance overall.”

      Here the segmentation part becomes a bit odd:

      A: Low std can be/is achieved by dropping any noisy bits, making segments really small (partly what happens here with the transition segments). This may be 'true' here, in the sense that the transition is not really part of the segment, but the comparison table is a bit meaningless as the other tools forcibly assign all data to kmers, instead of ignoring parts as transition states. In other words, it is a benchmark that is easy to cheat by assigning more data to noise/transition states.

      B: The values shown are influenced by the alignment made between the read and expected reference signal. Especially Tombo tends to forcibly assign data to whatever looks the most similar nearby rather than providing the correct alignment. So the "benchmark of the segmentation performance" is more of an "overall benchmark of the raw signal alignment". Which is still a good, useful thing, but the text seems to suggest something else.

      Thank you for raising these important concerns regarding the segmentation benchmarking.

      Regarding point A, the base blocks aligned to the same 5mer are concatenated into a single segment, including the short transition blocks between them. These transition blocks are typically very short (4~10 signal points, average 6 points), while a typical 5mer segment contains around 20~60 signal points. To assess whether SegPore’s performance is inflated by excluding transition segments, we conducted an additional comparison: we removed 6 boundary signal points (3 from the start and 3 from the end) from each 5mer segment in Nanopolish and Tombo’s results to reduce potential noise. The new comparison table is shown in the following:

      SegPore consistently demonstrates superior performance. Its key contribution lies in its ability to recognize structured noise in the raw signal and to derive more accurate mean and standard deviation values that more faithfully represent the true state of the k-mer in the pore. The improved mean estimates are evidenced by the clearly separated peaks of modified and unmodified 5mers in Figures 3A and 4B, while the improved standard deviation is reflected in the segmentation benchmark experiments.
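
      For readers who want to reproduce the control described for point A, the sketch below shows the two benchmark metrics and the boundary trimming in plain Python. It assumes the log-likelihood of the segment mean is evaluated under a single Gaussian whose mean and standard deviation come from the reference 5mer table; the function names and the handling of very short segments are hypothetical choices, not taken from the manuscript.

```python
import math
import statistics


def trim_boundary(points, n_each_side=3):
    """Drop n_each_side signal points from both ends of a segment.

    Segments too short to trim are returned unchanged (an assumption).
    """
    if len(points) <= 2 * n_each_side:
        return list(points)
    return list(points[n_each_side:len(points) - n_each_side])


def segment_metrics(points, ref_mean, ref_std):
    """Return (log_likelihood_of_mean, segment_std) for one segment.

    ref_mean, ref_std: reference parameters for the aligned 5mer, in pA.
    """
    seg_mean = statistics.fmean(points)
    seg_std = statistics.pstdev(points)
    z = (seg_mean - ref_mean) / ref_std
    log_lik = -0.5 * z * z - math.log(ref_std * math.sqrt(2.0 * math.pi))
    return log_lik, seg_std
```

      Calling `segment_metrics(trim_boundary(points), ref_mean, ref_std)` on each Nanopolish or Tombo segment gives the trimmed variant of the comparison, while calling `segment_metrics(points, ...)` gives the untrimmed one.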

      Regarding point B, we apologize for the confusion. We have added a new paragraph to the introduction to clarify that the segmentation task indeed includes the alignment step.

      “The general workflow of Nanopore direct RNA sequencing (DRS) data analysis is as follows. First, the raw electrical signal from a read is basecalled using tools such as Guppy or Dorado, which produce the nucleotide sequence of the RNA molecule. However, these basecalled sequences do not include the precise start and end positions of each ribonucleotide (or k-mer) in the signal. Because basecalling errors are common, the sequences are typically mapped to a reference genome or transcriptome using minimap2 to recover the correct reference sequence. Next, tools such as Nanopolish and Tombo align the raw signal to the reference sequence to determine which portion of the signal corresponds to each k-mer. We define this process as the segmentation task, referred to as "eventalign" in Nanopolish. Based on this alignment, Nanopolish extracts various features—such as the start and end positions, mean, and standard deviation of the signal segment corresponding to a k-mer. This signal segment or its derived features is referred to as an "event" in Nanopolish. The resulting events serve as input for downstream RNA modification detection tools such as m6Anet and CHEUI.”

      (23) Page 17 “Given the comparable methods and input data requirements, we benchmarked SegPore against several baseline tools, including Tombo, MINES (26), Nanom6A (27), m6Anet, Epinano (28), and CHEUI (29).”

      It seems m6Anet is actually Nanopolish+m6Anet in Figure 3C, this needs a minor clarification here.

      m6Anet uses Nanopolish’s estimated events as input by default.

      (24) Page 18, Figure 3, A and B are figures without any indication of what is on the axis and from the text I believe the position next to each other on the x-axis rather than overlapping is meaningless, while their spread is relevant, as we're looking at the distribution of raw values for this 5mer. The figure as is is rather confusing.

      Thanks for pointing out the confusion. We have added concrete values to the axes in Figures 3A and 3B and revised the figure legend as follows in the manuscript:

      “(A) Histogram of the estimated mean from current signals mapped to an example m6A-modified genomic location (chr10:128548315, GGACT) across all reads in the training data, comparing Nanopolish (left) and SegPore (right). The x-axis represents current in picoamperes (pA).

      (B) Histogram of the estimated mean from current signals mapped to the GGACT motif at all annotated m6A-modified genomic locations in the training data, again comparing Nanopolish (left) and SegPore (right). The x-axis represents current in picoamperes (pA).”

      (25) Page 18 “SegPore's results show a more pronounced bimodal distribution in the raw signal segment mean, indicating clearer separation of modified and unmodified signals.”

      Without knowing the correct values around the target kmer (like Figure 4B), just the more defined bimodal distribution could also indicate the (wrongful) assignment of neighbouring kmer values to this kmer instead, hence this statement lacks some needed support, this is just one interpretation of the possible reasons.

      Thank you for the comment. We have added concrete values to Figures 3A and 3B to support this point. Both peaks fall within a reasonable range: the unmodified peak (125 pA) is approximately 1.17 pA away from its reference value of 123.83 pA, and the modified peak (118 pA) is around 7 pA away from the unmodified peak. This shift is consistent with expected signal changes due to RNA modifications (usually less than 10 pA), and the magnitude of the difference suggests that the observed bimodality is more likely caused by true modification events rather than misalignment.

      (26) Page 18 “Furthermore, when pooling all reads mapped to m6A-modified locations at the GGACT motif, SegPore showed prominent peaks (Fig. 3B), suggesting reduced noise and improved modification detection.”

      I don't think the prominent peaks directly suggest improved detection, this statement is a tad overreaching.

      We revised the sentence as follows:

      “SegPore exhibited more distinct peaks (Fig. 3B), indicating reduced noise and potentially enabling more reliable modification detection”.

      (27) Page18 “(2) direct m6A predictions from SegPore's Gaussian Mixture Model (GMM), which is limited to the six selected 5mers.”

      The 'six selected' refers to what exactly? Also, 'why' this is limited to them is also unclear as it is, and it probably would become clearer if it is clearly defined what this refers to.

      It is explained on page 16 of the original manuscript, in the description of SegPore’s workflow, as follows:

      “A key component of SegPore is the 5mer parameter table, which specifies the mean and standard deviation for each 5mer in both modified and unmodified states (Fig. 1A). Since the peaks (representing modified and unmodified states) are separable for only a subset of 5mers, SegPore can provide modification parameters for these specific 5mers. For other 5mers, modification state predictions are unavailable.”

      We select a small set of 5mers that show clear peaks (modified and unmodified 5mers) in the GMM in the m6A site-level data analysis. These 5mers are provided in Supplementary Fig. S2C, as explained in the section “m6A site level benchmark” in the Materials and Methods (page 12 in the original manuscript).

      “…transcript locations into genomic coordinates. It is important to note that the 5mer parameter table was not re-estimated for the test data. Instead, modification states for each read were directly estimated using the fixed 5mer parameter table. Due to the differences between human (Supplementary Fig. S2A) and mouse (Supplementary Fig. S2B), only six 5mers were found to have m6A annotations in the test data’s ground truth (Supplementary Fig. S2C). For a genomic location to be identified as a true m6A modification site, it had to correspond to one of these six common 5mers and have a read coverage of greater than 20. SegPore derived the ROC and PR curves for benchmarking based on the modification rate at each genomic location….”

      We have updated the sentence as follows to increase clarity:

      “which is limited to the six selected 5mers that exhibit clearly separable modified and unmodified components in the GMM (see Materials and Methods for details).”

      (28) Page 19, Figure 4C, the blue 'Unmapped' needs further explanation. If this means the segmentation+alignment resulted in simply not assigning any segment to a kmer, this would indicate issues in the resulting mapping between raw data and kmers as the data that probably belonged to this kmer is likely mapped to a neighbouring kmer, possibly introducing a bimodal distribution there.

      This is due to a deletion event in the full alignment algorithm. See page 8 of Supplementary Note 1:

      During the traceback step of the dynamic programming matrix, not every 5mer in the reference sequence is assigned a corresponding raw signal fragment—particularly when the signal’s mean deviates substantially from the expected mean of that 5mer. In such cases, the algorithm considers the segment to be generated by an unknown 5mer, and the corresponding reference 5mer is marked as unmapped.

      (29) Page 19, “For six selected m6A motifs, SegPore achieved an ROC AUC of 82.7% and a PR AUC of 38.7%, earning the third-best performance compared with deep leaning methods m6Anet and CHEUI (Fig. 3D).”

      How was this selection of motifs made, are these related to the six 5mers in the middle of Supplementary Figure S2? Are these the same six as on page 18? This is not clear to me.

      It is the same, see the response to point 27.

      (30) Page 21 “Biclustering reveals that modifications at the 6th, 7th, and 8th genomic locations are specific to certain clusters of reads (clusters 4, 5, and 6), while the first five genomic locations show similar modification patterns across all reads.”

      This reads rather confusingly. Both the '6th, 7th, and 8th genomic locations' and 'clusters 4,5,6' should be referred to in clearer terms. Either mark them in the figure as such or name them in the text by something that directly matches the text in the figure.

      We have added labels to the clusters and genomic locations in Figure 4C, and revised the sentence as follows:

      “Biclustering reveals that modifications at g6 are specific to cluster C4, g7 to cluster C5, and g8 to cluster C6, while the first five genomic locations (g1 to g5) show similar modification patterns across all reads.”

      (31) Page 21, “We developed a segmentation algorithm that leverages the jiggling property in the physical process of DRS, resulting in cleaner current signals for m6A identification at both the site and single-molecule levels.”

      Leverages, or just 'takes into account'?

      We designed our HHMM specifically based on the jiggling hypothesis, so we believe that using the term “leverage” is appropriate.

      (32) Page 21, “Our results show that m6Anet achieves superior performance, driven by SegPore's enhanced segmentation.”

      Superior in what way? It barely improves over Nanopolish in Figure 3C and is outperformed by other methods in Figure 3D. The segmentation may have improved but this statement says something is 'superior' driven by that 'enhanced segmentation', so that cannot refer to the segmentation itself.

      We have revised it as follows in the revised manuscript:

      “Our results demonstrate that SegPore’s segmentation enables clear differentiation between m6A-modified and unmodified adenosines.”

      (33) Page 21, “In SegPore, we assume a drastic change between two consecutive 5mers, which may hold for 5mers with large difference in their current baselines but may not hold for those with small difference.”

      The implications of this assumption don't seem highlighted enough in the work itself and may be cause for falsely discovering bi-modal distributions. What happens if such a 5mer isn't properly split, is there no recovery algorithm later on to resolve these cases?

      We agree that there is a risk of misalignment, which can result in a falsely observed bimodal distribution. This is a known and largely unavoidable issue across all methods, including deep neural network–based methods. For example, many of these models rely on a CTC (Connectionist Temporal Classification) layer, which implicitly performs alignment and may also suffer from similar issues.

      Misalignment is more likely when the current baselines of neighboring k-mers are close. In such cases, the model may struggle to confidently distinguish between adjacent k-mers, increasing the chance that signals from neighboring k-mers are incorrectly assigned. Accurate baseline estimation for each k-mer is therefore critical—when baselines are accurate, the correct alignment typically corresponds to the maximum likelihood.

      We have added the following sentence to the discussion to acknowledge this limitation:

      “As with other RNA modification estimation methods, SegPore can be affected by misalignment errors, particularly when the baseline signals of adjacent k-mers are similar. These cases may lead to spurious bimodal signal distributions and require careful interpretation.”

      (34) Page 21, “Currently, SegPore models only the modification state of the central nucleotide within the 5mer. However, modifications at other positions may also affect the signal, as shown in Figure 4B. Therefore, introducing multiple states to the 5mer could help to improve the performance of the model.”

      The meaning of this statement is unclear to me. Is SegPore unable to combine the information of overlapping kmers around a possibly modified base (central nucleotide), or is this referring to having multiple possible modifications in a single kmer (multiple states)?

      We mean there can be modifications at multiple positions of a single 5mer, e.g. C m5C m6A m7G T. We have revised the sentence to:

      “Therefore, introducing multiple states for a 5mer to account for modifications at multiple positions within the same 5mer could help to improve the performance of the model.”

      (35) Page 22, “This causes a problem when apply DNN-based methods to new dataset without short read sequencing-based ground truth. Human could not confidently judge if a predicted m6A modification is a real m6A modification.”

      Grammatical errors in both these sentences. For the 'Human could not' part, is this referring to a single person's attempt or more extensively tested?

      Thanks for the comment. We have revised the sentence as follows:

      “This poses a challenge when applying DNN-based methods to new datasets without short-read sequencing-based ground truth. In such cases, it is difficult for researchers to confidently determine whether a predicted m6A modification is genuine (see Supplementary Figure S5).”

      (36) Page 22, “…which is easier for human to interpret if a predicted m6A site is real.”

      "a" human, but also this probably meant to say 'whether' instead of 'if', or 'makes it easier'.

      Thanks for the advice. We have revised the sentence as follows:

      “One can generally observe a clear difference in the intensity levels between 5mers with an m6A and those with a normal adenosine, which makes it easier for a researcher to interpret whether a predicted m6A site is genuine.”

      (37) Page 22, “…and noise reduction through its GMM-based approach…”

      Is the GMM providing noise reduction or segmentation?

      Yes, we agree that it is not relevant. We have removed the sentence in the revised manuscript as follows:

      “Although SegPore provides clear interpretability and noise reduction through its GMM-based approach, there is potential to explore DNN-based models that can directly leverage SegPore's segmentation results.”

      (38) Page 23, “SegPore effectively reduces noise in the raw signal, leading to improved m6A identification at both site and single-molecule levels…”

      Without further explanation in what sense this is meant, 'reduces noise' seems to overreach the abilities, and looks more like 'masking out'.

      Following the reviewer’s suggestion, we have changed it to ‘mask out’ in the revised manuscript:

      “SegPore effectively masks out noise in the raw signal, leading to improved m6A identification at both site and single-molecule levels.”

      Reviewer #3 (Recommendations for the authors):

      I recommend the publication of this manuscript, provided that the following comments (and the comments above) are addressed.

      In general, the authors state that SegPore represents an improvement on existing software. These statements are largely unquantified, which erodes their credibility. I have specified several of these in the Minor comments section.

      Page 5, Preprocessing: The authors comment that the poly(A) tail provides a stable reference that is crucial for the normalisation of all reads. How would this step handle reads that have variable poly(A) tail lengths? Or have interrupted poly(A) tails (e.g. in the case of mRNA vaccines that employ a linker sequence)?

      We apologize for the confusion. The poly(A) tail–based normalization is explained in Supplementary Note 1, Section 3.

      As shown in Author response image 1 below, the poly(A) tail produces a characteristic signal pattern—a relatively flat, squiggly horizontal line. Due to variability between nanopores, raw current signals often exhibit baseline shifts and scaling of standard deviations. This means that the signal may be shifted up or down along the y-axis and stretched or compressed in scale.

      Author response image 1.

      The normalization remains robust with variable poly(A) tail lengths, as long as the poly(A) region is sufficiently long. The linker sequence will be assigned to the adapter part rather than the poly(A) part.

      To improve clarity in the revised manuscript, we have added the following explanation:

      “Due to inherent variability between nanopores in the sequencing device, the baseline levels and standard deviations of k-mer signals can differ across reads, even for the same transcript. To standardize the signal for downstream analyses, we extract the raw current signal segments corresponding to the poly(A) tail of each read. Since the poly(A) tail provides a stable reference, we normalize the raw current signals across reads, ensuring that the mean and standard deviation of the poly(A) tail are consistent across all reads. This step is crucial for reducing…..”

      We chose to use the poly(A) tail for normalization because it is sequence-invariant—i.e., all poly(A) tails consist of identical k-mers, unlike transcript sequences which vary in composition. In contrast, using the transcript region for normalization can introduce biases: for instance, reads with more diverse k-mers (having inherently broader signal distributions) would be forced to match the variance of reads with more uniform k-mers, potentially distorting the baseline across k-mers.
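
      The shift-and-scale idea behind this normalization can be sketched as follows. This is an illustration under stated assumptions, not the procedure of Supplementary Note 1, Section 3: the poly(A) region is assumed to be already located (given here as hypothetical start and end indices), and the target mean and standard deviation are placeholder arguments rather than values from the manuscript.

```python
import statistics


def normalize_read(signal, polya_start, polya_end, target_mean, target_std):
    """Linearly rescale a read so its poly(A) region matches a target.

    signal:       raw current values for one read (pA)
    polya_start,
    polya_end:    half-open index range of the poly(A) region (assumed known)
    target_mean,
    target_std:   poly(A) statistics shared by all reads after normalization
    """
    polya = signal[polya_start:polya_end]
    if len(polya) < 2:
        raise ValueError("poly(A) region too short to estimate a scale")

    mu = statistics.fmean(polya)
    sd = statistics.pstdev(polya)
    if sd == 0.0:
        raise ValueError("poly(A) region has zero variance")

    # Remove the per-read baseline shift, then correct the per-read scaling.
    scale = target_std / sd
    return [(x - mu) * scale + target_mean for x in signal]
```

      Because the transformation depends only on the mean and standard deviation of the poly(A) region, the tail length matters only insofar as it provides enough points to estimate those two quantities reliably.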

      Page 7, 5mer parameter table: r9.4_180mv_70bps_5mer_RNA is an older kmer model (>2 years). How does your method perform with the newer RNA kmer models that do permit the detection of multiple ribonucleotide modifications? Addressing this comment is crucial because it is feasible that SegPore will underperform in comparison to the newer RNA base caller models (requiring the use of RNA004 datasets).

      Thank you for highlighting this important point. For RNA004, we have updated SegPore to ensure compatibility with the latest kit. In our revised manuscript, we demonstrate that the translocation-based segmentation hypothesis remains valid for RNA004, as supported by new analyses presented in Supplementary Figure S4.

      Additionally, we performed a new benchmark against f5c and Uncalled4 on RNA004 data in the revised manuscript (Table 2), where SegPore exhibits better performance than f5c and Uncalled4.

      We agree that benchmarking against the latest Dorado models—specifically rna004_130bps_hac@v5.1.0 and rna004_130bps_sup@v5.1.0, which include built-in modification detection capabilities—would provide valuable context for evaluating the utility of SegPore. However, generating a comprehensive k-mer parameter table for RNA004 requires a large, well-characterized dataset. At present, such data are limited in the public domain. Additionally, Dorado is developed by ONT and its internal training data have not been released, making direct comparisons difficult.

      Our current focus is on improving raw signal segmentation quality, an upstream task critical to many downstream analyses, including RNA modification detection. Future work may include benchmarking SegPore against models like Dorado once appropriate data become available.

      The Methods and Results sections contain redundant information - please streamline the information in these sections and reduce the redundancy. For example, the benchmarking section may be better situated in the Results section.

      Following your advice, we have removed redundant text about the segmentation benchmark from Materials and Methods in the revised manuscript.

      Minor comments

      (1) Introduction

      Page 3: "By incorporating these dynamics into its segmentation algorithm...". Please provide an example of how motor protein dynamics can impact RNA translocation. In particular, please elaborate on why motor protein dynamics would impact the translocation of modified ribonucleotides differently to canonical ribonucleotides. This is provided in the results, but please also include details in the Introduction.

      Following your advice, we added one sentence to the revised manuscript explaining how the motor protein affects the translocation of the DNA/RNA molecule:

      “This observation is also supported by previous reports, in which the helicase (the motor protein) translocates the DNA strand through the nanopore in a back-and-forth manner. Depending on ATP or ADP binding, the motor protein may translocate the DNA/RNA forward or backward by 0.5-1 nucleotides.”

      As far as we understand, this translocation mechanism is not specific to modified or unmodified nucleotides. For further details, we refer the reviewer to the original studies cited.

      Page 3: "This lack of interpretability can be problematic when applying these methods to new datasets, as researchers may struggle to trust the predictions without a clear understanding of how the results were generated." Please provide details and citations as to why researchers would struggle to trust the predictions of m6Anet. Is it due to a lack of understanding of how the method works, or an empirically demonstrated lack of reliability?

      Thank you for pointing this out. The lack of interpretability in deep learning models such as m6Anet stems primarily from their “black-box” nature—they provide binary predictions (modified or unmodified) without offering clear reasoning or evidence for each call.

      When we examined the corresponding raw signals, we found it difficult to visually distinguish whether a signal segment originated from a modified or unmodified ribonucleotide. The difference is often too subtle to be judged reliably by a human observer. This is illustrated in the newly added Supplementary Figure S5, which shows Nanopolish-aligned raw signals for the central 5mer GGACT in Figure 4B, displayed both uncolored and colored by modification state (according to the ground truth).

      Although deep neural networks can learn subtle, high-dimensional patterns in the signal that may not be readily interpretable, this opacity makes it difficult for researchers to trust the predictions—especially in new datasets where no ground truth is available. The issue is not necessarily an empirically demonstrated lack of reliability, but rather a lack of transparency and interpretability.

      We have updated the manuscript accordingly and included Supplementary Figure S5 to illustrate the difficulty in interpreting signal differences between modified and unmodified states.

      Page 3: "Instead of relying on complex, opaque features...". Please provide evidence that the research community finds the figures generated by m6Anet to be difficult to interpret, or delete the sections relating to its perceived lack of usability.

      See the figure provided in the response to the previous point. We added a reference to this figure in the revised manuscript.

      “Instead of relying on complex, opaque features (see Supplementary Figure S5), SegPore leverages baseline current levels to distinguish between…..”

      (2) Materials and Methods

      Page 5, Preprocessing: "We begin by performing basecalling on the input fast5 file using Guppy, which converts the raw signal data into base sequences.". Please change "base" to ribonucleotide.

      Revised as requested.

      Page 5 and throughout, please refer to poly(A) tail, rather than polyA tail throughout.

      Revised as requested.

      Page 5, Signal segmentation via hierarchical Hidden Markov model: "...providing more precise estimates of the mean and variance for each base block, which are crucial for downstream analyses such as RNA modification prediction." Please specify which method your HHMM method improves upon.

      Thank you for the suggestion. Since this section does not include a direct comparison, we revised the sentence to avoid unsupported claims. The updated sentence now reads:

      "...providing more precise estimates of the mean and variance for each base block, which are crucial for downstream analyses such as RNA modification prediction."

      Page 10, GMM for 5mer parameter table re-estimation: "Typically, the process is repeated three to five times until the 5mer parameter table stabilizes." How is the stabilisation of the 5mer parameter table quantified? What is a reasonable cut-off that would demonstrate adequate stabilisation of the 5mer parameter table?

      Thank you for the comment. We assess the stabilization of the 5mer parameter table by monitoring the change in baseline values across iterations. If the absolute change in baseline values for all 5mers is less than 1e-5 between two consecutive iterations, we consider the estimation to have stabilized.
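      The stopping rule described above can be sketched as follows. This is a minimal illustration rather than SegPore's actual code: the function name `table_has_stabilized`, the dictionary layout (5mer string mapped to a baseline mean), and the example values are all assumptions made for this sketch; only the 1e-5 threshold is taken from the response.

```python
def table_has_stabilized(prev_table, curr_table, tol=1e-5):
    """Return True if every 5mer baseline changed by less than `tol`.

    prev_table, curr_table: dicts mapping a 5mer string to its baseline
    mean (hypothetical layout, assumed for this sketch).
    """
    if prev_table.keys() != curr_table.keys():
        raise ValueError("both tables must cover the same 5mers")
    return all(abs(curr_table[k] - prev_table[k]) < tol for k in curr_table)


# Hypothetical baseline values for two 5mers over three iterations.
iteration_1 = {"GGACT": 123.4, "AAACA": 98.7}
iteration_2 = {"GGACT": 123.1, "AAACA": 98.65}
iteration_3 = {"GGACT": 123.100001, "AAACA": 98.650002}

# Large change between iterations 1 and 2, tiny change between 2 and 3.
print(table_has_stabilized(iteration_1, iteration_2))
print(table_has_stabilized(iteration_2, iteration_3))
```

      Requiring the condition for all 5mers, rather than on average, means a single slowly converging 5mer keeps the iteration running.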

      Page 11, M6A site level benchmark: why were these datasets selected? Specifically, why compare human and mouse ribonucleotide modification profiles? Please provide a justification and a brief description of the experiments that these data were derived from, and why they are appropriate for benchmarking SegPore.

      Thank you for the comment. The datasets used were taken from a previous benchmark study on m6A estimation using RNA002 data (https://doi.org/10.1038/s41467-023-37596-5). These datasets include human and mouse transcriptomes and have been widely used to evaluate the performance of RNA modification detection tools. We selected them because (i) they are based on RNA002 chemistry, which matches the primary focus of our study, and (ii) they provide a well-characterized and consistent benchmark for assessing m6A detection performance. Therefore, we believe they are appropriate for validating SegPore.

      (3) Results

      Page 13, RNA translocation hypothesis: "The raw current signals, as shown in Fig. 1B...". Please check/correct figure reference - Figure 1B does not show raw current signals.

      Thank you for pointing this out. The correct reference should be Figure 2B. We have updated the figure citation accordingly in the revised manuscript.

      Page 19, m6A identification at the site level: "For six selected m6A motifs, SegPore achieved an ROC AUC of 82.7% and a PR AUC of 38.7%, earning the third best performance compared with deep leaning methods m6Anet and CHEUI (Fig. 3D)." SegPore performs third best of all deep learning methods. Do the authors recommend its use in conjunction with m6Anet for m6A detection? Please clarify in the text.

      This sentence aims to convey that SegPore alone can already achieve good performance. If interpretability is the primary goal, we recommend using SegPore on its own. However, if the objective is to identify more potential m6A sites, we suggest using the combined approach of SegPore and m6Anet. That said, we have chosen not to make explicit recommendations in the main text to avoid oversimplifying the decision or potentially misleading readers.

      Page 19, m6A identification at the single molecule level: "one transcribed with m6A and the other with normal adenosine". I assume that this should be adenine? Please replace adenosine with adenine throughout.

      Thank you for pointing this out. We have revised the sentence to use "adenine" where appropriate. In other instances, we retain "adenosine" when referring specifically to adenine bound to a ribose sugar, which we believe is suitable in those contexts.

      Page 19, m6A identification at the single molecule level: "We used 60% of the data for training and 40% for testing". How many reads were used for training and how many for testing? Please comment on why these are appropriate sizes for training and testing datasets.

      In total, there are 1.9 million reads, with 1.14 million used for training and 0.76 million for testing (60% and 40%, respectively). We chose this split to ensure that the training set is sufficiently large to reliably estimate model parameters, while the test set remains substantial enough to robustly evaluate model performance. Although the ratio was selected somewhat arbitrarily, it balances the need for effective training with rigorous validation.
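      As a rough illustration of such a split, the sketch below partitions a list of read identifiers 60/40 and checks the arithmetic for the reported total. The helper `split_reads`, the seeded shuffle, and the stand-in read list are assumptions for illustration; the response does not state how reads were assigned to each subset.

```python
import random


def split_reads(read_ids, train_fraction=0.6, seed=0):
    """Shuffle read IDs reproducibly and split them into train/test lists."""
    ids = list(read_ids)
    random.Random(seed).shuffle(ids)  # seeded so the split is repeatable
    n_train = round(len(ids) * train_fraction)
    return ids[:n_train], ids[n_train:]


# Small stand-in for the full read set (hypothetical IDs 0..999).
train, test = split_reads(range(1000))
print(len(train), len(test))

# The same proportions applied to the reported total of 1.9 million reads.
total_reads = 1_900_000
n_train_reads = round(total_reads * 0.6)
n_test_reads = total_reads - n_train_reads
print(n_train_reads, n_test_reads)
```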

      (4) Discussion

      Page 21: "We believe that the de-noised current signals will be beneficial for other downstream tasks." Which tasks? Please list an example.

      We have revised the text for clarity as follows:

      “We believe that the de-noised current signals will be beneficial for other downstream tasks, such as the estimation of m5C, pseudouridine, and other RNA modifications.”

      Page 22: "One can generally observe a clear difference in the intensity levels between 5mers with a m6A and normal adenosine, which is easier for human to interpret if a predicted m6A site is real." This statement is vague and requires qualification. Please reference a study that demonstrates the human ability to interpret two similar graphs, and demonstrate how it relates to the differences observed in your data.

      We apologize for the confusion. We have revised the sentence as follows:

      “One can generally observe a clear difference in the intensity levels between 5mers with an m6A and those with a normal adenosine, which makes it easier for a researcher to interpret whether a predicted m6A site is genuine.”

      We believe that Figures 3A, 3B, and 4B effectively illustrate this concept.

      Page 23: How long does SegPore take for its analyses compared to other similar tools? How long would it take to analyse a typical dataset?

      We have added run-time statistics for datasets of varying sizes in the revised manuscript (see Supplementary Figure S6). This figure illustrates SegPore’s performance across different data volumes to help estimate typical processing times.

      (5) Figures

      Figure 4C. Please number the hierarchical clusters and genomic locations in this figure. They are referenced in the text.

      Following your suggestion, we have labeled the hierarchical clusters and genomic locations in Figure 4C in the revised manuscript.

      In addition, we revised the corresponding sentence in the main text as follows: “Biclustering reveals that modifications at g6 are specific to cluster C4, g7 to cluster C5, and g8 to cluster C6, while the first five genomic locations (g1 to g5) show similar modification patterns across all reads.”

    1. Author response:

      We thank the Reviewers and Editors for their time and insightful comments. We are encouraged by their positive assessment and we look forward to addressing the points raised. Areas of primary concern include (1) the use of high concentrations in peptide experiments; (2) improvement of the presentation and discussion of the results; and (3) clarification of the impact of surface adsorption on the mass photometry analyses.

      Regarding (1), we will better explain why some experiments with isolated disordered N-terminal extensions were necessarily carried out at high concentrations, in order to demonstrate the potential for these peptides to weakly self-associate. While much lower nucleocapsid protein concentrations are present in the cytosol on average, and are used in our ribonucleoprotein assembly experiments, there are two important physiologically relevant cases where high local concentrations do occur: first, high effective concentrations of tethered disordered N-terminal extensions exist locally in the volume sampled by individual ribonucleoprotein complexes, and, second, high nucleocapsid concentrations are prevalent in its macromolecular condensates. Thus, weak interactions of N-terminal extensions can play a critical role in strengthening fuzzy ribonucleoprotein complexes and in altering condensate properties, both of which were confirmed in our experiments. Nonetheless, we do not expect the observed fibrillar state of the concentrated isolated N-terminal peptide to be physiologically relevant, since physiologically the extensions always remain tethered to the full-length protein, which impedes fibrillar superstructures.

      (2) We are grateful for the Reviewers’ suggestions to enhance the clarity and accessibility of our findings and to streamline the presentation. We intend to tighten up the text and improve figures throughout, and add discussion points, as proposed.

      (3) We plan to add an analysis of the extent to which irreversible surface adsorption decreases solute concentration in mass photometry, and discuss why this has negligible impact on the conclusions drawn under our experimental conditions. In summary, we agree these points all provide opportunities to strengthen the manuscript further and we are glad to revise our manuscript accordingly.

    1. Author response:

      The following is the authors’ response to the original reviews

      Recommendations for the Authors:

      Reviewer #1:

      We think that this manuscript brings an important contribution that will be of interest in the areas of statistical physics, (microbiota) ecology, and (biological) data science. The evidence of their results is solid and the work improves the state-of-the-art in terms of methods. We have a few concerns that, in our opinion, the authors should address.

      Major concerns:

      (1) While the paper could be of interest for the broad audience of e-Life, the way it is written is accessible mainly to physicists. We encourage the authors to take the broad audience into account by i) explaining better the essence of what is being done at each step, ii) highlighting the relevance of the method compared to other methods, iii) discussing the ecological implications of the results.

      Examples on how to approach i) include: Modify or expand Figure 1 so that non-familiar readers can understand the summary of the work (e.g. with cartoons representing communities, diseased states and bacterial interactions and their relationship with the inference method); in each section, summarize at the beginning the purpose of what is going to be addressed in this section, and summarize at the end what the section has achieved; in Figure 2, replace symbols by their meaning as much as possible-the same for Figure 1, at the very least in the figure caption.

      Example on how to approach ii): Since the authors aim to establish a bridge between disordered systems and microbiome ecology, it could be useful to expand a bit the introduction on disordered systems for biologists/biophysicists. This could be done with an additional text box, which could also highlight the advantages of this approach in comparison to other techniques (e.g. model-free approaches can also classify healthy and diseased states).

      Example on how to approach iii): The authors could discuss with more depth the ecological implications of their results. For example, do they have a hypothesis on why demographic and neutral effects could dominate in healthy patients?

      We thank the reviewer for the observations. Following the suggestion, in the revised version each section begins by outlining what it will address and ends by summarizing what has been achieved. We also updated Figure 1 and Figure 2.

      (i) For Figure 1, we expanded and hopefully made clearer how we conceptualize the problem, use the data, and establish our method. In Figure 2, we enriched the y labels of each panel with the name associated with the order parameter.

      (ii) We thank the reviewer for helping us improve the readability of the introductory part, thus providing more insights into disordered systems techniques for a broader audience. We have added a few explanations at the end of page 2 to explain the advantages of such methodology compared to other strategies and models.

      (iii) We thank the reviewer for raising the need for a more in-depth ecological discussion of our results. A simple way to understand why neutral effects may dominate in healthy patients is the following. Neutrality implies that species differences are mainly shaped by stochastic processes such as demographic noise, with species treated as different realizations of the same underlying stochastic ecological dynamics. In our analysis, we observe that healthy individuals tend to exhibit highly similar microbial communities, suggesting that the compositional variability among their microbiomes is compatible, at least in part, with the fluctuations expected from demographic stochasticity alone. In contrast, patients with the disease display significantly more heterogeneous microbial compositions. The diversity and structure of their gut communities cannot be satisfactorily explained by neutral demographic fluctuations alone.

      This discrepancy implies that additional deterministic forces—such as altered ecological interactions—are driving the divergence observed in dysbiotic states. In diseased individuals, the breakdown of such interactions leads to a structurally distinct regime that may correspond to a phase of marginal stability, as indicated by our theoretical modeling. This shift marks a transition from a community governed by neutrality and demographic noise to one dominated by non-neutral ecological forces (as depicted in Figure 4). We added these comments in the discussion section of the revised manuscript.

      (2) Taking into account the broader audience, we invite the authors to edit the abstract, as it seems to jump from one ecological concept to another without explicitly communicating what is the link between these concepts. From the first two sentences, the motivation seems to be species diversity, but no mention of diversity comes after the second sentence. There is no proper introduction/definition of what macroecological states are. After that, the authors switch to healthy and unhealthy states, without previously introducing any link between gut microbiota states and the host’s health (which perhaps could be good in the first or second sentence, although other framings can be as valid). After that, interactions appear in the text and are related to instability, but the reader might not know whether this is surprising or if healthy/unhealthy states are generally related to stability.

      We pointed out a few examples, but the authors could extend their revision on i), ii) and iii) beyond such specific comments. In our opinion, this would really benefit the paper.

      In response to the reviewer’s concern about conceptual clarity and structure, we substantially revised the abstract to improve its accessibility and logical flow. In the revised abstract, we now clearly link species diversity to microbiome structure and function from the outset, addressing initial confusion. We provide a concise definition of “macroecological states,” framing them as reproducible statistical patterns reflecting community-level properties. Additionally, the revised version explicitly connects gut microbiome states to host health earlier, resolving the previous abrupt shift in focus. Finally, we conclude by highlighting how disordered systems theory advances our understanding of microbiome stability and functioning, reinforcing the novelty and broader significance of our approach. Overall, the revised abstract better serves a broad interdisciplinary audience, including readers unfamiliar with the technicalities of disordered systems or microbial ecology, while preserving the scientific depth and accuracy of our work.

      (3) The connection with consumer-resource (CR) models is quite unusual. In Equation (12), why do the authors assume that the consumption term does not depend on R? This should be addressed, since this term is usually dependent on R in microbial ecology models.

      In case this is helpful, it is known that the symmetric Lotka-Volterra model emerges from time-scale separation in the MacArthur model, where resources reproduce logistically and are consumed by other species (e.g., plants eaten by herbivores). Consumer-resource models form a broad category, while the MacArthur model is a specific case featuring logistic resource growth. For microbes, a more meaningful justification of the generalized Lotka-Volterra (GLV) model from a consumer-resource perspective involves the consumer-resource dynamics in a chemostat, where time-scale separation is assumed and higher-order interactions are neglected. See, for example: a) The classic paper by MacArthur: R. MacArthur. Species packing and competitive equilibrium for many species. Theoretical Population Biology, 1(1):1-11, 1970. b) Recent works on time-scale separation in chemostat consumer-resource models: Anna Posfai et al., PRL, 2017 Sireci et al., PNAS, 2023 Akshit Goyal et al., PRX-Life, 2025

      We thank the reviewer for the observation. We apologize for the typo that appeared in the main text, which we have corrected. The consumer-resource model we had in mind is the classical case proposed by MacArthur, where resources are self-regulated according to a logistic growth mechanism, which leads to the generalized Lotka-Volterra model we employ in our work.
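      For readers less familiar with this reduction, the standard textbook argument can be sketched as follows. The notation is assumed here for illustration and is not taken from the manuscript's Equation (12): N_i are consumer abundances, R_a resource abundances, c_{ia} consumption rates, w_a resource weights, m_i mortality rates, and r_a, K_a the logistic growth rate and capacity of resource a.

```latex
% MacArthur consumer-resource dynamics (assumed notation)
\begin{align}
\frac{dN_i}{dt} &= N_i\Big(\sum_a w_a c_{ia} R_a - m_i\Big), \\
\frac{dR_a}{dt} &= r_a R_a\Big(1 - \frac{R_a}{K_a}\Big) - R_a \sum_j c_{ja} N_j .
\end{align}
% Time-scale separation: resources relax quickly, so set dR_a/dt = 0 with R_a > 0
\begin{equation}
R_a^{*} = K_a\Big(1 - \frac{1}{r_a}\sum_j c_{ja} N_j\Big).
\end{equation}
% Substituting R_a^* into the consumer equation gives a Lotka-Volterra form
\begin{equation}
\frac{dN_i}{dt} = N_i\Big(g_i - \sum_j A_{ij} N_j\Big), \qquad
g_i = \sum_a w_a c_{ia} K_a - m_i, \qquad
A_{ij} = \sum_a \frac{w_a K_a}{r_a}\, c_{ia} c_{ja}.
\end{equation}
```

      In this sketch the effective interaction matrix A_{ij} is symmetric by construction, which is the symmetric Lotka-Volterra case the reviewer refers to.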

      Minor concerns:

      (1) The title has a nice pun for statistical physicists, but we wonder if it can be a bit confusing for the broader audience of e-Life. Although we leave this to the author’s decision, we’d recommend considering changing the title, making it more explicit in communicating the main contribution/result of the work.

      Following the reviewer’s suggestion, we have introduced an explanatory subtitle: “Linking Species Interactions to Dysbiosis through a Disordered Lotka-Volterra Framework”.

      (2) Review the references - some preprints might have already been published: Pasqualini J. 2023, Sireci 2022, Wu 2021.

      We thank the reviewer for drawing our attention to this inaccuracy. We updated the references to the Pasqualini and Sireci papers. To our knowledge, Wu’s paper has appeared as an arXiv preprint only.

      (3) Species do not generally exhibit identical carrying capacities (see Grilli, Nat. Commun., 2020); some taxa are generally more abundant than others. The authors could discuss whether the model, with the inferred parameters, can accurately reproduce the distribution of species’ mean abundances.

      We thank the reviewer for this insightful comment. As discussed in the revised manuscript (lines 294–299), our current model does not accurately reproduce the empirical species abundance distribution (SAD). This limitation stems from the assumption of constant carrying capacities across species. Empirical observations (e.g., Grilli et al., Nat. Commun., 2020 [1]) show heterogeneous mean abundances, often following power-law or log-normal distributions, whereas the constant carrying capacity in our model yields SADs devoid of fat tails, which diverge from empirical data.

      This simplification is implemented to maintain the analytical tractability of the disordered generalized Lotka-Volterra (dGLV) framework, a common approach also found in prior works such as Bunin (2017) and Barbier et al. (2018) [2, 3]. Introducing heterogeneity in carrying capacities, such as drawing them from a log-normal distribution, or switching to multiplicative (rather than demographic) noise, could indeed produce SADs that better align with empirical data. Nevertheless, implementing these changes would significantly complicate the analytical treatment.

      We acknowledge these directions as promising avenues for future research. They could help enhance the empirical realism of the model and its capacity to capture observed macroecological patterns while posing new theoretical challenges for disordered systems analysis.

      (4) A substantial number of cited works (Grilli, Nat. Commun., 2020; Zaoli & Grilli, Science Advances, 2021; Sireci et al., PNAS, 2023; Po-Yi Ho et al., eLife, 2022) suggest that environmental fluctuations play a crucial role in shaping microbiome composition and dynamics. Is the authors’ analysis consistent with this perspective? Do they expect their conclusions to remain robust if environmental fluctuations are introduced?

      We thank the reviewer for stressing this point. The introduction of environmental fluctuations in the model formally violates detailed balance, thereby preventing the definition of an energy function. To date, no study has integrated random interactions together with both demographic and environmental noise within a unified analytical framework. This is certainly a highly promising direction that some of the authors are already exploring. However, given the inherently out-of-equilibrium nature of the system and the absence of a free energy, we would need to adopt a Dynamical Mean-Field Theory formalism and eventually analyze the corresponding stationary equations to be solved self-consistently. We added, however, a brief note in the Discussion section.

      (5) The term “order parameters“ may not be intuitive for a biological audience. In any case, the authors should explicitly define each order parameter when first introduced.

      We thank the reviewer for the comment. We now name each order parameter when it is first introduced, along with a brief explanation of its meaning that should be accessible to an audience with a biological background.

      (6) Line 242: Should ψU be ψD?

      We thank the reviewer for the observation. We corrected the typo.

      (7) Given that the authors are discussing healthy and diseased states and to avoid confusion, the authors could perhaps use another word for ’pathological’ when they refer to dynamical regimes (e.g., in Appendix 2: ’letting the system enter the pathological regime of unbounded growth’).

      We thank the reviewer for the helpful comment. As suggested, we used the term “unphysical” instead of “pathological” where needed.

      Reviewer #2:

      (1) A technical point that I could not understand is how the authors deal with compositional data. One reason for my confusion is that the order parameters h and q0 are fixed in data to 1/S and 1/S<sup>2</sup>, and thus I do not see how they can be informative. Same for carrying capacity, why is it not 1 if considering relative abundance?

      We thank the reviewer for raising this point. We acknowledge that the treatment of compositional data and the interpretation of order parameters h and q0 were not sufficiently clarified in the manuscript. Additionally, there was an imprecision in the text regarding the interpretation of these parameters.

      As defined in revised Eq. (4) of the manuscript, h and q0 are to be averaged over the entire dataset, summing across samples α. Specifically, both are computed from S<sub>α</sub>, the number of species present in sample α, and then averaged over samples. These parameters are therefore informative, as they encapsulate sample-level ecological diversity, and their variation reflects biological differences between healthy and diseased states. For instance, Pasqualini et al., 2024 [4] reported significant differences in these metrics between health conditions, thereby supporting their ecological relevance.

      Regarding carrying capacities, we clarify that although we work with relative abundance data (i.e., compositional data), we do not fix the carrying capacity K to 1. Instead, we set K to the maximum value of xi (relative abundance) within each sample, to preserve compatibility with empirical data and allow for coexistence. While this remains a modeling assumption, it ensures better ecological realism within the constraints of the disordered GLV framework.
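      The choice of K described above can be illustrated with a short sketch. The function name `carrying_capacity_per_sample`, the input layout (sample name mapped to a list of raw counts), and the example values are assumptions for illustration only, not the authors' pipeline.

```python
def carrying_capacity_per_sample(samples):
    """Map each sample name to its largest relative abundance.

    samples: dict mapping a sample name to a list of non-negative
    abundances (hypothetical layout, assumed for this sketch).
    """
    capacities = {}
    for name, abundances in samples.items():
        total = sum(abundances)
        if total <= 0:
            raise ValueError(f"sample {name!r} has no counts")
        # Normalize to relative abundances, which sum to 1 within a sample.
        relative = [x / total for x in abundances]
        capacities[name] = max(relative)
    return capacities


# Two hypothetical samples with different numbers of species.
example = {"sample_a": [50, 30, 20], "sample_b": [10, 10, 10, 70]}
K = carrying_capacity_per_sample(example)
print(K)
```

      Because relative abundances sum to 1 within a sample, K defined this way lies between 1/S and 1, where S is the number of species in that sample.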

      (2) Obviously I’m missing something, so it would be nice to clarify in simple terms the logic of the argument. I understand that Lagrange multipliers are going to be used in the model analysis, and there are a lot of technical arguments presented in the paper, but I would like a much more intuitive explanation about the way the data can be used to infer order parameters if those are fixed by definition in compositional data.

      We thank the reviewer for the observation. The order parameters can be measured directly from the data, even in the presence of compositionality, as explained above. We can connect those parameters with the theory even for compositional data, because the only effect of adding the compositionality constraint is to shift the linear coefficient in the Hamiltonian, which corresponds to shifting the average interaction µ. However, the resulting phase diagram is mostly affected by the variance of the interactions σ<sup>2</sup> (as µ is such that we are in the bounded phase).

      (3) Another point that I did not understand comes from the fact that the authors claim that interaction variance is smaller in unhealthy microbiomes. Yet they also find that those are closer to instability, and are more driven by niche processes. I would have expected the opposite to be true, more variance in the interactions leading to instability (as in May’s original paper for instance). Is this apparent paradox explained by covariations in demographic stochasticity (T) and immigration rate (lambda)? If so, I think it would be very useful to comment on that.

      As Altieri and coworkers showed in their PRL (2021) [5], the phase diagram of our model differs fundamentally from that of Biroli et al. (2018) [6]. In the latter, the intuitive rule – greater interaction variance yields greater instability – indeed holds. For the sake of clarity, we have attached below the resulting phase diagram obtained by Altieri et al.

      The apparent paradox arises because the two phase diagrams are tuned by different parameters. Consequently, even at low temperature and with weak interaction variance, our system may sit nearer to the replica-symmetry-breaking (RSB) line.

      Fig. 3 in the main text is not a (σ,T) phase diagram where all other parameters are kept constant. Rather, it is a plot of the inferred σ and T parameters from the data (without showing the corresponding µ).

      To capture the full, non-trivial influence of all parameters on stability, we studied the so-called “replicon eigenvalue” in the RS (i.e. single equilibrium) approximation. This leading eigenvalue measures how close a given set of inferred parameters – and hence a microbiome – is to the RSB threshold. For a visual representation of these findings, refer to Figure 4.

      Author response image 1.

      (4) What do the empirical SAD look like? It would be nice to see the actual data and how the theoretical SADs compare.

      The empirical species abundance distributions (SADs) analyzed in our study are presented and discussed in detail in Pasqualini et al., 2024 [4]. Given the overlap in content, we chose not to reproduce these figures in the current manuscript to avoid redundancy.

      As we also clarify in the revised text, the theoretical SADs derived from the disordered generalized Lotka-Volterra (dGLV) model in the unique-fixed-point phase typically exhibit exponential tails. These distributions do not match the heavier-tailed patterns (e.g., log-normal or power-law-like) observed in empirical microbiome data. This discrepancy stems from the simplifying assumptions of the dGLV framework, including the use of constant carrying capacities and demographic noise.

      In the revised manuscript, we have added a brief discussion to explicitly acknowledge this limitation and emphasize it as a direction for future refinement of the model, such as incorporating heterogeneous carrying capacities or exploring alternative noise structures.

      (5) Some typos: often “niche” is written “nice”.

      We thank the reviewer for this suggestion. After inspecting the text, we corrected the reported typos.

      Reviewer #3:

      Major comments:

      (1) In the S3 text, the authors say that filtered metagenomic reads were processed using the software Kaiju. The description of the pipeline does not mention how core genes were selected, which is often a crucial step in determining the abundance of a species in a metagenomic sample. In addition, the senior author of this manuscript has published a version of Kaiju that leverages marker genes classification methods (deemed Core-Kaiju), but it was not used for either this manuscript or Pasqualini et al. (2014; Tovo et al., 2020). I am not suggesting that the data necessarily needs to be reprocessed, but it would be useful to know how core genes were chosen in Pasqualini et al. and why Core-Kaiju was not used (2014).

      Prior to the current manuscript and the PLOS Computational Biology paper by Pasqualini et al. [4], we applied the Core-Kaiju protocol to the same dataset used in both studies. However, this tool was originally developed and validated using general catalogs of culturable organisms, not specifically tuned for gut microbiomes. As a result, we found that in many samples Core-Kaiju retained only very few species (in some samples, the number of identified species was as low as 5–10), undermining the reliability of the analysis. Due to these limitations, we opted to use the standard Kaiju version in our work. We are actively developing an improved version of the Core-Kaiju protocol that will overcome these limitations, and preliminary results (not shown here) indicate that the obtained patterns are robust in this case as well.

      (2) My understanding of Pasqualini et al. was that diseased patients experienced larger fluctuations in abundance, while in this study, they had smaller fluctuations (Figure 3a; 2024). Is this a discrepancy between the two models or is there a more nuanced interpretation?

      We thank the reviewer for the observation. This is only an apparent discrepancy, as the term fluctuation has different meanings in the two contexts. The fluctuations referred to by the reviewer correspond to a parameter of our theory, namely noise in the interactions. Conversely, in Pasqualini et al., σ indicates environmental fluctuations. Nevertheless, there is no conceptual discrepancy in our results: in both studies, unhealthy microbiomes were found to be less stable. In fact, Fig. 4 of this study shows that unhealthy microbiomes lie closer to the RSB line, a phenomenon that is also associated with enhanced fluctuations.

      (3) Line 38-41: It would be helpful to explicitly state what “interaction patterns” are being referenced here. The final sentence could also be clarified. Do microbiomes “host“ interactions or are they better described as a property (“have”, “harbor”). The word “host” may confuse some readers since it is often used to refer to the human host. I am also not sure what point is being made by “expected to govern natural ones”. There are interactions between members of a microbiome; experimental studies have characterized some of these interactions, which we expect to relate in some way to interactions in nature. Is this what the authors are saying?

      Thanks. We agree that this sentence was not clear. Indeed, we are referring to pairwise species interactions and not to host-microbiome interactions. We have rewritten this part in the following way: In fact, recent work shows that the network-level properties of species-species interactions —for example, the sign balance, average strength, and connectivity of the inferred interaction matrix— shift systematically between healthy and dysbiotic gut communities (see for instance, [7, 8]). Pairwise species interactions have been quantified in simplified in-vitro consortia [9, 10]; we assume that the same classes of interactions also operate—albeit in a more complex form—in the native gut microbiome.

      (4) Line 43: I appreciate that the authors separated neutral vs. logistic models here.

      (5) Lines 51-75: The framing here is well-written and convincing. Network inference is an ongoing, active subject in ecology, and there is an unfortunate focus on inferring every individual interaction because ecologists with biology backgrounds are not trained to think about the problem in the language of statistical physics.

      We thank the reviewer for these positive comments.

      (6) Line 87: Perhaps I’m missing something obvious, but I don’t see how ρi sets the intrinsic timescale of the dynamics when its units are 1/(time*individuals), assuming the dimensions of ri are inverse time.

      We thank the reviewer for the observation. We corrected this phrase in the main text.

      (7) Lines 189-190: “as close as possible to the data” it would aid the reader if you specified the criteria meant by this statement.

      We thank the reviewer for the observation. We removed the sentence, as it introduced some redundancy in our argument. The proposed method is described in detail in the subsequent text.

      (8) Line 198: It would aid the reader if you provided some context for what the T - σ plane represents.

      We thank the referee for the helpful suggestion. We have clarified the mutual role of the demographic noise amplitude and the strength of the random interaction matrix, as theoretically predicted by Altieri and coworkers in PRL (2021) [5]. Please find an additional paragraph on page 6 of the resubmitted version.

      (9) Line 217: Specifying what is meant by “internal modes“ would aid the typical life science reader.

      We thank the reviewer for the suggestion. Recognizing that referring to “internal modes” to describe the SAD shape in that context might cause confusion, we replaced “internal modes“ with “peaks”.

      (10) Line 219: Some additional justification and clarification are needed here, as some may think of “m“ as being biomass.

      We added a sentence to better explain this concept. “In classical and quantum field theory, the particle-particle interaction embedded in the quadratic term is typically referred to as a mass source. In the context of this study, m captures quadratic fluctuations of species abundances, as also appearing in the expression of the leading eigenvalue of the stability matrix.”

      Minor comments:

      (1) I commend the authors for removing metagenomic reads that mapped to the human genome in the preprocessing stage of their pipeline. This may seem like an obvious pre-processing step, but it is unfortunately not always implemented.

      We thank the referee for pointing out this potential issue. The data used in this work, as well as the bioinformatic workflow used to generate them, have been described in detail in Pasqualini et al., 2024 [4]. As one of the main preprocessing steps, we remove reads mapping to the human genome.

      (2) Line 13: “Bacterial“ excludes archaea, and while you may not have many high-abundance archaea in your human gut data, this sentence does not specify the human gut. Usually, this exclusion is averted via the term “microbial“, though sometimes researchers raise objections to the term when the data does not include fungal members (e.g., all 16S studies).

      We thank the reviewer for this suggestion. To include archaeal organisms, we have adopted the term “microbial” instead of “bacterial”.

      (3) Line 18: This manuscript is being submitted under the “Physics of Living Systems“ tract, but it may be useful to explicitly state in the Abstract that disordered systems are a useful approach for understanding large, complex communities for the benefit of life science researchers coming from a biology background.

      Thanks. We have modified the abstract following this suggestion.

      (4) Line 68: Consider using “adapted“ or something similar instead of “mutated“ if there is no specific reason for that word choice.

      We thank the reviewer for this suggestion, which was implemented in the text.

      (5) Line 111: It would be useful to define annealed and quenched for a general life science audience.

      We thank the reviewer for this suggestion. In the “Results” section, we have opted for “time-dependent disordered interactions” to reach a broader audience and avoid any jargon. Moreover, in the Discussion we added a detailed footnote: “In contrast to the quenched approximation, the annealed version assumes that the random couplings are not fixed but instead fluctuate over time, with their covariance governed by independent Ornstein–Uhlenbeck processes.”
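      The footnote above describes couplings that fluctuate in time according to Ornstein–Uhlenbeck processes. As a rough illustration of what such a process looks like, the sketch below integrates a single generic Ornstein–Uhlenbeck variable with the Euler–Maruyama scheme. It is not the authors' model: the function name and the parameters `rate`, `mean`, `noise_amp`, `dt` and `n_steps` are hypothetical choices made only for this example.

```python
import math
import random


def simulate_ou(rate, mean, noise_amp, x0, dt, n_steps, seed=0):
    """Euler-Maruyama path of dx = rate * (mean - x) dt + noise_amp dW.

    All parameter names are illustrative assumptions, not taken from the
    manuscript. Returns a list of n_steps + 1 values, starting at x0.
    """
    rng = random.Random(seed)  # local generator so runs are reproducible
    x = x0
    path = [x]
    for _ in range(n_steps):
        # Wiener increment over one step has standard deviation sqrt(dt)
        dw = math.sqrt(dt) * rng.gauss(0.0, 1.0)
        x += rate * (mean - x) * dt + noise_amp * dw
        path.append(x)
    return path


if __name__ == "__main__":
    # One hypothetical fluctuating coupling relaxing toward mean = 0
    coupling = simulate_ou(rate=1.0, mean=0.0, noise_amp=0.5,
                           x0=1.0, dt=0.01, n_steps=1000)
    print(len(coupling), coupling[-1])
```

      With `noise_amp = 0` the path relaxes deterministically toward `mean`; with noise and a small `dt`, the long-run variance of the continuous process is `noise_amp**2 / (2 * rate)`, which is the sense in which the coupling fluctuates around a fixed value rather than being frozen.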

      (6) Line 124: Likewise for the replicon sector.

      We thank the reviewer for the suggestion. We added a footnote on page 4, after the formula, to highlight the physical intuition behind the introduction of the replicon mode.

      “The replicon eigenvalue refers to a particular type of fluctuation around the saddle-point (mean-field) solution within the replica framework. When the Hessian matrix of the replicated free energy is diagonalized, fluctuations are divided into three sectors: longitudinal, anomalous, and replicon. The replicon mode is the most sensitive to criticality signaling – by its vanishing trend – the emergence of many nearly-degenerate states. It essentially describes how ‘soft’ the system is to microscopic rearrangements in configuration space.”

      (7) Figure 2: It would be helpful to include y-axis labels for each order parameter alongside the mathematical notation.

      We thank the reviewer for this suggestion. The y-axis of Figure 2 now includes, alongside the mathematical symbol, the label of the represented quantity.

      (8) Line 242: Subscript “U” is used to denote “Unhealthy” microbiomes, but “D” is used to denote “Diseased” in Figs. 2 and 3 (perhaps elsewhere as well).

      We thank the reviewer for this observation. After checking the subscripts throughout the text, we harmonized our notation to be consistent with Figures 2 and 3, adopting the subscript “D” for symbols related to the diseased/unhealthy condition.

      (9) Line 283: “not to“ should be “not due to“

      We thank the reviewer for this suggestion. After inspecting the text, we corrected the reported error.

      (10) Equations 23, 34: Extra “=“ on the RHS of the first line.

      We consistently follow the same formatting across all the line breaks in the equations throughout the text.

      We are thus resubmitting our paper, hoping to have satisfactorily addressed all referees’ concerns.

      References

      (1) Jacopo Grilli. Macroecological laws describe variation and diversity in microbial communities. Nature Communications, 11(1):4743, 2020.

      (2) Guy Bunin. Ecological communities with Lotka-Volterra dynamics. Physical Review E, 95(4):042414, 2017.

      (3) Matthieu Barbier, Jean-François Arnoldi, Guy Bunin, and Michel Loreau. Generic assembly patterns in complex ecological communities. Proceedings of the National Academy of Sciences, 115(9):2156–2161, 2018.

      (4) Jacopo Pasqualini, Sonia Facchin, Andrea Rinaldo, Amos Maritan, Edoardo Savarino, and Samir Suweis. Emergent ecological patterns and modelling of gut microbiomes in health and in disease. PLOS Computational Biology, 20(9):e1012482, 2024.

      (5) Ada Altieri, Felix Roy, Chiara Cammarota, and Giulio Biroli. Properties of equilibria and glassy phases of the random Lotka-Volterra model with demographic noise. Physical Review Letters, 126(25):258301, 2021.

      (6) Giulio Biroli, Guy Bunin, and Chiara Cammarota. Marginally stable equilibria in critical ecosystems. New Journal of Physics, 20(8):083051, 2018.

      (7) Amir Bashan, Travis E Gibson, Jonathan Friedman, Vincent J Carey, Scott T Weiss, Elizabeth L Hohmann, and Yang-Yu Liu. Universality of human microbial dynamics. Nature, 534(7606):259–262, 2016.

      (8) Marcello Seppi, Jacopo Pasqualini, Sonia Facchin, Edoardo Vincenzo Savarino, and Samir Suweis. Emergent functional organization of gut microbiomes in health and diseases. Biomolecules, 14(1):5, 2023.

      (9) Jared Kehe, Anthony Ortiz, Anthony Kulesa, Jeff Gore, Paul C Blainey, and Jonathan Friedman. Positive interactions are common among culturable bacteria. Science Advances, 7(45):eabi7159, 2021.

      (10) Ophelia S Venturelli, Alex V Carr, Garth Fisher, Ryan H Hsu, Rebecca Lau, Benjamin P Bowen, Susan Hromada, Trent Northen, and Adam P Arkin. Deciphering microbial interactions in synthetic human gut microbiome communities. Molecular Systems Biology, 14(6):e8157, 2018.

    1. Author response:

      We gratefully acknowledge the comments on our manuscript and the time you took to read and understand our work. Nevertheless, it is the opinion of these authors that the evidence provided in the submitted paper is strong and we performed multiple replicates of the experiments. In particular, gene deletion and complementation is the accepted gold standard for studies in physiology. In the isoleucine auxotroph (IMaux) strain carrying an ilvG deletion, growth is only possible if ilvG is reintroduced on a plasmid and induced. Additionally, isotopic labeling clearly demonstrates the activity of the proposed pathway. Regardless, we agree with the reviewers that the paper and the scientific community would benefit from an in vitro characterization of the promiscuity of IlvG, so we will perform this experiment and resubmit the paper for further revision, and in this revision also provide more detail on the replicates performed.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      The study explored the biomechanics of kangaroo hopping across both speed and animal size to try and explain the unique and remarkable energetics of kangaroo locomotion.

      Strengths:

      The study brings kangaroo locomotion biomechanics into the 21st century. It is a remarkably difficult project to accomplish. There is excellent attention to detail, supported by clear writing and figures.

      Weaknesses:

      The authors oversell their findings, but the mystery still persists. 

      The manuscript lacks a big-picture summary with pointers to how one might resolve the big question.

      General Comments

      This is a very impressive tour de force by an all-star collaborative team of researchers. The study represents a tremendous leap forward (pun intended) in terms of our understanding of kangaroo locomotion. Some might wonder why such an unusual species is of much interest. But, in my opinion, the classic study by Dawson and Taylor in 1973 of kangaroos launched the modern era of running biomechanics/energetics and applies to varying degrees to all animals that use bouncing gaits (running, trotting, galloping and of course hopping). The puzzling metabolic energetics findings of Dawson & Taylor (little if any increase in metabolic power despite increasing forward speed) remain a giant unsolved problem in comparative locomotor biomechanics and energetics. It is our "dark matter problem".

      Thank you for the kind words.

      This study is certainly a hop towards solving the problem. But, the title of the paper overpromises and the authors present little attempt to provide an overview of the remaining big issues. 

      We have modified the title to reflect this comment.  “Postural adaptations may contribute to the unique locomotor energetics seen in hopping kangaroos”

      The study clearly shows that the ankle and to a lesser extent the mtp joint are where the action is. They clearly show in great detail by how much and by what means the ankle joint tendons experience increased stress at faster forward speeds.

      Since these were zoo animals, direct measures were not feasible, but the conclusion that the tendons are storing and returning more elastic energy per hop at faster speeds is solid. The conclusion that net muscle work per hop changes little from slow to fast forward speeds is also solid. 

      Doing less muscle work can only be good if one is trying to minimize metabolic energy consumption. However, to achieve greater tendon stresses, there must be greater muscle forces. Unless one is willing to reject the premise of the cost of generating force hypothesis, that is an important issue to confront. Further, the present data support the Kram & Dawson finding of decreased contact times at faster forward speeds. Kram & Taylor and subsequent applications of (and challenges to) their approach supports the idea that shorter contact times (tc) require recruiting more expensive muscle fibers and hence greater metabolic costs. Therefore, I think that it is incumbent on the present authors to clarify that this study has still not tied up the metabolic energetics across speed problems and placed a bow atop the package. 

      Fortunately, I am confident that the impressive collective brain power that comprises this author list can craft a paragraph or two that summarizes these ideas and points out how the group is now uniquely and enviably poised to explore the problem more using a dynamic SIMM model that incorporates muscle energetics (perhaps ala' Umberger et al.). Or perhaps they have other ideas about how they can really solve the problem.

      You have raised important points, thank you for this feedback. We have added a limitations and considerations section to the discussion which highlights that there are still unanswered questions. Line 311-328

      Considerations and limitations

      “First, we believe it is more likely that the changes in moment arms and EMA can be attributed to speed rather than body mass, given the marked changes in joint angles and ankle height observed at faster hopping speeds. However, our sample included a relatively narrow range of body masses (13.7 to 26.6 kg) compared to the potential range (up to 80 kg), limiting our ability to entirely isolate the effects of speed from those of mass. Future work should examine a broader range of body sizes. Second, kangaroos studied here only hopped at relatively slow speeds, which bounds our estimates of EMA and tendon stress to a less critical region. As such, we were unable to assess tendon stress at fast speeds, where increased forces would reduce tendon safety factors closer to failure. A different experimental or modelling approach may be needed, as kangaroos in enclosures seem unwilling to hop faster over force plates. Finally, we did not determine whether the EMA of proximal hindlimb joints (which are more difficult to track via surface motion capture markers) remained constant with speed. Although the hip and knee contribute substantially less work than the ankle joint (Fig. 4), the majority of kangaroo skeletal muscle is located around these proximal joints. A change in EMA at the hip or knee could influence a larger muscle mass than at the ankle, potentially counteracting or enhancing energy savings in the ankle extensor muscle-tendon units. Further research is needed to understand how posture and muscles throughout the whole body contribute to kangaroo energetics.”

      Additionally, we added a line “Peak GRF also naturally increased with speed together with shorter ground contact durations (Fig. 2b, Suppl. Fig 1b)” (line 238) to highlight that we are not proposing that changes in EMA alone explain the full increase in tendon stress. Both GRF and EMA contribute substantially (almost equally) to stress, and we now give more equal discussion to both. For instance, we now also evaluate how much each contributes: “If peak GRF were constant but EMA changed from the average value of a slow hop to a fast hop, then stress would increase 18%, whereas if EMA remained constant and GRF varied by the same principles, then stress would only increase by 12%. Thus, changing posture and decreasing ground contact duration both appear to influence tendon stress for kangaroos, at least for the range of speeds we examined” (Line 245-249)
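      The two percentages quoted above can be read through a simple proportionality. The sketch below assumes that peak tendon stress scales as GRF divided by EMA, with tendon cross-sectional area held fixed, which is the inverse relationship with EMA described elsewhere in this response. The ratios `EMA_RATIO` and `GRF_RATIO` are back-calculated from the quoted 18% and 12% figures; they are not measured values, and the combined figure is only the arithmetic consequence of this simplified assumption, not a result reported by the authors.

```python
def relative_stress(grf_ratio, ema_ratio):
    """Fast-to-slow tendon stress ratio, assuming stress is proportional
    to GRF / EMA with constant tendon cross-sectional area (assumption)."""
    if grf_ratio <= 0 or ema_ratio <= 0:
        raise ValueError("ratios must be positive")
    return grf_ratio / ema_ratio


# Hypothetical fast/slow ratios, back-calculated from the quoted percentages
EMA_RATIO = 1.0 / 1.18  # EMA drop that alone raises stress by 18%
GRF_RATIO = 1.12        # GRF rise that alone raises stress by 12%

ema_only = relative_stress(1.0, EMA_RATIO)
grf_only = relative_stress(GRF_RATIO, 1.0)
combined = relative_stress(GRF_RATIO, EMA_RATIO)

if __name__ == "__main__":
    for label, ratio in [("EMA change only", ema_only),
                         ("GRF change only", grf_only),
                         ("both together", combined)]:
        print(f"{label}: {100 * (ratio - 1):+.1f}% stress")
```

      Under this assumption the two effects multiply rather than add, so the combined change is slightly larger than the sum of the two individual contributions (roughly 32% rather than 30%).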

      We have added a paragraph in the discussion acknowledging that the cost of generating force problem is not resolved by our work, concluding that “This mechanism may help explain why hopping macropods do not follow the energetic trends observed in other species (Dawson and Taylor 1973, Baudinette et al. 1992, Kram and Dawson 1998), but it does not fully resolve the cost of generating force conundrum” Line 274-276.

      I have a few issues with the other half of this study (i.e. animal size effects). I would enjoy reading a new paragraph by these authors in the Discussion that considers the evolutionary origins and implications of such small safety factors. Surely, it would need to be speculative, but that's OK.

      We appreciate this comment from the reviewer; however, we could not extend the study to discuss animal size effects because, as we now note in the results: “The range of body masses may not be sufficient to detect an effect of mass on ankle moment in addition to the effect of speed.” Line 193

      Reviewer #2 (Public Review):

      Summary

      This is a fascinating topic that has intrigued scientists for decades. I applaud the authors for trying to tackle this enigma. In this manuscript, the authors primarily measured hopping biomechanics data from kangaroos and performed inverse dynamics. 

      While these biomechanical analyses were thorough and impressively incorporated collected anatomical data and an Opensim model, I'm afraid that they did not satisfactorily address how kangaroos can hop faster and not consume more metabolic energy, unique from other animals.  Noticeably, the authors did not collect metabolic data nor did they model metabolic rates using their modelling framework. Instead, they performed a somewhat traditional inverse dynamics analysis from multiple animals hopping at a self-selected speed.

      In the current study, we aimed to provide a joint-level explanation for the increases of tendon stress that are likely linked to metabolic energy consumption.

      We have now included a limitations section in the manuscript (See response to Rev 1). We plan to expand upon muscle level energetics in the future with a more detailed musculoskeletal model.

      Within these analyses, the authors largely focused on ankle EMA, discussing its potential importance (because it affects tendon stress, which affects tendon strain energy, which affects muscle mechanics) on the metabolic cost of hopping. However, EMA was roughly estimated (CoP was fixed to the foot, not measured) and did not detectably associate with hopping speed (see results). Yet, the authors interpret their EMA findings as though it systematically related with speed to explain their theory on how metabolic cost is unique in kangaroos vs. other animals.

      As noted in our methods, EMA was not calculated from a fixed centre of pressure (CoP). We did fix the medial-lateral position, because both feet contacted the force plate together, but the anterior-posterior movement of the CoP was recorded by the force plate and thus allowed to move. We report the movement (or lack of movement) in our results. The anterior-posterior axis is the most relevant to lengthening or shortening the distance of the ‘out-lever’ R, and thereby EMA. It is necessary to assume a fixed medial-lateral position because a single force trace and CoP is recorded when two feet land on the force plate. The medial-lateral forces on each foot cancel out, so there is no overall medial-lateral movement if the forces are symmetrical (e.g. if the kangaroo is hopping in a straight path and one foot is not in front of the other). We only used symmetrical trials so that the anterior-posterior movement of the CoP would be reliable. We have now added additional details into the text to clarify this.

      Indeed, the relationship between R and speed (and therefore EMA and speed) was not significant. However, the significant change in ankle height with speed, combined with no systematic change in COP at midstance, demonstrates that R would be greater at faster speeds. If we consider the nonsignificant relationship between R and speed to indicate that there is no change in R, then these two results conflict. We could not find a flaw in our methods, so instead concluded that the nonsignificant relationship between R and speed may be due to a small change in R being undetectable in our data. Taking both results into account, we believe it is more likely that there is a non-detectable change in R, rather than no change in R with speed, but we presented both results for transparency. We have added an additional section into the results to make this clearer (Line 177-185) “If we consider the nonsignificant relationship between R (and EMA) and speed to indicate that there is no change in R, then it conflicts with the ankle height and CoP result. Taking both into account, we think it is more likely that there is a small, but important, change in R, rather than no change in R with speed. It may be undetectable because we expect small effect sizes compared to the measurement range and measurement error (Suppl. Fig. 3h), or be obscured by a similar change in R with body mass. R is highly dependent on the length of the metatarsal segment, which is longer in larger kangaroos (1 kg BM corresponded to ~1% longer segment, P<0.001, R<sup>2</sup>=0.449). If R does indeed increase with speed, both R and r will tend to decrease EMA at faster speeds.”

      These speed vs. biomechanics relationships were limited by comparisons across different animals hopping at different speeds and could have been strengthened using repeated measures design

      There is significant variation in speed within individuals, not just between individuals. The preferred speed of kangaroos is 2-4.5 m/s, but most individuals showed a wide speed range within this. Eight of our 16 kangaroos had a maximum speed that was 1-2 m/s faster than their slowest trial. Repeated measures of these eight individuals comprise 78 out of the 100 trials. It would be ideal to collect data across the full range of speeds for all individuals, but it is not feasible in this type of experimental setting. Interference with animals, such as chasing, is dangerous to kangaroos as they are prone to adverse reactions to stress. We have now added additional information about the chosen hopping speeds into the results and methods sections to clarify this: “The kangaroos elected to hop between 1.99 and 4.48 m s<sup>-1</sup>, with a range of speeds and number of trials for each individual (Suppl. Fig. 9).” (Line 381-382)

      There are also multiple inconsistencies between the authors' theory on how mechanics affect energetics and the cited literature, which leaves me somewhat confused and wanting more clarification and information on how mechanics and energetics relate

      We thank the reviewer for this comment. Upon rereading, we now understand the reviewer's position, and have made substantial revisions to the introduction and discussion (see comments below).

      My apologies for the less-than-favorable review, I think that this is a neat biomechanics study - but am unsure if it adds much to the literature on the topic of kangaroo hopping energetics in its current form.

      Again we thank the reviewer for their time and appreciate their efforts to strengthen our manuscript.

      Reviewer #3 (Public Review):

      Summary:

      The goal of this study is to understand how, unlike other mammals, kangaroos are able to increase hopping speed without a concomitant increase in metabolic cost. They use a biomechanical analysis of kangaroo hopping data across a range of speeds to investigate how posture, effective mechanical advantage, and tendon stress vary with speed and mass. The main finding is that a change in posture leads to increasing effective mechanical advantage with speed, which ultimately increases tendon elastic energy storage and returns via greater tendon strain. Thus kangaroos may be able to conserve energy with increasing speed by flexing more, which increases tendon strain.

      Strengths:

      The approach and effort invested into collecting this valuable dataset of kangaroo locomotion is impressive. The dataset alone is a valuable contribution.

      Thank you!

      Weaknesses:

      Despite these strengths, I have concerns regarding the strength of the results and the overall clarity of the paper and methods used (which likely influences how convincingly the main results come across).

      (1) The paper seems to hinge on the finding that EMA decreases with increasing speed and that this contributes significantly to greater tendon strain estimated with increasing speed. It is very difficult to be convinced by this result for a number of reasons:

      It appears that kangaroos hopped at their preferred speed. Thus the variability observed is across individuals not within. Is this large enough of a range (either within or across subjects) to make conclusions about the effect of speed, without results being susceptible to differences between subjects? 

      Apologies, this was not clear in the manuscript. Kangaroos hopping at their preferred speed means we did not chase or startle them into high speeds, to comply with ethics and enclosure limitations. Thus we did not record a wide range of speeds within the bounds of what kangaroos are capable of in the wild (up to 12 m/s), but for the range we did measure (~2-4.5 m/s), there is a large amount of variation in hopping speed within each individual kangaroo. Out of 16 individuals, eight individuals had a difference of 1-2 m/s between their slowest and fastest trials, and these kangaroos accounted for 78 out of 100 trials. Of the remainder, six individuals had three or fewer trials each, and two individuals had highly repeatable speeds (3 out of 4, and 6 out of 7 trials were within 0.5 m/s). We have now removed the terminology “preferred speed”, e.g. line 115. We have added additional information about the chosen hopping speeds into the results and methods, including an appendix figure: “The kangaroos elected to hop between 1.99 and 4.48 m s<sup>-1</sup>, with a range of speeds and number of trials for each individual (Suppl. Fig. 9).” (Line 381-382)

      In the literature cited, what was the range of speeds measured, and was it within or between subjects?

      For other literature, to our knowledge the highest speed measured is ~9.5 m/s (see Suppl. Fig. 1b), and there were multiple measures for several individuals (see methods in Kram & Dawson 1998).

      Assuming that there is a compelling relationship between EMA and velocity, how reasonable is it to extrapolate to the conclusion that this increases tendon strain and ultimately saves metabolic cost?  They correlate EMA with tendon strain, but this would still not suggest a causal relationship (incidentally the p-value for the correlation is not reported). 

      The functions that underpin these results (e.g. moment = GRF*R) come from physical mechanics and geometry, rather than statistical correlations. Additionally, a p-value is not appropriate for the relationship between EMA and stress (rather than strain) because the relationship does not appear to be linear. We have made it clearer in the discussion that we are not proposing that the entire change in stress is caused by changes in EMA, but that the increase in GRF that naturally occurs with speed will also explain some of the increase in stress, along with other potential mechanisms. The discussion has been extensively revised to reflect this.

      Tendon strain could be increasing with ground reaction force, independent of EMA. Even if there is a correlation between strain and EMA, is it not a mathematical necessity in their model that all else being equal, tendon stress will increase as ema decreases? I may be missing something, but nonetheless, it would be helpful for the authors to clarify the strength of the evidence supporting their conclusions.

      Yes, GRF also contributes to the increase in tendon stress in the mechanism we propose (Suppl. Fig. 8), see the formulas in Fig 6, and we have made this clearer in the revised discussion (see above comment).  You are correct that mathematically stress is inversely proportional to EMA, which can be observed in Fig. 7a, and we did find that EMA decreases. 
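      The inverse dependence of stress on EMA mentioned above can be written out as a short chain of relations. The following is a sketch under stated assumptions rather than a reproduction of the formulas in Fig. 6: it uses a quasi-static moment balance at the ankle, takes $R$ as the moment arm of the ground reaction force and $r$ as the moment arm of the ankle extensor tendon, and introduces the tendon cross-sectional area $A$, a symbol not defined in this response. The block assumes the amsmath package.

```latex
\begin{align}
  M_{\text{ankle}} &= \mathrm{GRF}\cdot R, \\
  F_{\text{tendon}} &= \frac{M_{\text{ankle}}}{r}
      = \mathrm{GRF}\,\frac{R}{r}
      = \frac{\mathrm{GRF}}{\mathrm{EMA}},
      \qquad \mathrm{EMA} = \frac{r}{R}, \\
  \sigma_{\text{tendon}} &= \frac{F_{\text{tendon}}}{A}
      = \frac{\mathrm{GRF}}{\mathrm{EMA}\cdot A}.
\end{align}
```

      Written this way, an increase in GRF and a decrease in EMA both raise tendon stress, and holding either one fixed isolates the contribution of the other.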

      The statistical approach is not well-described. It is not clear what the form of the statistical model used was and whether the analysis treated each trial individually or grouped trials by the kangaroo. There is also no mention of how many trials per kangaroo, or the range of speeds (or masses) tested. 

      The methods include the statistical model with the variables that we used, as well as the kangaroo masses (13.7 to 26.6 kg, mean: 20.9 ± 3.4 kg). We did not have a sufficient within-individual sample size to use a linear mixed-effects model including subject as a random factor, thus all trials were treated individually. We have included this information in the results section.

      We have now moved the range of speeds from the supplementary material to the results and figure captions. We have added information on the number of trials per kangaroo to the methods, and added Suppl. Fig. 9 showing the distribution of speeds per kangaroo.

      We did not group the data e.g. by using an average speed per individual for all their trials, or by comparing fast to slow groups for statistical analysis (the latter was only for display purposes in our figures, which we have now made clearer in the methods statistics section). 

      Related to this, there is no mention of how different speeds were obtained. It seems that kangaroos hopped at a self-selected pace, thus it appears that not much variation was observed. I appreciate the difficulty of conducting these experiments in a controlled manner, but this doesn’t exempt the authors from providing the details of their approach.

      Apologies, this was not clear in the manuscript. Kangaroos hopping at their preferred speed means we did not chase or startle them into high speeds to comply with ethics and enclosure limitations. Thus we did not record a wide range of speeds within the bounds of what kangaroos are capable of in the wild (up to 12 m/s). We have now removed the terminology “preferred speed” e.g. line 115. We have added additional information about the chosen hopping speeds into the results and methods, including an appendix figure (see above comment). (Line 381-382)

      Some figures (Figure 2 for example) present means for one of three speeds, yet the speeds are not reported (except in the legend) nor how these bins were determined, nor how many trials or kangaroos fit in each bin. A similar comment applies to the mass categories. It would be more convincing if the authors plotted the main metrics vs. speed to illustrate the significant trends they are reporting.

      Thank you for this comment. The bins are used only for display purposes and not within the statistical analysis. We have clarified this in the revised manuscript: “The data was grouped into body mass (small 17.6±2.96 kg, medium 21.5±0.74 kg, large 24.0±1.46 kg) and speed (slow 2.52±0.25 m s<sup>-1</sup>, medium 3.11±0.16 m s<sup>-1</sup>, fast 3.79±0.27 m s<sup>-1</sup>) subsets for display purposes only”. (Line 495-497)

      (2) The significance of the effects of mass is not clear. The introduction and abstract suggest that the paper is focused on the effect of speed, yet the effects of mass are reported throughout as well, without a clear understanding of the significance. This weakness is further exaggerated by the fact that the details of the subject masses are not reported.

      Indeed, the primary aim of our study was to explore the influence of speed, given the uncoupling of energy from hopping speed in kangaroos. We included mass to ensure that the effects of speed were not driven by body mass (i.e. that larger kangaroos hopped faster). Subject masses were reported in the first paragraph of the methods, although some were estimated as outlined in the same paragraph.

      (3) The paper needs to be significantly re-written to better incorporate the methods into the results section. Since the results come before the methods, some of the methods must necessarily be described such that the study can be understood at some level without turning to the dedicated methods section. As written, it is very difficult to understand the basis of the approach, analysis, and metrics without turning to the methods.

      Placing the methods after the discussion is a requirement of the journal. We have incorporated some methods into the results where necessary, without being too repetitive or disruptive, e.g. in the Fig. 1 caption, and by specifying that we are only analysing EMA for the ankle joint.

      Reviewing Editor (Recommendations For The Authors):

      Below is a list of specific recommendations that the authors could address to improve the eLife assessment:

      (1) Based on the data presented and the fact that metabolic energy was not measured, the authors should temper their conclusions and statements throughout the manuscript regarding the link between speed and metabolic energy savings. We recommend adding text to the discussion summarizing the strengths and limitations of the evidence provided and suggesting future steps to more conclusively answer this mystery.

      There is a significant body of work linking metabolic energy savings to measured increases in tendon stress in macropods. However, the purpose of this paper was to address the unanswered questions about why tendon stress increases. We found that stress did not only increase due to GRF increasing with speed as expected, but also due to novel postural changes which decreased EMA. In the revised manuscript, we have tempered our conclusions to make it clearer that it is not just EMA affecting stress, and added limitations throughout the manuscript (see response to Rev 1). 

      (2) To provide stronger evidence of a link between speed, mechanics, and metabolic savings the authors can consider estimating metabolic energy expenditure from their OpenSIM model. This is one suggestion, but the authors likely have other, possibly better ideas. Such a model should also be able to explain why the metabolic rate increases with speed during uphill hopping.

      Extending the model to provide direct metabolic cost estimates will be the goal of a future paper; however, the model does not have the detailed muscle characteristics needed to do this in the formulation presented here. It would be a very large undertaking which is beyond the scope of the current manuscript. As per the comment above, the results of this paper are not reliant on metabolic performance. 

      (3) The authors attempt to relate the newly quantified hopping biomechanics to previously published metabolic data. However, all reviewers agree that the logic in many instances is not clear or contradictory. Could one potential explanation be that at slow speeds, forces and tendon strain are small, and thus muscle fascicle work is high? Then, with faster speeds, even though the cost of generating isometric force increases, this is offset by the reduction in the metabolic cost of muscular work. The paper could provide stronger support for their hypotheses with a much clearer explanation of how the kinematics relate to the mechanics and ultimately energy savings.

      In response to the reviewers' comments, we have substantially modified the discussion to provide a clearer rationale.

      (4) The methods and the effort expended to collect these data are impressive, but there are a number of underlying assumptions made that undermine the conclusions. This is due partly to the methods used, but also the paper's incomplete description of their methods. We provide a few examples below:

      It would be helpful if the authors could speak to the effect of the limited speeds tested and between-animal comparisons on the ability to draw strong conclusions from the present dataset.

      Throughout the discussion, the authors highlight the relationship between EMA and speed. However, this is misleading since there was no significant effect of speed on EMA. Speed only affected the muscle moment arm, r. At minimum, this should be clarified and the effect on EMA not be overstated. Additionally, the resulting implications on their ability to confidently say something about the effect of speed on muscle stress should be discussed. 

      We have now provided additional details (see responses above) to address these concerns. For instance, we added a supplementary figure showing the speed distribution per individual. The primary reviewer concern (that each kangaroo travelled at a single speed) was due to a miscommunication around the terminology “preferred”, which has now been corrected. 

      We now elaborate in the results why we are not very concerned that EMA is insignificant. The statistical insignificance of EMA is ultimately due to the insignificance of the direct measurement of R, however, we now better explain in the results why we believe that this statistical insignificance is due to error/noise of the measurement which is relatively large compared to the effect size. Indirect indications of how R may increase with speed (via ankle height from the ground) are statistically significant. Lines 177-185. 

      We consider this worth reporting because, for instance, an 18% change in EMA will be undetectable by measurement, but corresponds to an 18% change in tendon stress which is measurable and physiologically significant (safety factor would decrease from 2 to 1.67).  We presented both significant and insignificant results for transparency. 

      We have also discussed this within a revised limitations section of the manuscript (Line 311-328). 

      Reviewer #1 (Recommendations For The Authors):

      Title: I would cut the first half of the title. At least hedge it a bit. "Clues" instead of "Unlocking the secrets".

      We have revised the title to: “Postural adaptations may contribute to the unique locomotor energetics seen in hopping kangaroos”

      In my comments, ... typically indicates a stylistic change suggested to the text.

      Overall, the paper covers speed and size. Unfortunately, the authors were not 100% consistent in the order of presenting size then speed, or speed then size. Just choose one and stick with it.

      We have attempted to keep the order of presenting size and speed consistent, however there are several cases where this would reduce the readability of the manuscript and so in some cases this may vary. 

      One must admit that there is a lot of vertical scatter in almost all of the plots. I understand that these animals were not in a lab on a treadmill at a controlled speed and the animals wear fur coats so marker placements vary/move etc. But the spread is quite striking, e.g. Figure 5a the span at one speed is almost 10x. Can the authors address this somewhere? Limitations section?

      The variation seen likely results from attempting to display data in a 2D format, when it is in fact the result of multiple variables, including speed, mass, stride frequency and subject-specific lengths. Slight variations in these would be expected to produce some noise around the mean, and we think it is important to consider this while showing the more dominant effects. 

      In many locations in the manuscript, the term "work" is used, but rarely if ever specified that this is the work "per hop". The big question revolves around the rate of metabolic energy consumption (i.e. energy per time or average metabolic power), one must not forget that hop frequency changes somewhat across speed, so work per hop is not the final calculation.

      Thank you for this comment. We have now explicitly stated work per hop in figure captions and in the results (line 208). The change in stride frequency at this range of speeds is very small, particularly compared to the variance in stride frequency (Suppl. Fig. 1d), which is consistent with other researchers who found that stride frequency was constant or near constant in macropods at analogous speeds (e.g. Dawson and Taylor 1973, Baudinette et al. 1987). 

      Line 61 ....is likely related.

      Added “likely” (line 59)

      Line 86 I think the Allen reference is incomplete. Wasn't it in J Exp Biology?

      Thank you. Changed. 

      Line 122 ... at faster speeds and in larger individuals.

      Changed: “We hypothesised that (i) the hindlimb would be more crouched at faster speeds, primarily due to the distal hindlimb joints (ankle and metatarsophalangeal), independent of changes with body mass” (Line 121-122).

      Line 124 I found this confusing. Try to re-word so that you explain you mean more work done by the tendons and less by the ankle musculature.

      Amended: “changes in moment arms resulting from the change in posture would contribute to the increase in tendon stress with speed, and may thereby contribute to energetic savings by increasing the amount of positive and negative work done by the ankle without requiring additional muscle work” (Line 123)

      Line 129 hopefully "braking" not "breaking"!

      Thank you. Fixed. (Line 130)

      Line 129 specify fore-aft horizontal force.

      Added "fore-aft" to "negative fore-aft horizontal component" (Line 130-131)

      Line 130 add something like "of course" or "naturally" since if there is zero fore-aft force, the GRF vector of course must be vertical. 

      Added "naturally" (Line 132)

      Line 138 clarify that this section is all stance phase. I don't recall reading any swing phase data.

      Changed to: "Kangaroo hindlimb stance phase kinematics varied…" (Line 141)

      Line 143 and elsewhere. I found the use of dorsiflexion and plantarflexion confusing. In Figure 3, I see the ankle never flexing more than 90 degrees. So, the ankle joint is always in something of a flexed position, though of course it flexes and extends during contact. I urge the authors to simplify to flexion/extension and drop the plantar/dorsi.

      We have edited this section to describe both movements as greater extension (plantarflexion). (Line 147). We have further clarified this in the figure caption for figure 3.  

      Line 147 ...changes were…

      Fixed, line 150

      Line 155 I'm a bit confused here. Are the authors calculating some sort of overall EMA or are they saying all of the individual joint EMAs all decreased?

      Thank you, we clarified that it is at the ankle. Line 158

      Line 158 since kangaroos hop and are thus positioned high and low throughout the stance phase, try to avoid using "high" and "low" for describing variables, e.g. GRF or other variables. Just use "greater/greatest" etc.

      Thanks for this suggestion. We have changed "higher" into "greater" where appropriate throughout the manuscript e.g. line 161

      Lines 162 and 168 same comment here about "r" and "R". Do you mean ankle or all joints?

      Clarified that it is the gastrocnemius and plantaris r, and the R to the ankle. (Lines 164-165)

      Line 173 really, ankle height?

      Added: ankle height is "vertical distance from the ground". Line 177

      Line 177 is this just the ankle r?

      Added "of the ankle" line 158 and “Achilles” line 187 

      Line 183 same idea, which tendon/tendons are you talking about here?

      Added "Achilles" to be more clear (Line 187)

      Line 195 substitute "converted" for "transferred".

      Done (Line 210)

      Line 223 why so vague? i.e. why use "may"? Believe in your data. ...stress was also modulated by changes....

      Changed "may" to "is"

      Line 229 smaller ankle EMA (especially since you earlier talked about ankle "height").

      Changed “lower” to “smaller” Line 254

      Line 236 ...and return elastic energy…

      Added "elastic" line 262

      Line 244 IMPORTANT: Need to explain this better! I think you are saying that the net work at the ankle is staying the same across speed, BUT it is the tendons that are storing and returning that work, it's not that the muscles are doing a lot of negative/positive work.

      Changed: “The consistent net work observed among all speeds suggests the ankle extensor muscle-tendon units are performing similar amounts of ankle work independent of speed, which would predominantly be done by the tendon.” (Line 270-272)

      Line 258-261 I think here is where you are over-selling the data/story. Although you do say "a" mechanism (and not "the" mechanism), you still need to deal with the cost of generating more force and generating that force faster.

      We removed this sentence and replaced it with a discussion of the cost of generating force hypothesis, and alternative scenarios for how force and metabolics could be uncoupled. 

      Line 278 "the" tendon? Which tendon?

      Added "Achilles"

      Line 289. I don't think one can project into the past.

      Changed “projected” to "estimated"

      Line 303 no problem, but I've never seen a paper in biology where the authors admit they don't know what species they were studying!

      Can’t be helped unfortunately. It is an old dataset and there aren’t photos of every kangaroo. Fortunately, from the grey and red kangaroos we can distinguish between, we know there are no discernible species effects on the data. 

      Lines 304-306 I'm not clear here. Did you use vertical impulse (and aerial time) to calculate body weight? Or did you somehow use the braking/propulsive impulse to calculate mass? I would have just put some apples on the force plate and waited for them to stop for a snack.

      Stationary weights were recorded for some kangaroos which did stand on the force plate long enough, but unfortunately not all of them were willing to do so. In those cases, yes, we used impulse from steady-speed trials to estimate mass. We cross-checked by estimating mass from segment lengths (as size and mass are correlated). This is outlined in the first paragraph of the methods.
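
      The impulse-based estimate can be illustrated with a minimal sketch. It assumes steady-speed hopping, so that the net change in vertical momentum over one full stride is zero and the vertical impulse equals body weight multiplied by the stride period. The function name, the trapezoidal integration and the sample numbers are illustrative assumptions, not the authors' actual processing pipeline.

```python
G = 9.81  # gravitational acceleration in m/s^2

def estimate_mass_from_impulse(times, vertical_force, stride_period):
    """Estimate body mass (kg) from vertical ground reaction force.

    times          -- sample times (s) covering the stance phase
    vertical_force -- total vertical GRF (N) at those times (both feet)
    stride_period  -- contact time plus aerial time (s)
    Hypothetical helper: assumes steady speed, i.e. no net change in
    vertical momentum across the stride.
    """
    # Trapezoidal integration of force over time gives the vertical impulse (N s).
    impulse = 0.0
    for i in range(1, len(times)):
        dt = times[i] - times[i - 1]
        impulse += 0.5 * (vertical_force[i] + vertical_force[i - 1]) * dt
    # Over a steady stride: impulse = mass * g * stride_period.
    return impulse / (G * stride_period)

# Synthetic triangular force pulse (made-up values, not kangaroo data).
mass = estimate_mass_from_impulse([0.0, 0.1, 0.2], [0.0, 1000.0, 0.0], 0.5)
```

      The estimate is only as good as the steady-speed assumption: a trial in which the animal accelerates vertically across the stride would bias the result, which is presumably why a cross-check against segment lengths is useful.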

      Lines 367 & 401 When you use the word "scaled" do you mean you assumed geometric similarity?

      No, rather than geometric scaling, we allowed scaling to individual dimensions by using the markers at midstance for measurements. We have amended the paragraph to clarify that the shape of the kangaroo changes and that mass distribution was preserved during the shape change (line 441-446) 

      Lines 381-82 specify "joint work"

      Added "joint work"  (Line 457)

      Figure 1 is gorgeous. Why not add the CF equation to the left panel of the caption?

      We decided to keep the information in the figure caption. “Total leg length was calculated as the sum of the segment lengths (solid black lines) in the hindlimb and compared to the pelvis-to-toe distance (dashed line) to calculate the crouch factor”

      Figure 2 specify Horizontal fore-aft.

      Done

      Figure 3g I'd prefer the same Min. Max Flexion vertical axis labels as you use for hip & knee.

      While we appreciate the reviewer trying to increase the clarity of this figure, we have left it as plantar/dorsi flexion since these are recognised biomechanical terms. To avoid confusion, we have further defined these in the figure caption “For (f-g), increased plantarflexion represents a decrease in joint flexion, while increased dorsiflexion represents increased flexion of the joint.”

      Figure 4. I like it and I think that you scaled all panels the same, i.e. 400 W is represented by the same vertical distance in all panels. But if that's true, please state so in the Caption. It's remarkable how little work occurs at the hip and knee despite the relatively huge muscles there.

      It is true that the y axes are all at the same scale. We have added this to the caption. 

      Figure 5 Caption should specify "work per hop".

      Added

      Figure 7 is another beauty.

      Thank you!

      Supplementary Figure 3 is this all ANKLE? Please specify.

      Clarified that it is the gastrocnemius and plantaris r, and the R to the ankle.

      Reviewer #2 (Recommendations For The Authors):

      To 'unlock the secrets of kangaroo locomotor energetics' I expected the authors to measure the secretive outcome variable, metabolic rate using laboratory measures. Rather, the authors relied on reviewing historic metabolic data and collecting biomechanics data across different animals, which limits the conclusions of this manuscript.

      We have revised the title to make it clearer that we are investigating a subset of the energetics problem, specifically posture: “Postural adaptations may contribute to the unique locomotor energetics seen in hopping kangaroos.” We have also substantially modified the discussion to temper the conclusions from the paper. 

      After reading the hypothesis, why do the authors hypothesize about joint flexion and not EMA? Because the following hypothesis discusses the implications of moment arms on tendon stress, EMA predictions are more relevant (and much more discussed throughout the manuscript).

      Ankle and MTP angles are the primary drivers of changes in r, R and thus EMA. We used a two-part hypothesis to capture this. We have rephrased the hypotheses: “We hypothesised that (i) the hindlimb would be more crouched at faster speeds, primarily due to the distal hindlimb joints (ankle and metatarsophalangeal), independent of changes with body mass, and (ii) changes in moment arms resulting from the change in posture would contribute to the increase in tendon stress with speed, and may thereby contribute to energetic savings by increasing the amount of positive and negative work done by the ankle without requiring additional muscle work.”

      If there were no detectable effects of speed on EMA, are kangaroos mechanically like other animals (Biewener Science 89 & JAP 04) who don't vary EMA across speeds? Despite no detectible effects, the authors state [lines 228-229] "we found larger and faster kangaroos were more crouched, leading to lower ankle EMA". Can the authors explain this inconsistency? Lines 236 "Kangaroos appear to use changes in posture and EMA". I interpret the paper as EMA does not change across speed.

      Apologies, we did not sufficiently explain this originally. We now explain in the results our reasoning behind our belief that EMA and R may change with speed. “If we consider the nonsignificant relationship between R (and EMA) and speed to indicate that there is no change in R, then it conflicts with the ankle height and CoP result. Taking both into account, we think it is more likely that there is a small, but important, change in R, rather than no change in R with speed. It may be undetectable because we expect small effect sizes compared to the measurement range and measurement error (Suppl. Fig. 3h), or be obscured by a similar change in R with body mass. R is highly dependent on the length of the metatarsal segment, which is longer in larger kangaroos (1 kg BM corresponded to ~1% longer segment, P<0.001, R<sup>2</sup>=0.449). If R does indeed increase with speed, both R and r will tend to decrease EMA at faster speeds.” (Line 177-185)

      Lines 335-339: "We assumed the force was applied along phalanx IV and that there was no medial or lateral movement of the centre of pressure (CoP)". I'm confused, did the authors not measure CoP location with respect to the kangaroo limb? If not, this simple estimation undermines primary results (EMA analyses).

      We have changed "The anterior or posterior movement of the CoP was recorded by the force plate" to read: "The fore-aft movement of the CoP was recorded by the force plate within the motion capture coordinate system" (Line 406-407) and added more justification for fixing the CoP movement in the other axis: “It was necessary to assume the CoP was fixed in the medial-lateral axis because when two feet land on the force plate, the lateral forces on each foot are not recorded, and indeed cancel if the forces are symmetrical (i.e. if the kangaroo is hopping in a straight path and one foot is not in front of the other). We only used symmetrical trials to ensure reliable measures of the anterior-posterior movement of the CoP.” (Line 408-413)

      The introduction makes many assertions about the generalities of locomotion and the relationship between mechanics and energetics. I'm afraid that the authors are selectively choosing references without thoroughly evaluating alternative theories. For example, Taylor, Kram, & others have multiple papers suggesting that decreasing EMA and increasing muscle force (and active muscle volume) increase metabolic costs during terrestrial locomotion. Rather, the authors suggest that decreasing EMA and increasingly high muscle force at faster speeds don't affect energetics unless muscle work increases substantially (paragraph 2)? If I am following correctly, does this theory conflict with active muscle volume ideas that are peppered throughout this manuscript?

      Yes, as you point out, the same mechanism does lead to different results in kangaroos vs humans, for instance, but this is not a contradiction. In all species, decreasing EMA will result in an increase in muscle force due to less efficient leverage (i.e. lower EMA) of the muscles, and the muscle-tendon unit will be required to produce more force to balance the joint moment. As a consequence, human muscles activate a greater volume in order for the muscle-tendon unit to increase muscle work and produce enough force. We are proposing that in kangaroos, the increase in work is done by the Achilles tendon rather than the muscles. Previous research suggests that macropod ankle muscles contract isometrically or that the fibres do not shorten more at faster speeds, i.e. muscle work does not increase with speed. Instead, the additional force seems to come from the tendon storing and subsequently returning more strain energy (indicated by higher stress). We found that the increase in tendon stress comes from higher ground force at faster speeds, and from the kangaroo adopting a more crouched posture, which increases tendon stress compared to an upright posture for a given speed (think of this as increasing the tendon’s stress capacity). We have substantially revised the discussion to highlight this.

      Similarly, does increased gross or net tendon mechanical energy storage & return improve hopping energetics? Would more tendon stress and strain energy storage with a given hysteresis value also dissipate more mechanical energy, requiring leg muscles to produce more net work? Does net or gross muscle work drive metabolic energy consumption?

      Based on the cost of generating force hypothesis, we think that gross muscle work would be linked to driving metabolic energy consumption. Our idea here is that the total body work is a product of the work done by the tendon and the muscle combined. If the tendon has the potential to do more work, then the total work can increase without muscle work needing to increase.

      The results interpret speed effects on biomechanics, but each kangaroo was only collected at 1 speed. Are inter-animal comparisons enough to satisfy this investigation?

      We have added a figure (Suppl Fig 9) to demonstrate the distribution of speed and number of trials per kangaroo. We have also removed "preferred" from the manuscript as this seems to cause confusion. Most kangaroos travelled at a range of “casual” speeds.

      Abstract: Can the authors more fully connect the concept of tendon stress and low metabolic rates during hopping across speeds? Surely, tendon mechanics don't directly drive the metabolic cost of hopping, but they affect muscle mechanics to affect energetics.

      Amended to: "This phenomenon may be related to greater elastic energy savings due to increasing tendon stress; however, the mechanisms which enable the rise in stress, without additional muscle work remain poorly understood." (Lines 25-27).

      The topic sentence in lines 61-63 may be misleading. The ensuing paragraph does not substantiate the topic sentence stating that ankle MTUs decouple speeds and energetics.

      We added "likely" to soften the statement. (Line 59)

      Lines 84-86: In humans, does more limb flexion and worse EMA necessitate greater active muscle volume? What about muscle contractile dynamics - See recent papers by Sawicki & colleagues that include Hill-type muscle mechanics in active muscle volume estimates.

      Added: “Smaller EMA requires greater muscle force to produce a given force on the ground, thereby demanding a greater volume of active muscle, and presumably greater metabolic rates than larger EMA for the same physiology”. (Line 80-82)

      Lines 106: can you give the context of what normal tendon safety factors are?

      Good idea. Added: "far lower than the typical safety factor of four to eight for mammalian tendons (Ker et al. 1988)." Line 106-107

      I thought EMA was relatively stable across speeds as per Biewener [Science & JAP '04]. However the authors gave an example of an elephant to suggest that it is typically inversely related to speed. Can the authors please explain the disconnect and the most appropriate explanation in this paragraph?

      Knee EMA in particular changed with speed in Biewener 2004. What is “typical” probably depends on the group of animals studied; e.g., cursorial quadrupedal mammals generally seem to maintain constant EMA, but other groups do not.

      These cases are presented to show a range of consequences for changing EMA (usually with mass, but sometimes with speed). We have made several adjustments to the paragraph to make this clearer. Lines 85-93.

      The results depend on the modeled internal moment arm (r). How confident are the authors in their little r prediction? Considering complications of joint mechanics in vivo including muscle bulging. Holzer et al. '20 Sci Rep demonstrated that different models of the human Achilles tendon moment arm predict vastly different relationships between the moment arm and joint angle.

      Our values for r and EMA closely align with previous papers which measured or calculated these values in kangaroos, such as Kram 1998, and thus we are confident in our interpretation.

      This is a misleading results sentence: Small decreases in EMA correspond to a nontrivial increase in tendon stress, for instance, reducing EMA from 0.242 (mean minimum EMA of the slow group) to 0.206 (mean minimum EMA of the fast group) was associated with an ~18% increase in tendon stress. The authors could alternatively say that a ~15% decrease in EMA was associated with an ~18% increase in tendon stress, which seems pretty comparable.

      Thank you for pointing this out, it is important that it is made clearer. Although the change in relative magnitude is approximately the same (as it should be), this does not detract from the importance. The "small decrease in EMA" is referring to the absolute values, particularly in respect to the measurement error/noise. The difference is small enough to have been undetectable with other methods used in previous studies. We have amended the sentence to clarify this.

      It now reads: “Subtle decreases in EMA which may have been undetected in previous studies correspond to discernible increases in tendon stress. For instance, reducing EMA from 0.242 (mean minimum EMA of the slow group) to 0.206 (mean minimum EMA of the fast group) was associated with an increase in tendon stress from ~50 MPa to ~60 MPa, decreasing safety factor from 2 to 1.67 (where 1 indicates failure), which is both measurable and physiologically significant.” (Line 195-200)
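
      The figures quoted above can be checked with a short calculation. This sketch assumes that, for a fixed ground reaction force and tendon cross-sectional area, tendon stress scales with 1/EMA, and it infers a tendon strength of 100 MPa from the stated safety factor of 2 at ~50 MPa. Both are simplifying assumptions made here for illustration; the variable names are hypothetical.

```python
def relative_change(old, new):
    """Fractional change from old to new (negative means a decrease)."""
    return (new - old) / old

# Mean minimum EMA of the slow and fast groups, as quoted in the response.
ema_slow, ema_fast = 0.242, 0.206

# Relative decrease in EMA (about -0.15, i.e. the reviewer's ~15%).
ema_change = relative_change(ema_slow, ema_fast)

# Assumption: stress is proportional to 1/EMA at fixed GRF and tendon area,
# so the stress ratio is the inverse EMA ratio (about 1.17 to 1.18).
stress_ratio = ema_slow / ema_fast

# Approximate tendon stresses quoted in the response (MPa).
stress_slow, stress_fast = 50.0, 60.0

# Assumption: strength implied by a safety factor of 2 at the slow-group stress.
assumed_strength = 2.0 * stress_slow

# Safety factor at the fast-group stress (about 1.67).
safety_factor_fast = assumed_strength / stress_fast
```

      The rounded stresses (~50 to ~60 MPa) imply a slightly larger increase than the EMA ratio alone, which is consistent with the quoted values being approximate.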

      Lines 243-245: "The consistent net work observed among all speeds suggests the ankle extensors are performing similar amounts of ankle work independent of speed." If this is true, and presumably there is greater limb work performed on the center of mass at faster speeds (Donelan, Kram, Kuo), do more proximal leg joints increase work and energy consumption at faster speeds?

      The skin over the proximal leg joints (knee and hip) moves too much to get reliable measures of EMA from the ratio of moment arms. This will be pursued in future work when all muscles are incorporated in the model so knee and hip EMA can be determined from muscle force.

      We have added a limitations and considerations paragraph to the manuscript: “Finally, we did not determine whether the EMA of proximal hindlimb joints (which are more difficult to track via surface motion capture markers) remained constant with speed. Although the hip and knee contribute substantially less work than the ankle joint (Fig. 4), the majority of kangaroo skeletal muscle is located around these proximal joints. A change in EMA at the hip or knee could influence a larger muscle mass than at the ankle, potentially counteracting or enhancing energy savings in the ankle extensor muscle-tendon units. Further research is needed to understand how posture and muscles throughout the whole body contribute to kangaroo energetics.” (Line 321-328)

      Lines 245-246: "Previous studies using sonomicrometry have shown that the muscles of tammar wallabies do not shorten considerably during hops, but rather act near-isometrically as a strut" Which muscles? All muscles? Extensors at a single joint?

      Added "gastrocnemius and plantaris" Line 164-165

      Lines 249-254: "The cost of generating force hypothesis suggests that faster movement speeds require greater rates of muscle force development, and in turn greater cross-bridge cycling rates, driving up metabolic costs (Taylor et al. 1980, Kram and Taylor 1990). The ability for the ankle extensor muscle fibres to remain isometric and produce similar amounts of work at all speeds may help explain why hopping macropods do not follow the energetic trends observed in quadrupedal species." These sentences confuse me. Kram & Taylor's cost of force-generating hypothesis assumes that producing the same average force over shorter contact times increases metabolic rate. How does 'similar muscle work' across all speeds explain the ability of macropods to use unique energetic trends in the cost of force-generating hypothesis context?

      Thank you for highlighting this confusion. We have substantially revised the discussion to clarify where the mechanisms presented deviate from the cost of generating force hypothesis. Lines 270-309

      Reviewer #3 (Recommendations For The Authors):

      In addition to the points described in the public review, I have additional, related, specific comments:

      (1) Results: Please refer to the hypotheses in the results, and relate the findings back to the hypotheses.

      We now relate the findings back to the hypotheses 

      Line 142 “In partial support of hypothesis (i), greater masses and faster speeds were associated with more crouched hindlimb postures (Fig. 3a,c).”.

      Lines 205-206: “The increase in tendon stress with speed, facilitated in part by the change in moment arms by the shift in posture, may explain changes in ankle work (c.f. Hypothesis (ii)).” 

      (2) Results: please provide the main statistical results either in-line or in a table in the main text.

      We (the co-authors) have discussed this at length, and have agreed that the manuscript is far more readable in the format whereby most statistics lie within the supplementary tables, otherwise a reader is met with a wall of statistics. We only include values in the main text when the magnitude is relevant to the arguments presented in the results and discussion.

      (3) Line 140: Describe how 'crouched' was defined.

      We have now added a brief definition of ‘Crouch factor’ after the figure caption. (Line 143) (Fig. 3a,c; where crouch factor is the ratio of total limb length to pelvis to toe distance).
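
      As an illustration of the definition quoted above, the ratio can be computed from segment lengths and two marker positions. This is a minimal sketch, not the authors' code: the function name, the choice of segments, and all numeric values are hypothetical. With this definition, a fully extended limb gives a value near 1 and a more crouched limb gives a larger value.

```python
import math

def crouch_factor(segment_lengths, pelvis_xyz, toe_xyz):
    """Ratio of total limb length to the straight-line pelvis-to-toe distance."""
    total_limb_length = sum(segment_lengths)
    pelvis_to_toe = math.dist(pelvis_xyz, toe_xyz)
    if pelvis_to_toe == 0:
        raise ValueError("pelvis and toe markers coincide")
    return total_limb_length / pelvis_to_toe

# Hypothetical segment lengths in metres (illustrative values only).
segments = [0.25, 0.45, 0.15, 0.10]

# Fully extended limb: pelvis directly above the toe at full limb length.
upright = crouch_factor(segments, (0.0, 0.95, 0.0), (0.0, 0.0, 0.0))

# Flexed limb: pelvis lower and the toe displaced horizontally.
crouched = crouch_factor(segments, (0.0, 0.60, 0.0), (0.25, 0.0, 0.0))

print(f"upright: {upright:.2f}, crouched: {crouched:.2f}")
```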

      (4) Line 162: This seems to be a main finding and should be a figure in the main text not supplemental. Additionally, Supplementary Figures 3a and b do not show this finding convincingly. There should be a figure plotting r vs speed and r vs mass.

      The combination of r and R is represented in the EMA plot in the main text. The r and R plots are relegated to the supplementary because the main text is already very crowded. Thank you for the suggestion for the figure plotting r and R versus speed; this is now included as Suppl. Fig. 3h.

      (5) Line 166: Supplementary Figure 3g does not show the range of dorsiflexion angles as a function of speed. It shows r vs dorsiflexion angle. Please correct.

      Thanks for noticing this. The sentence regarding speed was supposed to reference Fig 3g rather than Suppl Fig 3g. We have fixed this, Line 170.

      We have added a reference to Suppl Fig 3 on Line 169 as this shows where the peak in r with ankle angle occurs (114.4 degrees).

      (6) Line 184: Where are the statistical results for this statement?

      The relationship between stress and EMA does not appear to be linear, thus we only present R<sup>2</sup> for the power relationship rather than a p-value.
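
      For readers unfamiliar with reporting R<sup>2</sup> for a power relationship, one common approach is an ordinary least-squares fit in log-log space. The sketch below assumes that approach; the response does not state how the fit was performed, and the function name and data values are hypothetical.

```python
import math

def power_fit_r2(x, y):
    """Fit y = a * x**b by least squares on log-transformed data.

    Returns (a, b, r_squared), with r_squared computed in log-log space.
    """
    log_x = [math.log(v) for v in x]
    log_y = [math.log(v) for v in y]
    n = len(log_x)
    mean_x = sum(log_x) / n
    mean_y = sum(log_y) / n
    sxx = sum((u - mean_x) ** 2 for u in log_x)
    sxy = sum((u - mean_x) * (v - mean_y) for u, v in zip(log_x, log_y))
    b = sxy / sxx
    log_a = mean_y - b * mean_x
    ss_res = sum((v - (log_a + b * u)) ** 2 for u, v in zip(log_x, log_y))
    ss_tot = sum((v - mean_y) ** 2 for v in log_y)
    return math.exp(log_a), b, 1.0 - ss_res / ss_tot

# Hypothetical, noise-free data following stress = 3 * EMA**-1.2.
ema = [0.2, 0.3, 0.4, 0.5, 0.6]
stress = [3.0 * e ** -1.2 for e in ema]

a, b, r2 = power_fit_r2(ema, stress)
print(f"a = {a:.3f}, b = {b:.3f}, R^2 = {r2:.3f}")
```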

      (7) Line 192: The authors should explain how joint work and power relate/support the overall hypotheses. This section also refers to Figures 4 and 5 even though Figures 6 and 7 have already been described. Please reorganize.

      We have added a sentence at the end of the work and power section to mention hypothesis (ii) and lead into the discussion where it is elaborated upon. 

      “The increase in positive and negative ankle work may be due to the increase in tendon stress rather than additional muscle work.” Line 219-220 We have rearranged the figure order.

      (8) The statistics are not reported in the main text, but in the supplementary tables. If a result is reported in the main text, please report either in-line or with a table in the main text.

      We leave most statistics in the supplementary tables to preserve the readability of the manuscript. We only include values in the main text when the magnitude is relevant to the arguments raised in the results and discussion.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Reviewer #1 (Public review):

      Summary:

      This is a contribution to the field of developmental bioelectricity. How do changes of resting potential at the cell membrane affect downstream processes? Zhou et al. reported in 2015 that phosphatidylserine and K-Ras cluster upon plasma membrane depolarization and that voltage-dependent ERK activation occurs when constitutively active K-RasG12V mutants are overexpressed. In this paper, the authors advance the knowledge of this phenomenon by showing that membrane depolarization up-regulates mitosis and that this process is dependent on voltage-dependent activation of ERK. ERK activity's voltage-dependence is derived from changes in the dynamics of phosphatidylserine in the plasma membrane and not by extracellular calcium dynamics. This paper reports an interesting and important finding. It is somewhat derivative of Zhou et al., 2015. (https://www.science.org/doi/full/10.1126/science.aaa5619). The main novelty seems to be that they find quantitatively different conclusions upon conducting similar experiments, albeit with a different cell line (U2OS) than those used by Zhou et al. Sasaki et al. do show that increased K+ levels increase proliferation, which Zhou et al. did not look at. The data presented in this paper are a useful contribution to a field often lacking such data.

      Strengths:

      Bioelectricity is an important field for areas of cell, developmental, and evolutionary biology, as well as for biomedicine. Confirmation of ERK as a transduction mechanism and a characterization of the molecular details involved in the control of cell proliferation are interesting and impactful.

      Weaknesses:

      The authors lean heavily on the assumption that the Nernst equation is an accurate predictor of membrane potential based on K+ level. This is a large oversimplification that undermines the author's conclusions, most glaringly in Figure 2C. The author's conclusions should be weakened to reflect that the activity of voltage gated ion channels and homeostatic compensation are unaccounted for.

      We appreciate the reviewer’s thoughtful comment regarding our reliance on the Nernst equation to estimate membrane potential. We agree that the Nernst equation is a simplification and does not account for the activity of other ions, voltage-gated channels, or homeostatic compensation mechanisms. To address this concern, we conducted electrophysiological experiments in which the membrane potential was directly controlled using the perforated patch-clamp technique (Fig. 3). Under these conditions, we also monitored the membrane potential and confirmed that there was negligible drift within 20 minutes of perfusion with 145 mM K<sup>⁺</sup> (only a 1–5 mV change). These results suggest that the influence of voltage-gated channels and homeostatic compensation is minimal in our experimental setup. We revised the manuscript to clarify these limitations and to present our conclusions more cautiously in light of this point.

      “A potential limitation of extracellular K<sup>⁺</sup>-based approaches is their reliance on the Nernst equation to estimate membrane potential, which oversimplifies the actual situation by neglecting voltage-gated ion channel activity and compensatory mechanisms. To directly address this concern, we measured membrane potential using the perforated patch-clamp technique and confirmed that the potential was stable during perfusion with 145 mM K<sup>⁺</sup> (only a 1–5 mV drift within 20 min). Moreover, we used a voltage clamp to precisely control the membrane potential and demonstrated that ERK activity was directly regulated by the voltage itself, excluding the influence of other secondary factors. An additional strength of electrophysiology is its ability to examine the effects of repolarization, which is difficult to assess with conventional perfusion-based methods owing to slow solution exchange.”
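
      To make the Nernst estimate discussed above concrete, the following sketch evaluates the K<sup>+</sup> equilibrium potential at the extracellular concentrations mentioned in the reviews (5, 15, 30, and 145 mM). The intracellular concentration (140 mM) and the temperature (37 °C) are assumptions made for illustration and are not taken from the manuscript. The result is the K<sup>+</sup> equilibrium potential only; as the reviewer notes, the true membrane potential also depends on other ions, channel activity, and compensation.

```python
import math

R = 8.314462618   # gas constant, J/(mol*K)
F = 96485.33212   # Faraday constant, C/mol
T = 310.15        # 37 degrees C in kelvin (assumed temperature)
K_IN_MM = 140.0   # assumed intracellular K+ in mM; not reported in the text

def nernst_k_mv(k_out_mm, k_in_mm=K_IN_MM, temp_k=T):
    """K+ equilibrium potential in mV for a monovalent cation (z = 1)."""
    return 1000.0 * (R * temp_k / F) * math.log(k_out_mm / k_in_mm)

for k_out in (5, 15, 30, 145):
    print(f"{k_out:>3} mM K+ -> {nernst_k_mv(k_out):6.1f} mV")
```

      Under these assumptions, raising extracellular K<sup>+</sup> shifts the estimate monotonically toward 0 mV, which is why direct patch-clamp measurement is the stronger control.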

      Grammatical tense errors are made throughout the paper (e.g., line 99: "This kinetics" should be "these kinetics").

      We thank the reviewer for pointing out the grammatical errors. We carefully revised the entire manuscript.

      Line 71: Zhou et al. use BHK, N2A, PSA-3 cells, this paper uses U2OS (osteosarcoma) cells. Could that explain the differences in bioelectric properties that they describe? In general, there should be more discussion of the choice of cell line. Why were U2OS cells chosen? What are the implications of the fact that these are cancer cells, and bone cancer cells in particular? Does this paper provide specific insights for bone cancers? And crucially, how applicable are findings from these cells to other contexts?

      We thank the reviewer for this valuable comment regarding the choice of cell line. We selected U2OS cells primarily because they are well suited for live-cell FRET imaging. We did not use BHK, N2A, or PSA-3 cells, and therefore it is difficult for us to provide a clear comparison with the specific bioelectric properties reported in Zhou et al. Nevertheless, we agree that cancer cell lines, including U2OS, may exhibit bioelectric properties that differ from those of non-cancerous cells. While this could be a potential limitation, we are inclined to consider voltage-dependent ERK activation to be a fundamental and generalizable phenomenon, not restricted to osteosarcoma cells. The key components of this pathway—phosphatidylserine, Ras, MAPK (including ERK)—are expressed in essentially all mammalian cells. In support of our view, we observed voltage-dependent ERK activation not only in U2OS cells but also in HeLa, HEK293, and A431 cells. These results strongly suggest that the mechanism we describe is not cell-type specific but rather a universal feature of mammalian cells. In the revised Discussion, we expanded our rationale to choose U2OS cells, while addressing the potential implications of using a cancer-derived cell line. 

      “In this study, we primarily used U2OS cells because their flat morphology makes them suitable for live-cell FRET imaging. Although cancer cell lines, including U2OS, may display bioelectric properties that differ from those of noncancerous cells, our findings raise the possibility that voltage-dependent ERK activation is a fundamental and broadly applicable phenomenon rather than a feature specific to osteosarcoma cells. This conclusion is supported by the fact that essential components of this pathway, namely phosphatidylserine, Ras, and MAPK (including ERK), are ubiquitously expressed in mammalian cells. Consistent with this finding, we observed voltage-dependent ERK activation across multiple cell lines: U2OS, HeLa, HEK293, and A431 cells (Fig.S2). These observations indicate that the mechanism we describe is not cell-type-restricted, but rather a universal property of mammalian cells.”

      Line 115: The authors use EGF to calibrate 'maximal' ERK stimulation. Is this level near saturation? Either way is fine, but it would be useful to clarify.

      We thank the reviewer for raising this important point. The YFP/CFP ratio obtained after EGF stimulation is generally considered to represent saturation levels detectable by EKAREV imaging. However, we acknowledge that it remains uncertain whether 10 ng/mL EGF induces the absolute maximal ERK activity in all contexts. To clarify this point, we revised the manuscript (result) text as follows:

      “To normalize variation among cells, cells were stimulated with EGF (10 ng/mL) at the end of the experiment, which presumably yielded a near-saturated YFP/CFP value (ERK activity). This value was used to determine the maximum ERK activity in each cell”
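
      The per-cell normalization described in this passage could be sketched as follows. This is an illustrative assumption, not the authors' pipeline: the response states only that the EGF value defines the maximum for each cell, so the baseline subtraction, the function name, and the example trace are all hypothetical.

```python
from statistics import mean

def normalize_to_egf(ratios, n_baseline, n_egf):
    """Scale one cell's YFP/CFP trace so baseline maps to 0 and the EGF plateau to 1.

    ratios: YFP/CFP values over time for a single cell
    n_baseline: number of initial frames averaged as baseline (assumed step)
    n_egf: number of final frames, recorded after EGF addition, averaged as the maximum
    """
    baseline = mean(ratios[:n_baseline])
    egf_max = mean(ratios[-n_egf:])
    span = egf_max - baseline
    if span <= 0:
        raise ValueError("EGF response must exceed baseline")
    return [(r - baseline) / span for r in ratios]

# Hypothetical trace: baseline, response to depolarization, then EGF plateau.
trace = [1.00, 1.00, 1.20, 1.40, 2.00, 2.00]
normalized = normalize_to_egf(trace, n_baseline=2, n_egf=2)
print([round(v, 2) for v in normalized])
```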

      Line 121: Starting line 121 the authors say "Of note, U2OS cells expressed wild-type K-Ras but not an active mutant of K-Ras, which means voltage dependent ERK activation occurs not only in tumor cells but also in normal cells". Given that U2OS cells are bone sarcoma cells, is it appropriate to refer to these as 'normal' cells in contrast to 'tumor' cells?

      We thank the reviewer for pointing this out. We agree that it is not appropriate to contrast U2OS cells with “normal” cells, since they are sarcoma-derived. To address this point, we revised the sentence to weaken the claim and avoid the misleading terminology.

      “Importantly, as U2OS cells express wild-type K-Ras rather than an oncogenic mutant (16), our results raise the possibility that voltage-dependent ERK activation may also occur in non-transformed cells.”

      Line 101: These normalizations seem reasonable, the conclusions sufficiently supported and the requisite assumptions clearly presented. Because the dish-to-dish and cell-to-cell variation may reflect biologically relevant phenomena it would be ideal if non-normalized data could be added in supplemental data where feasible.

      We thank the reviewer for this helpful suggestion. As recommended, we added representative non-normalized data in Supplemental Figure S1, which illustrates the non-normalized variation across cells and dishes.

      Figure 2C is listed as Figure 2D in the text

      There is no Figure 2F (Referenced in line 148)

      We thank the reviewer for pointing out these errors. The incorrect figure citations were corrected.

      Reviewer #2 (Public review):

      Sasaki et al. use a combination of live-cell biosensors and patch-clamp electrophysiology to investigate the effect of membrane potential on the ERK MAPK signaling pathway, and probe associated effects on proliferation. This is an effect that has long been proposed, but a convincing demonstration has remained elusive, because it is difficult to perturb membrane potential without disturbing other aspects of cell physiology in complex ways. The time-resolved measurements here are a nice contribution to this question, and the perforated patch clamp experiments with an ERK biosensor are fantastic - they come closer to addressing the above difficulty of perturbing voltage than any prior work. It would have been difficult to obtain these observations with any other combination of tools.

      However, there are still some concerns as detailed in specific comments below:

      Specific comments:

      (1) All the observations of ERK activation, by both high extracellular K+ and voltage clamp, could be explained by cell volume increase (more discussion in subsequent comments). There is a substantial literature on ERK activation by hypotonic cell swelling (e.g. https://doi.org/10.1042/bj3090013, https://doi.org/10.1002/j.1460-2075.1996.tb00938.x, among others). Here are some possible observations that could demonstrate that ERK activation by volume change is distinct from the effects reported here:

      (i) Does hypotonic shock activate ERK in U2OS cells?

      (ii) Can hypotonic shock activate ERK even after PS depletion, whereas extracellular K+ cannot?

      (iii) Does high extracellular K+ change cell volume in U2OS cells, measured via an accurate method such as fluorescence exclusion microscopy?

      (iv) It would be helpful to check the osmolality of all the extracellular solutions, even though they were nominally targeted to be iso-osmotic.

      (2) Some more details about the experimental design and the results are needed from Figure 1:

      (i) For how long are the cells serum-starved? From the Methods section, it seems like the G1 release in different K+ concentration is done without serum, is this correct? Is the prior thymidine treatment also performed in the absence of serum?

      (ii) There is a question of whether depolarization constitutes a physiologically relevant mechanism to regulate proliferation, and how depolarization interacts with other extracellular signals that might be present in an in vivo context. Does depolarization only promote proliferation after extended serum starvation (in what is presumably a stressed cell state)? What fraction of total cells are observed to be mitotic (without normalization), and how does this compare to the proliferation of these cells growing in serum-supplemented media? Can K+ concentration tune proliferation rate even in serum-supplemented media?

      (3) In Figure 2, there are some possible concerns with the perfusion experiment:

      (i) Is the buffer static in the period before perfusion with high K+, or is it perfused? This is not clear from the Methods. If it is static, how does the ERK activity change when perfused with 5 mM K+? In other words, how much of the response is due to flow/media exchange versus change in K+ concentration?

      (ii) Why do there appear to be population-average decreases in ERK activity in the period before perfusion with high K+ (especially in contrast to Fig. 3)? The imaging period does not seem frequent enough for photobleaching to be significant.

      (4) Figure 3 contains important results on couplings between membrane potential and MAPK signaling. However, there are a few concerns:

      (i) Does cell volume change upon voltage clamping? Previous authors have shown that depolarizing voltage clamp can cause cells to swell, at least in the whole-cell configuration: https://www.cell.com/biophysj/fulltext/S0006-3495(18)30441-7 . Could it be possible that the clamping protocol induces changes in ERK signaling due to changes in cell volume, and not by an independent mechanism?

      (ii) Does the -80 mV clamp begin at time 0 minutes? If so, one might expect a transient decrease in sensor FRET ratio, depending on the original resting potential of the cells. Typical estimates for resting potential in HEK293 cells range from -40 mV to -15 mV, which would reach the range that induces an ERK response by depolarizing clamp in Fig. 3B. What are the resting potentials of the cells before they are clamped to -80 mV, and why do we not see this downward transient?

      (5) The activation of ERK by perforated voltage clamp and by high extracellular K+ are each convincing, but it is unclear whether they need to act purely through the same mechanism - while additional extracellular K+ does depolarize the cell, it could also be affecting function of voltage-independent transporters and cell volume regulatory mechanisms on the timescales studied. To more strongly show this, the following should be done with the HEK cells where there is already voltage clamp data:

      (i) Measure resting potential using the perforated patch in zero-current configuration in the high K+ medium. Ideally this should be done in the time window after high K+ addition where ERK activation is observed (10-20 minutes) to minimize the possibility of drift due to changes in transporter and channel activity due to post-translational regulation.

      (ii) Measure YFP/CFP ratio of the HEK cells in the high K+ medium (in contrast to the U2OS cells from Fig. 2 where there is no patch data).

      (iii) The assertion that high K+ is equivalent to changes in Vmem for ERK signaling would be supported if the YFP/CFP change from K+ addition is comparable to that induced by voltage clamp to the same potential. This would be particularly convincing if the experiment could be done with each of the 15 mM, 30 mM, and 145 mM conditions.

      (6) Line 170: "ERK activity was reduced with a fast time course (within 1 minute) after repolarization to -80 mV." I don't see this in the data: in Fig. 3C, it looks like ERK remains elevated for > 10 min after the electrical stimulus has returned to -80 mV

      Comments on revisions:

      The authors have done a good job addressing the comments on the previous submission.

      Reviewer #3 (Public review):

      Summary:

      This paper demonstrates that membrane depolarization induces a small increase in cell entry into mitosis. Based on previous work from another lab, the authors propose that ERK activation might be involved. They show convincingly using a combination of assays that ERK is activated by membrane depolarization. They show this is Ca2+ independent and is a result of activation of the whole K-Ras/ERK cascade which results from changed dynamics of phosphatidylserine in the plasma membrane that activates K-Ras. Although the activation of the Ras/ERK pathway by membrane depolarization is not new, linking it to an increase in cell proliferation is novel.

      Strengths

      A major strength of the study is the use of different techniques - live imaging with ERK reporters, as well as Western blotting to demonstrate ERK activation as well as different methods for inducing membrane depolarization. They also use a number of different cell lines. Via Western blotting the authors are also able to show that the whole MAPK cascade is activated.

      Weaknesses

      A weakness of the study is the data in Figure 1 showing that membrane depolarization results in an increase of cells entering mitosis. There are very few cells entering mitosis in their sample in any condition. This should be done with many more cells to increase the confidence in the results. The study also lacks a mechanistic link between ERK activation by membrane depolarization and increased cell proliferation.

      The authors did achieve their aims with the caveat that the cell proliferation results could be strengthened. The results, for the most part, support the conclusions.

      This work suggests that alterations in membrane potential may have more physiological functions than action potential in the neural system as it has an effect on intracellular signalling and potentially cell proliferation.

      In the revised manuscript, the authors have now addressed the issues with Figure 1, and the data presented are much clearer. They did also attempt to pinpoint when in the cell cycle ERK is having its activity, but unfortunately, this was not conclusive.

      Reviewer #2 (Recommendations for the authors):

      Small issues:

      Fig. 1A. Please add a mark on the timeline showing when the K+ concentration is changed. Also, please add a time axis that matches the time axis in (C), so readers can know when in C the medium was changed.

      1B caption: unclear what "the images were 20 min before and after cytokinesis" means, given that the images go from -30 min to +20 min. Maybe the authors mean, "the indicated times are measured relative to cytokinesis."

      Thank you for bringing these potentially confusing points to our attention. We revised the figure legend.

      Line 214: nonoclusters --> nanoclusters

      Line 475: 10 mm -> 10 µm

      Corrected.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      This paper presents results from four independent experiments, each of which tests for rhythmicity in auditory perception. The authors report rhythmic fluctuations in discrimination performance at frequencies between 2 and 6 Hz. The exact frequency depends on the ear and experimental paradigm, although some frequencies seem to be more common than others.

      Strengths:

      The first sentence in the abstract describes the state of the art perfectly: "Numerous studies advocate for a rhythmic mode of perception; however, the evidence in the context of auditory perception remains inconsistent". This is precisely why the data from the present study is so valuable. This is probably the study with the highest sample size (total of > 100 in 4 experiments) in the field. The analysis is very thorough and transparent, due to the comparison of several statistical approaches and simulations of their sensitivity. Each of the experiments differs from the others in a clearly defined experimental parameter, and the authors test how this impacts auditory rhythmicity, measured in pitch discrimination performance (accuracy, sensitivity, bias) of a target presented at various delays after noise onset.

      Weaknesses:

      (1) The authors find that the frequency of auditory perception changes between experiments. I think they could exploit differences between experiments better to interpret and understand the obtained results. These differences are very well described in the Introduction, but don't seem to be used for the interpretation of results. For instance, what does it mean if perceptual frequency changes from between- to within-trial pitch discrimination? Why did the authors choose this experimental manipulation? Based on differences between experiments, is there any systematic pattern in the results that allows conclusions about the roles of different frequencies? I think the Discussion would benefit from an extension to cover this aspect.

      We believe that interpreting these differences remains difficult and a precise, detailed (and possibly mechanistic) interpretation is beyond the goal of the present study. The main goal of this study was to explore the consistency and variability of effects across variations of the experimental design and samples of participants. Interpreting specific effects, e.g. at particular frequencies, would make sense mostly if differences between experiments have been confirmed in a separate reproduction. Still, we do provide specific arguments for why differences in the outcome between different experiments, e.g. with and without explicit trial initialization by the participants, could be expected. See lines 91ff in the introduction and 786ff in the discussion.

      (2) The Results give the impression of clear-cut differences in relevant frequencies between experiments (e.g., 2 Hz in Experiment 1, 6 Hz in Exp 2, etc), but they might not be so different. For instance, a 6 Hz effect is also visible in Experiment 1, but it just does not reach conventional significance. The average across the three experiments is therefore very useful, and also seems to suggest that differences between experiments are not very pronounced (otherwise the average would not produce clear peaks in the spectrum). I suggest making this point clearer in the text.

      We have revised the conclusions to note that the present data do not support clear-cut differences between experiments. For this reason, we also refrain from detailed interpretations of specific effects, as suggested by this reviewer in point 1 above.

      (3) I struggle to understand the hypothesis that rhythmic sampling differs between ears. In most everyday scenarios, the same sounds arrive at both ears, and the time difference between the two is too small to play a role for the frequencies tested. If both ears operate at different frequencies, the effects of the rhythm on overall perception would then often cancel out. But if this is the case, why would the two ears have different rhythms to begin with? This could be described in more detail.

      This hypothesis was not invented by us, but in essence put forward in previous work. The study by Ho et al. (Curr Biol, 2017) reported rhythmic effects at different frequencies in the left and right ears, and we here tried to reproduce these effects. One could speculate about an ear difference based on studies reporting a right-ear advantage in specific listening tasks, and the idea that different time scales of rhythmic brain activity may specifically prevail in the left and right cortical hemispheres; hence it does not seem improbable that there could be rhythmic effects in both ears at different frequencies. We note this in the introduction, l. 65ff.

      Reviewer #2 (Public review):

      Summary:

      The current study aims to shed light on why previous work on perceptual rhythmicity has led to inconsistent results. They propose that the differences may stem from conceptual and methodological issues. In a series of experiments, the current study reports perceptual rhythmicity in different frequency bands that differ between different ear stimulations and behavioral measures.

      The study suggests challenges regarding the idea of universal perceptual rhythmicity in hearing.

      Strengths:

      The study aims to address differences observed in previous studies about perceptual rhythmicity. This is important and timely because the existing literature provides quite inconsistent findings. Several experiments were conducted to assess perceptual rhythmicity in hearing from different angles. The authors use sophisticated approaches to address the research questions.

      Weaknesses:

      (1) Conceptional concerns:

      The authors place their research in the context of a rhythmic mode of perception. They also discuss continuous vs rhythmic mode processing. Their study further follows a design that seems to be based on paradigms that assume a recent phase in neural oscillations that subsequently influence perception (e.g., Fiebelkorn et al.; Landau & Fries). In my view, these are different facets in the neural oscillation research space that require a bit more nuanced separation. Continuous mode processing is associated with vigilance tasks (work by Schroeder and Lakatos; reduction of low frequency oscillations and sustained gamma activity), whereas the authors of this study seem to link it to hearing tasks specifically (e.g., line 694). Rhythmic mode processing is associated with rhythmic stimulation by which neural oscillations entrain and influence perception (also, Schroeder and Lakatos; greater low-frequency fluctuations and more rhythmic gamma activity). The current study mirrors the continuous rather than the rhythmic mode (i.e., there was no rhythmic stimulation), but even the former seems not fully fitting, because trials are 1.8 s short and do not really reflect a vigilance task. Finally, previous paradigms on phase-resetting reflect more closely the design of the current study (i.e., different times of a target stimulus relative to the reset of an oscillation). This is the work by Fiebelkorn et al., Landau & Fries, and others, which do not seem to be cited here, which I find surprising. Moreover, the authors would want to discuss the role of the background noise in resetting the phase of an oscillation, and the role of the fixation cross also possibly resetting the phase of an oscillation. Regardless, the conceptional mixture of all these facets makes interpretations really challenging. The phase-reset nature of the paradigm is not (or not well) explained, and the discussion mixes the different concepts and approaches. 
I recommend that the authors frame their work more clearly in the context of these different concepts (affecting large portions of the manuscript).

      Indeed, the paradigms used here and in many similar previous studies incorporate an aspect of phase-resetting, as the presentation of a background noise may effectively reset ongoing auditory cortical processes. Studies trying to probe for rhythmicity in auditory perception in the absence of any background noise have not shown any effect (Zoefel and Heil, 2013), perhaps because the necessary rhythmic processes along auditory pathways are only engaged when some sound is present. We now discuss these points, and also acknowledge the mentioned studies in the visual system; l. 57.

      (2) Methodological concerns:

      The authors use a relatively unorthodox approach to statistical testing. I understand that they try to capture and characterize the sensitivity of the different analysis approaches to rhythmic behavioral effects. However, it is a bit unclear what meaningful effects are in the study. For example, the bootstrapping approach that identifies the percentage of significant variations of sample selections is rather descriptive (Figures 5-7). The authors seem to suggest that 50% of the samples are meaningful (given the dashed line in the figure), even though this is rarely reached in any of the analyses. Perhaps >80% of samples should show a significant effect to be meaningful (at least to my subjective mind). To me, the low percentage rather suggests that there is not too much meaningful rhythmicity present. 

      We note that there is no clear consensus on what fraction of experiments should be expected or how this way of quantifying effects should be precisely valued (l. 441ff). However, we now also clearly acknowledge in the discussion that the effective prevalence is not very high (l. 663).

      I suggest that the authors also present more traditional, perhaps multi-level, analyses: Calculation of spectra, binning, or single-trial analysis for each participant and condition, and the respective calculation of the surrogate data analysis, and then comparison of the surrogate data to the original data on the second (participant) level using t-tests. I also thought the statistical approach undertaken here could have been a bit more clearly/didactically described as well.

      We realize that our description of the methods was possibly not fully clear. We do follow the strategy suggested by this reviewer, but rather than comparing actual and surrogate data with a parametric t-test, we compare them using a non-parametric percentile-based approach. This has the advantage of not making specific (and possibly unwarranted) assumptions about the distribution of the data. We have revised the methods to clarify this (l. 332ff).

      The authors used an adaptive procedure during the experimental blocks such that the stimulus intensity was adjusted throughout. In practice, this can be a disadvantage relative to keeping the intensity constant throughout, because, on average, correct trials will be associated with a higher intensity than incorrect trials, potentially making observations of perceptual rhythmicity more challenging. The authors would want to discuss this potential issue. Intensity adjustments could perhaps contribute to the observed rhythmicity effects. Perhaps the rhythmicity of the stimulus intensity could be analyzed as well. In any case, the adaptive procedure may add variance to the data.

      We have added an analysis of task difficulty to the results (new section “Effects of adaptive task difficulty”) to address this. Overall, we do not find systematic changes in task difficulty across participants for most of the experiments, but one cannot rule out that this aspect of the design also affects the outcomes. Importantly, we relied on an adaptive task difficulty to actually (or hopefully) reduce variance in the data, by keeping the task difficulty around a certain level. Given the large number of trials collected, not using such an adaptive procedure may result in performance levels around chance or near ceiling, which would make it impossible to detect rhythmic variations in behavior.

      Additional methodological concerns relate to Figure 8. Figures 8A and C seem to indicate that a baseline correction for a very short time window was calculated (I could not find anything about this in the methods section). The data seem very variable and artificially constrained in the baseline time window. It was unclear what the reader might take from Figure 8.

      This figure was intended mostly for illustration of the eye tracking data, but we agree that there is no specific key insight to be taken from this. We removed this. 

      Motivation and discussion of eye-movement/pupillometry and motor activity: The dual task paradigm of Experiment 4 and the reasons for assessing eye metrics in the current study could have been better motivated. The experiment somehow does not fit in very well. There is recent evidence that eye movements decrease during effortful tasks (e.g., Contadini-Wright et al. 2023 J Neurosci; Herrmann & Ryan 2024 J Cog Neurosci), which appears to contradict the results presented in the current study. Moreover, by appealing to active sensing frameworks, the authors suggest that active movements can facilitate listening outcomes (line 677; they should provide a reference for this claim), but it is unclear how this would relate to eye movements. Certainly, a person may move their head closer to a sound source in the presence of competing sound to increase the signal-to-noise ratio, but this is not really the active movements that are measured here. A more detailed discussion may be important. The authors further frame the difference between Experiments 1 and 2 as being related to participants' motor activity. However, there are other factors that could explain differences between experiments. Self-paced trials give participants the opportunity to rest more (inter-trial durations were likely longer in Experiment 2), perhaps affecting attentional engagement. I think a more nuanced discussion may be warranted.

      We expanded the motivation of why self-pacing trials may effectively alter how rhythmic processes affect perception, and now also allude to attention and expectation related effects (l. 786ff). Regarding eye movements we now discuss the results in the light of the previously mentioned studies, but again refrain from a very detailed and mechanistic interpretation (l. 782).

      Discussion:

      The main data in Figure 3 showed little rhythmicity. The authors seem to glance over this fact by simply stating that the same phase is not necessary for their statistical analysis. Previous work, however, showed rhythmicity in the across-participant average (e.g., Fiebelkorn's and similar work). Moreover, one would expect that some of the effects in the low-frequency band (e.g., 2-4 Hz) are somewhat similar across participants. Conduction delays in the auditory system are much smaller than the 0.25-0.5 s associated with 2-4 Hz. The authors would want to discuss why different participants would express so vastly different phases that the across-participant average does not show any rhythmicity, and what this would mean neurophysiologically.

      We now discuss the assumptions and implications of similar or distinct phases of rhythmic processes within and between participants (l. 695ff). In particular, we note that different origins of the underlying neurophysiological processes eventually may suggest that such assumptions are or are not warranted.

      An additional point that may require more nuanced discussion is related to the rhythmicity of response bias versus sensitivity. The authors could discuss what the rhythmicity of these different measures in different frequency bands means, with respect to underlying neural oscillations.

      We expanded the discussion to interpret what rhythmic changes in each of the behavioral metrics could imply (l. 706ff).

      Figures:

      Much of the text in the figures seems really small. Perhaps the authors would want to ensure it is readable even for those with low vision abilities. Moreover, Figure 1A is not as intuitive as it could be and may perhaps be made clearer. I also suggest the authors discuss a bit more the potential monoaural vs binaural issues, because the perceptual rhythmicity is much slower than any conduction delays in the auditory system that could lead to interference.

      We tried to improve the font sizes where possible, and discuss the potential monaural origins as suggested by other reviewers. 

      Reviewer #3 (Public review):

      Summary:

      The finding of rhythmic activity in the brain has, for a long time, engendered the theory of rhythmic modes of perception, that humans might oscillate between improved and worse perception depending on states of our internal systems. However, experiments looking for such modes have resulted in conflicting findings, particularly in those where the stimulus itself is not rhythmic. This paper seeks to take a comprehensive look at the effect and various experimental parameters which might generate these competing findings: in particular, the presentation of the stimulus to one ear or the other, the relevance of motor involvement, attentional demands, and memory: each of which is revealed to affect the consistency of this rhythmicity.

      The need the paper attempts to resolve is a critical one for the field. However, as presented, I remain unconvinced that the data would not be better interpreted as showing no consistent rhythmic mode effect. It lacks a conceptual framework to understand why effects might be consistent in each ear but at different frequencies and only for some tasks with slight variants, some affecting sensitivity and some affecting bias.

      Strengths:

      The paper is strong in its experimental protocol and its comprehensive analysis, which seeks to compare effects across several analysis types and slight experiment changes to investigate which parameters could affect the presence or absence of an effect of rhythmicity. The prescribed nature of its hypotheses and its manner of setting out to test them is very clear, which allows for a straightforward assessment of its results.

      Weaknesses:

      There is a weakness throughout the paper in terms of establishing a conceptual framework both for the source of "rhythmic modes" and for the interpretation of the results. Before understanding the data on this matter, it would be useful to discuss why one would posit such a theory to begin with. From a perceptual side, rhythmic modes of processing in the absence of rhythmic stimuli would not appear to provide any benefit to processing. From a biological or homeostatic argument, it's unclear why we would expect such fluctuations to occur in such a narrow-band way when neither the stimulus nor the neurobiological circuits require it.

      We believe that the framework for why there may be rhythmic activity along auditory pathways that shapes behavioral outcomes has been laid out in many previous studies, prominently here (Schroeder et al., 2008; Schroeder and Lakatos, 2009; Obleser and Kayser, 2019). Many of the relevant studies are cited in the introduction, which is already rather long given the many points covered in this study. 

      Secondly, for the analysis to detect a "rhythmic mode", it must assume that the phase of fluctuations across an experiment (i.e., whether fluctuations are in an up-state or down-state at onset) is constant at stimulus onset, whereas most oscillations do not have such a total phase-reset as a result of input. Therefore, some theoretical positing of what kind of mechanism could generate this fluctuation is critical toward understanding whether the analysis is well-suited to the studied mechanism.

      In line with this and previous comments (by reviewer 2) we have expanded the discussion to consider the issue of phase alignment (l. 695ff). 

      Thirdly, an interpretation of why we should expect left and right ears to have distinct frequency ranges of fluctuations is required. There are a large number of statistical tests in this paper, and it's not clear how multiple comparisons are controlled for, apart from experiment 4 (which specifies B&H false discovery rate). As such, one critical method to identify whether the results are not the result of noise or sample-specific biases is the plausibility of the finding. On its face, maintaining distinct frequencies of perception in each ear does not fit an obvious conceptual framework.

      Again this point was also noted by another reviewer and we expanded the introduction and discussion in this regard (l. 65ff).

      Reviewer #1 (Recommendations for the authors):

      (1) An update of the AR-surrogate method has recently been published (https://doi.org/10.1101/2024.08.22.609278). I appreciate that this is a lot of work, and it is of course up to the authors, but given the higher sensitivity of this method, it might be worth applying it to the four datasets described here.

      Reading this article, we note that our implementation of the AR-surrogate method was essentially as suggested here, and not as implemented by Brookshire. In fact, we had not realized that Brookshire had apparently computed the spectrum based on the group-average data. As explained in the Methods section, and now clarified further, we compute for each participant the actual spectrum of this participant’s data, and a set of surrogate spectra. We then perform a group-average of both to compute the p-value of the actual group-average based on the percentile of the distribution of surrogate averages. This second step differs from Harris & Beale, who used a one-sided t-test. The latter is most likely not appropriate in a strict statistical sense, but possibly more powerful for detecting true results compared to the percentile-based approach that we used (see l. 332ff).
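
      The group-level percentile test described in this response can be sketched roughly as follows. This is a minimal illustration under stated assumptions and not the authors' actual code: the function names, the nested-list data layout, the pairing of surrogates across participants by index, and the "+1" correction in the p-value are all choices made here for the example.

```python
import random

def percentile_p_value(actual, surrogates):
    """One-sided p-value: share of surrogate values at or above the actual value.

    The +1 in numerator and denominator is an assumption of this sketch
    (it avoids p = 0 with a finite number of surrogates).
    """
    exceed = sum(1 for s in surrogates if s >= actual)
    return (exceed + 1) / (len(surrogates) + 1)

def group_level_p_values(actual_spectra, surrogate_spectra):
    """Compare the group-average spectrum with group-averaged surrogate spectra.

    actual_spectra:    [participant][frequency]
    surrogate_spectra: [participant][surrogate][frequency]
    Returns one p-value per frequency.
    """
    n_part = len(actual_spectra)
    n_freq = len(actual_spectra[0])
    n_surr = len(surrogate_spectra[0])

    # Group average of the actual per-participant spectra.
    group_actual = [
        sum(actual_spectra[p][f] for p in range(n_part)) / n_part
        for f in range(n_freq)
    ]
    # Group average of the s-th surrogate across participants
    # (pairing surrogates by index is an assumption of this sketch).
    group_surr = [
        [
            sum(surrogate_spectra[p][s][f] for p in range(n_part)) / n_part
            for f in range(n_freq)
        ]
        for s in range(n_surr)
    ]
    # Percentile of the actual group average within the surrogate distribution.
    return [
        percentile_p_value(group_actual[f], [group_surr[s][f] for s in range(n_surr)])
        for f in range(n_freq)
    ]

if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the toy example is repeatable
    # Toy data: 2 participants, 3 frequencies, a clear peak at the middle frequency.
    actual = [[0.5, 10.0, 0.5], [0.4, 12.0, 0.6]]
    surrogates = [
        [[rng.random() for _ in range(3)] for _ in range(99)]
        for _ in range(2)
    ]
    print(group_level_p_values(actual, surrogates))
```

      In contrast to a second-level t-test, this comparison makes no distributional assumption about the per-participant values; the smallest attainable p-value is bounded by the number of surrogates.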

      (2) When results for the four experiments are reported, a reminder for the reader of how these experiments differ from each other would be useful.

      We have added this in the Results section.

      "considerable prevalence of differences around 4Hz, with dual‐task requirements leading to stronger rhythmicity in perceptual sensitivity". There is a striking similarity to recently published data (https://doi.org/10.1101/2024.08.10.607439 ) demonstrating a 4-Hz rhythm in auditory divided attention (rather than between modalities as in the present case). This could be a useful addition to the paragraph.

      We have added a reference to this preprint, and additional previous work pointing in the same direction mentioned in there.  

      (3) There are two typos in the Introduction: "related by different from the question", and below, there is one "presented" too much.

      These have been fixed.

      Reviewer #3 (Recommendations for the authors):

      My major suggestion is that these results must be replicated in a new sample. I understand this is not simple to do and not always possible, but at this point, no effect is replicated from one experiment to the next, despite very small changes in protocol (especially experiment 1 vs 2). It's therefore very difficult to justify explaining the different effects as real as opposed to random effects of this particular sample. While the bootstrapping effects show the level of consistency of the effect within the sample studied, it can not be a substitute for a true replication of the results in a new sample.

      We agree that only an independent replication can demonstrate the robustness of the results. We do consider experiment 1 a replication test of Ho et al. CurrBiol 2017, which yields different results than reported there. But more importantly, we consider the analysis of ‘reproducibility’ by simulating participant samples a key novelty of the present work, and want to emphasize this over the within-study replication of the same experiment. In fact, in light of the present interpretation of the data, even a within-study replication would most likely not offer a clear-cut answer.

      As I said in the public review, the interpretation of the results, and of why perceptual cycles in arhythmic stimuli could be a plausible theory to begin with, is lacking. A conceptual framework would vastly improve the impact and understanding of the results.

      We tried to strengthen the conceptual framework in the introduction. We believe that this is in large part provided by previous work, and the aim of the present study was to explore the robustness of effects and not to suggest and discover novel effects.

      Minor comments:

      (1) The authors adapt the difficulty as a function of performance, which seems to me a strange choice for an experiment that is analyzing the differences in performance across the experiment. Could you add a sentence to discuss the motivation for this choice?

      We now mention the rationale in the Methods section and in a new section of the Results. There we also provide additional analyses on this parameter.

      (2) The choice to plot the p-values as opposed to the values of the actual analysis feels ill-advised to me. It invites comparison across analyses that isn't necessarily fair. It would be more informative to plot the respective analysis outputs (spectral power, regression, or delta R2) and highlight the windows of significance and their overlap across analyses. In my opinion, this would be more fair and accurate depiction of the analyses as they are meant to be used.

      We do disagree. As explained in the Methods (l. 374ff): “(Showing p-values) … allows presenting the results on a scale that can be directly compared between analysis approaches, metrics, frequencies and analyses focusing on individual ears or the combined data. Each approach has a different statistical sensitivity, and the underlying effect sizes (e.g. spectral power) vary with frequency for both the actual data and null distribution. As a result, the effect size reaching statistical significance varies with frequency, metrics and analyses.” 

      The fact that the level of power (or R2 or whatever metric we consider) required to reach significance differs between analyses (one ear, both ears), metrics (d-prime, bias, RT) and between analysis approaches makes showing the results difficult, as we would need a separate panel for each of those. This would multiply the number of panels required, e.g., for Figure 4 by 3, making it a figure with 81 axes. Also, neither the original quantities of each analysis (e.g. spectral power) nor the p-values that we show constitute a proper measure of effect size in a statistical sense. In that sense, neither of these is truly ideal for comparing between analyses, metrics etc.

      We do agree, though, that many readers may want to see the original quantification and thresholds for statistical significance. We now show these in an exemplary manner for the Binned analysis of Experiment 1, which provides a positive result and also is an attempt to replicate the findings by Ho et al 2017. This is shown in new Figure 5.

      (3) Typo in line 555 (+ should be plus minus).

      (4) Typo in line 572: "Comparison of 572 blocks with minus dual task those without"

      (5) Typo in line 616: remove "one".

      (6) Line 666 refers to effects in alpha band activity, but it's unclear what the relationship is to the authors' findings, which peak around 6 Hz, lower than alpha (~10 Hz).

      (7) Line 688 typo, remove "amount of".

      These points have been addressed.  

      (8) Oculomotor effect that drives greater rhythmicity at 3-4 Hz. Did the authors analyze the eye movements to see if saccades were also occurring at this rate? It would be useful to know if the 3-4 Hz effect is driven by "internal circuitry" in the auditory system or by the typical rate of eye movement.

      A preliminary analysis of eye movement data was shown in the previous Figure 8, which was removed on the recommendation of another reviewer. This showed that the average saccade rate is about 0.01 saccades per trial per time bin, amounting to on average less than one detected saccade per trial. Hence rhythmicity in saccades is unlikely to explain rhythmicity in behavioral data at the scale of 3-4 Hz. We now note this in the Results.

      Obleser J, Kayser C (2019) Neural Entrainment and Attentional Selection in the Listening Brain. Trends Cogn Sci 23:913-926.

      Schroeder CE, Lakatos P (2009) Low-frequency neuronal oscillations as instruments of sensory selection. Trends Neurosci 32:9-18.

      Schroeder CE, Lakatos P, Kajikawa Y, Partan S, Puce A (2008) Neuronal oscillations and visual amplification of speech. Trends Cogn Sci 12:106-113.

      Zoefel B, Heil P (2013) Detection of Near-Threshold Sounds is Independent of EEG Phase in Common Frequency Bands. Front Psychol 4:262.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      This is an interesting study characterizing and engineering so-called bathy phytochromes, i.e., those that respond to near infrared (NIR) light in the ground state, for optogenetic control of bacterial gene expression. Previously, the authors have developed a structure-guided approach to functionally link several light-responsive protein domains to the signaling domain of the histidine kinase FixL, which ultimately controls gene expression. Here, the authors use the same strategy to link bathy phytochrome light-responsive domains to FixL, resulting in sensors of NIR light. Interestingly, they also link these bathy phytochrome light-sensing domains to signaling domains from the tetrathionate-sensing SHK TtrS and the toluene-sensing SHK TodS, demonstrating the generality of their protein engineering approach more broadly across bacterial two-component systems.

      This is an exciting result that should inspire future bacterial sensor design. They go on to leverage this result to develop what is, to my knowledge, the first system for orthogonally controlling the expression of two separate genes in the same cell with NIR and Red light, a valuable contribution to the field.

      Finally, the authors reveal new details of the pH-dependent photocycle of bathy phytochromes and demonstrate that their sensors work in the gut- and plant-relevant strains E. coli Nissle 1917 and A. tumefaciens.

      Strengths:

      (1) The experiments are well-founded, well-executed, and rigorous.

      (2) The manuscript is clearly written.

      (3) The sensors developed exhibit large responses to light, making them valuable tools for optogenetic applications.

      (4) This study is a valuable contribution to photobiology and optogenetics.

      We thank the reviewer for the positive verdict on our manuscript.

      Weaknesses:

      (1) As the authors note, the sensors are relatively insensitive to NIR light due to the rapid dark reversion process in bathy phytochromes. Though NIR light is generally non-phototoxic, one would expect this characteristic to be a limitation in some downstream applications where light intensities are not high (e.g., in vivo).

      We principally concur with this reviewer’s assessment that delivery of light (of any color) into living tissue can be severely limited by absorption, reflection, and scattering. That notwithstanding, at least two considerations suggest that in-vivo deployment of the pNIRusk setups we presently advance may be feasible.

      First, while the pNIRusk setups are indeed less light-sensitive compared to, e.g., our earlier red-light-responsive pREDusk and pDERusk setups (see Meier et al. Nat Commun 2024), we note that the overall light fluences required for triggering them are in the range of tens of µW per cm². By contrast, optogenetic experiments in vivo, in particular in the neurosciences, often employ light area intensities on the order of mW per cm² and above. Put another way, compared to the optogenetic tools used in these experiments, the pNIRusk setups are actually quite sensitive to light.

      Second, sensitivity to NIR light brings the advantage of superior tissue penetration, see data reported by Weissleder Nat Biotech 2001 and Ash et al. Lasers Med Sci 2017 (both papers are cited in our manuscript). Based on these data, the intensity of blue light (450 nm) falls off 5-10 times more strongly with penetration depth than that of NIR light (800 nm).
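
      One way to read this comparison, offered only as a rough sketch and not as a statement taken from the cited papers: assume a simple exponential (Beer-Lambert-type) attenuation with an effective coefficient for each wavelength and equal incident intensity, and interpret "falls off k times more strongly" as a k-fold larger attenuation coefficient. The symbols I_0 (incident intensity), \mu (effective attenuation coefficient), d (penetration depth) and k are introduced here purely for illustration.

```latex
I_\lambda(d) = I_0 \, e^{-\mu_\lambda d},
\qquad
\mu_{\mathrm{blue}} = k \, \mu_{\mathrm{NIR}}, \quad k \approx 5\text{--}10
\quad\Longrightarrow\quad
\frac{I_{\mathrm{blue}}(d)}{I_{\mathrm{NIR}}(d)} = e^{-(k-1)\,\mu_{\mathrm{NIR}}\, d}
```

      Under these assumptions, the relative advantage of NIR over blue light grows exponentially with depth d. Real tissue is heterogeneous and strongly scattering, so this should be taken as an order-of-magnitude guide at most.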

      We have added a brief treatment of these aspects in the Discussion section.

      (2) Though they can be multiplexed with Red light sensors, these bathy phytochrome NIR sensors are more difficult to multiplex with other commonly used light sensors (e.g., blue) due to the broad light responsivity of the Pfr state. This challenge may be overcome by careful dosing of blue light, as the authors discuss, but other bacterial NIR sensing systems with less cross-talk may be preferred in some applications.

      The reviewer is correct in noting that, at least to a certain extent, the pNIRusk systems also respond to blue light owing to their Soret absorbance bands (see Fig. 1). That said, we note two points:

      First, a given photoreceptor that preferentially responds to certain wavelengths, e.g., 700 nm in the case of conventional bacterial phytochromes (BphP), generally absorbs shorter wavelengths to some degree as well. Absorption of these shorter wavelengths suffices for driving electronic and/or vibronic transitions of the chromophore to higher energy levels which often give rise to productive photochemistry and downstream signal transduction. Put another way, a certain response of sensory photoreceptors to shorter wavelengths is hence fully expected and indeed experimentally borne out, as for instance shown by Ochoa-Fernandez et al. in the so-called PULSE setup (Nat Meth 2020, doi: 10.1038/s41592-020-0868-y).

      Second, known BphPs share similar Pr and Pfr absorbance spectra. We therefore expect other BphP-based optogenetic setups to also respond to blue light to some degree. Currently, there are insufficient data to gauge whether individual BphPs systematically differ in their relative sensitivity to blue compared to red or NIR light. Arguably, pertinent experiments may be an interesting subject for future study.

      Reviewer #2 (Public review):

      Summary:

      In this manuscript, Meier et al. engineer a new class of light-regulated two-component systems. These systems are built using bathy-bacteriophytochromes that respond to near-infrared (NIR) light. Through a combination of genetic engineering and systematic linker optimization, the authors generate bacterial strains capable of selective and tunable gene expression in response to NIR stimulation. Overall, these results are an interesting expansion of the optogenetic toolkit into the NIR range. The cross-species functionality of the system, modularity, and orthogonality have the potential to make these tools useful for a range of applications.

      Strengths:

      (1) The authors introduce a novel class of near-infrared light-responsive two-component systems in bacteria, expanding the optogenetic toolbox into this spectral range.

      (2) Through engineering and linker optimization, the authors achieve specific and tunable gene expression, with minimal cross-activation from red light in some cases.

      (3) The authors show that the engineered systems function robustly in multiple bacterial strains, including laboratory E. coli, the probiotic E. coli Nissle 1917, and Agrobacterium tumefaciens.

      (4) The combination of orthogonal two-component systems can allow for simultaneous and independent control of multiple gene expression pathways using different wavelengths of light.

      (5) The authors explore the photophysical properties of the photosensors, investigating how environmental factors such as pH influence light sensitivity.

      Weaknesses:

      (1) The expression of multi-gene operons and fluorescent reporters could impose a metabolic burden. The authors should present data comparing optical density for growth curves of engineered strains versus the corresponding empty-vector control to provide insight into the burden and overall impact of the system on host viability and growth.

      In response to this comment, we have recorded growth kinetics of bacteria harboring the pNIRusk-DsRed plasmids or empty vectors under both inducing (i.e., under NIR light) and noninducing conditions (i.e., darkness). We did not observe systematic differences in the growth kinetics between the different cultures, thus suggesting that under the conditions tested there is no adverse effect on cell viability.

      We include the new data in Suppl. Fig. 5c-d and refer to them in the main text.

      (2) The manuscript consistently presents normalized fluorescence values, but the method of normalization is not clear (Figure 2 caption describes normalizing to the maximal fluorescence, but the maximum fluorescence of what?). The authors should provide a more detailed explanation of how the raw fluorescence data were processed. In addition, or potentially in exchange for the current presentation, the authors should include the raw fluorescence values in supplementary materials to help readers assess the actual magnitude of the reported responses.

      We appreciate this valid comment and have altered the representation of the fluorescence data. All values for a given fluorescent protein (i.e., either DsRed or YPet) across all systems are now normalized to a single reference value, thus enabling direct comparison between experiments.

      (3) Related to the prior point, it would be useful to have a positive control for fluorescence that could be used to compare results across different figure panels.

      As all data are now normalized to the same reference value, direct comparison across all figures is enabled.

      (4) Real-time gene expression data are not presented in the current manuscript, but it would be helpful to include a time-course for some of the key designs to help readers assess the speed of response to NIR light.

      In response to this comment, we include in the revised manuscript induction kinetics of bacterial cultures bearing pNIRusk upon transfer to inducing NIR-light conditions. To this end, aliquots were taken at discrete timepoints, transcriptionally and translationally arrested, and analyzed for optical density and DsRed reporter fluorescence after allowing for chromophore maturation.

      We include the new data in Suppl. Fig. 5e and refer to them in the manuscript.

      Moreover, we note that the experiments in Agrobacterium tumefaciens used a luciferase reporter thus enabling the continuous monitoring of the light-induced expression kinetics. These data (unchanged in revision) are to be found in Suppl. Fig. 9.

      Reviewer #3 (Public review):

      Summary:

      This paper by Meier et al introduces a new optogenetic module for the regulation of bacterial gene expression based on "bathy-BphP" proteins. Their paper begins with a careful characterization of kinetics and pH dependence of a few family members, followed by extensive engineering to produce infrared-regulated transcriptional systems based on the authors' previous design of the pDusk and pDERusk systems, and closing with characterization of the systems in bacterial species relevant for biotechnology.

      Strengths:

      The paper is important from the perspective of fundamental protein characterization, since bathy-BphPs are relatively poorly characterized compared to their phytochrome and cyanobacteriochrome cousins. It is also important from a technology development perspective: the optogenetic toolbox currently lacks infrared-stimulated transcriptional systems. Infrared light offers two major advantages: it can be multiplexed with additional tools, and it can penetrate into deep tissues with ease relative to the more widely used blue light-activated systems. The experiments are performed carefully, and the manuscript is well written.

      Weaknesses:

      My major criticism is that some information is difficult to obtain, and some data is presented with limited interpretation, making it difficult to obtain intuition for why certain responses are observed. For example, the changes in red/infrared responses across different figures and cellular contexts are reported but not rationalized. Extensive experiments with variable linker sequences were performed, but the rationale for linker choices was not clearly explained. These are minor weaknesses in an overall very strong paper.

      We are grateful for the positive take on our manuscript.

      Reviewer #1 (Recommendations for the authors):

      (1) As eLife is a broad audience journal, please define the Soret and Q-bands (line 125).

      We concur and have added labels in Fig. 1a that designate the Soret and Q bands.

      (2) The initial (0) Ac design in Figure 2b is activated by NIR and Red light, albeit modestly. The authors state that this construct shows "constant reporter fluorescence, largely independent of illumination" (line 167). This language should be changed to reflect the fact that this Ac construct responds to both of these wavelengths.

      Agreed. We have amended the text accordingly.

      (3) pNIRusk Ac 0 appears to show a greater light response than pNIRusk Av -5. However, the authors claim that the former is not light-responsive and the latter is. This conclusion should be explained or changed.

      The assignment of pNIRusk Av-5 as light-responsive is based on the relative difference in reporter fluorescence between darkness and illumination with either red or NIR light. Although the overall fluorescence is much lower for Av-5 than for Av-0, the relative change upon illumination is much more pronounced. We have added a statement to this effect to the text.
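
      The distinction between absolute and relative change can be illustrated with a minimal sketch. The variant names and fluorescence readings below are hypothetical and are not data from the manuscript; they only show how a low-signal construct can have the larger light/dark ratio.

```python
# Hypothetical reporter fluorescence readings in arbitrary units.
# These numbers are illustrative assumptions, not measurements from the study.
readings = {
    "variant_A": {"dark": 1000.0, "light": 1800.0},  # high signal, modest relative change
    "variant_B": {"dark": 20.0, "light": 200.0},     # low signal, large relative change
}


def fold_change(dark, light):
    """Return the light/dark ratio of reporter fluorescence."""
    if dark <= 0:
        raise ValueError("dark-state fluorescence must be positive")
    return light / dark


# Relative change (ratio) and absolute change (difference) per variant.
fold_changes = {name: fold_change(r["dark"], r["light"]) for name, r in readings.items()}
absolute_changes = {name: r["light"] - r["dark"] for name, r in readings.items()}

for name in readings:
    print(name, "fold change:", fold_changes[name], "absolute change:", absolute_changes[name])
```

      In this made-up example, the variant with the smaller absolute change has the larger fold change, which is the criterion described in the response above.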

      (4) The authors state that "when combining DmDERusk-Str-YPet with AvTod+21-DsRed expression rose under red and NIR light, respectively, whereas the joint application of both light colors induced both reporter genes" (lines 258-261). In contrast, Figure 3c shows that application of both wavelengths of light results in exclusive activation of YPet expression. It appears the description of the data is wrong and must be corrected. That said, this error does not impact their conclusion that two separate target genes can be independently activated by NIR and red light.

      We thank the reviewer for catching this error which we have corrected in the revised manuscript.

      (5) Line 278: I don't agree with the authors' blanket statement that the use of upconversion nanoparticles is a "grave" limitation for NIR-light mediated activation of bacterial gene expression in vivo. The authors should either expound on the severity of the limitation or use more moderate language.

      We have replaced the word ‘grave’ by ‘potential’ and thereby toned down our wording.

      Reviewer #2 (Recommendations for the authors):

      (1) Please include a discussion on the expected depth penetration of different light wavelengths. This is most relevant in the context of the discussion about how these NIR systems could be used with living therapeutics.

      Given the heterogeneity of biological tissue, it is challenging to state precise penetration depths for different wavelengths of light. That said, blue light for instance is typically attenuated by biological tissue around 5 to 10 times as strongly as near-infrared light is.

      We have expanded the Discussion chapter to cover these aspects.
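
      To give a rough sense of what a 5- to 10-fold difference in attenuation implies, the sketch below assumes simple exponential (Beer-Lambert-type) attenuation with a single effective coefficient per wavelength. Both this model and the coefficient value are simplifying assumptions for illustration; real tissue is heterogeneous and scattering makes the true behavior more complex.

```python
import math


def transmitted_fraction(mu_per_mm, depth_mm):
    """Fraction of light remaining at a given depth under exponential attenuation."""
    return math.exp(-mu_per_mm * depth_mm)


def depth_for_fraction(mu_per_mm, fraction):
    """Depth at which the transmitted fraction drops to the given value."""
    return -math.log(fraction) / mu_per_mm


# Hypothetical effective attenuation coefficient for NIR light (per mm).
MU_NIR = 0.1

for ratio in (5, 10):
    mu_blue = ratio * MU_NIR  # blue light attenuated 5 or 10 times as strongly
    d_nir = depth_for_fraction(MU_NIR, 0.1)
    d_blue = depth_for_fraction(mu_blue, 0.1)
    print(f"attenuation ratio {ratio}: 10% depth NIR = {d_nir:.1f} mm, blue = {d_blue:.1f} mm")
```

      Under this simplified model, the depth at which a given fraction of light remains scales inversely with the attenuation coefficient, so the assumed 5- to 10-fold difference in attenuation translates into a 5- to 10-fold difference in that depth.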

      (2) It would be helpful for Figure 2C (or supplementary) to also include the response to blue light stimulation.

      We agree and have acquired pertinent data for the blue-light response. The new data are included in an updated Fig. 2c. Data acquired at varying NIR-light intensities, originally included in Fig. 2c, have been moved to Suppl. Fig. 5a-b.

      (3) In Figure 4A, data on the response of E. coli Nissle to blue and red light are missing. Including this would help identify whether the reduced sensitivity to non-NIR wavelengths observed in the E. coli lab strain is preserved in the probiotic background.

      In response to this comment, we have acquired pertinent data on E. coli Nissle. While the results were overall similar to those in the laboratory strain, the response to blue and NIR light was yet lower in the Nissle bacteria which stands to benefit optogenetic applications.

      We have updated Fig. 4a accordingly. For clarity, we only show the data for AvNIRusk in the main paper but have relegated the data on AcNIRusk to Suppl. Fig. 8. (Note that this has necessitated a renumbering of the subsequent Suppl. Figs.)

      (4) On many of the figures, there are thin gray lines that appear between the panels that it would be nice to eliminate because, in some cases, they cut through words and numbers.

      The grey lines likely arose from embedding the figures into the text document. In the typeset manuscript, which has become available on the eLife webpage in the meantime, there are no such lines. That said, we will carefully check throughout the submission/publishing/proofing process lest these lines reappear.

      (5) Page 7, line 155: "As not least seen" typo or awkward phrasing.

      We have restructured the sentence and thereby hopefully clarified the unclear phrasing.

      (6) Page 7, line 167: It does not appear to be the case that the initial pNIRusk designs show constant fluorescence that is largely independent of illumination. AcNIRusk shows an almost twofold change from dark to NIR. Reword this to avoid confusion.

      We concur with this comment, similar to reviewer #1’s remark, and have adjusted the text accordingly.

      (7) Page 8, line 174: Related to the previous point, AvNIRusk has one design that is very minimally light switchable (-5), so stating that six light switchable designs have been identified is also confusing.

      As stated in our response to reviewer #1 above, the assignment of AvNIRusk-5 as light-switchable is based on the relative fluorescence change upon illumination. We have added an explanation to the text.

      (8) Page 10, line 228-229: I was not able to find the data showing that expression levels were higher for the DmTtr systems than the pREDusk and pNIRusk setups. This may be an issue related to the normalization point. It was not clear to me how to compare these values.

      We apologize for the initially unclear representation of the data. In response to this reviewer’s general comments above, we have now normalized all fluorescence values to a single reference value, thus allowing their direct comparison.

      (9) Page 12, line 264: "finer-grained expression control can be exerted..." Either show data or adjust the language so that it is clear this is a prediction.

      True, we have replaced ‘can’ by ‘could’.

      (10) Page 25, line 590: CmpX13 cells have a reference that is given later, but it should be added where it first appears.

      Agreed, we have added the reference in the indicated place.

      (11) Page 25, line 592: define LB/Kan.

      We had already defined this abbreviation further up but, for clarity, we have added it again in the indicated position.

      (12) Page 40, line 946: "normalized by" rather than "to".

      We have implemented the requested change in the indicated and several other positions of the manuscript.

      (13) Figures 2C, 3C, and similar plots in the supplementary material would benefit from having a legend for the colors.

      We agree and have added pertinent legends to the corresponding main and supplementary figures.

      (14) As a reader, I had some trouble following all the acronyms. This is at the author's discretion, but I would eliminate ones that are not strictly essential (e.g. MTP for microtiter plate; I was unable to identify what "MCS" meant; look for other opportunities to remove acronyms).

      In the revised manuscript, we have defined the abbreviation ‘MCS’ (for ‘multiple-cloning site’) upon first occurrence. We have decided to retain the abbreviation ‘MTP’ in the text.

      (15) Could the authors briefly speculate on why A. tumefaciens activation with red light might occur?

      While we can but speculate as to the underlying reasons for the divergent red-light response in A. tumefaciens, we discuss possible scenarios below.

      Commonly, two-component systems (TCS) exhibit highly cooperative and steep responses to signal. As a consequence, even small differences in the intracellular amounts of phosphorylated and unphosphorylated response regulator (RR) can give rise to significantly changed gene-expression output. Put another way, the gene-expression output need not scale linearly with the extent of RR phosphorylation but, rather, is expected to show nonlinear dependence with pronounced thresholding effects.

      Differences in the pertinent RR levels can for instance arise from variations in the expression levels of the pNIRusk system components between E. coli and A. tumefaciens. Moreover, the two bacteria greatly differ in their two-component-system (TCS) repertoire. Although TCSs are commonly well insulated from each other, cross-talk with endogenous TCSs, even if limited, may cause changes in the levels of phosphorylated RR and hence gene-expression output. In a similar vein, the RR can also be phosphorylated and dephosphorylated non-enzymatically, e.g., by reaction with high-energy anhydrides (such as acetyl phosphate) and hydrolysis, respectively. Other potential origins for the divergent red-light response include differences in the strength of the promoters driving expression of the pNIRusk system components and the fluorescent/luminescent reporters, respectively.
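
      The thresholding argument can be made concrete with a minimal sketch that assumes a Hill-type dose-response between phosphorylated RR and gene-expression output. The Hill form, the half-maximal level, and the cooperativity coefficient are illustrative assumptions, not parameters measured for the pNIRusk systems.

```python
def hill_output(rr_p, k_half=1.0, n=4.0):
    """Normalized gene-expression output for a phosphorylated-RR level.

    k_half (level giving half-maximal output) and n (cooperativity) are
    hypothetical values chosen only to illustrate a steep response.
    """
    return rr_p**n / (k_half**n + rr_p**n)


# Two phosphorylated-RR levels that differ by only 1.5-fold around the threshold.
low, high = 0.8, 1.2
out_low, out_high = hill_output(low), hill_output(high)

print(f"input ratio:  {high / low:.2f}")
print(f"output ratio: {out_high / out_low:.2f}")
```

      With these assumed parameters, a 1.5-fold difference in phosphorylated RR near the threshold produces a larger than 1.5-fold difference in output, consistent with the qualitative point that modest shifts in RR phosphorylation between hosts could noticeably change the response.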

      (16) It would be helpful for the authors to briefly explain why they needed to switch to luminescence from fluorescence for the A. tumefaciens studies.

      While there was no strict necessity to switch from the fluorescence-based system used in E. coli to a luminescence-based system in A. tumefaciens, we opted for luminescence based on prior experience with other Alphaproteobacteria (e.g., 10.1128/mSystems.00893-21), where luminescence offered significant advantages. Specifically, it provides essentially background-free signal detection and greater sensitivity for monitoring gene expression. In addition, as demonstrated in Suppl. Fig. 9c and d, the luminescence system enables real-time tracking of gene expression dynamics, which further supported its use in our experimental setup (see our response to reviewer #2’s general comments).

      (17) This is a very minor comment that the authors can take or leave, but I got hung up on the word "implement" when it appeared a few times in the manuscript because I tended to read it as "put a plan into place" rather than its other meaning.

      In the abstract, we have replaced one instance of the word ‘implement’ by ‘instrument’.

      (18) The authors should include the relevant constructs on AddGene or another public strain-sharing service.

      We whole-heartedly subscribe to the idea of freely sharing research materials with fellow scientists. Therefore, we had already deposited the most relevant AvNIRusk in Addgene, even prior to the initial submission of the manuscript (accession number 235084). In the meantime, we have released the deposition, and the plasmid has been available from Addgene since May 15th of this year.

      Reviewer #3 (Recommendations for the authors):

      Suggestion for improvement:

      This paper relies heavily on variations in linker sequences to shift responses. I am familiar with prior work from the Moglich lab in which helical linkers were employed to shift responses in synthetic two-component systems, with interesting periodicity in responses with every 7 residues (as expected for an alpha helix) and inversion of responses at smaller linker shifts. There is no mention in this paper whether their current engineering follows a similar rationale, what types of linkers are employed (e.g. flexible vs helical), and whether there is an interpretation for how linker lengths alter responses. Can you explain what classes of linker sequences are used throughout Figures 2 and 3, and whether length or periodicity affects the outcome? This would be very helpful for readers who are new to this approach, or if the rationale here differs from the authors' prior work.

      The PATCHY approach employed at present followed a closely similar rationale as in our previous studies. That is, linkers were extended/shortened and varied in their sequence by recombining different fragments of the natural linkers of the parental receptors, i.e., the bacteriophytochrome and the FixL sensor histidine kinase, respectively. We have added a statement to this effect in the text and a reference to Suppl. Fig. 3 which illustrates the principal approach.

      Compared to our earlier studies, we isolated fewer receptor variants supporting light-regulated responses, despite covering a larger sequence space. Owing to the sparsity of the light-regulated variants, an interpretation of the linker properties and their correlation with light-regulated activity is challenging. Although doubtless unsatisfying from a mechanistic viewpoint, we therefore refrain from a pertinent discussion which would be premature and speculative at this point. As the reviewer raises a valid and important point, we have expanded the text by referring to our earlier studies and the observed dependence of functional properties on linker composition.

      It is sometimes difficult to intuit or rationalize the differences in red/IR sensitivity across closely related variants. An important example appears in Figure 3C vs 3B. I think the AvTod+21 in 3B should be the equivalent to the DsRed response in the second column of 3C (AvTod+21 + DmDERusk), except, of course, that the bacteria in 3C carry an additional plasmid for the DERusk system. However, in 3B, the response to red light is substantial - ~50% as strong as that for IR, whereas in 3C, red light elicits no response at all. What is the difference? The reason this is important is that the AvTod+21 and DmDERusk represent the best "orthogonal" red and infrared light responses, but this is not at all obvious from 3B, where AvTod+21 still causes a substantial (and for orthogonality, undesirable) response under red light. Perhaps subtle differences in expression level due to plasmid changes cause these differences in light responses? Could the authors test how the expression level affects these responses? The paper would be greatly improved if observations of the diverse red/IR responses could be rationalized by some design criteria.

      As noted above in our response to reviewer #2, we have now normalized all fluorescence readings to joint reference values, thus allowing a better comparison across experiments.

      The reviewer is correct in noting that upon multiplexing, the individual plasmid systems support lower fluorescence levels than when used in isolation. We speculate that the combination of two plasmids may affect their copy numbers (despite the use of different resistance markers and origins of replication) and hence their performance. Likewise, the cellular metabolism may be affected when multiple plasmids are combined. These aspects may well account for the absent red-light response in AvTod+21 in the multiplexing experiments which is – indeed – unexpected. As, at present, we cannot provide a clear rationalization for this effect, we recommend verifying the performance of the plasmid setups when multiplexing.

      The paper uses "red" and "infrared" to refer to ~624 nm and ~800 nm light, respectively. I wonder whether it might be possible to shift these peak wavelengths to obtain even better separation for the multiplexing experiments. Perhaps shifting the specific red wavelength could result in better separation between DERusk and AvTod systems, for example? Could the authors comment on this (maybe based on action spectra of their previously developed tools) or perhaps test a few additional stimulation wavelengths?

      The choice of illumination wavelengths used in these experiments is dictated by the LED setups available for illumination of microtiter plates. On the one hand, we are using an SMD (surface-mount device) three-color LED with a fixed wavelength of the red channel around 624 nm (see Hennemann et al., 2018). On the other hand, we are deploying a custom-built device with LEDs emitting at around 800 nm (see Stüven et al., 2019 and this work). Adjusting these wavelengths is therefore challenging, although without doubt potentially interesting.

      To address this reviewer comment, we have added a statement to the text that the excitation wavelengths may be varied to improve multiplexed applications.

      Additional minor comments:

      (1) Figure 2C: It would be very helpful to place a legend on the figure panel for what the colors indicate, since they are unique to this panel and non-intuitive.

      This comment coincides with one by reviewer #2, and we have added pertinent legends to this and related supplementary figures.

      (2) Figure 3C: it is not obvious which system uses DsRed and which uses YPet in each combination, since the text indicates that all combinations were cloned, and this is not clearly described in the legend. Is it always the first construct in the figure legend listed for DsRed and the second for YPet?

      For clarification, we have revised the x-axis labels in Fig. 3C. (And yes, it is as this reviewer surmises: the first of the two constructs harbored DsRed and the second one YPet.)

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      This is an interesting study of the nature of representations across the visual field. The question of how peripheral vision differs from foveal vision is a fascinating and important one. The majority of our visual field is extra-foveal yet our sensory and perceptual capabilities decline in pronounced and well-documented ways away from the fovea. Part of the decline is thought to be due to spatial averaging (’pooling’) of features. Here, the authors contrast two models of such feature pooling with human judgments of image content. They use much larger visual stimuli than in most previous studies, and some sophisticated image synthesis methods to tease apart the prediction of the distinct models.

      More importantly, in so doing, the researchers thoroughly explore the general approach of probing visual representations through metamers, i.e., stimuli that are physically distinct but perceptually indistinguishable. The work is embedded within a rigorous and general mathematical framework for expressing equivalence classes of images and how visual representations influence these. They describe how image-computable models can be used to make predictions about metamers, which can then be compared to make inferences about the underlying sensory representations. The main merit of the work lies in providing a formal framework for reasoning about metamers and their implications, for comparing models of sensory processing in terms of the metamers that they predict, and for mapping such models onto physiology. Importantly, they also consider the limits of what can be inferred about sensory processing from metamers derived from different models.

      Overall, the work is of a very high standard and represents a significant advance over our current understanding of perceptual representations of image structure at different locations across the visual field. The authors do a good job of capturing the limits of their approach and I particularly appreciated the detailed and thoughtful Discussion section and the suggestion to extend the metamer-based approach described in the MS with observer models. The work will have an impact on researchers studying many different aspects of visual function including texture perception, crowding, natural image statistics, and the physiology of low- and mid-level vision.

      The main weaknesses of the original submission relate to the writing. A clearer motivation could have been provided for the specific models that they consider, and the text could have been written in a more didactic and easy-to-follow manner. The authors could also have been more explicit about the assumptions that they make.

      Thank you for the summary. We appreciate the positives noted above. We address the weaknesses point by point below.

      Reviewer #2 (Public Review):

      Summary

      This paper expands on the literature on spatial metamers, evaluating different aspects of spatial metamers including the effect of different models and initialization conditions, as well as the relationship between metamers of the human visual system and metamers for a model. The authors conduct psychophysics experiments testing variations of metamer synthesis parameters including type of target image, scaling factor, and initialization parameters, and also compare two different metamer models (luminance vs energy). An additional contribution is doing this for a field of view larger than has been explored previously.

      General Comments

      Overall, this paper addresses some important outstanding questions regarding comparing original to synthesized images in metamer experiments and begins to explore the effect of noise vs image seed on the resulting syntheses. While the paper tests some model classes that could be better motivated, and the results are not particularly groundbreaking, the contributions are convincing and undoubtedly important to the field. The paper includes an interesting Voronoi-like schematic of how to think about perceptual metamers, which I found helpful, but for which I do have some questions and suggestions. I also have some major concerns regarding incomplete psychophysical methodology including lack of eye-tracking, results inferred from a single subject, and a huge number of trials. I have only minor typographical criticisms and suggestions to improve clarity. The authors also use very good data reproducibility practices.

      Thank you for the summary. We appreciate the positives noted above. We address the weaknesses point by point below.

      Specific Comments

      Experimental Setup

      Firstly, the experiments do not appear to utilize an eye tracker to monitor fixation. Without eye tracking or another manipulation to ensure fixation, we cannot ensure the subjects were fixating the center of the image, and viewing the metamer as intended. While the short stimulus time (200ms) can help minimize eye movements, this does not guarantee that subjects began the trial with correct fixation, especially in such a long experiment. While Covid-19 did at one point limit in-person eye-tracked experiments, the paper reports no such restrictions that would have made the addition of eye-tracking impossible. While such a large-scale experiment may be difficult to repeat with the addition of eye tracking, the paper would be greatly improved with, at a minimum, an explanation as to why eye tracking was not included.

      Addressed on pg. 25, starting on line 658.

      Secondly, many of the comparisons later in the paper (Figures 9, 10) are made from a single subject. N=1 is not typically accepted as sufficient to draw conclusions in such a psychophysics experiment. Again, if there were restrictions limiting this, it should be discussed. Also (P11), is subject sub-00 an author? Another expert? A naive subject? The subject’s expertise in viewing metamers will likely affect their performance.

      Addressed on pg. 14, starting on line 308.

      Finally, the number of trials per subject is quite large. 13,000 over 9 sessions is much larger than most human experiments in this area. The reason for this should be justified.

      In general, we needed a large number of trials to fit full psychometric functions for stimuli derived for both models, with both types of comparison, both initializations, and over many target images. We could have eliminated some of these, but feel that having a consistent dataset across all these conditions is a strength of the paper.

      In addition to the sentence on pg. 14, line 318, a full enumeration of trials is now described on pg. 23, starting on line 580.

      Model

      For the main experiment, the authors compare the results of two models: a ’luminance model’ that spatially pools mean luminance values, and an ’energy model’ that spatially pools energy calculated from a multi-scale pyramid decomposition. They show that these models create metamers that result in different thresholds for human performance, and therefore different critical scaling parameters, with the basic luminance pooling model producing a scaling factor 1/4 that of the energy model. While this is certain to be true, due to the luminance model being so much simpler, the motivation for the simple luminance-based model as a comparison is unclear.

      The use of simple models is now addressed on pg. 3, starting on line 98, as well as the sentence starting on pg. 4 line 148: the luminance model is intended as the simplest possible pooling model.
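
      For readers unfamiliar with pooling models, the following minimal sketch illustrates the idea of a luminance-pooling metamer in one dimension. It assumes that pooling-window diameter grows linearly with eccentricity (diameter = scaling × eccentricity) and uses fixed, non-overlapping windows for simplicity; the function names, signals, and scaling value are hypothetical, and the models in the paper use smooth two-dimensional windows instead.

```python
def pooling_diameter(eccentricity_deg, scaling):
    """Window diameter (deg) under the assumed linear growth with eccentricity."""
    return scaling * eccentricity_deg


def pooled_means(signal, window):
    """Mean luminance in consecutive, non-overlapping windows of fixed length."""
    return [
        sum(signal[i:i + window]) / window
        for i in range(0, len(signal) - window + 1, window)
    ]


# Two physically different 1-D "stimuli" (hypothetical luminance values).
stimulus_a = [1.0, 3.0, 2.0, 6.0]
stimulus_b = [2.0, 2.0, 4.0, 4.0]

# With windows of 2 samples the pooled responses match (model metamers);
# with windows of 1 sample they do not.
print(pooled_means(stimulus_a, 2), pooled_means(stimulus_b, 2))
print(pooled_means(stimulus_a, 1), pooled_means(stimulus_b, 1))

# A smaller scaling value means smaller windows at every eccentricity.
print(pooling_diameter(10.0, 0.04), pooling_diameter(10.0, 0.01))
```

      The sketch shows why larger pooling windows admit more physically distinct stimuli as model metamers, which is the intuition behind asking for the smallest scaling value at which humans can no longer discriminate them.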

      The authors claim that this luminance model captures the response of retinal ganglion cells, often modeled as a center-surround operation (Rodieck, 1964). I am unclear in what aspect(s) the authors claim these center-surround neurons mimic a simple mean luminance, especially in the context of evidence supporting a much more complex role of RGCs in vision (Atick & Redlich, 1992). Why do the authors not compare the energy model to a model that captures center-surround responses instead? Do the authors mean to claim that the luminance model captures only the pooling aspects of an RGC model? This is particularly confusing as Figures 6 and 9 show the luminance and energy models for original vs synth aligning with the scaling of Midget and Parasol RGCs, respectively. These claims should be more clearly stated, and citations included to motivate this. Similarly, with the energy model, the physiological evidence is very loosely connected to the model discussed.

      We have removed the bars showing potential scaling values measured by electrophysiology in the primate visual system and attempted to clarify our language around the relationship between these models and physiology. Our metamer models are only loosely connected to the physiology, and we’ve decided in revision not to imply any direct connection between the model parameters and physiological measurements. The models should instead be understood as loosely inspired by physiology, but not as a tool to localize the representation (as was done in the Freeman paper).

      The physiological scaling values are still used as the mean of the priors on the critical scaling value for model fitting, as described on pg. 27, starting on line 698.

      Prior Work:

      While the explorations in this paper clearly have value, the paper does not present any particularly groundbreaking results, and those reported are consistent with previous literature. The explorations around critical eccentricity measurement have been done for texture models (Figure 11) in multiple papers (Freeman 2011, Wallis 2019, Balas 2009). In particular, Freeman 2011 demonstrated that simpler models, representing measurements presumed to occur earlier in visual processing, need smaller pooling regions to achieve metamerism. This work’s measurements for the simpler models tested here are consistent with those results, though the model details are different. In addition, Brown 2023 (which is miscited) also used an extended field of view (though not as large as in this work). Both Brown 2023 and Wallis 2019 performed an exploration of the effect of the target image. Also, much of the more recent previous work uses color images, while the authors’ exploration is only done for greyscale.

      We were pleased to find consistency of our results with previous studies, given the (many) differences in stimuli and experimental conditions (especially viewing angle), while also extending to new results with the luminance model, and the effects of initialization. Note that only one of the previous studies (Freeman and Simoncelli, 2011) used a pooled spectral energy model. Moreover, of the previous studies, only one (Brown et al., 2023) used color images (we have corrected that citation - thanks for catching the error).

      Discussion of Prior Work:

      The prior work on testing metamerism between original vs. synthesized and synthesized vs. synthesized images is presented in a misleading way. Wallis et al.’s prior work on this should not be a minor remark in the post-experiment discussion. Rather, it was surely a motivation for the experiment. The text should make this clear; a discussion of Wallis et al. should appear at the start of that section. The authors similarly cite much of the most relevant literature in this area as a minor remark at the end of the introduction (P3L72).

      The large differences we observed between comparison types (original vs synthesized, compared to synthesized vs synthesized) surprised us. Understanding such differences was not a primary motivation for the work, but it is certainly an important component of our results. In the introduction, we thought it best to lay out the basic logic of the metamer paradigm for foveated vision before mentioning the complications that are introduced in both the Wallis and Brown papers (paragraph beginning p. 3, line 109). Our results confirm and bolster the results of both of those earlier works, which are now discussed more fully in the Introduction (lines 109 and following).

      White Noise: The authors make an analogy to the inability of humans to distinguish samples of white noise. It is unclear, however, that human difficulty distinguishing samples of white noise is a perceptual issue; it could instead be due to cognitive/memory limitations. If one concentrates on an individual patch, one can usually tell apart two samples. The authors should either provide support for these difficulties emerging from perceptual limitations, discuss the possibility that these limitations are more cognitive, or employ a different analogy.

      We now note the possibility of cognitive limits on pg. 8, starting on line 243, as well as pg. 22, line 571. The ability of observers to distinguish samples of white noise is highly dependent on display conditions. A small patch of noise (i.e., large pixels, not too many) can be distinguished, but a larger patch cannot, especially when presented in the periphery. This is more generally true for textures (as shown in Ziemba and Simoncelli (2021)). Samples of white noise at the resolution used in our study are indistinguishable.

      Relatedly, in Figure 14, the authors do not explain why the white noise seeds would be more likely to produce syntheses that end up in different human equivalence classes.

      In figure 14, we claim that white noise seeds are more likely to end up in the same human equivalence classes than natural image seeds. The explanation as to why we think this may be the case is now addressed on pg. 19, starting on line 423.

      It would be nice to see the effect of pink noise seeds, which mirror the power spectrum of natural images, but do not contain the same structure as natural images - this may address the artifacts noted in Figure 9b.

      The lack of pink noise seeds is now addressed on pg. 19, starting on line 429.

      Finally, the authors note high-frequency artifacts in Figure 4 & P5L135, that remain after syntheses from the luminance model. They hypothesize that this is due to a lack of constraints on frequencies above that defined by the pooling region size. Could these be addressed with a white noise image seed that is pre-blurred with a low pass filter removing the frequencies above the spatial frequency constrained at the given eccentricity?

      The explanation for this is similar to the lack of pink noise seeds in the previous point: the goal of metamer synthesis is model testing, and so for a given model, we want to find model metamers that result in the smallest possible critical scaling value. Taking white noise seed images and blurring them will almost certainly remove the high frequencies visible in luminance metamers in figure 4 and thus result in a larger critical scaling value, as the reviewer points out. However, the logic of the experiments requires finding the smallest critical scaling value, and so these model metamers would be uninformative. In an early stage of the project, we did indeed synthesize model metamers using pink noise seeds, and observed that the high frequency artifacts were less prominent.

      Schematic of metamerism: Figures 1,2,12, and 13 show a visual schematic of the state space of images, and their relationship to both model and human metamers. This is depicted as a Voronoi diagram, with individual images near the center of each shape, and other images that fall at different locations within the same cell producing the same human visual system response. I felt this conceptualization was helpful. However, implicitly it seems to make a distinction between metamerism and JND (just noticeable difference). I felt this would be better made explicit. In the case of JND, neighboring points, despite having different visual system responses, might not be distinguishable to a human observer.

      Thanks for noting this – in general, metamers are subthreshold, and for the purpose of the diagram, we had to discretize the space showing metameric regions (Voronoi regions) around a set of stimuli. We’ve rewritten the captions to explain this better. We address the binary subthreshold nature of the metamer paradigm in the discussion section (pg. 19, line 438).

      In these diagrams and throughout the paper, the phrase ’visual stimulus’ rather than ’image’ would improve clarity, because the location of the stimulus in relation to the fovea matters whereas the image can be interpreted as the pixels displayed on the computer.

      We agree and have tried to make this change, describing this choice on pg. 3 line 73.

      Other

      The authors show good reproducibility practices with links to relevant code, datasets, and figures.

      Reviewer #1 (Recommendations For The Authors):

      In its current form, I found the introduction to be too cursory. I felt that the article would benefit from a clearer motivation for the two models that are considered as the reader is left unclear why these particular models are of special scientific significance. The luminance model is intended to capture some aspects of retinal ganglion cells response characteristics and the spectral energy model is intended to capture some aspects of the primary visual cortex. However, one can easily imagine models that include the pooling of other kinds of features, and it would be helpful to get an idea of why these are not considered. Which aspects of processing in the retina and V1 are being considered and which are being left out, and why? Why not consider representations that capture even higher-order statistical structure than those covered by the spectral energy model (or even semantics)? I think a bit of rewriting with this in mind could improve the introduction.

      Along similar lines, I would have appreciated having the logic of the study explained more explicitly and didactically: which overarching research question is being asked, how it is operationalised in the models and experiments, and what are the predictions of the different models. Figures 2 and 3 are certainly helpful, but I felt further explanations would have made it easier for the reader to follow. Throughout, the writing could be improved by a careful re-reading with a view to making it easier to understand. For example, where results are presented, a sentence or two expanding on the implications would be helpful.

      I think the authors could also be more explicit about the assumptions they make. While these are obviously (tacitly) included in the description of the models themselves, it would be helpful to state them more openly. To give one example, when introducing the notion of critical scaling, on p.6 the authors state as if it is a self-evident fact that "metamers can be achieved with windows whose size is matched to that of the underlying visual neurons". This presumably is true only under particular conditions, or when specific assumptions about readout from populations of neurons are invoked. It would be good to identify and state such assumptions more directly (this is partly covered in the Discussion section ’The linking proposition underlying the metamer paradigm’, but this should be anticipated or moved earlier in the text).

      We agree that our introduction was too cursory and have reworked it. We have also backed off of the direct comparison to physiology and clarified that we chose these two as the simplest possible pooling models. We have also added sentences at the end of each result section attempting to summarize the implication (before discussing them fully in the discussion). Hopefully the logic and assumptions are now clearer.

      There are also some findings that warrant a more extensive discussion. For example, what is the broader implication of the finding that original vs. synthesised and synthesised vs. synthesised comparisons exhibit very different scaling values? Does this tell us something about internal visual representations, or is it simply capturing something about the stimuli?

      We believe this difference is a result of the stimuli that are used in the experiment and thus the synthesis procedure itself, which interacts with the model’s pooled image feature. We have attempted to update the relevant figures and discussions to clarify this, in the sections starting on pg. 17 line 396 and pg. 19 line 417.

      At some points in the paper, a third model (’texture model’) creeps into the discussion, without much explanation. I assume that this refers to models that consider joint (rather than marginal) statistics of wavelet responses, as in the famous Portilla & Simoncelli texture model. However, it would be helpful to the reader if the authors could explain this.

      Addressed on pg. 3, starting on line 94.

      Minor corrections.

      Caption of Figure 3: ’top’ and ’bottom’ should be ’left’ and ’right’

      Line 177: ’smallest tested scaling values tested’. Remove one instance of ’tested’

      Line 212: ’the images-specific psychometric functions’ -> ’image-specific’

      Line 215: ’cloud-like pink noise’. It’s not literally pink noise, so I would drop this.

      Line 236: ’Importantly, these results cannot be predicted from the model, which gives no specific insight as to why some pairs are more discriminable than others’. The authors should specify what we do learn from the model if it fails to provide insight into why some image pairs are more discriminable than others.

      Figure 9: it might be helpful to include small insets with the ’highway’ and ’tiles’ source images to aid the reader in understanding how the images in 9B were generated.

      Table 1 placement should be after it is first referred to on line 258.

      In the Discussion section "Why does critical scaling depend on the comparison being performed", it would be helpful to consider the case where the two model metamers *are* distinguishable from each other even though each is indistinguishable from the target image. I would assume that this is possible (e.g., if the target image is at the midpoint between the two model images in image space and each of the stimuli is just below 1 JND away from the target). Or is this not possible for some reason?

      Regarding line 236: this specific line has been removed, and the discussion about this issue has all been consolidated in the final section of the discussion, starting on pg. 19 line 438.

      Regarding the final comment: this is addressed in the paragraph starting on pg. 16 line 386. To expand upon that: the situation laid out by the reviewer is not possible in our conceptualization, in which metamerism is transitive and image discriminability is binary. In order to investigate situations like the one laid out by the reviewer, one needs models whose representations have metric properties, i.e., which allow you to measure and reason about perceptual distance, which we refer to in the paragraph starting on pg. 20 line 460. We also note that this situation has not been observed in this or any other pooling model metamer study that we are aware of. All other minor changes have been addressed.

      Reviewer #2 (Recommendations For The Authors):

      Original image T should be marked in the Voronoi diagrams.

      Brown et al is miscited as 2021 should be ACM Transactions on Applied Perception 2023.

      Figure 3 caption: models are left and right, not top and bottom.

      Thanks, all of the above have been addressed.

      References

      Brown R, Dutell V, Walter B, Rosenholtz R, Shirley P, McGuire M, Luebke D. Efficient Dataflow Modeling of Peripheral Encoding in the Human Visual System. ACM Transactions on Applied Perception. 2023 Jan; 20(1):1–22. http://dx.doi.org/10.1145/3564605, doi: 10.1145/3564605.

      Freeman J, Simoncelli EP. Metamers of the ventral stream. Nature Neuroscience. 2011 Aug; 14(9):1195–1201. doi: 10.1038/nn.2889.

      Ziemba CM, Simoncelli EP. Opposing Effects of Selectivity and Invariance in Peripheral Vision. Nature Communications. 2021 Jul; 12(1). https://doi.org/10.1038/s41467-021-24880-5, doi: 10.1038/s41467-021-24880-5.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      (1) The authors make fairly strong claims that "arousal-related fluctuations are isolated from neurons in the deep layers of the SC" (emphasis added). This conclusion is based on comparisons between a "slow drift axis", a low-dimensional representation of neuronal drift, and other measures of arousal (Figures 2C, 3) and motor output sensitivity (Figures 2B, 3B). However, the metrics used to compare the slow-drift axis and motor activity were computed during separate task epochs: the delay period (600-1100 ms) and a perisaccade epoch (25 ms before and after saccade initiation), respectively. As the authors reference, deep-layer SC neurons are typically active only around the time of a saccade. Therefore, it is not clear if the lack of arousal-related modulations reported for deep-layer SC neurons is because those neurons are truly insensitive to those modulations, or if the modulations were not apparent because they were assessed in an epoch in which the neurons were not active. A potentially more valuable comparison would be to calculate a slow-drift axis aligned to saccade onset. 

      The reviewer makes an important point that the calculation of an axis can depend critically on the time window of neuronal response. On consideration, we find that the slow drift axis is less sensitive to this issue because it is calculated on time-averaged activity over multiple trials. In previous work we found that slow drift calculated on the stimulus-evoked response in V4 was very well aligned to slow drift calculated on pre-stimulus spontaneous activity (Cowley et al, Neuron, 2020, Supplemental Figure 3A and 3B). To address this issue in the present data, we compared the axis computed for an example session for neural activity during the delay period and neural activity aligned to saccade onset. As shown in the new Figure 2 – figure supplement 1 in the revised manuscript, we found a similar lack of arousal-related modulations for deep-layer SC neurons when slow drift was computed using the saccade epoch (25ms before to 25ms after the onset of the saccade). Figure 2 – figure supplement 1A shows loadings for the SC slow drift axis when it was computed using spiking responses during the delay period (as in the main manuscript analysis). For comparison, Figure 2 – figure supplement 1B shows loadings from the same session when the SC slow drift axis was computed using spiking responses during the saccade epoch. The plots are highly similar and in both cases the loadings were weaker for neurons recorded from channels at the bottom of the probe which have a higher motor index. Finally, we found that projections onto the SC slow drift axis for this session were strongly correlated when the slow drift axis was computed using spiking responses during the delay period and the saccade epoch (r = 0.66, p < 0.001, Figure 1C). Taken together, these results suggest that arousal-related modulations are less evident in deep-layer SC neurons irrespective of whether slow drift was computed during the delay or saccade epoch (see also Public Reviews, Reviewer 1, Point 2).
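The epoch comparison described above can be illustrated with a minimal sketch on simulated data. It assumes that the slow drift axis is the first principal component of mean-centered, time-binned activity; the function name `slow_drift_axis`, the simulated latent, and all parameter values are hypothetical and are not taken from the analysis code of the study.

```python
import numpy as np

def slow_drift_axis(binned_rates):
    """First principal component of mean-centered activity.

    binned_rates: array of shape (time bins, neurons).
    Returns the axis (neurons,) and the projections onto it (time bins,).
    """
    residuals = binned_rates - binned_rates.mean(axis=0)
    # Rows of vt are the principal axes, ordered by variance explained.
    _, _, vt = np.linalg.svd(residuals, full_matrices=False)
    axis = vt[0]
    return axis, residuals @ axis

# Simulated session: one slow latent drives both epochs with different gains.
rng = np.random.default_rng(1)
n_bins, n_neurons = 200, 30
drift = np.cumsum(rng.normal(size=n_bins))
drift = (drift - drift.mean()) / drift.std()
weights = rng.normal(size=n_neurons)
delay_rates = np.outer(drift, weights) + 0.5 * rng.normal(size=(n_bins, n_neurons))
saccade_rates = np.outer(drift, 1.5 * weights) + 0.5 * rng.normal(size=(n_bins, n_neurons))

_, proj_delay = slow_drift_axis(delay_rates)
_, proj_saccade = slow_drift_axis(saccade_rates)

# The sign of a principal component is arbitrary, so compare magnitudes.
r = abs(np.corrcoef(proj_delay, proj_saccade)[0, 1])
print(f"|r| between delay and saccade projections: {r:.2f}")
```

In this toy case the two sets of projections are strongly correlated because a single slow latent drives both epochs, which is the pattern the comparison above is designed to detect.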

      (2) More generally, arousal-related signals may persist throughout multiple different epochs of the task. It would be worthwhile to determine whether similar "slow-drift" dynamics are observed for baseline, sensory-evoked, and saccade-related activity. Although it may not be possible to examine pupil responses during a saccade, there may be systematic relationships between baseline and evoked responses. 

      Similar to the point above, slow drift dynamics tend to be similar across different response epochs because they are averaged across many trials and seem to tap into responsivity trends that are robust across epochs. As shown in Author response image 1 below, and in Figure 2 – figure supplement 1 in the revised manuscript, similar dynamics were observed when the SC slow drift axis was computed using spiking responses during the baseline, delay, visual and saccade epochs. We did not investigate differences between baseline and evoked pupil responses in the current paper. However, these effects were characterized in one of our previous papers that focused exclusively on the relationship between slow drift and eye-related metrics (Johnston et al., 2022, Cereb. Cortex, Figure 6). In this previous work, we found a negative correlation between baseline and evoked pupil size. Both variables were significantly correlated with slow drift, the only difference being the sign of the correlation.

      Author response image 1.

      (A-C) Dynamics of slow drift for three example sessions when the SC slow drift axis was computed using spiking responses during the baseline, delay, visual and saccade epochs. Baseline = 100ms before the onset of the target stimulus; Delay = 600 to 1100ms after the offset of the target stimulus; Stim = 25ms to 125ms after the onset of the target stimulus; Sac = 25ms before to 25ms after the onset of the saccade.

      Johnston R, Snyder AC, Khanna SB, Issar D, Smith MA (2022) The eyes reflect an internal cognitive state hidden in the population activity of cortical neurons. Cereb Cortex 32:3331–3346.

      (3) The relationships between changes in SC activity and pupil size are quite small (Figures 2C & 5C). Although the distribution across sessions (Figure 2C) is greater than chance, they are nearly 1/4 of the size compared to the PFC-SC axis comparisons. Likewise, the distribution of r2 values relating pupil size and spiking activity directly (Figure 5) is quite low. We remain skeptical that these drifts are truly due to arousal and cannot be accounted for by other factors. For example, does the relationship persist if accounting for a very simple, monotonic (e.g., linear) drift in pupil size and overall firing rate over the course of an individual session? 

      Firstly, it is important to note that the strength of the relationship between projections onto the SC slow drift axis and pupil size (r<sup>2</sup> = 0.06) is within the range reported by Joshi et al. (2016, Neuron, Figure 3). They investigated the median variance explained between the spiking responses of individual SC neurons and pupil size and found it to be approximately 0.02 across sessions. Secondly, our statistical approach of testing the actual distribution of r<sup>2</sup> values against a shuffled distribution was specifically designed to rule out the possibility that the relationship between SC spiking responses and pupil size occurred due to linear drifts. The shuffled distribution in Figure 2C of the main manuscript represents the variance that can be explained by one session’s slow drift correlated with another session’s pupil, which would contain effects that occurred due to linear drifts alone. That the actual proportion of variance explained was significantly greater than this distribution suggests that the relationship between projections onto the SC slow drift axis and pupil size reflects changes in arousal rather than other factors related to linear drifts.
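The logic of the cross-session shuffle control can be sketched as follows. This is a simplified illustration on simulated data, not the analysis code of the study: the helper names, the equal session lengths, and the simulated signals are all assumptions made for the example.

```python
import numpy as np

def r_squared(x, y):
    """Squared Pearson correlation between two equal-length time series."""
    return float(np.corrcoef(x, y)[0, 1] ** 2)

def actual_and_shuffled_r2(drifts, pupils):
    """r^2 for matched sessions, and r^2 for every cross-session pairing."""
    n = len(drifts)
    actual = np.array([r_squared(drifts[i], pupils[i]) for i in range(n)])
    shuffled = np.array([r_squared(drifts[i], pupils[j])
                         for i in range(n) for j in range(n) if i != j])
    return actual, shuffled

# Simulated data (hypothetical): every session shares the same linear trend,
# but the arousal fluctuation is specific to each session.
rng = np.random.default_rng(0)
n_sessions, n_bins = 8, 100
trend = np.linspace(-1.0, 1.0, n_bins)
drifts, pupils = [], []
for _ in range(n_sessions):
    arousal = np.cumsum(rng.normal(size=n_bins))
    arousal = (arousal - arousal.mean()) / arousal.std()
    drifts.append(trend + arousal + 0.2 * rng.normal(size=n_bins))
    pupils.append(trend + arousal + 0.2 * rng.normal(size=n_bins))

actual, shuffled = actual_and_shuffled_r2(drifts, pupils)
print(f"mean actual r^2 = {actual.mean():.2f}, mean shuffled r^2 = {shuffled.mean():.2f}")
```

Because the linear trend is common to all sessions, it contributes to the shuffled pairings as well as to the matched ones. Any excess of the matched r² values over the shuffled distribution therefore has to come from session-specific covariation.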

      Joshi S, Li Y, Kalwani RM, Gold JI (2016) Relationships between Pupil Diameter and Neuronal Activity in the Locus Coeruleus, Colliculi, and Cingulate Cortex. Neuron 89:221–234.

      (4) It is not clear how the final analysis (Figure 6) contributes to the authors' conclusions. The authors perform PCA on: (i) residual spiking responses during the delay period binned according to pupil size, and (ii) spiking responses in the saccade epoch binned according to target location (i.e., the saccade tuning curve). The corresponding PCs are the spike-pupil axis and the saccade tuning axis, respectively. Unsurprisingly, the spike-pupil axis that captures variance associated with arousal (and removes variance associated with saccade direction) was not correlated with a saccade-tuning axis that captures variance associated with saccade direction and omits arousal. Had these measures been related it would imply a unique association between a neuron's preferred saccade direction and pupil control, which seems unlikely. The separation of these axes thus seems trivial and does not provide evidence of a "mechanism...in the SC to prevent arousal-related signals interfering with the motor output." It remains unknown whether, for example, arousal-related signals may impact trial-by-trial changes in neuronal gain near the time of a saccade, or alter saccade dynamics such as acceleration, precision, and reaction time.

      The reviewer makes a good point, and we agree that more evidence is needed to determine if the separation of the pupil size axis and saccade tuning axis is the mechanism through which cognitive and arousal-related signals can be intermixed in the SC. In the revised manuscript (lines 679-682), we have raised this as a possible explanation that necessitates further study rather than stating definitively that it is the exact mechanism through which these signals are kept separate. Our analysis here is similar to the one from Smoulder et al (2024, Neuron, Fig. 2F), in which the interactions between reward signals and target tuning in M1 were examined (and found to be orthogonal). While we agree with the reviewer that it may seem “trivial” for these axes to be orthogonal, it does not have to be so. If, for example, neural tuning curves shifted with changes in pupil size through gain changes that revealed tuning or affected tuning curve shape, there could be projections of the pupil axis onto the target tuning axis. Thus, while we agree with the reviewer that it appears sensible for these two axes to be orthogonal, our result is nonetheless a novel finding. We have edited the text in our revised manuscript, however, to make sure the nuance of this point is conveyed to the reader.
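One simple way to quantify how much one population axis projects onto another is the squared cosine of the angle between them. The sketch below is illustrative only: the function names and the random-axis null distribution are assumptions made for the example, not the statistic or the shuffle procedure used in the manuscript.

```python
import numpy as np

def alignment(axis_a, axis_b):
    """Squared cosine of the angle between two axes.

    0 means the axes are orthogonal; 1 means they are identical up to sign.
    """
    a = np.asarray(axis_a, dtype=float)
    b = np.asarray(axis_b, dtype=float)
    a = a / np.linalg.norm(a)
    b = b / np.linalg.norm(b)
    return float(np.dot(a, b) ** 2)

def chance_alignment(n_neurons, n_draws=2000, seed=0):
    """Alignment between pairs of random axes in an n_neurons-dimensional space."""
    rng = np.random.default_rng(seed)
    pairs = rng.normal(size=(n_draws, 2, n_neurons))
    return np.array([alignment(p[0], p[1]) for p in pairs])

print(alignment([1, 0, 0], [0, 1, 0]))   # orthogonal axes
print(alignment([1, 2, 3], [2, 4, 6]))   # same axis, different scale
print(chance_alignment(50).mean())       # expected to be close to 1/50
```

For random axes the expected alignment is 1 divided by the number of neurons, so it shrinks as the recorded population grows. A measured alignment therefore needs to be compared against a null distribution rather than against zero.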

      Smoulder AL, Marino PJ, Oby ER, Snyder SE, Miyata H, Pavlovsky NP, Bishop WE, Yu BM, Chase SM, Batista AP. A neural basis of choking under pressure. Neuron. 2024 Oct 23;112(20):3424-33.

      Reviewer #2 (Public Review):

      (1) The greatest weakness in the present research is the fact that arousal is a functionally less important non-motoric variable. The authors themselves introduce the problem with a discussion of attention, which is without any doubt the most important cognitive process that needs to be functionally isolated from oculomotor processes. Given this introduction, one cannot help but wonder, why the authors did not design an experiment, in which spatial attention and oculomotor control are differentiated. Absent such an experiment, the authors should spend more time explaining the importance of arousal and how it could interfere with oculomotor behavior. 

      Although attention does represent an important cognitive process, we did not design an experiment in which attention and oculomotor control are differentiated because attention does not appear to be related to slow drift. In our first paper that reported on this phenomenon, we investigated the effects of spatial attention on slow fluctuations in neural activity by cueing the monkeys to attend to a stimulus in the left or right visual field in a block-wise manner. Each block lasted ~20 minutes and we found that slow drift did not covary with the timing of cued blocks (see Figure 4A, Cowley et al., 2020, Neuron). Furthermore, there is a large body of work showing that arousal also impacts motor behavior leading to changes in a range of eye-related metrics (e.g., pupil size, microsaccade rate and saccadic reaction time - for review, see Di Stasi et al. 2013, Neurosci. Biobehav. Rev.). We also note that the terms attention and arousal are often used in nonspecific and overlapping ways in the literature, adding to some potential confusion here. Nonetheless, pupil-linked arousal is an important variable that impacts motor performance. This has now been stated clearly in the Introduction of the revised manuscript (lines 108-114) to address the reviewer’s concerns and highlight the importance of studying how precise fixation and eye movements are maintained even in the presence of signals related to ongoing changes in brain state. 

      Cowley BR, Snyder AC, Acar K, Williamson RC, Yu BM, Smith MA (2020) Slow Drift of Neural Activity as a Signature of Impulsivity in Macaque Visual and Prefrontal Cortex. Neuron 108:551-567.e8.

      (2) In this context, it is particularly puzzling that one actually would expect effects of arousal on oculomotor behavior. Specifically, saccade reaction time, accuracy, and speed could be influenced by arousal. The authors should include an analysis of such effects. They should also discuss the absence or presence of such effects and how they affect their other results. 

      As described above, several studies across species have demonstrated that arousal impacts motor behavior, e.g., saccade reaction time, saccade velocity and microsaccade rate (for review, see Di Stasi et al. 2013, Neurosci. Biobehav. Rev.). This has been clarified in the Introduction of the revised manuscript to address the reviewer's concerns (lines 108-114). Our prior work (Johnston et al, Cerebral Cortex, 2022) shows that slow drift impacts several types of oculomotor behavior. Overall, these studies highlight the impact of arousal on eye movements as a robust effect, and support the present investigation into arousal and oculomotor control signals. While we agree reaction time, accuracy, and speed all can be influenced by arousal depending on task demands, the present study is focused on the connection between slow fluctuations in neural activity, linked to arousal, and different subpopulations of SC neurons.

      Di Stasi LL, Catena A, Cañas JJ, Macknik SL, Martinez-Conde S (2013) Saccadic velocity as an arousal index in naturalistic tasks. Neurosci Biobehav Rev 37:968–975.

      Johnston R, Snyder AC, Khanna SB, Issar D, Smith MA (2022) The eyes reflect an internal cognitive state hidden in the population activity of cortical neurons. Cereb Cortex 32:3331–3346.

      (3) The authors use the analysis shown in Figure 6D to argue that across recording sessions the activity components capturing variance in pupil size and saccade tuning are uncorrelated. However, the distribution (green) seems to be non-uniform with a peak at very low and very high correlation specifically. The authors should test if such an interpretation is correct. If yes, where are the low and high correlations respectively? Are there potentially two functional areas in SC?

      We agree with the reviewer that our actual data distribution was non-uniform. We examined individual sessions with high and low variance explained and did not find notable differences. One source of this variation has to do with session length. Longer sessions in principle should have a chance distribution of variance explained closer to zero because they contained more time bins. Given that we had no specific hypothesis for a non-uniform distribution, we have simply displayed the full distribution of values in our figure and the statistical result of a comparison to a shuffled distribution.

      Reviewer #3 (Public Review):

      (1) However, I am concerned about two main points: First, the authors repeatedly say that the "output" layers of the SC are the ones with the highest motor indices. This might not necessarily be accurate. For example, current thresholds for evoking saccades are lowest in the intermediate layers, and Mohler & Wurtz 1972 suggested that the output of the SC might be in the intermediate layers. Also, even if it were true that the high motor index neurons are the output, they are very few in the authors' data (this is also true in a lot of other labs, where it is less likely to see purely motor neurons in the SC). So, this makes one wonder if the electrode channels were simply too deep and already out of the SC? In other words, it seems important to show distributions of encountered neurons (regardless of the motor index) across depth, in order to better know how to interpret the tails of the distributions in the motor index histogram and in the other panels of Figure Supplement 1. I elaborate more on these points in the detailed comments below. 

      The reviewer makes a good point about the efferent signals from SC. It is true that electrical thresholds are often lowest in intermediate layers, though deep layers do project to the oculomotor nuclei (Sparks, 1986; Sparks & Hartwich-Young, 1989) and often intermediate and deep layers are considered to function together to control eye movements (Wurtz & Albano, 1980). As suggested by the reviewer, we have edited the text throughout the manuscript to say that slow drift was less evident in SC neurons with a higher motor index, as well as included the above references and points about the intermediate and deep layers (Lines 73-81). Aside from the question of which layers of the SC function as the “motor output”, the reviewer raises a separate and important question: are our deep recordings still in SC? Here, we can say definitively that they are. We removed neurons if they did not exhibit elevated (above baseline) firing rates during the visual or saccade epochs of the MGS task (see Methods section on “Exclusion criteria”). All included neurons possessed a visual, visuomotor or motor response, consistent with the response properties of neurons in the SC. In addition, we found a number of neurons well above the bottom of the probe with strong motor responses and minimal loadings onto the slow drift axis (see Figure 2 – figure supplement 1A), consistent with the reviewer’s comment that intermediate layer neurons are tuned for movement and play a role in saccade production.

      Mohler CW, Wurtz RH. Organization of monkey superior colliculus: intermediate layer cells discharging before eye movements. Journal of neurophysiology. 1976 Jul 1;39(4):722-44.

      Sparks DL. Translation of sensory signals into commands for control of saccadic eye movements: role of primate superior colliculus. Physiol Rev. 1986 Jan;66(1):118-71. doi: 10.1152/physrev.1986.66.1.118. PMID: 3511480.

      Sparks DL, Hartwich-Young R. The deep layers of the superior colliculus. Reviews of oculomotor research. 1989 Jan 1;3:213-55.

      Wurtz RH, Albano JE. Visual-motor function of the primate superior colliculus. Annu Rev Neurosci. 1980;3:189-226. doi: 10.1146/annurev.ne.03.030180.001201. PMID: 6774653.

      (2) Second, the authors find that the SC cells with a low motor index are modulated by pupil diameter. However, this could be completely independent of an "arousal signal". These cells have substantial visual responses. If the pupil diameter changes, then their activity should be influenced since the monkey is watching a luminous display. So, in this regard, the fact that they do not see "an arousal signal" in most motor neurons (through the pupil diameter analyses) is not evidence that the arousal signal is filtered out from the motor neurons. It could simply be that these neurons simply do not get affected by the pupil diameter because they do not have visual sensitivity. So, even with the pupil data, it is still a bit tricky for me to interpret that arousal signals are excluded from the "output layers" of the SC. 

      The reviewer makes an important point about the SC’s visual responses. Neurons with a low motor index are, conversely, likely to have a stronger visual response index. However, we do not believe that changes in luminance can explain why the correlation between SC spiking response and pupil size is weaker for neurons with a lower motor index. Firstly, the changes in pupil size observed in the current paper and our previous work are slow and occur on a timescale of minutes (Cowley et al., 2020, Neuron) and are correlated with eye movement measures such as reaction time and microsaccade rate (Johnston et al., 2022, Cerebral Cortex). This is in stark contrast to luminance-evoked changes in pupil size that occur on a timescale of less than a second. Secondly, as shown in the new Figure 5 – figure supplement 1 in the revised manuscript, very similar results were found when SC spiking responses were correlated with pupil size during the baseline period, when only the fixation point was on the screen. Although the luminance of the small peripheral target stimulus can result in small luminance-evoked changes in pupil size, no changes in luminance occurred during the baseline period which was defined as 100ms before the onset of the target stimulus. In Figure 2 – figure supplement 1 and Author response image 1 above, we show that slow drift is the same whether calculated on the baseline response, delay period, or peri-saccadic epoch. Thus, the measurement of slow drift is insensitive to the precise timing of the selection of both the window for the spiking response and the window for the pupil measurement. If luminance were the explanation for the slow changes in firing observed in visually responsive SC neurons, it would require those neurons to exhibit robust, sustained tuned responses to the small changes in retinal illuminance induced by the relatively small fluctuations in pupil size we observed from minute to minute. We are aware of no reports of such behavior in visually-responsive neurons in SC. We have included these analyses and this reasoning in the revised manuscript on lines 478-495.

      Reviewer#1 (Recommendations for the author):

      (1) It would be useful to provide line numbers in subsequent manuscripts for reviewers.

      Line numbers have been added in the revised version of the manuscript.

      (2) Page #6; last sentence: "...even impact processing at the early to mid stages of the visuomotor transformation, without leading to unwanted changes in motor output." I do not believe the authors have provided evidence that arousal levels were not associated with changes in motor output.

      As suggested by Reviewer 3 (see Public Reviews, Reviewer 3, Point 2), we have edited the text throughout the manuscript to say that slow drift was less evident in SC neurons with a higher motor index. This sentence in the revised manuscript now reads:

      “This provides a potential mechanism through which signals related to cognition and arousal can exist in the SC, and even impact processing at the early to mid stages of the visuomotor transformation, without leading to unwanted changes in SC neurons that are linked to saccade execution.”

      (3) Page #8; last paragraph: Although deep-layer SC neurons may not have been obtained during every recording session, a summary of the motor index scores observed along the probe across sessions would be useful to confirm their assumptions. 

      See Author response image 2 below, which shows the motor index of each recorded SC neuron on the x-axis and session number on the y-axis. The points are colored by the squared factor loading, which represents the variance explained between the response of a neuron and the slow drift axis (see Figure 3B of the main manuscript). You can see from this plot that neurons with a stronger component loading (shown in teal to yellow) typically have a lower motor index whereas the opposite is true for neurons with a weaker component loading (shown in dark blue).

      Author response image 2.

      Scatter plot showing the motor index of each recorded neuron along with the session number in which it was recorded. The points are colored by the squared factor loading for each neuron along the slow drift axis. Note that loadings above 0.5 (33 data points in total) have been thresholded at 0.5 so that we could effectively use the color range to show all of the slow drift axis loadings.

      (4) Page #10; first paragraph: The authors should state the time window of the delay period used, since it may be distinct from the pupil analysis (first 200ms of delay). 

      This has been stated in the revised version of the manuscript. The sentence now reads:

      “We first asked if arousal-related fluctuations are present in the SC. As in previous studies that recorded from neurons in the cortex (Cowley et al., 2020), we found that the mean spiking responses of individual SC neurons during the delay period (chosen at random on each trial from a uniform distribution spanning 600-1100ms, see Methods) fluctuated over the course of a session while the monkeys performed the MGS task (Figure 2A, left).”

      (5) Page #10; second paragraph: Extra period at the end of a sentence: " most variance in the data..". 

      Fixed in the revised version of the manuscript.

      (6) Page #12: "between projections onto the SC slow drift axis and mean pupil size during the first 200ms of the delay period when a task-related pupil response could be observed." What criteria was used to determine whether a task-related pupil response was observed? 

      This was chosen based on the results of a previous study in our lab that used the same memory-guided saccade task to investigate the relationship between slow drift and changes in baseline and evoked pupil size (see Johnston et al., 2022, Cereb. Cortex, Figure 6B). The period was chosen based on plotting the average pupil size aligned on different trial epochs. As we show in Figure 5-figure supplement 3 above, the pupil interactions with slow drift did not depend on the particular time window of the pupil we chose.

      (7) Page #14; Figure 2A: The axes for the individual channels are strangely floating and quite different from all other figures. Please label the channel in the figure legend that was used as an example of the projected values onto the slow drift axis.

      The figure has been changed in the revised version of the manuscript so that the tick mark denoting zero residual spikes per second is on the top layer of each plot. A scale bar was chosen instead of individual axes to reduce clutter in the figure as it was used to demonstrate how slow drift was computed. Residual spiking responses from all neurons were projected on the slow drift axis to generate the scatter plot in the bottom right-hand corner of Figure 2A. There is no single neuron to label.

      (8) Page #16: "These results demonstrate that even though arousal-related fluctuations are present in the SC, they are isolated from deep-layer neurons that elicit a strong saccadic response and presumably reside closer to the motor output." In line with our major comments, lack of arousal-related activity during the delay period is meaningless for deep-layer SC neurons that are generally inactive during this time. It does not imply that there is no arousal signal! 

      Addressed in Public Reviews, Reviewer 1, Point 1 & 2. We found a similar lack of arousal-related modulations reported for deep-layer SC neurons when slow drift was computed using the saccade epoch (Figure 1 above). In addition, similar dynamics were observed when the SC slow drift axis was computed using spiking responses during the baseline, delay, visual and saccade period (Figure 2).

      (9) Page #18: "These findings provide additional support for the hypothesis that arousal-related fluctuations are isolated from neurons in the deep layers of the SC." The same criticism from above applies.

      Addressed in Public Reviews, Reviewer 1, Point 1 & 2.

      (10) Page #20; paragraph 3: "Taken together, the findings outlined above..." Would be useful to be more specific when referring to "activity" ; e.g., "...these neurons did not exhibit large fluctuations in delay-period activity over time".

      This sentence has been changed in the revised manuscript in light of the reviewer’s comments. It now reads:

      “In addition to being more weakly correlated with pupil size, the spiking responses of these neurons did not exhibit large fluctuations over time (Figure 2), and when considering the neuronal population as a whole, explained less variance in the slow drift axis when it was computed using population activity in the SC (Figure 3) and PFC (Figure 4).”

      Reviewer #3 (Recommendations for the author):

      The paper is clear and well-written. However, I am concerned about two main points: 

      (1) First, the authors repeatedly say that the "output" layers of the SC are the ones with the highest motor indices. This might not necessarily be accurate. For example, current thresholds for evoking saccades are lowest in the intermediate layers, and Mohler & Wurtz 1972 suggested that the output of the SC might be in the intermediate layers. Also, even if it were true that the high motor index neurons are the output, they are very few in the authors' data (this is also true in a lot of other labs, where it is less likely to see purely motor neurons in the SC). So, this makes one wonder if the electrode channels were simply too deep and already out of the SC. In other words, it seems important to show distributions of encountered neurons (regardless of motor index) across depth, in order to better know how to interpret the tails of the distributions in the motor index histogram and in the other panels of the figure supplement 1. I elaborate more on these points in the detailed comments below. 

      Addressed in Public Reviews, Reviewer 3, Point 1.

      (2) Second, the authors find that the SC cells with a low motor index are modulated by pupil diameter. However, this could be completely independent of an "arousal signal". These cells have substantial visual responses. If the pupil diameter changes, then their activity should be influenced since the monkey is watching a luminous display. So, in this regard, the fact that they do not see "an arousal signal" in most motor neurons (through the pupil diameter analyses) is not evidence that the arousal signal is filtered out from the motor neurons. It could simply be that these neurons simply do not get affected by the pupil diameter because they do not have visual sensitivity. So, even with the pupil data, it is still a bit tricky for me to interpret that arousal signals are excluded from the "output layers" of the SC. 

      Addressed in Public Reviews, Reviewer 3, Point 2.

      (3) I think that a remedy to the first point above is to change the text to make it a bit more descriptive and less interpretive. For example, just say that the slow drifts were less evident among the neurons with high motor index. 

      We thank the reviewer for this suggestion (see Public Reviews, Reviewer 3, Point 1).

      (4) For the second point, I think that it is important to consider the alternative caveat of different amounts of light entering the system. Changes in light level caused by pupil diameter variations can be quite large. 

      We thank the reviewer for this suggestion (see Public Reviews, Reviewer 3, Point 2).

      (5) Line 31: I'm a bit underwhelmed by this kind of statement. i.e. we already know that cognitive processes and brain states do alter eye movements, so why is it "critical" that high precision fixation and eye movements are maintained? And, isn't the next sentence already nulling this idea of criticality because it does show that the brain state alters the SC neurons? In fact, cognitive processes are already known to be most prevalent in the intermediate and deep layers of the SC. 

      It seems clear that while cognitive state does affect eye movements, it is desirable to have some separation between cognitive state and eye movement control. Covert attention, for instance, is precisely a situation where eye movement control is maintained to avoid overt saccades to the attended stimulus, and yet there are clear indications of attention’s impact on microsaccades and fixation. We stand by our statement that an important goal of vision is to have precise fixation and movements of the eye, and yet at the same time the eyes are subject to numerous influences by cognitive state.

      (6) Line 65: it is better to clarify that these are "functional layers" because there are actually more anatomical layers. 

      We have edited this sentence in the revised version of the manuscript so that it now reads:

      “The role of these projections in the visuomotor transformation depends on the functional layer of the SC in which they terminate”.

      (7) Line 73: this makes it sound like only the deepest layers are topographically organized, which is not true. Also, as early as Mohler & Wurtz, 1972, it was suggested that the intermediate layers have the biggest impacts downstream of the SC. This is also consistent with electrical microstimulation current thresholds for evoking saccades from the SC. 

      We have addressed the reviewers’ comments about the intermediate layers having the biggest impact downstream of the SC in Public Reviews, Reviewer 3, Point 1. Furthermore, line 73 has been changed in the revised manuscript so that it now reads:

      “As is the case for neurons in the superficial and intermediate layers, they [SC motor neurons] form a topographically organized map of visual space (White et al. 2017; Robinson 1972; Katnani and Gandhi 2011)”.  

      (8) Line 100: there is an analogous literature regarding the question of why unwanted muscle contractions do not happen. Specifically, in the context of why SC visual bursts do not automatically cause saccades (which is a similar problem to the ones you mention about cognitive signals interfering by generating unwanted eye movements), both Jagadisan & Gandhi, Curr Bio, 2022 and Baumann et al, PNAS, 2023 also showed that SC population activity not only has different temporal structure (Jagadisan & Gandhi) but also occupy different subspaces (Baumann et al) under these two different conditions (visual burst versus saccade burst). This is conceptually similar to the idea that you are mentioning here with respect to arousal. So, it is worth it to mention these studies here and again in the discussion. 

      We are grateful to the reviewer for these suggestions and have included text in the Introduction (Lines 125-128) and Discussion (Lines 678-682) of the revised manuscript along with the references cited above.

      (9) Line 147: as mentioned above, it is now generally accepted that there are quite a few "pure" motor neurons in the SC. This is consistent with what you find. E.g. Baumann et al., 2023. And, again see Mohler and Wurtz in the 1970's. So, I wonder how useful it is to go too much into this idea of the deeper motor neurons (e.g. the correlations in the other panels of the Figure 1 supplement). 

      This is related to the reviewer’s comment that the output of the SC might be in the intermediate layers. This concern has been addressed in Public Reviews, Reviewer 3, Point 1.

      (10) Figure 1 should say where the RF was for the shown spike rasters. i.e. were these the same saccade target across trials? And where was that location relative to the RF? It would help also in the text to say whether the saccade was always to the RF center or whether you were randomizing the target location. 

      We centered the array of saccade targets using the microstimulation-evoked eye movement for SC (see Methods section “Memory-guided saccade task”) to find the evoked eccentricity, and then used saccade targets with equal spacing of 45 degrees starting at zero (rightward saccade target). We did not do extensive RF mapping beyond this microstimulation centering. In Figure 1, the spike rasters are shown for a target that was visually identified to be within the neuron’s RF based on assessing responses to all 8 target angles. We have added information about this to the figure caption.

      (11) Line 218: but were there changes in the eye movement statistics? For example, the slow drift eye movements during fixation? Or even the microsaccades? 

      Addressed in Public Reviews, Reviewer 2, Point 2.  

      (12) Line 248: shuffling what exactly? I think that more explanation would be needed here. 

      Addressed in Public Reviews, Reviewer 1, Point 3.  

      (13) Line 263: but isn't this reflecting a sensory transient in the pupil diameter, since the target just disappeared? 

      Addressed in Public Reviews, Reviewer 3, Point 2.  

      (14) Line 271: I suspect that slow drift eye movements (in between microsaccades) would show higher correlations. Not sure how well you can analyze those with a video-based eye tracker. 

      We agree that fixational drift would be a worthwhile metric, but it is not one we have focused on here and to our knowledge does require higher precision tracking. 

      (15) Line 286: again, see above about similar demonstrations with respect to the visual and motor burst intervals, which clearly cause the same problem (even stronger) as the one studied here. 

      See reply, including Figure 2.

      (16) Line 330: again, I'm not sure deeper necessarily automatically means closer to the output. For example, current thresholds for evoked saccades grow higher as you go deeper. Maybe the authors can ask their colleague Neeraj Gandhi about this point specifically, just to be safe. Maybe the safest would be to remain descriptive about the data, and just say something like: arousal-related fluctuations were absent in our deepest recorded sites. 

      Addressed in Public Reviews, Reviewer 3, Point 1.

      (17) Line 332: likewise, statements like this one here would be qualified if the output was the intermediate layers......anyway if I understand what I read so far in the paper, the signal will be anyway orthogonal to the motor burst population subspace. So, maybe there's no need to emphasize that it goes away in the very deepest layers. 

      See reply above, Public Reviews, Reviewer 1, Point 4.

      (18) Figure 3A: related to the above, I think one issue could be that the deeper contacts might already be out of the SC. Maybe some cell count distribution from each channel should help in this regard. i.e. were you finding way fewer saccade-related neurons in the deepest channels (even though the few that you found were with high motor index)? If so, then wouldn't this just mean that the channel was too deep? I think there needs to be an analysis like this, to convince readers that the channels were still in the SC. Ideally, electrical stimulation current thresholds for evoking saccades at different depths would be tested, but I understand that this can be difficult at this stage. 

      Addressed in Public Reviews, Reviewer 3, Point 1.

      (19) I keep repeating this because in general, cognitive effects are stronger in the intermediate/deeper layers than in the superficial layers. If these interfere with eye movements like arousal, then why should arousal be different?

      Few studies have investigated the effects of attention on “pure” movement SC neurons that only discharge during a saccade. One study, which we cited in the Introduction (Ignashchenkova et al., 2004, Nat. Neurosci.), found significant differences in spiking responses between trials with and without attentional cueing for visual and visuomotor neurons. No significant difference was found for motor neurons, consistent with our hypothesis that signals related to cognition and arousal are kept separate from saccade-related signals in the SC.

      (20) The problem with Figure 5 and its related text is that the neurons with low motor index are additionally visual. So, of course, they can be modulated if the pupil diameter changes!

      Addressed in Public Reviews, Reviewer 3, Point 2.  

      (21) I had a hard time understanding Figure 6. 

      See reply above, Public Reviews, Reviewer 1, Point 4.

      (22) Line 586: these cells have more visual responses and will be affected by the amount of light entering the eye. 

      Addressed in Public Reviews, Reviewer 3, Point 2.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review)

      (1) Glycogen biosynthesis typically involves several enzymes. In this context, could the authors comment on the effect of overexpressing a single enzyme - especially a mutant version - on the structure or quality of the glycogen synthesized?

      While quantitative molecular weight analysis of synthesized glycogen was not performed, we documented changes in glycogen particle morphology. GYSmut overexpression resulted in significantly enlarged singular glycogen granules, suggesting potential high molecular mass, while GYS-GYG co-overexpression in MSCs (GYG being the essential enzyme for glycogen synthesis initiation) produced a diffuse glycogen distribution pattern rather than particulate structures. We have incorporated this result as new Figure S2C.

      These results suggest that overexpression of specific glycogen-metabolizing enzymes significantly influences glycogen structure. Consequently, targeted modulation of glycogen architecture and properties through key enzymes represents a potential avenue for future investigation.

      (2) Regarding the in vitro starvation experiments (Figure 2C), what oxygen conditions (pO₂) were used? Are these conditions physiologically relevant and representative of the in vivo lung microenvironment?

      Our in vitro starvation experiments (Figure 3C) were conducted under normoxic conditions (21% oxygen). The oxygen concentration in human lungs is physiologically lower than atmospheric levels, with healthy individuals exhaling air containing approximately 16% oxygen (Thalakkotur Lazar Mathew, Diagnostics 2015). To our knowledge, direct measurements of alveolar oxygen concentration in pulmonary fibrosis are rare. Therefore, to evaluate the performance of GYSmut under hypoxic conditions, Figure S2 in the revised manuscript has been augmented to include an assessment of cell performance under combined hypoxia (oxygen concentration < 5%) and nutrient deprivation stress, which further corroborates the superiority of the GYSmut group over the control under different oxygen concentrations.

      (3) In the in vitro model, how many hours does it take for the intracellular glycogen reserve to be completely depleted under starvation conditions?

      While quantitative cell viability data were recorded up to 72 hours post-implantation (Fig 3C), we observed cell viability at approximately 96 hours. We noticed that the presence of glycogen particles exhibited a correlation with sustained cell viability. However, reliable quantitative assessment of glycogen became increasingly challenging upon significant depletion of viable cells, thereby limiting our measurements during later time points.

      (4) For the in vivo model, is there a quantitative analysis of the survival kinetics of the transplanted cells over time for each group? This would help to better assess the role and duration of glycogen stores as an energy buffer after implantation.

      We tracked the in vivo distribution and persistence of implanted MSCs using enzymatic activity quantification assays (using Gluc luciferase assay) and live animal imaging (using Akaluc luciferase). The revised manuscript includes quantitative analysis of the in vivo fluorescence imaging data, which has been supplemented as Figure S4. Glycogen-engineered MSCs and control cells were quantitatively assessed at three discrete time points post-implantation. This quantification revealed a transient divergence in cell viability between the experimental and control groups around day 7. However, fluorescence in both cohorts subsequently declined to similar levels over the extended observation period.

      (5) Finally, the study was performed in male mice only. Could sex differences exist in the efficacy or metabolism of the engineered MSCs? It would be helpful to discuss whether the approach could be expected to be similarly effective in female subjects.

      We appreciate the reviewer’s important question regarding potential sex differences. Our study used male mice based on three key considerations: 1) Clinical Relevance: Idiopathic pulmonary fibrosis (IPF) shows significant male predominance, with diagnosis rates 3.5-fold higher in men (37.8% vs 10.6%, p<0.0001) and greater diagnostic confidence (Assayag et al., Thorax 2020). 2) Model Consistency: The bleomycin model (our chosen method) demonstrates more consistent fibrotic responses in male mice (Gul et al., BMC Pulm Med 2023). 3) Biological Rationale: Estrogen’s protective effects in females may confound therapeutic assessments (cited in Assayag et al.).

      We fully acknowledge this limitation and will include female subjects in subsequent translational studies. The therapeutic principle should theoretically apply to both sexes, but we agree this requires experimental validation.

      (6) The number of mice for each group and time point should be specified.

      The manuscript text has been revised to enhance clarity, and the number of mice for each group and time point has been specified (lines 170 to 182).

      Reviewer #2 (Public Review):

      (4) Inconsistencies in In Vivo Data: There is a discrepancy between the number of animals shown in the figures and the graph (three individuals vs. five animals), as well as missing details on how luciferase signal intensity was quantified, requiring further clarification.

      To assess MSC survival in vivo, we employed two strategies utilizing distinct luciferases optimized for specific detection modalities. MSC viability was quantified ex vivo through Gaussia luciferase (Gluc) activity, leveraging its high sensitivity and established commercial assay kits (n = 3 mice per group per time point). For non-invasive longitudinal tracking within living animals, MSC distribution and viability were monitored via in vivo bioluminescence imaging using Akaluc luciferase, selected for its superior tissue penetration and sensitivity in situ (n = 5 mice per group). The manuscript text has been revised to enhance clarity, and the experimental protocols for luciferase signal detection and quantification have been added to the Methods.

      (1) (2) (3) (5):

      We fully agree that further investigation into the functional consequences of glycogen engineering in MSCs – encompassing core cellular functions, immunomodulatory properties, and associated signaling pathways – is important to fully elucidate the underlying mechanisms. Cellular metabolism is intrinsically intertwined with diverse physiological processes. Consequently, we believe that glycogen engineering exerts multifaceted effects on MSCs, likely extending beyond the modulation of any single specific pathway. Studying the metabolic perturbation induced by such engineering approaches in mammalian cells represents an interesting field. The exploration of these aspects remains a long-term research objective within our group.

      Reviewer #2 (Recommendations for the authors):

      (6) Clarification of Data in the Murine Model:

      In Figure 4B, there is a discrepancy between the number of animals shown in the image (five) and those represented in the graph (three). This discrepancy needs clarification. Additionally, the study lacks information regarding the intensity of the signal in the luciferase assays. It is unclear how luciferase expression in the mice was quantified, and providing this detail would enhance the understanding of the data presented.

      We sincerely appreciate these valuable suggestions. We have revised the relevant text for greater clarity. Figure 4B and Figure 4C present results from two distinct experimental approaches, each employing a different luciferase reporter and measurement methodology, and different numbers of mice were used in the two experiments.

      Quantitative data derived from the in vivo bioluminescence imaging have been supplemented as Figure S4. The experimental protocols for luciferase signal detection and quantification have been added to the Methods.

      To other recommendations of reviewer 2:

      We sincerely appreciate your valuable insights, which demonstrate your deep expertise. We fully agree that beyond nutrient availability, factors such as reactive oxygen species (ROS) and the immune microenvironment are also critical limitations affecting the survival and therapeutic efficacy of implanted MSCs.

      We propose that glycogen engineering exerts broad effects on MSCs. These effects manifest as changes in multiple cellular characteristics, including proliferation, differentiation, surface marker expression, antioxidant capacity, and immunomodulatory activity – all crucial factors for the therapeutic purpose of MSCs.

      We believe these changes likely involve complex networks of interconnected regulatory factors. The underlying mechanisms might be clarified through proteomic and metabolomic profiling.

      However, comprehensively investigating these interconnected aspects requires significant time and resources. Some components of this research extend beyond the current scope of our project. Nevertheless, exploring these mechanisms remains an important objective, and we will actively work to investigate them further in our ongoing studies.

    1. Author response

      We would like to thank the editors and two reviewers for the assessment and the constructive feedback on our manuscript, “Toward Robust Neuroanatomical Normative Models: Influence of Sample Size and Covariates Distributions”. We appreciate the thorough reviews and believe the constructive suggestions will substantially strengthen the clarity and quality of our work. We plan to submit a revised version of the manuscript and a full point-by-point response addressing both the public reviews and the recommendations to the authors. 

      Reviewer 1. 

      In revision, we plan to address the reviewer’s comments by: (i) strengthening the interpretation of model fit through reporting the proportion of healthy controls within and outside the extreme percentile bounds; (ii) adding age-resolved overlays of model-derived percentile curves compared to those from the full reference cohort for key sample sizes and regions; (iii) quantifying age-distribution alignment between the train and test sets; and (iv) summarizing model performance as a joint function of age-distribution alignment and sample size.

      Reviewer 2. 

      In the revised manuscript, we will (i) expand the Discussion to more clearly outline the trade-offs between simple regression frameworks and hierarchical models for normative modeling (e.g., scalability, handling of multi-site variation, computational considerations), and discuss alternative approaches and harmonization as important directions for multi-site settings; (ii) contextualize OASIS-3 vs AIBL differences by quantifying train–test age alignment across sampling strategies and emphasize that skewness should be interpreted relative to the target cohort’s alignment rather than absolute numbers; (iii) reassess sex-imbalance effects by reporting expected age distributions per condition and re-evaluate sex effects while controlling for age; (iv) investigate the apparent dip at n≈300 by increasing sub-sampling seeds, testing neighboring sample sizes, and using an alternative age-binning scheme to clarify the observed artifact; (v) clarify potential divergence between tOC separation and global fit under discrepancies in demographic distributions and relate tOC to age-alignment distance; (vi) reframe the sample-size guidance in terms of distributional alignment rather than an absolute n.
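
      The response does not state which measure of train–test age alignment will be used. As one illustrative possibility only, the sketch below computes a histogram overlap coefficient between two sets of ages. The function names, bin edges, and ages are hypothetical, and the sketch assumes every age falls inside the supplied bin range.

```python
def age_histogram(ages, bin_edges):
    """Return the proportion of ages in each half-open bin [lo, hi).

    Assumes at least one age lies inside the bin range.
    """
    counts = [0] * (len(bin_edges) - 1)
    for age in ages:
        for i in range(len(counts)):
            if bin_edges[i] <= age < bin_edges[i + 1]:
                counts[i] += 1
                break
    total = sum(counts)
    return [count / total for count in counts]


def overlap(train_ages, test_ages, bin_edges):
    """Overlap coefficient between two binned age distributions:
    1.0 when the binned proportions match, 0.0 when they share no bin."""
    p = age_histogram(train_ages, bin_edges)
    q = age_histogram(test_ages, bin_edges)
    return sum(min(a, b) for a, b in zip(p, q))


# Hypothetical ages and bins, for illustration only.
edges = [40, 60, 80, 100]
train = [45, 55, 65, 75]
test = [65, 75, 85, 95]
alignment = overlap(train, test, edges)
```

      A distance-based measure (for example, one minus this overlap) would serve the same purpose; the choice of bins affects the result and would need to be reported alongside it.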

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review): 

      Summary:

      The authors describe the degradation of an intrinsically disordered transcription factor (LMO2) via PROTACs (VHL and CRBN) in T-ALL cells. Given the challenges of drugging transcription factors, I find the work solid and a significant scientific contribution to the field. 

      Strengths: 

      (1) Validation of LMO2 degradation by starting with biodegraders, then progressing to chemical degraders. 

      (2) Interrogation of the biology and downstream pathways upon LMO2 degradation (collateral degradation).

      (3) Cell line models that are dependent/overexpression of LMO2 vs LMO2 null cell lines. 

      (4) CRBN and VHL-derived PROTACs were synthesized and evaluated. 

      Weaknesses: 

      (1) The conventional method used to characterize PROTACs in the literature is to calculate the DC50 and Dmax of the degraders, I did not find this information in the manuscript. 

      As noted in the reply to the referee’s point 4 below, our first-generation compounds are not highly potent. The DC50 values have been computed from the Western blot data shown in Fig. 2. The revised Supplementary Fig. S3 shows the quantified Western blot data from a time course of treating KOPT-K1 cells with either Abd-CRBN or Abd-VHL (the 24-hour blots are shown in Figure 2, G and E). From these data, the DC50 values (9 μM for Abd-CRBN and 15 μM for Abd-VHL) are now included in the main text and in the Supplementary Fig. S3 legend.

      In addition, the loss of signal of the LMO2-Rluc reporter protein from PROTAC-treated cells shown in Fig. 2M has been used to calculate a half-point of degradation; although this is not strictly a DC50, as it measures a reporter protein, it yielded values of 10 μM for Abd-CRBN and 9 μM for Abd-VHL.
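
      The response does not describe how the half-point was derived from the quantified signal. As a minimal sketch under that caveat, one common approach is to interpolate on a log-concentration axis between the two doses that bracket 50% remaining signal. The function name and the dose-response values below are hypothetical and are not the study's data.

```python
import math


def half_point(concentrations_um, fraction_remaining, target=0.5):
    """Estimate the concentration at which the remaining signal crosses
    `target`, by linear interpolation on a log10 concentration axis.

    Expects concentrations in ascending order with matching fractions.
    Returns None if the curve never crosses the target in the tested range.
    """
    points = list(zip(concentrations_um, fraction_remaining))
    for (c_lo, f_lo), (c_hi, f_hi) in zip(points, points[1:]):
        if f_lo >= target >= f_hi and f_lo != f_hi:
            weight = (f_lo - target) / (f_lo - f_hi)
            log_c = math.log10(c_lo) + weight * (math.log10(c_hi) - math.log10(c_lo))
            return 10 ** log_c
    return None


# Hypothetical normalized band intensities (1.0 = untreated control).
doses_um = [1, 3, 10, 30]
remaining = [0.95, 0.80, 0.45, 0.20]
estimate = half_point(doses_um, remaining)
```

      Fitting a four-parameter logistic curve is the more usual choice when enough dose points are available; interpolation is shown here only because it needs no fitting library.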

      (2) The proteomics data is not very convincing, and it is not clear why LMO2 does not show in the volcano plot (were higher concentrations of the PROTAC tested? and why only VHL was tested and not CRBN-based PROTAC?).

      Due to the relatively small size of the LMO2 protein, it is challenging to produce enough unique peptides for reliable identification, especially to distinguish some proteins in the LMO2 complex.  

      (3) The correlation between degradation potency and cell growth is not well-established (compare Figure 4C: P12-Ichikawa blots show great degradation at 24 and 48 hrs, but it is unclear if the cell growth in this cell line is any better than in PF-382 or MOLT-16) - Can the authors comment on the correlation between degradation and cell growth?  

      In this study (Fig. 4) we did not aim to compare the effect of LMO2 loss on cell growth among LMO2-positive cells. Rather, we aimed to evaluate the importance of LMO2 for cell growth in LMO2-expressing T-ALL cells compared to non-expressing cells and to correlate the loss of the protein with this effect on cell growth. In addition, treatment with the LMO2 compounds did not show an effect on LMO2-negative cells until at least 48 hours of treatment, indicating the low toxicity of our PROTAC compounds and providing a correlation between LMO2 loss and cell growth.

      (4) The PROTACs are not very potent (double-digit micromolar range?) - can the authors elaborate on any challenges in the optimization of the degradation potency? 

      The Abd methodology, which uses intracellular domain antibodies to screen for compounds that bind to intrinsically disordered proteins such as the LMO2 transcription factor, offers a tractable approach to hard drug targets but, in so doing, creates challenges for improving potency that differ from those for targets with available structural data. LMO2 is an intrinsically disordered protein, for which soluble recombinant protein is not readily available to identify the binding pocket of compounds. The potency has so far been optimized solely based on the different moieties substituted in cell-based SAR studies (http://advances.sciencemag.org/cgi/content/full/7/15/eabg1950/DC1), and all new compounds were tested with BRET assays. Thus, optimization of the degradation potency (including properties such as improved solubility) for the LMO2-binding compounds currently relies on chemical modification of the three areas of the compounds indicated in Fig. 2 B,C.

      (5) The authors mentioned trying six iDAb-E3 ligase proteins; I would recommend listing the E3 ligases tried and commenting on the results in the main text. 

      The six chimaeric iDAb-E3 ligase proteins involved one anti-LMO2 iDAb and three different E3 ligases, each fused at either the N- or the C-terminus of the VH (giving six protein formats). These six fusion proteins are described in the text referring to the degrader studies shown in Supplementary Fig. 1.

      Reviewer #2 (Public review): 

      Summary: 

      Sereesongsaeng et al. aimed to develop degraders for LMO2, an intrinsically disordered transcription factor activated by chromosomal translocation in T-ALL. The authors first focused on developing biodegraders, which are fusions of an anti-LMO2 intracellular domain antibody (iDAb) with cereblon. Following demonstrations of degradation and collateral degradation of associated proteins with biodegraders, the authors proceeded to develop PROTACs using antibody paratopes (Abd) that recruit VHL (Abd-VHL) or cereblon (Abd-CRBN). The authors show dose-dependent degradation of LMO2 in LMO2+ T-ALL cell lines, as well as concomitant dose-dependent degradation of associated bHLH proteins in the DNA-binding complex. LMO2 degradation via Abd-VHL was also determined to inhibit proliferation and induce apoptosis in LMO2+ T-ALL cell lines. 

      Strengths: 

      The topic of degrader development for intrinsically disordered proteins is of high interest, and the authors aimed to tackle a difficult drug target. The authors evaluated methods, including the development of biodegraders, as well as PROTACs that recruit two different E3 ligases. The study includes important chemical control experiments, as well as proteomic profiling to evaluate selectivity. 

      Weaknesses: 

      The overall degradation is relatively weak, and the mechanism of potential collateral degradation is not thoroughly evaluated

      The purpose of the study was to evaluate the effects of LMO2 degraders. The mechanism of the observed collateral degradation could not be investigated directly within the scope of our study. In the main text, we discussed two possible, not mutually exclusive, explanations. One is that our work (and previously published, cited work) indicates that the DNA-binding bHLH proteins have a relatively short half-life (Supplementary Fig. S12) and may therefore be subject to normal turnover when the LMO2 in the complex turns over. Further, the known structure of the LMO2-bHLH interactions (from Omari et al, doi: 10.1016/j.celrep.2013.06.008) was also examined for the location of lysines in the TAL1 & E47 partners (Supplementary Fig. S11). It is possible that their local association with the LMO2-E3-ligase complex created by the PROTAC interaction could cause their concurrent degradation. Mutagenesis and structural analysis would be needed to establish this point.

      In addition, experiments comparing the authors' prior work with their anti-LMO2 iDAb or Abl-L are lacking, which would improve our understanding of the potential advantages of a degrader strategy for LMO2.  

      A major motivation behind developing the Antibody-derived (Abd) method to select compounds, which are surrogates of the antibody paratope, is that using iDAbs directly as inhibitors requires the development of delivery technologies for these macromolecules, either as protein directly or as vectors or mRNA for their expression. Ultimately, high affinity anti-LMO2 iDAbs should be used directly as tractable inhibitors when delivery methods are developed. In the meantime, Abd compounds were envisaged as being surrogates suitable for development into reagents, and potentially drugs, by medicinal chemistry. We evaluated selected first generation LMO2-binding Abd compounds previously, finding that they interfere with the LMO2-iDAb BRET signal to an EC<sub>max</sub> of about 50%, but these compounds are not potent enough to affect the interaction of LMO2 with a non-mutated iDAb (nM affinity). These data indicated that efficacy improvement for the PROTACs was needed. In addition, in the current study, we observed viability effects in T-ALL lines at high concentrations (20 μM) irrespective of LMO2 expression (Supplementary Fig. S 2A, B). These data indicated that efficacy improvement was needed and that converting the compounds to degraders (PROTACs) could potentially add to in-cell potency. By adding the E3 ligase ligands, we found that the toxicity to non-LMO2-expressing Jurkat cells was significantly reduced (Supplementary Fig. S 2E, F).

      Reviewer #2 (Recommendations for the authors): 

      Suggestions for additional experiments: 

      (1) The data presented is primarily focused on demonstrating targeted degradation of LMO2, with a focus on phenotypes such as proliferation and apoptosis. In this manuscript, there are limited comparative evaluations of anti-LMO2 iDAb or Abl-L to show the potential benefits of a degrader approach to their previously described work, as well as why targeted degradation is in fact, advantageous. For example, the authors' previous work has shown that anti-LMO2 iDAb inhibits tumor growth in a mouse transplantation model. Comparisons in vitro would be supportive of the importance of continued degrader optimization/development.  

      We have previously shown that an anti-LMO2 scFv inhibits tumour growth in a mouse model, but this work used an expressed scFv antibody that binds to LMO2 in the nM range. The Abd compounds have much lower potency than the antibody and, because recombinant LMO2 is difficult to work with, we could only evaluate interactions of compounds with LMO2 in cell-based assays like BRET (LMO2-iDAb BRET). In this cell-based assay, the first generation Abd compounds do not have sufficient potency to block the LMO2-iDAb interaction unless the affinity of the iDAb is reduced to sub-μM. The justification for proceeding with the degrader approach rather than just using protein-protein interaction (PPI) inhibition was based largely on the low potency of the first generation PPI compounds in cell assays and on the expectation that incorporating protein degradation with PPI inhibition would enhance efficacy.

      In addition, the viability experiments are also very short-term; is there a reason why the authors did not carry out these experiments for 3-5 days to fully understand the impacts on proliferation? 

      In Supplementary Fig. S5, we did show assays up to 3 days. In KOPT-K1 (LMO2+), the LMO2 levels were reduced during the time course of this assay (from a single compound dose at time zero) (Supplementary Fig. S5A, B). We also show CellTitreGlo assays up to 3 days and, with these second generation compounds, we observed sustained effects on KOPT-K1 (LMO2+) but low non-DMSO toxicity in Jurkat (LMO2-) (revised version Supplementary Fig. S5C, D).

      (2) The potential mechanism of collateral degradation is interesting and important in evaluating the on-target responses and consequences of degrading LMO2. At this time, the data supporting collateral degradation is limited and would be strengthened by showing that it is not due to a change in mRNA levels and not due to complex dissociation. Overall, the kinetics and depth of loss of complex members such as E47 in Figure 3 appear more substantial than LMO2 itself, and as presented, collateral degradation is not effectively demonstrated. In addition, to aid in the readers' assessments, additional background and references around the roles of TAL1 and E47 would be helpful. For example, structurally, where do they (and other associated proteins that are not degraded) fit in the complex? 

      We have responded above in relation to the Public Review Comments and note that a structure of the complex was in submitted version (now revised version Supplementary Fig. S11). 

      (3) In Figure 1A, the blots show decreased levels of endogenous CRBN with iDAB-CRBN. Is this a known consequence of this approach in these cell lines? Does the partial recovery of endogenous CRBN in KOPTK1 cells have any indication of iDAB-CRBN levels? 

      We cannot be sure why the endogenous level of CRBN decreases in doxycycline-treated cells. It has been shown (DOI:10.1371/journal.pone.0064561) that doxycycline (and its derivatives), as used in inducible expression systems such as the lentivirus we used, has an effect on gene expression patterns, which can either increase or decrease expression. Although the published study did not examine CRBN expression, this effect might explain why CRBN expression decreases on doxycycline addition and remains at the same level thereafter.

      (4) In Figure S7, the authors do not fully explain the results and why there is minimal rescue with epoxomicin (S7A) or MLN4924 (S7J). This could indicate an alternative mechanism of degradation and loss at play, given the lack of rescue. Can the authors comment on this discrepancy, and have they looked at autophagy inhibitors or other agents to achieve the chemical rescue?

      In experiments such as those in revised version Supplementary Fig. S6, we used KOPT-K1 cells with a single concentration of the inhibitors, and the cells may be less susceptible to epoxomicin (0.8 μM), but lenalidomide and free thalidomide restored the LMO2 levels fully. In the main text Fig. 3D, we also showed that including epoxomicin and thalidomide with the Abd-CRBN in KOPT-K1 and CCRF-CEM restored LMO2 levels, supporting the conclusion that the main mechanism of degradation is through the ubiquitin-proteasome route.

      (5) For the proteomics data, it would be helpful to have the proteins in yellow highlighted to have them noted in 5D and 5E. In addition, can the authors comment on why LMO2 or their collateral targets are not confirmed in the table? Furthermore, 5C is difficult to interpret; if there are no significantly changing proteins in the Jurkat cells, why are there pathways that are identified? 

      As mentioned in reply to referee 1, due to the relatively small size of the LMO2 protein, it is challenging to produce enough unique peptides for reliable identification, especially to distinguish some proteins in the LMO2 complex where expression levels are low.

    1. Author response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Confirmation of daf-7::GFP data and inheritance beyond F2

      Reviewer suggested confirming daf-7::GFP molecular marker data and testing inheritance beyond the F2 generation to further strengthen the findings.

      We agree these experiments would provide valuable mechanistic insights into the molecular basis of transgenerational inheritance. However, our study was specifically designed as a reproducibility study focusing on the central controversy regarding F2 inheritance (Gainey et al. vs. Murphy lab findings). The daf-7::GFP molecular marker experiments, while important for understanding mechanisms, represent a different research question requiring extensive additional resources and expertise beyond the scope of this validation study. Our primary goal was to provide independent confirmation of the disputed F2 inheritance using standardized behavioral assays. It is our hope that future work will pursue these important mechanistic validations.

      "Exhaustive attempts" language

      Reviewer disagreed with characterizing Gainey et al.'s efforts as "exhaustive attempts" since they modified the original protocol.

      We revised this statement in the Results and Discussion to more accurately reflect the experimental situation: "In contrast, Gainey et al. (2025), representing the Hunter group, reported that while parental and F1 avoidance behaviors were evident, transgenerational inheritance was not reliably observed beyond the F1 generation under their experimental conditions."

      Importance of sodium azide

      Reviewer suggested including more discussion about the recent findings on the importance of sodium azide in the assay, referencing the Murphy group's response paper.

      We have prominently highlighted the critical role of sodium azide in our Introduction with strengthened language that emphasizes its importance for resolving the scientific controversy: "Critically, Kaletsky et al. (2025) demonstrated that omission of sodium azide during scoring can completely abolish detection of inherited avoidance, revealing that this key methodological difference may explain the conflicting results between laboratories. The use of sodium azide to immobilize worms at the moment of initial bacterial choice appears essential for capturing the inherited behavioral response. These findings highlight how seemingly minor methodological variations can dramatically impact detection of transgenerational inheritance and underscore the need for independent replication using standardized protocols."

      Protocol fidelity statement

      Reviewer requested a more direct statement clarifying that we followed the Murphy group protocol, noting that we made some modifications.

      We followed the core Murphy lab protocol with two evidence-based optimizations that preserve the essential experimental elements: 1) We used 400 mM sodium azide instead of 1 M based on preliminary data showing the higher concentration caused premature paralysis before worms could make behavioral choices, and 2) We used liquid NGM buffer instead of M9 to maintain chemical consistency with the solid NGM plates used for worm culture, minimizing potential osmotic stress. These modifications improved experimental reliability while maintaining the critical components: sodium azide immobilization, bacterial lawn density standardization (OD<sub>600</sub> = 1.0), and synchronized scoring conditions that are essential for detecting inherited avoidance.

      Overstated dilution claim

      Reviewer noted that the statement about "gradual decrease" in avoidance strength was overstated and didn't reflect the actual data presented in the manuscript.

      We removed this statement.

      Environmental variables phrasing

      Reviewer found the sentence about environmental variables unclear, noting that Gainey et al. didn't actually acknowledge variability but saw it as indicating error or stochastic processes.

      We refined this statement for greater precision and clarity: "This underscores the assay's sensitivity to environmental variables, such as synchronization method and bacterial lawn density. This highlights the importance of consistency across experimental setups and supports the view that context-dependent variation may underlie previously reported discrepancies."

      Reviewer #2 (Public Review):

      Reagent sourcing

      Reviewer suggested listing the sources of media ingredients with company names and catalog numbers, as this might be important for reproducibility.

      To ensure complete reproducibility, we created a comprehensive Table S3 listing all reagents, suppliers, and catalog numbers used in our experiments. This detailed information enables exact replication of our experimental conditions and addresses potential variability that might arise from different reagent sources between laboratories.

      Reviewer #3 (Public Review):

      Raw data transparency

      Reviewer noted that while a spreadsheet with choice assay results was provided, the individual raw data from assays was not included, which would be helpful for assessing sample sizes.

      We now provide complete experimental transparency through Table S2, which contains individual choice indices from all 138 assays conducted across four independent trials. This comprehensive dataset allows full assessment of our experimental outcomes, statistical robustness, and reproducibility while enabling other researchers to perform independent statistical analyses.
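
      For readers who want to recompute per-assay values from raw counts, the following is a minimal sketch of a choice index calculation. The response does not state the formula, so the definition used here, (worms on OP50 − worms on PA14) / total worms scored, is an assumption based on the convention commonly used for this assay, and the function and argument names are hypothetical.

```python
def choice_index(n_op50: int, n_pa14: int) -> float:
    """Return a choice index in [-1, 1] from worm counts on each bacterial spot.

    Assumed convention (not stated in the response):
    positive values indicate avoidance of PA14, negative values attraction.
    """
    total = n_op50 + n_pa14
    if total == 0:
        # No worms were immobilized on either spot, so the index is undefined.
        raise ValueError("choice index is undefined when no worms are scored")
    return (n_op50 - n_pa14) / total


# Hypothetical counts for illustration only; these are not data from Table S2.
example = choice_index(n_op50=40, n_pa14=20)
```

      Under this assumed convention, an assay with equal counts on both spots gives an index of zero.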

      F1/F2 assay disparity

      Reviewer questioned whether the higher number of F2 assays compared to F1 represented truly independent assays, asking if multiple F2 assays were performed from offspring of one F1 plate (which would not represent independent assays).

      We clarified this important statistical consideration in Methods (Transgenerational Testing): "Each behavioral assay was conducted using animals from a biologically independent growth plate. While F2 plates were derived from pooled embryos from multiple F1 parents, each assay represents an independent biological replicate with no reuse of animals across assays. F2 assays (n=45) exceeded F1 assays (n=20) due to PA14-induced fecundity reduction in trained worms, limiting the number of viable F1 progeny. The higher number of F2 assays reflects the greater reproductive success of healthy F1 animals and provides additional statistical power for population-level behavioral comparisons." We also enhanced our Controls section to clarify that "Our experimental design employed population-level comparisons across generations using unpaired statistical analyses, with no attempt to track individual lineages across generations."

      Methodological variations overstatement

      Reviewer felt the Introduction overstated the findings by suggesting the authors "address potential methodological variations," when they only used one assay setup throughout.

      We have corrected the Introduction to accurately reflect our study design and scope: "Here, we adapted the protocol established by the Murphy group, maintaining the critical use of sodium azide to paralyze worms at the time of choice, to test whether parental exposure to PA14 elicits consistent avoidance in subsequent generations. Our study specifically focuses on the transmission of learned avoidance through the F2 generation, beyond the intergenerational (F1) effect, because this is where divergence between published studies begins."

      Reviewer #1 (Recommendations for the authors):

      Worm numbers

      Reviewer noted that information about the number of worms used should be included in the training and choice assay methods section rather than separated.

      We clarified worm numbers and sample sizes in the Methods (Controls and Additional Considerations): "Each individual assay averaged 62 ± 43 animals (range: 15-150 worms per assay), with a total of 138 assays conducted across four independent experimental trials. The variation in worm numbers per assay reflects natural variation in worm recovery and immobilization efficiency during choice assays. We conducted an average of 8.5 assays per condition during each of the four replicates."

      Figure 1 legend and consistency

      Reviewer identified several issues: inconsistent terminology ("treated" vs "trained"), incorrect statistical test naming, missing p-value annotations, and need for consistency between figure and legend.

      We have systematically addressed all figure consistency and statistical annotation issues:

      Replaced inconsistent "treated" terminology with "trained" throughout

      Corrected the statistical test description to accurately reflect our analysis: "Kruskal-Wallis one-way ANOVA followed by Dunn's post hoc" which properly corresponds to the statistical tests detailed in Table S1

      Added explicit p-value annotations in the figure legend: "*p<0.05, **p<0.01 means and SEM shown (see Table S1 for statistics and Table S2 for raw data)"

      Ensured consistent terminology between figure and legend
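
      To make the named test concrete, here is a minimal pure-Python sketch of the Kruskal-Wallis H statistic, the rank-based omnibus step that precedes a post hoc test such as Dunn's. It is an illustration under stated assumptions and not the analysis code behind Table S1: the helper names are hypothetical, no tie correction or p-value is computed, the post hoc step is omitted, and the example values are invented.

```python
from itertools import chain


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis_h(*groups):
    """Kruskal-Wallis H statistic (no tie correction) for two or more groups."""
    pooled = list(chain.from_iterable(groups))
    n = len(pooled)
    ranks = average_ranks(pooled)
    weighted = 0.0
    start = 0
    for group in groups:
        # Ranks are stored in pooled order, so each group is a contiguous slice.
        rank_sum = sum(ranks[start:start + len(group)])
        weighted += rank_sum ** 2 / len(group)
        start += len(group)
    return 12.0 / (n * (n + 1)) * weighted - 3 * (n + 1)


# Invented choice indices for three hypothetical conditions.
h_stat = kruskal_wallis_h([0.1, 0.2, 0.3], [0.4, 0.5, 0.6], [0.7, 0.8, 0.9])
```

      In practice a statistics package would normally be used, since it also applies the tie correction and reports p-values for both the omnibus and post hoc comparisons.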

      NGM vs. M9 buffer

      Reviewer questioned whether we used NGM buffer or M9 buffer for washing steps, noting that NGM isn't usually referred to as "buffer."

      We have prominently featured and thoroughly clarified our rationale for using liquid NGM buffer in the Methods (Synchronization of Worms section). The explanation now appears upfront in the methods: "We used liquid NGM buffer instead of M9 buffer (as specified in the original Murphy protocol) to maintain chemical consistency with the solid NGM culture plates. This modification minimizes potential osmotic stress since liquid NGM matches the pH (6.0) and ionic composition of the growth medium, whereas M9 buffer has a different pH (7.0) and ionic profile." We provide detailed chemical differences and explain that this modification maintains consistency with culture conditions while preserving essential experimental procedures.

      Grammar/typos

      Reviewer noted that the manuscript needed thorough proofreading to address grammatical errors and typographical mistakes.

      We have conducted comprehensive proofreading and editing throughout the manuscript to resolve grammatical and typographical errors. Specific improvements include: clarified sentence structure in the Introduction and Results sections, corrected technical terminology consistency, improved figure legend clarity, and enhanced overall readability while maintaining scientific precision.

      Sodium azide concentration

      Reviewer noted that our sodium azide concentration differed from the Moore paper and requested comment on this difference.

      We have included explicit justification for our sodium azide concentration choice in the Methods (Training and Choice Assay): "We used 400 mM sodium azide rather than the 1 M concentration reported by Moore et al. (2019) because preliminary trials showed that higher concentrations caused premature paralysis before worms could reach either bacterial spot, potentially biasing choice measurements. The 400 mM concentration provided sufficient immobilization while preserving the behavioral choice window."

      Reviewer #2 (Recommendations for the authors):

      Comparative reagent analysis

      Reviewer suggested creating a supplemental table comparing reagent sources between our study, Gainey et al., and Murphy et al., proposing that media ingredient differences might explain the discrepancies.

      While direct reagent comparison between laboratories was beyond the scope of this validation study, we recognize this as an important consideration for understanding experimental variability. Our comprehensive reagent sourcing information (Table S3) provides the foundation for future comparative studies. We encourage collaborative efforts to systematically compare reagent sources across laboratories, as media component differences could contribute to the experimental variability observed between research groups. Such analyses would be valuable for establishing standardized protocols across the field.

      Conclusion

      We hope that these revisions satisfactorily address the reviewers’ concerns. We believe these improvements significantly strengthened the manuscript's contribution to resolving this important scientific controversy.

      We thank the reviewers again for their invaluable insights and constructive feedback, which have substantially improved the quality and impact of our work.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      The authors present MerQuaCo, a computational tool that fills a critical gap in the field of spatial transcriptomics: the absence of standardized quality control (QC) tools for image-based datasets. Spatial transcriptomics is an emerging field where datasets are often imperfect, and current practices lack systematic methods to quantify and address these imperfections. MerQuaCo offers an objective and reproducible framework to evaluate issues like data loss, transcript detection variability, and efficiency differences across imaging planes.

      Strengths:

      (1) The study draws on an impressive dataset comprising 641 mouse brain sections collected on the Vizgen MERSCOPE platform over two years. This scale ensures that the documented imperfections are not isolated or anecdotal but represent systemic challenges in spatial transcriptomics. The variability observed across this large dataset underscores the importance of using sufficiently large sample sizes when benchmarking different image-based spatial technologies. Smaller datasets risk producing misleading results by over-representing unusually successful or unsuccessful experiments. This comprehensive dataset not only highlights systemic challenges in spatial transcriptomics but also provides a robust foundation for evaluating MerQuaCo's metrics. The study sets a valuable precedent for future quality assessment and benchmarking efforts as the field continues to evolve.

      (2) MerQuaCo introduces thoughtful metrics and filters that address a wide range of quality control needs. These include pixel classification, transcript density, and detection efficiency across both x-y axes (periodicity) and z-planes (p6/p0 ratio). The tool also effectively quantifies data loss due to dropped images, providing tangible metrics for researchers to evaluate and standardize their data. Additionally, the authors' decision to include examples of imperfections detectable by visual inspection but not flagged by MerQuaCo reflects a transparent and balanced assessment of the tool's current capabilities.
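
      As an illustration of the z-plane metric named above, the sketch below computes a p6/p0-style ratio from a list of per-transcript z-plane indices. It assumes the ratio is the number of transcripts detected in plane 6 divided by the number detected in plane 0; the function name, argument names and input layout are hypothetical and are not taken from the MerQuaCo code base.

```python
from collections import Counter


def plane_ratio(z_planes, top=6, bottom=0):
    """Ratio of transcript counts in plane `top` to plane `bottom`.

    `z_planes` is an iterable with one integer z-plane index per detected
    transcript (an assumed input layout).
    """
    counts = Counter(z_planes)
    if counts[bottom] == 0:
        # Without any transcripts in the reference plane the ratio is undefined.
        raise ValueError(f"no transcripts detected in plane {bottom}")
    return counts[top] / counts[bottom]


# Invented example: 100 transcripts in plane 0, 80 in plane 3, 50 in plane 6.
ratio = plane_ratio([0] * 100 + [3] * 80 + [6] * 50)
```

      A ratio well below 1 in this sketch would indicate fewer detections in the upper plane than in the lowest plane of the section.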

      Weaknesses:

      (1) The study focuses on cell-type label changes as the main downstream impact of imperfections. Broadening the scope to explore expression response changes of downstream analyses would offer a more complete picture of the biological consequences of these imperfections and enhance the utility of the tool.

      Here, we focused on the consequences of imperfections on cell-type labels, one common use for spatial transcriptomics datasets. Spatial datasets are used for so many other purposes that there are almost endless ways in which imperfections could impact downstream analyses. It is difficult to see how we might broaden the scope to include more downstream effects, while providing enough analysis to derive meaningful conclusions, all within the scope of a single paper. Existing studies bring some insight into the impact of imperfections and we expect future studies will extend our understanding of consequences in other biological contexts.

      (2) While the manuscript identifies and quantifies imperfections effectively, it does not propose post-imaging data processing solutions to correct these issues, aside from the exclusion of problematic sections or transcript species. While this is understandable given the study is aimed at the highest quality atlas effort, many researchers don't need that level of quality to compare groups. It would be important to include discussion points as to how those cut-offs should be decided for a specific study.

      Studies differ greatly in their aims and, as a result, the impact of imperfections in the underlying data will differ also, preventing us from offering meaningful guidance on how cut-offs might best be identified. Rather, our aim with MerQuaCo was to provide researchers with tools to generate information on their spatial datasets, to facilitate downstream decisions on data inclusion and cut-offs.

      (3) Although the authors demonstrate the applicability of MerQuaCo on a large MERFISH dataset, and the limited number of sections from other platforms, it would be helpful to describe its limitations in its generalizability.

      In figure 9, we addressed the limitations and generalizability of MerQuaCo as best we could with the available datasets. Gaining deep insight into the limitations and generalizability of MerQuaCo would require application to multiple large datasets and, to the best of our knowledge, these datasets are not available.

      Reviewer #2 (Public review):

      The authors present MerQuaCo, a computational tool for quality control in image-based spatial transcriptomics, especially MERSCOPE. They assessed MerQuaCo on 641 slides produced in their institute in terms of the ratio of imperfection, transcript density, and variations of quality by different planes (x-axis).

      Strengths:

      This looks to be a valuable work that can be a good guideline of quality control in future spatial transcriptomics. A well-controlled spatial transcriptomics dataset is also important for the downstream analysis.

      Weaknesses:

      The results section needs to be more structured.

      We have split the ‘Transcript density’ subsection of the results into 3 new subsections.

      Reviewer #3 (Public review):

      MerQuaCo is an open-source computational tool developed for quality control in image-based spatial transcriptomics data, with a primary focus on data generated by the Vizgen MERSCOPE platform. The authors analyzed a substantial dataset of 641 fresh-frozen adult mouse brain sections to identify and quantify common imperfections, aiming to replace manual quality assessment with an automated, objective approach, providing standardized data integrity measures for spatial transcriptomics experiments.

      Strengths:

      The manuscript's strengths lie in its timely utility, rigorous empirical validation, and practical contributions to methodology and biological discovery in spatial transcriptomics.

      Weaknesses:

      While MerQuaCo demonstrates utility in large datasets and cross-platform potential, its generalizability and validation require expansion, particularly for non-MERSCOPE platforms and real-world biological impact.

      We agree that there is value in expanding our analyses to non-MERSCOPE platforms, to tissues other than brain, and to analyses other than cell typing. The limiting factor in all these directions is the availability of large enough datasets to probe the limits of MerQuaCo. We look forward to a future in which more datasets are available and it’s possible to extend our analyses.

      Reviewer #1 (Recommendation for the Author):

      (1) To better capture the downstream impacts of imperfections, consider extending the analysis to additional metrics, such as specificity variation across cell types, gene coexpression, or spatial gene patterning. This would deepen insights into how these imperfections shape biological interpretations and further demonstrate the versatility of MerQuaCo.

      These are compelling ideas, but we are unable to study so many possible downstream impacts in sufficient depth in a single study. Insights into these topics will likely come from future studies.

      (2) In Figure 7 legend, panel label (D) is repeated thus panels E-F are mislabelled. 

      We have corrected this error.

      (3) Ensure that the image quality is high for the figures. 

      We will upload Illustrator files, ensuring that images are at full resolution.

      Reviewer #2 (Recommendation for the Author):

      (1) A result subsection "Transcript density" looks too long. Please provide a subsection heading for each figure. 

      We have split this section into 3 with new subheadings.

      (2) The result subsection title "Transcript density" sounds ambiguous. Please provide a detailed title describing what information this subsection contains. 

      We have renamed this section ‘Differences in transcript density between MERSCOPE experiments’.

      Minor: 

      (1) There is no explanation of the black and grey bars in Figure 2A.

      We have added information to the figure legend, identifying the datasets underlying the grey and black bars.

      (2) In the abstract, the phrase "High-dimension" should be "High-dimensional". 

      We have changed ‘high-dimension’ to ‘high-dimensional’.

      (3) In the abstract, "Spatial results" is an unclear expression. What does it stand for? 

      We have replaced the term ‘spatial results’ with ‘the outputs of spatial transcriptomics platforms’.

      Reviewer #3 (Recommendation for the Author):

      (1) While the tool claims broad applicability, validation is heavily centered on MERSCOPE data, with limited testing on other platforms. The authors should expand validation to include more diverse platforms and add a small analysis of non-brain tissue. If broader validation isn't feasible, modify the title and abstract to reflect the focus on the mouse brain explicitly.

      We agree that expansion to other platforms is desirable, but to the best of our knowledge sufficient datasets from other platforms are not available. In the abstract, we state that ‘… we describe imperfections in a dataset of 641 fresh-frozen adult mouse brain sections collected using the Vizgen MERSCOPE.’

      (2) The impact of data imperfections on downstream analysis needs a more comprehensive evaluation. The authors should expand beyond cluster label changes to include a) differential expression analysis with simulated imperfections, b) impact on spatial statistics and pattern detection, and c) effects on cell-cell interactions. 

      Each of these ideas could support a substantial study. We are unable to do them justice in the limited space available as an addition to the current study.

      (3) The pixel classification workflow and validation process need more detailed documentation. 

      The methods and results together describe the workflow and validation in depth. We are unclear what details are missing.

      (4) The manuscript lacks comparison to existing QC pipelines such as Squidpy and Giotto. The authors should benchmark MerQuaCo against them and provide integration options with popular spatial analysis tools with clear documentation.

      To the best of our knowledge, Squidpy and Giotto lack QC benchmarks, certainly of the parameters characterized by MerQuaCo. Direct comparison isn’t possible.

    1. Author response:

      The following is the authors’ response to the original reviews

      General Statements:

      In our manuscript, we demonstrate for the first time that RNA Polymerase I (Pol I) can prematurely release nascent transcripts at the 5' end of ribosomal DNA transcription units in vivo. This achievement was made possible by comparing wild-type Pol I with a mutant form of Pol I, hereafter called SuperPol, previously isolated in our lab (Darrière et al., 2019). By combining in vivo analysis of rRNA synthesis (using pulse-labelling of nascent transcripts and cross-linking of nascent transcripts - CRAC) with in vitro analysis, we could show that SuperPol reduces premature transcript release due to altered elongation dynamics and reduced RNA cleavage activity. Such premature release could reflect regulatory mechanisms controlling rRNA synthesis. Importantly, this increased processivity of SuperPol is correlated with resistance to BMH-21, a novel anticancer drug inhibiting Pol I, showing the relevance of targeting Pol I during transcriptional pauses to kill cancer cells. This work offers critical insights into Pol I dynamics, rRNA transcription regulation, and implications for cancer therapeutics.

      We sincerely thank the three reviewers for their insightful comments and recognition of the strengths and weaknesses of our study. Their acknowledgment of our rigorous methodology, the relevance of our findings on rRNA transcription regulation, and the significant enzymatic properties of the SuperPol mutant is highly appreciated. We are particularly grateful for their appreciation of the potential scientific impact of this work. Additionally, we value the reviewer’s suggestion that this article could address a broad scientific community, including in transcription biology and cancer therapy research. These encouraging remarks motivate us to refine and expand upon our findings further.

      All three reviewers acknowledged the increased processivity of SuperPol compared to its wild-type counterpart. However, two of the three reviewers question our claim that premature termination of transcription can regulate ribosomal RNA transcription. This conclusion is based on the SuperPol mutant increasing rRNA production. Proving that modulation of early transcription termination is used to regulate rRNA production under physiological conditions is beyond the scope of this study. Therefore, we propose to change the title of this manuscript to focus on what we have unambiguously demonstrated:

      “Ribosomal RNA synthesis by RNA polymerase I is subjected to premature termination of transcription”.

      Reviewer 1’s main criticism centers on the use of the CRAC technique in our study. While we address this point in detail below, we would like to emphasize that, although we agree with the reviewer’s comments regarding its application to Pol II studies, CRAC limits contamination with mature rRNA and therefore remains the only suitable method for studying Pol I elongation over the entire transcription unit. All other methods are massively contaminated with fragments of mature RNA, which prevents any quantitative analysis of read distribution within rDNA. This perspective is widely accepted within the Pol I research community, as CRAC provides a robust approach to capturing transcriptional dynamics specific to Pol I activity.

      We hope that these findings will resonate with the readership of your journal and contribute significantly to advancing discussions in transcription biology and related fields.

      Description of the planned revisions:

      Despite numerous text modifications (see below), we agree that one major point of discussion is the consequence of increased processivity in the SuperPol mutant on the “quality” of the rRNA produced. Reviewer 3 suggested comparisons with other processive alleles, such as the rpb1-E1103G mutant of the RNAPII subunit (Malagon et al., 2006). This comparison has already been addressed by the Schneider lab (Viktorovskaya OV, Cell Rep., 2013 - PMID: 23994471), which explored Pol II (rpb1-E1103G) and Pol I (rpa190-E1224G). The rpa190-E1224G mutant revealed enhanced pausing in vitro, highlighting key differences between the catalytic rate-limiting steps of Pol I and Pol II (see David Schneider's review on this topic for further details).

      Reviewers 2 and 3 suggested that a decreased efficiency of cleavage upon backtracking might imply an increased error rate in SuperPol compared to the wild-type enzyme. Pol I mutants with decreased rRNA cleavage have been characterized previously and showed an increased error rate. We have already started to address this point. Preliminary results from in vitro experiments suggest that SuperPol mutants exhibit an elevated error rate during transcription. However, these findings remain preliminary and require further experimental validation to confirm their reproducibility and robustness. We propose to consolidate these data and incorporate them into the manuscript to address this question comprehensively. This could provide valuable insights into the mechanistic differences between SuperPol and the wild-type enzyme. SuperPol is the first Pol I mutant described with increased processivity in vitro and in vivo, and we agree that this might come at the cost of decreased fidelity.

      Regulatory aspect of the process:

      To address the reviewer’s remarks, we propose to test our model by performing experiments that would evaluate PTT levels in Pol I mutants or under different growth conditions. These experiments would provide crucial data to support our model, which suggests that PTT is a regulatory element of Pol I transcription. By demonstrating how PTT varies with environmental factors, we aim to strengthen the hypothesis that premature termination plays an important role in regulating Pol I activity.

      We propose revising the title and conclusions of the manuscript. The updated version will better reflect the study's focus and temper claims regarding the regulatory aspects of termination events, while maintaining the value of our proposed model.

      Description of the revisions that have already been incorporated in the transferred manuscript:

      Some very important modifications have now been incorporated:

      Statistical Analyses and CRAC Replicates:

      Unlike reviewers 2 and 3, reviewer 1 suggests that we did not analyze the results statistically. In fact, the CRAC analyses were conducted in biological triplicate, ensuring robustness and reproducibility. The statistical analyses are presented in Figure 2C, which highlights significant findings supporting the conclusion that the WT Pol I and SuperPol distribution profiles are different. The CRAC replicates exhibit a high correlation, and we confirmed a significant effect in each region of interest (5’ETS, 18S.2, 25S.1 and 3’ ETS, Figure 1), confirming consistency across experiments. Finally, we took care not to overinterpret the results, maintaining a rigorous and cautious approach in our analysis to ensure accurate conclusions.

      CRAC vs. Net-seq:

      Reviewer 1 asks us to comment on the differences between CRAC and Net-seq. Both methods complement each other but serve different purposes depending on the biological question and the context of the transcription analysis. Net-seq was originally designed for Pol II analysis. It captures nascent RNAs but does not eliminate mature ribosomal RNAs (rRNAs), leading to high levels of contamination. While this is manageable for Pol II analysis (in silico elimination of reads corresponding to rRNAs), it poses a significant problem for Pol I due to the dominance of rRNAs (60% of total RNAs in yeast), which share sequences with nascent Pol I transcripts. As a result, large Net-seq peaks are observed at mature rRNA extremities (Clarke 2018, Jacobs 2022). This limits the interpretation of the results to the short-lived pre-rRNA species. In contrast, CRAC has been specifically adapted by the laboratory of David Tollervey to map Pol I distribution while minimizing contamination from mature rRNAs (the CRAC protocol used exclusively recovers RNAs with 3′ hydroxyl groups that represent endogenous 3′ ends of nascent transcripts, thus removing RNAs with a 3′-phosphate, found in mature rRNAs). This makes CRAC more suitable for studying Pol I transcription, including polymerase pausing and distribution along rDNA, providing a quantitative dataset for the entire rDNA gene.

      CRAC vs. Other Methods:

      Reviewer 1 suggests using GRO-seq or TT-seq, but the experiments in Figure 2 aim to assess the distribution profile of Pol I along the rDNA, which requires a method optimized for this specific purpose. While GRO-seq and TT-seq are excellent for measuring RNA synthesis and cotranscriptional processing, they rely on Sarkosyl treatment to permeabilize cellular and nuclear membranes. Sarkosyl is known to artificially induce polymerase pausing and to inhibit RNase activities that are involved in the process. To avoid these artifacts, we used CRAC, a direct and fully in vivo approach. In a CRAC experiment, cells are grown exponentially in rich media and arrested via rapid cross-linking, providing precise and artifact-free data on Pol I activity and pausing.

      Pol I ChIP Signal Comparison:

      The ChIP experiments previously published in Darrière et al. lack the statistical depth and resolution offered by our CRAC analyses. The detailed results obtained through CRAC would have been impossible to detect using classical ChIP. The current study provides a more refined and precise understanding of Pol I distribution and dynamics, highlighting the advantages of CRAC over traditional methods in addressing these complex transcriptional processes.

      BMH-21 Effects:

      As highlighted by Reviewer 1, the effects of BMH-21 observed in our study differ slightly from those reported in earlier work (Ref Schneider 2022), likely due to variations in experimental conditions, such as methodologies (CRAC vs. Net-seq), as discussed earlier. We also identified variations in the response to BMH-21 treatment associated with differences in cell growth phases and/or cell density. These factors likely contribute to the observed discrepancies, offering a potential explanation for the variations between our findings and those reported in previous studies. In our approach, we prioritized reproducibility by carefully controlling BMH-21 experimental conditions to mitigate these factors. These variables can significantly influence results, potentially leading to subtle discrepancies. Nevertheless, the overall conclusions regarding BMH-21's effects on WT Pol I are largely consistent across studies, with differences primarily observed at the nucleotide resolution. This is a strength of our CRAC-based analysis, which provides precise insights into Pol I activity.

      We will address these nuances in the revised manuscript to clarify how such differences may impact results and provide context for interpreting our findings in light of previous studies.

      Minor points:

      Reviewer #1:

      In general, the writing style is not clear, and there are some word mistakes or poor descriptions of the results, for example: 

      On page 14: "SuperPol accumulation is decreased (compared to Pol I)". 

      On page 16: "Compared to WT Pol I, the cumulative distribution of SuperPol is indeed shifted on the right of the graph." 

      We clarified and improved the overall writing style in response to the reviewer’s comment.

      There are also issues with the literature, for example: Turowski et al, 2020a and Turowski et al, 2020b are the same article (preprint and peer-reviewed). Is there any reason to include both references? Please, double-check the references.  

      This was corrected in this version of the manuscript.

      In the manuscript, 5S rRNA is mentioned as an internal control for TMA normalisation. Why are Figure 1C data normalised to 18S rRNA instead of 5S rRNA? 

      Data are effectively normalized relative to the 5S rRNA, but the value for the 18S rRNA is arbitrarily set to 100%.

      Figure 4 should be a supplementary figure, and Figure 7D doesn't have a y-axis labelling. 

      The presence of all Pol I specific subunits (Rpa12, Rpa34 and Rpa49) is crucial for the enzymatic activity we performed. In the absence of these subunits (which can vary depending on the purification batch), Pol I pausing, cleavage and elongation are known to be affected. To strengthen our conclusion, we really wanted to show the subunit composition of the purified enzyme. This important control should be shown, but can indeed be shown in a supplementary figure if desired.

      The y-axis in Figure 7D is now correctly labelled.

      In Figure 7C, BMH-21 treatment causes the accumulation of ~140bp rRNA transcripts only in SuperPol-expressing cells that are Rrp6-sensitive (line 6 vs line 8), suggesting that BMH-21 treatment does affect SuperPol. Could the author comment on the interpretation of this result? 

      The 140 nt product is a degradation fragment resulting from trimming, which explains its lower accumulation in the absence of Rrp6. BMH-21 significantly affects WT Pol I transcription but also has a mild effect on SuperPol transcription. As a result, the 140 nt product accumulates under these conditions.

      Reviewer #2:

      pp. 14-15: The authors note local differences in peak detection in the 5'-ETS among replicates, preventing a nucleotide-resolution analysis of pausing sites. Still, they report consistent global differences between wild-type and SuperPol CRAC signals in the 5'ETS (and other regions of the rDNA). These global differences are clear in the quantification shown in Figures 2B-C. A simpler statement might be less confusing, avoiding references to a "first and second set of replicates" 

      As suggested by the reviewer, the statement has been simplified in this version of the manuscript.

      Figures 2A and 2C: Based on these data and quantification, it appears that SuperPol signals in the body and 3' end of the rDNA unit are higher than those in the wild type. This finding supports the conclusion that reduced pausing (and termination) in the 5'ETS leads to an increased Pol I signal downstream. Since the average increase in the SuperPol signal is distributed over a larger region, this might also explain why even a relatively modest decrease in 5'ETS pausing results in higher rRNA production. This point merits discussion by the authors. 

      We agree that this is a very important discussion of our results. Transcription is a very dynamic process in which paused polymerase is easily detected using the CRAC assay. Elongated polymerases are distributed over a much larger gene body, and even a small amount of polymerase detected in the gene body can represent a very large rRNA synthesis. This point is of paramount importance and, as suggested by the reviewer, is now discussed in detail.

      A decreased efficiency of cleavage upon backtracking might imply an increased error rate in SuperPol compared to the wild-type enzyme. Have the authors observed any evidence supporting this possibility? 

      The reviewer suggested that a decreased efficiency of cleavage upon backtracking might imply an increased error rate in SuperPol compared to the wild-type enzyme. We thank Reviewer #2 for pointing this out; in our opinion, this is an important point that should be added to the manuscript. We have now included new data (panels 5G, 5H and 5I) in the manuscript showing that SuperPol in vitro exhibits an increased error rate compared to the WT enzyme. From these results obtained in vitro, we concluded that SuperPol shows reduced nascent transcript cleavage, associated with more efficient transcript elongation, but to the detriment of transcriptional fidelity.

      pp. 15 and 22: Premature transcription termination as a regulator of gene expression is well-documented in yeast, with significant contributions from the Corden, Brow, Libri, and Tollervey labs. These studies should be referenced along with relevant bacterial and mammalian research. 

      As suggested by the reviewer, we now reference these studies.

      p. 23: "SuperPol and Rpa190-KR have a synergistic effect on BMH-21 resistance." A citation should be added for this statement. 

      These are unpublished data from our lab. KR and SuperPol are the only two known mutants resistant to BMH-21. We observed that the resistance conferred by the two alleles is synergistic, with a much higher resistance to BMH-21 in the double mutant than in each single mutant (data not shown). Comparing their resistance mechanisms is an important point, and we can provide these data upon request. This was added to the statement.

      p. 23: "The released of the premature transcript" - this phrase contains a typo 

      This is now corrected.

      Reviewer #3:

      Figure 1B: it would be opportune to separate the technique's schematic representation from the actual data. Concerning the data, would the authors consider adding an experiment with rrp6D cells? Some RNAs could be degraded even in such short period of time, as even stated by the authors, so maybe an exosome depleted background could provide a more complete picture. Could also the authors explain why the increase is only observed at the level of 18S and 25S? To further prove the robustness of the Pol I TMA method could be good to add already characterized mutations or other drugs to show that the technique can readily detect also well-known and expected changes. 

      The precise objective of this experiment is to avoid the use of the Rrp6 mutant. Under these conditions, we prevent the accumulation of transcripts that would result from a maturation defect. While it is possible to conduct the experiment with the Rrp6 mutant, it would be impossible to draw reliable conclusions due to this artificial accumulation of transcripts.

      Figure 1C: the NTS1 probe signal is missing (it is referenced in Figure 1A but not listed in the Methods section or the oligo table). If this probe was unused, please correct Figure 1A accordingly. 

      We corrected Figure 1A.  

      Figure 2A: the RNAPI occupancy map by CRAC is hard to interpret. The red color (SuperPol) is stacked on top of the blue line, and we are not able to observe the signal of the WT for most of the position along the rDNA unit. It would be preferable to use some kind of opacity that allows to visualize both curves. Moreover, the analysis of the behavior of the polymerase is always restricted to the 5'ETS region in the rest of the manuscript. We are thus not able to observe whether termination events also occur in other regions of the rDNA unit. A Northern blot analysis displaying higher sizes would provide a more complete picture. 

      We addressed this point to make the figure more visually informative. In Northern Blot analysis, we use a TSS (Transcription Start Site) probe, which detects only transcripts containing the 5' extremity. Due to co-transcriptional processing, most of the rRNA undergoing transcription lacks its 5' extremity and is not detectable using this technique. We have the data, but it does not show any difference between Pol I and SuperPol. This information could be included in the supplementary data if asked.

      "Importantly, despite some local variations, we could reproducibly observe an increased occupancy of WT Pol I in 5'-ETS compared to SuperPol (Figure 1C)." should be Figure 2C. 

      Thanks for pointing out this mistake. It has been corrected.

      Figure 3D: most of the difference in the cumulative proportion of CRAC reads is observed in the region ~750 to 3000. In line with my previous point, I think it would be worth exploring also termination events beyond the 5'-ETS region. 

      We agree that such an analysis would have been interesting. However, with the exception of the pre-rRNA starting at the transcription start site (TSS) studied here, any cleaved rRNA at its 5' end could result from premature termination and/or abnormal processing events. Exploring the production of other abnormal rRNAs produced by premature termination is a project in itself, beyond this initial work aimed at demonstrating the existence of premature termination events in ribosomal RNA production.

      Figure 4: should probably be provided as supplementary material. 

      As mentioned earlier (see comments), the presence of all Pol I specific subunits (Rpa12, Rpa34 and Rpa49) is crucial for the enzymatic activity we performed. This important control should be shown, but can indeed be shown in a supplementary figure if desired.

      "While the growth of cells expressing SuperPol appeared unaffected, the fitness of WT cells was severely reduced under the same conditions." I think the growth of cells expressing SuperPol is slightly affected. 

      We agree with this comment and we modified the text accordingly.

      Figure 7D: the legend of the y-axis is missing as well as the title of the plot. 

      Legend of the y-axis and title of the plot are now present.

      The statements concerning BMH-21, SuperPol and Rpa190-KR in the Discussion section should be removed, or data should be provided.

      This was discussed previously. See comment above.

      Some references are missing from the Bibliography, for example Merkl et al., 2020; Pilsl et al., 2016a, 2016b. 

      The bibliography is now fixed.

      Description of analyses that authors prefer not to carry out:

      Does the SuperPol mutant produce more functional rRNAs?

      As Reviewer 1 requested, we agree that this point requires clarification. In cells expressing SuperPol, a higher steady-state level of (pre-)rRNAs is only observed in the absence of the degradation machinery, suggesting that overproduced rRNAs are rapidly eliminated. We know that (pre-)rRNAs are unable to accumulate in the absence of ribosomal proteins and/or Assembly Factors (AF). Consequently, overproducing rRNAs would not be sufficient to increase ribosome content. This specific point is being further addressed in our lab but is beyond the scope of this article.

      Is premature termination coupled with rRNA processing?

      We appreciate the reviewer’s insightful comments. The suggested experiments regarding the UTP-A complex's regulatory potential are valuable and ongoing in our lab, but they extend beyond the scope of this study and are not suitable for inclusion in the current manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      Liu et al., present glmSMA, a network-regularized linear model that integrates single-cell RNA-seq data with spatial transcriptomics, enabling high-resolution mapping of cellular locations across diverse datasets. Its dual regularization framework (L1 for sparsity and generalized L2 via a graph Laplacian for spatial smoothness) demonstrates robust performance of their model and offers novel tools for spatial biology, despite some gaps in fully addressing spatial communication.

      Overall, the manuscript is commendable for its comprehensive benchmarking across different spatial omics platforms and its novel application of regularized linear models for cell mapping. I think this manuscript can be improved by addressing method assumptions, expanding the discussion on feature dependence and cell type-specific biases, and clarifying the mechanism of spatial communication.

      The conclusions of this paper are mostly well supported by data, but some aspects of model development and performance evaluation need to be clarified and extended.

      We are thankful for the positive comments and have made changes following the reviewer's advice, as detailed below.

      (1) What were the assumptions made behind the model? One of them could be the linear relationship between cellular gene expression and spatial location. In complex biological tissues, non-linear relationships could be present, and this would also vary across organ systems and species. Similarly, with regularization parameters, they can be tuned to balance sparsity and smoothness adequately but may not hold uniformly across different tissue types or data quality levels. The model also seems to assume independent errors with normal distribution and linear additive effects - a simplification that may overlook overdispersion or heteroscedasticity commonly observed in RNA-seq data.

      Thank you for this comment. We acknowledge that the non-linear relationships can be present in complex tissues and may not be fully captured by a linear model. 

      Our choice of a linear model was guided by an investigation of the relationship in the current datasets, which include intestinal villus, mouse brain, and fly embryo. There is a linear correlation between expression distance and physical distance [Nitzan et al]. Within a given anatomical structure, cells in closer proximity exhibit more similar expression patterns (Fig. 3c). In tissues where non-linear relationships are more prevalent, such as the human PDAC sample, our mapping results remain robust. We acknowledge that we have not yet tested our algorithm in highly heterogeneous regions like the liver, and we plan to include such analyses in future work if necessary.

      Regarding the regularization parameters, we agree that the balance between sparsity and smoothness is sensitive to tissue-specific variation and data quality. In our current implementation, we explored a range of values to find robust defaults. Supplementary Figure 7 illustrates the regularization path for cell assignment in the fly embryo.  

      The choice of L1 and L2 regularization parameters is crucial for balancing sparsity and smoothness in spatial mapping. 

      For Structured Tissues (brain):

      Moderate L1 to ensure cells are localized.

      Small to moderate L2 to maintain local smoothness without blurring distinct regions.

      For Less Structured (PDAC):

      Slightly lower L1 to allow cells to be associated with multiple regions if boundaries are ambiguous.

      Higher L2 to stabilize mappings in noisy or mixed regions.
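
      The trade-off between the two penalties can be illustrated with a small sketch of a network-regularized least-squares objective. This is an illustration under assumptions, not the glmSMA implementation: the function name `penalized_objective`, the squared-error loss, and the parameter names `lam1` and `lam2` are hypothetical. Only the L1 penalty and the generalized L2 penalty through a graph Laplacian are taken from the description above.

```python
def penalized_objective(X, y, beta, laplacian, lam1, lam2):
    """Squared error + lam1 * ||beta||_1 + lam2 * beta^T L beta.

    X: rows are genes, columns are spots (hypothetical layout).
    y: expression vector of one cell over the same genes.
    beta: candidate weights of the cell over spots.
    laplacian: Laplacian of the spot neighbourhood graph.
    """
    n_spots = len(beta)
    # Residual sum of squares between observed and reconstructed expression.
    rss = 0.0
    for row, target in zip(X, y):
        pred = sum(row[j] * beta[j] for j in range(n_spots))
        rss += (target - pred) ** 2
    # L1 term: encourages a cell to load on few spots (sparsity).
    l1 = sum(abs(b) for b in beta)
    # Laplacian term: penalizes weight differences between neighbouring spots.
    smooth = sum(
        beta[i] * laplacian[i][j] * beta[j]
        for i in range(n_spots)
        for j in range(n_spots)
    )
    return rss + lam1 * l1 + lam2 * smooth


# Three spots on a chain 0 - 1 - 2; Laplacian = degree matrix - adjacency.
L = [[1, -1, 0], [-1, 2, -1], [0, -1, 1]]
X = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
y = [1.0, 0.0]
concentrated = [1.0, 0.0, 0.0]  # all weight on one spot
spread = [0.5, 0.5, 0.0]        # weight shared by two neighbouring spots

for lam2 in (0.1, 1.0):
    a = penalized_objective(X, y, concentrated, L, 0.1, lam2)
    b = penalized_objective(X, y, spread, L, 0.1, lam2)
    print(lam2, "concentrated" if a < b else "spread")
```

      In this toy setting the concentrated assignment has the lower objective when the Laplacian weight is small, and the spread assignment wins once that weight is raised, which mirrors the guidance that a higher L2 penalty stabilizes mappings in noisy or mixed regions.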

      (2) The performance of glmSMA is likely sensitive to the number and quality of features used. With too few features, the model may struggle to anchor cells correctly due to insufficient discriminatory power, whereas too many features could lead to overfitting unless appropriately regularized. The manuscript briefly acknowledges this issue, but further systematic evaluation of how varying feature numbers affect mapping accuracy would strengthen the claims, particularly in settings where marker gene availability is limited. A simple way to show some of this would be testing on multiple spatial omics (imaging-based) platforms with varying panel sizes and organ systems. Related to this, based on the figures, it also seems like the performance varies by cell type. What are the factors that contribute to this? Variability in expression levels, RNA quantity/quality? Biases in the panel? Personally, I am also curious how this model can be used similarly/differently if we have a FISH-based, high-plex reference atlas. Additional explanation around these points would be helpful for the readers.

      Thank you for this thoughtful comment. The performance of our method is indeed sensitive to the number and quality of selected features. To optimize feature selection, we employed multiple strategies, including Moran’s I statistic, identification of highly variable genes, and the Seurat pipeline to detect anchor genes linking the spatial transcriptomics data with the reference atlas. The number of selected markers depends on the quality of the data. For high-quality datasets, fewer than 100 markers are typically sufficient for prediction. To select marker genes, we applied the following optional strategies:

      (1) Identifying highly variable genes (HVGs).

      (2) Calculating Moran’s I scores for all genes to assess spatial autocorrelation.

      (3) Generating anchor genes based on the integration of the reference atlas and scRNA-seq data using Seurat.
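
      As an illustration of strategy (2), Moran’s I for a single gene can be computed from its expression values and a spot neighbourhood matrix. The sketch below uses the standard definition of the statistic; the function name `morans_i`, the binary chain-shaped weight matrix, and the toy expression values are assumptions made for the example and are not taken from the authors’ pipeline.

```python
def morans_i(values, weights):
    """Moran's I for one gene.

    values: expression of the gene at each spot.
    weights: square matrix, weights[i][j] > 0 when spots i and j are neighbours.
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    total_weight = sum(sum(row) for row in weights)
    denom = sum(d * d for d in dev)
    if total_weight == 0 or denom == 0:
        raise ValueError("Moran's I is undefined for constant values or an empty graph")
    # Weighted cross-product of deviations over all neighbouring pairs.
    cross = sum(
        weights[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n)
    )
    return (n / total_weight) * (cross / denom)


# Four spots on a line, each adjacent to its immediate neighbours.
chain = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
clustered = morans_i([1.0, 1.0, 0.0, 0.0], chain)    # spatially coherent gene
alternating = morans_i([1.0, 0.0, 1.0, 0.0], chain)  # checkerboard-like gene
print(clustered, alternating)
```

      Genes with a positive score vary smoothly across neighbouring spots and are therefore plausible spatial markers, whereas a negative score indicates that neighbouring spots tend to differ.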

      We evaluated our method across diverse tissue types and platforms—including Slide-seq, 10x Visium, and Virtual-FISH—which represent both sequencing-based and imaging-based spatial transcriptomics technologies. Our model consistently achieved strong performance across these settings. It's worth noting that the performance of other methods, such as CellTrek [Wei et al] and novoSpaRc [Nitzan et al], also depends heavily on feature selection. In particular, performance degrades substantially when fewer features are used. For fair comparison across different methods, the same set of marker genes was used. Under this condition, our method outperformed the others based on KL divergence (Fig. 2b, Fig. 5g). 
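
      For readers unfamiliar with the metric, the following sketch shows how a KL divergence between two discrete distributions over spatial regions can be computed. It is a generic definition, not the authors’ evaluation code: the function name `kl_divergence`, the direction of the divergence (reference relative to prediction), and the toy distributions are assumptions.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions over the same regions."""
    if len(p) != len(q):
        raise ValueError("p and q must have the same length")
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0:
            continue  # the term 0 * log(0 / q) is taken as 0
        if qi == 0:
            return math.inf  # q assigns no mass where p does
        total += pi * math.log(pi / qi)
    return total


# Hypothetical share of one cell type across three regions.
reference = [0.5, 0.5, 0.0]
good_mapping = [0.5, 0.5, 0.0]
poor_mapping = [0.25, 0.25, 0.5]
print(kl_divergence(reference, good_mapping))
print(kl_divergence(reference, poor_mapping))
```

      A lower value means the predicted distribution is closer to the reference. With few cells per type, moving a single cell changes the predicted proportions noticeably, so the metric is less stable for rare populations.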

      To assess the effect of marker gene quantity, we randomly selected subsets of 2,000, 1,500, 1,000, 700, 500, and 200 markers from the original set. As the number of markers decreases, mapping performance declines, which is expected due to the reduction in available spatial information. This result underscores the general dependence of spatial mapping accuracy on both the number and quality of informative marker genes (Supplementary Fig. 10).

      We do not believe that the observed performance is directly influenced by cell type composition. Major cell types are typically well-defined, and rare cell types comprise only a small fraction of the dataset. For these rare populations, a single misclassification can disproportionately impact metrics like KL divergence due to small sample size. However, this does not necessarily indicate a systematic cell type–specific bias in the mapping. We incorporated a high-resolution Slide-seq dataset from the mouse hippocampus to evaluate the influence of cell type composition on the algorithm’s performance [Stickels et al., 2020]. Most cell types within the CA1, CA2, CA3, and DG regions were accurately mapped to their original anatomical locations (Fig. 5e, f, g).

      (3) Application 3 (spatial communication) in the graphical abstract appears relatively underdeveloped. While it is clear that the model infers spatial proximities, further explanation of how these mappings translate into insights into cell-cell communication networks would enhance the biological relevance of the findings.

      Thank you for this valuable feedback. We agree that further elaboration on the connection between spatial proximity and cell–cell communication would enhance the biological interpretation of our results. Our current model focuses on inferring spatial relationships; we may add cell–cell communication analyses in future work.

      (4) What is the final resolution of the model outputs? I am assuming this is dictated by the granularity of the reference atlas and the imposed sparsity via the L1 norm, but if there are clear examples that would be good. In figures (or maybe in practice too), cells seem to be assigned to small, contiguous patches rather than pinpoint single-cell locations, which is a pragmatic compromise given the inherent limitations of current spatial transcriptomics technologies. Clarification on the precise spatial scale (e.g., pixel or micrometer resolution) and any post-mapping refinement steps would be beneficial for the users to make informed decisions on the right bioinformatic tools to use.

      Thank you for the comment. For each cell, our algorithm generates a probability vector that indicates its likely spatial assignment along with coordinate information. In our framework, each cell is mapped to one or more spatial spots with associated probabilities. Depending on the amount of regularization through L1 and L2 norms, a cell may be localized to a small patch or distributed over a broader domain (Supplementary Fig. 5 & 7). For the 10x Visium data, we applied a repelling algorithm to enhance visualization [Wei et al]. If a cell’s original location is already occupied, it is reassigned to a nearby neighborhood to avoid overlap. The users can also see the entire regularization path by varying the penalty terms. 

      Nitzan M, Karaiskos N, Friedman N, Rajewsky N. Gene expression cartography. Nature. 2019;576(7785):132-137. doi:10.1038/s41586-019-1773-3

      Wei, R. et al. (2022) ‘Spatial charting of single-cell transcriptomes in tissues’, Nature Biotechnology, 40(8), pp. 1190–1199. doi:10.1038/s41587-022-01233-1.

      Stickels, R.R. et al. (2020) ‘Highly sensitive spatial transcriptomics at near-cellular resolution with Slide-SEQV2’, Nature Biotechnology, 39(3), pp. 313–319. doi:10.1038/s41587-020-0739-1. 

      Reviewer #2 (Public review):

      Summary:

      The author proposes a novel method for mapping single-cell data to specific locations with higher resolution than several existing tools.

      Strengths:

      The spatial mapping tests were conducted on various tissues, including the mouse cortex, human PDAC, and intestinal villus.

      Weakness:

      (1) Although the researchers claim that glmSMA seamlessly accommodates both sequencing-based and image-based spatial transcriptomics (ST) data, their testing primarily focused on sequencing-based ST data, such as Visium and Slide-seq. To demonstrate its versatility for spatial analysis, the authors should extend their evaluation to imaging-based spatial data.

      Thank you for the comment. We have tested our algorithm on the virtual FISH dataset from the fly embryo, which serves as an example of image-based spatial omics data (Fig. 4c). However, such datasets often contain a limited number of available genes. To address this, we will conduct additional testing on image-based data if needed. The Allen Brain Atlas provides high-quality ISH data, and we can select specific brain regions from this resource to further evaluate our algorithm if necessary [Lein et al]. Currently, we plan to focus more on the 10x Visium platform, as it supports whole-transcriptome profiling and offers a wide range of tissue samples for analysis.

      (2) The definition of "ground truth" for spatial distribution is unclear. A more detailed explanation is needed on how the "ground truth" was established for each spatial dataset and how it was utilized for comparison with the predicted distribution generated by various spatial mapping tools.

      Thank you for the comment. To clarify how ground truth is defined across different tissues, we provide the following details. Direct ground truth for cell locations is often unavailable in scRNA-seq data due to experimental constraints. To address this, we adopted alternative strategies for estimating ground truth in each dataset:

      10x Visium Data: We used the cell type distribution derived from spatial transcriptomics (ST) data as a proxy for ground truth. We then computed the KL divergence between this distribution and our model's predictions for performance assessment.

      Slide-seq Data: We validated predictions by comparing the expression of marker genes between the reconstructed and original spatial data.

      Fly Embryo Data: We used predicted cell locations from novoSpaRc as a reference for evaluating our algorithm.

      These strategies allowed us to evaluate model performance even in the absence of direct cell location data. In addition, we can apply multiple evaluation strategies within a single dataset.
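      A minimal sketch of the KL-divergence comparison described for the 10x Visium data is given below. It is not the authors' implementation: the direction KL(reference || predicted), the additive smoothing constant, and the example counts are all assumptions made for illustration.

```python
import math

def kl_divergence(ref_counts, pred_counts, eps=1e-9):
    """KL(ref || pred) between two cell-type count dictionaries.

    Counts are normalized to proportions. A small eps is added to every
    proportion (assumed smoothing choice) so that a cell type missing
    from one side does not produce log(0) or division by zero.
    """
    cell_types = sorted(set(ref_counts) | set(pred_counts))
    n = len(cell_types)
    ref_total = sum(ref_counts.values())
    pred_total = sum(pred_counts.values())
    kl = 0.0
    for ct in cell_types:
        # Smoothed proportions; each side still sums to 1.
        p = (ref_counts.get(ct, 0) / ref_total + eps) / (1 + eps * n)
        q = (pred_counts.get(ct, 0) / pred_total + eps) / (1 + eps * n)
        kl += p * math.log(p / q)
    return kl

# Hypothetical cell-type counts within one spatial region.
reference = {"L2/3": 40, "L4": 30, "L5": 25, "rare": 5}
predicted = {"L2/3": 38, "L4": 33, "L5": 27, "rare": 2}
score = kl_divergence(reference, predicted)
```

      Lower values indicate closer agreement between the predicted and reference distributions, and identical distributions give zero.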

      (3) In the analysis of spatial mapping results using intestinal villus tissue, only Figure 3d supports their findings. The researchers should consider adding supplemental figures illustrating the spatial distribution of single cells in comparison to the ground truth distribution to enhance the clarity and robustness of their investigation.

      Thank you for the comment. In the intestinal dataset, only six large domains were defined. As a result, the task for this dataset is relatively simple—each cell only needs to be assigned to one of the six domains. As the intestinal villus is a relatively simple tissue, most existing algorithms performed well on it. For this reason, we did not initially provide extensive details in the main text.

      (4) The spatial mapping tests were conducted on various tissues, including the mouse cortex, human PDAC, and intestinal villus. However, the original anatomical regions are not displayed, making it difficult to directly compare them with the predicted mapping results. Providing ground truth distributions for each tested tissue would enhance clarity and facilitate interpretation. For instance, in Figure 2a and  Supplementary Figures 1 and 2, only the predicted mapping results are shown without the corresponding original spatial distribution of regions in the mouse cortex. Additionally, in Figure 3c, four anatomical regions are displayed, but it is unclear whether the figure represents the original spatial regions or those predicted by glmSMA. The authors are encouraged to clarify this by incorporating ground truth distributions for each tissue.

      Thank you for the comment. To improve visualization, we have included anatomical structures alongside the mapping results in the revised version wherever such structures are available (e.g., mouse brain cortex and human PDAC samples). Major cell type assignments for the PDAC samples, along with anatomical structures, are shown in Supplementary Figure 9. Most of these cell types were correctly mapped to their corresponding anatomical regions.

      (5) The cell assignment results from the mouse hippocampus (Supplementary Figure 6) lack a corresponding ground truth distribution for comparison. DG and CA cells were evaluated solely based on the gene expression of specific marker genes. Additional analyses are needed to further validate the robustness of glmSMA's mapping performance on Slide-seq data from the mouse hippocampus.

      Thank you for the comment. The ground truth for DG and CA cells was not available. To better evaluate the model's performance, we computed the KL divergence between the original and predicted cell type distributions, following the same approach used for the 10x Visium dataset. We identified a higher-quality dataset for the mouse hippocampus and used it to evaluate our algorithm. Additionally, we employed KL divergence as an alternative strategy to validate and benchmark our results (Fig. 5e, f, g). Most CA cells, including CA1, CA2, and CA3 principal cells, were correctly assigned back to the CA region. Dentate principal cells were accurately mapped to the DG region (Fig. 5e, f).

      (6) The tested spatial datasets primarily consist of highly structured tissues with well-defined anatomical regions, such as the brain and intestinal villus. In other tissues, such as the liver, anatomical regions are not distinctly separated. Further evaluation of such tissues would help determine the method's broader applicability.

      Thank you for the insightful comment. We agree that many spatial datasets used in our study are from tissues with well-defined anatomical regions. To address the applicability of glmSMA in tissues without clearly separated anatomical structures, we applied glmSMA to the Drosophila embryo, which represents a tissue with relatively continuous spatial patterns and lacks well-demarcated anatomical boundaries compared to organs like the brain or intestinal villus.

      Despite this less structured spatial organization, glmSMA demonstrated robust performance in the fly embryo, accurately mapping cells to their correct spatial spots based on gene expression profiles. This result indicates that glmSMA is not strictly limited to highly structured tissues and can generalize to tissues with more continuous or gradient-like spatial architectures. These results suggest that glmSMA has broader applicability beyond highly compartmentalized tissues.

      Lein, E., Hawrylycz, M., Ao, N. et al. Genome-wide atlas of gene expression in the adult mouse brain. Nature 445, 168–176 (2007). https://doi.org/10.1038/nature05453

      Reviewer #3 (Public review):

      The authors aim to develop glmSMA, a network-regularized linear model that accurately infers spatial gene expression patterns by integrating single-cell RNA sequencing data with spatial transcriptomics reference atlases. Their goal is to reconstruct the spatial organization of individual cells within tissues, overcoming the limitations of existing methods that either lack spatial resolution or sensitivity.

      Strengths:

      (1) Comprehensive Benchmarking:

      Compared against CellTrek and Novosparc, glmSMA consistently achieved lower Kullback-Leibler divergence (KL divergence) scores, indicating better cell assignment accuracy.

      Outperformed CellTrek in mouse cortex mapping (90% accuracy vs. CellTrek's 60%) and provided more spatially coherent distributions.

      (2) Experimental Validation with Multiple Real-World Datasets:

      The study used multiple biological systems (mouse brain, Drosophila embryo, human PDAC, intestinal villus) to demonstrate generalizability.

      Validation through correlation analyses, Pearson's coefficient, and KL divergence support the accuracy of glmSMA's predictions.

      We thank reviewer #3 for their positive feedback and thoughtful recommendations.

      Weaknesses:

      (1) The accuracy of glmSMA depends on the selection of marker genes, which might be limited by current FISH-based reference atlases.

      We agree that the accuracy of glmSMA is influenced by the selection of marker genes, and that current FISH-based reference atlases may offer a limited gene set. To address this, we incorporate multiple feature selection strategies, including highly variable genes and spatially informative genes (e.g., via Moran’s I), to optimize performance within the available gene space. As more comprehensive reference atlases become available, we expect the model’s accuracy to improve further.
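      For readers unfamiliar with Moran's I, the sketch below shows how the statistic can score one gene for spatial autocorrelation. It is illustrative only: the binary weights within a fixed radius, the toy grid, and the function name are assumptions, and the response does not describe how the authors built their spatial weights.

```python
def morans_i(values, coords, radius=1.0):
    """Moran's I for one gene measured at a set of 2D spots.

    Assumed weighting: w_ij = 1 if spots i and j are distinct and lie
    within `radius` of each other, otherwise 0.
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    denom = sum(d * d for d in dev)
    if denom == 0:
        return 0.0  # constant expression carries no spatial signal
    num = 0.0
    w_sum = 0.0
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            dx = coords[i][0] - coords[j][0]
            dy = coords[i][1] - coords[j][1]
            if dx * dx + dy * dy <= radius * radius:
                num += dev[i] * dev[j]
                w_sum += 1.0
    if w_sum == 0:
        return 0.0  # no neighboring spots within the radius
    return (n / w_sum) * num / denom

# Toy 4x4 grid of spots with two synthetic expression patterns.
grid = [(x, y) for x in range(4) for y in range(4)]
gradient = [float(x) for x, y in grid]            # smooth left-to-right trend
checkerboard = [float((x + y) % 2) for x, y in grid]  # alternating pattern
```

      Genes with a smooth spatial trend score positive, while an alternating pattern scores negative, so ranking genes by this value is one way to shortlist spatially informative features.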

      (2) glmSMA operates under the assumption that cells with similar gene expression profiles are likely to be physically close to each other in space, which may not be true in various heterogeneous environments.

      Thank you for raising this important point. We agree that glmSMA operates under the assumption that cells with similar gene expression profiles tend to be spatially proximal, and this assumption may not strictly hold in highly heterogeneous tissues where spatial organization is less coupled to transcriptional similarity.

      To address this concern, we specifically tested glmSMA on human PDAC samples, which represent moderately heterogeneous environments characterized by complex tumor microenvironments, including a mixture of ductal cells, cancer cells, stromal cells, and other components. Despite this heterogeneity, glmSMA successfully mapped major cell types to their expected anatomical regions, demonstrating that the method is robust even in the presence of substantial cellular diversity and spatial complexity.

      This result suggests that while glmSMA relies on the assumption of spatial-transcriptomic correlation, the method can tolerate a reasonable degree of spatial heterogeneity without a significant loss of performance. Nevertheless, we acknowledge that in extremely disorganized or highly mixed tissues where transcriptional similarity is decoupled from spatial proximity, the performance may be affected.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      This study provides a comprehensive single-cell and multiomic characterization of trabecular meshwork (TM) cells in the mouse eye, a structure critical to intraocular pressure (IOP) regulation and glaucoma pathogenesis. Using scRNA-seq, snATAC-seq, immunofluorescence, and in situ hybridization, the authors identify three transcriptionally and spatially distinct TM cell subtypes. The study further demonstrates that mitochondrial dysfunction, specifically in one subtype (TM3), contributes to elevated IOP in a genetic mouse model of glaucoma carrying a mutation in the transcription factor Lmx1b. Importantly, treatment with nicotinamide (vitamin B3), known to support mitochondrial health, prevents IOP elevation in this model. The authors also link their findings to human datasets, suggesting the existence of analogous TM3-like cells with potential relevance to human glaucoma.

      Strengths:

      The study is methodologically rigorous, integrating single-cell transcriptomic and chromatin accessibility profiling with spatial validation and in vivo functional testing. The identification of TM subtypes is consistent across mouse strains and institutions, providing robust evidence of conserved TM cell heterogeneity. The use of a glaucoma model to show subtype-specific vulnerability, combined with a therapeutic intervention, gives the study strong mechanistic and translational significance. The inclusion of chromatin accessibility data adds further depth by implicating active transcription factors such as LMX1B, a gene known to be associated with glaucoma risk. The integration with human single-cell datasets enhances the potential relevance of the findings to human disease.

      We thank the reviewers for their thorough reading of our manuscript and helpful comments.

      Weaknesses:

      (1) Although the LMX1B transcription factor is implicated as a key regulator in TM3 cells, its role in directly controlling mitochondrial gene expression is not fully explored. Additional analysis of motif accessibility or binding enrichment near relevant target genes could substantiate this mechanistic link. 

      We show that the Lmx1b mutation induces mitochondrial dysfunction with mitochondrial gene expression changes but agree with the referee in that we do not show direct regulation of mitochondrial genes by LMX1B. Emerging data suggest that LMX1B regulates the expression of mitochondrial genes in other cell types [1, 2] making the direct link reasonable. Future work that is beyond the scope of the current paper will focus on sequencing cells at earlier timepoints to help distinguish gene expression changes associated with the V265D mutation from those secondary to ongoing disease and elevated IOP. Additional studies, including ATAC seq at more ages, ChIP-seq and/or Cut and Run/Tag (in TM cells) will be necessary to directly investigate LMX1B target genes.

      As we studied adult mice, mitochondrial gene expression changes could be secondary to other disease-induced stresses. Because we did not intend to say we have shown a direct link, we have now added a sentence to the discussion to ensure clarity.

      Lines 932-934: “Although our studies show a clear effect of the Lmx1b mutation on mitochondria, future studies are needed to determine if LMX1B directly modulates mitochondrial genes in V265D mutant TM cells”

      (2) The therapeutic effect of vitamin B3 is clearly demonstrated phenotypically, but the underlying cellular and molecular mechanisms remain somewhat underdeveloped - for instance, changes in mitochondrial function, oxidative stress markers, or NAD+ levels are not directly measured. 

      We agree that further experiments towards a fuller mechanistic understanding of vitamin B3’s therapeutic effects are needed. Such experiments are planned but are beyond the scope of this paper, which is already very large (7 Figures and 16 Supplemental Figures).

      (3) While the human relevance of TM3 cells is suggested through marker overlap, more quantitative approaches, such as cell identity mapping or gene signature scoring in human datasets, would strengthen the translational connection.

      We appreciate the reviewer’s suggestion and agree that additional quantitative analyses will further strengthen the translational relevance of TM3 cells. It is not yet clear if humans have a direct TM3 counterpart or if TM cell roles are compartmentalized differently between human cell types. We are currently limited in our ability to perform these comparative analyses. Specifically, we were unable to obtain permission to use the underlying dataset from Patel et al., and our access to the Van Zyl et al. dataset was through the Single Cell Portal, which does not support more complex analyses (e.g., cell identity mapping or gene signature scoring). Differences between human studies themselves also affect these comparisons. Future work aimed at resolving differences and standardizing human TM cell annotations, as well as cross-species comparisons, is needed (working groups exist, and this ongoing effort supports 3 human TM cell subtypes, as also reported by Van Zyl). This is beyond what we are currently able to do for this paper. We present a comprehensive assessment using readily available published resources.

      Reviewer #2 (Public review):

      Summary:

      This elegant study by Tolman and colleagues provides fundamental findings that substantially advance our knowledge of the major cell types within the limbus of the mouse eye, focusing on the aqueous humor outflow pathway. The authors used single-cell and single-nuclei RNAseq to very clearly identify 3 subtypes of the trabecular meshwork (TM) cells in the mouse eye, with each subtype having unique markers and proposed functions. The U. Columbia results are strengthened by an independent replication in a different mouse strain at a separate laboratory (Duke). Bioinformatics analyses of these expression data were used to identify cellular compartments, molecular functions, and biological processes. Although there were some common pathways among the 3 subtypes of TM cells (e.g., ECM metabolism), there also were distinct functions. For example:

      TM1 cell expression supports heavy engagement in ECM metabolism and structure, as well as TGFb2 signaling.

      TM2 cells were enriched in laminin and pathways involved in phagocytosis, lysosomal function, and antigen expression, as well as End3/VEGF/angiopoietin signaling.

      TM3 cells were enriched in actin binding and mitochondrial metabolism.

      They used high-resolution immunostaining and in situ hybridization to show that these 3 TM subtypes express distinct markers and occupy distinct locations within the TM tissue. The authors compared their expression data with other published scRNAseq studies of the mouse as well as the human aqueous outflow pathway. They used ATAC-seq to map open chromatin regions in order to predict transcription factor binding sites. Their results were also evaluated in the context of human IOP and glaucoma risk alleles from published GWAS data, with interesting and meaningful correlations. Although not discussed in their manuscript, their expression data support other signaling pathways/ proteins/ genes that have been implicated in glaucoma, including: TGFb2, BMP signaling (including involvement of ID proteins), MYOC, actin cytoskeleton (CLANs), WNT signaling, etc.

      In addition to these very impressive data, the authors used scRNAseq to examine changes in TM cell gene expression in the mouse glaucoma model of mutant Lmx1b-induced ocular hypertension. In man, LMX1B is associated with Nail-Patella syndrome, which can include the development of glaucoma, demonstrating the clinical relevance of this mouse model. Among the gene expression changes detected, TM3 cells had altered expression of genes associated with mitochondrial metabolism. The authors used their previous experience using nicotinamide to metabolically protect DBA2/J mice from glaucomatous damage, and they hypothesized that nicotinamide supplementation of mutant Lmx1b mice would help restore normal mitochondrial metabolism in the TM and prevent Lmx1b-mediated ocular hypertension. Adding nicotinamide to the drinking water significantly prevented Lmx1b mutant mice from developing high intraocular pressure. This is a laudable example of dissecting the molecular pathogenic mechanisms responsible for a disease (glaucoma) and then discovering and testing a potential therapy that directly intervenes in the disease process and thereby protects from the disease.

      Strengths:

      There are numerous strengths in this comprehensive study including:

      Deep scRNA sequencing that was confirmed by an independent dataset in another mouse strain at another university.

      Identification and validation of molecular markers for each mouse TM cell subset along with localization of these subsets within the mouse aqueous outflow pathway.

      Rigorous bioinformatics analysis of these data as well as comparison of the current data with previously published mouse and human scRNAseq data.

      Correlating their current data with GWAS glaucoma and IOP "hits".

      Discovering gene expression changes in the 3 TM subgroups in the mouse mutant Lmx1b model of glaucoma.

      Further pursuing the indication of dysfunctional mitochondrial metabolism in TM3 cells from Lmx1b mutant mice to test the efficacy of dietary supplementation with nicotinamide. The authors nicely demonstrate the disease modifying efficacy of nicotinamide in preventing IOP elevation in these Lmx1b mutant mice, preventing the development of glaucoma. These results have clinical implications for new glaucoma therapies.

      We thank the reviewer for these generous and thoughtful comments on the strengths of this study.

      Weaknesses:

      (1) Occasional over-interpretation of data. The authors have used changes in gene expression (RNAseq) to implicate functions and signaling pathways. For example: they have not directly measured "changes in metabolism", "mitochondrial dysfunction" or "activity of Lmx1b".

      We thank the reviewer for this feedback. We did not intend to overstate and agree. Our gene expression changes support, but do not by themselves prove, metabolic disturbances. We had felt that this was obvious and did not want to clutter the text. We have revised the manuscript to clarify that our conclusions about metabolic changes and LMX1B activity are based on gene expression patterns rather than direct functional assays and have added EM data (see below under “Recommendations for the authors”).

      We have also added the following to the results:

      Lines 715-721: “Although the documented gene expression changes strongly suggest metabolic and mitochondrial dysfunction, they do not directly prove it. Using electron microscopy to directly evaluate mitochondria in the TM, we found a reduction in total mitochondria number per cell in mutants (P = 0.015, Figure 6G). In addition, mitochondria in mutants had increased area and reduced cristae (inner membrane folds) in mutants consistent with mitochondrial swelling and metabolic dysfunction (all P < 0.001 compared to WT, Figure 6G-H).”

      More detailed EM and metabolic studies are underway but are beyond the scope of this paper.

      (2) In their very thorough data set, there is enrichment of or changes in gene expression that support other pathways that have been previously reported to be associated with glaucoma (such as TGFb2, BMP signaling, actin cytoskeletal organization (CLANs), WNT signaling, ossification, etc.). Not discussing these appears to be a lost opportunity to further enhance the significance of this work.

      We appreciate the reviewer’s suggestions for enhancing the relevance of our work. We had not initially discussed these pathways due to length concerns. We have now incorporated some of this information into the manuscript (see below under “Recommendations for the authors”).

      Reviewer #3 (Public review):

      Summary: In this study, the authors perform multimodal single-cell transcriptomic and epigenomic profiling of 9,394 mouse TM cells, identifying three transcriptionally distinct TM subtypes with validated molecular signatures. TM1 cells are enriched for extracellular matrix genes, TM2 for secreted ligands supporting Schlemm's canal, and TM3 for contractile and mitochondrial/metabolic functions. The transcription factor LMX1B, previously linked to glaucoma, shows the highest expression in TM3 cells and appears to regulate mitochondrial pathways. In Lmx1bV265D mutant mice, TM3 cells exhibit transcriptional signs of mitochondrial dysfunction associated with elevated IOP. Notably, vitamin B3 treatment significantly mitigates IOP elevation, suggesting a potential therapeutic avenue.

      This is an excellent and collaborative study involving investigators from two institutions, offering the most detailed single-cell transcriptomic and epigenetic profiling of the mouse limbal tissues, including both TM and Schlemm's canal (SC), from wild-type and Lmx1bV265D mutant mice. The study defines three TM subtypes and characterizes their distinct molecular signatures, associated pathways, and transcriptional regulators. The authors also compare their dataset with previously published murine and human studies, including those by Van Zyl et al., providing valuable cross-species insights.

      Strengths: 

      (1) Comprehensive dataset with high single-cell resolution

      (2) Use of multiple bioinformatic and cross-comparative approaches

      (3) Integration of 3D imaging of TM and SC for anatomical context

      (4) Convincing identification and validation of three TM subtypes using molecular markers.

      We thank the reviewer for their comments on the strengths of this study.

      Weaknesses:

      (1) Insufficient evidence linking mitochondrial dysfunction to TM3 cells in Lmx1bV265D mice: While the identification of TM3 cells as metabolically specialized and Lmx1b-enriched is compelling, the proposed link between Lmx1b mutation and mitochondrial dysfunction remains underdeveloped. It is unclear whether mitochondrial defects are a primary consequence of Lmx1b-mediated transcriptional dysregulation or a secondary response to elevated IOP. Additional evidence is needed to clarify whether Lmx1b directly regulates mitochondrial genes (e.g., via ChIP-seq, motif analysis, or ATAC-seq), or whether mitochondrial changes are downstream effects.

      We agree and refer the reviewer to our responses to the other referees including Reviewer 1, Comment 1 and Reviewer 2 comments 1 and 17. As noted there, these mechanistic questions are the focus of ongoing and future studies. We have revised the text where appropriate to ensure it accurately reflects the scope of our current data.

      (2) Furthermore, the protective effects of nicotinamide (NAM) are interpreted as evidence of mitochondrial involvement, but no direct mitochondrial measurements (e.g., immunostaining, electron microscopy, OCR assays) are provided. It is essential to validate mitochondrial dysfunction in TM3 cells using in vivo functional assays to support the central conclusion of the paper. Without this, the claim that mitochondrial dysfunction drives IOP elevation in Lmx1bV265D mice remains speculative. Alternatively, authors should consider revising their claims that mitochondrial dysfunction in these mice is a central driver of TM dysfunction.

      We again refer the reviewer to our other responses, including Reviewer 1, Comment 1 and Reviewer 2, Comments 1 and 17.

      (3) Mechanism of NAM-mediated protection is unclear: The manuscript states that NAM treatment prevents IOP elevation in Lmx1bV265D mice via metabolic support, yet no data are shown to confirm that NAM specifically rescues mitochondrial function. Do NAM-treated TM3 cells show improved mitochondrial integrity? Are reactive oxygen species (ROS) reduced? Does NAM also protect RGCs from glaucomatous damage? Addressing these points would clarify whether the therapeutic effects of NAM are indeed mitochondrial.

      We refer the reviewer to our response to Reviewer 1, Comment 2.

      (4) Lack of direct evidence that LMX1B regulates mitochondrial genes: While transcriptomic and motif accessibility analyses suggest that LMX1B is enriched in TM3 cells and may influence mitochondrial function, no mechanistic data are provided to demonstrate direct regulation of mitochondrial genes. Including ChIP-seq data, motif enrichment at mitochondrial gene loci, or perturbation studies (e.g., Lmx1b knockout or overexpression in TM3 cells) would greatly strengthen this central claim.

      We refer the reviewer to our response to Reviewer 1, Comment 1.

      (5) Focus on LMX1B in Fig. 5F lacks broader context: Figure 5F shows that several transcription factors (TFs), including Tcf21, Foxs1, Arid3b, Myc, Gli2, Patz1, Plag1, Npas2, Nr1h4, and Nfatc2, exhibit stronger positive correlations or motif accessibility changes than LMX1B. Yet the manuscript focuses almost exclusively on LMX1B. The rationale for this focus should be clarified, especially given LMX1B's relatively lower ranking in the correlation analysis. Were the functions of these other highly ranked TFs examined or considered in the context of TM biology or glaucoma? Discussing their potential roles would enhance the interpretation of the transcriptional regulatory landscape and demonstrate the broader relevance of the findings.

      Our analysis (Figure 5F) indicates that Lmx1b is the transcription factor most strongly associated with its predicted target gene expression across all TM cells, as reflected by its highest value along the X-axis. While other transcription factors exhibit greater motif accessibility (Y-axis), this likely reflects their broader expression across TM subtypes. In contrast, Lmx1b is minimally expressed in TM1 and TM2 cells, which may account for its lower motif accessibility overall (motifs not accessible in cells where Lmx1b is not / minimally expressed).

      Our emphasis on LMX1B is further supported by its direct genetic association with glaucoma. In contrast, the other transcription factors lack clear links to glaucoma and are supported primarily by indirect evidence. Nonetheless, we agree that the transcription factors highlighted in our analysis are promising candidates for future investigation. However, to maintain focus on the central narrative of this study, we have chosen not to include an extended discussion of these additional genes.

      (6) In the abstract, the authors report 9,394 wild-type TM cell transcriptomes. The number of Lmx1bV265D/+ TM cell transcriptomes analyzed is not provided. This information is essential for evaluating the comparative analysis and should be clearly stated in the Abstract and again in the main text (e.g., lines 121-123). Including both wild-type and mutant cell counts will help readers assess the balance and robustness of the dataset.

      We thank the reviewer for noticing this oversight and have added this value to the abstract and results section. 

      Lines 41 and 696: 2,491 mutant TM cells.  

      (7) Did the authors monitor mouse weight or other health parameters to assess potential systemic effects of treatment? It is known that the taste of compounds in drinking water can alter fluid or food intake, which may influence general health. Also, do Lmx1bV265D/+ mice exhibit non-ocular phenotypes, and if so, does nicotinamide confer protection in those tissues as well? Additionally, with nicotinamide dosing starting at postnatal day 2, how long were the mice treated with water containing nicotinamide, after how many days or weeks was IOP reduced, and how long was the decrease in IOP sustained?

      Water intake was monitored in both treatment groups, and dosing was based on the average volume consumed by adult mice (lines 1017–1018). Young pups do not drink water, so the drug is largely delivered through their mothers’ milk until weaning, and we therefore do not know an accurate dose for young pups. Mouse health was assessed throughout the experiment through regular monitoring of body weight and general condition.

      Depending on genetic context, Lmx1b mutations can cause kidney disease and impact other systems. Non-ocular phenotypes were not the focus of this study and were not characterized.

      We added a comment to the method to clarify the NAM treatment timeline. NAM was administered continuously in the drinking water starting at P2 and maintained throughout the experiment. IOP was measured beginning at 2 months and then at monthly time points. NAM lessened IOP at 2 and 3 months. We terminated IOP assessment at 3 months.

      Lines 1028-1029: “Treatment was started at postnatal day 2 and continued throughout the experiment.”

      (8) While the IOP reduction observed in NAM-treated Lmx1bV265D/+ mice appears statistically significant, it is unclear whether this reflects meaningful biological protection. Several untreated mice exhibit very high IOP values, which may skew the analysis. The authors should report the mean values for IOP in both untreated and NAM-treated groups to clarify the magnitude and variability of the response.

      We have added supplemental table 7 with the statistical information. Regarding the high IOP values observed in a subset of untreated V265D mutant mice, we consistently detect individual mutant eyes with IOPs exceeding 30 mmHg across independent cohorts and time points [3-5]. It is important to note that IOP is subject to fluctuation and in disease states such as glaucoma, circadian rhythms can be disrupted with stochastic and episodic IOP spikes throughout the day. This may be occurring in those untreated mice. This is also why we strive to use sample sizes of 40 or more. Additionally, we observe that some mutant eyes with IOPs measured within the normal range have anterior chamber deepening (ACD) - a persistent anatomical change associated with sustained or recurrent high IOP that stretches the cornea and may posteriorly displace the lens. This suggests mutant mice experience transient IOP elevations that are not always captured at a single time point due to the stochastic nature of these fluctuations. To account for this, we include ACD as an additional readout alongside IOP measurements. The reduction in ACD observed in NAM-treated mice provides independent evidence supporting the biological relevance of NAM-mediated IOP reduction.   

      (9) Additionally, since NAM has been shown to protect RGCs in other glaucoma models directly, the authors should assess whether RGCs are preserved in NAM-treated Lmx1b V265D/+ mice. Demonstrating RGC protection would support a synergistic effect of NAM through both IOP reduction and direct neuroprotection, strengthening the translational relevance of the treatment.

      We again thank the referee. We note the possibility of dual IOP protection and neuroprotection in the manuscript (lines 961–963). The goal of the present study, however, was to determine mechanisms underlying IOP elevation in patients with LMX1B variants. Therefore, we limited our focus to IOP elevation (LMX1B is expressed in the TM but not RGCs). Studies of the RGCs and optic nerve in V265D mutant mice treated with NAM take considerable effort but are underway. They will be reported in a subsequent manuscript. Initial data support protection, but that is a work in progress.  

      Additionally, we recently reported a pattern of IOP protection with pyruvate similar to that reported here [4]. In that study we also analyzed the optic nerve, because its focus was the assessment of pyruvate as a resilience factor against high genetic risk of glaucoma. There was statistically significant protection from glaucomatous optic nerve damage, again arguing for translational relevance, with a possible synergistic effect through both IOP reduction and direct neuroprotection.

      (10) Can the authors add any other functional validation studies to explore the pathways enriched in all the TM subtypes (TM1, TM2, and TM3 cells), in addition to the IHC/IF/RNAscope validation?

      We agree with the reviewer on the importance of further functional validation of pathways active in TM cell subtypes that influence IOP. However, a comprehensive investigation of the pathways active in each subtype will need to be undertaken in future studies. It is beyond the scope of this already large paper.

      (11) The authors should include a representative image of the limbal dissection. While Figure S1 provides a schematic, mouse eyes are very small, and dissecting unfixed limbal tissue is technically challenging. It is also difficult to reconcile the claim that the majority of cells in the limbal region are TM and endothelium. As shown in Figure S6, DAPI staining suggests a much higher abundance of scleral cells compared to TM cells within the limbal strip. Additional clarification or visual evidence would help validate the dissection strategy and cellular composition of the captured region.

      We appreciate the reviewer’s suggestion and have added images to Figure S1 to show our limbal strip dissection. However, we clarify that we do not intend to suggest that TM and endothelial cells are the most abundant populations in these dissected strips. When we say “are enriched for drainage tissues” we mean in comparison to dissecting the anterior segment as a whole. We have clarified this in the text. In fact, epithelial cells (primarily from the cornea) constituted the largest cluster in our dataset (Figure 1A). Additionally, to avoid misinterpretation, we generally refrain from drawing conclusions about the relative abundance of cell types based on sequencing data. Single-cell and single-nucleus RNA sequencing results are sensitive to technical factors that alter cell proportions depending on exact methodological details. In our study, TM cells comprised 24.4% of the single-cell dataset and 11.8% of the single-nucleus dataset, illustrating the impact of methodological variability.

      Lines 163-164: “Individual eyes were dissected to isolate a strip of limbal tissue, which is enriched for TM cells in comparison to dissecting the anterior segment as a whole.”

      Reviewer #1 (Recommendations for the authors):

      To enhance the reproducibility and transparency of the findings presented in this study, we strongly recommend that the authors make all analysis scripts and computational tools publicly available.

      We agree with the reviewer’s emphasis on transparency and are currently building a GitHub page to share our scripts. However, we did not develop any new tools for this study. All tools that we used are publicly available and provided in our methods section. All data will be available as raw data and through the Broad Institute’s Single Cell Portal.

      Reviewer #2 (Recommendations for the authors):

      The authors are to be commended for a well-written presentation of high-quality data, their comparisons of datasets (other mouse and human scRNAseq data), correlation with clinical glaucoma risk alleles, and curative therapy for the mouse model of Lmx1b glaucoma. There are several minor suggestions that the authors might consider to further improve their manuscript:

      (1) Lines 42-43: Although their data strongly support the role of mitochondrial dysfunction in Lmx1b glaucoma, they might want to soften their conclusion "supports a primary role of mitochondrial dysfunction within TM3 cells initiating the IOP elevation that causes glaucoma".

      With the inclusion of EM data supporting mitochondrial dysfunction in Lmx1b mutant TM cells, we have revised this sentence to more accurately reflect our findings.

      Lines 42-44 (previously lines 42-43): “Mitochondria in TM cells of V265D/+ mice are swollen with a reduced cristae area, further supporting a role for mitochondrial dysfunction in the initiation of IOP elevation in these mice.”

      (2) Figure 1: Why is the shape of the "TM containing" cluster in 1A so different than the cluster shown in 1B?

      We isolated cells from the 'TM-containing' cluster and performed unbiased reclustering, which alters their positioning in UMAP space. The figure legend has been updated to clarify this point.

      Lines 143-144: “A separate UMAP representation of the trabecular meshwork (TM) containing cluster following subclustering.”

      (3) Line 160: change "data was" to "data were"

      Corrected

      (4) S4 Fig C: Please comment on why the Columbia and Duke heatmaps for TM3 are not as congruent as the heatmaps for TM1 and TM2.

      We cannot definitively determine the reason for this. However, differences in tissue processing techniques between the Columbia and Duke preparations may contribute. Such variations have been shown to affect cellular transcriptomes in certain contexts. It is possible that TM3 cells are more susceptible to these effects than others. We have added a statement addressing this point to the figure legend.

      Lines 238-240: “Because tissue processing techniques can alter gene expression [52], the heatmap variation between institutes likely reflects differences in processing techniques (Methods) and suggests that TM3 cells are more susceptible to these effects than other cell types.”

      (5) S9 Fig: It is very difficult to see any staining for TM1 CHIL1 (2nd panel), TM2 End3 (2nd panel), and TM3 Lypd1 (both panels)

      We apologize for the difficulty in visualizing these panels. To improve clarity, we have increased the brightness of all relevant marker signals, within standard bounds, to facilitate easier interpretation.

      (6) Line 380: "are significantly higher"; since statistical analysis was not reported, please do not use "significantly"

      Done

      (7) The authors should consider discussing several of their findings that agree with published literature. For example:

      Figure 3B: "Wnt protein binding" (PMID: 18274669), "TGFb binding" (numerous references), "integrin binding" (work of Donna Peters), "actin binding"/"actin filament binding"/"actin filament bundle" (CLANs references)

      S10 Fig c: "ossification" (work of Torretta Borres)

      S11 Fig A: ID2/ID3 (PMID: 33938911); (B) BMP4 (PMID: 17325163)

      S12 Fig A: MYOC in TM1 cells (numerous references)

      We appreciate the reviewer’s diligent review and comments regarding these pathways. We have added a comment to the discussion regarding the agreement of these pathways.

      Lines 855-858: In addition, the expression of genes that we document generally agrees with the literature. For example, the following pathways and molecules have been reported in TM cells: WNT signaling [78], TGF-β signaling [79-85], integrin binding [86-88], actin cytoskeletal networks [89], calcification genes [90, 91], and Myocilin [91-94].

      (8) Line 541: was confocal microscopy used to measure the "3D shapes" of nuclei or was this done with a single image to determine sphericity?

      This analysis was performed using confocal microscopy and 3D reconstructed models of the TM nuclei. We have added text to clarify this in the figure legend.

      Lines 553-556: “To rigorously assess whether TM1 nuclei are more spherical, we analyzed their reconstructed 3D shapes from whole mounts images by confocal microscopy, comparing them to TM3 nuclei using the ‘Sphericity’ tool in Imaris.”

      (9) Line 545: please add a close parentheses after "scoring 1"

      Done

      (10) S15 Fig: (A) There does not appear to be "good agreement" (line 653) between the datasets for TM1. (C) please provide a better explanation on how to interpret these "Confusion Matrix" results.

      We understand the referee's concern. The patterns likely appear different to the referee because of limited sampling in the snRNA-seq data. Based on our results, TM1 seems particularly susceptible, possibly because these cells do not tolerate the isolation process as well. Although, based on our experience, we are confident that TM1 shows good agreement between the two techniques, we have revised the language in the text to “generally” to reflect this nuance.

      Lines 633-635 (previously line 653): The generated clusters and their marker genes generally agreed with our scRNA-seq analyses (Fig 5A-B, S15A Fig).

      We have also added additional clarification for how to interpret the Confusion Matrix. 

      Lines 669-672: “Colors indicate the fraction of cells identified in each ATAC cluster (row) which are also identified in each RNA cell type (columns), where darker colors represent stronger correspondence between RNA and ATAC clusters.”
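
      As an illustration of how such a row-normalized matrix can be read, the short sketch below computes, for each ATAC cluster (row), the fraction of its cells assigned to each RNA cell type (column). It is a minimal sketch and not the pipeline used in the study: the cluster names ("A0", "A1"), the toy label lists, and the function name `row_normalized_confusion` are hypothetical.

```python
from collections import Counter

def row_normalized_confusion(atac_labels, rna_labels):
    """Fraction of cells in each ATAC cluster (row) assigned to each RNA type (column).

    Both inputs are per-cell label lists of equal length (hypothetical toy data).
    """
    rows = sorted(set(atac_labels))
    cols = sorted(set(rna_labels))
    counts = Counter(zip(atac_labels, rna_labels))
    matrix = {}
    for r in rows:
        total = sum(counts[(r, c)] for c in cols)
        # Each row sums to 1, so a value near 1 marks a strong row-to-column match.
        matrix[r] = {c: counts[(r, c)] / total for c in cols}
    return matrix

# Toy example: seven cells with an ATAC cluster label and an RNA cell-type label each.
atac = ["A0", "A0", "A0", "A1", "A1", "A1", "A1"]
rna = ["TM1", "TM1", "TM3", "TM3", "TM3", "TM3", "TM1"]
matrix = row_normalized_confusion(atac, rna)
for cluster, fractions in matrix.items():
    print(cluster, fractions)
```

      In this toy example, three quarters of the cells in cluster A1 carry the TM3 label, so that cell of the matrix would be drawn in the darkest color of its row.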

      (11) Line 676: The transition from discussing the sc/snRNAseq data to the work in Lmx1b mutant mice is quite abrupt and could use a better transition to introduce this metabolism work.

      We have revised this transition for improved flow but prefer to keep all transitions brief due to the paper's length.

      Lines 691-694 (previously line 676): To evaluate the utility of our new TM cell atlas, we used it to examine how Lmx1b mutations affect the TM cell transcriptome and to identify potential mechanisms underlying IOP elevation. We selected LMX1B because it causes IOP elevation and glaucoma in humans and was identified as a highly active transcription factor in our TM cell dataset.

      (12) Lines 696-697: It appears counter-intuitive that upregulation of ubiquitin pathways would lead to proteostasis (proteasome protein degradation requires ubiquitination).

      We have clarified that the protein tagging pathway was significantly upregulated. However, polyubiquitin precursor itself was downregulated. In general, the statistical significance of the protein tagging pathway suggests perturbation of the system tagging proteins for degradation. We have clarified this in the text. 

      Lines 711-714 (previously lines 696-697): “In addition, mutant TM3 cells showed an upregulation of protein tagging genes. However, there is a downregulation of the polyubiquitin precursor gene (Ubb, P = 4.5E-30), indicating a general dysregulation of pathways that tag proteins for degradation.”

      (13) Line 715: Please justify why "perturbed metabolism" was chosen to pursue vs the other differentially expressed pathways

      We chose to narrow our focus on TM3 cells because of the enrichment for Lmx1b expression. Most pathways identified in our analysis of TM3 cells implicate mitochondrial metabolism. Therefore, we chose to further explore this avenue. We clarified in the text that perturbed metabolism was the strongest gene expression signature.

      Lines 753-754 (previously line 715): “Our findings most strongly implicate perturbed metabolism within TM3 cells as responsible for IOP elevation in an Lmx1b glaucoma model.”

      (14) Line 759: The authors clearly demonstrate that Lmx1b is most expressed in TM3 cells; however, they did not demonstrate that "Lmx1b was most active"

      ATAC analysis showed that Lmx1b was most active in TM cells overall. We inferred its activity in TM3 because Lmx1b is most enriched in that subtype. This has been clarified in the text.

      Lines 799-800 (previously line 759): “More specifically, we demonstrate that Lmx1b is the most active TM cell TF and is enriched in TM3 cells,…”

      (15) Lines 830-835: Please include references documenting increased TGFβ2 concentrations in POAG aqueous humor and TM, effects of TGFβ2 on TM ECM deposition, and TGFβ2 induced ocular hypertension ex vivo and in vivo.

      Done.

      (16) Line 875: The authors provide no direct evidence for enhanced "oxidative stress" in Lmx1b TM3 cells

      The mitochondrial abnormalities and changed pathways support oxidative stress, but we have not directly tested this. Experiments are currently underway to evaluate its role, but these additional analyses are beyond the scope of this paper. We removed oxidative stress from the sentence.

      Lines 920-922 (previously line 875): “Importantly, in heterozygous mutant V265D/+ mice, TM3 cells had pronounced gene expression changes that implicate mitochondrial dysfunction, but that were absent or much lower in other cells including TM1 and TM2.”

      (17) Line 880: Similarly, the authors have not directly assessed effects on metabolism in TM3 cells; they only have shown changes in the expression of mitochondrial genes that may affect metabolism

      We have no way of specifically isolating TM3 cells to test this. Work is underway to test this more broadly in isolated TM cells, but it is beyond the scope of this already large paper. Considering our gene expression data and the addition of supporting EM data, we have qualified the text.

      Lines 930-931 (previously 880): “Our data extend these published findings by showing that inheritance of a single dominant mutation in Lmx1b similarly affects mitochondria in TM cells.”

      (18) Line 892: What markers were used to detect "cell stress"?

      We have revised the text. Although our RNA data show stress gene changes, characterization of these markers is beyond the scope of the current study and will be included in a subsequent paper.

      Lines 945-948 (previously line 892): “However, these processes were not limited to TM3 cells or even to cell types that express detectable Lmx1b, suggesting that they are secondary damaging processes that are subsequent to the initiating, Lmx1b-induced perturbations in TM3 cells.”

      Additional author-driven change

      While revising and reviewing our data, we identified a coding error that resulted in the WT and V265D mutant group labels being switched in Figure 6. Importantly, the significance of the differentially expressed genes (DEGs), the implicated biological pathways, and the interpretation of pathway directionality in the manuscript remain accurate. The only issue was the incorrect labeling in the figure. We have corrected the labels in Figure 6 to accurately reflect the data. As noted above, all data and code will be made available to ensure full reproducibility of our results.

      References

      (1) Doucet-Beaupre H, Gilbert C, Profes MS, Chabrat A, Pacelli C, Giguere N, et al. Lmx1a and Lmx1b regulate mitochondrial functions and survival of adult midbrain dopaminergic neurons. Proc Natl Acad Sci U S A. 2016;113(30):E4387-96. Epub 2016/07/14. doi: 10.1073/pnas.1520387113. PubMed PMID: 27407143; PubMed Central PMCID: PMCPMC4968767.

      (2) Jimenez-Moreno N, Kollareddy M, Stathakos P, Moss JJ, Anton Z, Shoemark DK, et al. ATG8-dependent LMX1B-autophagy crosstalk shapes human midbrain dopaminergic neuronal resilience. J Cell Biol. 2023;222(5). Epub 2023/04/05. doi: 10.1083/jcb.201910133. PubMed PMID: 37014324; PubMed Central PMCID: PMCPMC10075225.

      (3) Cross SH, Macalinao DG, McKie L, Rose L, Kearney AL, Rainger J, et al. A dominant-negative mutation of mouse Lmx1b causes glaucoma and is semi-lethal via LDB1-mediated dimerization [corrected]. PLoS Genet. 2014;10(5):e1004359. Epub 2014/05/09. doi: 10.1371/journal.pgen.1004359. PubMed PMID: 24809698; PubMed Central PMCID: PMCPMC4014447.

      (4) Li K, Tolman N, Segre AV, Stuart KV, Zeleznik OA, Vallabh NA, et al. Pyruvate and related energetic metabolites modulate resilience against high genetic risk for glaucoma. Elife. 2025;14. Epub 2025/04/24. doi: 10.7554/eLife.105576. PubMed PMID: 40272416; PubMed Central PMCID: PMCPMC12021409.

      (5) Tolman NG, Balasubramanian R, Macalinao DG, Kearney AL, MacNicoll KH, Montgomery CL, et al. Genetic background modifies vulnerability to glaucoma-related phenotypes in Lmx1b mutant mice. Dis Model Mech. 2021;14(2). Epub 2021/01/20. doi: 10.1242/dmm.046953. PubMed PMID: 33462143; PubMed Central PMCID: PMCPMC7903917.

    1. Author response:

      Below we outline our provisional responses to the major points raised in the public reviews, and our planned revisions:

      (1) Mechanistic model of how ZDHHC18/MARCH8 engage the cGAS–DNA condensate (Reviewer #1 & #2)

      We will add a dedicated subsection and a working-model figure describing our current view: IDRs of ZDHHC18 (Golgi) and MARCH8 (endosomes) engage pre-formed cGAS–DNA condensates at organelle membranes, and thereby tune cGAS activity through PTMs. We will explicitly discuss bridge-like versus allosteric modes, perform additional LLPS experiments (e.g. FRAP assays) to detect any IDR-driven changes in condensate properties, and explain how these scenarios fit our data.

      (2) Selectivity beyond ZDHHC18/MARCH8 (Reviewer #1)

      We will expand the text to explain existing evidence indicating that, in addition to ZDHHC18 or MARCH8, other post-translational modification (PTM) enzymes and/or membrane-associated scaffolds may also modulate cGAS. We will summarize our current datasets that support this possibility and outline how this selectivity relates to organelle identity.

      (3) Why membrane association suppresses cGAS activity (Reviewer #1)

      We will provide a concise mechanistic rationale—integrating our published work—to explain how membrane-proximal sequestration can limit cGAS catalysis despite cGAS–DNA coexistence within condensates. Specifically, we will discuss (i) IDR-dependent changes in condensate properties, and (ii) PTMs by ZDHHC18/MARCH8 that allosterically reduce catalytic efficiency; we will clearly cross-reference our prior publications that bear on these points.

      (4) Reconciling Fig. S7 (DNA-dependent binding) with Fig. 5 (recruitment to IDR droplets) (Reviewer #2)

      We will add text to clarify experimental context and readouts to prove that there is no real contradiction between Fig. S7 and Fig. 5. In the experiment shown in Fig. 5, PEG (a macromolecular crowding agent) was added to the system, which facilitates the formation of IDR phase-separated droplets. Under these conditions, cGAS partitions into the IDR condensates, leading to the observed recruitment. In contrast, Fig. S7 examines the direct physical interaction between cGAS and the IDRs using biochemical pull-down assays and shows that no direct interaction occurs in the absence of DNA. These two results reflect different experimental contexts and are therefore not mutually exclusive.

      (5) Planned additional tests to address specificity and mechanism (Reviewer #2)

      DNA pull-down: to test whether IDRs alter cGAS–DNA affinity, we will compare cGAS binding to DNA with/without MEMCA IDRs (and with charged-residue mutants).

      Domain mapping: to determine which region of cGAS engages MEMCA IDRs, we will map binding using cGAS N-terminus/core-domain truncations and key surface mutants.

      Physiological in vitro LLPS: we will repeat cGAS–DNA–IDR LLPS assays under physiological buffer conditions and report partition coefficients, FRAP, and phase diagrams to ensure physiological relevance.

      (6) Image clarity and data presentation (Reviewer #2):

      We will improve image resolution, add zoomed-in insets with organelle markers, and provide images with a clearer Cy5-ISD signal.

      (7) Nuclear localization of cGAS and system considerations (Reviewer #3)

      We will explicitly document the nuclear signal of cGAS observed in our confocal experiments and detail the cell lines and expression systems used. We will also clarify cGAS nuclear localization in the cell lines used.

      (8) Endogenous validation and cell line consistency (Reviewer #3):

      We will perform experiments in primary cells (knockout macrophages) to address the concern of relying on overexpression.

      (9) Language and grammar (Reviewer #3):

      We will thoroughly revise the manuscript for grammar and clarity.

      Together, these planned revisions will strengthen the mechanistic basis of our findings and provide direct evidence for the physiological role of organelle-tethered IDRs in regulating cGAS activity.

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public review):

      (1) Their first major claim is that fluid flows alone must be quite strong in order to fragment the cyanobacterial aggregates they have studied. With their rheological chamber, they explicitly show that energy dissipation rates must exceed "natural" conditions by multiple orders of magnitude in order to fragment lab strain colonies, and even higher to disrupt natural strains sampled from a nearby freshwater lake. This claim is well-supported by their experiments and data.

      We thank the reviewer for this positive comment. We fully agree, as our fragmentation experiments on division-formed colonies clearly demonstrate their strong mechanical resistance in naturally occurring flows.

      (2) The authors then claim that the fragmentation of aggregates due to fluid flows occurs through erosion of small pieces. Because their experimental setup does not allow them to explicitly observe this process (for example, by watching one aggregate break into pieces), they implement an idealized model to show that the nature of the changes to the size histogram agrees with an erosion process. However, in Figure 2C there is a noticeable gap between their experiment and the prediction of their model. Additionally, in a similar experiment shown in Figure S6, the experiment cannot distinguish between an idealized erosion model and an alternative, an idealized binary fission model where aggregates split into equal halves. For these reasons, this claim is weakened.

      The two idealized models of colony fragmentation, namely erosion of single cells and fragmentation into equal sizes (or binary fission), lead to distinguishable final size distributions. We believe that our experiments for division-formed colonies support the hypothesis of the erosion mechanism. Specifically, Figure 2E shows that colony fragmentation resulted in a decrease of large colonies and a strong increase of single cells and dimers (two cells). In our view, the strong increase of single cells and dimers provides quite convincing (but indirect) evidence supporting the erosion mechanism. This is described on lines 112-121. To further address the reviewer’s concern, we have included in the revised version of Figure 2 (panels B and D) a direct comparison between these two fragmentation models for large division-formed colonies fragmented at a high dissipation rate of ε = 5.8 m<sup>2</sup>/s<sup>3</sup>. Furthermore, we have included the new Supplementary Figure S9, which details the model predictions for the colony size distribution at various time points.

      The ideal equal fragments model (i.e., where every fracture event produces two identical fragments with half the original biovolume) does not capture the biovolume transfer from large colonies to single cells, as observed for the experimental results in panel D of Figure 2 and panel E of Figure S9. In contrast, the erosion model, in panel D of Figure 2 and panel D of Figure S9, provides a good prediction of the experimental results within the experimental uncertainty. The different fragmentation models are discussed in lines 226-228 of the revised manuscript and lines 865-873 of the SI.
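
      The qualitative difference between the two idealized models can be illustrated with a toy calculation. The sketch below is not the model used in the manuscript: it assumes that cell count is a proxy for biovolume, that every colony larger than one cell undergoes one event per step, and that "small" means single cells and dimers. The function names and the starting colony of 64 cells are hypothetical.

```python
def erosion_step(sizes):
    """Idealized erosion: every colony with more than one cell sheds a single cell."""
    out = []
    for n in sizes:
        if n > 1:
            out.extend([n - 1, 1])
        else:
            out.append(n)
    return out

def equal_split_step(sizes):
    """Idealized equal fragments: every colony with more than one cell splits in half."""
    out = []
    for n in sizes:
        if n > 1:
            out.extend([n // 2, n - n // 2])
        else:
            out.append(n)
    return out

def small_fraction(sizes, max_cells=2):
    """Share of all cells found in single cells and dimers (cell count as biovolume proxy)."""
    return sum(n for n in sizes if n <= max_cells) / sum(sizes)

def run(step, sizes, n_steps):
    """Apply one fragmentation rule repeatedly to a list of colony sizes."""
    for _ in range(n_steps):
        sizes = step(sizes)
    return sizes

# Start from one hypothetical colony of 64 cells and apply three steps of each rule.
eroded = run(erosion_step, [64], 3)
halved = run(equal_split_step, [64], 3)
print(sorted(eroded, reverse=True))  # one large colony plus detached single cells
print(sorted(halved, reverse=True))  # eight mid-sized colonies, no single cells yet
print(small_fraction(eroded), small_fraction(halved))
```

      Under these assumptions, erosion moves cells directly from the large colony into the single-cell class from the first step onward, whereas equal splitting populates only intermediate sizes until the colonies have been halved many times. This is the signature used above to distinguish the two mechanisms.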

      (3) Their third major claim is that fluid flows only weakly cause cells to collide and adhere in a "coming together" process of aggregate formation. They test this claim in Figure 3, where they suspend single cells in their test chamber and stir them at moderate intensity, monitoring their size histogram. They show that the size histogram changes only slightly, indicating that aggregation is, by and large, not occurring at a high rate. Therefore, they lend support to the idea that cell aggregation likely does not initiate group formation in toxic cyanobacterial blooms. Additionally, they show that the median size of large colonies also does not change at moderate turbulent intensities. These results agree with previous studies (their own citation 25) indicating that aggregates in toxic blooms are clonal in nature. This is an important result and well-supported by their data, but only for this specific particle concentration and stirring intensity. Later, in Figure 5 they show a much broader range of particle concentrations and energy dissipation rates that they leave untested.

      We thank the reviewer for this positive comment. We agree that our experimental results show clear evidence that aggregated colonies have a weaker structure in comparison to division-formed colonies, thus supporting the hypothesis that clonal expansion is the main mechanism for colony formation under most natural settings. The range of energy dissipation rates of our experimental setup covers almost entirely the region for which aggregated and division-formed colonies differ in their fragmentation behavior (Zone III of Figure 5). Within this zone, aggregated colonies are fragmented and only the division-formed colonies are able to withstand the hydrodynamic stresses. Furthermore, we show that this fragmentation behavior has a low sensitivity to the total biovolume fraction, as displayed in the Supplementary Figures S2 and S4 and discussed in lines 151-154 and 160-163. We agree that our cone-and-plate setup covers a limited parameter range, and we have added a detailed discussion of these limitations in the revised manuscript, under section Materials and Methods in lines 462-473.

      (4) The fourth major result of the manuscript is displayed in Equation 8 and Figure 5, where the authors derive an expression for the ratio between the rate of increase of a colony due to aggregation vs. the rate due to cell division. They then plot this line on a phase map, altering two physical parameters (concentration and fluid turbulence) to show under what conditions aggregation vs. cell division are more important for group formation. Because these results are derived from relatively simple biophysical considerations, they have the potential to be quite powerful and useful and represent a significant conceptual advance. However, there is a region of this phase map that the authors have left untested experimentally. The lowest energy dissipation rate that the authors tested in their experiment seemed to be ε ≈ 10<sup>-2</sup> m<sup>2</sup>/s<sup>3</sup>, and the highest particle concentration they tested was 5 × 10<sup>-4</sup>, which means that the authors never tested Zone II of their phase map. Since this seems to be an important zone for toxic blooms (i.e. the "scum formation" zone), it seems the authors have missed an important opportunity to investigate this regime of high particle concentrations and relatively weak turbulent mixing.

      We agree with the reviewer that Zone (II) of Figure 5 is of great importance to dense bloom formation under wind mixing and that this parameter range was not covered by our experiments using a cone-and-plate shear flow. The measuring range of our device was motivated by engineering applications such as artificial mixing of eutrophic lakes using bubble plumes, as well as preliminary experiments which demonstrated that high levels of dissipation rate were required to achieve fragmentation. The range of dissipation rates that can be achieved by the cone-and-plate setup is limited at the lower end by the accumulation of colonies near the stagnation point at the conical tip and at the upper end by the spillage of fluid out of the chamber. We now discuss this measuring range in lines 462-473 of the revised manuscript.

      Although our setup does not cover Zone (II), we now refer to recent results in the literature for evidence of aggregation dominance in Zone (II). The experimental study of Wu et al. (2024) (reference number 64 of the revised manuscript) investigated the formation of Microcystis surface scum layers in wind-mixed mesocosms. Their study identified aggregation of colonies in the scum layer, resulting in increases of colony size at rates faster than cell division. These results agree with our model, and the parameter range investigated falls within Zone (II). We have included in the revised version, lines 328-337, a detailed discussion elucidating the parameter range covered in our experiments and the findings of Wu et al. (2024).

      Other items that could use more clarity:

      (5) The authors rely heavily on size distributions to make the claims of their paper. Yet, how they generated those size distributions is not clearly shown in the text. Of primary concern, the authors used a correction function (Equation S1) to estimate the counts of different size classes in their image analysis pipeline. Yet, it is unclear how well this correction function actually performs, what kinds of errors it might produce, and how well it mapped to the calibration dataset the authors used to find the fit parameters.

      We agree with the reviewer that more details of the correction function should be included. We have included in the revised version of the Supporting Information, in lines 785-796, a more detailed explanation of the correction function. Furthermore, a direct comparison of raw and corrected histograms of the size distribution and its associated uncertainty is presented in the new Supplementary Figure S8.

      (6) Second, in their models they use a fractal dimension to estimate the number of cells in the group from the group radius, but the agreement between this fractal dimension fit and the data is not shown, so it is not clear how good an approximation this fractal dimension provides. This is especially important for their later derivation of the "aggregation-to-cell division" ratio (Equation 8)

      We agree with the reviewer that more details on the estimation of fractal dimension are needed. The revised version, under Materials and Methods in lines 508-515, now includes the detailed estimation procedure, the number of colonies analysed, and the associated uncertainty.
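
      As an illustration of the kind of check the reviewer asks for, the sketch below fits a fractal dimension to paired measurements of colony radius and cell count. It assumes the common power-law form N = k (R / r0)^D, where r0 is the single-cell radius; the function names, the prefactor k, and the synthetic numbers are hypothetical and are not taken from the manuscript, whose own estimation procedure is described in its Materials and Methods.

```python
import math


def cells_in_colony(radius, cell_radius, fractal_dim, prefactor=1.0):
    """Estimate the cell count from the assumed law N = k * (R / r0) ** D."""
    return prefactor * (radius / cell_radius) ** fractal_dim


def fit_fractal_dimension(radii, cell_counts, cell_radius):
    """Ordinary least squares on log N versus log(R / r0).

    Returns (fractal_dim, prefactor): the slope of the log-log line and
    the exponential of its intercept.
    """
    xs = [math.log(r / cell_radius) for r in radii]
    ys = [math.log(n) for n in cell_counts]
    count = len(xs)
    mean_x = sum(xs) / count
    mean_y = sum(ys) / count
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, math.exp(intercept)


# Synthetic, noise-free data generated with D = 2.2 (hypothetical values).
radii_um = [5.0, 10.0, 20.0, 40.0]
counts = [cells_in_colony(r, 2.5, 2.2) for r in radii_um]
fitted_dim, fitted_prefactor = fit_fractal_dimension(radii_um, counts, 2.5)
```

      With real data, plotting the residuals of the log-log fit would show how good an approximation a single fractal dimension provides across the size range.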

      Reviewer #1 (Recommendations For The Authors):

      In light of the weak evidence for claim #2 outlined above, I believe the paper would benefit from a more explicit comparison in Figure 2C of the two models - idealized erosion, and idealized binary fission. With such a comparison, the authors would have stronger footing to claim that one process is more important than the other.

      As mentioned in our answer above to comment #2 of public review, we have included in the revised version of Figure 2 (panels B and D) a direct comparison between the erosion and equal fragments (binary fission) models for large division-formed colonies fragmented under ε = 5.8 m<sup>2</sup>/s<sup>3</sup>. The comparison is further detailed in the new Supplementary Figure S9 for representative time points. Only the erosion models can recover the biovolume transfer from large colonies to single cells, as observed for the experimental results in Figure 2D and further detailed in Figure S9D. We believe that the revised version of Figure 2 and the new Supplementary Figure S9 provide strong evidence in support of the erosion fragmentation model.

      Would the authors comment on their chosen range of experimental dissipation rates? For instance, was their goal more to investigate industrial/engineering applications where the goal is to disrupt the cyanobacteria, but not really typical natural conditions under which the groups might form?

      The experimental dissipation rates were chosen to cover engineering applications such as artificial mixing of eutrophic lakes using bubble plumes. We have now clarified in the Introduction, on lines 37-39, that artificial mixing has been successfully applied in several lakes to suppress cyanobacterial blooms. Furthermore, we have now clarified in the caption of Figure 5 that the bars on the right side indicate typical values of dissipation rates induced by natural wind-mixing, bubble plumes in artificially mixed lakes, and laboratory-scale experiments such as cone-and-plate systems and stirred tanks. The dissipation rates induced by the bubble plumes in artificially mixed lakes could potentially fragment aggregated cyanobacterial colonies and thus disrupt bloom formation. However, our preliminary experiments demonstrated that high levels of dissipation rate were required to achieve fragmentation; we have therefore focused on the upper range of values (0.01 to 10 m<sup>2</sup>/s<sup>3</sup>).

      The dissipation rates generated by the cone-and-plate approach are indeed higher than the dissipation rates under typical natural conditions in lakes. We have now added a detailed discussion of the range of dissipation rates generated by the cone-and-plate approach in the revised manuscript, under section Materials and Methods in lines 462-473, where we also explain that these values are higher than the natural dissipation rates generated by wind action in lakes. However, the more generic insights obtained by our study, shown in Figure 5, are relevant for dissipation rates of natural lakes (e.g., Zone II). Therefore, in our discussion of Figure 5 we have now included the recent findings of Wu et al. (2024) (reference number [64] of the revised manuscript), who studied bloom formation of Microcystis in mesocosm experiments at dissipation rates representative of natural conditions; see also our reply to the next comment.

      The authors should consider testing the space of Zone II on their phase map, for instance at very high particle concentrations and even lower rotational speeds, in order to show that their derivations match experiments.

      Good point. As mentioned in our answer above to comment #4 of the public review, Zone II lies beyond the measuring range of our experimental setup. Instead, we refer to the recent study of Wu et al. (2024) (reference number [64] of the revised manuscript) which demonstrated that dense scum layers of Microcystis colonies are aggregation-dominated. These mesocosm experiments agree with our model predictions and their parameter range falls within Zone II. We have included in the revised version, lines 328-337, a detailed discussion where we elucidate the parameter range covered in our experiments and compare our predictions for Zone II with the recent findings of Wu et al. (2024).

      The authors should show their calibration data and fit for the correction function of equation S1. Additionally, you may consider showing "raw" and "corrected" histograms of the size distribution, to demonstrate exactly what corrections are made.

      As mentioned in our answer above to comment #5 of the public review, we have included in the revised version of the Supporting Information the new Supplementary Figure S8, which shows the raw and adjusted histograms of the size distribution, including the associated uncertainties. Furthermore, the correction function is now explained in detail in the new Supporting Information Text in lines 785-796.

      The authors might consider commenting on Figure S3 a bit more in the main text. Even at very high dissipation rates, the cyanobacterial groups don't plummet to size 1, but stay in an equilibrium around 10-20x the diameter of a single cell. What might this mean for industrial applications trying to break up the groups?

      We agree with the reviewer that further discussion of Figure S3, panels E and F, is warranted. In the revised version of the manuscript, under section Fragmentation of Microcystis colonies occurs through erosion in lines 133-137, we have now included a discussion of this figure. Figure S3F shows that more than 90% of the total biovolume ends up in the category “small colonies” (mostly single cells and dimers); hence, most of the initially large colonies do fragment to single cells or dimers. Only about 5-10% of the biovolume remains as “large colonies” of 10-20 cells. Although it is challenging to draw definitive conclusions about the behavior of these remaining large colonies, as they account for only a minor fraction of the suspension, one hypothesis is that variability in mechanical properties between colonies results in a subset of colonies exhibiting exceptional resistance even to very high dissipation rates (see lines 133-137).

      Minor comments:

      Typo Caption of Figure 2: Should read [m^2/s^3] for units

      Thanks for catching this typo. The units in the caption of Figure 2 have been corrected to [m^2/s^3].

      There is no Equation 10 in Materials and Methods as indicated in the rheology section.

      We thank the reviewer for pointing out the lack of clarity in this algebraic manipulation. In fact, the yield stress has to be substituted in the current Equation 11 (previously Eq.10), from which the critical dissipation rate must be substituted in Equation 3. The result is the critical colony size (l* = 2.8) mentioned in line 243 of the revised manuscript. The correct equation numbers and algebraic substitutions are now indicated in lines 241-243 of the revised version of the manuscript.

      Reviewer #2 (Public review):

      Especially the introduction seems to imply that shear force is a very important parameter controlling colony formation. However, if one looks at the results this effect is overall rather modest, especially considering the shear forces that these bacterial colonies may experience in lakes. The main conclusion seems that not shear but bacterial adhesion is the most important factor in determining colony size. As the importance of adhesion had been described elsewhere, it is not clear what this study reveals about cyanobacterial colonies that was not known before.

      We would like to emphasize several key findings that our study reveals about the impacts of fluid flow on cyanobacterial colonies:

      (I) Quantification of mechanical strength in cyanobacterial colonies: Our results demonstrate the high mechanical strength of cyanobacterial colonies, as evidenced by the requirement of high shear rates to achieve fragmentation. This had not previously been shown for cyanobacterial colonies. Our study thus highlights the resilience of these colonies against naturally occurring flows and bridges the gap between theoretical assumptions about colony strength and experimentally measured mechanical properties.

      (II) The discovery that the mechanical strength of colonies differs between colonies formed by cell division and colonies formed by aggregation. This, too, had not previously been shown for cyanobacterial colonies.

      (III) Validation of a hypothesis regarding colony formation: Using a fluid-mechanical approach, we confirm the findings of recent genetic studies (references 25 and 67 of the revised version of the manuscript) which indicated that colony formation occurs predominantly via cell division rather than cell aggregation under natural conditions (except in very dense blooms).

      (IV) Practical guidelines for cyanobacterial bloom control: Our findings provide valuable insights into the design of artificial mixing systems applied in several lakes. Artificial mixing of lakes is based on fundamentals of fluid flow, aiming at preventing aggregation of buoyant cyanobacteria in scum layers at the water surface. Our results show that the dissipation rates generated by bubble plumes in artificially mixed lakes can fragment cyanobacterial colonies formed by aggregation, but are not intense enough to cause fragmentation of division-formed colonies (see Figure 5 and lines 348-360).

      The agreement between model and experiments is impressive, but the role of the fit parameters in achieving this agreement needs to be further clarified.

      The influence of the fit parameters (namely the stickiness α1 and the pairs of colony strength parameters S1,q1,S2,q2) is discussed in the sections Dynamical changes in colony size modelled by a two-category distribution in lines 247-253 and Materials and Methods in lines 559-565. We kept the discussion concise to maintain readability. However, we agree with the reviewer that additional details about the importance of the fit parameters and the sensitivity of the results to these parameters could be beneficial. In the revised version of the section Materials and Methods in lines 560-563, we have included a detailed discussion of the fit parameters.

      The article may not be very accessible for readers with a biology background. Overall, the presentation of the material can be improved by better describing their new method.

      We apologize for the limited readability of the description of the experimental setup and model used. In the revised version of the manuscript and the SI, we have detailed further the new methods presented here. The modifications include a detailed description of the operating range of the cone-and-plate shear setup (subsection Cone-and-plate shear of the section Materials and Methods, in lines 462-473). Furthermore, we think that incorporation of the recent experimental results of Wu et al. (2024), on lines 331-337 of the manuscript, will appeal to readers with a biology background. Their mesocosm experiments support our model prediction that aggregation is the dominant mechanism for colony formation in region (II) of Figure 5.

      Reviewer #2 (Recommendations For The Authors):

      (1) The authors seem too modest in claiming technological advance. They should describe the technological advance of combining microscopy with rheometry, in such a way that this invites others to apply this or similar approaches on biological samples. Even though I feel that the advancement of knowledge of this system by their method is relatively modest, there may be more advances in other systems.

      We appreciate the positive view of the reviewer towards the importance of this technology and we agree that its advantages should be advertised to researchers investigating similar systems. We have now given more attention to the technological advance of combining microscopic imaging with rheometry in the final paragraph of the Conclusions (lines 386-400), where we now also briefly discuss an interesting recent study of marine snow (Song et al. 2023, Song and Rau 2022, reference numbers 70 and 71 of the revised manuscript), which used a similar combination of microscopy and rheometry as in our study. Furthermore, in the Methods section, we now briefly explain how the rheometry can be adjusted to investigate other systems (lines 474-480).

      (2) It seems reasonable -also based on what we already know about these aggregates - to assume that the main difference in shear sensitivity between field samples and cultures lies in the production of extracellular polysaccharide substance (EPS). To go beyond what is already known, the study could try to provide more direct and quantitative evidence for EPS involvement. For example, using a chemical quantification of EPS levels, or perturbing EPS levels using digestive enzymes.

      We agree with the reviewer that further characterization of the EPS is highly relevant to understand the mechanical strength of colonies. However, we believe that chemical quantification and/or degradation of EPS lies beyond the scope of our article and should be addressed by future studies.

      (3) Assuming EPS is indeed the reason for the differences in shear resistance: the authors speculate the reason why the field samples have more EPS lies in chemical composition (Calcium/nitrogen levels). In addition, there could be grazing that is known to promote aggregation (possibly increasing EPS), or just inherent genetic differences between strains. I am not necessarily expecting the authors to explore this direction experimentally, but it seems certainly feasible and would make the final result less speculative.

      We agree with the reviewer that there are more biotic and abiotic factors that can influence EPS amount and composition. The influence of grazing and other relevant factors on cell adhesion is discussed in references [26-29], cited in our introduction in lines 50-53. As discussed in our answer to recommendation #2, we believe that a quantitative investigation of these various factors is beyond the scope of this work and should be addressed in future studies.

      (4) A cool finding seems to be the critical relative diameter (Fig 2E), a colony size that seems invariant under shear. I was slightly surprised that the authors seem to take little effort to understand this critical diameter mechanistically (for example by predicting it, or experimentally perturbing it). Again, not a necessary requirement, but this is where the study could harness its technological advantage to provide a more quantitative understanding of something that goes beyond the existing knowledge of the system.

      We apologize to the reviewer if our descriptions and discussions of Figure 2 were unclear. One of the key conclusions from our experiments is that the critical relative diameter depends on the dissipation rate, as shown in Figure 2F. This dependence is also incorporated into the model through the constitutive equation (2). Furthermore, we expect the mechanical resistance of colonies, quantified by the critical relative diameter, to be affected by other biotic and abiotic factors that influence EPS amount and composition.

      (5) The jump from 0.019 to 1.1 m²/s³ seems large. What was the reason for not exploring intermediate values? The authors should also define low, modest and intense dissipation rates more clearly. Currently, they seem somewhat arbitrarily defined, i.e. 0.019 m²/s³ is described as low (methods) and moderate (results). In Fig 2, the authors further talk about low dissipation rates without a quantitative description.

      We thank the reviewer for pointing out the lack of clarity in the choice of parameter range and the nomenclature. Regarding the former, the suspension of division-formed colonies of Microcystis strain V163 displayed negligible fragmentation for dissipation rates between 0.019 and 1.1 m<sup>2</sup>/s<sup>3</sup>, as seen in Figures S2A and S3A. Given the low sensitivity of the fragmentation results in this region, we do not expect a change in behavior at intermediate values. Regarding the nomenclature, we have corrected the inconsistencies throughout the text. We have chosen to name the dissipation rate values as follows: low for values typical of wind mixing, moderate for values typical of the core of bubble plumes, and intense for values typical of propellers. Whenever mentioned in the text, the numerical value of the dissipation rate is also included to avoid doubt.

      (6) The structure and narrative of the paper can be improved. The article first describes all lab culture experiments and then the model, while the first figure already shows model fits. Perhaps it would be better to first describe the aggregation experiments, to constrain the appropriate terms of the model, and then move to fragmentation.

      We appreciate the recommendation of the reviewer regarding the structure. We have chosen to describe first the fragmentation experiments (Fig. 2), as these can be understood without introducing the aggregation effects. In contrast, the steady state results in the aggregation experiments (Fig. 3) come from the balance between aggregation and fragmentation. Therefore, we judged the current order to be more appropriate. The model fits are combined with the experimental results in Figures 2 and 3 to have a concise display. We have ensured that all the concepts required to understand each figure panel are explained prior to their discussion.

      (7) The number of data points that go into the histogram needs to be indicated. The main reason is that the authors report the distribution in terms of the biovolume fraction, suggesting the numerical counts are converted into volume. This to me seems like the most sensible parameter, but I could not find how this conversion is calculated (my apologies if I missed it). This seems especially relevant because a single large colony can impact this histogram quite considerably.

      We apologize for the lack of clarity in the calibration and conversion steps of the size distribution. As discussed above in the answer to comment #5 of the reviewer #1, more details of the calibration process have been added to the revised version of the Supporting Information Text in lines 785-796. Furthermore, the new Supplementary Figure S8 presents examples of the raw and adjusted size distribution, including the total number of counted colonies per histogram and the associated uncertainties in the concentration and biovolume distributions.
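
      To make the reviewer's concern concrete, the sketch below converts per-bin colony counts into biovolume fractions. It assumes compact spherical colonies with volume (π/6) d^3, which is a simplification chosen for illustration and not the conversion used in the manuscript (which relies on a calibration and a fractal dimension); the function name and the example histogram are hypothetical.

```python
import math


def biovolume_fractions(diameters_um, counts):
    """Convert per-bin colony counts into per-bin biovolume fractions.

    Each colony in a bin is treated as a sphere of the bin diameter
    (an assumption made only for this illustration).
    """
    volumes = [c * math.pi / 6.0 * d ** 3 for d, c in zip(diameters_um, counts)]
    total = sum(volumes)
    if total == 0:
        raise ValueError("no colonies counted")
    return [v / total for v in volumes]


# Hypothetical histogram: 1000 single cells (5 um) and one 100 um colony.
fractions = biovolume_fractions([5.0, 100.0], [1000, 1])
```

      In this hypothetical case the single large colony carries 8/9 of the total biovolume, which is why reporting the number of colonies counted per histogram matters.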

      (8) Over the timescales measured here, colonies could start sinking (or floating), possibly in a size-dependent manner, that could lead to a bias due to boundary effects. Did the authors consider this potential artifact?

      The sinking or floating of colonies is a relevant process which was taken into account in the choice of our parameter range for the dissipation rate. The minimum dissipation rate used in our experiments ensures that the upward inertial velocity near stagnation is sufficient to counteract the sedimentation of colonies. A detailed discussion of the choice of the parameter range is now included in the revised version of the Materials and Methods in lines 462-473.

      (9) "On the one hand, sequencing of the genetic diversity within Microcystis colonies supports the hypothesis that colony formation under natural conditions is primarily driven by cell division [25]. On the other hand, cell aggregation can occur on a shorter time scale and may offer improved protection against high grazing pressure [26]." This appears somewhat constructed, as what is described as "on the other hand" is not evidence against the genetic diversity.

      We agree that the suggested dichotomy in this text appeared somewhat constructed, and we have now removed the wording “on the one hand” and “on the other hand”. The studies from reference [25] demonstrated that the genetic diversity between independent Microcystis colonies is much greater than the diversity within colonies. If cell aggregation were the dominant mechanism, a similar genetic diversity would be observed between and within colonies, which contrasts with the findings from reference [25]. We have adjusted the text in the revised manuscript, in lines 46-54, to clarify this point.

      (10) The phase diagram seems largely based on extrapolations that are made outside of the measurement regime (e.g. dark red bars indicating the dissipation rate, Fig 5 - by the way 1 this color scheme could use some better contrast, by the way 2 Fig S7 suggests a wider dissipation rate range as indicated in Fig 5, why?). Hence there seems to be the need to more clearly delineate experimental results, simulations, and extrapolations in the phase diagram.

      We agree with the reviewer that further clarifications should be given about the parameter range covered in our experiments and apologize for the lack of readability in the color scheme of Fig 5. In lines 329-337, 346-347, 353-355, we have highlighted the parameter range covered by our experiments as well as the range covered by previous studies of wind-mixed mesocosms (namely reference [64] of the revised manuscript). Regarding the color scheme of Figure 5, we have modified the legend of the figure to improve readability. The color contrast was increased and leader lines were added to connect the colored bars with the respective label.

      (11) Unfortunately, the manuscript did not contain line numbers.

      We apologize to the reviewer for the lack of line numbers in our initial version. The revised version of the manuscript now contains line numbers, both in the main text and the supporting information.

      (12) Fig 2D. Caption is too minimal. Y-axis could better be named "Fraction of colonies" as both small and large colonies are plotted.

      The caption for Figure 2D was extended to better describe the plot. We have kept the y-axis label as “Fraction of small colonies”, since this is the quantity displayed by the three curves in the plot.

      (13) An inset should have axis labels.

      All the insets in our plots display the same variables as their respective plots. In order to keep the plots light and preserve readability, we therefore prefer to present the axis labels only along the x-axis and y-axis of the main plots, which implies by convention that the same axis labels also apply to the insets. To the best of our knowledge, this is a common approach.

      (14) Page 5, first words. Likely Fig 3A, not 2A was meant.

      We thank the reviewer for pointing out this readability issue. We intend to compare both Figures 2A and 3A. The text of the revised manuscript, in lines 146-148, has been adjusted with the correct figure numbers.

      (15) Introduction, second last paragraph, third last line. "suspension leaded to a broad distribution" I assume you meant "... led to a ..."

      We thank the reviewer for pointing out this typo. It has been corrected (line 122).

    1. Author response:

      The following is the authors’ response to the previous reviews

      Reviewer #1 (Public review):

      The authors have done a good job of responding to the reviewer's comments, and the paper is now much improved.

      Again, we thank the reviewer for positive comments during review.

      Reviewer #2 (Public review):

      I would like to thank the authors for the revision and the input they invested in this study.

      We are grateful for your thoughtful feedback and enthusiasm, which help us improve our manuscript.

      With the revised text of the study, my earlier criticism holds, and your arguments about the counterfactual approach are irrelevant to that. The recent rise of the counterfactual approach might likely mirror the fact that there are too many scientists behind their computers, and few go into the field to collect in situ data. Studies like the one presented here are a good intellectual exercise but the real impact is questionable. 

      We understand your concern about the relevance of the counterfactual approach used in our study. Our intent in using a counterfactual scenario (reconstructing migration patterns assuming pre-uplift conditions on the QTP) was to isolate the potential influence of the plateau’s geological history on current migration routes. A similar approach has been widely used to estimate how biogeographic barriers facilitated the divergent vertebrate communities across the world (e.g., Williams et al. 2024). We agree that such an approach must be used carefully. In the revision, we have explicitly clarified why this counterfactual comparison is useful: it provides a theoretical baseline to test how much the QTP’s uplift (and the associated monsoon system) might have redirected migration paths (Gilbert and Lambert 2010, Sanmartín 2012, Bull et al. 2021). We acknowledge that the counterfactual results are theoretical and have explicitly emphasised the assumptions involved (i.e., species–environment relationships hold between pre- and post-uplift environments) in the main text (Lines 91-98). Nonetheless, we defend the approach as a valuable study design: it helps generate testable hypotheses about migration (for instance, that the plateau’s monsoon-driven climate, rather than just its elevation, introduces an east–west shift en route).

      References:

      Bull, J. W., N. Strange, R. J. Smith, and A. Gordon. 2021. Reconciling multiple counterfactuals when evaluating biodiversity conservation impact in social-ecological systems. Conservation Biology 35:510-521.

      Gilbert, D., and D. Lambert. 2010. Counterfactual geographies: worlds that might have been. Journal of Historical Geography 36:245-252.

      Sanmartín, I. 2012. Historical Biogeography: Evolution in Time and Space. Evolution: Education and Outreach 5:555-568.

      Williams, P. J., E. F. Zipkin, and J. F. Brodie. 2024. Deep biogeographic barriers explain divergent global vertebrate communities. Nature Communications 15:2457.

      All your main conclusions are inferred from published studies on 7! bird species. In addition, spatial sampling in those seven species was not ideal in relation to your target questions. Thus, no matter how fancy your findings look, the basic fact remains that your input data were for 7 bird species only! Your conclusion, “our study provides a novel understanding of how QTP shapes migration patterns of birds” is simply overstretching.

      We appreciate the reviewer’s comment here. We would like to clarify that our conclusions regarding longitudinal shifts in migratory distributions are based on distribution models derived from eBird data of 50 species, not merely on migration tracks from seven species. These species-level spatiotemporal models allow us to infer large-scale biogeographic patterns across the Qinghai-Tibet Plateau (QTP).

      The original seven tracking species were used specifically for analysing the relationship between migration directions (azimuths) and environmental variables, offering independent support for the patterns revealed in the eBird-based distribution models. Recognising the reviewer’s concern about sample size and coverage, we have now expanded this part by incorporating migration tracks from 12 additional species, derived through georeferenced digitisation of published migratory maps. Importantly, this expansion did not change our conclusions, i.e., the monsoons, rather than the high elevations, play a prominent role in shaping the current migration direction of birds in the QTP. While the overall conclusion remains unchanged, the expanded dataset led to slight changes in the differences between spring and autumn migration. We have updated Figure 2 and the corresponding results and conclusions throughout the manuscript. We have also clarified in the Discussion that regions of the QTP with relatively less data might lead to underestimation of some migration routes, so that readers are aware of these data limitations (Lines 211-218).
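
      For readers unfamiliar with migration azimuths, the sketch below computes the initial great-circle bearing between two consecutive track fixes using the standard spherical forward-azimuth formula. This is a generic illustration under a spherical-Earth assumption; the function name is hypothetical, and the study's own azimuth calculation may differ.

```python
import math


def initial_bearing_deg(lat1, lon1, lat2, lon2):
    """Initial great-circle bearing from point 1 to point 2.

    Inputs are in decimal degrees; the result is in degrees clockwise
    from north, in the range [0, 360).
    """
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlon = math.radians(lon2 - lon1)
    east = math.sin(dlon) * math.cos(phi2)
    north = (math.cos(phi1) * math.sin(phi2)
             - math.sin(phi1) * math.cos(phi2) * math.cos(dlon))
    return math.degrees(math.atan2(east, north)) % 360.0


# Simple sanity cases on the equator and prime meridian.
due_north = initial_bearing_deg(0.0, 0.0, 10.0, 0.0)
due_east = initial_bearing_deg(0.0, 0.0, 0.0, 10.0)
due_south = initial_bearing_deg(10.0, 0.0, 0.0, 0.0)
```

      Bearings computed per track segment can then be related to environmental variables; because they are circular quantities, circular statistics are usually the appropriate tool for averaging them.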

      The way you respond to my criticism on L 81-93 is something different than what you admit in the rebuttal letter. The text of the ms is silent about the drawbacks and instead highlights your perspective. I understand you; you are trying to sell the story in a nice wrapper. In the rebuttal you state: “we assume species' responses to environments are conservative and their evolution should not discount our findings.” But I do not see that clearly stated in the main text.

      Thanks, as suggested we have clearly stated the assumptions of niche conservatism in the Introduction (Lines 91-98).

      In your rebuttal, you respond to my criticism of "No matter how good the data eBird provides is, you do not know population-specific connections between wintering and breeding sites" when you responded: ... "we can track the movement of species every week, and capture the breeding and wintering areas for specific populations" I am having a feeling that you either play with words with me or do not understand that from eBird data nobody will be ever able to estimate population-specific teleconnections between breeding and wintering areas. It is simply impossible as you do not track individuals. eBird gives you a global picture per species but not for particular populations. You cannot resolve this critical drawback of your study. 

      We agree that inferring population-specific migratory connections (teleconnections) from eBird data is challenging and inherently limited. eBird provides occurrence records for species, but it generally cannot distinguish which breeding population an individual bird came from or exactly where it goes for winter. Our objective is not to determine one-to-one migratory links between specific populations, but to identify general broad-scale directional shifts when birds cross the QTP during their migration. We regret any confusion caused by our earlier wording. To make this clearer, we have now emphasised that our interests focus on the migratory direction and their environmental correlates, rather than population assignments. We have also rephrased the relevant text to explicitly clarify that our study operates at the species level and at large spatial scales (Lines 253–257). We exemplify how distribution of eBird observations and GPS tracking data of four species can be different from each other whilst showing similar migration patterns (Figure S10). We have also explicitly stated in the Discussion that confirming population connectivity would require targeted tracking or genetic studies, and that our eBird-based analysis could only suggest plausible routes and region-to-region linkages (Lines 200-202).

      I am sorry that you invested so much energy into this study, but I see it as a very limited contribution to understanding the role of a major barrier in shaping migration.

      We thank the reviewer for the honest assessment and understand the concern regarding the scope of our contribution. Our intention was not to provide an exhaustive account of all aspects of the QTP as a migratory barrier, but to address a specific and underexplored question: how the uplift of the plateau and the resulting monsoon system may have influenced the orientation of avian migration routes. By integrating both satellite tracking and community-contributed data, we have explored how the uplift of the QTP could shape avian migration across the area. We believe our findings provide important insights into how birds balance their responses to large-scale climate change and geological barriers, which yields the most comprehensive picture to date of how the QTP uplift has shaped migratory patterns of birds. We have also discussed the study’s limitations, including the small number of tracking species (Lines 205-218), the use of occurrence data as a proxy for breeding and wintering regions (Lines 200-202), the uneven sampling coverage in the QTP (Lines 202-205) and the assumptions behind the counterfactual scenario (Lines 91-98). This ensures that readers understand the context and constraints of our findings.

      My modest suggestion for you is: go into the field. Ideally use bird radars along the plateau to document whether the birds shift the directions when facing the barrier.

      We thank the reviewer for this suggestion. We agree that radar holds promise for understanding certain aspects of bird migration, particularly for detecting flight intensity, altitudes, and timing. However, radar systems currently struggle to resolve migration at the level of species, populations, or individuals, which are central to questions of migratory connectivity and route selection. Most radar signals cannot distinguish between species in mixed flocks, nor can they link breeding and wintering sites for tracked individuals. In addition, the spatial coverage of radar installations remains limited, especially across remote and high-elevation regions like the Qinghai-Tibet Plateau, where infrastructure and continuous power supply are still logistically prohibitive.

      The eBird dataset used in our study is itself a form of field-based observation, contributed by tens of thousands of birdwatchers across continents, including the QTP region (Figure S11). While eBird cannot provide individual-level tracking, it captures spatiotemporal patterns of occurrence at broad scales, making it a valuable complement to satellite tracking data. We would also emphasise that our team has extensive field experience in the Qinghai-Tibet Plateau (about twenty years), including multi-year expeditions to deploy satellite tags and observe migration at stopover sites.

      We agree that more direct tracking (e.g. GPS tagging) would be an ideal way to validate migration pathways and population connectivity. Using the satellite-tracking data, we have shown that most tracked species shifted their migration direction when facing the QTP (Figure S6). In this revision, as stated, we have added 12 more species with satellite tracking routes. We have also noted that future studies should build on our findings by using dedicated tracking of more individual birds and monitoring of migration over the QTP. We have cited recent advances in these techniques and suggested that incorporating more tracking data could further test the hypotheses generated by our work (Lines 205-218).

      Reviewer #2 (Recommendations for the authors):

      L55 "an important animal movement behaviour is.." Is there any unimportant animal movement? I mean this sentence is floppy, empty.

      We used this sentence to introduce migration. We have removed “important” to reduce ambiguous phrasing.

      L 152-154 This sentence is full of nonsense or your misinterpretation. First of all, the issue of inflexible initiation of migration was related to long-distance migrants only! The way you present it mixes apples and oranges (long- and short-distance migrants). It is not "owing to insufficient responses" but due to inherited patterns of when to take off, photoperiod and local conditions.

      We stated that this claim is invoked for long-distance migrants before this sentence and have rewritten the sentence to highlight that this interpretation is for long-distance migrants. 

      L 158 what is a migration circle? I do not know such a term.

      We have amended it as “annual migration cycle”, which is a more common way to describe the yearly round-trip journey between breeding and wintering grounds of birds.

      L 193 The way you present and mix capital and income breeding theory with your simulation study is quite tricky and super speculative.

      We thank the reviewer for raising this important concern. We have presented this idea as an inference rather than a conclusion: “This pattern could be consistent with a ‘capital breeding’ strategy, where birds rely on endogenous reserved energy gained prior to reproduction, rather than an ‘income’ strategy where birds ingest nutrients mainly collected during the period of reproductive activity. This corroborates studies on breeding strategies of migratory birds in Asian flyways. However, we note that this interpretation would require further study.” By adding this caution, we made it clear that we are not asserting this link as proven fact, only suggesting it as one possible explanation. We have also double-checked that the rest of the discussion around this point is framed appropriately. Moreover, to help illustrate why we raised this ecological interpretation, we would also draw attention to examples of satellite tracking points from several species (e.g., Beijing Swift, Demoiselle Crane) in the following, which show obvious shifts in migratory direction near the QTP region. These turning points suggest potential behavioral responses to environmental constraints, such as climatic corridors or energy availability, which could help motivate our discussion of possible capital breeding strategies in these species.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      In this study, the authors offer a theoretical explanation for the emergence of nematic bundles in the actin cortex, carrying implications for the assembly of actomyosin stress fibers. As such, the study is a valuable contribution to the field of actomyosin organization in the actin cortex. While the theoretical work is solid, experimental evidence in support of the model assumptions remains incomplete. The presentation could be improved to enhance accessibility for readers without a strong background in hydrodynamic and nematic theories.

      To address the weaknesses identified in this assessment, we have expanded the motivation and description of the theoretical model, specifically emphasizing the experimental evidence supporting its rationale and assumptions. These changes in the revised manuscript are implemented in the first two paragraphs of Section “Theoretical model” and in a more detailed description and justification of the different mathematical terms that appear in that section. We have made an effort in our narrative to map the different terms to mechanistic processes in the actomyosin network. Even if the nature of the manuscript is inevitably theoretical, we think that the revised manuscript will be more accessible to a broader spectrum of readers.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this article, Mirza et al developed a continuum active gel model of the actomyosin cytoskeleton that accounts for nematic order and density variations in actomyosin. Using this model, they identify the requirements for the formation of dense nematic structures. In particular, they show that self-organization into nematic bundles requires both flow-induced alignment and active tension anisotropy in the system. By varying model parameters that control active tension and nematic alignment, the authors show that their model reproduces a rich variety of actomyosin structures, including tactoids, fibres, asters as well as crystalline networks. Additionally, discrete simulations are employed to calculate the activity parameters in the continuum model, providing a microscopic perspective on the conditions driving the formation of fibrillar patterns.

      Strengths:

      The strength of the work lies in its delineation of the parameter ranges that generate distinct types of nematic organization within actomyosin networks. The authors pinpoint the physical mechanisms behind the formation of fibrillar patterns, which may offer valuable insights into stress fiber assembly. Another strength of the work is connecting activity parameters in the continuum theory with microscopic simulations.

      We thank the referee for these comments.

      Weaknesses:

      (A) This paper is a very difficult read for nonspecialists, especially if you are not well-versed in continuum hydrodynamic theories. Efforts should be made to connect various elements of theory with biological mechanisms, which is mostly lacking in this paper. The comparison with experiments is predominantly qualitative.

      We understand the point of the referee. While it is unavoidable to present the continuum hydrodynamic theory behind our results, we have made an effort in the revised manuscript to (1) motivate the essential features required from a theoretical model of the actomyosin cytoskeleton capable of describing its nematic self-organization (first two paragraphs of Section “Theoretical model”), and to (2) explicitly explain the physical meaning of each of the mathematical terms in the theory, and when appropriate, relate them to molecular mechanisms in the cytoskeleton. We hope that the revised manuscript addresses the concern of the referee.

      Regarding the comparison with experiments, it is indeed qualitative because the main point of the paper is to establish a physical basis for the self-organization of dense nematic structures in actomyosin gels. Somewhat surprisingly, we argue that a compelling mechanism explaining the tendency of actomyosin gels to form patterns of dense nematic bundles has been lacking. As we review in the introduction, these patterns are qualitatively diverse across cell types and organisms in terms of geometry and dynamics, and for this reason, our goal is to show that the same material in different parameter regimes can exhibit such qualitative diversity. A quantitative comparison is difficult for several reasons. First, many of the parameters in our theory have not been measured and are expected to vary wildly between cell types. In fact, estimates in the literature often rely on comparison with hydrodynamic models such as ours. For this reason, we chose to delineate regimes leading to qualitatively different emerging architectures and dynamics. Second, the patterns of nematic bundles found across cell types depend on the interaction between (1) the intrinsic tendency of actomyosin gels to form such structures studied here and (2) other elements of the cellular context. For instance, polymerization and retrograde flow from the lamellipodium, the physical barrier of the nucleus, and the interaction with the focal adhesion machinery are essential to understand the emergence of stress fibers in adherent cells. Cell shape and curvature anisotropy control the orientation of actin bundles in parallel patterns in the wings and trachea of insects. Nuclear positions guide the actin bundles organizing the cellularization of Sphaeroforma arctica [11]. Here, we focus on establishing that actomyosin gels have an intrinsic ability to self-organize into dense nematic bundles, and leave how this property enables the morphogenesis of specific structures for future work. We have emphasized this point in the revised section of conclusions.

      (B) It is unclear if the theory is suited for in vitro or in vivo actomyosin systems. The justification for various model assumptions, especially concerning their applicability to actomyosin networks, requires a more thorough examination.

      We thank the referee for this comment. Our theory is applicable to actomyosin gels originating from living cells. To our knowledge, the ability of reconstituted actomyosin gels from purified proteins to sustain the kind of contractile dynamical steady-states observed in living cells is very limited. In the revised manuscript, we cite a very recent preprint presenting very exciting but partial results in this direction [49]. Instead, reconstituted in vitro systems encapsulating actomyosin cell extracts robustly recapitulate contractile steady-states. This point has been clarified in the first paragraph of Section “Theoretical model”.

      (C) The classification of different structures demands further justification. For example, the rationale behind categorizing structures as sarcomeric remains unclear when nematic order is perpendicular to the axis of the bands. Sarcomeres traditionally exhibit a specific ordering of actin filaments with alternating polarity patterns.

      We agree with the referee and in the revised manuscript we have avoided the term “sarcomeric” because it refers to very specific organizations in cells. What we previously called “sarcomeric patterns”, where bands of high density exhibit nematic order perpendicular to the axis of the bands, is not a structure observed to our knowledge in cells. It is introduced to delimit the relevant region in parameter space. In the revised manuscript, we refer to this pattern as “banded pattern with perpendicular nematic organization” or “banded pattern” in short.

      (D) Similarly, the criteria for distinguishing between contractile and extensile structures need clarification, as one would expect extensile structures to be under tension contrary to the authors' claim.

      We thank the referee for raising this point, which was not sufficiently clarified in the original manuscript. We first note that in incompressible active nematic models, active tension is deviatoric (traceless and anisotropic) because an isotropic component would simply get absorbed by the pressure field enforcing incompressibility. Being compressible, our model admits an active tension tensor with deviatoric and isotropic components. We always consider a contractile (positive) isotropic component of active tension, but the deviatoric component can be either contractile (𝜅 > 0) or extensile (𝜅 < 0), where we follow the common terminology according to which in contractile/extensile active nematics the active stress is proportional to q with a positive/negative proportionality constant [see e.g. https://doi.org/10.1038/s41467-018-05666-8]. Furthermore, as clarified in the revised manuscript, total active stresses accounting for the deviatoric and isotropic components are always contractile (positive) in all directions, as enforced by the condition |𝜅| < 1.

      For fibrillar patterns, we need 𝜅 < 0, and therefore active stresses are larger perpendicular to the nematic direction. This means that the anisotropic component of the active tension is extensile, although, accounting for the isotropic component, total active tension is contractile (see Fig. 1c). This is now clarified in the text following Eq. 7 and in Fig. 1.
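
      The sign conventions described above can be checked numerically. The following is a minimal sketch, assuming a two-dimensional active tension of the form lam * (I + kappa * q) with q = S * (2 n⊗n − I); this normalisation and the names `active_tension`, `parallel_perpendicular`, `lam`, `S` and `angle` are illustrative assumptions and are not taken from Eq. 7 of the manuscript.

```python
import numpy as np

def active_tension(kappa, S=1.0, angle=0.0, lam=1.0):
    """Assumed 2D active tension lam * (I + kappa * q).

    q = S * (2 n n^T - I) is a traceless nematic tensor with order S and
    director n; this normalisation is an illustrative assumption.
    """
    n = np.array([np.cos(angle), np.sin(angle)])
    q = S * (2.0 * np.outer(n, n) - np.eye(2))
    return lam * (np.eye(2) + kappa * q), n

def parallel_perpendicular(sigma, n):
    """Tension components along the director n and perpendicular to it."""
    m = np.array([-n[1], n[0]])  # unit vector perpendicular to n
    return float(n @ sigma @ n), float(m @ sigma @ m)

for kappa in (0.5, -0.5):
    sigma, n = active_tension(kappa)
    s_par, s_perp = parallel_perpendicular(sigma, n)
    kind = "contractile" if kappa > 0 else "extensile"
    print(f"kappa={kappa:+.1f} ({kind} deviatoric part): "
          f"parallel={s_par:.2f}, perpendicular={s_perp:.2f}")
```

      Under these assumptions the two components are lam * (1 + kappa * S) and lam * (1 − kappa * S), so both stay positive whenever |kappa| < 1 and S ≤ 1, and the perpendicular component is the larger one when kappa < 0.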

      However, following fibrillar pattern formation and as a result of the interplay between active and viscous stresses, the total stress can be larger along the emergent dense nematic structures (“contractile structures”) or perpendicular to them (“extensile structures”). To clarify this point, in the revised Fig. 4 and the text referring to it, we have expanded our explanation and plotted the difference between the total stress component parallel to the nematic direction (𝜎∥) and the component perpendicular to the nematic direction (𝜎⊥), with contractile structures satisfying 𝜎∥ − 𝜎⊥ > 0 and extensile structures satisfying 𝜎∥ − 𝜎⊥ < 0. See lines 280 to 303. This is consistent with the common notion of contractile/extensile systems in incompressible nematic systems [see e.g. https://doi.org/10.1038/s41467-018-05666-8].
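
      The criterion stated above, 𝜎∥ − 𝜎⊥ > 0 for contractile structures and 𝜎∥ − 𝜎⊥ < 0 for extensile ones, can be sketched as a small helper. This is only an illustration: the function name `classify_structure`, the tolerance, and the example stress values are hypothetical and do not come from the simulations in the manuscript.

```python
import numpy as np

def classify_structure(sigma, director, tol=1e-12):
    """Classify a 2D total stress tensor relative to the nematic director.

    Returns ("contractile" | "extensile" | "isotropic", sigma_par - sigma_perp).
    """
    sigma = np.asarray(sigma, dtype=float)
    n = np.asarray(director, dtype=float)
    n = n / np.linalg.norm(n)        # unit director
    m = np.array([-n[1], n[0]])      # unit vector perpendicular to n
    diff = float(n @ sigma @ n - m @ sigma @ m)
    if diff > tol:
        return "contractile", diff
    if diff < -tol:
        return "extensile", diff
    return "isotropic", diff

# Hypothetical stress values, chosen only to exercise both branches.
print(classify_structure([[2.0, 0.0], [0.0, 1.0]], [1.0, 0.0]))
print(classify_structure([[0.8, 0.0], [0.0, 1.3]], [1.0, 0.0]))
```

      Note that both example tensors have positive components in every direction, so a structure can be classified as extensile by this criterion while the total tension remains positive.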

      (E) Additionally, it is unclear if the model's predictions for fiber dynamics align with observations in cells, as stress fibers exhibit a high degree of dynamism and tend to coalesce with neighboring fibers during their assembly phase.

      In the present work, we focus on the self-organization of a periodic patch of actomyosin gel. However, in adherent cells boundary conditions play an essential role, as discussed in our response to comment (A) by this referee. In ongoing work, we are using the present model to study the dynamics of assembly and reconfiguration of dense nematic structures in domains with boundary conditions mimicking those in adherent cells, possibly interacting with the adhesion machinery, and we find dynamical interactions such as those suggested by the referee. As an example, we show a video of a simulation where, at the edge of the circular domain, there is an actin influx modeling the lamellipodium, and in four small regions friction is higher, simulating focal adhesions. Under these boundary conditions, the model presented in the paper exhibits the kind of dynamical reorganizations alluded to by the referee.

      Author response video 1.

      We would like to note, however, that the prominent stress fibers in cells adhered to stiff substrates, so abundantly reported in the literature, are not the only instance of dense nematic actin bundles. In the present manuscript, we emphasize the relation of the predicted organizations with those found in different in vivo contexts not related to stress fibers, such as the aligned patterns of bundles in insects (trachea, scales in butterfly wings), in hydra, or in reproductive organs of C. elegans; the highly dynamical network of bundles observed in C. elegans early embryos; or the labyrinth patterns of micro-ridges in the apical surface of epidermal cells in fish.

      (F) Finally, it seems that the microscopic model is unable to recapitulate the density patterns predicted by the continuum theory, raising questions about the suitability of the simulation model.

      We thank the referee for raising this question, which needs further clarification. The goal of the microscopic model is not to reproduce the self-organized patterns predicted by the active gel theory. The microscopic model lacks essential ingredients, notably a realistic description of hydrodynamics and turnover. Our goal with the agent-based simulations is to extract the relation between nematic order and active stresses for a small homogeneous sample of the network. This small domain is meant to represent the homogeneous active gel prior to pattern formation, and it allows us to substantiate key assumptions of the continuum model leading to pattern formation, notably the dependence of isotropic and deviatoric components of the active stress on density and nematic order (Eq. 7) and the active generalized stress promoting ordering.

      We should mention that reproducing the range of out-of-equilibrium mesoscale architectures predicted by our active gel model with agent-based simulations seems at present not possible, or at least significantly beyond the state-of-the-art. To our knowledge, these models have not been able to reproduce the heterogeneous nonequilibrium contractile states involving sustained self-reinforcing flows underlying the pattern formation mechanism studied in our work. The scope of the discrete network simulations has been clarified in lines 340 to 349 in the revised manuscript.

      While agent-based cytoskeletal simulations are very attractive because they directly connect with molecular mechanisms, active gel continuum models are better suited to describe out-of-equilibrium emergent hydrodynamics at a mesoscale. We believe that these two complementary modeling frameworks are rather disconnected in the literature, and for this reason, we have attempted to substantiate some aspects of our continuum modeling with discrete simulations. We have emphasized the complementarity of the two approaches in the conclusions.

      Reviewer #1 (Recommendations For The Authors):

      Questions on the theory:

      Does rho describe the density of actin or myosin? The authors say that they are modeling actomyosin material as a whole, but the actin and myosin should be modeled separately. Along similar lines, does Q define the ordering of actin or myosin?

      Active gel models of the actomyosin cytoskeleton have been formulated with independent densities for actin and for myosin or using a single density field, implicitly assuming a fixed stoichiometry. Super-resolution imaging of the actomyosin cytoskeleton also suggests that in principle it makes sense to consider different nematic fields for actin and for myosin filaments. In the revised manuscript, we now explicitly mention that our density and nematic field are effective descriptions of the entire actomyosin gel (lines 82-84).

      A more detailed model would entail additional material parameters, not available experimentally, which may help reproduce specific experiments but would make the systematic study of the different behaviors much more difficult. Our approach has been to keep the model minimal while meeting the fundamental requirements outlined in the first paragraphs of Section “Theoretical model”.

      Should the active stress depend on material density? It seems strange (from Eq. 3) that active stress could be non-zero even where density is zero, since sigma_act does not depend on rho.

      Yes, active stress is assumed to be proportional to density. Eq. 3 in the original manuscript was misleading (it was multiplied by rho in Eq. 2). In the revised manuscript, we have explained with a bit more detail the theoretical model, clarifying this point.

      The authors should clearly explain their rationale for retaining certain types of nonlinear terms while ignoring others in theory. For instance, the nonlinearities in the equations of motion are sometimes quadratic in the fields, while there are also some cubic terms. Please remark up to what order in the fields the various interactions are modeled.

      We thank the referee for raising this point. The nonlinearities in the theory are easily explained on the basis of a small number of choices. We have added a new paragraph towards the end of Section “Theoretical model” (lines 145 to 152) providing a rationale for the origin and underlying assumptions leading to different nonlinearities.

      To connect with experiments and the biological context, please explain the biological origin of various terms in the model: (1) L-dependent terms in Eq. 2 and 4, (2) Flow-alignment of nematic order and experimental evidence in support of it, (3) density-dependent susceptibility terms in Eq. 4

      (1) Unfortunately, the L-dependent terms are very bulky, but they are very standard in nematic theories. The best way to understand their physical significance is through the expression of the nematic free-energy, which is now given and explained in the revised manuscript (Eq. 3). The resulting complicated expressions for the molecular field and the nematic stress (Eqs. 4 and 5) are mathematical consequences of the choice of nematic free energy. In the revised manuscript, we also attempt to provide a basis for these terms in the context of the actin cytoskeleton. (2) To our knowledge, the best reference supporting this term from experiments is Reymann et al, eLife (2016). In the revised manuscript, we have provided a physical interpretation. (3) We have expanded the motivation and plausible microscopic justification of this term.

      There are different 'activity' terms in the model. Their biophysical origin is not made clear. For example, the authors should make clear if these activities arise from filament or motor activity. Relatedly, the authors should provide a comprehensive discussion of the signs of the different active parameters and their physical interpretations.

      In an active gel model, activity parameters are phenomenological and how they map to molecular mechanisms is not precisely known, although conventionally contractile active tension is ascribed to the mechanical transduction of chemical power by myosin motors. The fact is that, besides myosin activity, there are many nonequilibrium processes in the actomyosin cytoskeleton that may lead to active stresses including (de)polymerization of filaments or (un)binding of crosslinkers. In the revised manuscript, we have added sentences illustrating how different terms may result from microscopic mechanisms, but providing a precise mapping between our model and nonequilibrium dynamics of proteins is beyond the scope of our work, although our discrete network simulations address this issue to a certain degree.

      Following the suggestion of the referee, our description of the theory now discusses much more extensively the signs of activity parameters and their physical interpretations, e.g. the text following Eq. 7.

      Throughout the paper, various activity terms are varied independently of each other. Is that a reasonable assumption given that activities should depend on ATP and are thus not independent of one another?

      We agree that, ultimately, all active processes depend on the conversion of chemical energy into mechanical energy. However, recent work has highlighted how active tension also depends on the microscopic architecture of the network controlled by multiple regulators of the actomyosin cytoskeleton (e.g. Chugh et al, Nat Cell Biol, 2017). It is reasonable to expect that, for a given rate of ATP consumption, chemical power will be converted into mechanical power in different ways depending on the micro-architecture of the cytoskeleton, e.g. the stoichiometry of filaments, crosslinkers, myosins, or the length distribution of filaments (very long filaments crosslinked by myosins may be difficult to reorient but may contract efficiently).

      We have added a paragraph in Section “Theoretical model” with a discussion, lines 153 to 156.

      Sarcomeres are muscle fibers that exhibit an alternating polarity pattern. Such patterning is not evident in what the authors call 'sarcomeres' in Fig. 2. I believe the authors should revise their terminology and not loosely interpret existing classifications in the field.

      We thank the referee for raising this point. We have changed the terminology.

      Fig 2a: Is the cartoon for filament alignment incorrect for kappa>0?

      The cartoon is correct. In the revised manuscript we have explained more clearly the physical meaning of kappa in the text following Eq. 7. In the caption of Fig. 1 and of Fig. 2a, we have also clarified that when the absolute value of kappa is <1, then active tension is positive in all directions.

      Within the section "Requirements for fibrillar and banded patterns", it will be useful to show the figures for varying the different active parameters in the main figures.

      We have followed the referee’s suggestion and moved Supp. Fig. 1 of the original manuscript to the main figures.

      How do the authors decide if bundles are contractile or extensile? Why are contractile bundles under tension while extensile bundles are under compression? I would expect the opposite.

      We agree that this point deserves a more detailed explanation. In the revised manuscript and in the new Figure 4, we further develop this point. The fibrillar pattern forms when kappa<0. We further assume that -1<kappa<0, so that active tension is positive in all directions. In this regime, the deviatoric (anisotropic) part of active tension is extensile. However, following pattern formation and because of the interplay between active and viscous stresses, the total stress in the emerging bundles may become extensile or contractile, depending on whether the largest component of stress is perpendicular or along the bundle axis. This is now presented in the updated figure, with new panels presenting maps of the total tension. The text discussing this point has been rewritten and we hope that the new version is much clearer (lines 280 to 303).

      A contractile bundle tends to shorten, but it cannot do so because of boundary conditions or the interaction with other bundles. As a result, it is in tension. Conversely, an extensile bundle tries to elongate, but being constrained, it becomes compressed. As an analogy, consider the cortex of a suspended cell. The cortex is contractile, but it cannot contract because of volume regulation in the cell, which is typically pressurized. As a result, tension in the cortex is positive, as shown by Laplace’s law [10.1016/j.tcb.2020.03.005]. We have tried to clarify this point in the revised manuscript.
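
      The Laplace's law argument in the analogy above can be written out for the simplest case. The idealisation of the suspended cell as a sphere of radius R with uniform cortical tension T and excess internal pressure ΔP, as well as these symbols, are assumptions made here for illustration.

```latex
% Force balance on a hemisphere of an idealised spherical cell:
% the pressure force on the cross-section equals the cortical tension
% integrated along the equator.
\Delta P \, \pi R^{2} = T \, 2 \pi R
\quad \Longrightarrow \quad
\Delta P = \frac{2T}{R},
\qquad
T = \frac{\Delta P \, R}{2} > 0 \quad \text{whenever } \Delta P > 0 .
```

      In this idealisation, a pressurized cell (ΔP > 0) necessarily carries a positive cortical tension, even though the contractile cortex is prevented from shortening.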

      Can the authors reproduce alternating density patterns using the cytosim simulations? This is an important step in establishing the correspondence between the continuum theory and the agent-based model.

      We have addressed this point in our response to public comment (F) of this referee.

      The authors do not provide code or data.

      The finite element code, with an input file required to run a representative simulation in the paper, is now made available, see Ref. [74].

      The customizations of Cytosim needed to account for nematic order in our discrete network simulations are available, see Ref. [98].

      Reviewer #2 (Public Review):

      Summary:

      The article by Waleed et al discusses the self organization of actin cytoskeleton using the theory of active nematics. Linear stability analysis of the governing equations and computer simulations show that the system is unstable to density fluctuations and self organized structures can emerge. While the context is interesting, I am not sure whether the physics is new. Hence I have reservations about recommending this article.

      We thank the referee for these comments. In the revised manuscript, we have highlighted the novelty, particularly in the last paragraph of the introduction, the first two paragraphs of Section “Theoretical model”, and in the conclusions. Despite a very large literature on theoretical models of stress fibers, actin rings, and active nematics, we argue that the active self-organization of dense nematic structures from an isotropic and low-density gel has not been compellingly explained so far. Many models assume from the outset the presence of actin bundles, or explain their formation using localized activity gradients. The literature on active nematics has extensively studied symmetry breaking and self-organization. However, most of the works assume initial orientational order. Only a few works study the emergence of nematic order from a uniform isotropic state, but they consider dry systems lacking hydrodynamic interactions or incompressible and density-independent systems [37,38]. Yet, pattern formation in actomyosin gels is characterized by large density variations, and by highly compressible flows, which coordinate in a mechanism relying on an advective instability and self-reinforcing flows.

      Our theoretical model is not particularly novel, and as we mention in the manuscript, it can be particularized to different models used in the literature. However, we argue that it has the right minimal features to capture nematic self-organization in actomyosin gels. To our knowledge, no previous study explains the emergence of dense and nematic structures from a low-density isotropic gel as a result of activity and involving the advective instability typical of symmetry-breaking and patterning in the actomyosin cytoskeleton. These are important qualitative features of our results that resonate with a large experimental record, and as such, we believe that our work provides a new and compelling mechanism relying on self-organization to explain the prominence and diversity of patterns involving dense nematic bundles in the actomyosin cytoskeleton across species.

      Strengths:

      (i) Analytical calculations complemented with simulations (ii) Theory for cytoskeletal network

      Weaknesses:

      Not placed in the context or literature on active nematics.

      We agree with the referee that this was a weakness of the original manuscript. In the revised manuscript, within reasonable space constraints given the size and dynamism of the field of active nematics, we have placed our work in the context of this field (end of introduction and first two paragraphs of Section “Theoretical model”). The published version of our companion manuscript [45] also contributes to providing a clear context to our theoretical model within the field.

      Reviewer #2 (Recommendations For The Authors):

      The article by Waleed et al discusses the self organization of actin cytoskeleton using the theory of active nematics. Linear stability analysis of the governing equations and computer simulations show that the system is unstable to density fluctuations and self organized structures can emerge. While the context is interesting, I am not sure whether the physics is new. Hence I have reservations about recommending this article. I explain my questions and comments below.

      We have responded to this comment above.

      (i) Active nematics including density variations have been dealt with quite extensively in the literature. For example, the works of Sriram Ramaswamy have dealt with this system, including linear stability analysis, simulations etc. In what way is the present work different from the system that they have considered?

      (ii) Active flows leading to self organization have been a topic of discussion in many works. For example: (a) Annual Review of Fluid Mechanics, Vol. 43:637-659, 2010, https://doi.org/10.1146/annurev-fluid-121108-145434; (b) S Santhosh, MR Nejad, A Doostmohammadi, JM Yeomans, SP Thampi, Journal of Statistical Physics 180, 699-709; (c) M. G. Giordano, F. Bonelli, L. N. Carenza, G. Gonnella and G. Negro, Europhysics Letters, Volume 133, Number 5. In what way is this work different from any of these?

      (iii) I am confused about the models used in the paper. There is significant literature from Prof. Mike Cates group, Prof. Julia Yeomans group, Prof. Marchetti's group who all use similar governing equations. In the present paper, I find it hard to understand whether the model used is similar to the existing ones in literature or are there significant differences. It should be clarified.

      Response to (i), (ii) and (iii).

      We completely agree with this referee (and also the previous referee) that the contextualization of our work in the field of active nematics was very insufficient. In the revised manuscript, the last paragraph of the introduction and the first two paragraphs of Section “Theoretical model” now address this point. In short, previous active nematic models predicting patterns with density variations have been either for dry active matter (disregarding hydrodynamic interactions), or for suspensions of active particles moving in an incompressible flow. None of these previous works predict nematic pattern formation as a result of activity relying on the advective instability and self-reinforcing compressible flows, leading to high-density and high-order bundles surrounded by an isotropic low-density phase. Yet, these are fundamental features observed in actomyosin gels. Many works deal with symmetry-breaking of a system with pre-existing order, but very few address how order emerges actively from an isotropic state. We thank the referee for pointing us to the paper by Santhosh et al, which nicely makes this argument and is now cited. Our mechanism is fundamentally different from that in Santhosh et al, whose model is incompressible and ignores density variations.

      We hope that the revised manuscript addresses this important concern.

      (iv) Below Eqn 6, it starts by saying that the “...origin..is clear...” It's not. I don't understand the physical origin of the instability, and this should be clarified, maybe with some illustrations.

      We apologize for this unfortunate sentence, which we have rewritten in the revised manuscript (lines 181 to 185).

      Reviewer #3 (Public Review):

      The manuscript "Theory of active self-organization of dense nematic structures in the actin cytoskeleton" analyzes self-organized pattern formation within a two-dimensional nematic liquid crystal theory and uses microscopic simulations to test the plausibility of some of the conclusions drawn from that analysis. After performing an analytic linear stability analysis that indicates the possibility of patterning instabilities, the authors perform fully non-linear numerical simulations and identify the emergence of stripelike patterning when anisotropic active stresses are present. Following a range of qualitative numerical observations on how parameter changes affect these patterns, the authors identify, besides isotropic and nematic stress, also active self-alignment as an important ingredient to form the observed patterns. Finally, microscopic simulations are used to test the plausibility of some of the conclusions drawn from continuum simulations.

      The paper is well written, figures are mostly clear and the theoretical analysis presented in both main text and supplement is rigorous. Mechano-chemical coupling has emerged in recent years as a crucial element of cell cortex and tissue organization and it is plausible to think that both isotropic and anisotropic active stresses are present within such effectively compressible structures. Even though not yet stated this way by the authors, I would argue that combining these two is one of the key ingredients that distinguishes this theoretical paper from similar ones. The diversity of patterning processes experimentally observed is nicely elaborated on in the introduction of the paper, though other closely related previous work could also have been included in these references (see below for examples).

      We thank the referee for these comments and for the suggestion to emphasize the interplay of isotropic and anisotropic active tension, which is possible only in a compressible gel. We have emphasized this point in several places in the revised manuscript. We also thank the referee for the suggestions to better connect with existing literature.

      To introduce the continuum model, the authors exclusively cite their own, unpublished pre-print, even though the final equations take the same form as previously derived and used by other groups working in the field of active hydrodynamics (a certainly incomplete list: Marenduzzo et al (PRL, 2007), Salbreux et al (PRL, 2009, cited elsewhere in the paper), Jülicher et al (Rep Prog Phys, 2018), Giomi (PRX, 2015),...). To make better contact with the broad active liquid crystal community and to delineate the present work more compellingly from existing results, it would be helpful to include a more comprehensive discussion of the background of the existing theoretical understanding on active nematics. In fact, I found it often agrees nicely with the observations made in the present work, an opportunity to consolidate the results that is sometimes currently missed out on. For example, it is known that self-organised active isotropic fluids form in 2D hexagonal and pulsatory patterns (Kumar et al, PRL, 2014), as well as contractile patches (Mietke et al, PRL 2019), just as shown and discussed in Fig. 2. It is also known that extensile nematics, \kappa<0 here, draw in material laterally of the nematic axis and expel it along the nematic axis (the other way around for \kappa>0, see e.g. Doostmohammadi et al, Nat Comm, 2018 "Active Nematics" for a review that makes this point), consistent with all relative nematic director/flow orientations shown in Figs. 2 and 3 of the present work.

      We thank the referee for these suggestions. Indeed, in the original submission we had outsourced much of the justification of the model and the relevant literature to a related pre-print, but this is not reasonable. The companion publication has now been accepted in the New Journal of Physics, with significant changes to better connect the work to the field of active nematics. A preprint reflecting those changes is available in Ref. [64], but we hope to reference the published paper that will come out soon.

      In the revised manuscript, we have significantly rewritten the Section “Theoretical model” to frame the continuum model in the context of the field of active nematics. While our model and results have commonalities with previous work, there are also important differences. We have highlighted the novelty of the present work along with the relation with previous studies and theoretical models in the last paragraph of the introduction and the first two paragraphs of Section “Theoretical model”. Furthermore, as suggested by the referee, we have made an effort to connect our results with previous work by Kumar, Mietke, Doostmohammadi and others.

      Regarding the last point alluded to by the referee (“extensile nematics, \kappa<0 here, draw in material laterally of the nematic axis and expel it along the nematic axis”), the picture raised by the referee would be nuanced for our compressible system as compared to the incompressible systems discussed in that reference. As we have elaborated in our response to point (D) of Referee #1, our systems are overall contractile (with positive active tension in all directions), but the deviatoric component of the active tension can be either extensile or contractile. In our “extensile” models (left in Fig. 2c), material is drawn in laterally to the nematic axis but it is not expelled along this axis. Instead, it is “expelled” by turnover. In the revised manuscript, we have added a comment about this.

      The results of numerical simulations are well-presented. Large parts of the discussion of numerical observations - specifically around Fig. 3 - are qualitative and it is not clear why the analysis is restricted to \kappa<0. Some of the observations resonate with recent discussions in the field, for example the observation of effectively extensile dynamics in a contractile system is interesting and reminiscent of ambiguities about extensile/contractile properties discussed in recent preprints (https://arxiv.org/abs/2309.04224). It is convincingly concluded that, besides nematic stress on top of isotropic one, active self-alignment is a key ingredient to produce the observed patterns.

      We thank the referee for these comments. We are reluctant to extend the detailed analysis of emergent architectures and dynamics to the case \kappa > 0 as it leads to architectures not observed, to our knowledge, in actin networks. In the revised manuscript, we have expanded and clarified the characterization of emergent contractile/extensile networks by reporting the relative magnitude of stress along and perpendicular to the nematic direction. Our revised manuscript clearly shows that even though all of our simulations describe locally contractile systems with extensile anisotropic active tension, the emergent meso-structures can be either extensile or contractile, with the extensile ones exhibiting the usual bend-type instability (a secondary instability in our system) described classically for extensile active nematic systems. We have rewritten the text discussing this (lines 280 to 303), where we have placed these results in the context of recent work reporting the nontrivial relation between the contractility/extensibility of the local units vs the nematic pattern.

      I compliment the authors for trying to gain further mechanistic insights into this conclusion with microscopic filament simulations that are diligently performed. It is rightfully stated that these simulations only provide plausibility tests and, within this scope, I would say the authors are successful. At the same time, it leaves open questions that could have been discussed more carefully. For example, I wonder what can be said about the regime \kappa>0 (which is dropped ad-hoc from Fig. 3 onward) microscopically, in which the continuum theory does also predict the formation of stripe patterns - besides the short comment at the very end? How does the spatial inhomogeneous organization the continuum theory predicts fit in the presented, microscopic picture and vice versa?

      We thank the referee for this compliment. We think that the point raised by the referee is very interesting. It is reasonable to expect that the sign of \kappa may not be a constant but rather depend on S and \rho. Indeed, for a sparse network with low order, the progressive bundling by crosslinkers acting on nearby filaments is likely to produce a large active stress perpendicular to the nematic direction, whereas in a dense and highly ordered region, myosin motors are more likely to effectively contract along the nematic direction and there is little room for additional lateral contraction by additional bundling. As discussed in our response to referee #1, we believe that studying the formation of patterns using the discrete network simulations is far beyond the scope of our work. We discuss in lines 332 to 341, as well as in the last paragraph of the conclusions, the scope and limitations of our discrete network simulations.

      Overall, the paper represents a valuable contribution to the field of active matter and, if strengthened further, might provide a fruitful basis to develop new hypothesis about the dynamic self-organisation of dense filamentous bundles in biological systems.

      Reviewer #3 (Recommendations For The Authors):

      • The statement "the porous actin cytoskeleton is not a nematic liquid-crystal because it can adopt extended isotropic/low-order phases" is difficult to understand and should be clarified, as the next paragraph starts formulating a nematic active liquid crystal theory. Do the authors mean a crystal that "Tends to be in a disordered phase?", according to its equilibrium properties? It would still be a "nematic liquid crystal", only its ground state is not a nematic phase.

      We agree with the referee, and we hope that changes in the introduction and in Section “Theoretical model” address this comment.

      • I could not find what Frank energy is precisely used, that would be helpful information.

      In the revised manuscript, we have provided the expression for the nematic free energy in Eq. 3.

      • The significance of the green/purple arrows in the Fig. 2a sketch is unclear; green arrows also appear in b,c: do they represent the same quantity? From the simulation images it is overall very difficult to see how the flows are oriented near the high-density regions (i.e. if they are towards / away from the strip).

      We thank the referee for bringing this up. The color coding of the sketches was confusing. The modified figures (Fig. 1(c) and Fig. 2(a)) now present a clearer and unified representation of anisotropic tension. The green arrows in Fig. 2(c) represent the out-of-equilibrium flows in the steady state. We agree that the zoom is insufficient to resolve the flow structure. For this reason, in the revised Fig. 2, we have added additional panels showing the flow with higher resolution.

      • It is currently unclear how the linear stability results - beyond identification of the parameter \delta - inform any of the remaining manuscript. Quantitative comparisons of the various length scales seen in simulated patterns (e.g. Fig. 2b, 3c etc) with linear predictions and known characteristic length scales would be instructive mechanistically, would make the overall presentation more compelling and probes limitations of linear results.

      In the revised manuscript, we have provided further information so that the readers can appreciate the predictions and limitations of the linear stability results. We have added a sentence and a figure to show that, in addition to the critical activity, the linear theory provides a good prediction of the wavelength of the pattern. See lines 199 to 201.

      • It is not clear what is meant by "[bundle-formation] requires that active tension perpendicular to nematic orientation is larger than along this direction", and therefore also not why that would be "counter-intuitive". If interpreted naively, I would say that a large tension brings in more filaments into the bundle, so that may well be an obviously helpful feature for bundle formation and maintenance. In any case, it would be helpful if clarity is improved throughout when arguments about "directions of tensions" are made.

      We have significantly rewritten the first paragraphs of section “Microscopic origin…” to clarify this point (lines 330 to 339). This paragraph, along with other changes in the manuscript such as the explanation of Eq. 7 or the discussion about the stress anisotropy in the new version of Fig. 4 (see lines 280 to 303), provide a better explanation of this important point.

      • All density color bars: Shouldn't they rather be labelled \rho/\rho_0?

      Yes! We have corrected this typo.

      • Scalar product missing in caption definition of order parameter Fig. 2

      We have corrected this typo.

      • Fig. 3a: I suggest to put the expression for q0 in the caption

      We have replaced q_0 with S_0 and clarified its meaning in the caption of what is now Fig. 4.

      • Paragraph on bottom right of page 6 should several times probably refer to Fig. 3c(...), instead of Fig. 3b

      We have corrected this typo.

    1. Author response:

      We thank all three reviewers for their thoughtful and constructive evaluations of our manuscript, “Generation of knock-in Cre and FlpO mouse lines for precise targeting of striatal projection neurons and dopaminergic neurons.” We are encouraged that the reviewers recognize the value, specificity, and utility of these new lines for the basal ganglia and dopamine research communities. Below, we summarize our planned revisions and clarifications in response to the reviewers’ comments.

      (1) Novelty and comparison with existing lines

      We appreciate Reviewer 1’s point regarding the existence of previously generated Cre and Flp lines targeting similar neuronal populations. Our project was initiated six years ago, and during the course of generating and characterizing all five lines, we became aware that similar individual lines have since been developed by other groups. Nevertheless, our study provides a coordinated and independently validated set of lines created using a standardized knock-in (KI) strategy and distributed through Jackson Laboratories for unrestricted community use. Importantly, whereas previous BAC transgenic approaches rely on random insertion, which can lead to position effects and ectopic expression, our design places the recombinase coding sequence immediately downstream of the endogenous stop codon using a self-cleaving T2A peptide. This ensures expression under native promoter and regulatory control, preserving physiological gene regulation.

      To address the Reviewers’ points, we will (i) expand the Introduction and Discussion to clarify the rationale and advantages of endogenous promoter–driven recombinase expression over BAC-based systems, emphasizing that our lines provide a uniform, promoter-controlled, and publicly accessible toolkit for the community, and (ii) explore including a comparative table summarizing differences in construct design, expression fidelity, and recombination efficiency across published lines (e.g., PMID 33979604, 38965445).

      (2) Quantification, validation, and comparison of Cre vs FlpO

      We agree with Reviewers 1 and 2 that further quantification and discussion of Cre versus FlpO fidelity will strengthen the manuscript. The observed difference in expression breadth between Cre and FlpO lines likely reflects a fundamental property of the recombinases themselves rather than a discrepancy in targeting. Cre recombinase is significantly more enzymatically efficient than FlpO, meaning that even very low endogenous levels of gene expression (e.g., Drd1a or Adora2a) can drive Cre-dependent recombination, whereas FlpO requires higher expression thresholds. Consequently, reporter-based readouts will inherently appear broader for Cre lines, despite both being driven by the same endogenous promoters.

      To address these points, we will (i) provide quantitative co-labeling analyses for the DAT-FlpO line with TH immunostaining to assess efficiency and specificity, (ii) clarify in the Results and Discussion that differences between Cre and FlpO expression patterns largely stem from differences in recombinase kinetics and sensitivity, not mismatched promoter activity, and (iii) include representative high-resolution images and relevant statistics in the revised figures. Importantly, we would like to note that RNAscope may not be an ideal validation approach in this context, as in situ transcript detection cannot capture the enzymatic threshold differences that determine reporter recombination and thus will not help address observed differences between Cre and FlpO lines. Finally, we are actively performing electrophysiological comparisons between Cre and FlpO lines to rigorously quantify potential physiological differences between them. Updated analyses will be incorporated as available or described as ongoing future work.

      (3) Discussion of scope and interpretation

      We appreciate the reviewers’ suggestions to better contextualize the scope of this resource. We will revise the Discussion to (i) highlight that the Cre–FlpO pairings enable powerful intersectional and cross-line strategies for dissecting basal ganglia and midbrain circuitry, and (ii) clarify that our goal was to generate a rigorously validated foundational resource, with detailed functional comparisons and manipulation studies to be explored in subsequent work.

      In summary, we thank the reviewers for their insightful feedback. The planned revisions and clarifications will underscore the unique strengths of our knock-in design, explore potential Cre–FlpO differences, and highlight the value of this standardized and accessible toolkit for the neuroscience community.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review): 

      Strengths: 

      Overall, this manuscript is well-written and contains a large amount of high-quality data and analyses. At its core, it helps to shed light on the overlapping roles of Edc3 and Scd6 in sculpting the yeast transcriptome. 

      Weaknesses: 

      (1) While the data presented makes conclusions about mRNA stability based on corresponding ChIP-Seq analyses and analyzing other mutants (e.g. Dcp2 knockout), at no point is mRNA stability actually ever directly assessed. This direct assessment, even for select transcripts, would further strengthen their conclusions. 

      We appreciate the reviewer’s concern but wish to emphasize that we conducted ChIP-Seq analysis of RNA Polymerase II occupancies in the CDSs of all genes, known to be a reliable indicator of transcription rate, and found only small increases in Pol II occupancies that cannot account for the increased transcript levels of the cohort of mRNAs up-regulated in the scd6∆edc3∆ double mutant (Fig. 3E). This provides strong evidence that increased transcription is not the main driver of increased mRNA abundance in this mutant. Bolstering this conclusion, we showed that the Hap2/Hap3/Hap4/Hap5 complex of transcription factors responsible for induction of Ox. Phos. genes was not activated in scd6Δedc3Δ cells in glucose medium (Fig. 6F(ii)); nor was the Adr1 activator of CCR genes activated (Fig. S9C(i)), ruling out transcriptional induction of their target genes in glucose-replete scd6Δedc3Δ cells and instead favoring reduced degradation as the mechanism underlying derepression of Ox. Phos. and CCR gene transcripts in this mutant. In Fig. 3B, we further showed that the majority of mRNAs up-regulated in the scd6Δedc3Δ double mutant are also derepressed by dcp2Δ, and in Fig. 3D that the mRNAs up-regulated in scd6∆edc3∆ cells exhibit a higher than average codon protection index (CPI), indicating a heightened involvement of decapping and co-translational degradation by Xrn1 in their decay. To provide additional support for our conclusion, we have conducted new experiments to measure the abundance of capped mRNAs genome-wide by CAGE sequencing of total mRNA in both WT and scd6∆edc3∆ cells. As established previously, normalizing CAGE TPMs to total mRNA TPMs determined by RNA-Seq, dubbed the C/T ratio, provides a reliable measure of the capped proportion of each transcript. The new data presented in Fig. 3C indicate that the mRNAs up-regulated in the scd6∆edc3∆ mutant have significantly lower than average C/T ratios in WT cells, whereas the C/T ratios for the down-regulated transcripts are higher than average, and that these differences between the two groups and all expressed mRNAs are diminished in the scd6∆edc3∆ double mutant. These are the results expected if the up-regulated mRNAs are selectively targeted for decapping in WT cells dependent on Edc3/Scd6, whereas the down-regulated mRNAs are targeted by Edc3/Scd6 less than the average transcript. In the original version of the paper, we came to the same conclusion by analyzing our previous CAGE data for the dhh1∆ mutant for the same transcripts dysregulated in scd6∆edc3∆ cells, now presented as supportive data in Fig. S3F. Finally, we added the fact that among all four Dhh1 target mRNAs examined in the previous study of He et al. (2022) and found here to be up-regulated selectively in the scd6∆edc3∆ double mutant (Fig. S10), two of them (SDS23 and HXT6) were shown directly to have longer half-lives in dhh1∆ vs. WT cells by He et al. (2018). Hence, the combined evidence is compelling that selective up-regulation of particular mRNAs in the scd6∆edc3∆ mutant results from diminished decapping/decay rather than enhanced transcription; and we feel that the additional supporting evidence that would be provided by measuring half-lives of a small group of up-regulated transcripts would not justify the considerable effort required to do so. Moreover, the standard approach for such experiments of impairing transcription with an inhibitor of Pol II or a Pol II Ts⁻ mutation has been criticized because of the known buffering (suppression) of mRNA decay rates in response to impaired transcription.
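
      For readers who want to see the arithmetic behind the C/T ratio described above, the following is a minimal sketch and not the authors' analysis pipeline. It assumes per-transcript TPM tables are already available as dictionaries; the gene names, TPM values, the `min_rna_tpm` expression cutoff, and both function names are hypothetical and chosen only for illustration.

```python
from statistics import median


def ct_ratios(cage_tpm, rna_tpm, min_rna_tpm=1.0):
    """Return CAGE TPM / RNA-Seq TPM for each transcript.

    Transcripts below the (assumed) RNA-Seq expression cutoff, or missing
    from the CAGE table, are skipped so that near-zero denominators do not
    produce unstable ratios.
    """
    ratios = {}
    for gene, total in rna_tpm.items():
        if total >= min_rna_tpm and gene in cage_tpm:
            ratios[gene] = cage_tpm[gene] / total
    return ratios


def group_median(ratios, genes):
    """Median C/T ratio over the listed genes that passed filtering."""
    values = [ratios[g] for g in genes if g in ratios]
    return median(values) if values else None


# Made-up toy values for illustration only (not data from the study).
cage_wt = {"geneA": 20.0, "geneB": 90.0, "geneC": 50.0, "geneD": 0.2}
rna_wt = {"geneA": 100.0, "geneB": 100.0, "geneC": 100.0, "geneD": 0.5}

ratios_wt = ct_ratios(cage_wt, rna_wt)

up_regulated = ["geneA"]      # hypothetical "up in double mutant" group
down_regulated = ["geneB"]    # hypothetical "down in double mutant" group

all_median = group_median(ratios_wt, list(ratios_wt))
up_median = group_median(ratios_wt, up_regulated)
down_median = group_median(ratios_wt, down_regulated)
```

      In this toy setting, a group whose median C/T ratio falls below the median for all expressed transcripts would correspond to the pattern described for the up-regulated mRNAs in WT cells. A real analysis would additionally need replicate handling and a statistical test, which are outside the scope of this sketch.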

      (2) Scd6 and Edc3 show a high level of functional redundancy, as demonstrated by the double mutant. As these proteins form complexes with other decapping factors/activators, I'm curious if depleting both proteins in the double mutant destabilizes any of these other factors. Have the authors ever assessed the levels of other key decapping factors in the double mutants (i.e. Dhh1, Pat1, Dcp2...etc)? I wonder if depleting both proteins leads to a general destabilization of key complexes. It would also be interesting to see if depleting Edc3 or Scd6 leads to a concomitant increase in the other protein as a compensatory mechanism. 

      We thank the reviewer for this insight. Examining our Ribo-Seq and TMT-MS data revealed that Dhh1 expression and steady-state abundance are increased ~2-fold in the scd6∆edc3∆ strain, indicating that the up-regulation of many of the same mRNAs by scd6∆edc3∆ and dhh1∆ does not result indirectly from reduced levels of Dhh1 in the scd6∆edc3∆ mutant. The predicted increase in Dhh1 expression might signify a compensatory response to the absence of Scd6/Edc3. We also observed an ~40% reduction in Dcp2 translation (RPFs) and mRNA abundance in the scd6∆edc3∆ strain, which might contribute to the up-regulation of mRNAs dysregulated in this mutant. However, our new immunoblot analyses revealed no significant reduction in steady-state Dcp2 levels in scd6∆edc3∆ cells (Input lanes in Figs. 3F and S4C(i)-(ii)). Moreover, our previous finding that the majority of mRNAs subject to NMD, up-regulated by both upf1∆ and dcp2∆, are not up-regulated by scd6∆edc3∆ implies that Dcp2 abundance in scd6∆edc3∆ cells is adequate for normal levels of NMD and favors a direct role for Scd6/Edc3 in accelerating degradation of most transcripts up-regulated in this mutant. We have added these points to the DISCUSSION.

      (3) While not essential, it would be interesting if the authors carried out add-back experiments to determine which domain within Scd6/Edc3 plays a critical role in enforcing the regulation that they see. Their double mutant now puts them in a perfect position to carry out such experiments.

      We agree with the reviewer that our scd6∆edc3∆ strain provides an opportunity to dissect the Scd6 and Edc3 proteins to determine which domains and motifs of each protein are most critically required for their functions in activating mRNA decay. However, if conducted thoroughly, this would entail an extensive analysis requiring a combination of genetics, biochemistry and genomics.  Considering the large amount of data already presented in 43 and 34 panels of main and supplementary figures, respectively, we feel that these additional experiments would be conducted more appropriately as a stand-alone follow-up study.

      Reviewer #2 (Public review): 

      Weaknesses: 

      The authors show very nicely in Figure S1A that growth phenotypes from scd6Δedc3∆ can be rescued by transformation of EDC3 (pLfz614-7) or SCD6 (pLfz615-5). The manuscript might benefit from using these rescue strategies in the analysis performed (e.g. RNA-seq, ribosome occupancies, and translational efficiencies). Also, these rescue assays could provide a good platform to further characterise the protein-protein interactions between Edc3, Scd6, and Dhh1. 

      We responded to this point immediately above in responding to Rev. #1.

      Reviewer #3 (Public review): 

      Weaknesses: 

      The limitations of the study include the use of indirect evidence to support claims that Edc3 and Scd6 recruit Dhh1 to the Dcp2 complex, which is inferred from correlations in mRNA abundance and ribosome profiling data rather than direct biochemical evidence. 

      While the reviewer makes a valid point, it is important to note that the greater correlations between effects of scd6∆edc3∆ with those conferred by dhh1∆ vs. pat1∆ also extended to changes in metabolites (Fig. 7A-C). To provide more direct evidence that Edc3 and Scd6 recruit Dhh1 to the Dcp2 complex, we have now conducted co-immunoprecipitation experiments (presented in new Figs. 3F and S5) demonstrating that association of Dhh1 with Dcp2 is diminished in the scd6∆edc3∆ double mutant but not in either scd6∆ or edc3∆ single mutant, thus providing biochemical support for our proposal.

      Also, there is limited exploration of other signals as the study is focused on glucose availability, and it is unclear whether the findings would apply broadly across different environmental stresses or metabolic pathways. Nonetheless, the study provides new insights into how mRNA decapping and degradation are tightly linked to metabolic regulation and nutrient responses in yeast. The RNA-seq and ribosome profiling datasets are valuable resources for the scientific community, providing quantitative information on the role of decapping activators in mRNA stability and translation control. 

      While not disputing the facts of this comment, we think it is unjustified to label as a weakness that our study focused on glucose-grown cells considering the large amount of new data and insights made possible by our multi-omics approach, presented in >70 separate figure panels and nine supplementary datafiles, which the reviewer has characterized as being valuable to the scientific community.  Parallel studies in non-preferred carbon or nitrogen sources are underway and represent large-scale investigations in their own right, for which the current dataset in glucose-replete cells provides the critical reference condition.

      Reviewer #1 (Recommendations for the authors): 

      The authors made a note that a set of 37 mRNAs is repressed exclusively by Edc3 with little contribution by Scd6, a list that includes the RPS28B mRNA. Edc3 has been previously reported to promote the decay of this mRNA in a deadenylation-independent fashion by binding to an element in its 3'UTR (PMIDs 15225544, 24492965). Can the authors comment on whether Edc3 may be binding to similar elements in the 3'UTRs of these transcripts in their shortlist? This could be an interesting topic matter for discussion as well. 

      While an interesting idea, this seems unlikely because the 3’UTR sequence in RPS28B mRNA was shown to bind Rps28 protein itself to confer heightened decapping and decay dependent on Edc3 in a negative autoregulatory loop that exerts tight control over Rps28 protein levels. It would be surprising if Edc3-mediated repression of the other 36 mRNAs involved Rps28, as none of them encodes a cytoplasmic ribosomal protein. Nevertheless, we searched for a conserved motif among the 3’UTRs of the 37 mRNAs using the MEME suite and found enrichment for motifs identified for RNA binding proteins Hrp1 and Nab2 and two novel motifs, but none of these motifs could be recognized within the Rps28 autoregulatory loop. We have chosen not to comment on these findings in the revised manuscript to avoid lengthening it unnecessarily with inconclusive observations.

      Reviewer #2 (Recommendations for the authors): 

      The authors show very nicely in Figure S1A that growth phenotypes from scd6Δedc3∆ can be rescued by the transformation of EDC3 (pLfz614-7) or SCD6 (pLfz615-5). The manuscript might benefit from using these rescue strategies on the analysis performed (e.g. RNA-seq, ribosome occupancies, and translational efficiencies); or expressing truncated mutants of EDC3 (pLfz614-7) or SCD6 (pLfz615-5), to show that they can act as dominant negative competitors, either on the binding to Dhh1 and Dcp2. 

      We addressed this comment above in our response to this Reviewer.

      Reviewer #3 (Recommendations for the authors): 

      (1) Labels such as "mRNA_up_s6,e3" are not defined in figures or the text. I suggest clearer sample labeling throughout. 

      The labels had been defined at first mention in the RESULTS but are now indicated there more explicitly, as well as in the legend to Fig. 1.

      (2) In Figure 1D it is surprising that the mRNA profile has a peak in the 5' UTR. I would expect to see such a peak in ribosome footprinting data. Is it possible these are incorrectly labeled?

      The figure is correctly labeled. Generally, one does not expect to see RPFs in the 5’UTR region unless there is an efficiently translated uORF, which appears not to be the case for MDH2.

      In general, the information in this panel and C is inadequate. None of the numbers are clearly explained in the figure legend or in the figure. 

      We had cited the legend to Fig. S3C for details of all such gene browser images but have now inserted this information into the Fig. 1D legend, at the first occurrence of such data in the regular figures. 

      (3) Figures 1C and 1D are in the wrong order.

      Corrected.

      (4) Figure 2D is a very complicated Venn Diagram. I suggest using UpSet plots as an alternative to Venn diagrams to more clearly convey overlaps between sets.  

      We provided additional explanatory text in the Fig. 2D legend to facilitate understanding.

      (5) The use of the same color scheme to represent different sets in panels of the same figure is a source of confusion. E.g. the cyan in Figures 2A, 2D, and 2E indicates unrelated categories, but one would think they are related.

      The use of the same cyan color in these three figure panels actually does designate results for the same set of 591 mRNAs up-regulated in the three mutants.  The application of the color schemes is now mentioned explicitly in Figs. 1, 2, and S3.

      (6) Reporting of p-values = 0 in figures is not useful.

      Corrected.

      (7) The whole manuscript is extremely long which reduces the overall impact. For example, the introduction is six pages long. I suggest reducing redundant text and being more concise to enhance readability. 

      We tried to streamline the text wherever possible, in particular shortening the Introduction by two pages.

      (8) Many abbreviations are used throughout the text that are not introduced the first time they are used. 

      Corrected throughout.

      (9) The ERCC normalization is unclear. Were the spike-ins added before cell lysis to allow estimation of per-cell RNA counts or to the extracted RNA? If added to extracted RNA rather than cells it is not clear to me how the claim can be made regarding increased mRNA abundance in the mutants. 

      We thank the reviewer for this comment. As we explained in the Methods, 2.4 µl of 1:100 diluted ERCC RNA Spike-In Control Mix 1 was added to 1.2 µg of each total RNA sample prior to cDNA library preparation. Because the majority of total RNA is comprised of rRNA, this normalization yields the abundance of each mRNA relative to rRNA. Owing to repression of rESR mRNAs encoding ribosomal proteins and biogenesis factors in the scd6∆edc3∆ strain (Fig. S3D), the ribosome content per cell is expected to be reduced in this mutant vs. WT. We showed previously that the isogenic dcp2∆ mutant, which elicits an ESR response of similar magnitude, showed a 30% reduction in bulk ribosomal subunits per cell compared to the same WT strain examined here (Vijjamarri et al., 2023). Assuming a similar reduction in ribosome abundance in the scd6∆edc3∆ mutant, the changes in mRNA per cell conferred by the scd6∆edc3∆ mutation are expected to be 0.7-fold of the ERCC-normalized values given in Fig. 3E, yielding fold-changes of 2.00 and 0.62 for the mRNA_up and mRNA_dn groups, respectively, which still differ substantially from the corresponding changes in normalized Rpb1 occupancies of 1.2 and 0.93, respectively. We have added this new analysis to the text of RESULTS.
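      The rescaling described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the authors' analysis code: the function name and dictionary keys are hypothetical, and the ERCC-normalized inputs are back-calculated from the adjusted fold-changes (2.00 and 0.62) and the assumed 0.7 ribosome ratio rather than read from Fig. 3E.

```python
# Minimal sketch of the per-cell correction, under the stated assumption that
# ribosome content in the mutant is 0.7 of WT (a 30% reduction).

RIBOSOME_RATIO = 0.7  # assumed mutant/WT ribosome content per cell


def per_cell_fold_change(ercc_normalized_fc, ribosome_ratio=RIBOSOME_RATIO):
    """Convert an mRNA/rRNA fold-change into an estimated per-cell fold-change."""
    return ercc_normalized_fc * ribosome_ratio


# Adjusted per-cell fold-changes quoted in the response.
adjusted = {"mRNA_up": 2.00, "mRNA_dn": 0.62}

# Back-calculated ERCC-normalized values implied by those numbers (not taken from the figure).
implied_ercc = {group: fc / RIBOSOME_RATIO for group, fc in adjusted.items()}

# Rpb1 occupancy changes quoted in the response, for comparison.
rpb1 = {"mRNA_up": 1.2, "mRNA_dn": 0.93}

for group in adjusted:
    per_cell = per_cell_fold_change(implied_ercc[group])
    print(group, round(implied_ercc[group], 2), round(per_cell, 2), rpb1[group])
```

      Under these assumptions the implied ERCC-normalized fold-changes are roughly 2.86 and 0.89, and the corrected values remain further from 1 than the corresponding Rpb1 occupancy changes.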

      (10) The use of the terms "up-regulated" and "derepressed" throughout is confusing. Both refer to observed increased abundance of mRNAs, but they imply different causes which are never clearly defined. 

      We changed all occurrences of “derepressed” to “up-regulated”.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review): 

      Summary: 

      This study on potassium ion transport by the protein complex KdpFABC from E. coli reveals a 2.1 Å cryo-EM structure of the nanodisc-embedded transporter under turnover conditions. The results confirm that K+ ions pass through a previously identified tunnel that connects the channel-like subunit with the P-type ATPase-type subunit. 

      Strengths: 

      The excellent resolution of the structure and the thorough analysis of mutants using ATPase and ion transport measurements help to strengthen new and previous interpretations. The evidence supporting the conclusions is solid, including biochemical assays and analysis of mutants. The work will be of interest to the membrane transporter and channel communities and to microbiologists interested in osmoregulation and potassium homeostasis. 

      Weaknesses: 

      There is insufficient credit and citation of previous work. 

      The manuscript has been thoroughly revised with special attention to acknowledging all past work relevant to the study.

      Reviewer #2 (Public review): 

      Summary: 

      The paper describes the high-resolution structure of KdpFABC, a bacterial pump regulating intracellular potassium concentrations. The pump consists of a subunit with an overall structure similar to that of a canonical potassium channel and a subunit with a structure similar to a canonical ATP-driven ion pump. The ions enter through the channel subunit and then traverse the subunit interface via a long channel that lies parallel to the membrane to enter the pump, followed by their release into the cytoplasm. 

      Strengths: 

      The work builds on the previous structural and mechanistic studies from the authors' and other labs. While the overall architecture and mechanism have already been established, a detailed understanding was lacking. The study provides a 2.1 Å resolution structure of the E1-P state of the transport cycle, which precedes the transition to the E2 state, assumed to be the rate-limiting step. It clearly shows a single K+ ion in the selectivity filter of the channel and in the canonical ion binding site in the pump, resolving how ions bind to these key regions of the transporter. It also resolves the details of water molecules filling the tunnel that connects the subunits, suggesting that K+ ions move through the tunnel transiently without occupying well-defined binding sites. The authors further propose how the ions are released into the cytoplasm in the E2 state. The authors support the structural findings through mutagenesis and measurements of ATPase activity and ion transport by surface-supported membrane (SSM) electrophysiology. 

      Weaknesses: 

      While the results are overall compelling, several aspects of the work raised questions. First, the authors determined the structure of the pump in nanodiscs under turnover conditions and observed several structural classes, including E1-P, which is detailed in the paper. Two other structural classes were identified, including one corresponding to E2. It is unclear why they are not described in the paper. Notably, the paper considers in some detail what might occur during the E1-P to E2 state transition, but does not describe the 3.1 Å resolution map for the E2 state that has already been obtained. Does the map support the proposed structural changes? 

      As was seen in previous work by Silberberg et al. (2022), imaging KdpFABC under turnover conditions can produce multiple enzymatic states. We focus on the E1~P state and associated biophysical analyses to provide a clear and concise story that is focused on the conduction pathway for K<sup>+</sup> ions. We continue to work with the cryo-EM data as well as other supporting methodologies and datasets with the goal of producing an additional manuscript that will describe other conformations. The class of particles producing the 3.1 Å structure shown in Fig. 1 – figure suppl. 2 is heterogeneous and thus requires further classification to elucidate conformational changes, as is apparent from the downstream processing of the E1 classes also shown in that figure. We therefore cannot derive any conclusions about the configuration of side chains at the CBS based on this structure. Nevertheless, two previous structures of the E2·Pi state – 7BGY and 7BH2, which were stabilized by MgF<sub>4</sub> and BeF<sub>x</sub>, respectively – show the structural change that is described in the paragraph discussing D583A. Given the consistency and relatively high resolution (2.9 and 3.0 Å, respectively) of these two independent structures, we believe that they provide strong support for our proposal for Lys586 acting as a built-in counter ion.

      The paper relies on the quantitative activity comparisons between mutants measured using SSM electrophysiology. Such comparisons are notoriously tricky due to variability between SSM chips and reconstitution efficiencies. The authors should include raw traces for all experiments in the supplementary materials, explain how the replicates were performed, and describe the reproducibility of the results. Related to this point above, size exclusion chromatography profiles and reconstitution efficiencies for mutants should be shown to facilitate comparison between measured activities. For example, could it be that the inactive V496R mutant is misfolded and unstable? 

      Similarly, are the reduced activities of V496W and V496H (and many other mutants) due to changes in the tunnel or poor biochemical properties of these variants? Without these data, the validity of the ion transport measurements is difficult to assess. 

      To address this concern, we have generated a series of supplementary figures for Figs. 2, 4, 5, and 6, which show all of the raw traces underlying our SSME data (Figure 2 - figure supplements 2-4, Figure 4 - figure supplement 1, Figure 5 - figure supplement 3, Figure 6 - figure supplement 2). We have also included further detail about the experimental protocols, including number and type of replicates, in an expanded "Activity Assays" section of Methods.

      In addition, we have included SEC profiles for each of the V496 mutants, which show that they are all well behaved in detergent solution prior to reconstitution (Fig. 4 - figure supplement 1). We are not able to directly document reconstitution efficiencies as it is not practical to separate proteoliposomes from unincorporated protein prior to preparing the sensors used for SSME. Binding currents are seen for several of the inactive mutants (e.g., Q116R in Rb and NH<sub>4</sub> in Fig. 2 - figure supplement 3 and V496R in Fig. 4 - figure supplement 1), which demonstrate that protein is indeed present in the corresponding proteoliposomes even though no sustained transport current is observed.

      The authors propose that the tunnel connecting the subunits is filled with water and lacks potassium ions. This is an important mechanistic point that has been debated in the field. It would be interesting to calculate the volume of the tunnel and estimate the number of ions that might be expected in it, given their concentration in bulk. It may also be helpful to provide additional discussion on whether some of the observed densities correspond to bound ions with low occupancy.  

      As suggested, we calculated the internal volume of the tunnel within KdpA (from the S4 K<sup>+</sup> site to the KdpA/KdpB subunit interface) based on the profile derived from Caver. Based on this volume (4.9 x 10<sup>-25</sup> L), a single K<sup>+</sup> ion within this cavity would correspond to 3.4 M, which is near saturation for a solution of KCl. We added this information together with an acknowledgment of low-occupancy K<sup>+</sup> to the fourth paragraph of the Discussion:

      " Fourth, based on the volume of the cavity in KdpA, a single K<sup>+</sup> ion would correspond to a concentration of 3.4 M, suggesting that multiple ions would exceed the solubility limit especially in the absence of counterions. Finally, map densities within the tunnel were either of comparable strength or weaker than surrounding side chain atoms, unlike at S3 and canonical binding sites. Although it is possible that weaker density could represent low occupancy K<sup>+</sup> ions, we favor a mechanism whereby individual K<sup>+</sup> ions occupy the tunnel transiently as they transit between the selectivity filter and the canonical binding site."

      To make this analysis, we developed a Python script to calculate the volume of the tunnel as defined by the Caver software (this software is available via github.com/dls4n/tunnel). In turn, this enabled us to distinguish water molecules that were actually in the tunnel rather than bound more deeply within the structure of KdpA. As a result, we updated the water distribution plot in Fig. 4b. Notably, the 17 water molecules within this cavity would correspond to 57.8 M, which is reasonably near the expected 55 M for an aqueous solution.
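      The conversion from a particle count in the tunnel to an effective molar concentration can be checked with a short calculation. The sketch below is not the script referenced above; the function and variable names are hypothetical, and the only inputs are the cavity volume quoted in this response (4.9 x 10<sup>-25</sup> L) and the Avogadro constant.

```python
# Minimal sketch: effective molar concentration of N particles confined to a small volume.

AVOGADRO = 6.02214076e23  # particles per mole (exact SI value)


def molar_concentration(n_particles, volume_litres):
    """Return the concentration (mol/L) of n_particles in volume_litres."""
    return n_particles / (AVOGADRO * volume_litres)


tunnel_volume_l = 4.9e-25  # cavity volume quoted in the response, in litres

k_conc = molar_concentration(1, tunnel_volume_l)       # one K+ ion
water_conc = molar_concentration(17, tunnel_volume_l)  # 17 water molecules

print(f"single K+ ion: {k_conc:.2f} M")
print(f"17 waters:     {water_conc:.1f} M")
```

      With the unrounded single-particle value (about 3.39 M) the 17 waters come to roughly 57.6 M; multiplying the rounded 3.4 M by 17 gives the 57.8 M quoted above. Either way the result is close to the roughly 55 M expected for bulk water.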

      Reviewer #3 (Public review): 

      Summary: 

      By expressing protein in a strain that is unable to phosphorylate KdpFABC, the authors achieve structures of the active wild-type protein, capturing a new intermediate state, in which the terminal phosphoryl group of ATP has been transferred to a nearby Asp, and ADP remains covalently bound. The manuscript examines the coupling of potassium transport and ATP hydrolysis by a comprehensive set of mutants. The most interesting proposal revolves around the proposed binding site for K+ as it exits the channel near T75. Nearby mutations to charged residues cause interesting phenotypes, such as constitutive uncoupled ATPase activity, leading to a model in which lysine residues can occupy/compete with K+ for binding sites along the transport pathway. 

      Strengths:  

      Although this structure is not so different from previous structures, its high resolution (2.1 Å) is impressive and allows the resolution of many new densities in the potassium transport pathway. The authors are judicious about assigning these as potassium ions or water molecules, and explain their structural interpretations clearly. In addition to the nice structural work, the mechanistic work is thorough. A series of thoughtful experiments involving ATP hydrolysis/transport coupling under various pH and potassium concentrations bolsters the structural interpretations and lends convincing support to the mechanistic proposal. 

      Weaknesses: 

      The structures are supported by solid membrane electrophysiology. These data exhibit some weaknesses, including a lack of information to assess the rigor and reproducibility (i.e., the number of replicates, the number of sensors used, controls to assess proteoliposome reconstitution efficiency, and the stability of proteoliposome absorption to the sensor). 

      To address this concern, we have generated a series of supplementary figures for Figs. 2, 4, 5, and 6, which show all of the raw traces underlying our SSME data (Figure 2 - figure supplements 2-4, Figure 4 - figure supplement 1, Figure 5 - figure supplement 3, Figure 6 - figure supplement 2). We have also included further detail about the experimental protocols, including number and type of replicates, in the "Activity Assays" section of Methods.

      Reviewing Editor Comments

      After discussing the evaluations, the Reviewers and Reviewing Editor have identified the following essential revisions that would need to be addressed to improve the eLife assessment:

      (1) Work from others in the field should be adequately described and acknowledged: 

      (a) Page 2: " A series of X-ray and cryo-EM structures of KdpFABC from E. coli have led to proposals of a novel transport mechanism befitting the unprecedented partnership of these two superfamilies within a single protein complex." 

      The authors must give credit where credit is due (namely, the Haenelt/Paulino groups having discovered the transport pathway). Why don't they cite Stock et al., where this pathway was described first? The Stokes group proposed an entirely different pathway initially. 

      Explicit reference to this work has been added as follows:

      “A series of X-ray and cryo-EM structures of KdpFABC from E. coli (Huang et al., 2017; Silberberg et al., 2022, 2021; Stock et al., 2018; Sweet et al., 2021) indicate a novel transport mechanism befitting the unprecedented partnership of these two superfamilies within a single protein complex. As first proposed by Stock et al. (Stock et al., 2018), there is now a consensus that K<sup>+</sup> enters the complex from the extracellular side of the membrane through the selectivity filter of KdpA, but is blocked from crossing the membrane.”

      (b) Page 4 " As a result, many previous structures (Huang et al., 2017; Silberberg et al., 2021; Stock et al., 2018; Sweet et al., 2021) feature the S162A mutation to avoid inhibition rather than the fully WT protein used for the current work." 

      This is not correct. At least the work by Huang et al 2017 and Stock et al 2021 was done without the mutation. This is why the structures also captured the off-cycle state when no E2 inhibitor was used. But in Silberberg et al 2022 the mutant was used, but this is not mentioned 

      The Q116R mutant was used by Huang et al., but was indeed not used for the Stock et al. paper. We have replaced the sentence in the manuscript with the following:

      “Use of the KdpD knockout strain allowed us to produce WT and mutant protein free from Ser162 phosphorylation.”

      (c) Page 4: " In the paper, we report on the most highly populated state (44% of particles)". Exactly the same was also seen in detergent solution, which should be mentioned. 

      Reference to the Silberberg 2022 paper, where E1~P was the most highly populated state, has been added. The percentage of particles was removed as we are still processing data from the other states, which we hope will be described in a future manuscript.

      (d) Page 7 "Asp583 and Lys586 are two conserved residues on M5 that have previously been shown......indicating that this particular mutation interfered with energy coupling."  The lack of discussion of the Haenelt/Paulino 2021 paper, where they have analyzed the coupling in detail and described a proximal binding site where K+ is coordinated by D583 and the neighbouring Phe is very concerning. 

      To correct this oversight, we made the following changes to the text: 

      On pg. 7 in the Results section, we refer to the 2005 paper from Bramkamp & Altendorf:

      “Consistent with earlier work on this mutant (Bramkamp and Altendorf, 2005), the D583A mutant displayed substantial ATPase activity (30% of WT) but no transport, indicating that this particular mutation interfered with energy coupling.”

      At the end of pg. 10 in the Discussion, we revised the paragraph discussing Asp583 and Lys586 to explicitly refer to the mechanism of transport described in the 2021 paper from Silberberg et al., including proximal and distal binding sites as well as uncoupling due to the D583A mutation.

      “Similar to the Glu370/Arg493 charge pair in KdpA, Asp583 and Lys586 are the only charged residues in the membrane core of KdpB. Although they are not seen to interact directly in our structure, they coordinate accessory waters associated with the canonical binding site. Previous molecular dynamics simulations (Silberberg et al., 2021) indicate that Asp583 couples with Phe232 to form a “proximal binding site” for K<sup>+</sup> ions. Based on these simulations, these authors proposed a mechanism whereby neutralization of this site either by ion binding or by D583A substitution served to stimulate ATPase activity. Indeed, earlier work on D583A (Bramkamp and Altendorf, 2005) as well as current data demonstrate uncoupling, in which K<sup>+</sup>-independent ATPase activity was observed even though transport was abolished. A plausible explanation for this stimulation is seen in the behavior of Lys586 in previous structures of the E2·Pi state (7BGY and 7BH2) (Sweet et al., 2021). In these structures, M5 undergoes a conformational change that pushes the side chain of Lys586 into the CBS. As a consequence of the D583A mutation, this Lys could be freed to act as a built-in counter ion as in related P-type ATPases ZntA (Wang et al., 2014) and AHA2 (Pedersen et al., 2007). In regard to the proximal binding site and the partnering “distal binding site” on the KdpA-side of the subunit interface, our structure does not show densities at either site and thus does not provide any support for the related mechanism. In any case, in the WT complex it seems likely that Asp583 exerts allosteric control over Lys586 and ensures that its movement into the binding site is coordinated with the transition from E1~P to E2·Pi, thus leading to displacement of K<sup>+</sup> from the CBS and release to the cytoplasm.”

      (e) Page 8 " The intersubunit tunnel is arguably one of the most intriguing elements of the KdpFABC complex. Although it has been postulated to conduct K+, experimental evidence has been lacking. " 

      Incorrect, see Silberberg 2021. 

      On this point, we beg to differ. Although this 2021 paper shows densities in experimental cryo-EM maps and effects of mutations to residues at the KdpA and KdpB interface, the intra-tunnel transport mechanism is based on computational analysis (MD simulations) and not experimental evidence. We softened the statement to read as follows:

      “Although it has been postulated to conduct K<sup>+</sup>, direct experimental evidence has been hard to come by.”

      (f) In this context, also f232 is not mentioned anywhere in the text, although depicted in almost all figures. 

      Phe232 is shown as a point of reference for the KdpA/KdpB subunit interface. We added a reference to Phe232 in the Results section labeled “Intersubunit tunnel” as well as the paragraph in the Discussion addressed in point d) above.

      " These densities, which we have modeled as water, are most prevalent near the vestibule, which is the wider part of the tunnel, but then disappear completely at the subunit interface near Phe232, which is the narrowest part of the tunnel and also distinctly hydrophobic (Fig. 4)."

      " Previous molecular dynamics simulations (Silberberg et al., 2021) indicate that Asp583 couples with Phe232 to form a “proximal binding site” for K<sup>+</sup> ions."

      (g) Page 2 "Later, it was recognized that KdpA belongs to the Superfamily of K+ Transporters (SKT superfamily), which also includes bona fide K+ channels such as KcsA, TrkH and KtrB (Durell et al., 2000). " 

      KcsA is not a member of the SKT superfamily. 

      Thanks. This is correct, although the SKT superfamily is believed to have evolved from KcsA. KcsA has been removed from the sentence and a reference added to a review of the SKT superfamily:

      “which also includes bona fide K<sup>+</sup> channels such as TrkH and KtrB (Diskowski et al., 2015; Durell et al., 2000).”

      (2) Two other structural classes were identified, including one corresponding to E2. It is unclear why they are not described in the paper. Notably, the paper considers in some detail what might occur during the E1-P to E2 state transition, but does not describe the 3.1 Å resolution map for the E2 state that has already been obtained. Does the map support the proposed structural changes? 

      As was seen in previous work by Silberberg et al. (2022), imaging KdpFABC under turnover conditions can produce multiple enzymatic states. We focus on the E1~P state and associated biophysical analyses to provide a clear and concise story. We continue to work with the cryo-EM data as well as other supporting methodologies and datasets with the goal of producing an additional manuscript that will describe other conformations. The class of particles producing the 3.1 Å structure shown in Fig. 1 – figure suppl. 2 is heterogeneous and thus requires further classification to elucidate conformational changes, as is apparent from the downstream processing of the E1 classes also shown in that figure. We therefore cannot derive any conclusions about the configuration of side chains at the CBS based on this structure. Nevertheless, two previous structures of the E2·Pi state – 7BGY and 7BH2, which were stabilized by MgF<sub>4</sub> and BeF<sub>x</sub>, respectively – show the structural change that is described in the paragraph discussing D583A. Given the consistency and relatively high resolution (2.9 and 3.0 Å, respectively) of these two independent structures, we believe that they provide strong support for our proposal for Lys586 acting as a built-in counter ion.

      (3) The paper relies on the quantitative activity comparisons between mutants measured using SSM electrophysiology. Such comparisons are notoriously tricky due to variability between SSM chips and reconstitution efficiencies. The authors should include raw traces for all experiments in the supplementary materials, explain how the replicates were performed, and describe the reproducibility of the results. 

      To address this concern, we have generated supplementary figures for Figs. 2, 4, 5, and 6, which show all of the raw traces underlying our SSME data (Figure 2 - figure supplements 2-4, Figure 4 - figure supplement 1, Figure 5 - figure supplement 3, Figure 6 - figure supplement 2). We have also added a detailed description of replicates, sensor stability and the experimental protocols in the "Activity Assays" section of Methods. In addition, we have highlighted observations of pre-steady state binding currents that were seen for some mutants (e.g., Q116R assayed with Rb<sup>+</sup>, NH<sub>4</sub><sup>+</sup> and Na<sup>+</sup>), in which an initial, transient current response was observed without an ensuing transport current. The depiction of these raw data has allowed us to explain our use of the current response at 1.25 s, after decay of this binding current, as a measure of transport rate. This approach is consistent with recommendations by the manufacturer, as documented in their 2023 publication (Bazzone et al. https://doi.org/10.3389/fphys.2023.1058583).

      (4) Related to this point above, size exclusion chromatography profiles and reconstitution efficiencies for mutants should be shown to facilitate comparison between measured activities. For example, could it be that the inactive V496R mutant is misfolded and unstable? Similarly, are the reduced activities of V496W and V496H (and many other mutants) due to changes in the tunnel or poor biochemical properties of these variants? Without these data, the validity of the ion transport measurements is difficult to assess. 

      We have included SEC profiles for each of the V496 mutants, which show that they are all well behaved in detergent solution prior to reconstitution (Fig. 4 - figure supplement 1). We are not able to directly document reconstitution efficiencies as it is not practical to separate proteoliposomes from unincorporated protein prior to preparing the sensors used for SSME. Binding currents are seen for several of the inactive mutants (e.g., Q116R in Rb and NH<sub>4</sub> in Fig. 2 - figure supplement 3 and V496R in Fig. 4 - figure supplement 1), which demonstrate that protein is indeed present in the corresponding proteoliposomes even though no sustained transport current is observed.

      (5) What are the different lines in Figure 1 - Supplement 1, panel G? 

      This panel depicted a series of SSME traces as an example of the raw data, but has been removed from the revised version given the inclusion of all the raw traces. These new figures include a legend explaining the conditions for each trace.

      (6) How was the 44% population of the single-occupancy E1 state estimated (it does not correspond to the number of particles in Figure 1 - Supplement 2)? 

      The calculation of 44% for the E1~P state was premature, given that we are still analyzing the data from the turnover conditions. The revised manuscript simply states that E1~P represented the largest population of particles, which is consistent with this state preceding the rate-limiting step of the Post-Albers cycle. Reference is made to the Silberberg 2022 paper, which made a similar observation in a detergent-solubilized sample.

      (7) The text states that Km for Q116E is "<10 uM". However, the fitted value is 90 µM in Figure 2e. 

      This was a typographical error. The text now states that Km for Q116E is <100 µM.

      (8) The Km values for Rb, NH4, and Na in Figures 2g and h, and Na in Figure 2i do not make sense. They should be removed. 

      The values for Km were determined by fitting the Michaelis-Menten equation to the data as detailed in the Methods section. Although the curves visually appear rather flat relative to other ions, the fitting generated respectable confidence limits and the values are therefore defensible in a statistical context. Furthermore, the curves that are shown are based on those values of Km and it would be inappropriate not to cite them.
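      For readers unfamiliar with this kind of fit, the sketch below shows one simple way to estimate Km and Vmax by least squares. It is an illustration only and is not the fitting procedure described in the Methods: the function names, the grid-search strategy, and the synthetic noise-free data (generated with a hypothetical Km of 90 and Vmax of 2.0 in arbitrary units) are all assumptions, and no confidence limits are computed.

```python
# Minimal sketch: least-squares fit of v = Vmax * S / (Km + S).
# For each candidate Km the best Vmax has a closed form, so only Km is searched.


def michaelis_menten(s, vmax, km):
    """Rate predicted by the Michaelis-Menten equation at substrate concentration s."""
    return vmax * s / (km + s)


def fit_km_vmax(conc, rates, km_grid):
    """Return (Km, Vmax) minimizing the sum of squared residuals over km_grid."""
    best = None
    for km in km_grid:
        x = [s / (km + s) for s in conc]  # fractional saturation at this Km
        vmax = sum(xi * vi for xi, vi in zip(x, rates)) / sum(xi * xi for xi in x)
        sse = sum((vi - vmax * xi) ** 2 for xi, vi in zip(x, rates))
        if best is None or sse < best[0]:
            best = (sse, km, vmax)
    return best[1], best[2]


# Synthetic, noise-free data with hypothetical parameters (not measured values).
conc = [1.0, 3.0, 10.0, 30.0, 100.0, 300.0, 1000.0]
rates = [michaelis_menten(s, 2.0, 90.0) for s in conc]

km_grid = [float(k) for k in range(1, 501)]
km_fit, vmax_fit = fit_km_vmax(conc, rates, km_grid)
print(km_fit, vmax_fit)
```

      A flat-looking curve can still constrain Km when the data span concentrations on both sides of it; how well it does so is what the confidence limits on a real fit report.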

      (9) Figure 3 would benefit from a slice through the protein to orient the viewer. 

      Thanks for the suggestion. We have added panels to Figs. 3, 5 and 6 in an effort to orient the reader to the site that is depicted.

      (10) The differences between R493E, Q, and M do not appear to be significant. 

      The y-axis is logarithmic, which makes a visual comparison difficult. To alleviate this, P values were calculated by one-way ANOVA, and the results are indicated in Fig. 3c and 3d. They show that all of the Arg493 mutations have Km significantly higher than WT. Differences between R493E and R493Q, and between R493Q and R493M, are not significant at the p<0.01 level, while the difference between R493E and R493M is highly significant (p<0.001). The associated text on pg. 6 has been slightly modified as follows:

      “Changes to Arg493 generally increase Km (lower apparent affinity) without affecting Vmax, with Met substitution having greater effect than charge reversal (R493E).”
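      For readers unfamiliar with the test mentioned above, a minimal sketch of the one-way ANOVA F statistic follows. It is not the authors' analysis: the group values are hypothetical, the function name is an assumption, and the p value (which requires the F distribution, available for instance through `scipy.stats.f_oneway`) and any pairwise post hoc comparisons are omitted.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    # Variation of group means around the grand mean, weighted by group size.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of replicates around their own group mean.
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical replicate measurements (arbitrary units) for three constructs.
groups = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
f_stat, df_b, df_w = one_way_anova_f(groups)
print(f_stat, df_b, df_w)
```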

      (11) Page 5, paragraph 2. Q116R and G232D don't seem like the world's most intuitive mutations. It appears there is a historical reason for looking at these. Could the rationale be explained in the text? (Why R and D specifically?) 

      These mutations have historical significance, having been generated by random mutagenesis during early characterization of the Kdp system by Epstein and colleagues. A sentence containing relevant references has been added to this paragraph to provide this context:

      “Specifically, Q116R and G232D substitutions were initially discovered by random mutagenesis during early characterization of the Kdp system (Buurman et al., 1995; Epstein et al., 1978) and have featured in many follow-up studies (Dorus et al., 2001; Schrader et al., 2000; Silberberg et al., 2021; Sweet et al., 2020; van der Laan et al., 2002).”

      Below are the recommendations from each of the reviewers, some of which were not included as essential revisions, but that can also be helpful to further strengthen the manuscript. 

      Reviewer #1 (Recommendations for the authors): 

      It is essential that the authors correct their selective, incomplete, and in places inappropriate references to work from others in the field. 

      Specific points: 

      (1) Page 2: " A series of X-ray and cryo-EM structures of KdpFABC from E. coli have led to proposals of a novel transport mechanism befitting the unprecedented partnership of these two superfamilies within a single protein complex." 

      The authors must give credit where credit is due (namely, the Haenelt/Paulino groups having discovered the transport pathway). Why don't they cite Stock et al., where this pathway was described first? The Stokes group proposed an entirely different pathway initially. 

      (2) Page 4 " As a result, many previous structures (Huang et al., 2017; Silberberg et al., 2021; Stock et al., 2018; Sweet et al., 2021) feature the S162A mutation to avoid inhibition rather than the fully WT protein used for the current work." 

      This is not correct. At least the work by Huang et al 2017 and Stock et al 2021 was done without the mutation. This is why the structures also captured the off-cycle state when no E2 inhibitor was used. But in Silberberg et al 2022 the mutant was used, but this is not mentioned 

      (3) Page 4: " In the paper, we report on the most highly populated state (44% of particles)". Exactly the same was also seen in detergent solution, which should be mentioned. 

      (4) Page 7 "Asp583 and Lys586 are two conserved residues on M5 that have previously been shown......indicating that this particular mutation interfered with energy coupling."  The lack of discussion of the Haenelt/Paulino 2021 paper, where they have analyzed the coupling in detail and described a proximal binding site where K+ is coordinated by D583 and the neighbouring Phe is very concerning. 

      (5) Page 8 " The intersubunit tunnel is arguably one of the most intriguing elements of the KdpFABC complex. Although it has been postulated to conduct K+, experimental evidence has been lacking. " 

      Incorrect, see Silberberg 2021. 

      (6) In this context, also f232 is not mentioned anywhere in the text, although depicted in almost all figures. 

      References have been added to address all of these points. See item 1) under Reviewing Editor’s Comments above.

      Other points: 

      (7) Page 2 "Later, it was recognized that KdpA belongs to the Superfamily of K+ Transporters (SKT superfamily), which also includes bona fide K+ channels such as KcsA, TrkH and KtrB (Durell et al., 2000). " 

      KcsA is not a member of the SKT superfamily. 

      KcsA has been removed from the sentence and a reference added to a review of the SKT family:

      “which also includes bona fide K<sup>+</sup> channels such as TrkH and KtrB (Diskowski et al., 2015; Durell et al., 2000).”

      (8) Page 9 " Our demonstration of coupled transport of NH4+ and Rb+ G232D not only confirms that the selectivity filter governs ion selection, but that the pump subunit, KdpB, is relatively promiscuous."  Check grammar. 

      This sentence has been updated as follows:

      “Our observation that G232D is capable of coupled transport for NH<sub>4</sub><sup>+</sup> and Rb<sup>+</sup> confirms not only that the selectivity filter governs ion selection, but that the pump subunit, KdpB, is relatively promiscuous.”

      Reviewer #2 (Recommendations for the authors): 

      (1) From an editorial point of view, I suggest a few changes to enhance readability and clarity for non-specialists. A description of the overall transport cycle at the start of the paper (perhaps as a supplementary figure) could help put the work into perspective for general readers who may not be familiar with P-type ATPase mechanisms. It is unclear what "single" and "double" occupancy refer to in the structural classes description. Why is only one structural class described in detail? I would suggest moving the discussion of what is going on with the N-terminus of KdpB to the Results section, where it is described, and shortening the corresponding paragraph in the Discussion. I would furthermore suggest adding a figure that illustrates the proposed regulatory role of the terminus and how phosphorylation might affect it. Otherwise, this section of the results reads very hollow. 

      A diagram showing the Post-Albers cycle is shown as part of Fig. 1 and is described at the end of the second paragraph. This sentence only mentioned KdpB, which may have caused confusion. We therefore changed the sentence to read as follows:

      “Like other P-type ATPases, KdpFABC employs the Post-Albers reaction cycle (Fig. 1) involving two main conformations (E1 and E2) and their phosphorylated states (E1~P and E2-P) to drive transport (Albers, 1967; Post et al., 1969).”

      Single and double occupancy was meant to refer to the number of KdpFABC complexes residing in a nanodisc. This can be seen in the class averages in Fig. 1 - figure supplement 2. The legends to Fig. 1 figure supplements 1 and 2 have been revised to explain this observation more explicitly:

      "Slight asymmetry of the main peak is consistent with a subpopulation of nanodiscs containing two KdpFABC complexes (Fig. 1 - figure supplement 2)."

      and

      "A subset of these particles were further classified to generate four main classes representing nanodiscs with a single copy of KdpFABC in either E1 or E2 conformations, nanodiscs with two copies of KdpFABC which were mainly E1 conformation, and junk."

      As stated above, the class of particles producing the 3.1 Å structure shown in Fig. 1 – figure suppl. 2 is heterogeneous and requires further classification to elucidate conformational changes, as is apparent from the downstream processing of the E1 classes also shown in that figure. We continue to analyze the cryo-EM data and aim to produce a second manuscript that will include descriptions of other conformations together with the additional biophysical analysis related to their function.

      With regard to the N-terminus, we have gone on to generate a truncation of residues 2-9 in KdpB. After expression and purification, this construct remained coupled with ATPase and transport activities similar to WT, which makes proposals of a regulatory effect less compelling. Because of the novelty of observing the N-terminus and the possibility that it plays a subtle role in the kinetics of the cycle not revealed under the current assay conditions, we have retained a brief discussion of this structural observation, but moved it into the Results section as suggested.

      "Given the regulatory roles played by N- and C-termini of a variety of other P-type ATPases (Bitter et al., 2022; Cali et al., 2017; Lev et al., 2023; Timcenko et al., 2019; Zhao et al., 2021), we generated a construct in which residues 2-9 of the N-terminus of KdpB were truncated. However, ATPase and transport activities remained coupled at levels similar to WT, indicating that any functional role of the N-terminus is relatively subtle and not manifested under current assay conditions."

      (2) The wording "exceedingly strong densities" seems ambiguous. 

      We have changed this to “strong” in the Abstract and "exceptionally strong" in the Discussion. The precise values for these densities are shown in density histograms in Fig. 2 – figure supplement 1 and Fig. 5 – figure supplement 2. In the text, the densities are described as follows:

      Results sections describing the selectivity filter:

      "In fact, this S3 site contains the strongest densities in the entire map, measuring 7.9x higher than the threshold used for Fig. 2a (Fig. 2 – figure suppl. 1a)."

      Results section describing the CBS:

      "Given that this is the strongest density in KdpB, measuring 5.6x higher than the map densities shown in Fig. 5 (Fig. 5 – figure suppl 2b), we have modeled it as K<sup>+</sup>."

      (3) What are the different lines in Figure 1 - Supplement 1, panel G? 

      This panel depicted a series of SSME traces as an example of the raw data, but has been removed from the revised version given the inclusion of all the raw traces. These new figures include a legend explaining the conditions for each trace.

      (4) How was the 44 % population of the single-occupancy E1 state estimated (it does not correspond to the number of particles in Figure 1 - Supplement 2)? 

      The calculation of 44% for the E1~P state was premature, given that we are still analyzing the data from the turnover conditions. We will consider citing an updated value in a future publication once this analysis is complete. The revised manuscript simply states that E1~P represented the largest population of particles, which is consistent with this state preceding the rate-limiting step of the Post-Albers cycle. Reference was made to the Silberberg 2022 paper, where a similar observation was made.

      (5) Panel 1d is called out of order after panel 1e. Please label Ser 162 in the panel. 

      The order of these panels has been switched and Ser162 has been labelled as suggested.

      (6) Several panels in Figure 1- Supplement 1 are neither referenced nor described. 

      This figure supplement is referred to multiple times in the Results and the Methods sections of the text as well as in the figure legends. Although each panel is not individually referenced, all of this information is relevant at different points in the manuscript and is explained in the legend.

      (7) Is the coordinating geometry for the S3 site consistent with what was previously observed for KcsA and relatives? 

      The general arrangement of carbonyl atoms in the S3 site is the same in KcsA and KdpA, described by the MacKinnon group as a square antiprism. However, KcsA has strict four-fold symmetry and KdpA does not. As a result, there are small discrepancies between the coordinating geometries in the two structures. This point was made graphically in our original report on the X-ray structure of KdpFABC (Huang et al. 2017, Extended Data Fig. 3), though the positions of the carbonyls are more accurately determined in the current structure due to increased resolution. We added a sentence to the Selectivity Filter section of the Results stating the following:

      "This coordination geometry is also consistent with that seen in the K<sup>+</sup> channel KcsA, though the strict four-fold symmetry of that homo-tetramer produces a more regular structure, as indicated by the smaller variance in liganding distance (2.77 Å with s.d. 0.075 Å in 1K4C) and as depicted by Huang et al. in Extended Data Fig. 3 (Huang et al., 2017)."

      (8) Label G232D in Figure 2a. 

      G232 is out of the plane shown in Fig. 2a. However, we have added a label for Cys344 to help identify the selectivity filter strands that are shown. Note, however, that G232 is visible and labeled in Fig. 2 - figure suppl. 1. This has now been noted in the legend for Fig. 2.

      (9) The text states that Km for Q116E is "<10 µM". However, the fitted value is 90 µM in Figure 2e. 

      This was a typographical error. The text now states that Km for Q116E is <100 µM.

      (10) The Km values for Rb, NH4, and Na in Figures 2g and h, and Na in Figure 2i do not make sense. They should be removed. 

      The values for Km were determined by fitting the Michaelis-Menten equation to the data as detailed in the Methods section. Although the curves appear rather flat relative to those for other ions, the fits yielded respectable confidence limits and are therefore statistically defensible. Furthermore, the curves shown are based on those Km values, and it would be inappropriate not to cite them.

      (11) Figure 3 would benefit from a slice through the protein to orient the viewer. 

      Thank you for the suggestion. We have added panels to Figs. 3, 5 and 6 in an effort to orient the reader to the site that is depicted.

      (12) The differences between R493E, Q, and M do not appear to be significant. 

      The y-axis is logarithmic, which makes a visual comparison difficult. To alleviate this, P values were calculated by one-way ANOVA, and the results are indicated in Fig. 3c and 3d. They show that all of the Arg493 mutations have Km significantly higher than WT. Differences between R493E and R493Q, and between R493Q and R493M, are not significant at the p<0.01 level, while the difference between R493E and R493M is highly significant (p<0.001). The associated text on pg. 6 has been slightly modified as follows:

      “Changes to Arg493 generally increase Km (lower apparent affinity) without affecting Vmax, with Met substitution having greater effect than charge reversal (R493E).”

      Reviewer #3 (Recommendations for the authors): 

      Overall, the text was very clear, experiments were rationalized well, and conclusions were justified. A few small comments: 

      (1) Page 5, paragraph 2. Q116R and G232D don't seem like the world's most intuitive mutations. It appears there is a historical reason for looking at these. Could the rationale be explained in the text? (Why R and D specifically?) 

      These mutations are of historical importance, having been generated by random mutagenesis during early characterization of the Kdp system. A sentence containing relevant references has been added to this paragraph to provide this information as context:

      “Specifically, Q116R and G232D substitutions were initially discovered by random mutagenesis during early characterization of the Kdp system (Buurman et al., 1995; Epstein et al., 1978) and have featured in many follow-up studies (Dorus et al., 2001; Schrader et al., 2000; Silberberg et al., 2021; Sweet et al., 2020; van der Laan et al., 2002).”

      (2) Typo: page 14, "diluted" 

      This typo has been corrected.

      (3) The Methods section for SSM electrophysiology could use some additional description of how the data/statistics were collected. How many replicates? Were all replicates from a single sensor/ were multiple sensors examined? Were controls done to test whether the same number of liposomes remain absorbed by the sensor over the length of the experiment? 

      We have extended our description of experimental protocols in the "Activity Assays" section of Methods. This includes the number and type of replicates as well as a discussion of binding currents that were seen for some mutants. Furthermore, a new series of supplementary figures for Figs. 2, 4, 5, and 6 show all of the raw traces for the SSME measurements (Figure 2 - figure supplements 2-4, Figure 4 - figure supplement 1, Figure 5 - figure supplement 3, Figure 6 - figure supplement 2).

      We have included SEC profiles for each of the V496 mutants, which show that they are all well behaved in detergent solution prior to reconstitution (Fig. 4 - figure supplement 1). We are not able to directly document reconstitution efficiencies as it is not practical to separate proteoliposomes from unincorporated protein prior to preparing the sensors used for SSME. Binding currents are seen for several of the inactive mutants (e.g., Q116R in Rb and NH<sub>4</sub> in Fig. 2 - figure supplement 3 and V496R in Fig. 4 - figure supplement 1), which demonstrate that protein is indeed present in the corresponding proteoliposomes even though no sustained transport current is observed.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #2 (Public review):

      (1)  The sharpening model of expectation can predict surround suppression. The authors could further clarify how the cancellation model predicts a monotonic profile of expectation (Figure 1C) with the highest response at the expected orientation, while the cancellation model suggests a suppression of neurons tuned toward the expected stimulus.

      We thank the reviewer for the comment. We would like to emphasize that as the expected signal is suppressed, the relative weight or salience of unexpected inputs increases. We have clarified this interpretation in the manuscript as follows:

      “Here, given these two mechanisms making opposite predictions about how expectation changes the neural responses of unexpected stimuli, thereby displaying different profiles of expectation, we speculated that if expectation operates by the sharpening model with suppressing unexpected information, we should observe an inhibitory zone surrounding the focus of expectation, and its profile then should display as a center-surround inhibition (Fig. 1c, left). If, however, expectation operates as suggested by the cancelation model with highlighting unexpected information, the inhibitory zone surrounding the focus of expectation should be eliminated, and the profile should instead display a monotonic gradient (Fig. 1c, right).”

      (2) I'm a bit concerned about whether the profile solely arises from modulation of expectation. The two auditory cues are each associated with a fixed orientation, which may be confounded by other cognitive processes like visual working memory or attention (which I think the authors also discussed). Although the authors tried to use SFD task to render orientation task-irrelevant, luminance edges (i.e., orientation) and spatial frequency in gratings are highly intertwined and orientation of the gratings may help recall the first grating's SF (fixed at 0.9 c/°), especially given the first and second grating's orientations are not very different (4.8°).

      We agree that dissociating expectation from attention and other top-down processes remains a key challenge in visual expectation research (see Summerfield & Egner, 2009; Summerfield & de Lange, 2014; de Lange et al., 2018). As is generally acknowledged, expectation reflects the probability of a sensory event, while selective attention relates to its behavioral relevance. To minimize attentional influences, our task design ensured that grating orientation was not task-relevant: on each trial, participants discriminated either orientation or spatial frequency difference, such that orientation itself did not require attentional allocation, a point already discussed in the manuscript.

      Regarding visual working memory, we argue that even if participants recalled the first grating’s spatial frequency in the SFD task, they were not required to retain its precise spatial frequency (or orientation), as their task was simply to judge whether the second grating appeared denser or sparser. In other words, orientation (or spatial frequency) itself was not task-relevant. Moreover, although not included in the manuscript, we conducted a post-experiment debriefing in which participants were asked whether they noticed any association between the auditory tone and the grating orientation. None of the participants reported this relationship correctly, suggesting that the tone-orientation mapping remained implicit and was unlikely to be driven by strategic attention or memory.

      However, we acknowledge that certain confounding processes such as statistical learning or implicit mapping acquisition cannot be fully ruled out given the current paradigm. Future studies using methods with higher temporal resolution (e.g., EEG/MEG) may help to dissociate these mechanisms more precisely.

      (3) For each of the expected orientations (20° or 70°), the unexpected ones are linearly separable (i.e., all unexpected ones lie on one side of the expected angle). This might further encourage people to shift their attended or expected orientation, according to the optimal tuning hypothesis. Would this provide an alternative explanation to the tuning shift that the authors found?

      We thank the reviewer for pointing out the relevance of the optimal tuning hypothesis. We acknowledge that the optimal tuning theory (Navalpakkam & Itti, 2007) is an important framework, particularly in visual search paradigms, where attentional templates may shift away from non-target features to enhance discriminability.

      In our task, this hypothesis would predict a shift of expectation toward <20° in E20° trials and >70° in E70° trials, given that all unexpected orientations lie on one side of the expected angle. Importantly, the optimal tuning hypothesis predicts such shifts not only in Δ20°, Δ25°, and Δ30° trials but also in the Δ0° trials. In this regard, the observed shift in Δ20° and Δ30° (Experiment 2) and Δ25° (Experiment 3) trials is broadly consistent with the predictions of the optimal tuning account. However, we did not observe a corresponding shift away from nontarget features in the Δ0° condition, suggesting limited behavioral evidence for optimal tuning effects under our current task settings.

      It is important to note that most previous studies supporting optimal tuning (e.g., Navalpakkam & Itti, 2007; Scolari & Serences, 2009; Geng, DiQuattro, & Helm, 2017; Yu & Geng, 2019) have used visual search paradigms that differ from our design in several critical ways, including the number of stimuli presented, their spatial arrangement (eccentricity), task demands, and so on. Therefore, it is difficult to determine whether the optimal tuning hypothesis could serve as an alternative explanation within the context of our current study. We agree that future studies could further examine how such task parameters influence the presence or absence of optimal tuning.

      (4) It is great that the authors conducted computational modeling to elucidate the potential neuronal mechanisms of expectation. But I think the sharpening hypothesis (e.g., reviewed in de Lange, Heilbron & Kok, 2018) focuses on the neural population level, i.e., narrowing of population tuning profile, while the authors conducted the sharpening at the neuronal tuning level. However, the sharpening of population does not necessarily rely on the sharpening of individual neuronal tuning. For example, neuronal gain modulation can also account for such population sharpening. I think similar logic applies to the orientation adjustment experiment. The behavioral level shift does not necessarily suggest a similar shift at the neuronal level. I would recommend that the authors comment on this.

      We thank the reviewer for this to-the-point comment. As de Lange et al. (2018) noted, “there is not always a direct correspondence between neural-level and voxel-level selectivity patterns.” That is, neuronal tuning, population-level tuning, voxel-level selectivity, and behavioral adaptive outcomes may reflect different underlying mechanisms and do not necessarily align in a one-to-one fashion. We fully acknowledge that population-level tuning effects may also result from various neuronal mechanisms such as gain modulation (for review, see Salinas & Thier, 2000), shifts in preferred orientation (Ringach et al., 1997; Jeyabalaratnam et al., 2013), asymmetric broadening of tuning curves (Schumacher et al., 2022), or tuning curve sharpening (Ringach et al., 1997; Schoups et al., 2001).

      In our modeling, we implemented sharpening and shifts of neuronal tuning curves as a conceptual model simplification, intended to explore potential mechanisms underlying expectation-related center-surround suppression effects. While sharpening-based accounts (e.g., Kok et al. 2012) have often been emphasized, we stress that other mechanisms, such as gain modulation or tuning shifts, may also contribute. Our goal is not to provide a definitive account, but to highlight such plausible mechanisms and encourage future investigation. We have revised the Discussion to emphasize that multiple mechanisms may underlie the observed effects.

      “We note that our implementation of sharpening and shifts at the neuronal level serves as a conceptual model simplification, as population-level tuning, voxel-level selectivity, and behavioral adaptive outcomes may reflect different underlying neuronal mechanisms and do not necessarily align in a one-to-one fashion. Here, we stress that other potential mechanisms beyond sharpening, such as tuning shifts, may also contribute to visual expectation.” 
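      To make the distinction between the two candidate mechanisms concrete, the sketch below parameterizes a single orientation channel as a Gaussian tuning curve and applies either sharpening (a narrower width) or a shift (a displaced center). This is a toy illustration under stated assumptions, not the model implemented in the manuscript: the Gaussian form, the parameter values (20° and 12° widths, a 5° shift), and the function name are hypothetical, and the 180° periodicity of orientation is ignored for simplicity.

```python
import math


def tuning_curve(theta, center, amplitude=1.0, width=20.0):
    """Gaussian orientation tuning curve (all angles in degrees)."""
    return amplitude * math.exp(-0.5 * ((theta - center) / width) ** 2)


orientations = list(range(0, 91))  # probe orientations from 0 to 90 degrees

# Baseline channel centered on a hypothetical expected orientation of 20 degrees.
baseline = [tuning_curve(t, center=20.0) for t in orientations]
# Sharpening: same center and peak height, narrower width.
sharpened = [tuning_curve(t, center=20.0, width=12.0) for t in orientations]
# Shift: same shape, preferred orientation displaced by 5 degrees.
shifted = [tuning_curve(t, center=25.0) for t in orientations]

peak_baseline = orientations[baseline.index(max(baseline))]
peak_shifted = orientations[shifted.index(max(shifted))]

# Sharpening lowers the response to flanking orientations but not at the center.
print(baseline[40], sharpened[40], peak_baseline, peak_shifted)
```

      In this parameterization, a change in width alters responses to nearby orientations without moving the peak, whereas a change in center moves the peak without altering the shape, which is why the two can in principle be estimated separately from response distributions.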

      (5) If the orientation adjustment experiment suggests that both sharpening and shifting are present at the same time, have the authors tried combining both in their computational model?

      We agree with the reviewer that it is necessary to consider the combined model. Accordingly, we implemented a computational model incorporating sharpening of the expected orientation channel together with shifting of the unexpected orientation channels. This model successfully captured the sharpening of the expected-orientation channel and the shift of the unexpected-orientation channels (Supplementary Fig. 3). For the expected orientation (Δ0°), results showed that the amplitude change was significantly higher than zero on both OD (t(23) = 2.582, p = 0.017, Cohen’s d = 0.527) and SFD (t(23) = 2.078, p = 0.049, Cohen’s d = 0.424) tasks (Supplementary Fig. 3e, vertical stripes); the width change was significantly lower than zero on both OD (t(23) = -2.438, p = 0.023, Cohen’s d = 0.498) and SFD (t(23) = -2.578, p = 0.017, Cohen’s d = 0.526) tasks (Supplementary Fig. 3e, diagonal stripes). For unexpected orientations (Δ10°-Δ40°), however, the amplitude and width changes were not significantly different from zero on either OD (amplitude change: t(23) = 0.443, p = 0.662, Cohen’s d = 0.091; width change: t(23) = -1.819, p = 0.082, Cohen’s d = 0.371) or SFD (amplitude change: t(23) = 1.130, p = 0.270, Cohen’s d = 0.231; width change: t(23) = -1.710, p = 0.101, Cohen’s d = 0.349) tasks (Supplementary Fig. 3f). In the meantime, the location shift was significantly different from zero for unexpected orientations (Δ10°-Δ40°; OD task: t(23) = 3.611, p = 0.001, Cohen’s d = 0.737; SFD task: t(23) = 2.418, p = 0.024, Cohen’s d = 0.493; Supplementary Fig. 3g). These results provided further evidence that tuning sharpening and tuning shift jointly contribute to center-surround inhibition in expectation.
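      The statistics above are one-sample t-tests against zero with 23 degrees of freedom, that is, 24 participants. For such a test, Cohen's d and t are related by d = |t|/√n, and the reported pairs are consistent with this relation (for example, t = 2.582 with n = 24 gives d ≈ 0.527). The sketch below shows the computation; the function names are assumptions introduced for illustration, and this is not the authors' analysis code.

```python
import math
from statistics import mean, stdev


def one_sample_t(values, mu0=0.0):
    """Return (t, df, d) for a one-sample t-test of the mean against mu0."""
    n = len(values)
    sd = stdev(values)  # sample standard deviation (n - 1 denominator)
    d = (mean(values) - mu0) / sd  # Cohen's d for a one-sample design
    t = d * math.sqrt(n)
    return t, n - 1, d


def cohens_d_from_t(t, n):
    """Magnitude of Cohen's d implied by a one-sample t statistic."""
    return abs(t) / math.sqrt(n)


# Consistency check on values quoted in the response (n = 24, df = 23).
d_amplitude_od = cohens_d_from_t(2.582, 24)
d_width_od = cohens_d_from_t(-2.438, 24)
print(round(d_amplitude_od, 3), round(d_width_od, 3))
```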

      Reviewer #1 (Recommendations for the authors):

      (1) A direct comparison between tasks (baseline vs. expectation conditions) would have strengthened the findings. Specifically, contrasting performance in the orientation discrimination task with the spatial frequency discrimination task could have provided clearer evidence that participants actually used the auditory cues to attend to the expected orientation. This comparison would be particularly important for validating cue manipulation in the orientation discrimination task.

      We agree that a direct comparison between the orientation discrimination (OD) and spatial frequency discrimination (SFD) tasks could further clarify how expectation (auditory cues) differentially modulates orientation relevance. However, the primary goal of the current study was to examine expectation effects within each task separately and to demonstrate that such effects are independent of attentional modulation driven by the task-relevance of orientation.

      In addition, the OD and SFD tasks differ not only in the task-relevant feature (orientation vs. spatial frequency) but also in stimulus properties and difficulty (for example, the arbitrary choice of 20–70° as the orientation range and ~0.9 cycles/° as the spatial frequency). A direct comparison could therefore introduce confounding factors unrelated to expectation.

      Importantly, previous studies (e.g., Kok et al., 2012, 2017; Aitken et al., 2020) and our current results show that participants performed significantly better when the auditory cue matched the expected orientation, supporting the validity of our expectation manipulation.

      (2) An interesting consideration is why the center-surround inhibition profile of expectation was independent of the task-relevance of orientation. Previous studies (e.g., Kok et al., 2012) have found that orientation discrimination patterns differ depending on whether orientation is task-relevant or irrelevant. This could be useful to discuss the possible discrepancies.

      We thank the reviewer for this inspiring comment. Kok et al. (2012) showed that both orientation and contrast tasks elicited similar fMRI decoding results, regardless of task relevance, suggesting that the neural mechanisms of expectation operate independently of whether orientation is task-relevant. Behaviorally, they reported better performance for expected versus unexpected trials in the orientation task (3.4° vs. 3.8°, t(17) = 2.8, p = 0.013), and a marginal, non-significant trend in the contrast task (4.3% vs. 5.0%, t(17) = 1.9, p = 0.075). If any differences between the two tasks exist, they may lie in the correlation between behavioral and fMRI effects, a question that goes beyond the scope of the current study. Therefore, it is difficult to conclude from their paper that orientation discrimination patterns differ depending on whether orientation is task-relevant or irrelevant.

      Our study differs from theirs in at least two important ways, which may account for the clearer expectation facilitatory effect we observed in the expectation (Δ0°) condition. First, in our study, the orientation-irrelevant task involved spatial frequency discrimination (SFD) rather than contrast discrimination. Compared to contrast, spatial frequency has been shown to exhibit a clear cueing effect, as reported in Fang & Liu (2019). Second, our design included a baseline condition, which was absent in their study. We computed discrimination sensitivity (DS) to quantify how much the discrimination threshold (DT) changed relative to baseline. By using this baseline-referenced approach, we observed a significant facilitatory expectation effect in the Δ0° condition, an effect that shifted from marginal significance in their orientation-irrelevant task to clear significance in our study.

      (3) The authors might consider briefly explaining how the orientation adjustment paradigm used in this study is particularly effective for examining the potential co-existence of tuning sharpening and tuning shift computations, and how this approach complements traditional orientation discrimination tasks in characterizing expectation-related mechanisms.

      We thank the reviewer for this valuable suggestion. We agree that further clarification is needed to better connect the two experiments. To explain this, we have elaborated further in the manuscript.

      “To further explore the co-existence of both Tuning sharpening and Tuning shift computations in the center-surround inhibition profile of expectation, participants were asked to perform a classic orientation adjustment experiment. Unlike the profile experiment (discrimination tasks), the adjustment experiment provides a direct, trial-by-trial measure of participants’ perceived orientation, capturing the full distribution of responses. This enables the construction of orientation-specific tuning curves, allowing us to detect both tuning sharpening and tuning shifts, thereby offering a more nuanced understanding of the computational mechanisms underlying expectation.”

      (4) These interesting findings raise important questions about their relationship to existing hybrid models of attentional modulation. Could the authors discuss how their results might align with or extend previous work demonstrating combined feature-similarity gain and surround suppression effects for orientation (e.g., Fang & Liu, 2019)? Could a hybrid model potentially provide a better account of these data than the pure surround suppression model?

      We thank the reviewer for this valuable comment. We agree that the hybrid model should be mentioned in the manuscript, and we have elaborated further in the Discussion.

      “For example, within the orientation space, the inhibitory zone was about 20°, 45°, and 54° for expectation evident here, feature-based attention[21], and visual perceptual learning[35], respectively; within the feature-based attention, it was about 30° and 45° in color [77] and motion direction [53] spaces, respectively. These variations hint at the exciting possibility that the width of the inhibitory surround may flexibly adapt to stimulus context and task demands, ultimately facilitating our perception and behavior in a changing environment. This principle is consistent with the hybrid model of feature-based attention [53,54,75], where attention is deployed adaptively to prioritize task-relevant information through feature-similarity gain which filters out the most distinctive distractors, and surround suppression which inhibits similar and confusable ones, thereby jointly shaping the attentional tuning profile.”

      (5) On page 19, there appears to be a missing symbol in the description of the Tuning Sharpening model. The text states: 'the tuning width of each channel's tuning function is parameterized by ??', where the question marks seem to indicate a missing parameter symbol.

      We appreciate the reviewer’s careful attention. Yes, the "σ" is missing, which was likely caused by a formatting issue. We have corrected it.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewer #1 (Public review):

      Summary:

      This work investigated how the sense of control influences perceptions of stress. In a novel "Wheel Stopping" task, the authors used task variations in difficulty and controllability to measure and manipulate perceived control in two large cohorts of online participants. The authors first show that their behavioral task has good internal consistency and external validity, showing that perceived control during the task was linked to relevant measures of anxiety, depression, and locus of control. Most importantly, manipulating controllability in the task led to reduced subjective stress, showing a direct impact of control on stress perception. However, this work has minor limitations due to the design of the stressor manipulations/measurements and the necessary logistics associated with online versus in-person stress studies.

      Nevertheless, this research adds to our understanding of when and how control can influence the effects of stress and is particularly relevant to mental health interventions.

      We thank the reviewer for their clear and accurate summary of the findings. 

      Strengths:

      The primary strength of this research is the development of a unique and clever task design that can reliably and validly elicit variations in beliefs about control. Impressively, higher subjective control in the task was associated with decreased psychopathology measures such as anxiety and depression in a non-clinical sample of participants. In addition, the authors found that lower control and higher difficulty in the task led to higher perceived stress, suggesting that the task can reliably manipulate perceptions of stress. Prior tasks have not included both controllability and difficulty in this manner and have not directly tested the direct influence of these factors on incidental stress, making this work both novel and important for the field.

      We thank the reviewer for their positive comments.

      Weaknesses:

      One minor weakness of this research is the validity of the online stress measurements and manipulations. In this study, the authors measure subjective stress via self-report both during the task and also after either a Trier Social Stress Test (high-stress condition) or a memory test (low-stress condition). One concern is that these stress manipulations were really "threats" of stress, where participants never had to complete the stress tasks (i.e., recording a speech for judgment). While this is not unusual for an in-lab study and can reliably elicit substantial stress/anxiety, in an online study, there is a possibility for communication between participants (via online forums dedicated to such communication), which could weaken the stress effects. That said, the authors did find sensible increases and decreases of perceived stress between relevant time points, but future work could improve upon this design by including more complete stress manipulations and measuring implicit physiological signs of stress.

      We thank the reviewer for urging us to expand on this point. The reviewer is right that stress was merely anticipatory and is in that sense different to the canonical TSST. However, there are ample demonstrations that such anticipatory stress inductions are effective at reliably eliciting physiological and psychological stress responses (e.g. Nasso et al., 2019; Schlatter et al., 2021; Steinbeis et al., 2015). Further, there is evidence that online versions of the TSST are also effective (DuPont et al., 2022; Meier et al., 2022), including evidence that the speech preparation phase conducted online was related to increases in heart rate and blood pressure (DuPont et al., 2022). Importantly, and as the reviewer notes in relation to our study specifically, the anticipatory TSST had a significant impact on subjective stress in the expected direction demonstrating that it was effective at eliciting subjective stress. We have elaborated further on this in our manuscript (pages 8 and 9) as follows: 

      “Prior research has found TSST anticipation to elicit both psychological and physiological stress responses [37-39], suggesting that the task anticipation would be a valid stress induction despite participants not performing the speech task. Moreover, prior research has validated the use of remote TSST in online settings [40, 41], including evidence that the speech preparation phase (online) was related to increased heart rate and blood pressure compared to controls [40].”

      Reviewer #2 (Public review):

      Summary:

      The authors have developed a behavioral paradigm to experimentally manipulate the sense of control experienced by the participants by changing the level of difficulty of a wheel-stopping task. In the first study, this manipulation is tested by administering the task in a factorial design with two levels of controllability and two levels of stressor intensity to a large number of participants online while simultaneously recording subjective ratings on perceived control, anxiety, and stress. In the second study, the authors used the wheel-stopping task to induce a high sense of controllability and test whether this manipulation buffers the response to a subsequent stress induction when compared to a neutral task, like looking at pleasant videos.

      We thank the reviewer for their accurate summary.

      Strengths:

      (1) The authors validate a method to manipulate stress.

      (2) The authors use an experimental manipulation to induce an enhanced sense of controllability to test its impact on the response to stress induction.

      (3) The studies involved big sample sizes.

      We thank the reviewer for noting these positive aspects of our study. 

      Weaknesses:

      (1) The study was not preregistered.

      This is correct.

      (2) The control manipulation is conflated with task difficulty and, therefore, the reward rate. Although the authors acknowledge this limitation at the end of the discussion, it is a very important limitation, and its implications are not properly discussed. The discussion states that this is a common limitation with previous studies of control but omits that many studies have controlled for it using yoking.

      We agree that these are very important issues to consider in the interpretation of our findings. It is important to note that while our task design does not separate these constructs, we are able to do so in our statistical analyses. For example, our measure of perceived difficulty was included in analyses assessing the fluctuations in stress and control, in which subjective control still had a unique effect on the experience of stress over and above perceived difficulty, suggesting that subjective control explains variance in stress beyond what is accounted for by perceived difficulty. Similarly, we have also included additional analyses in which we include the win rate (i.e. percentage of trials won) as a covariate when assessing the relationship between subjective control, perceived difficulty and subjective stress. In these analyses, subjective control and perceived difficulty still uniquely predict subjective stress when controlling for win rate. This suggests that there is unique variance in subjective control, separate from perceived task difficulty and win rate, that is relevant to stress. We have included these analyses (page 16 of manuscript) as follows:

      “To further isolate the relationship between subjective control and stress separate from perceived task difficulty or objective task performance, we also included the overall win rate (percentage of trials won during the WS task) in the models. In Study 1, lower feelings of control were related to higher levels of subjective stress (β= -0.12, p<.001) even when controlling for both win rate (β= -0.06, p=.220) and perceived task difficulty (β= 0.37, p<.001, Table S10). This also replicated in Study 2, where lower subjective control was associated with higher feelings of stress (β= -0.32, p<.001) when controlling for perceived task difficulty (β= 0.31, p<.001) and win rate (β= -0.11, p=.428, Table S11). This suggests that there is unique variance in subjective feelings of control, separate from task performance, relevant to subjective stress.”

      As well as expanding on this in the Discussion (pages 27 and 28) as follows:

      “While our task design does not separate control from obtained reward, we are able to do so in the statistical analyses. Like with perceived difficulty, we statistically accounted for reward rate and showed that the relationship between subjective control and stress was not accounted for by reward rate, for example. Similarly, participants received feedback after every trial, and thus feedback valence may contribute to stress perception. However, given that overall win rate (which captures the feedback received during the task) did not predict stress over and above perceived difficulty or subjective control, it suggests that feedback is unlikely to relate to stress over and above difficulty. Future work will need to disentangle this further to rule out such potential confounds.”

      Further, in terms of the wider literature on these issues, we have added more to this point in our discussion, especially in relation to previous literature that also varies control by reward rate (e.g. Dorfman & Gershman, 2019, who use a reward rate of 80% in high control conditions and 50% in low control conditions). This can be found in the manuscript on page 27 as follows: 

      “Previous research typically accounts for different outcomes (e.g. punishment) by yoking controllable and uncontrollable conditions [3], though other work has manipulated the controllability of rewards by changing the reward rate (for example [30], where a decoy stimulus is rewarded 50% of the time in the low control condition but 80% in the high control condition).”
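
      The covariate-adjusted analysis described in this response (subjective control predicting stress with perceived difficulty and win rate in the same model) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the data are hypothetical and noise-free, the helper name `ols` is invented here, and plain ordinary least squares with unstandardized coefficients stands in for whatever model specification the authors actually used.

```python
def ols(X, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y."""
    p = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(row[i] * row[j] for row in X) for j in range(p)]
         + [sum(row[i] * yi for row, yi in zip(X, y))] for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting
    for c in range(p):
        pivot = max(range(c, p), key=lambda r: abs(A[r][c]))
        A[c], A[pivot] = A[pivot], A[c]
        for r in range(p):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][p] / A[i][i] for i in range(p)]

# Hypothetical data for eight participants (not taken from the study).
control = [20, 35, 50, 65, 80, 30, 60, 90]
difficulty = [80, 60, 70, 40, 30, 75, 35, 20]
win_rate = [30, 50, 40, 60, 80, 50, 70, 60]  # percentage of trials won
# Stress is constructed to depend on control and difficulty, not on win rate.
stress = [50 - 0.4 * c + 0.3 * d for c, d in zip(control, difficulty)]

# Design matrix: intercept, subjective control, perceived difficulty, win rate
X = [[1.0, c, d, w] for c, d, w in zip(control, difficulty, win_rate)]
intercept, b_control, b_difficulty, b_win = ols(X, stress)

print(f"control: {b_control:.3f}, difficulty: {b_difficulty:.3f}, win rate: {b_win:.3f}")
```

      Because all three predictors enter the model together, the coefficient for control reflects its association with stress once difficulty and win rate are held constant, which is the sense in which the response speaks of "unique variance".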

      (3) The methods are not always clear enough, and it is difficult to know whether all the manipulations are done within-subjects or some key manipulations are done between subjects.

      We have added more information in the methods section (page 8) clarifying within-subject manipulations (WS task parameters) and between-subject manipulations (stressor intensity task, WS task version in Study 1, and WS task/video task in Study 2). Additionally, as recommended by Reviewer 1, we have provided more information in the methods section and Table S3 regarding the details of on-screen written feedback provided to participants after each trial of the WS Task.

      (4) The analysis of internal consistency is based on splitting the data into odd/even sliders. This choice of data parcellation may cause missed drifts in task performance due to learning, practice effects, or tiredness, thus potentially inflating internal consistency.

      We agree that this can indeed be an issue, though drift is likely to be present in any task, including even in mood in resting-state (Jangraw et al., 2023). To respond to this specific point, we parcellated the timepoints into a 1st/2nd half split and report the ICC in the supplementary information. While values are lower, indeed likely due to systematic drifts in task performance as participants learn to perform the task (especially for Study 2 since the order of parameters was designed to get easier throughout the experiment), the ICC values are still high. Control sliders: Study 1 = 0.82, Study 2 = 0.68; Difficulty sliders: Study 1 = 0.84, Study 2 = 0.57; Stress sliders: Study 1 = 0.45, Study 2 = 0.71. As seen, the lowest ICC is for stress sliders in Study 1. This may be because the first 3 sliders (included in the 1st half split) were all related to the stress task (initial, post-stress task, post-debrief) and the final 4 sliders (in the 2nd half split) were the three sliders during the WS task and shortly afterwards.
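
      The difference between the two parcellations can be sketched as follows. This is an illustration under stated assumptions rather than the authors' analysis: the ratings are hypothetical, the function names are invented here, and ICC(3,1) (two-way, consistency, single measure) is assumed because the response does not state which ICC form was used.

```python
from statistics import mean

def icc_consistency(pairs):
    """ICC(3,1), two-way consistency, for a list of (half_a, half_b) scores."""
    n, k = len(pairs), 2
    grand = mean(v for pair in pairs for v in pair)
    row_means = [mean(pair) for pair in pairs]
    col_means = [mean(pair[j] for pair in pairs) for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for pair in pairs for v in pair)
    ms_rows = ss_rows / (n - 1)
    ms_err = (ss_total - ss_rows - ss_cols) / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (ms_rows + (k - 1) * ms_err)

def split_scores(ratings, first_idx, second_idx):
    """Average each participant's ratings within the two halves of a split."""
    return [(mean(r[i] for i in first_idx), mean(r[i] for i in second_idx))
            for r in ratings]

# Hypothetical ratings: a stable trait level plus a participant-specific
# linear drift across eight slider timepoints (e.g. practice or fatigue).
traits = [0, 10, 20, 30, 40, 50]
slopes = [3, -3, 2, -2, 1, -1]
ratings = [[t + s * j for j in range(8)] for t, s in zip(traits, slopes)]

# Interleaved timepoints (first, third, ... versus second, fourth, ...)
icc_odd_even = icc_consistency(split_scores(ratings, [0, 2, 4, 6], [1, 3, 5, 7]))
# First four timepoints versus last four timepoints
icc_halves = icc_consistency(split_scores(ratings, [0, 1, 2, 3], [4, 5, 6, 7]))

print(f"odd/even ICC = {icc_odd_even:.3f}, first/second half ICC = {icc_halves:.3f}")
```

      In this toy example the interleaved split averages out most of the drift, whereas the first/second half split exposes it, so the latter gives the lower and more conservative estimate.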

      (5) Study 2 manipulates the effect of domain (win versus loss WS task), but the interaction of this factor with stressor intensity is not included in the analysis.

      We agree that this would be a valuable analysis to include. We have run additional analyses (section Sensitivity and Exploratory Analyses, pages 24 and 25), testing the interaction of Domain (win or loss) with stressor intensity (and time) when predicting the stress buffering and stress relief effects. This revealed no significant main effects of domain or interactions including domain, suggesting that domain did not impact the stress induction or relief differently depending on whether it was followed by the high or low stressor intensity condition. While the control by time interaction (our main effect of interest) still held for stress induction in this more complex model, the control by time interaction did not hold for the stress relief. However, this more complex model did not provide a better fit for the data, motivating us to continue to draw conclusions from the original model specification with domain as a covariate (rather than an interaction).

      We outline these analyses on page 24 of the manuscript, as follows:

      “Third, we included the interaction of domain with stressor intensity and with time, to test whether the win or loss domain in the WS task significantly impacted stress induction or stress relief differently depending on stressor intensity. There were no significant effects or interactions of domain (Table S14) for stress induction or stress relief, and the main effect of interest (the interaction between time and control) still held for the stress induction (β= 10.20, SE=4.99, p=.041, Table S14), though was no longer significant for the stress relief (β= 6.72, SE=4.28, p=.117, Table S14). This more complex model did not significantly improve model fit (χ²(3)= 1.46, p=.691) compared to our original specification (with domain as a covariate rather than an interaction) and had slightly worse fit (higher AIC and BIC) than the original model (AIC = 5477.2 versus 5472.7, BIC = 5538.5 versus 5520.8).”
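
      The model comparison reported above can be checked from the summary values alone. The sketch below assumes the reported χ² is a likelihood-ratio statistic with 3 degrees of freedom and uses the closed-form upper tail of the chi-square distribution for that case; the function name is invented here. Because the published statistic is rounded to two decimals, the recomputed p value should only be expected to agree approximately (about .69).

```python
import math

def chi2_sf_df3(x: float) -> float:
    """Upper-tail probability of a chi-square variable with 3 degrees of freedom."""
    # Closed form for df = 3: erfc(sqrt(x/2)) + sqrt(2x/pi) * exp(-x/2)
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)

# Values as reported in the response
lrt_statistic = 1.46
aic_complex, aic_original = 5477.2, 5472.7
bic_complex, bic_original = 5538.5, 5520.8

p_value = chi2_sf_df3(lrt_statistic)
# Positive differences mean the original (simpler) model has the lower,
# i.e. better, information criterion.
delta_aic = aic_complex - aic_original
delta_bic = bic_complex - bic_original

print(f"p = {p_value:.3f}, delta AIC = {delta_aic:.1f}, delta BIC = {delta_bic:.1f}")
```

      A non-significant likelihood-ratio test together with higher AIC and BIC for the more complex model points in the same direction: the added interaction terms do not earn their extra parameters.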

      This study will be of interest to psychologists and cognitive scientists interested in understanding how controllability and its subjective perception impact how people respond to stress exposure. Demonstrating that an increased sense of control buffers/protects against subsequent stress is important and may trigger further studies to characterize this phenomenon better. However, beyond the highlighted weaknesses, the current study only studied the effect of stress induction consecutive to the performance of the WS task on the same day and its generalizability is not warranted.

      We thank the reviewer for this assessment and agree that we cannot assume these findings would generalise to more prolonged effects on stress responses.

      Reviewer #3 (Public review):

      Summary:

      This is an interesting investigation of the benefits of perceiving control and its impact on the subjective experience of stress. To assess a subjective sense of control, the authors introduce a novel wheel-stopping (WS) task where control is manipulated via size and speed to induce low and high control conditions. The authors demonstrate that the subjective sense of control is associated with experienced subjective stress and individual differences related to mental health measures. In a second experiment, they further show that an increased sense of control buffers subjective stress induced by a Trier Social Stress manipulation, more so than a more typical stress buffering mechanism of watching neutral/calming videos.

      We agree with this accurate summary of our study. 

      Strengths:

      There are several strengths to the manuscript that can be highlighted. For instance, the paper introduces a new paradigm and a clever manipulation to test an important and significant question. Additionally, it is a well-powered investigation that allows for confidence in replicability and the ability to show both high internal consistency and high external validity with an interesting set of individual difference analyses. Finally, the results are quite interesting and support prior literature while also providing a significant contribution to the field with respect to understanding the benefits of perceiving control.

      We thank the reviewer for this positive assessment. 

      Weaknesses:

      There are also some questions that, if addressed, could help our readership.

      (1) A key manipulation was the high-intensity stressor (Anticipatory TSST signal), which was measured via subjective ratings recorded on a sliding scale at different intervals during testing. Typically, the TSST conducted in the lab is associated with increases in cortisol assessments and physiological responses (e.g., skin conductance and heart rate). The current study is limited to subjective measures of stress, given the online nature of the study. Since TSST online may also yield psychologically different results than in the lab (i.e., presumably in a comfortable environment, not facing a panel of judges), it would be helpful for the authors to briefly discuss how the subjective results compare with other examples from the literature (either online or in the lab). The question is whether the experienced stress was sufficiently stressful given that it was online and measured via subjective reports. The control condition (low intensity via reading recipes) is helpful, but the low-intensity stress does not seem to differ from baseline readings at the beginning of the experiment.

      We agree that it would be helpful to expand on this further. Similar to the comment made by Reviewer 1, we wish to point out that there are ample demonstrations that such anticipatory stress inductions are effective at reliably eliciting physiological and psychological stress responses (e.g. Nasso et al., 2019; Schlatter et al., 2021; Steinbeis et al., 2015). Further, there is evidence that online versions of the TSST are also effective (DuPont et al., 2022; Meier et al., 2022), including evidence that the speech preparation phase conducted online was related to increases in heart rate and blood pressure (DuPont et al., 2022). We have elaborated further on this in our manuscript on pages 8 and 9 as follows:

      “Prior research has found TSST anticipation to elicit both psychological and physiological stress responses [37-39], suggesting that the task anticipation would be a valid stress induction despite participants not performing the speech task. Moreover, prior research has validated the use of remote TSST in online settings [40, 41], including evidence that the speech preparation phase (online) was related to increased heart rate and blood pressure compared to controls [40].”

      (2) The neutral videos represent an important condition to contrast with WS, but it raises two questions. First, the conditions are quite different in terms of experience, and it is interesting to consider what another more active (but not controlled per se) condition would be in comparison to the WS performance. That is, there is no instrumental action during the neutral video viewing (even passive ratings about the video), and the active demands could be an important component of the ability to mitigate stress. Second, the subjective ratings of the stress of the neutral video appear equivalent to the win condition. Would it have been useful to have a high arousal video (akin to the loss condition) to test the idea that experience of control will buffer against stress? That way, the subjective stress experience of stress would start at equivalent points after WS3.

      We agree with the reviewer that this is an important issue to clarify. In our deliberations when designing this study, we considered that any task with action-outcome contingencies would have a degree of controllability. To better distinguish experiences of control (WS task) from an experience of no/neutral control (i.e., neither high nor low controllability), we decided to use a task in which no actions were required during the task itself. Importantly, however, there was an active demand and concentration was still required in order to perform the attention checks regarding the content of the videos and ratings of the videos.

      Thank you for the suggestion of having a high arousal video condition. This would indeed be interesting to test how experiencing ‘neutral’ control and high(er) stress levels preceding the stressor task influences stress buffering and stress relief, and we have included this suggestion for future research in the discussion section (page 28) as below:

      “Another avenue for future research would be to test how control buffers against stress when compared to a neutral control scenario of higher stress levels, akin to the loss domain in the WS Task, given that participants found the video condition generally relaxing. However, given that we found no differences dependent on domain for the stress induction in the WS Task conditions, it is possible that different versions of a neutral control condition would not impact the stress induction.”

      (3) For the stress relief analysis, the authors included time points 2 and 3 (after the stressor and debrief) but not a baseline reading before stress. Given the potential baseline differences across conditions, can this decision be justified in the manuscript?

      We thank the reviewer for raising this. Regarding the stress relief analyses (timepoints 2 and 3) and not including timepoint 1 (after the WS/video task) stress in the model, we have added to the manuscript that there was no significant difference in stress ratings between the high control and neutral control conditions (collapsed across stress and domain) at timepoint 1, which is why we did not consider it necessary to include timepoint 1 in the stress relief model. Nevertheless, we have now included a sensitivity analysis to test the Timepoint*Control interaction of stress relief when including timepoint 1 stress as a covariate. The timepoint by control interaction still holds, suggesting that the initial stress level prior to the stress induction does not impact our results of interest. The details of this analysis are included in the Sensitivity and Exploratory Analyses section on page 24:

      “Although there were no significant differences between control groups in subjective stress immediately after the WS/video task (t(175.6)=1.17, p=.244), we included participants’ stress level after the WS/video task as a covariate in the stress relief analyses (Table S12). The results revealed a main effect of initial stress (β= 0.643, SE=0.040, p<.001, Table S12) on the stress relief after the stressor debrief. Compared to excluding initial stress as in the original analyses (Table 4), there was now no longer a main effect of domain (β= 0.236, SE=2.60, p=.093, Table S12), but the inference of all other effects remained the same. Importantly, there was still a significant time by control interaction (β= 9.65, SE=3.74, p=.010, Table S12) showing that the decrease in stress after the debrief was greater in the highly controllable WS condition than the neutral control video condition, even when accounting for the initial stress level.”

      (4) Is the increased control experience during the losses condition more valuable in mitigating experienced stress than the win condition?

      We agree that this would be helpful to clarify. To test whether the loss domain was more valuable at mitigating experiences of stress than the win condition, we ran additional analyses with just the high control condition (WS task) to test for a Domain*Time interaction. This revealed no significant Domain*Time interaction, suggesting that the stress buffering or stress relief effect was not dependent on domain in the high control conditions. These analyses are outlined in the Sensitivity and Exploratory Analyses section on page 25:

      “Finally, to test whether the loss domain was more valuable at mitigating experiences of stress than the win condition, we ran additional analyses with just the high control condition (WS task) for the stress induction and stress relief to test for an interaction of domain and time. For the stress induction, there was no significant two-way interaction of domain and time (β= -1.45, SE=4.80, p=.763), nor a significant three-way interaction of domain by time by stressor intensity (β= -3.96, SE=6.74, p=.557, Table S15), suggesting that there were no differences in the stress induction dependent on domain. Similarly for the stress relief, there was no significant two-way interaction of domain and time (β= -5.92, SE=4.42, p=.182), nor a significant three-way interaction of domain by time by stressor intensity (β= 8.86, SE=6.21, p=.154, Table S15), suggesting that there were no differences in the stress relief dependent on the WS Task domain.”

      (5) The subjective measure of control ("how in control do you feel right now") tends to follow a successful or failed attempt at the WS task. How much is the experience of control mediated by the degree of experienced success/schedule of reinforcement? Is it an assessment of control, or an evaluation of how well they are doing and/or resolution of uncertainty? An interesting paper by Cockburn et al. 2014 highlights the potential for positive prediction errors to enhance the desire for control.

      We thank the reviewer for this comment. Similar to comments regarding reward rate, our task does not allow us to fully separate control from success/reinforcement because of the manipulation of difficulty. However, we did undertake sensitivity analyses and the inclusion of overall win rate accounted for limited variance when predicting stress over and above subjective control and difficulty (page 16). 

      “To further isolate the relationship between subjective control and stress separate from perceived task difficulty or objective task performance, we also included the overall win rate (percentage of trials won during the WS task) in the models. In Study 1, lower feelings of control were related to higher levels of subjective stress (β= -0.12, p<.001) even when controlling for both win rate (β= -0.06, p=.220) and perceived task difficulty (β= 0.37, p<.001, Table S10). This also replicated in Study 2, where lower subjective control was associated with higher feelings of stress (β= -0.32, p<.001) when controlling for perceived task difficulty (β= 0.31, p<.001) and win rate (β= -0.11, p=.428, Table S11). This suggests that there is unique variance in subjective feelings of control, separate from task performance, relevant to subjective stress.”

      (6) While the authors do a very good job in their inclusion and synthesis of the relevant literature, they could also amplify some discussion in specific areas. For example, operationalizing task controllability via task difficulty is an interesting approach. It would be useful to discuss their approach (along with any others in the literature that have used it) and compare it to other typically used paradigms measuring control via presence or absence of choice, as mentioned by the authors briefly in the introduction.

      We are delighted to expand on this particular point and have done so in the Discussion on page 27:

      “Previous research typically accounts for different outcomes (e.g. punishment) by yoking controllable and uncontrollable conditions [3], though other work has manipulated the controllability of rewards by changing the reward rate (for example [30], where a decoy stimulus is rewarded 50% of the time in the low control condition but 80% in the high control condition). While our task design does not separate control from obtained reward, we are able to do so in the statistical analyses.”

      (7) The paper is well-written. However, it would be useful to expand on Figure 1 to include a) separate figures for study 1 (currently not included) and 2, and b) a timeline that includes the measurements of subjective stress (incorporated in Figure 1). It would also be helpful to include Figure S4 in the manuscript.

      We have expanded Figure 1 to include both Studies 1 and 2 and a timeline of when subjective stress was assessed throughout the experiment as well as adding Figure S4 to the main manuscript (now top panel within Figure 4). 

      Reviewer #1 (Recommendations for the authors):

      (1) Study 2 shows a greater decrease in subjective stress after the high-control task manipulation than after the pleasant video. One possible confound is whether the amount of time to complete the WS task and the video differ. It could be helpful to look at the average completion time for the WS task and compare that to the length of the videos. Alternatively, in future studies, control for this by dynamically adjusting the video play length to each participant based on how long they took to complete the WS task.

      This is an interesting suggestion. As a result, we have included the time taken as a covariate in the stress induction and stress relief analyses to ensure that any differences in time between the WS task and video task were not accounting for any of the stress induction or relief analyses. Controlling for the total time taken did not impact the stress induction or relief results. This is included in the Sensitivity and Exploratory Analyses section on page 24:

      “Our second sensitivity analysis was conducted because the experiment took longer to complete for the video condition (mean = 54.3 minutes, SD = 12.4 minutes) than the WS task condition (mean = 39.7 minutes, SD = 12.8 minutes, t(186.19)=-9.32, p<.001). We therefore included the total time (in ms) as a covariate in the stress induction and stress relief analyses for Study 2. This showed that accounting for total time did not change the results of interest (Table S13), further highlighting that the time by control interactions were robust.”

      (2) Because participants received feedback about their success/failure in the WS task, a confounding factor could be that they received positive feedback on highly controllable trials and negative feedback on low control trials (and/or highly difficult trials). This would suggest that it is not controllability per se that contributes to stress perception but rather feedback valence. The authors show that this is a likely factor in their results in Study 2, which shows significant effects of the loss domain on perceived control and stress. Was a similar analysis done in Study 1? Do participants receive feedback in Study 1? It would be helpful to include this information somewhere in the manuscript. I would be curious to know whether *any* feedback at all influences controllability/stress perceptions.

      We thank the reviewer for this suggestion. Whether feedback valence is related to stress in Study 1 is an interesting question, and we have added this point to the Discussion on pages 27 and 28. To speak to this point, when we include the overall win rate (which captures the feedback received) when predicting subjective stress, win rate is not a significant predictor of stress over and above perceived difficulty and subjective control, suggesting that overall feedback valence may not be related to stress in Study 1. We take this as evidence that feedback may not be as important in accounting for the relationship between stress and control. However, we unfortunately do not have any data in which no feedback was provided, so we cannot speak to this conclusively. This would be an interesting future study. The excerpt below has been added to pages 27 and 28 of the Discussion section:

      “Like with perceived difficulty, we statistically accounted for reward rate and showed that the relationship between subjective control and stress was not accounted for by reward rate, for example. Similarly, participants received feedback after every trial, and thus feedback valence may contribute to stress perception. However, given that overall win rate (which captures the feedback received during the task) did not predict stress over and above perceived difficulty or subjective control, it suggests that feedback is unlikely to relate to stress over and above difficulty. Future work will need to disentangle this further to rule out such potential confounds.”

      To respond specifically to the reviewer’s question about the feedback given to participants, written feedback was provided on screen on a trial-by-trial basis in Study 1 as well (i.e. in both studies). We have clarified this in the manuscript on page 8 and provided additional details in Table S3:

      “After each trial, participants were shown written feedback on screen as to whether the segment had successfully stopped on the red zone (or not), and the associated reward (or lack of). See Table S3 for details.”

      (3) I'm not sure how to interpret the fact that in Figure S1, the BICs are all essentially the same. Does this mean that you don't really need all of these varying aspects of the task to achieve the same effects? Could the task be made simpler?

      The similarity of BIC values suggests that a simpler WS task would have produced a worse account of the data, approximately in keeping with the extent to which it is a simpler model. Here, the BIC scores for the models are similar, suggesting that adding these parameters adds explanatory power in keeping with what would have been expected from adding a parameter, but not more. We do note that the BIC is a relatively strict and conservative comparison. The fact that the most complex model narrowly improves parsimony overall, combined with the interpretable parameter values and the prior expectations given the task setup, led us to focus on this most complex model.
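
      To make the reasoning about near-identical BIC values concrete, the sketch below computes the standard BIC, k ln(n) - 2 ln(L), for two hypothetical models. The log-likelihoods, parameter counts, and number of observations are invented purely for illustration and are not taken from the manuscript or Figure S1.

```python
import math


def bic(log_likelihood: float, n_params: int, n_obs: int) -> float:
    """Bayesian Information Criterion; lower values indicate a better trade-off."""
    return n_params * math.log(n_obs) - 2.0 * log_likelihood


# Hypothetical values (assumptions for illustration only).
n_obs = 500
bic_simple = bic(log_likelihood=-1000.0, n_params=3, n_obs=n_obs)
bic_complex = bic(log_likelihood=-996.5, n_params=4, n_obs=n_obs)

# The extra parameter costs ln(n_obs) BIC points; here the gain in fit
# only just outweighs that cost, so the two scores end up close together.
penalty_per_param = math.log(n_obs)
print(f"simple:  {bic_simple:.2f}")
print(f"complex: {bic_complex:.2f}")
print(f"penalty per added parameter: {penalty_per_param:.2f}")
```

      In this toy case the added parameter improves the fit by roughly what its penalty costs, which is the situation described above: similar BIC values, with the more complex model only narrowly ahead.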

      (4) A minor point, but the authors refer to their sample as "neurotypical." Were they assessed for prior/current psychopathology/medications? If not, I might use a different term here (perhaps "non-clinical sample"), since some prior work has shown that online samples actually have higher instances of psychopathology compared to community samples.

      We have changed the phrasing of ‘neurotypical’ to a ‘non-clinical sample’ as recommended.

      Reviewer #2 (Recommendations for the authors):

      Figure 4S is very informative and could be presented in the main text.

      We have expanded Figure 1 to include both Studies 1 and 2 and a timeline of when subjective stress was assessed throughout the experiment as well as adding Figure S4 to the main manuscript (top panel of Figure 4). 

      References:

      Dorfman, H. M., & Gershman, S. J. (2019). Controllability governs the balance between Pavlovian and instrumental action selection. Nature Communications, 10(1), 5826. https://doi.org/10.1038/s41467-019-13737-7

      DuPont, C. M., Pressman, S. D., Reed, R. G., Manuck, S. B., Marsland, A. L., & Gianaros, P. J. (2022). An online Trier social stress paradigm to evoke affective and cardiovascular responses. Psychophysiology, 59(10), e14067. https://doi.org/10.1111/psyp.14067

      Jangraw, D. C., Keren, H., Sun, H., Bedder, R. L., Rutledge, R. B., Pereira, F., Thomas, A. G., Pine, D. S., Zheng, C., Nielson, D. M., & Stringaris, A. (2023). A highly replicable decline in mood during rest and simple tasks. Nature Human Behaviour, 7(4), 596–610. https://doi.org/10.1038/s41562-023-01519-7

      Meier, M., Haub, K., Schramm, M.-L., Hamma, M., Bentele, U. U., Dimitroff, S. J., Gärtner, R., Denk, B. F., Benz, A. B. E., Unternaehrer, E., & Pruessner, J. C. (2022). Validation of an online version of the trier social stress test in adult men and women. Psychoneuroendocrinology, 142, 105818. https://doi.org/10.1016/j.psyneuen.2022.105818

      Nasso, S., Vanderhasselt, M.-A., Demeyer, I., & De Raedt, R. (2019). Autonomic regulation in response to stress: The influence of anticipatory emotion regulation strategies and trait rumination. Emotion, 19(3), 443–454. https://doi.org/10.1037/emo0000448

      Schlatter, S., Schmidt, L., Lilot, M., Guillot, A., & Debarnot, U. (2021). Implementing biofeedback as a proactive coping strategy: Psychological and physiological effects on anticipatory stress. Behaviour Research and Therapy, 140, 103834. https://doi.org/10.1016/j.brat.2021.103834

      Steinbeis, N., Engert, V., Linz, R., & Singer, T. (2015). The effects of stress and affiliation on social decision-making: Investigating the tend-and-befriend pattern. Psychoneuroendocrinology, 62, 138–148. https://doi.org/10.1016/j.psyneuen.2015.08.003

    1. Author response:

      Reviewer #1 (Public review): 

      Summary: 

      The taxonomic analysis of IRG1 evolution is compelling and fills an important gap in the literature. However, the experimental evidence for IRG1 localization requires greater detail and confirmation. 

      Strengths: 

      The phylogenetic analysis of IRG1 evolution fills an important gap in the literature. The identification of independent acquisition of metazoan and fungal IRG1 from prokaryotic sources is novel, and the observation that human IRG1 lost mitochondrial matrix localization is particularly interesting, with potentially significant implications for the study of itaconate biology. 

      We thank the reviewer for appreciating the novelty of our study in exploring IRG1 evolution.  

      Weaknesses: 

      The protease protection assay was conducted with MTS-IRG1 but not with wild-type IRG1, which should also be tested. Moreover, no complementary methods, such as microscopy, were employed to validate localization. Beyond humans, the structure and localization of mouse IRG1, highly relevant given the widespread use of the mouse as a model for IRG1 functional studies, are not addressed. 

      Regarding submitochondrial localization of IRG1, we want to draw attention to the published data that a protease protection assay for wild-type mammalian IRG1 has been performed by Lian et al. 2023 (Extended Data Fig. 4), which convincingly demonstrated an outer-mitochondrial membrane localization of endogenous mouse IRG1 in mouse DC2.4 cells upon LPS stimulation that induces IRG1 expression. 

      Regarding complementary microscopy evidence, the same paper used two-color DNA-PAINT super-resolution imaging to demonstrate an enrichment of IRG1 at mitochondria, with a lack of co-localization with the inner membrane/matrix marker Cox IV.

      Given this direct visualization of sub-mitochondrial localization, we are considering applying super-resolution microscopy to revisit the sub-mitochondrial localization of the different IRG1 constructs in the study.

      Reference:

      Lian H, Park D, Chen M, Schueder F, Lara-Tejero M, Liu J, Galán JE. Parkinson's disease kinase LRRK2 coordinates a cell-intrinsic itaconate-dependent defence pathway against intracellular Salmonella. Nat Microbiol. 2023 Oct;8(10):1880-1895. doi: 10.1038/s41564-023-01459-y. Epub 2023 Aug 28. PMID: 37640963; PMCID: PMC10962312.

      Finally, if itaconate is indeed synthesized outside the mitochondrial matrix to safeguard metabolic activity, it is not discussed how this reconciles with its reported inhibitory effect on SDH.

      We thank the reviewer for raising this excellent point. Indeed, itaconate has been proposed to inhibit matrix SDH, thereby exerting an anti-inflammatory function (Lampropoulou, Cell Metab 2016). While the mitochondrial transport of itaconate has not been fully characterized in vivo or in cells, a specific itaconate transport activity has been shown for the mitochondrial 2-oxoglutarate transporter OGC using an in vitro proteoliposome system (Mills et al. Nature 2018).

      We plan to discuss this important point on mitochondrial itaconate transport in the revision. 

      Reference: 

      Lampropoulou V, Sergushichev A, Bambouskova M, Nair S, Vincent EE, Loginicheva E, Cervantes-Barragan L, Ma X, Huang SC, Griss T, Weinheimer CJ, Khader S, Randolph GJ, Pearce EJ, Jones RG, Diwan A, Diamond MS, Artyomov MN. Itaconate Links Inhibition of Succinate Dehydrogenase with Macrophage Metabolic Remodeling and Regulation of Inflammation. Cell Metab. 2016 Jul 12;24(1):158-66. doi: 10.1016/j.cmet.2016.06.004. Epub 2016 Jun 30. PMID: 27374498; PMCID: PMC5108454.  

      Mills EL, Ryan DG, Prag HA, Dikovskaya D, Menon D, Zaslona Z, Jedrychowski MP, Costa ASH, Higgins M, Hams E, Szpyt J, Runtsch MC, King MS, McGouran JF, Fischer R, Kessler BM, McGettrick AF, Hughes MM, Carroll RG, Booty LM, Knatko EV, Meakin PJ, Ashford MLJ, Modis LK, Brunori G, Sévin DC, Fallon PG, Caldwell ST, Kunji ERS, Chouchani ET, Frezza C, Dinkova-Kostova AT, Hartley RC, Murphy MP, O'Neill LA. Itaconate is an anti-inflammatory metabolite that activates Nrf2 via alkylation of KEAP1. Nature. 2018 Apr 5;556(7699):113-117. doi: 10.1038/nature25986. Epub 2018 Mar 28. PMID: 29590092; PMCID: PMC6047741.

      Reviewer #2 (Public review): 

      Summary: 

      The authors are trying to explain how the metabolite itaconate evolved, since although it's involved in host defense, it can also limit mitochondrial function. They are trying to probe the trade-off between these two functions.

      Strengths: 

      The evolutionary aspect is novel; this is the first time to my knowledge that the evolution of IRG1 has been analysed, and there are interesting findings here. The key finding appears to be that subcellular localisation is an important aspect, allowing host defense in some organisms without compromising bioenergetics. This is an interesting finding in the context of immunometabolism, although it needs extra analysis.

      Weaknesses: 

      The work concerning sub-mitochondrial localisation is confusing and needs better analysis. 

      We thank the reviewer for the constructive feedback. As in our response to reviewer 1, we want to draw attention to the published data in which the outer mitochondrial membrane localization of IRG1 was demonstrated by protease protection assay and explored using super-resolution imaging by Lian et al. 2023 (Extended Data Fig. 4). Given the direct visualization of sub-mitochondrial localization by super-resolution imaging, we plan to apply the same method to the different IRG1 constructs used in the paper.

      Reviewer #3 (Public review): 

      Summary: 

      IRG1 is highly expressed in activated human and mouse myeloid cells. It encodes the mitochondrial enzyme cis-aconitate decarboxylase 1 (ACOD1) that generates itaconate. Itaconate has anti-microbial activity and acts immunoregulatory by interfering with cellular metabolism, signaling to cytokine production, and multiple other processes. 

      The authors perform a phylogenetic analysis of IRG1 to obtain insight into the evolution of itaconate biosynthesis. Combining BLAST with human IRG1 and a MmgE/Ptrp domain search, they find CAD in all domains of life, but the presence of IRG1 homologs is patchy in eukaryotes, indicating that itaconate biosynthesis is not essential. The phylogenetic analysis showed a more distant relationship of fungal and metazoan CAD/IRG1 to many prokaryotic sequences, suggesting independent acquisition of these metazoan and fungal CAD genes. In metazoans, three subbranches of paleo-IRG1 (in mollusks/early chordates) and two paralogous vertebrate forms (IRG1 and IRG1-like) were identified, with the latter derived from paleo-IRG1, and by genome duplication. While most jawed vertebrates have both IRG1 and IRG1L, metatherian and eutherian mammals have lost IRG1L and contain only IRG1. 

      Interestingly, sequence analysis of both paralogues showed that many IRG1L genes contain an N-terminal mitochondrial targeting sequence (MTS) that is absent from most IRG1 sequences. Limited proteolysis of submitochondrial localization confirmed that zebrafish IRG1L is only sensitive to proteases in the presence of high Triton X-100, indicative of association with mitochondrial matrix. In contrast, a recent paper from the Galan lab (Lian 2023 Nature Microbiology) reported that human IRG1 is not localized to the mitochondrial matrix, although enriched in mitochondria. Here, the authors generated a matrix-targeted human IRG1 by adding the N-terminal MTS and found that it localizes to the matrix based on a limited proteolysis assay. The loss of MTS-containing IRG1L from most mammals appears, therefore, to indicate that itaconate generation is directed to the cytoplasm, potentially reducing inhibition of TCA cycle activity in the mitochondria.

      Next, the authors confirmed that the recombinant IRG1L protein has CAD activity in vitro. The last part of the manuscript addresses the expression of paleo-IRG1 in oysters and amphioxus, where they found high mRNA levels in oyster hemocytes which was further increased by poly(I:C), which was also the case in amphioxus tissues after feeding of LPS or poly(I:C), indicating a role for paleo-IRG1/itaconate in early metazoan innate immunity. 

      Strengths 

      (1) Phylogenetic perspective largely lacking so far in the IRG1/itaconate field. 

      (2) Manuscript clearly written and understandable across disciplines. 

      (3) Phylogenetic analyses complemented by biochemical and gene expression analyses to link to function. 

      (4) Lack of MTS in IRG1 and change in localization from mitochondria, highly relevant antimicrobial and cellular effects of itaconate.

      We thank the reviewer for the positive comments on the strengths of the study.

      Weaknesses: 

      (1) Biochemical and functional analysis of different CAD mRNA and proteins lacks depth.

      We plan to explore two types of experiments: 

      First, we plan to purify different CAD recombinant proteins; if successful, we will test their in vitro enzymatic activity in synthesizing itaconate. Positive data will also answer question (3) below.

      Second, we plan to measure itaconate levels in oyster hemocytes after PAMP stimulation, to demonstrate in vivo itaconate production by paleo-IRG1. These data will also address question (4) below.

      (2) The submitochondrial localization assay lacks a native human IRG1 control. 

      As in our response to reviewer 1, we believe Lian et al. 2023 provided strong evidence supporting an outer mitochondrial membrane localization of wild-type, endogenous mouse IRG1. Given the direct visualization using super-resolution imaging, we plan to revisit the submitochondrial localization of different IRG1 constructs using super-resolution imaging.

      (3) CAD activity shown for IRG1L but not paleo-IRG1. 

      We plan to purify different CAD recombinant proteins; if successful, we will test their in vitro enzymatic activity in producing itaconate.

      (4) Itaconate production by early metazoans after PAMP stimulation? 

      We plan to measure itaconate levels in oyster hemocytes after PAMP stimulation, to demonstrate in vivo itaconate production by paleo-IRG1.

      (5) No measurement of energy metabolism (trade-offs?).

      Because PAMP signaling might trigger other downstream effects that also impair mitochondrial function, for instance nitric oxide that inhibits complex IV, we plan to avoid PAMP stimulation and directly test the effect of itaconate production. We plan to compare the impact on mitochondrial bioenergetics, provided that the same CAD enzyme (thus with the same activity) can be expressed at the same level intra-mitochondrially and extra-mitochondrially, for instance in the case of MTS-hACOD1 and hACOD1.

    1. Author response:

      We thank the reviewers for their insightful comments on our manuscript. Here we briefly highlight our responses to several issues raised by reviewers, and also provide a summary of planned changes to be made with the next draft.

      Reviewer 1:

      (1) The reviewer questions the rationale for averaging sentence embeddings across different models. However, our method involves computing correlations separately for each model, then averaging the correlations. We also report model correlations for each model separately in Fig S2. We will clarify this in our revised manuscript.
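
      The distinction drawn above (correlating each model with the brain data separately, then averaging the correlations, rather than averaging embeddings) can be sketched as follows. This is a minimal illustration under stated assumptions: the function names, the toy embeddings, the use of cosine similarity, and the use of Pearson correlation are all hypothetical choices and may differ from the procedure in the manuscript.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def pairwise_similarities(embeddings):
    """Cosine similarity for every unordered sentence pair, in a fixed order."""
    return [cosine(embeddings[i], embeddings[j])
            for i, j in combinations(range(len(embeddings)), 2)]


def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def mean_model_correlation(model_embeddings, brain_pair_sims):
    """Correlate each model with the brain similarities, then average the correlations."""
    per_model = {name: pearson(pairwise_similarities(emb), brain_pair_sims)
                 for name, emb in model_embeddings.items()}
    return per_model, sum(per_model.values()) / len(per_model)


# Toy data (invented): three sentences, two models with different dimensionality.
model_embeddings = {
    "model_a": [[1.0, 0.0], [0.8, 0.6], [0.0, 1.0]],
    "model_b": [[1.0, 0.0, 0.0], [0.0, 1.0, 0.5], [0.5, 0.5, 1.0]],
}
brain_pair_sims = [0.7, 0.1, 0.5]  # sentence pairs (0,1), (0,2), (1,2)

per_model, average = mean_model_correlation(model_embeddings, brain_pair_sims)
print(per_model, average)
```

      Because the two toy models have different embedding sizes, averaging their embeddings directly would not be well defined, whereas averaging their per-model correlations is.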

      (2) We agree with the reviewer that including a context-free grammar model as a comparison would be informative. We will incorporate this in the revised manuscript.

      (3) The reviewer raises questions about the low correlation between behavioural and brain similarities. While the behavioural judgements are made by different participants and involve a different task than the neuroimaging results, nonetheless we agree the difference is surprising and warrants more detailed consideration. We will provide additional discussion of the relationship between behavioural judgements and brain data in the revised manuscript.

      (4) The reviewer suggests contrasting our models with a ‘semantic ground truth’, as in our design matrix shown in Fig 1. While our design matrix served as the basis for constructing a set of stimuli with systematic modifications, we respectfully suggest that it should not be regarded as a ‘semantic ground truth’. In particular, sentence pairs within each category will not have the same degrees of semantic similarity since the words and context differ across sentences in a graded manner. Furthermore, while we anticipated ‘different’ sentence pairs would be less similar than ‘swapped’ sentence pairs, and that within each of the six block diagonals the ‘modified’ or ‘substituted’ sentence pairs would be the most similar, we did not have any prediction about the magnitude of these differences. Our goal was to construct a set of sentence pairs which spanned a range of semantic similarities, and allowed for dissociation between lexical similarity and overall similarity in meaning. The design matrix is not intended to represent a ‘ground truth’ that human judgements or brain representations would be expected to conform with.

      (5) In the revised draft we will modify the location of Fig. 5 so that it flows better with the text.

      (6) We agree that the discussion of the differences between brain regions could be expanded. We will include this in the revised version of our manuscript. The reviewer questions our inclusion of the simple-average and group-average RSA analysis as they show similar results. We included both analyses in line with our preregistration, and also because we believe the fact that two distinct approaches to analyzing the data yield similar results strengthens our conclusions.

      (7) We believe that the grid-like pattern in the RSA results is an important unexpected finding that warrants discussion in the main manuscript.

      Reviewer 2:

      (1) The reviewer argues that our stimuli do not fully control for lexical content across conditions, and that a more appropriate paradigm may be to utilise minimal pairs in which only a single variable of interest (such as sentence structure) is modified. We agree that most of our sentence pairs do not constitute minimal pairs, however this was not our objective. Our study design aimed to synthesise traditional minimal pair approaches with more recent research paradigms using naturalistic stimuli. As such, we selected stimuli which are more complex and contain more variable features than traditional minimal pair studies, but which also are tailored to highlight differences which are of particular theoretical interest. Because we are interested in comparing the effects of multiple sentence elements and semantic roles, a systematic pairwise comparison of minimal pairs is not necessarily optimal. Instead, we designed our stimuli to leverage the advantage of fMRI in that we can measure the brain representations corresponding to each sentence, and hence can conduct a full series of pairwise comparisons of sentence representations. Most of these comparisons will not be between minimal pairs, but we selected sentences so as to provide a range of semantic similarities (low to high), while also providing for semantic contrasts of theoretical interest (such as the ‘swapped’ and ‘substituted’ sentence pairs). We do not claim this approach to be universally superior to a minimal pair approach, but we do believe our novel approach provides additional insights and a new perspective on semantic representation relative to minimal pair studies. We will add additional detail in the revised manuscript providing additional explanation for how stimuli were chosen, and contrasting this with minimal pair approaches.

      (2) The reviewer notes that low RSA correlations do not imply that transformers fail to encode syntactic information. We acknowledge this in our discussion (page 10), where we also highlight that our focus is not on whether transformers encode such information, but rather what transformer representations can tell us about how sentence structure is represented in the brain. Our results indicate that transformer embeddings do not have the same geometric properties as brain representations of sentence meaning, at least for certain types of sentences where lexical information is insufficient to determine overall meaning. The reviewer also notes that transformer embeddings are highly anisotropic, however we adjust for this by normalising each feature as discussed on page 14. Finally, the reviewer notes that the transformers we examine differ in architecture and training objectives. This is not critical for our study because we are not seeking to determine which architecture or training objectives are best. Our goal is simply to compare a range of approaches and see which, if any, have similar sentence representations to those formed by the brain. In fact, our results indicate that architecture and training regime make relatively little difference for our stimuli.
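
      As an illustration of how per-feature normalisation can counteract anisotropy, the sketch below standardises each embedding dimension across sentences before computing cosine similarities. It assumes z-scoring as the normalisation; the exact procedure described on page 14 of the manuscript is not reproduced here, and the toy embeddings and function names are invented.

```python
import math
from itertools import combinations


def zscore_features(embeddings):
    """Standardise each feature (column) to zero mean and unit variance across sentences."""
    n = len(embeddings)
    columns = list(zip(*embeddings))
    means = [sum(col) / n for col in columns]
    # A constant feature has zero spread; fall back to 1.0 to avoid division by zero.
    stds = [math.sqrt(sum((x - m) ** 2 for x in col) / n) or 1.0
            for col, m in zip(columns, means)]
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in embeddings]


def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def mean_pairwise_cosine(embeddings):
    """Average cosine similarity over all unordered sentence pairs."""
    pairs = list(combinations(range(len(embeddings)), 2))
    return sum(cosine(embeddings[i], embeddings[j]) for i, j in pairs) / len(pairs)


# Toy anisotropic embeddings (invented): a large shared offset dominates every vector.
raw = [
    [10.1, 9.9, 10.0],
    [9.8, 10.2, 10.1],
    [10.0, 10.1, 9.7],
    [10.2, 9.8, 10.2],
]
normalised = zscore_features(raw)

print("mean cosine before:", mean_pairwise_cosine(raw))
print("mean cosine after: ", mean_pairwise_cosine(normalised))
```

      Before normalisation every pair of toy vectors looks almost identical because of the shared offset; after normalisation the similarities reflect only how the sentences differ from one another along each feature.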

      (3) The reviewer argues that RSA correlations do not measure the extent to which a model encodes syntactic information. This is very similar to the previous point. We do not claim that our results show that transformers do not encode syntactic information. Rather, our claim is that sentence embeddings derived from transformers have different geometric properties to brain representations, and that brain representations are better described by models explicitly representing key semantic roles. From this we conclude that, at least for the sentences we present, the brain is highly sensitive to semantic roles in a way that transformer representations are not (at least to the same extent). We also respectfully disagree with the reviewer’s suggestions that sentence length and orthographic or lexical similarities may drive model correlations with brain activity. As we discuss on page 19, we explicitly control for differences in sentence length when computing correlations. Our process for constructing our sentence set also controls for lexical similarity by generating pairs of sentences with all or mostly the same words but different orderings. We did not explicitly address orthographic similarity, but this will be strongly correlated with lexical similarity.

      Reviewer 3:

      (1) The reviewer emphasises the need for nuance in our conclusions, given that some of the transformers achieve higher correlations when assessed over the full set of sentences. We agree with this comment, and will modify the discussion section in the revised manuscript to address this point. Having said that, we would like to note one of the disadvantages of transformers as a model of mind or brain representations is that they are largely a ‘black box’ whose workings are poorly understood. One advantage of hybrid models like our simple semantic role model is that they can be much easier to interpret, thereby enabling them to be used to determine which features are most important for brain representations of sentence meaning, and what mechanisms are used to combine individual words into a full sentence. Given their relative simplicity and interpretability, we believe hybrid models have considerable value as scientific tools, even in cases where they achieve comparable correlations to transformers. We will highlight this issue more clearly in our revised manuscript.

      (2) The reviewer notes that despite our existing controls, residual confounds of sentence length may remain. We agree that this is a potential issue, and will add discussion to the revised manuscript. We also will present further supplementary analyses which we believe indicate that sentence length effects do not drive our main results. At the same time, we believe the fact that our results are robust to simultaneously controlling for sentence length and the ‘minimum length effect’ (Fig. S5) indicates they are not primarily driven by sentence length effects.

      (3) The reviewer notes that the method for computing similarities differs between the vector-based (mean and transformer) models, and the hybrid and syntax-based models, thereby potentially adding an additional confound to our results. We agree that this is a potential limitation, and our correlations should always be understood as applying to a model paired with a similarity metric. However, we believe that this is mostly unavoidable when comparing different formalisms. An alternative approach of first embedding a graph into a vector and then training an encoding model on the graph embeddings has a similar limitation of being dependent not just on the graph representation, but also on the way it was embedded into a vector and the way the encoding model was trained. Arguably this process is more opaque than similarity methods, since it is unclear to what extent the graph embeddings preserve the logic and properties of a graph-based representation. Further, it is not clear whether there is any single method which can overcome the difficulty of comparing distinct formalisms for representing semantics. The reviewer also highlights how the correlations measured for the syntax model differ greatly depending on whether the Smatch or WWLK similarity metrics are used. We believe this highlights the need for careful examination of commonly used graph similarity metrics, as has been noted in previous research. We will include additional discussion of this issue in our revised manuscript.

    1. Author response:

      Reviewer #1 (Public Review):

      The authors describe a new computational pipeline designed to identify smFISH probes with improved RNA detection compared to preexisting approaches. smFISH is a powerful and relatively straightforward technique to detect single RNAs in cells at subcellular resolution, which is critical for understanding gene expression regulation at the RNA level. However, existing methods for designing smFISH oligos suffer from several limitations, including off-target binding that produces high background signals, as well as a restricted number of probes that are sufficiently specific to target shorter-than-average mRNAs. To address these challenges, the authors developed TrueProbes, a computational method that aims to minimize off-target-mediated background fluorescence.

      Overall, the study addresses a technically relevant problem. If improved, this would allow researchers to study gene expression regulation more effectively using single-molecule FISH. However, based on the current presentation of data, it is not yet clear that TrueProbes offers significant advantages over preexisting pipelines. In the following section, I describe some concerns, which should be adequately addressed.

      Major Comments:

      (1) The manuscript currently presents only one example in which different pipelines were tested to generate probes (targeting ARF4). While the images suggest that both TrueProbes and Stellaris outperform the other pipelines, the comparison is potentially misleading because the number of probes used differs substantially. I recommend that the authors include at least three independent examples in which an equal number of probes are designed across pipelines, so that signal-to-noise can be assessed in a controlled and comparable way. This would allow the probe number to be held constant while directly evaluating performance.

      This is an important observation. We have already addressed this issue in Figures 3E-G and Supplementary Figure 4E-G, where we plotted the number of OFF-targets for each ON-target probe. If we select longer genes to ensure an equal number of designed probes with strong signals, we will still end up with the same number of ON-target probes. Consequently, Figures 3B-D and 3E-G would show similar trends, albeit with different values on the y-axis. Additionally, we will conduct an analysis using Stellaris at its highest probe design stringency setting to compare the software under its strictest design conditions. Additional experiments are outside the scope of the current manuscript.

      (2) It is also unclear how many biological replicates were performed for the ARF4 experiments. If only a single replicate was included, it is difficult to conclude that TrueProbes consistently outperforms other pipelines in a robust and reproducible manner. I suggest the authors include data from at least three biological replicates with appropriate statistical analysis, and ideally extend this to additional smFISH targets as outlined in Comment 1.

      Three biological replicates were utilized for the ARF4 experiments. As stated in the original submission, the average data from all three replicates is presented in Figure 4, while the data for each individual replicate can be found in Figure S5. Statistical analyses were conducted for both the pooled data in Figure 4 and the individual data in Figure S5. The results of all statistical calculations are detailed in Supplemental Table 1. We will update the text to clearly indicate the number of biological replicates and the outcomes of the statistical analysis.

      (3) No controls are presented to demonstrate that the TrueProbes-designed smFISH spots are specifically detecting ARF4. The current experiment primarily measures signal-to-noise, but it remains possible that some detected spots do not correspond to ARF4 mRNAs. Since one of the major criteria used by TrueProbes is to limit cross-hybridization, the authors should perform ARF4 knockdown experiments and demonstrate that nearly all ARF4 smFISH signal is lost. A similar approach should be applied to the additional examples recommended in Comment 1.

      Thank you for your suggestion. Currently, we lack the expertise in our lab to conduct such experiments, so they are beyond the scope of this manuscript. However, we will create additional supplementary figures to demonstrate that the likelihood of false positives is low, based on the assumption that current publicly available BLAST algorithms, genome annotations, and reference transcription expression data are accurate.

      We will include a comparison in our supplementary materials showing the off-target RNA that can bind the highest number of probes simultaneously for each software. Additionally, we will perform a correlation analysis to illustrate the relationship between spot intensity for different software and the number of probes they design. This will help us estimate how the number of probes bound to RNA correlates with expected spot intensity ranges.

      Using this information, along with autofluorescence background intensity measurements from no-probe controls, we will estimate the minimum number of probes that need to bind to targets to be detected as single spots. If this minimum is higher than the maximum number of simultaneous off-target probe bindings, we anticipate that the detected spot signal will primarily reflect ARF4 rather than other transcripts.
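
      As an illustration only, the comparison described above can be sketched as follows. The function names, the three-standard-deviation detection criterion, the assumption that spot intensity scales linearly with the number of bound probes, and all numerical values are hypothetical and are not taken from the manuscript.

```python
import math

def per_probe_intensity(probe_counts, spot_intensities):
    """Least-squares slope through the origin: intensity contributed per bound probe.

    Assumes (hypothetically) that spot intensity grows linearly with probe number.
    """
    numerator = sum(n * i for n, i in zip(probe_counts, spot_intensities))
    denominator = sum(n * n for n in probe_counts)
    return numerator / denominator

def min_probes_for_detection(intensity_per_probe, background_sd, k=3.0):
    """Smallest probe count whose summed signal exceeds k background standard deviations.

    The k-SD criterion is an assumed detection rule, not the authors' stated one.
    """
    return math.floor(k * background_sd / intensity_per_probe) + 1

# Hypothetical values for illustration only (not data from the manuscript).
probe_counts = [20, 30, 40]               # probes per design
spot_intensities = [200.0, 300.0, 400.0]  # mean spot intensity per design (a.u.)
background_sd = 50.0                      # SD of no-probe control background (a.u.)
max_simultaneous_off_target_probes = 6    # worst-case probes on one OFF-target RNA

slope = per_probe_intensity(probe_counts, spot_intensities)
n_min = min_probes_for_detection(slope, background_sd)

# OFF-target spots are expected only if an OFF-target RNA can bind at least n_min probes.
off_target_spots_expected = max_simultaneous_off_target_probes >= n_min
```

      With these placeholder numbers, the worst-case OFF-target RNA binds fewer probes than the estimated detection minimum, which is the condition under which detected spots would be expected to reflect mainly the intended transcript.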

      (4) In the limitations of the study, the authors note that "RNA secondary and tertiary structures are not included, which may lead to inaccuracies if binding sites are structurally occluded." However, I am not convinced that this is a true limitation, since formamide in the smFISH protocol should denature secondary structures and allow oligo access to the RNA. I recommend that the authors comment on this point and clarify whether secondary structure poses a practical limitation in smFISH probe design.

      Thank you for pointing this out. We will revise the manuscript to clarify: "We did not include RNA secondary and tertiary structures in the model because the use of formamide in RNA-FISH experiments denatures these structures, allowing oligonucleotides to access the RNA."

      (5) The authors also correctly acknowledge in their limitations that "RNA-protein interactions, which can modulate accessibility of the transcript, are not modeled." I suggest referencing relevant studies on this issue, particularly Buxbaum et al. (2014, Science), which would provide important context.

      Thank you for highlighting the literature that supports this limitation. We will include Buxbaum et al. (2014, Science) and additional studies that discuss how RNA-protein interactions can affect RNA-FISH experiments.

      Reviewer #2 (Public review):

      Summary:

      Hughes et al. present a new single-molecule RNA fluorescence in situ hybridization (smFISH) probe design software, termed "TrueProbes" in this manuscript. They claim that all existing smFISH (and variants) probe design software packages have limitations that ultimately impact experimental performance. The authors claim to address the majority of these limitations in TrueProbes by introducing multiple computational steps to ensure high-quality probe design. The manuscript's goal is clear, and the authors provide some evidence by designing and targeting one gene. Overall, the manuscript lacks rigorous evidence to support the claims, does not demonstrate its suitability for a variety of smFISH-type experiments, and some of the provided quantification data are unclear. While TrueProbes clearly has potential, more data is required, or the authors should tone down the claims.

      We appreciate the reviewer’s thoughtful feedback. We will revise the text to ensure that all claims are backed by computational or experimental evidence. For claims that do not have supporting results, we will relocate them to the discussion section as potential future extensions. Since our probe design is open access, both we and the community can further develop our code as needed.

      Strengths:

      (1) The problem is well-articulated in the abstract and the introduction.

      (2) Figures 3 and 4 follow a consistent color scheme where each probe design method has its own color, which helps the reader visually compare methods.

      (3) The authors compared multiple probe design software packages both computationally and experimentally.

      (4) TrueProbes does produce visually and quantitatively better results when compared to 2 of the 4 existing smFISH probe design packages (Paintshop and MERFISH panel designer).

      (5) The authors introduce a comprehensive steady-state thermodynamic model to help optimally guide probe design.

      We would like to thank the reviewer for pointing out the strengths of the manuscript.

      Weaknesses:

      (1) The abstract describes the problem well and introduces the solution (the TrueProbes software), but fails to provide specific ways in which the TrueProbes software performs better. The authors state that "...[TrueProbes] consistently outperformed alternatives across multiple computational metrics and experimental validation assays", but specific, quantitative evidence of improved performance would strengthen the statement.

      Thank you for acknowledging the clarity of the abstract and introduction. We will revise the abstract to provide more specific details on how TrueProbes outperforms other software. Additionally, we will include specific computational and experimental metrics that demonstrate TrueProbes' improved performance compared to other software.

      (2) The text claims that TrueProbes outperforms all other probe design software, but Figure 3 indicates that TrueProbes has neither the greatest number of on-target binding nor the lowest number of off-target binding. The data in Figure 3 does not support the claims made in the text. Specifically, the authors claim that "RNA FISH Experimental Results Demonstrate that Off Target and Binding Affinity Inclusive Probe Design Improve RNA FISH Signal Discrimination" (lines 217-218). However, despite their claim that Stellaris and Oligostan-HT produce more off-target probes when evaluated with the TrueProbes framework, the experiment results are nearly identical. The authors should consider modifying their claims or performing new experiments that more clearly demonstrate their claims.

      In Figure 3, we aim to convey two main points. 

      The first point is to compare the number of ON-target probes designed by each software using their most stringent design criteria (Figure 3A). Currently, we are using a medium strict design criterion for Stellaris (level 3). As shown in the new supplementary figure XX, when we apply the most stringent design criteria for Stellaris (level 5), the number of ON-target probes decreases to XX probes. This clearly indicates that, based on theoretical calculations, TrueProbes can design more probes than any of its competitors.

      The second point is to compare the number of OFF-targets produced by each probe design. To illustrate this, we used two different metrics. In Figures 3B-D, we compare the total number of probes bound to OFF-target RNA. However, since each software generates a different number of ON-target probes, the number of OFF-targets may vary simply due to the differences in ON-target probe counts. Therefore, we introduced a second metric to compare OFF-targets. In Figures 3E-G, we present the number of OFF-targets normalized by the number of ON-targets. Using this metric, TrueProbes shows the lowest number of OFF-targets. We will update the manuscript to clarify this point.
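
      As an illustration only, the normalization behind the second metric can be sketched as follows. The function name and the counts are hypothetical and are not taken from the manuscript or from any of the compared software packages.

```python
def off_targets_per_probe(total_off_target_hits: int, n_on_target_probes: int) -> float:
    """Return the number of OFF-target binding events per ON-target probe."""
    if n_on_target_probes <= 0:
        raise ValueError("need at least one ON-target probe")
    return total_off_target_hits / n_on_target_probes

# Hypothetical counts for two illustrative probe sets (not data from the manuscript).
designs = {
    "design_A": {"off_hits": 120, "on_probes": 48},
    "design_B": {"off_hits": 90, "on_probes": 20},
}

normalized = {
    name: off_targets_per_probe(d["off_hits"], d["on_probes"])
    for name, d in designs.items()
}

# The design with the fewest OFF-target hits per ON-target probe.
best_design = min(normalized, key=normalized.get)
```

      In this made-up example, design_B has fewer OFF-target hits in total but more hits per probe, so the two metrics rank the designs differently. This is why both are reported.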

      Regarding the experiments and their comparison to theoretical calculations: The theoretical calculations consider only the reference DNA and RNA genomes along with the oligonucleotide sequences for the probes. We then use a thermodynamic model to identify ON- and OFF-targets. Thus, these theoretical calculations represent an upper bound on the maximum possible number of ON-targets and the minimum number of OFF-targets. All other design software evaluated in this manuscript relies on the same or less reference data and makes certain assumptions. None of these methods quantitatively compare their computational designs with experimental results; they simply design probes based on unverified assumptions, conduct experiments, and present spot data to conclude that their probe designs are effective.

      We will update the manuscript to clarify the goals of the theoretical model and its relationship to the experiments. Future work will be necessary to enhance our theoretical model to fully account for additional aspects of RNA-FISH experiments (e.g., formaldehyde crosslinking, hybridization conditions, washing steps) to better predict the experimental data shown in Figure 4. We will also adjust our claims to accurately reflect the current capabilities of our theoretical framework and its relation to experimental outcomes.

      (3) The bar graphs in Figure 3 do not seem to agree with the probability graphs in Figure 4. For example, Figure 3 indicates that Stellaris probes have higher off-target binding than TrueProbes; however, in Figure 4, their probability graphs lie almost on top of each other.

      The predictions in Figure 3 regarding the number of probe off-target binding events, based on reference gene expression data, do not necessarily encompass all the information required to predict RNA-FISH signal intensity. Therefore, these predictions should not be expected to translate directly into the experimental results shown in Figure 4, particularly concerning the background signal.

      While our software aims to minimize off-target probe binding, this does not automatically lead to a reduction in off-target background signal. Numerous other factors influence the spot background and overall signal-to-noise ratio (SNR) performance, beyond just probe-target binding interactions. Although we strive to minimize off-target background through probe binding, this approach is not designed to directly predict the SNR. Extending the computational analysis of probe binding dynamics to RNA-FISH signal intensity dynamics is beyond the scope of this study.

      We have revised our text to clearly separate computational results from experimental results into two distinct sections. We will use different terminology to describe the outcomes of computational performance versus experimental performance, reducing potential confusion between these two aspects. Additionally, we will clarify our conceptual overview in Figure 1 regarding traditional probe design limitations related to sensitivity and specificity. We will specify how the signal from the number of probes bound to ON-target RNA, relative to those bound to OFF-targets and cellular autofluorescence, translates—either linearly or non-linearly—into the signal-to-noise ratio.

      (4) The authors performed validation for only one gene (ARF4), because "...it had the highest gene expression (in TPM units) and the fewest isoforms among all candidate genes for the Jurkat cell line" (lines 176-177). While the results do look good, this is a minimal use case and does not really showcase the power of their method. One experiment that could be helpful would be two-color (or more) smFISH in tissue, where the chances for off-target binding contributing to higher errors are much greater than in an adherent cell line.

      Thank you for highlighting these valuable experiments. Currently, our lab lacks the expertise to generate tissue samples beyond culturing cells. Additionally, implementing a two-color probe design in tissues containing different cell types with unknown expression levels presents further challenges. Due to these limitations, designing and conducting two-color experiments in tissue samples is beyond the scope of the current manuscript, but we plan to pursue this in the future.

      (5) A common strategy for both smFISH and highly multiplexed methods is to use secondary DNA oligos with dye molecules instead of direct conjugation. Given that this is a primary design goal of PaintSHOP and the Zhuang lab's MERFISH probe design code, it would be helpful to demonstrate that TrueProbes can design a two-layer probe strategy for high-quality RNA-FISH labeling.

      Thank you for bringing this to our attention. TrueProbes is currently designed and tested specifically for primary smRNA-FISH probes. Our focus is on demonstrating a new approach to designing these probes without the added complexities of secondary probes and multiplexing. Future work will expand on this foundation to incorporate secondary probe detection and transcript multiplexing.

      (6) The authors claim, "For every probe set, TrueProbes can simulate expected smRNA FISH outcomes including optimal probe, RNA, and salt concentrations and optionally account for probe secondary structure, hybridization temperature, multiple targets, fluorophore choice, DNA, nascent RNA, and photon count statistics (Figures S2A, S2B). The model can be used to generate predictions for temperature and cell line sensitivity, multi-target discrimination, multiple fluorophore colocalization; when provided transcript expression levels and probe/background intensity, it can start to generate predictions for spot intensity, background, signal to noise ratio, and false negative rates (Figure S2C)." (lines 156-163). Figure S2 is a flow chart and does not provide evidence for any of these items. The authors should provide evidence for these claims, either as a figure or an example script in their software repository. If that is not possible, then it should be removed.

      The supplemental information of the article will be updated to include figures that illustrate predictions for each capability currently offered by TrueProbes, along with the scripts used to generate these predictions. Any capabilities that do not have corresponding scripts will be removed from this section and instead referred to as potential improvements or future additions to the TrueProbes framework in the discussion section.

      (7) All thermodynamic equations are performed at steady state. The authors do not justify this assumption, and there is no discussion of the potential impacts of either low molecule numbers or violations of the well-mixed assumption. Can the authors please include a discussion on the potential impacts of non-steady-state dynamics?

      Thermodynamic equations are calculated at steady state because RNA-FISH hybridization reactions typically last from eight to twenty hours. This duration allows probes adequate time to localize to their targets and reach binding equilibrium, based on current estimates of DNA oligonucleotide association and dissociation rate constants. We will address the potential violation of the well-mixed assumption in the assumptions and limitations section, specifically discussing how RNA localization can affect the spatial distribution of both on-target and off-target probes within cells, which may disrupt the well-mixed condition.

      Low molecule numbers are not a significant concern, as probe DNA oligonucleotide concentrations in RNA-FISH protocols are much higher than the number of transcripts present in cells, by several orders of magnitude.

      The assumptions and limitations section will be revised to clearly state: “Probe hybridization reactions were computed at steady state because most RNA-FISH protocols utilize probe hybridization incubation steps lasting over eight hours, which should provide sufficient time to reach equilibrium based on current estimates of forward and reverse reaction rate constants. Predictions from the equilibrium model may be less accurate for RNA-FISH experiments with shorter hybridization times, where non-steady state dynamics can result in different transient outcomes depending on the duration of hybridization.”
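
      As an illustration only, the approach to equilibrium can be sketched with a pseudo-first-order model, which assumes a well-mixed reaction with probe in large excess over target. The function name, rate constants, and probe concentration below are placeholders chosen for the example and are not estimates from the manuscript.

```python
import math

def fraction_of_equilibrium(t_seconds, k_on, probe_conc_molar, k_off):
    """Fraction of the equilibrium probe occupancy reached after t_seconds.

    Assumes probe in large excess (pseudo-first-order kinetics), so the
    observed relaxation rate is k_obs = k_on * [probe] + k_off.
    """
    k_obs = k_on * probe_conc_molar + k_off
    return 1.0 - math.exp(-k_obs * t_seconds)

# Placeholder parameters for illustration only (not measured values).
K_ON = 1e4     # association rate constant, M^-1 s^-1
PROBE = 1e-7   # probe concentration, M
K_OFF = 1e-5   # dissociation rate constant, s^-1

after_8_hours = fraction_of_equilibrium(8 * 3600, K_ON, PROBE, K_OFF)
after_10_minutes = fraction_of_equilibrium(10 * 60, K_ON, PROBE, K_OFF)
```

      Under these placeholder values, an eight-hour incubation is effectively at equilibrium, whereas a ten-minute incubation reaches less than half of the equilibrium occupancy. Whether a given protocol reaches equilibrium depends on the actual rate constants and probe concentration.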

      Reviewer #3 (Public review):

      Summary:

      This manuscript introduces a new platform termed "TrueProbes" for designing mRNA FISH probes. In comparison to existing design strategies, the authors incorporate a comprehensive thermodynamic and kinetic model to account for probe states that may contribute to nonspecific background. The authors validate their design pipeline using Jurkat cells and provide evidence of improved probe performance.

      Strengths:

      A notable strength of TrueProbes is the consideration of genome-wide binding affinities, which aims to minimize off-target signals. The work will be of interest to researchers employing mRNA FISH in certain human cell lines.

      Weaknesses:

      However, in my view, the experimental validation is not sufficient to justify the broad claims of the platform. Given the number of assumptions in the model, additional experimental comparisons across probe design methods, ideally targeting transcripts with different expression levels, would be necessary to establish the general superiority of this approach.

      We will revise our text to make our claims more specific and clearer, avoiding overgeneralizations and ensuring that all claims are adequately supported by the data we present.

    1. Author response:

      Reviewer #1 (Public review):

      Summary:

      The manuscript reports a series of experiments designed to test whether optogenetic activation of infralimbic (IL) neurons facilitates extinction retrieval and whether this depends on animals' prior experience. In Experiment 1, rats underwent fear conditioning followed by either one or two extinction sessions, with IL stimulation given during the second extinction; stimulation facilitated extinction retrieval only in rats with prior extinction experience. Experiments 2 and 3 examined whether backward conditioning (CS presented after the US) could establish inhibitory properties that allowed IL stimulation to enhance extinction, and whether this effect was specific to the same stimulus or generalized to different stimuli. Experiments 5 - 7 extended this approach to appetitive learning: rats received backward or forward appetitive conditioning followed by extinction, and then fear conditioning, to determine whether IL stimulation could enhance extinction in contexts beyond aversive learning and across conditioning sequences. Across studies, the key claim is that IL activation facilitates extinction retrieval only when animals possess a prior inhibitory memory, and that this effect generalizes across aversive and appetitive paradigms.

      Strengths:

      (1) The design attempts to dissect the role of IL activity as a function of prior learning, which is conceptually valuable.

      We thank the Reviewer for their positive assessment.

      (2) The experimental design of probing different inhibitory learning approaches to probe how IL activation facilitates extinction learning was creative and innovative.

      We thank the Reviewer for their positive assessment.

      Weaknesses:

      (1) Non-specific manipulation.

      ChR2 was expressed in IL without distinction between glutamatergic and GABAergic populations. Without knowing the relative contribution of these cell types or the percentage of neurons affected, the circuit-level interpretation of the results is unclear.

      ChR2 was intentionally expressed in the infralimbic cortex (IL) without distinction between local neuronal populations for two reasons. First, this manuscript aimed to uncover some of the features characterizing the encoding of inhibitory memories in the IL, and this encoding likely engages interactions among various neuronal populations within the IL. Second, the hypotheses tested in the manuscript derived from findings that indiscriminately stimulated the IL using the GABA<sub>A</sub> receptor antagonist picrotoxin, which is best mimicked by the approach taken. We agree that it is also important to determine the respective contributions of distinct IL neuronal populations to inhibitory encoding; however, the global approach implemented in the present experiments represents a necessary initial step. This rationale will be incorporated into the revised manuscript, which will also make reference to the need to identify the relative contributions of the various neuronal populations within the IL. 

      (2) Extinction retrieval test conflates processes

      The retrieval test included 8 tones. Averaging across this many tone presentations conflates extinction retrieval/expression (early tones) with further extinction learning (later tones). A more appropriate analysis would focus on the first 2-4 tones to capture retrieval only. As currently presented, the data do not isolate extinction retrieval.

      It is unclear when retrieval of what has been learned across extinction ceases and additional extinction learning occurs. In fact, it is only the first stimulus presentation that unequivocally permits a distinction between retrieval and additional extinction learning, as the conditions for this additional learning have not been fulfilled at that presentation. However, confining evidence for retrieval to the first stimulus presentation introduces concerns that other factors could influence performance. For instance, processing of the stimulus present at the start of the session may differ from that present at the end of the previous session, thereby affecting what is retrieved. Such differences between the stimuli present at the start and end of an extinction session have been long recognized as a potential explanation for spontaneous recovery (Estes, 1955). More importantly, whether the test data presented confound retrieval and additional extinction learning or not, the interpretation remains the same with respect to the effects of a prior history of inhibitory learning on enabling the facilitative effects of IL stimulation. Finally, it is unclear how these facilitative effects could occur in the absence of the subjects retrieving the extinction memory formed under the stimulation. Nevertheless, the revised manuscript will provide the trial-by-trial performance during the post-extinction retrieval tests and discuss this issue.

      (3) Under-sampling and poor group matching.

      Sample sizes appear small, which may explain why groups are not well matched in several figures (e.g., 2b, 3b, 6b, 6c) and why there are several instances of unexpected interactions (protocol, virus, and period). This baseline mismatch raises concerns about the reliability of group differences.

      Efforts were made to match group performance upon completion of each training stage and before IL stimulation. Unfortunately, these efforts were not completely successful due to exclusions following post-mortem analyses. However, we acknowledge that the unexpected interactions deserve further discussion, and this will be incorporated into the revised manuscript (see also comment from Reviewer 2). Although we cannot exclude that sample sizes may have contributed to some of these interactions, we remain confident about the reliability of the main findings reported, especially given their replication across the various protocols. Overall, the manuscript provides evidence that IL stimulation does not facilitate brief extinction in the absence of prior inhibitory experience in five different experiments, replicating previous findings (Lingawi et al., 2018; Lingawi et al., 2017). It also replicates these previous findings by showing that prior experience with either fear or appetitive extinction enables IL stimulation to facilitate subsequent fear extinction. Furthermore, the facilitative effects of such stimulation following fear or appetitive backward conditioning are replicated in the present manuscript.  

      (4) Incomplete presentation of conditioning data.

      Figure 3 only shows a single conditioning session despite five days of training. Without the full dataset, it is difficult to evaluate learning dynamics or whether groups were equivalent before testing.

      We apologize, as we incorrectly labeled the X axis for the backward conditioning data set in Figures 3B, 4B, 4D and 5B. It should have indicated “Days” instead of “Trials”. This error will be corrected in the revised manuscript.

      (5) Interpretation stronger than evidence.

      The authors conclude that IL activation facilitates extinction retrieval only when an inhibitory memory has been formed. However, given the caveats above, the data are insufficient to support such a strong mechanistic claim. The results could reflect non-specific facilitation or disruption of behavior by broad prefrontal activation. Moreover, there is compelling evidence that optogenetic activation of IL during fear extinction does facilitate subsequent extinction retrieval without prior extinction training (Do-Monte et al 2015, Chen et al 2021), which the authors do not directly test in this study.

      As noted above, the revised manuscript will show that the interpretations of the main findings stand whether or not the test data confound retrieval with additional extinction learning. The revised manuscript will also clarify the plotting of the data for the backward conditioning stages. We do agree that further discussion of the unexpected interactions is necessary, and this will also be incorporated into the revised manuscript. However, the various replications of the core findings provide strong evidence for their reliability and for the interpretations advanced in the original manuscript. The proposal that the results reflect non-specific facilitation or disruption of behavior seems highly unlikely. Indeed, the present experiments and previous findings (Lingawi et al., 2018; Lingawi et al., 2017) provide multiple demonstrations that IL stimulation fails to produce any facilitation in the absence of prior inhibitory experience with the target stimulus. Although these demonstrations appear inconsistent with previous studies (Do-Monte et al., 2015; Chen et al., 2021), this inconsistency is likely explained by the fact that these studies manipulated activity in specific IL neuronal populations. Previous work has already revealed differences between manipulations targeting discrete IL neuronal populations as opposed to general IL activity (Kim et al., 2016). Importantly, as previously noted, the present manuscript aimed to generally explore inhibitory encoding in the IL that, as we will acknowledge, is likely to engage several neuronal populations within the IL. Adequate statements on these matters will be included in the revised manuscript.

      Impact:

      The role of IL in extinction retrieval remains a central question in the fear learning literature. However, because the test used conflates extinction retrieval with new learning and the manipulations lack cell-type specificity, the evidence presented here does not convincingly support the main claims. The study highlights the need for more precise manipulations and more rigorous behavioral testing to resolve this issue.

      As noted in our responses, the interpretations of the data presented remain identical whether the test data conflate extinction retrieval with additional extinction learning or not. Although we agree that it is important to establish the role of specific IL neuronal populations in extinction learning, this was beyond the scope of the manuscript and the findings reported remain valuable to our understanding of inhibitory encoding within the IL.

      Reviewer #2 (Public review):

      Summary:

      In this manuscript, the authors examine the mechanisms by which stimulation of the infralimbic cortex (IL) facilitates the retention and retrieval of inhibitory memories. Previous work has shown that optogenetic stimulation of the IL suppresses freezing during extinction but does not improve extinction recall when extinction memory is probed one day later. When stimulation occurs during a second extinction session (following a prior stimulation-free extinction session), freezing is suppressed during the second extinction as well as during the tone test the following day. The current study was designed to further explore the facilitatory role of the IL in inhibitory learning and memory recall. The authors conducted a series of experiments to determine whether recruitment of IL extends to other forms of inhibitory learning (e.g., backward conditioning) and to inhibitory learning involving appetitive conditioning. Further, they assessed whether their effects could be explained by stimulus familiarity. The results of their experiments show that backward conditioning, another form of inhibitory learning, also enabled IL stimulation to enhance fear extinction. This phenomenon was not specific to aversive learning, as backward appetitive conditioning similarly allowed IL stimulation to facilitate extinction of aversive memories. Finally, the authors ruled out the possibility that IL facilitated extinction merely because of prior experience with the stimulus (e.g., reducing the novelty of the stimulus). These findings significantly advance our understanding of the contribution of IL to inhibitory learning. Namely, they show that the IL is recruited during various forms of inhibitory learning, and its involvement is independent of the motivational value associated with the unconditioned stimulus.

      Strengths:

      (1) Transparency about the inclusion of both sexes and the representation of data from both sexes in figures.

      We thank the Reviewer for their positive assessment.

      (2) Very clear representation of groups and experimental design for each figure.

      We thank the Reviewer for their positive assessment.

      (3) The authors were very rigorous in determining the neurobehavioral basis for the effects of IL stimulation on extinction. They considered multiple interpretations and designed experiments to address these possible accounts of their data.

      We thank the Reviewer for their positive assessment.

      (4) The rationale for and the design of the experiments in this manuscript are clearly based on a wealth of knowledge about learning theory. The authors leveraged this expertise to narrow down how the IL encodes and retrieves inhibitory memories.

      We thank the Reviewer for their positive assessment.

      Weaknesses:

      (1) In Experiment 1, although not statistically significant, it does appear as though the stimulation groups (OFF and ON) differ during Extinction 1. It seems like this may be due to a difference between these groups after the first forward conditioning. Could the authors have prevented this potential group difference in Extinction 1 by re-balancing group assignment after the first forward conditioning session to minimize the differences in fear acquisition (the authors do report a marginally significant effect between the groups that would undergo one vs. two extinction sessions in their freezing during the first conditioning session)?

      As noted (see response to Reviewer 1), efforts were made daily to match group performance across the training stages, but these efforts were ultimately hampered by the necessary exclusions following post-mortem analyses. This will be made explicit in the revised manuscript. Regarding freezing during Extinction 1, as noted by the Reviewer, the difference, which was not statistically significant, was absent across trials during the subsequent forward fear conditioning stage. Likewise, the protocol difference observed during the initial forward fear conditioning was absent in subsequent stages. We are therefore confident that these initial differences (significant or not) did not impact the main findings at test. Importantly, these findings replicate previous work using identical protocols in which no differences were present during the training stages. These considerations will be addressed in the revised manuscript.

      (2) Across all experiments (except for Experiment 1), the authors state that freezing during the initial conditioning increased across "days". The figures that correspond to this text, however, show that freezing changes across trials. In the methods, the authors report that backward conditioning occurred over 5 days. It would be helpful to understand how these data were analyzed and collated to create the final figures. Was the freezing averaged across the five days for each trial for analyses and figures?

      We apologize. As noted above, we incorrectly labeled the X axis for the backward conditioning data sets in Figures 3B, 4B, 4D and 5B: it should have read “Days” instead of “Trials”. The data shown in these figures are the average of all trials on a given day. This will be clarified in the methods section of the revised manuscript, and the labeling errors on the figures will be corrected.
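
      As an illustration only (this is not the analysis code used in the study), collapsing trial-level freezing into one value per day can be sketched as follows. The function name `daily_means` and the freezing percentages are hypothetical.

```python
from statistics import mean
from typing import Dict, List


def daily_means(freezing_by_day: Dict[int, List[float]]) -> Dict[int, float]:
    """Average all trials within each day, returning one value per day.

    Keys are day numbers; values are per-trial freezing percentages.
    """
    return {day: mean(trials) for day, trials in sorted(freezing_by_day.items())}


# Hypothetical freezing percentages for one animal (four trials per day).
example_animal = {
    1: [10.0, 20.0, 30.0, 40.0],
    2: [30.0, 40.0, 50.0, 60.0],
}
per_day = daily_means(example_animal)
```

      Each plotted point would then correspond to one entry of `per_day`, averaged again across the animals in a group.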

      (3) In Experiment 3, the authors report a significant Protocol X Virus interaction. It would be useful if the authors could conduct post-hoc analyses to determine the source of this interaction. Inspection of Figure 4B suggests that freezing during the two different variants of backward conditioning differs between the virus groups. Did the authors expect to see a difference in backward conditioning depending on the stimulus used in the conditioning procedure (light vs. tone)? The authors don't really address this confounding interaction, but I do think a discussion is warranted.

      We agree with the Reviewer that further discussion of the Protocol x Virus interaction that emerged during the backward conditioning and forward conditioning stages of Experiment 3 is warranted. This will be provided in the revised manuscript. Briefly, during both stages, follow-up analyses did not reveal any differences (main effects or interactions) between the two groups trained with the light stimulus (Diff-EYFP and Diff-ChR2). By contrast, the ChR2 group trained with the tone (Back-ChR2) froze more overall than the EYFP group (Back-EYFP), but there were no other significant differences between the two groups. Based on these analyses, the Protocol x Virus interaction appears to be driven by greater freezing in the ChR2 group trained with the tone rather than a difference in the backward conditioning performance based on stimulus identity. Consistent with this, the statistical analyses did not reveal a main effect of Protocol during either the backward conditioning stage or the stimulus trials during the forward conditioning stage. Nevertheless, during this latter stage, a main effect of Protocol emerged during baseline performance, but once again, this seems to be driven by the Back-ChR2 group. Critically, it is unclear how greater stimulus freezing in the Back-ChR2 group during forward conditioning would lead to lower freezing during the post-extinction retrieval test.  

      (4) In this same experiment, the authors state that freezing decreased during extinction; however, freezing in the Diff-EYFP group at the start of extinction (first bin of trials) doesn't look appreciably different than their freezing at the end of the session. Did this group actually extinguish their fear? Freezing on the tone test day also does not look too different from freezing during the last block of extinction trials.

      We confirm that overall, there was a significant decline in freezing across the extinction session shown in Figure 4B. The Reviewer is correct to point out that this decline was modest (if not negligible) in the Diff-EYFP group, which was receiving its first inhibitory training with the target tone stimulus. It is worth noting that across all experiments, most groups that did not receive infralimbic stimulation displayed a modest decline in freezing during the extinction session since it was relatively brief, involving only 6 or 8 tone alone presentations. This was intentional, as we aimed for the brief extinction session to generate minimal inhibitory learning and thereby to detect any facilitatory effect of infralimbic stimulation. This issue will be clarified and explained in the revised version of the manuscript.

      (5) The Discussion explored the outcomes of the experiments in detail, but it would be useful for the authors to discuss the implications of their findings for our understanding of circuits in which the IL is embedded that are involved in inhibitory learning and memory. It would also be useful for the authors to acknowledge in the Discussion that although they did not have the statistical power to detect sex differences, future work is needed to explore whether IL functions similarly in both sexes.

      In line with the Reviewer’s suggestion (see also Reviewer 3), the revised manuscript will include a discussion of the broader implications of the findings regarding inhibitory brain circuitry and will acknowledge the need to further explore sex differences and IL functions.

      Reviewer #3 (Public review):

      Summary:

      This is a really nice manuscript with different lines of evidence to show that the IL encodes inhibitory memories that can then be manipulated by optogenetic stimulation of these neurons during extinction. The behavioral designs are excellent, with converging evidence using extinction/re-extinction, backwards/forwards aversive conditioning, and backwards appetitive/forwards aversive conditioning. Additional factors, such as nonassociative effects of the CS or US, are also considered, and the authors evaluate the inhibitory properties of the CS with tests of conditioned inhibition.

      Strengths:

      The experimental designs are very rigorous with an unusual level of behavioral sophistication.

      We thank the Reviewer for their positive assessment.

      Weaknesses:

      (1) More justification for parametric choices (number of days of backwards vs forwards conditioning) could be provided.

      All experimental parameters were based on previously published experiments showing the capacity of the backward conditioning protocols to generate inhibitory learning and the forward conditioning protocols to produce excitatory learning. Although this was mentioned in the methods section, we acknowledge that further explanation is required to justify the need for multiple days of backward training. This will be provided in the revised manuscript.

      (2) The current discussion could be condensed and could focus on broader implications for the literature.

      The revised manuscript will make an effort to condense the discussion and focus on broader implications for the literature.

      References

      Chen, Y.-H., Wu, J.-L., Hu, N.-Y., Zhuang, J.-P., Li, W.-P., Zhang, S.-R., Li, X.-W., Yang, J.-M., & Gao, T.-M. (2021). Distinct projections from the infralimbic cortex exert opposing effects in modulating anxiety and fear. J Clin Invest, 131(14), e145692. https://doi.org/10.1172/JCI145692

      Do-Monte, F. H., Manzano-Nieves, G., Quiñones-Laracuente, K., Ramos-Medina, L., & Quirk, G. J. (2015). Revisiting the role of infralimbic cortex in fear extinction with optogenetics. J Neurosci, 35(8), 3607-3615. https://doi.org/10.1523/JNEUROSCI.3137-14.2015

      Estes, W. K. (1955). Statistical theory of spontaneous recovery and regression. Psychol Rev, 62(3), 145-154. https://doi.org/10.1037/h0048509

      Kim, H.-S., Cho, H.-Y., Augustine, G. J., & Han, J.-H. (2016). Selective Control of Fear Expression by Optogenetic Manipulation of Infralimbic Cortex after Extinction. Neuropsychopharmacology, 41(5), 1261-1273. https://doi.org/10.1038/npp.2015.276

      Lingawi, N. W., Holmes, N. M., Westbrook, R. F., & Laurent, V. (2018). The infralimbic cortex encodes inhibition irrespective of motivational significance. Neurobiol Learn Mem, 150, 64-74. https://doi.org/10.1016/j.nlm.2018.03.001

      Lingawi, N. W., Westbrook, R. F., & Laurent, V. (2017). Extinction and Latent Inhibition Involve a Similar Form of Inhibitory Learning that is Stored in and Retrieved from the Infralimbic Cortex. Cereb Cortex, 27(12), 5547-5556. https://doi.org/10.1093/cercor/bhw322

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This manuscript presents a study on expectation manipulation to induce placebo and nocebo effects in healthy participants. The study follows standard placebo experiment conventions with the use of TENS stimulation as the placebo manipulation. The authors were able to achieve their aims. A key finding is that placebo and nocebo effects were predicted by recent experience, which is a novel contribution to the literature. The findings provide insights into the differences between placebo and nocebo effects and the potential moderators of these effects.

      Specifically, the study aimed to:

      (1) assess the magnitude of placebo and nocebo effects immediately after induction through verbal instructions and conditioning

      (2) examine the persistence of these effects one week later, and

      (3) identify predictors of sustained placebo and nocebo responses over time.

      Strengths:

      An innovation was to use sham TENS stimulation as the expectation manipulation. This expectation manipulation was reinforced not only by the change in pain stimulus intensity, but also by delivery of non-painful electrical stimulation, labelled as TENS stimulation.

      Questionnaire-based treatment expectation ratings were collected before conditioning and after conditioning, and after the test session, which provided an explicit measure of participants' expectations about the manipulation.

      The finding that placebo and nocebo effects are influenced by recent experience provides a novel insight into a potential moderator of individual placebo effects.

      We thank the reviewer for their thorough evaluation of our manuscript and for highlighting the novelty and originality of our study.

      Weaknesses:

      There are a limited number of trials per test condition (10), which means that the trajectory of responses to the manipulation may not be adequately explored.

      We appreciate the reviewer’s comment regarding the number of trials in the test phase. The trial number was chosen to ensure comparability with previous studies addressing similar research questions with similar designs (e.g. Colloca et al., 2010). Our primary objective was to directly compare placebo and nocebo effects within a within-subject design and to examine their persistence one week after the first test session. While we did not specifically aim to investigate the trajectory of responses within a single testing session, we fully agree that a comprehensive analysis of the trajectories of expectation effects on pain would be a valuable extension of our work. We have now acknowledged this limitation and future direction in the revised manuscript.

      The paragraph reads as follows: “It is important to note that our study was designed in alignment with previous studies addressing similar questions (e.g., Colloca et al., 2010). Our primary aim was to directly compare placebo and nocebo effects in a within-subject design and assess the persistence of these effects one week following the first test session. One limitation of our approach is the relatively short duration of each session, which may have limited our ability to examine the trajectory of responses within a single session. Future studies could address this limitation by increasing the number of trials for a more comprehensive analysis.”

      On day 8, one stimulus per stimulation intensity (i.e., VAS 40, 60, and 80) was applied before the start of the test session to re-familiarise participants with the thermal stimulation. There is a potential risk of revealing the manipulation to participants during the re-familiarization process, as they were not previously briefed to expect the painful stimulus intensity to vary without the application of sham TENS stimulation.

      We thank the reviewer for the opportunity to clarify this point. Participants were informed at the beginning of the experiment that we would use different stimulation intensities to re-familiarize them with the stimuli before the second test session. We are therefore confident that participants perceived this step as part of a recalibration rather than associating it with the experimental manipulation. We have added this information to the revised version of the manuscript.

      The paragraph now reads as follows: “On day 8, one stimulus per stimulation intensity (i.e., VAS 40, 60 and 80) was applied before the start of the test session to re-familiarise participants with the thermal stimulation. Note that participants were informed that these pre-test stimuli were part of the recalibration and refamiliarization procedure conducted prior to the second test session.”

      The differences between the nocebo and control conditions in pain ratings during conditioning could be explained by the differing physiological effects of the different stimulus intensities, so it is difficult to make any claims about expectation effects here.

      We appreciate the reviewer’s comment and agree that, despite the careful calibration of the three pain stimuli, we cannot entirely rule out the possibility that temporal dynamics during the conditioning session were influenced by differential physiological effects of the varying stimulus intensities (e.g., intensity-dependent habituation or sensitization). We have addressed this in the revision of the manuscript, but we would like to emphasize that the stronger nocebo effects during the test phase are statistically controlled for any differences in the conditioning session.

      The paragraph now reads: “This asymmetry is noteworthy in and of itself because it occurred despite the equidistant stimulus calibration relative to the control condition prior to conditioning. It may be the result of different physiological effects of the stimuli over time or amplified learning in the nocebo condition, consistent with its heightened biological relevance, but it could also be a stronger effect of the verbal instructions in this condition.”

      A randomisation error meant that 25 participants received an unbalanced number of trials per condition (i.e., 10 x VAS 40, 14 x VAS 60, 12 x VAS 80).

      We agree that this is indeed unfortunate. However, we would like to point out that all analyses reported in the manuscript have been controlled for the VAS ratings in the conditioning session, i.e., potential effects of the conditioned placebo and nocebo stimuli. Moreover, we have now conducted additional analyses, presented here in our response to the reviewers, to demonstrate that this imbalance did not systematically bias the results. Importantly, the key findings observed during the test phase remain robust despite this issue.

      Specifically, when excluding these 25 participants from the analyses, the reported stronger nocebo compared to placebo effects in the test session on day 1 remain unchanged. Likewise, the comparison of placebo and nocebo effects between days 1 and 8 shows the same pattern when excluding the participants in question. The only exception is the interaction between effect (placebo vs nocebo) x session (day 1 vs day 8), which changed from a borderline significant result (p = .049) to a non-significant one (p = .24). However, post hoc tests continued to show the same pattern as originally reported: a significant reduction in the nocebo effect from day 1 to day 8 and no significant change in the placebo effect.

      Reviewer #2 (Public review):

      Summary:

      Kunkel et al aim to answer a fundamental question: Do placebo and nocebo effects differ in magnitude or longevity? To address this question, they used a powerful within-participants design, with a very large sample size (n=104), in which they compared placebo and nocebo effects - within the same individuals - across verbal expectations, conditioning, testing phase, and a 1-week follow-up. With elegant analyses, they establish that different mechanisms underlie the learning of placebo vs nocebo effects, with the latter being acquired faster and extinguished slower. This is an important finding for both the basic understanding of learning mechanisms in humans and for potential clinical applications to improve human health.

      Strengths:

      Beyond the above - the paper is well-written and very clear. It lays out nicely the need for the current investigation and what implications it holds. The design is elegant, and the analyses are rich, thoughtful, and interesting. The sample size is large which is highly appreciated, considering the longitudinal, in-lab study design. The question is super important and well-investigated, and the entire manuscript is very thoughtful with analyses closely examining the underlying mechanisms of placebo versus nocebo effects.

      We thank the reviewer for their positive evaluation of our manuscript and for acknowledging the methodological rigor and the significant implications for clinical applications and the broader research field.

      Weaknesses:

      There were two highly addressable weaknesses in my opinion:

      (1) I could not find the preregistration - this is crucial to verify what analyses the authors have committed to prior to writing the manuscript. Please provide a link leading directly to the preregistration - searching for the specified number in the suggested website yielded no results.

      We thank the reviewer for pointing this out. We included a link to the preregistration in the revised manuscript. This study was pre-registered with the German Clinical Trial Register (registration number: DRKS00029228; https://drks.de/search/de/trial/DRKS00029228).

      (2) There is a recurring issue which is easy to address: because the Methods are located after the Results, many of the constructs used, analyses conducted, and even the main placebo and nocebo inductions are unclear, making it hard to appreciate the results in full. I recommend finding a way to detail at the beginning of the results section how placebo and nocebo effects have been induced. While my background means I am familiar with these methods, other readers will lack that knowledge. Even a short paragraph or a figure (like Figure 4) could help clarify the results substantially. For example, a significant portion of the results is devoted to the conditioning part of the experiment, while it is unknown which part was involved (e.g., were temperatures lowered/increased in all trials or only in the beginning).

      We thank the reviewer for their helpful comment and agree that the Results section requires additional information that would typically be provided by the Methods section if it directly followed the Introduction. In response, we have moved the former Figure 4 from the Methods section to the beginning of the Results section as a new Figure 1, to improve clarity. Further, we have revised the Methods section to explicitly state that all trials during the conditioning phase were manipulated in the same way.

      Recommendations for the Authors:

      Reviewer #1 (Recommendations for the authors):

      (1) Given that the authors are claiming (correctly) that there is only limited work comparing placebo/nocebo effects, there are some papers missing from their citations:

      Nocebo responses are stronger than placebo responses after subliminal pain conditioning: Jensen, K., Kirsch, I., Odmalm, S., Kaptchuk, T. J. & Ingvar, M. Classical conditioning of analgesic and hyperalgesic pain responses without conscious awareness. Proc. Natl. Acad. Sci. USA 112, 7863-7 (2015)

      We thank the reviewer and have now included this relevant publication into the introduction of the revised manuscript.

      Hird, E.J., Charalambous, C., El-Deredy, W. et al. Boundary effects of expectation in human pain perception. Sci Rep 9, 9443 (2019). https://doi.org/10.1038/s41598-019-45811-x

      We thank the reviewer for suggesting this relevant publication. We have now included it into the discussion of the revised manuscript by adding the following paragraph:

      “Recent work using a predictive coding framework further suggests that nocebo effects may be less susceptible to prediction error than placebo effects (Hird et al., 2019), which could contribute to their greater persistence and strength in our study.”

      (2) The trial-by-trial pain ratings could have been usefully modelled with a computational model, such as a Bayesian model (this is especially pertinent given the reference to Bayesian processing in the discussion). A multilevel model could also be used to increase the power of the analysis. This is a tentative suggestion, as I appreciate it would require a significant investment of time and work - alternatively, the authors could acknowledge it in the Discussion as a useful future avenue for investigation, if this is preferred.

      We thank the reviewer for this thoughtful suggestion. While we agree that computational modelling approaches could provide valuable insights into individual learning, our study was not designed with this in mind, and the relatively small number of trials per condition and the absence of trial-by-trial expectancy ratings limit the applicability of such models. We have therefore chosen not to pursue such an analysis but highlight it in the discussion as a promising direction for future research.

      “Notably, the most recent experience was the most predictive in all three analyses; for instance, the placebo effect on day 8 was predicted by the placebo effect on day 1, not by the initial conditioning. This finding supports the Bayesian inference framework, where recent experiences are weighted more heavily in the process of model updating because they are more likely to reflect the current state of the environment, providing the most relevant and immediate information needed to guide future actions and predictions [24]. Interestingly, while a change in pain predicted subsequent nocebo effects, it seemed less influential than for placebo effects. This aligns with findings that longer conditioning enhanced placebo effects, while it did not affect nocebo responses [10] and the conclusion that nocebo instruction may be sufficient to trigger nocebo responses. Using Bayesian modeling, future studies could identify individual differences in the development of placebo and nocebo effects by integrating prior experiences and sensory inputs, providing a probabilistic framework for understanding the underlying mechanisms.”

      (3) The paper is missing any justification of sample size, i.e. power analysis - please include this.

      We apologize for the missing information on our a priori power analysis. As there is a lack of prior studies investigating within-subjects comparisons of placebo and nocebo effects that could inform precise effect size estimates for our research question, we based our calculation on the ability to detect small effects. Specifically, the study was powered to detect effect sizes in the range of d = 0.2 - 0.25 with α = .05 and power = .9, yielding a required sample size of N = 83-129. We have now added this information to the methods section of the revised manuscript.

      (4) "On day 8, one stimulus per stimulation intensity (i.e., VAS 40, 60 and 80) was applied before the start of the test session to re-familiarise participants with the thermal stimulation."

      What were the instructions about this? Was it before the electrode was applied? This runs the risk of unblinding participants, as they only expect to feel changes in stimulus intensity due to the TENS stimulation.

      We thank the reviewer for pointing out the potential risk of unblinding participants due to the re-familiarization process prior to the second test session. We would like to clarify that we followed specific procedures to prevent participants from associating this process with the experimental manipulation. The re-familiarisation with the thermal stimuli was conducted after the electrode had been applied and re-tested to ensure that both stimulus modalities were re-introduced in a consistent and neutral context. Participants were explicitly informed that both procedures were standard checks prior to the actual test session (“We will check both once again before we begin the actual measurement.”). For the thermal stimuli, we informed participants that they would experience three different intensities to allow the skin to acclimate (e.g., “...we will test the heat stimuli in 3 trials with different temperatures, allowing your skin to acclimate to the stimuli. …”), without implying any connection to the experimental conditions.

      Importantly, this re-familiarization procedure mirrored what participants had already experienced during the initial calibration session on day 1. We therefore assume that participants interpreted it as a routine technical step rather than part of the experimental manipulation. We have now clarified this procedure in the methods section of the revised manuscript.

      (5) "For a comparison of pain intensity ratings between time-points, an ANOVA with the within-subject factors Condition (placebo, nocebo, control) and Session (day 1, day 8) was carried out. For the comparison of placebo and nocebo effects between the two test days, an ANOVA with the with-subject factors Effect (placebo effect, nocebo effect) and Session (day 1, day 8) was used."

      It seems that one ANOVA is looking at raw pain scores and one is looking at difference scores, but this is a bit confusing - please rephrase/clarify this, and explain why it is useful to include both.

      We thank the reviewer for highlighting this point. Our primary analyses focus on placebo and nocebo effects, which we define as the difference in pain intensity ratings between the control and the placebo condition (placebo effect) and the nocebo and the control condition (nocebo effect), respectively.

      To examine whether condition effects were present at each time-point, we first conducted two separate repeated measures ANOVAs - one for day 1 and one for day 8 - with the within-subject factor CONDITION (placebo, nocebo, control).

      To compare the magnitude and persistence of placebo and nocebo effects over time, we then calculated the above-mentioned difference scores and submitted these to a second ANOVA with within-subject factors EFFECT (placebo vs. nocebo effect) and SESSION (day 1 vs. day 8). We have now clarified this approach on page 19 of the revised manuscript. To avoid confusion, the Condition x Session ANOVA has been removed from the manuscript.
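
      As an illustration only (not the analysis code used in the study), the difference scores described above can be sketched as follows. The rating values are made up, and the names `placebo_effect`, `nocebo_effect` and `ratings_by_session` are hypothetical. Both effects are defined so that a positive value means an effect in the expected direction.

```python
from typing import Dict

Ratings = Dict[str, float]  # mean VAS rating (0-100) per condition


def placebo_effect(r: Ratings) -> float:
    """Control minus placebo: positive when pain is lower under placebo."""
    return r["control"] - r["placebo"]


def nocebo_effect(r: Ratings) -> float:
    """Nocebo minus control: positive when pain is higher under nocebo."""
    return r["nocebo"] - r["control"]


# Hypothetical mean ratings for a single participant.
ratings_by_session: Dict[str, Ratings] = {
    "day1": {"placebo": 45.0, "control": 55.0, "nocebo": 75.0},
    "day8": {"placebo": 48.0, "control": 56.0, "nocebo": 68.0},
}

# One difference score per effect and session; these scores, not the raw
# ratings, would enter an Effect x Session repeated-measures ANOVA.
effects = {
    session: {"placebo": placebo_effect(r), "nocebo": nocebo_effect(r)}
    for session, r in ratings_by_session.items()
}

# Change from day 1 to day 8 for each effect (negative = effect shrank).
change = {
    name: effects["day8"][name] - effects["day1"][name]
    for name in ("placebo", "nocebo")
}
```

      In this made-up example the nocebo effect shrinks more than the placebo effect between sessions, which is the kind of pattern an Effect x Session interaction would capture.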

      (6) Please can the authors provide a figure illustrating trial-by-trial ratings during test trials as well as during conditioning trials?

      In response to the reviewer’s point, we now provide the trial-by-trial ratings of the test phases on days 1 and 8 as an additional figure in the Supplement (Figure S1) and would like to clarify that trial-by-trial pain intensity ratings of the conditioning phase are displayed in Figure 2C of the manuscript.

      (7) "Separate multiple linear regression analyses were performed to examine the influence of expectations (GEEE ratings) and experienced effects (VAS ratings) on subsequent placebo and nocebo effects. For day 1, the placebo effect was entered as the dependent variable and the following variables as potential predictors: (i) expected improvement with placebo before conditioning, (ii) placebo effect during conditioning and (iii) the expected improvement with placebo before the test session at day 1"

      The term "placebo effect during conditioning" is a bit confusing - I believe this is just the effect of varying stimulus intensities - please could the authors be more explicit on the terminology they use to describe this? NB changes in pain rating during the conditioning trials do not count as a placebo/nocebo effect, as most of the change in rating will reflect differences in stimulation intensity.

      We agree with the reviewer that the cited paragraph refers to the actual application of lower or higher pain stimuli during the conditioning session rather than to a genuinely induced placebo or nocebo effect. We thank the reviewer for this helpful observation and have revised the terminology accordingly, now referring to these as “pain relief during conditioning” and “pain worsening during conditioning”.
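
      As an illustration only (not the analysis code used in the study), a multiple linear regression with the three day-1 predictors quoted above can be sketched with ordinary least squares. The function `fit_ols`, the data and the coefficients are all hypothetical. The outcome is constructed from known coefficients so that the fit can be checked, and the sketch omits standard errors and significance tests.

```python
from typing import List, Sequence


def fit_ols(predictors: Sequence[Sequence[float]],
            outcome: Sequence[float]) -> List[float]:
    """Ordinary least squares via the normal equations (X'X) b = X'y.

    Returns [intercept, b1, b2, ...].
    """
    rows = [[1.0, *row] for row in predictors]  # prepend intercept column
    k = len(rows[0])
    # Augmented matrix [X'X | X'y].
    aug = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * y for r, y in zip(rows, outcome))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("predictors are collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [v / pivot_value for v in aug[col]]
        for r in range(k):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][k] for i in range(k)]


# Hypothetical predictors per participant:
# (expectation before conditioning, pain relief during conditioning,
#  expectation before the day-1 test session)
predictors = [
    (10.0, 20.0, 15.0), (30.0, 25.0, 35.0), (20.0, 10.0, 25.0),
    (40.0, 30.0, 20.0), (25.0, 15.0, 40.0), (15.0, 35.0, 30.0),
]
# Made-up "true" coefficients used only to generate a checkable outcome.
TRUE_COEFFS = [2.0, 0.1, 0.5, 0.3]
outcome = [
    TRUE_COEFFS[0] + sum(b * x for b, x in zip(TRUE_COEFFS[1:], row))
    for row in predictors
]
coeffs = fit_ols(predictors, outcome)
```

      With real data the outcome would be each participant's placebo effect on day 1, and a statistics package would normally be used to obtain test statistics for each predictor.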

      (8) Supplementary materials: "The three temperature levels were perceived as significantly different (VAS ratings; placebo condition: M= 32.90, SD= 16.17; nocebo condition: M= 56.62, SD= 17.09; control condition: M= 80.84, SD= 12.18"

      This suggests that the VAS rating for the control condition was higher than for the nocebo condition. Please could the authors clarify/correct this?

      We thank the reviewer for spotting this error. The values for the control and the nocebo condition had accidentally been swapped. This has now been corrected in the manuscript: control condition: M= 56.62, SD= 17.09; nocebo condition: M= 80.84, SD= 12.18.

      (9) "To predict placebo responses a week later (VAScontrol - VASplacebo at day 8), the same independent variables were entered as for day 1 but with the following additional variables (i) the placebo effect at day 1 and (ii) the expected improvement with placebo before the test session at day 8."

      Here it would be much clearer to say "pain ratings during test trials at day 1".

      We agree with the reviewer and have revised the manuscript as suggested.

      (10) For completeness, please present the pain intensity ratings during conditioning as well as calibration/test trials in the figure.

      Please see our answer to comment (6).

      (11) In Figure 1a, it looks like some participants had rated the control condition as zero by day 8. If so, it's inappropriate to include these participants in the analysis if they are not responding to the stimulus. Were these the participants who were excluded due to pain insensitivity?

      On day 8, the lowest pain intensity ratings observed were VAS 3 in the placebo condition and VAS 2 in the control condition, both from the same participant. All other participants reported minimum values of VAS 11 or higher (all on a scale from 0-100). Thus, no participant provided a pain rating of VAS 0, and all ratings indicated some level of pain perception in response to the stimulus. We did not define an exclusion criterion based on day 8 pain ratings in our preregistration, and we did not observe any technical issues with the stimulation procedure. To avoid post-hoc exclusions and maintain consistency with our preregistered analysis plan, we therefore decided to include all participants in the analysis.

      (12) "Comparison of day 1 and day 8. A direct comparison of placebo and nocebo effects on day 1 and day 8 pain intensity ratings showed a main effect of Effect with a stronger nocebo effect (F(1,97)= 53.93, 131 p< .001, η2= .36) but no main effect of Day (F(1,97)= 2.94, p= .089, η2 = .029). The significant Effect x Session interaction indicated that the placebo effect and the nocebo effect developed differently over time (F(1,97)= 3.98, p= .049, η2 = .039)"

      This is confusing as it talks about a main effect of "day" and then interaction with "session" - are they two different models? The authors need to clarify.

      We thank the reviewer for pointing this out. In our analysis, “Session” is the correct term for the experimental factor, which has two factor levels, “day 1” and “day 8”. This has now been corrected in the revised manuscript.

      Reviewer #2 (Recommendations for the authors):

      (1) More information on how "size of the effect" in Figures 1b and 2b was calculated is needed; this can be in the legend. If these are differences between control and each condition, then they were reversed for one condition (nocebo?), which is ok - but this should be clearly explained.

      We agree with the reviewer and have now revised the figure legends to improve clarity. The legends now read:

      1b: “Figure 1. Pain intensity ratings and placebo and nocebo effects during calibration and test sessions. (A) Mean pain intensity ratings in the placebo, nocebo and control condition during calibration, and during the test sessions at day 1 and day 8. (B) Placebo effect (control condition - placebo condition, i.e., positive value of difference) and nocebo effect (nocebo condition - control condition, i.e., positive value of difference) on day 1 and day 8. Error bars indicate the standard error of the mean, circles indicate mean ratings of individual participants. ***: p < .001, **: p < .01, n.s.: non-significant.”

      2b: “Figure 2. Mean and trial-by-trial pain intensity ratings, placebo and nocebo effects during conditioning. (A) Mean pain intensity ratings of the placebo, nocebo and control condition during conditioning. (B) Placebo effect (control condition - placebo condition, i.e., positive value of difference) and nocebo effect (nocebo condition - control condition, i.e., positive value of difference) during conditioning. (C) Trial-by-trial pain intensity ratings (with confidence intervals) during conditioning. Error bars indicate the standard error of the mean, circles indicate mean ratings of individual participants. ***: p < .001.”

      (2) In the methods, I was missing a clear understanding of how many trials there were in the conditioning phase, and then how many in the other testing phases. Also, how long did the experiment last in total?

      We apologize that the exact number of trials in the testing phases was not clear in the original manuscript. We now indicate on page 18 of the revised manuscript that we used 10 trials per condition in the test sessions. We have also added information on the duration of each test day (i.e., three hours on day 1 and one hour on day 8) on page 15.

      (3) In expectancy ratings, line 186 - are improvement and worsening expectations different from expected pain relief? It is implied that these are two different constructs - it would be helpful to clarify that.

      We agree that this is indeed confusing and would like to clarify that both refer to the same construct. We used the Generic rating scale for previous treatment experiences, treatment expectations, and treatment effects (GEEE questionnaire, Rief et al. 2021) that discriminates between expected symptom improvement, expected symptom worsening, and expected side effects due to a treatment. We now use the terms “expected pain relief” and “expected pain worsening” throughout the whole manuscript.

      (4) In the last section of the Results, somatosensory amplification comes out of nowhere - and could be better introduced (see point 2 above).

      We agree with the reviewer that introducing the concept of somatosensory amplification and its potential link to placebo/nocebo effects only in the Methods is unhelpful, given that this section appears at the end of the manuscript. We therefore now introduce the relevant publication (Doering et al., 2015) before reporting our findings on this concept.

      (5) In line 169, if the authors want to specify what portion of the variance was explained by expectancy, they could conduct a hierarchical regression, where they first look at R2 without the expectancy entered, and only then enter it to obtain the R2 change.

      We fully agree that hierarchical regression can be a useful approach for isolating the contribution of variables. However, in our case, expectancy was assessed at different time points (e.g., before conditioning and before the test session on day 1), and there was no principled rationale for determining the order in which these different expectancy-related variables should be entered into a hierarchical model.

      That said, in response to the reviewer’s suggestion, we have now conducted hierarchical regression analyses in which all expectancy-related variables were entered together as a single block. These analyses largely confirmed the findings reported so far and are provided below. Given the exploratory nature of this grouping and the lack of an a priori hierarchy, we feel that the standard multiple regression models remain the most appropriate for addressing our research question because they allow us to evaluate the total contribution of expectancy-related predictors while also examining the individual contribution of each variable within the block. We would therefore prefer to retain these as the primary analyses in the manuscript.

      Results of the hierarchical regression analyses:

      Day 1 - Placebo response: In step 1, we entered the difference in pain intensity ratings between the control and the placebo condition during conditioning as a predictor. In step 2, we added the two variables reflecting expectations (i.e., expected improvement with placebo (i) before conditioning and (ii) before the test session on day 1). This allowed us to assess whether expectation-related variables explained additional variance beyond the effect of conditioning.

      The overall regression model at step 1 was significant, F(1, 102) = 13.42, p < .001, explaining 11.6% of the variance in the dependent variable (R<sup>2</sup> = .116). Adding the expectancy-related predictors in step 2 did not lead to a significant increase in explained variance, ΔR<sup>2</sup> = .007, F(2, 100) = 0.384, p = .682. Thus, the conditioning response significantly predicted placebo-related pain reduction on day 1, but additional information on expectations did not account for further variance.

      Day 1 - Nocebo response: The equivalent analysis was run for the nocebo response on day 1. In step 1, the pain intensity difference between the nocebo and the control condition was entered as a predictor before adding the two expectancy ratings (i.e., expected worsening with nocebo (i) before conditioning and (ii) before the test session on day 1).

      In step 1, the regression model was not statistically significant, F(1, 102) = 2.63, p = .108, and explained only 2.5% of the variance in nocebo response (R<sup>2</sup> = .025). Adding the expectation-related predictors in Step 2 slightly increased the explained variance by ΔR<sup>2</sup> = .027, but this change was also non-significant, F(2, 100) = 1.41, p = .250. The overall variance explained by the full model remained low (R<sup>2</sup> = .052). These results suggest that neither conditioning nor expectation-related variables reliably predicted nocebo-related pain increases on day 1.

      Day 8 - Placebo response: For the prediction of the placebo effect on day 8, the following variables reflecting perceived effects were entered as predictors in step 1: the difference in pain intensity ratings between the control and the placebo condition (i) during conditioning and (ii) on day 1. In step 2, the variables reflecting expectations were added: the expected improvement with placebo (i) before conditioning, (ii) before the test session on day 1 and (iii) before the test session on day 8.

      In step 1, the model was statistically significant, F(3, 95) = 14.86, p < .001, explaining 23.8% of the variance in the placebo response (R<sup>2</sup> = .238, Adjusted R<sup>2</sup> = .222). In step 2, the addition of the expectation-related predictors resulted in a non-significant improvement in model fit, ΔR<sup>2</sup> = .051, F(3, 92) = 2.21, p = .092. The overall variance explained by the full model increased modestly to 29.0%.

      Day 8 - Nocebo response: For the equivalent analyses of nocebo responses on day 8, the following variables were included in step 1: the difference in pain intensity ratings between the nocebo and the control condition (i) during conditioning and (ii) on day 1. In step 2, we entered the variables reflecting nocebo expectations including expected worsening with nocebo (i) before conditioning, (ii) before the test session on day 1 and (iii) before the test session on day 8. In step 1, the model significantly predicted the day 8 nocebo response, F(3, 95) = 6.04, p = .003, accounting for 11.3% of the variance (R<sup>2</sup> = .113, Adjusted R<sup>2</sup> = .094). However, the addition of expectation-related predictors in Step 2 resulted in only a negligible and non-significant improvement, ΔR<sup>2</sup> = .006, F(3, 92) = 0.215, p = .886. The full model explained just 11.9% of the variance (R<sup>2</sup> = .119).
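
      As an illustration of how the step 2 tests above relate to the reported R<sup>2</sup> values, the following minimal sketch computes the standard F statistic for an R<sup>2</sup> change between nested regression models. The function name and argument names are hypothetical and are not taken from the authors' analysis code; the inputs are the rounded values quoted in this response.

```python
def r2_change_f(r2_reduced, r2_full, k_added, df_resid):
    """F statistic for the R^2 increase when k_added predictors join a nested model.

    df_resid is the residual degrees of freedom of the full (step 2) model.
    """
    delta_r2 = r2_full - r2_reduced
    return (delta_r2 / k_added) / ((1.0 - r2_full) / df_resid)


# Day 1 placebo response: R^2 = .116 at step 1, delta R^2 = .007, F(2, 100)
f_day1_placebo = r2_change_f(0.116, 0.116 + 0.007, k_added=2, df_resid=100)

# Day 1 nocebo response: R^2 = .025 at step 1, full-model R^2 = .052, F(2, 100)
f_day1_nocebo = r2_change_f(0.025, 0.052, k_added=2, df_resid=100)

print(round(f_day1_placebo, 3), round(f_day1_nocebo, 3))
```

      Because the published R<sup>2</sup> values are rounded to three decimals, the recomputed statistics can be expected to differ slightly from the reported F values.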

      Typos:

      (6) Abstract - 104 heathy xxx (word missing).

      (7) Line 61 - reduce or decrease - I think you meant increase.

      Thank you, we have now corrected both sentences.

      References

      Colloca L, Petrovic P, Wager TD, Ingvar M, Benedetti F. How the number of learning trials affects placebo and nocebo responses. Pain. 2010

      Doering BK, Nestoriuc Y, Barsky AJ, Glaesmer H, Brähler E, Rief W. Is somatosensory amplification a risk factor for an increased report of side effects? Reference data from the German general population. J Psychosom Res. 2015

    1. Author response:

      The following is the authors’ response to the previous reviews

      Reviewer #1 (Public review):

      Summary:

      In the manuscript, the authors describe a new pipeline to measure changes in vasculature diameter upon optogenetic stimulation of neurons. The work is useful to better understand the hemodynamic response on a network/graph level.

      Strengths:

      The manuscript provides a pipeline that detects changes in vessel diameter and simultaneously locates the neurons driven by stimulation.

      The resulting data could provide interesting insights into the graph-level mechanisms regulating activity-dependent blood flow.

      Weaknesses:

      (1) The manuscript contains (new) wrong statements and (still) wrong mathematical formulas.

      The symbols in these formulas have been updated to disambiguate them, and the accompanying statements have been adjusted for clarity.

      (2) The manuscript does not compare results to existing pipelines for vasculature segmentation (open-source or commercial). Comparing the performance of the pipeline to a random forest classifier (ilastik) on images that are not preprocessed (i.e. corrected for background etc.) does not seem a particularly useful comparison.

      We have now included comparisons to Imaris (commercial software) for segmentation and VesselVio (open-source software) for graph extraction.

      For the ilastik comparison, the images were preprocessed prior to ilastik segmentation, specifically by doing intensity normalization.

      Example segmentations utilizing Imaris have now been included. Imaris leaves gaps and discontinuities in the segmentation masks, as shown in Supplementary Figure 10. The Imaris segmentation masks also tend to be more circular in cross-section despite irregularities on the surface of the vessels observable in the raw data and identified in manual segmentation. This approach also requires days to months to generate per image stack.

      A comparison to VesselVio has now also been generated, and results are visualized in Supplementary Figure 11. VesselVio generates individual graphs for each time point, resulting in potential discrepancies in the structure of the graphs from different time points. Furthermore, VesselVio uses the distance transform to estimate the vascular radius, which renders the vessel radius estimates highly susceptible to variation in the user-selected methodology used to obtain segmentation results. In contrast, our approach uses intensity gradient-based boundary detection from centerlines in the image, which mitigates this bias. We have added the following paragraph to the Discussion section on the comparisons with the two methods:

      “Comparison with commercial and open-source vascular analysis pipelines

      To compare our results with those achievable on these data with other pipelines for segmentation and graph network extraction, we compared segmentation results qualitatively with Imaris version 9.2.1 (Bitplane) and vascular graph extraction with VesselVio [1]. For the Imaris comparison, three small volumes were annotated by hand to label vessels. Example slices of the segmentation results are shown in Supplementary Figure 10. Imaris tended to either over- or under-segment vessels, disregard fine details of the vascular boundaries, and produce jagged edges in the vascular segmentation masks. In addition to these issues with segmentation mask quality, manual segmentation of a single volume took days for a rater to annotate. To compare to VesselVio, binary segmentation masks (one before and one after photostimulation) generated with our deep learning models were loaded into VesselVio for graph extraction, as VesselVio does not have its own method for generating segmentation masks. This also facilitates a direct comparison of the benefits of our graph extraction pipeline to VesselVio. Visualizations of the two graphs are shown in Supplementary Figure 11. VesselVio produced many hairs at both time points, and the total number of segments varied considerably between the two sequential stacks: while the baseline scan resulted in 546 vessel segments, the second scan had 642 vessel segments. These discrepancies are difficult to resolve in post-processing and preclude a direct comparison of individual vessel segments across time. As the segmentation masks we used in graph extraction derive from the union of multiple time points, we could better trace the vasculature and identify more connections in our extracted graph.
      Furthermore, VesselVio relies on the distance transform of the user-supplied segmentation mask to estimate vascular radii; consequently, these estimates are highly susceptible to variations in the input segmentation masks. We repeatedly saw slight variations between boundary placements of all of the models we utilized (ilastik, UNet, and UNETR) and those produced by raters. Our pipeline mitigates this segmentation method bias by using intensity gradient-based boundary detection from centerlines in the image (as opposed to using the distance transform of the segmentation mask, as in VesselVio).”

      (3) The manuscript does not clearly visualize performance of the segmentation pipeline (e.g. via 2d sections, highlighting also errors etc.). Thus, it is unclear how good the pipeline is, under what conditions it fails or what kind of errors to expect.

      In response to the reviewer’s comment, 2D slices have been added in Supplementary Figure 4.

      (4) The pipeline is not fully open-source due to the use of MATLAB. Also, the pipeline code was not made available during review, contrary to the authors' claims (the provided link did not lead to a repository). Thus, the utility of the pipeline was difficult to judge.

      All code has been uploaded to GitHub and is available at the following location: https://github.com/AICONSlab/novas3d

      The MATLAB code for skeletonization is better at preserving centerline integrity during the pruning of hairs from centerlines than the currently available open-source methods.

      - Generalizability: The authors addressed the point of generalizability by applying the pipeline to other data sets. This demonstrates that their pipeline can be applied to other data sets and makes it more useful. However, the visualizations do not make clear how well the pipeline performs, where it fails, etc. The 3D visualizations are not particularly helpful in this respect. In addition, the Dice measure seems quite low, indicating roughly 20-40% of voxels do not overlap between inferred and ground truth. I did not notice this high discrepancy earlier. A thorough discussion of the errors appearing in the segmentation pipeline would be necessary in my view to better assess the quality of the pipeline.

      2D slices from the additional datasets have been added in Supplementary Figure 13 to aid in visualizing the models’ ability to generalize to other datasets.

      The Dice range we report (0.7-0.8) is good when compared to those (0.56-0.86) of 3D segmentations of large datasets in microscopy [2], [3], [4], [5], [6]. Furthermore, we had two additional raters segment three images from the original training set. We found that the raters had a mean inter class correlation of 0.73 [7]. Our model outperformed this Dice score on unseen data: Dice scores from our generalizability tests on C57 mice and Fischer rats were on par with or higher than this baseline.
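
      For readers less familiar with the metric, the following minimal sketch shows how a Dice score is computed from two binary masks. The function name and the toy masks are hypothetical and serve only as an illustration; they are not taken from the pipeline's code.

```python
def dice_coefficient(pred, truth):
    """Dice overlap of two equal-length binary masks given as flat 0/1 sequences."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of voxels")
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    # Two empty masks agree perfectly by convention
    return 1.0 if total == 0 else 2.0 * intersection / total


# Toy example: each mask has 4 foreground voxels, 3 of which are shared
pred = [1, 1, 1, 1, 0, 0, 0, 0]
truth = [0, 1, 1, 1, 1, 0, 0, 0]
print(dice_coefficient(pred, truth))  # 2 * 3 / (4 + 4) = 0.75
```

      In this toy case a Dice score of 0.75 corresponds to one quarter of each mask's foreground lying outside the other mask, which is the kind of disagreement the reviewer's 20-40% estimate refers to.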

      Reviewer #2 (Public review):

      The authors have addressed most of my concerns sufficiently. There are still a few serious concerns I have. Primarily, the temporal resolution of the technique still makes me dubious about nearly all of the biological results. It is good that the authors have added some vessel diameter time courses generated by their model. But I still maintain that data sampling every 42 seconds - or even 21 seconds - is problematic. First, the evidence for long vascular responses is lacking. The authors cite several papers:

      Alarcon-Martinez et al. 2020 show and explicitly state that their responses (stimulus-evoked) returned to baseline within 30 seconds. The responses to ischemia are long lasting but this is irrelevant to the current study using activated local neurons to drive vessel signals.

      Mester et al. 2019 show responses that all seem to return to baseline by around 50 seconds post-stimulus.

      In Mester et al. 2019, diffuse stimulations with blue light showed a return to baseline around 50 seconds post-stimulus (cf. Figure 1E,2C,2D). However, focal stimulations where the stimulation light is raster scanned over a small region focused in the field of view show longer-lasting responses (cf. Figure 4) that have not returned to baseline by 70 seconds post-stimulus [8]. Alarcon-Martinez et al. do report that their responses return to baseline within 30 seconds; however, their physiological stimulation may lead to different neuronal and vessel response kinetics than those elicited by the optogenetic stimulations used in the current work.

      O'Herron et al. 2022 and Hartmann et al. 2021 use opsins expressed in vessel walls (not neurons as in the current study) and directly constrict vessels with light. So this is unrelated to neuronal activity-induced vascular signals in the current study.

      We agree that optogenetic activation of vessel-associated cells is distinct from optogenetic activation of neurons, but we do expect the effects of such perturbations on the vasculature to have some commonalities.

      There are other papers including Vazquez et al 2014 (PMID: 23761666) and Uhlirova et al 2016 (PMID: 27244241) and many others showing optogenetically-evoked neural activity drives vascular responses that return back to baseline within 30 seconds. The stimulation time and the cell types labeled may be different across these studies which can make a difference. But vascular responses lasting 300 seconds or more after a stimulus of a few seconds are just not common in the literature and so are very suspect - likely at least in part due to the limitations of the algorithm.

      Vazquez et al. 2014 used diffuse photostimulation with a fiberoptic probe, similar to the diffuse stimulation in Mester et al. 2019, as opposed to the raster-scanned focal stimulation we used in this study and in Mester et al. 2019, where we observed focal photostimulation to elicit vascular responses lasting longer than a minute. Uhlirova et al. 2016 used photostimulation powers between 0.7 and 2.8 mW, likely lower than our 4.3 mW/mm<sup>2</sup> photostimulation. Further, even with focal photostimulation, we do see light intensity dependence of the duration of the vascular responses. Indeed, in Supplementary Figure 2, 1.1 mW/mm<sup>2</sup> photostimulation leads to briefer dilations/constrictions than does 4.3 mW/mm<sup>2</sup>; the 1.1 mW/mm<sup>2</sup> responses are in line, duration-wise, with those in Uhlirova et al. 2016.

      Critically, as per Supplementary Figure 2, the analysis of the experimental recordings acquired at 3-second temporal resolution did likewise show responses in many vessels lasting for tens of seconds and even hundreds of seconds in some vessels.

      Another major issue is that the time courses provided show that the same vessel constricts at certain points and dilates later. So where in the time course the data is sampled will have a major effect on the direction and amplitude of the vascular response. In fact, I could not find how the "response" window is calculated. Is it from the first volume collected after the stimulation - or an average of some number of volumes? But clearly down-sampling the provided data to 42 or even 21 second sampling will lead to problems. If the major benefit to the field is the full volume over large regions that the model can capture and describe, there needs to be a better way to capture the vessel diameter in a meaningful way.

      In the main experiment (i.e. excluding the additional experiments presented in Supplementary Figure 2, which were collected over a limited FOV at 3 s per stack), we collected one stack every 42 seconds. Acquisition of the first slice of the volume starts following the photostimulation, and the last slice finishes at 42 seconds. Each slice takes ~0.44 seconds to acquire. The data analysis pipeline (as demonstrated by Supplementary Figure 2) is not in any way limited to data acquired at this temporal resolution and - provided reasonable signal-to-noise ratio (cf. Figure 5) - is applicable, as is, to data acquired at much higher sampling rates.

      It still seems possible that if responses are bi-phasic, then depth dependencies of constrictors vs dilators may just be due to where in the response the data are being captured - maybe the constriction phase is captured in deeper planes of the volume and the dilation phase more superficially. This may also explain why nearly a third of vessels are not consistent across trials - if the direction the volume was acquired is different across trials, different phases of the response might be captured.

      Alternatively, like neuronal responses to physiological stimuli, the vascular responses elicited by increases in neuronal activity may themselves be variable in both space and time.

      I still have concerns about other aspects of the responses but these are less strong. Particularly, these bi-phasic responses are not something typically seen and I still maintain that constrictions are not common. The authors are right that some papers do show constriction. Leaving out the direct optogenetic constriction of vessels (O'Herron 2022 & Hartmann 2021), the Alarcon-Martinez et al. 2020 paper and others such as Gonzales et al 2020 (PMID: 33051294) show different capillary branches dilating and constricting. However, these are typically found either with spontaneous fluctuations or due to highly localized application of vasoactive compounds. I am not familiar with data showing activation of a large region of tissue - as in the current study - coupled with vessel constrictions in the same region. But as the authors point out, typically only a few vessels at a time are monitored so it is possible - even if this reviewer thinks it unlikely - that this effect is real and just hasn't been seen.

      Uhlirova et al. 2016 (PMID: 27244241) observed biphasic responses in the same vessel with optogenetic stimulation in anesthetized and unanesthetized animals (cf Fig 1b and Fig 2, and section “OG stimulation of INs reproduces the biphasic arteriolar response”). Devor et al. (2007) and Lindvere et al. (2013) also reported on constrictions and dilations being elicited by sensory stimuli.

      I also have concerns about the spatial resolution of the data. It looks like the data in Figure 7 and Supplementary Figure 7 have a resolution of about 1 micron/pixel. It isn't stated so I may be wrong. But detecting changes of less than 1 micron, especially given the noise of an in vivo prep (brain movement and so on), might just be noise in the model. This could also explain constrictions as just spurious outputs in the model's diameter estimation. The high variability in adjacent vessel segments seen in Figure 6C could also be explained the same way, since these also seem biologically and even physically unlikely.

      Thank you for your comment. To address this important issue, we performed an additional validation experiment in which we placed a special order of fluorescent beads with a known diameter of 7.32 ± 0.27 µm, imaged them following our imaging protocol, and subsequently used our pipeline to estimate their diameter. Our analysis converged on the manufacturer-specified diameter, estimating it to be 7.34 ± 0.32 µm. The manuscript has been updated to detail this experiment, as below:

      Methods section insert

      “Second, our boundary detection algorithm was used to estimate the diameters of fluorescent beads of a known diameter imaged under similar acquisition parameters. Polystyrene microspheres labelled with Flash Red (Bangs Laboratories, Inc., CAT# FSFR007) with a nominal diameter of 7.32 µm and a specified range of 7.32 ± 0.27 µm, as determined by the manufacturer using a Coulter counter, were imaged on the same multiphoton fluorescence microscope set-up used in the experiment (identical light path, resonant scanner, objective, detector, excitation wavelength and nominal lateral and axial resolutions, with 5x averaging). The images of the beads had a higher SNR than our images of the vasculature, so Gaussian noise was added to the images to degrade the SNR to the same level as that of the blood vessels. The images of the beads were segmented with a threshold, centroids were calculated for individual spheres, and planes with a random normal vector were extracted from each bead and used to estimate the diameter of the beads. The same smoothing and PSF deconvolution steps were applied in this task. We then reported the mean and standard deviation of the distribution of the diameter estimates. A variety of planes were used to estimate the diameters.”

      Results Section Insert

      “Our boundary detection algorithm successfully estimated the radius of precisely specified fluorescent beads. The bead images had a signal-to-noise ratio of 6.79 ± 0.16 (about 35% higher than our in vivo images): to match their SNR to that of in vivo vessel data, following deconvolution, we added Gaussian noise with a standard deviation of 85 SU to the images, bringing the SNR down to 5.05 ± 0.15. The data processing pipeline was kept unaltered except for the bead segmentation, performed via image thresholding instead of our deep learning model (trained on vessel data). The bead boundary was computed following the same algorithm used on vessel data: i.e., by the average of the minimum intensity gradients computed along 36 radial spokes emanating from the centreline vertex in the orthogonal plane. To demonstrate an averaging-induced decrease in the uncertainty of the bead radius estimates on a scale that is finer than the nominal resolution of the imaging configuration, we tested four averaging levels in 289 beads. Three of these averaging levels were lower than that used on the vessels, and one matched that used on the vessels (36 spokes per orthogonal plane and a minimum of 10 orthogonal planes per vessel). As the amount of averaging increased, the uncertainty on the diameter of the beads decreased, and our estimate of the bead's diameter converged upon the manufacturer's Coulter counter-based specifications (7.32 ± 0.27 µm), as tabulated below in Table 1.”
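
      The radial-spoke idea described above can be illustrated with a simplified 2D sketch. This is not the pipeline's implementation: it omits the smoothing, PSF deconvolution and multi-plane averaging steps, and it assumes that the boundary along each spoke is placed at the most negative intensity gradient, with the spoke estimates then averaged. All function names, the synthetic disk and the sampling parameters are hypothetical.

```python
import math


def bilinear(img, x, y):
    """Bilinearly interpolate a 2D list-of-lists image at (x, y); x = column, y = row."""
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    fx, fy = x - x0, y - y0
    top = img[y0][x0] * (1 - fx) + img[y0][x0 + 1] * fx
    bottom = img[y0 + 1][x0] * (1 - fx) + img[y0 + 1][x0 + 1] * fx
    return top * (1 - fy) + bottom * fy


def spoke_radius(img, cx, cy, n_spokes=36, max_r=15.0, step=0.25):
    """Mean distance from (cx, cy) to the steepest intensity drop along each spoke."""
    radii = []
    n_samples = int(max_r / step)
    for k in range(n_spokes):
        theta = 2 * math.pi * k / n_spokes
        profile = [bilinear(img, cx + i * step * math.cos(theta),
                            cy + i * step * math.sin(theta))
                   for i in range(n_samples + 1)]
        # Finite-difference gradient; the boundary is where it is most negative
        grads = [profile[i + 1] - profile[i] for i in range(n_samples)]
        i_min = min(range(n_samples), key=grads.__getitem__)
        radii.append((i_min + 0.5) * step)
    return sum(radii) / len(radii)


# Synthetic bright disk (radius 7.3 px) with a soft edge on a 41 x 41 grid
size, true_r, c = 41, 7.3, 20.0
img = [[1.0 / (1.0 + math.exp((math.hypot(x - c, y - c) - true_r) / 0.8))
        for x in range(size)] for y in range(size)]
estimate = spoke_radius(img, c, c)
print(round(estimate, 2))
```

      Averaging over many spokes is what allows the estimate to be finer than the sampling step along any single spoke, which is the effect the bead experiment is designed to demonstrate.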

      Bibliography

      (1) J. R. Bumgarner and R. J. Nelson, “Open-source analysis and visualization of segmented vasculature datasets with VesselVio,” Cell Rep. Methods, vol. 2, no. 4, Apr. 2022, doi: 10.1016/j.crmeth.2022.100189.

      (2) G. Tetteh et al., “DeepVesselNet: Vessel Segmentation, Centerline Prediction, and Bifurcation Detection in 3-D Angiographic Volumes,” Front. Neurosci., vol. 14, Dec. 2020, doi: 10.3389/fnins.2020.592352.

      (3) N. Holroyd, Z. Li, C. Walsh, E. Brown, R. Shipley, and S. Walker-Samuel, “tUbe net: a generalisable deep learning tool for 3D vessel segmentation,” Jul. 24, 2023, bioRxiv. doi: 10.1101/2023.07.24.550334.

      (4) W. Tahir et al., “Anatomical Modeling of Brain Vasculature in Two-Photon Microscopy by Generalizable Deep Learning,” BME Front., vol. 2020, p. 8620932, Dec. 2020, doi: 10.34133/2020/8620932.

      (5) R. Damseh, P. Delafontaine-Martel, P. Pouliot, F. Cheriet, and F. Lesage, “Laplacian Flow Dynamics on Geometric Graphs for Anatomical Modeling of Cerebrovascular Networks,” ArXiv191210003 Cs Eess Q-Bio, Dec. 2019, Accessed: Dec. 09, 2020. (Online). Available: http://arxiv.org/abs/1912.10003

      (6) T. Jerman, F. Pernuš, B. Likar, and Ž. Špiclin, “Enhancement of Vascular Structures in 3D and 2D Angiographic Images,” IEEE Trans. Med. Imaging, vol. 35, no. 9, pp. 2107–2118, Sep. 2016, doi: 10.1109/TMI.2016.2550102.

      (7) T. B. Smith and N. Smith, “Agreement and reliability statistics for shapes,” PLOS ONE, vol. 13, no. 8, p. e0202087, Aug. 2018, doi: 10.1371/journal.pone.0202087.

      (8) J. R. Mester et al., “In vivo neurovascular response to focused photoactivation of Channelrhodopsin-2,” NeuroImage, vol. 192, pp. 135–144, May 2019, doi: 10.1016/j.neuroimage.2019.01.036.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      In this study, Ledamoisel et al. examined the evolution of visual and chemical signals in closely related Morpho butterfly species to understand their role in species coexistence. Using an integrative, state-of-the-art approach combining spectrophotometry, visual modeling, and behavioral mate choice experiments, they quantified differences in wing iridescence and assessed its influence on mate preference in allopatry and sympatry. They also performed chemical analyses to determine whether sympatric species exhibit divergent chemical cues that may facilitate species recognition and mate discrimination. The authors found iridescent coloration to be similar in sympatric Morpho species. Furthermore, male mate choice experiments revealed that in sympatry, males fail to discriminate conspecific females based on coloration, reinforcing the idea that visual signal convergence is primarily driven by predation pressure. In contrast, the divergence of chemical signals among sympatric species suggests their potential role in facilitating species recognition and mate discrimination. The authors conclude that interactions between ecological pressures and signal evolution may shape species coexistence.

      Strengths:

      The study is well-designed and integrates multiple methodological approaches to provide a thorough assessment of signal evolution in the studied species. I appreciate the authors' careful consideration of multiple selective pressures and their combined influence on signal divergence and convergence. Additionally, the inclusion of both visual and chemical signals adds an interesting and valuable dimension to the study, enhancing its importance. Beyond butterflies, this research broadens our understanding of multimodal communication and signal evolution in the context of species coexistence.

      Weaknesses:

      (1) The broader significance of the findings needs to be better articulated. While the authors emphasize that comparing adaptive traits in sympatry and allopatry provides insights into selective processes shaping reproductive isolation and coexistence, it is unclear what key conceptual or theoretical questions are being addressed. Are these patterns expected under certain evolutionary scenarios? Have they been empirically demonstrated in other systems? The authors should explicitly state the overarching research question, incorporate some predictions, and better contextualize their findings within the existing literature. If the results challenge or support previous work, that should be highlighted to strengthen the study's importance in a broader context.

      We thank the reviewer for their valuable feedback. We understand that the framing of the results and the discussion may fail to convey the broader significance of our findings. In the first version of the manuscript, we framed our manuscript around the processes shaping reproductive isolation and co-existence in sympatry, but now realize that this question was too broad with regard to our results. We thus strictly focused on outlining the importance of ecological interactions in the evolution of traits in sympatric species. In the revised version of the manuscript, we rewrote the first paragraph of the introduction to introduce context regarding the effect of ecological interactions on trait evolution (lines 43-60). We then explicitly introduce the theoretical question investigated in our paper (i.e. “we investigate how ecological interactions in sympatry can constrain natural and sexual selection shaping trait evolution”, lines 62-63) and our predictions regarding the evolution of traits in sympatry vs. allopatry (lines 74-80). We also added predictions regarding our experiments on Morpho at the end of the introduction (lines 146-157). As a result, the discussion is now better aligned with the introduction, by discussing the putative effect of predation and mate choice on the evolution of wing iridescence in Morpho.

      (2) The motivation for studying visual signals and mate choice in allopatric populations (i.e., at the intraspecific level) is not well articulated, leaving their role in the broader narrative unclear. In particular, the rationale behind experiments 1, 2, and 3 is not well defined, as the authors have not made a strong case for the need for these intraspecific comparisons in the introduction. This issue is further compounded by the authors' primary focus on signal evolution in sympatry throughout both the results and the discussion. For instance, the divergence of iridescence in allopatry is a potentially interesting result. But the authors have not discussed its implications.

      We now clearly state in the introduction our motivation for studying visual signals and mate choice in allopatric populations (lines 74-80, lines 146-157). We argue that intraspecific comparisons help identify whether visual cues can be used in mate recognition between phylogenetically close subspecies, between which visual resemblance is expected to be higher than between closely-related species (tetrad experiment, and experiment 1). As M. h. bristowi and M. h. theodorus have different wing patterns, we also used this comparison to identify the traits involved in male mate preference within a species, testing the importance of iridescent color (experiment 2) or iridescent patterning (experiment 3). The results of those experiments can then be used to assess whether these traits are used in species recognition between sympatric species. See also our answers to recommendations 11 and 15 from reviewer #1.

      Overall, given that the primary conclusions are based on results and analyses in sympatry, the role of allopatric populations in shaping these conclusions needs to be better integrated and justified. Without a stronger link between the comparative framework and the study's key takeaways, the use of allopatric populations feels somewhat peripheral rather than central to the study's aim. Since the primary conclusions remain valid even without the allopatric comparisons, their inclusion requires a clearer rationale.

      To make a stronger case for the use of the allopatric population in our manuscript, we strengthened the justification behind the study of intraspecific allopatric populations vs. interspecific sympatric populations, as the iridescence measurements and the mate choice experiments in allopatric populations can serve as a baseline in studying how species interactions can shape the evolution of traits and mate recognition when compared to sympatric populations. Following your major comment #1, we rewrote the introduction to include a justification for the need to study allopatric vs. sympatric populations (lines 74-80), and also further highlighted the need to study iridescence in sympatric species to fully understand the trait evolution of sympatric species in the discussion (lines 339-343).

      (3) While the authors demonstrate that iridescence is indistinguishable to predators in sympatry, they overstate the role of predation in driving convergence. The present study does not experimentally demonstrate that iridescence in this species has a confusion effect or contributes to evasive mimicry. Alternatively, convergence could result from other selective forces, such as signal efficacy due to environmental conditions, rather than being solely driven by predation.

      We acknowledge that our study does not directly demonstrate that iridescence contributes to evasive mimicry. We did tone down the interpretation of the results in the discussion and state that predation is not the only selective pressure that could have promoted a convergent evolution of iridescence in sympatric species, as iridescence is a trait that could be involved in thermoregulation (lines 346-353) and camouflage (lines 363-369) for example. We made sure to mention that convergence in iridescent signals in sympatry is only an indirect support to the evasive mimicry hypothesis, and that further research is still needed, including direct predation experiments, to show that this convergence is indeed triggered by predation (lines 391-396).  

      Reviewer #2 (Public review):

      This study presents an investigation of the visual and chemical properties and mating behaviour in Morpho butterflies, aimed at addressing the nature of divergence between closely related species in sympatry. The study species consists of three subspecies of Morpho helenor (bristowi, theodorus, and helenor), and the conspecific Morpho achilles achilles. The authors postulate that whereas the iridescent blue signals of all (sub)species should function as a predator reduction signal (similar to aposematism) and therefore exhibit convergence, the same signals should indicate divergence if used as a mating signal, particularly in sympatric populations. They also assess chemical profiles among the species to assess the potential utility of scent in mediating species/sex discrimination.

      The authors first used reflectance spectrometry to calculate hue, brightness, and chroma, plus two measures of "iridescence" (perhaps better phrased as angular dependence) in each (sub)species. This indicated the ubiquitous presence of sexual dimorphism in brightness (males brighter), which also appears to be the case for iridescence (Figure 3A-B). Analysis of these data also indicated that whereas there is evidence for divergence among subspecies in allopatry, the same evidence is lacking for species in sympatry (P = 0.084). This was supported further by visual modelling, which showed that both conspecifics and birds should be (theoretically) capable of perceiving the colour difference among allopatric populations of M. helenor, whereas the same is not true for the sympatric species.

      The authors then conducted mate choice trials, first using live individuals and second using female dummies. The live experiments indicated the presence of assortative mating among the two subspecies of M. helenor (bristowi and theodorus). The dummy presentations indicated (a) bristowi males prefer conspecific wings, whereas theodorus have no preference, (b) bristowi males prefer the con(sub)specific colour pattern, (c) theodorus prefer the con(sub)specific iridescence when the pattern is manipulated to be similar among female dummies. A fourth experiment, using sympatric M. achilles and M. helenor, indicated no preference for conspecific female dummies. Finally, chemical analysis indicated substantial differences between these two species in putative pheromone compounds, and especially so in the males.

      The authors conclude that the similarity of iridescence among species in sympatry is suggestive of convergence upon a common anti-predation signal. Despite some behavioural evidence in favour of colour (iridescence)-based mate discrimination, chemical differences between Achilles and Helenor are posed as more likely to function for species isolation than visual differences.

      Overall, I enjoyed reading this manuscript, which presents a valiant attempt at studying visual, chemical and behavioural divergence in this iconic group of butterflies.

      Major comments

      My only major comment concerns the authors' favoured explanation for aposematism (or evasive mimicry) for convergence among species, which is based upon the you-can't-catch-me hypothesis first presented by Young 1971. Although there is supporting work showing that iridescent-like stimuli are more difficult to precisely localize by a range of viewers, most of the evidence as applied to the Morpho system is circumstantial, and I'm not certain that there is widespread acceptance of this hypothesis. Given that the present study deals with closely-related (sub)species, one alternative explanation - a "null" hypothesis of sorts - is for a lack of divergence (from a common starting point) as opposed to evolutionary convergence per se. In other words, two subspecies are likely to retain ancestral character states unless there is selection that causes them to diverge. I feel that the manuscript would benefit from a discussion of this alternative, if not others. Signalling to predators could very well be involved in constraining the extent of convergence, but this seems a little premature to state as an up-front conclusion of this work. There is also the result of a *dorsal* wing manipulation by Vieira-Silva et al. 2024 which seems difficult to reconcile in light of this explanation. Whereas this paper is cited by the authors, a more nuanced discussion of their experimental results would seem appropriate here.

      We thank the reviewer for their constructive comments on our manuscript. We appreciate the reviewer’s concern regarding the way iridescence convergence between sympatric species is discussed in our manuscript, which aligns with similar concerns raised by Reviewer 1. Indeed, the you-can't-catch-me hypothesis has not yet been empirically tested in Morpho; it is currently a working hypothesis supported only by indirect lines of evidence.

      Among the 30 known Morpho species, iridescence is most likely the ancestral character, notably because iridescence is a trait shared by a majority of Morpho (we now mention this in the introduction, lines 108-110). In this paper, we thus did not aim to identify the evolutionary forces involved in the appearance of iridescence in this group, but rather wanted to understand to what extent ecological interactions can impact the diversification (or not) of this trait. As such, the dorsal manipulations performed in Vieira-Silva et al 2024, showing that iridescence in Morpho may have an effect similar to that of crypsis, do not impact our working hypothesis. Instead, we use Vieira-Silva et al 2024 to discuss the potential anti-predator effect of iridescence, which could potentially promote convergent evolution of iridescent patterns.

      In the main text, we now clearly mention our null hypothesis: under a scenario of neutral evolution of iridescence, we would expect that the divergence in wing coloration between two M. helenor subspecies would be lower than between two different Morpho species (M. helenor and M. achilles) and showed that our results sharply differ from this null expectation.

      We then improved the discussion by adding alternative hypotheses potentially explaining the convergent iridescent signal detected in sympatric species: we discussed the expected effect under neutral evolution (lines 339-343), but also added alternative hypotheses regarding the diversification of iridescence due to camouflage (lines 363-369), predator evasion (lines 373-377) and thermoregulation (lines 346-353).

      Reviewer #3 (Public review):

      The authors investigated differences in iridescence wing colouration of allopatric (geographically separated) and sympatric (coexisting) Morpho butterfly (sub)species. Their aim was to assess if iridescence wing colouration of Morpho (sub)species converged or diverged depending on coexistence and if iridescence wing colouration was involved in mating behaviour and reproductive isolation. The authors hypothesize that iridescence wing colouration of different (sub)species should converge in sympatry and diverge in allopatry. In sympatry, iridescence wing colouration can act as an effective antipredator defence with shared benefits if multiple (sub)species share the same colouration. However, shared wing colouration can have potential costs in terms of reproductive interference since wing colouration is often involved in mate recognition. If the benefits of a shared antipredator defence outweigh the costs of reproductive interference, iridescence wing colouration will show convergence and alternative mate recognition strategies might evolve, such as chemical mate recognition. In allopatry, iridescence wing colouration is expected to diverge due to adaptation to different local conditions and no alternative mate recognition is expected.

      Strengths:

      (1) Using allopatric and sympatric (sub)species that are closely related is a powerful way to test evolutionary hypotheses

      (2) By clearly defining iridescence and measuring colour spectra from a variety of angles, applying different methods, a very comprehensive dataset of iridescence wing colouration is achieved.

      (3) By experimentally manipulating wing coloration patterns, the authors show visual mate recognition for M. h. bristowi and could, in theory, separate different visual aspects of colouration (patterns VS iridescence strength).

      (4) Measurements of chemical profiles to investigate alternative mate recognition strategies in case of convergence of visual signals.

      Weaknesses:

      In my opinion, studies should be judged on the methods and data included, and not on additional measurements that could have been taken or additional treatments/species that should be included, since in most ecological and evolutionary studies, more measurements or treatments/species can always be included. However, studies do need to ensure appropriate replication and appropriate measurements to test their hypothesis AND support their conclusions. The current study failed to ensure appropriate replication, and in various cases, the results do not support the conclusions.

      First, when using allopatric and sympatric (sub)species pairs to test evolutionary hypotheses, replication is important. Ideally, multiple allopatric and sympatric (sub)species pairs are compared to avoid outlier (sub)species or pairs that lead to biased conclusions. Unfortunately, the current study compares 1 allopatric and 1 sympatric (sub)species pair, hence having poor (no) replication on the level of allopatric and sympatric (sub)species pairs,

      We would like to thank the reviewer for their constructive feedback. We agree that replication is important to test evolutionary hypotheses and that our study lacks replication for allopatric and sympatric Morpho populations. Ideally, one would require several allopatric and sympatric replicates to conclude on the effect of species interaction in trait evolution. Our study is a preliminary attempt at answering this question, covering a few Morpho populations but proposing a broad assessment of iridescence and mate preference for those populations. We clearly mentioned in the discussion that investigating multiple populations is needed to test whether the trend we observed in this paper can be generalized (line 388-392).

      Second, chemical profiles were only measured for sympatric species and not for allopatric (sub)species, which limits the interpretation of this data. The allopatric (sub)species could have been measured as non-coexistence "control". If coexistence and convergence in wing colouration drives the evolution of alternative mate recognition signals, such alternative signals should not evolve/diverge for allopatric (sub)species where wing colouration is still a reliable mate recognition cue. More importantly, no details are provided on the quantification of butterfly chemical profiles, which is essential to understand such data. It is unclear how the chemical profiles were quantified and what data (concentrations, ratios, proportions) were used to perform NMDS and generate Figure 5 and the associated statistical tests.

      We recognize that having the chemical profiles of the genitalia of the Morpho from the allopatric populations would have made a stronger case in favor of reinforcement acting on the divergence of the chemical compounds found on the genitalia of the sympatric Morpho species. Due to limited access to the biological material needed at the time of the chromatography, we could not test for lower divergence in the chemical profiles of allopatric Morpho butterflies. We made sure to mention this limitation in the discussion (lines 457-461). 

      We already stated in the methods that we compiled the area under the peak of each component found in the chromatograms of our samples and that we performed all the statistical analyses on this dataset. To make it clearer, we mention in the new version of the manuscript that the area under the peak of each component allows us to measure the concentration of that component (in the methods, lines 720, 723, 733). We also added details to the legend of Figure 5.
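
      As an illustration of the kind of dataset described above, the following minimal sketch converts per-compound peak areas into relative proportions and computes a Bray-Curtis dissimilarity between two samples, a distance commonly fed into NMDS. This is an assumption-laden example, not the authors' pipeline: the response does not state whether raw areas or proportions were used, nor which distance was chosen, and the compound names and peak areas below are hypothetical.

```python
def relative_proportions(peak_areas):
    """Scale a {compound: peak area} mapping so the values sum to 1."""
    total = sum(peak_areas.values())
    return {compound: area / total for compound, area in peak_areas.items()}


def bray_curtis(x, y):
    """Bray-Curtis dissimilarity between two {compound: abundance} profiles.

    Compounds absent from a profile are treated as having abundance 0.
    """
    compounds = set(x) | set(y)
    numerator = sum(abs(x.get(c, 0.0) - y.get(c, 0.0)) for c in compounds)
    denominator = sum(x.get(c, 0.0) + y.get(c, 0.0) for c in compounds)
    return numerator / denominator


# Hypothetical peak areas (arbitrary units) for two male samples.
sample_a = {"beta-ocimene": 120.0, "compound_X": 30.0}
sample_b = {"beta-ocimene": 10.0, "compound_X": 40.0, "compound_Y": 50.0}

distance = bray_curtis(relative_proportions(sample_a),
                       relative_proportions(sample_b))
print(f"Bray-Curtis dissimilarity: {distance:.3f}")
```

      A matrix of such pairwise dissimilarities across all samples is the usual input to an NMDS ordination.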

      Third, throughout the discussion, the authors mention that their results support natural selection by predators on iridescent wing colouration, without measuring natural selection by predators or any other measure related to predation. It is unclear by what predators any of the butterfly species are predated on at this point

      We made sure to mention in the introduction (line 132-136) and in the discussion (line 373-377) that previous predation experiments performed on Morpho and other butterflies showed evidence that birds are likely predators of these species. These observations led us to test for the putative effect of predation on the evolution of their color pattern, without directly testing predation rates. We made sure this information is transparent in the revised manuscript, and now specify that assessing wing convergence is only an indirect way of testing the escape mimicry hypothesis (line 393-396).

      To continue on the interpretation of the data related to selection on specific traits by specific selection agents: This study did not measure any form of selection or any selection agent. Hence, it is not known if iridescent wing colouration is actually under selection by predators and/or mates, if maybe other selection agents are involved or if these traits converge due to genetic correlations with other traits under selection. For example, iridescent colouration in ground beetles functions as an antipredator defence but also in thermo- and water regulation. None of these issues are recognized or discussed.

      The lack of discussion of alternative selective pressures involved in the evolution of iridescence was pointed out by all reviewers. We thus modified the text to account for this comment, and no longer limit our discussion to the putative effects of predation. We now specifically discuss alternative hypotheses, including crypsis (lines 362-369) and thermoregulation (lines 346-353).

      Finally, some of the results are weakly supported by statistics or questionable methodology.

      Most notably, the perception of the iridescence coloration of allopatric subspecies by bird visual systems. Although for females, means and errors (not indicated what exactly, SD, SE or CI) are clearly above the 1 JND line, for males, means are only slightly above this line and errors or CIs clearly overlap with the 1 JND line. Since there is no additional statistical support, higher means but overlap of SD, SE or CI with the baseline provides weak statistical support for differences.

      We thank the reviewer for raising interpretation issues concerning the chromatic distances between allopatric Morpho subspecies measured with a bird vision model. We made sure to be nuanced in the description of this graph in the results section (line 208-212). Note that this addition does not change our main conclusion that Morpho and predator visual models better discriminate iridescence differences between allopatric subspecies than between sympatric species.

      We now also clearly mention in the figure’s legend that the error bars represent the confidence intervals obtained after performing a bootstrap analysis, in addition to the mention of the nature of the error bars already mentioned in the methods (line 580).
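
      To make the reviewer's point about error bars and the 1 JND line concrete, here is a minimal sketch of a generic percentile bootstrap on the mean of a set of chromatic distances, followed by a check of whether the whole interval lies above the threshold. This is an assumption: the response only states that a bootstrap analysis was performed, so the resampling scheme, the number of replicates and the distance values below are hypothetical and do not reproduce the authors' analysis.

```python
import random


def bootstrap_ci(values, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean of `values`."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(values)
    # Resample with replacement and record the mean of each resample.
    means = sorted(sum(rng.choices(values, k=n)) / n for _ in range(n_boot))
    lower = means[int((alpha / 2) * n_boot)]
    upper = means[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


def exceeds_threshold(ci, threshold=1.0):
    """True only if the entire interval lies above the JND threshold."""
    return ci[0] > threshold


# Hypothetical chromatic distances (in JND units) between two groups.
distances = [1.4, 0.9, 1.8, 1.1, 1.3, 0.8, 1.6, 1.2]
ci = bootstrap_ci(distances)
print(f"95% CI: {ci[0]:.2f} to {ci[1]:.2f}; above 1 JND: {exceeds_threshold(ci)}")
```

      Under this reading, a mean above 1 JND whose interval still overlaps the line would not count as clear evidence of discriminability, which is the nuance the reviewer asked for.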

      Regarding the assortative mating experiment, the results are clearly driven by M. bristowi. For M. theodorus, females mate equally often with conspecifics (6 times) as with M. bristowi (5 times). For males, the ratio is slightly better (6 vs 3), but with such low numbers, I doubt this is statistically testable. Overall low mating for M. bristowi could indicate suboptimal experimental conditions, and hence results should be interpreted with care.

      We recognize that the tetrad experiment results are mainly driven by M. bristowi’s behavior as already mentioned in the results (line 231-232) but we now also mention it in the discussion (lines 401-402). This experiment would have benefited from more replicates, but the limited access to live males and virgin females for both subspecies was a limiting factor. Fisher’s exact test used to assess assortative mating is specifically appropriate to small sample sizes. We recognize that the sampling size is not ideal, however it is still statistically testable.
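
      For reference, a two-sided Fisher's exact test on a 2x2 mating table can be sketched as follows. The test is exact because it enumerates every table compatible with the observed margins, which is why it remains valid at small sample sizes. The counts used here are hypothetical and are not the tetrad data, since the full table is not given in this exchange; the function name is also ours.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    total_tables = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / total_tables

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one
    # (small relative tolerance guards against floating-point ties).
    return sum(
        p for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-9)
    )


# Hypothetical counts: rows are male subspecies, columns female subspecies.
p_value = fisher_exact_two_sided(8, 2, 3, 7)
print(f"two-sided p = {p_value:.4f}")
```

      With small counts the test has limited power, so a non-significant result would not by itself demonstrate random mating.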

      Regarding the wing manipulation experiment, M. theodorus does not show a preference when dummies with non-modified wings are presented and prefers non-modified dummies over modified dummies. This is acknowledged by the authors but not further discussed. Certainly, some control treatment for wing modification could have been added.

      The use of controls to consider the effect of wing modification and odor by the permanent marker was already mentioned in the methods (lines 636-639). Following your recommendation and comments from the other reviewers, we now mention the use of this control in the results (lines 278-283). We also address a potential issue that would have resulted in the rejection of these modified dummies by live males: we cannot be sure whether butterflies perceive these modifications as equivalent to natural coloration (lines 281-282). An additional control could have been used, adding black ink on the black dorsal parts of the pattern to assess its potential visual effect. The constraints on sampling unfortunately did not allow us to add another treatment.

      Overall, the fact that certain measurements only provide evidence for 1 of the 2 (sub)species (assortative mating, wing manipulation) or one sex of one of the species (bird visual systems) means overall interpretation and overgeneralization of the results to both allopatric or sympatric species should be done with care, and such nuances should ideally be discussed.

      The aim of the authors, "to investigate the antagonistic effects of selective pressures generated by mate recognition and shared predation" has not been achieved, and the conclusions regarding this aim are not supported by the results. Nevertheless, the iridescence colour measurements are solid, and some of the behavioural experiments and chemical profile measurements seem to yield interesting results. The study would benefit from less overinterpretation of the results in the framework of predation and more careful consideration of methodological difficulties, statistical insecurities, and nuances in the results.

      Overall, we would like to thank all reviewers for their thorough assessment of our work. We understand that the imbalance between mate choice data, visual model data and chemical data only gives us a partial assessment of species recognition in Morpho butterflies, thus requiring more precision in the interpretation and the discussion of our results. We made sure to add balanced interpretations in our discussion, by mentioning the lack of replicates for allopatric and sympatric populations (lines 391-392), and the lack of chemical characterization of allopatric species (lines 458361, see previous comments) and by being more transparent on methodological limitations that we failed to convey in the first version of our manuscript. We brought nuance to our discussion and also discussed alternative hypotheses to predation to explain the convergence of iridescence found in sympatry.

      Reviewing Editor Comments:

      While all reviewers acknowledge the value of your data, they converge in their recommendations to tone down the evolutionary interpretations. Ideally, to test your main hypothesis, you would need several species pairs, or if only one, as in your case, replicated sympatric and allopatric sites for both species. Furthermore, your more specific hypotheses about convergence (vs. nondivergence), response to predators (vs. other environmental variables), and avoiding interspecific mating in sympatry (vs. not avoiding it in allopatry) would require appropriate alternative treatments/controls. We therefore recommend that you focus on those statements that you can support with your experiments and data, and introduce these statements in the introduction with reference to the appropriate literature.

      Reviewer #1 (Recommendations for the authors):

      (1) Line 25: This stated aim seems a bit off. The authors did not sensu stricto quantify 'how shared adaptive traits may shape genetic divergence' in this study. I suggest rewriting or deleting this whole sentence altogether. The study's aim is already clear in lines 29-34.

      We deleted the mention of the characterization of genetic divergence, since this study did not focus on any genetic analysis.

      (2) Line 34: The authors here state that they compared allopatric vs sympatric populations. This is strictly not true for M. achilles. Further, the results after this sentence focus solely on divergence/convergence in sympatry, nothing at the intraspecific level and implications of the findings.

      We now mention that we tested allopatric vs. sympatric species of M. helenor only (lines 28-29). We also mention that the behavioral experiments were based on intraspecific comparisons, and discuss the implications of this result in the discussion.

      (3) Line 35: 'convergence driven by predation': this is a strong statement and cannot be directly inferred from the present set of experiments. Consider toning it down.

      We added nuance to this statement by rephrasing it as “suggesting that predation may favor local resemblance” (lines 32-33).

      (4) Line 36: Replace 'behavioral results' with 'behavioral experiments' or something similar.

      Corrected

      (5) Line 45-49: These opening statements need some citations.

      We provided references for the first few lines, by citing terHorst et al 2018 (line 44) underlining the importance of species interactions in trait evolution, and Blomberg et al 2003 (line 45) showing that closely-related species tend to resemble each other by quantifying the phylogenetic signal of various traits.

      (6) Line 83, 165: 'visual effect', not sure what the authors are referring to. Please rewrite.

      We defined “visual effect” as the way wing color patterns could be perceived by predators or mates. We removed mentions of “visual effect” and directly used its definition instead.

      (7) Line 105 onwards: This section of the introduction could benefit from more concise writing. The authors might consider reducing the number of specific examples and instead offering broader general statements, supported by citations from multiple studies.

      We reduced the number of examples given in this paragraph and used general statements supported by multiple citations as examples. (lines 102-119).

      (8) Line 108-110: This sentence seems to be redundant with the previous one.

      We merged this sentence with the previous one to improve clarity. (lines 103-105)

      (9) Line 140: 'with chemical defenses': include citations here.

      We added citations of Joron et al 1999 and Merrill et al 2014, which document the evolution of convergent wing patterns (mimicry) in butterfly species with chemical defenses.

      (10) Line 149: This is a bit of a stretch. Note that genetic divergence could be influenced by many other things, not only the processes that the authors examined.

      We agree with the reviewer that the study of the convergent vs. divergent evolution of visual cues is not enough to fully understand the mechanisms allowing genetic divergence between species. Because this paper does not focus on characterizing genetic divergence, we removed it from the manuscript to avoid oversimplification.

      (11) Line 151: Again. Here, the authors' primary focus seems to be at an interspecific level. One is left to wonder about the need for comparisons at the intraspecific level in M. helenor and the implications. Please clarify.

      At the end of the introduction (lines 146-157), we specifically highlighted the importance of intraspecific comparisons. While studying the effect of sympatry on the evolution of the iridescent color pattern, we use this intraspecific comparison as a baseline to account for convergence or divergence of iridescence in a sympatric interspecific pair of Morpho, because under neutral evolution two subspecies are expected to be more similar than two different species (this assumption has been clarified in lines 147-148). We also used intraspecific mate choice to test for the use of visual cues in mate recognition (experiment 1) and to test what type of signal could be perceived by Morphos (the iridescent coloration or the iridescent pattern, experiments 2 and 3). These results help contextualize the interspecific mate choice, focused on determining whether visual cues could also be used in species recognition. Since we show that iridescent coloration is important in mate recognition at the intraspecific scale, it helps explain why species recognition is low at the interspecific scale, given the wing color convergence between M. helenor and M. achilles.

      (12) Line 154: 'signals on mate preferences'.

      Corrected.

      (13) Line 189: 'At the intraspecific level', maybe in the brackets include 'allopatric populations' just so the results are in a similar format as in the color contrast section below.

      We added details to make clearer that the intraspecific level is studied between allopatric Morpho populations (line 189).

      (14) Line 189-192: Please rearrange the figure (current B as A and vice versa) or present the results in order as in the figure (interspecific first and then intraspecific level).

      We rearranged Figure 3 so that the intraspecific comparison (allopatric population) appears as A and the interspecific level (sympatric population) appears as B, to follow the order of presentation in the main text.

      (15) Line 232: The motivation behind experiments 1, 2, and 3 is unclear. The authors have not made a strong point in the introduction about the need for these comparisons at an intraspecific level. Given that the authors are focused on divergence/convergence at an interspecific level, this set of experiments seems to be irrelevant to the present study. The implications of these findings are also not discussed.

      We added motivation for experiments 1, 2, and 3 in the introduction (lines 151-154) by stating that those experiments were used to assess whether blue color could indeed be used as a mating cue in Morpho helenor (experiment 1) and to understand what part of the visual signal is important in mate choice in Morpho helenor: the wing pattern (experiment 2) or the iridescent coloration (experiment 3). Although the motivation for these experiments was not detailed in our original manuscript, we had already discussed the implications of the results of experiments 1, 2 and 3 in the discussion by stating that visual cues can take many forms and that considering both color AND pattern is important in understanding visual cues (lines 408-416). We carefully reworked this new version to make it more straightforward.

      (16) Line 260: Insert 'wild-type' before model to ensure similar wording as in the previous section.

      Corrected.

      (17) Line 286: Insert 'sympatric' after mimetic.

      Corrected.

      (18) Line 307: Include a reference to the figures or table where these results are presented.

      We now mention in the main text that the different proportions of beta-ocimene found between male M. helenor and M. achilles are shown in Table S2.

      (19) Line 343: These inferences are speculative. Add a line here, something like 'although this warrants further research in this species'.

      We detailed what additional experiments are needed in lines 388-396.

      (20) Line 357: The authors have not discussed their results on iridescence divergence in allopatric populations (line 190) and its implications.

      We now made clear in the beginning of the discussion that the divergence of iridescence in allopatric populations is used as a baseline to test for convergent iridescence between species (lines 339-343).

      (21) Line 361 onwards: This first paragraph is a bit confusing, as the results mainly focus on allopatry, while the title refers to sympatry.

      To avoid confusion between the title and the content of the discussion, we divided the last part of the discussion into two parts. As the first paragraph mainly focuses on allopatry, we isolated it and titled it “Iridescent color patterns can be used as mate recognition cues in M. helenor” (line 498). The next paragraph of the discussion, focusing on the sympatric Morpho populations, has been titled “Evolution of visual and olfactory cues in mimetic sister-species living in sympatry” (line 418).

      (22) Line 383: visual cues 'as' poor species.

      Corrected.

      (23) Line 405: Why females here and not males? This is again confusing since the authors tested for male mate choice in the main experiments. Some background information on sex-specific mate choice in the methods might help.

      In this specific sentence, we refer to performing mate choice experiments to test for the discrimination of olfactory cues by females (and not males) because we found a high divergence in the chemical compounds found on male genitalia. Although female chemical compounds could also be used as a cue by males in mate recognition, olfactory mate choice is often driven by female choice in butterflies. We recognize that this perspective does not line up with the mate choice presented in our results section, which focused on male mate choice based on visual cues for ecological reasons (Morpho males, but not females, tend to be attracted to bright blue colorations) and technical reasons (in cages, females tend to hide away from the males or male dummies, and this behavior is not compatible with experiments involving flying around false males). In the discussion, we made sure to specify that the perspective we cite here is about testing the implications of divergence in male olfactory cues (line 454). We also explained why we chose to investigate male (and not female) mate choice based on visual cues in the methods (lines 613-618) and in the results (lines 219-223).

      (24) Line 417: This inference is speculative. Consider toning it down.

      We rewrote the sentence: “We find evidence of converging iridescent patterns in sympatry suggesting that predation could play a major role in the evolution of iridescence. Further work is nevertheless needed to directly test this hypothesis and establish the importance of evasive mimicry in Morpho” (lines 465-468).

      (25) Line 429: 'Convergent trait evolution leads to mutualistic interactions enhancing coexistence'. Careful here. It is not very evident how convergent trait evolution (iridescence) is mutualistic in this case, as there is no experimental evidence for evasive mimicry yet. Consider rewording or toning this sentence down.

      We agree with the reviewer and removed this statement, only keeping the end of the sentence: “Altogether, this study addresses how convergence in one trait as a result of biotic interactions may alter selection on traits in other sensory modalities, resulting in a complex mosaic of biodiversity.” (lines 479-481).

      (26) Line 442: Since the samples come from a breeding farm, I have a few questions. How are the authors sure about the location where the specimens were collected? How long have they been kept in captivity? Have they been subjected to any artificial selection? More details are needed here.

      Since M. helenor bristowi and M. helenor theodorus are only found in the wild in West and East Ecuador respectively, those M. helenor subspecies can only be collected in those two allopatric populations. Their phenotype is directly linked to their geographic distribution, which is how we confirmed their collection localities. The M. h. theodorus individuals used in this study were caught in East Ecuador in Tena, and the M. h. bristowi individuals were caught in West Ecuador in Pedro Vicente Maldonado. We received pupae from the breeding farm, meaning that the Morpho used for the experiments were raised in captivity from their date of emergence. Upon emergence, they were transferred into cages for 4 to 5 days to reach sexual maturity before we performed the tetrad and mate choice experiments. This information was added to the methods (lines 490-496).

      (27) Line 476: Include some citations supporting this statement.

      We now cite Bennett and Théry (2007), reviewing avian color vision, and Briscoe (2008), characterizing the sensitivity of the photoreceptors found in the eyes of butterflies. Both citations show that the 300-700nm range is seen by avian and butterfly visual systems.

      (28) Line 480 onwards: Please clarify if the analysis used only one value (mean?) per species, sex, angle of measurement, and locality or included data from multiple individuals.

      The analyses of both colorimetric variables and global iridescence were performed using iridescence data from multiple individuals (10 males and 10 females from M. h. bristowi, M. h. theodorus, M. h. helenor and M. a. achilles), for which we measured iridescence at 21 angles of illumination. Sample sizes are mentioned in lines 507, 515, and 540-542.

      (29) Line 510: Is there a specific reason that authors did not investigate achromatic contrasts? Provide some justification here. Or include the results of achromatic contrasts in the supplement.

      We added the achromatic results in the supplement and in the results (lines 200-204). For both the avian visual model and the Morpho visual model, the confidence intervals always overlapped with the JND threshold, showing that neither birds nor butterflies could theoretically discriminate the wing reflectance brightness in allopatric and sympatric populations.
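
      As a hedged illustration of the decision rule described above, the following Python sketch computes a percentile bootstrap confidence interval for a mean contrast and checks whether it lies entirely above a discrimination threshold. The function names, the contrast values, and the 1 JND threshold are assumptions made for illustration; they are not taken from the manuscript, and the authors' visual-model pipeline may differ.

```python
import random
from statistics import mean


def bootstrap_mean_ci(values, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the mean of `values` (hypothetical helper)."""
    rng = random.Random(seed)
    n = len(values)
    # Resample with replacement and record the mean of each resample.
    means = sorted(mean(rng.choices(values, k=n)) for _ in range(n_boot))
    lower = means[int((alpha / 2) * n_boot)]
    upper = means[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


def theoretically_discriminable(values, threshold=1.0, **kwargs):
    """True only if the whole CI lies above the threshold (assumed to be 1 JND)."""
    lower, _ = bootstrap_mean_ci(values, **kwargs)
    return lower > threshold


# Hypothetical achromatic contrasts (in JND units) between two populations.
contrasts_near_threshold = [0.8, 1.2, 0.9, 1.1, 1.0, 0.7, 1.3]
print(theoretically_discriminable(contrasts_near_threshold))
```

      With contrasts scattered around the threshold, the interval overlaps it and the comparison is treated as not theoretically discriminable, which is the logic applied to the achromatic results.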

      (30) Line 552 onwards: I may have missed it. It is not entirely clear why the authors focused on male mate choice rather than female preference for visual cues. The authors should explicitly justify this choice and cite previous studies demonstrating that male mate choice, rather than female preference, is important in this species. This should be stated in the results section as well.

      We added a paragraph in the method (lines 613-618) to describe the ecological and technical reasons leading to testing only male mate choice using visual cues (also see our response to recommendation #23).

      (31) Line 537 onwards: What was the criterion used to score that mating had occurred? Why first mating and not how long they were mating? Please add these details.

      We stopped the experiment as soon as a male/female pair was formed by joining their genitalia (we added this information in the methods, lines 599-600). Since the tetrad experiment involves the interaction of two males and two females from different subspecies, we considered that mate choice happens before the formation of any couple and does not necessarily depend on how long the pair mates. For instance, we witnessed avoidance behaviors from females that systematically hid their genitalia and refused to join their abdomen to some males, while being very ‘open’ to others (although we did not quantify this).

      (32) Line 571: The authors used a black permanent marker to modify wing patterns but did not validate whether butterflies perceive these modifications as equivalent to natural coloration. It is possible that the alterations introduced unintended visual cues and may explain why most males rejected the dummies (line 267). The authors should acknowledge this limitation here.

      We now acknowledge this limitation in the method (lines 638-639) and in the results section (lines 278-283).

      (33) Line 591: Insert 'above' after protocol.

      Corrected.

      (34) Line 605: If the authors included random effects in their model, then it should be generalized linear mixed model (GLMM) and not GLM as they wrote.

      We indeed included a random effect in our model accounting for male ID and trial number. We thus replaced “GLM” with “GLMM” in the manuscript.

      (35) Line 615: This set of analyses does not seem to account for pseudo-replication, as the data were recorded from the same male more than once (Line 583). Please clarify and redo the analysis with the GLMM framework.

      We ran new analyses using the GLMM framework: we used a binomial GLMM to test whether individuals preferentially interacted with dummy 1 vs. dummy 2 while accounting for pseudoreplication. The previously detected tendencies hold true with these new analyses, except for the visual mate discrimination of M. achilles: we now find statistical evidence that M. achilles tends to approach its conspecifics more often during the mate choice experiment, although the signal is weak (lines 297-307). Indeed, while we previously concluded that both species in sympatry (M. helenor and M. achilles) could not discriminate their conspecific mates, we now emphasize that M. achilles is somewhat sensitive to some visual signals. However, its estimated probability of approaching a conspecific is only 0.54, which is low compared to the estimated probability of approaching (0.61) or touching (0.84) a con-subspecific for M. h. bristowi. We thus concluded that even though some visual cues could be relevant for mate recognition, they are less reliable for male choice in sympatric populations, where color patterns are more convergent, than in allopatric populations. We updated Figure 4 and Figures S8 and S9 accordingly; they now show the probability of approaching or touching a conspecific or con-subspecific, with the updated p-values retrieved from the GLMM analyses. We also updated the results (lines 297-307) and the discussion (lines 430-438) to bring nuance to our previous results.
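
      The GLMM itself is not reproduced here. As a simpler, different technique that addresses the same pseudoreplication concern, the Python sketch below uses a cluster bootstrap: whole males, rather than individual trials, are resampled, so repeated trials from the same male are never treated as independent. The male IDs, the 0/1 outcomes, and the function name are hypothetical and only illustrate the idea.

```python
import random


def cluster_bootstrap_ci(trials_by_male, n_boot=2000, alpha=0.05, seed=0):
    """Proportion of trials directed at the conspecific dummy, with a
    percentile CI obtained by resampling males (clusters) with replacement."""
    rng = random.Random(seed)
    males = list(trials_by_male)

    def pooled(ids):
        hits = sum(sum(trials_by_male[m]) for m in ids)
        total = sum(len(trials_by_male[m]) for m in ids)
        return hits / total

    estimate = pooled(males)
    # Each replicate draws as many males as were tested, keeping all their trials.
    stats = sorted(
        pooled([rng.choice(males) for _ in males]) for _ in range(n_boot)
    )
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return estimate, lower, upper


# Hypothetical data: 1 = approached the conspecific dummy, 0 = the other dummy.
trials = {
    "male_A": [1, 1, 0, 1],
    "male_B": [0, 1, 0, 0],
    "male_C": [1, 0, 1, 1],
    "male_D": [1, 1, 1, 0],
}
print(cluster_bootstrap_ci(trials))
```

      If the resulting interval includes 0.5, these made-up data would give no evidence of a preference; a mixed model additionally allows covariates and several random effects, which this sketch does not attempt.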

      (36) Line 963: Figure 3D. Is there a particular reason for comparing allopatric populations only within Ecuador rather than between Ecuador and French Guiana for M. helenor? Please clarify.

      We aimed to compare the putative discrimination of blue coloration using visual models with what the butterflies actually discriminate in mate choice experiments. Since we only performed mate choice experiments involving M. h. bristowi x M. h. theodorus (allopatric populations within Ecuador) and M. h. helenor x M. a. achilles (sympatric population from Ecuador), we only looked at those comparisons using visual models. We added this clarification (lines 559-560).

      (37) Line 980: Are these predicted probabilities or just mean proportions as written in line 614? Then the label should be changed to 'Proportion of approaches' or something similar.

      Following our answer to recommendation #35, the points now represent the probability of touching a conspecific in the graph for each male, for every trial of every male tested. We corrected the legend of the figure. 

      Reviewer #2 (Recommendations for the authors):

      (1) Line 25: "...therefore facilitating co-existence in sympathy".

      Corrected.

      (2) Line 28: "contrasting" instead of contrasted.

      Corrected.

      (3) Line 33: begin a new sentence at the colon.

      Corrected.

      (4) Line 49: the phrase "habitat filtering" is unclear and should perhaps be defined or qualified.

      We replaced “habitat filtering” with its definition and cited Keddy (1992), who describes community assembly rules and defines habitat filtering (line 46).

      (5) Line 52: remove "even".

      Corrected.

      (6) Line 53: divergent suites may also result because traits are often constrained by genetic architecture (multivariate genetic covariances). This is discussed at length and specifically in relation to ornamental coloration by Kemp et al. 2023.

      We rewrote the introduction and focused on only reviewing the ecological interactions promoting trait divergence in sympatric species, and did not mention genetics in this paper.

      (7) Line 87: (and throughout) refer to "colouration" or "colour pattern" rather than "colourations".

      Corrected.

      (8) Line 151: Remove "To do so,".

      Corrected.

      (9) Line 191: I would like to see the degrees of freedom for this test.

      We added the F-statistic (F = 2.09) and the degrees of freedom (df = 1) for this test, and did the same for all the following tests.

      (10) Line 201: (and throughout) replace "on" with "of".

      Corrected.

      (11) Line 205: modelling the visual properties of the wings allows one to infer what is theoretically visible/distinguishable. The modelling is useful but not necessarily definitive of vision/behaviour per se under different conditions in the wild. I therefore think it is appropriate to phrase the wording around the modelling approach more carefully. Perhaps refer to "theoretical" or "inferred" discriminability, or state (e.g.) that species should/should not be capable of perceiving differences based on the modelling data. You do this well in your wording of lines 207-209. This need not apply in the discussion because you're then dealing with the combination of modelling results and behaviour (mating trials).

      We agree with the reviewer that visual modelling only allows one to infer what is theoretically discriminated by the butterflies, and that the wording of our sentence was confusing. We therefore modified the sentence accordingly: “Morpho butterflies and predators can theoretically visually perceive the difference in the blue coloration between different subspecies of M. helenor… using both bird and Morpho visual models” (lines 206-209).

      (12) Line 222: Either the chi-square test or Fisher's exact test should be sufficient (why report both?)

      The Chi-square test relies on large-sample assumptions (expected counts > 5), whereas Fisher’s exact test does not and is valid even with small or unbalanced sample sizes. Since the M. h. bristowi female/M. h. theodorus male pairing only occurred 3 times, we do not meet the primary assumptions of the Chi-square test, although its result is significant. We therefore used Fisher’s exact test to confirm the results. Finding that both tests are significant shows that the results are robust, although reporting both may appear redundant. To simplify, we removed the results of the Chi-square test and kept only Fisher’s test in the methodology and the results.
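
      The contrast between the two tests can be sketched in a few lines of Python. The 2x2 table below is hypothetical (only the count of 3 for one pairing type comes from the response above), and the helper names are ours. The first function computes the expected counts that the Chi-square approximation relies on; the second computes a two-sided Fisher's exact p-value directly from the hypergeometric distribution, with no large-sample assumption.

```python
from math import comb


def expected_counts(table):
    """Expected cell counts under independence for a 2x2 table."""
    (a, b), (c, d) = table
    n = a + b + c + d
    rows = (a + b, c + d)
    cols = (a + c, b + d)
    return [[rows[i] * cols[j] / n for j in range(2)] for i in range(2)]


def fisher_exact_two_sided(table):
    """Two-sided Fisher's exact p-value: total probability of all tables with
    the observed margins that are no more likely than the observed table."""
    (a, b), (c, d) = table
    r1, r2, c1 = a + b, c + d, a + c
    denom = comb(r1 + r2, c1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell.
        return comb(r1, x) * comb(r2, c1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, c1 - r2), min(r1, c1)
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))


# Hypothetical mating table: rows = female subspecies, columns = male subspecies.
matings = [[9, 3], [2, 6]]
print(min(min(row) for row in expected_counts(matings)))  # smallest expected count
print(fisher_exact_two_sided(matings))
```

      In this made-up table the smallest expected count falls below 5, which is the situation in which the exact test is usually preferred over the Chi-square approximation.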

      (13) Line 224 (and throughout): Degrees of freedom should be provided for statistical tests.

      We reported the test statistic and the degrees of freedom for all statistical tests mentioned in the main text, except for Fisher’s exact test: as an exact test, it does not rely on an asymptotic distribution such as the Chi-squared distribution and therefore has no degrees of freedom to report.

      (14) Lines 266-267: This sentence has interest, but it is rather vague at present. Wouldn't your controls account for the effect of manipulation? This could be explained further.

      During our mate choice experiments, all Morpho female dummies were painted with black markers, either on their dorsal blue band to modify their blue iridescent phenotype, or on their ventral side, thus controlling for the effect of manipulation. However, we cannot rule out that the modification of the dorsal blue iridescence had a “repulsive” effect on males, for several reasons. For example, depending on the visual discrimination of darker colors by Morphos, the painted black band could have a slightly different color compared to the dark “brown” usually surrounding their blue iridescent patterns. We now explain this in the results (lines 278-283) and in the methodology (lines 638-639).

      (15) Line 316: I'm not certain that the similarity is best described as "striking", given a P-value of 0.084 for this contrast.

      We agree with the reviewer and removed this adjective for this line.

      (16) Lines 387-390: This sentence is puzzling because, theoretically speaking, we should expect selection on visual preference to be heightened (not relaxed) in sympatry if colouration is included among the traits used in mate selection. I'm not certain I have understood the meaning here.

      We would like to thank the reviewer for pointing out this typo. If shared predation pressures favor convergent evolution of color pattern, then visual signals become less reliable for species recognition. As a result, sexual selection on visual preference is heightened, favoring the evolution of alternative cues used to discriminate conspecific mates. We changed the sentence and now write “the convergent evolution of iridescent wing patterns… may have negatively impacted visual discrimination and favored the evolution of divergent olfactory cues” (lines 457-458).

      (17) Line 529: Mating experiments. Given that these are quite large butterflies, I wondered whether a 3x3x2m cage would be sufficient in size to allow the expression of male courtship. A brief description of the courtship behaviour in these species or Morphos generally would be a useful addition to the paper.

      A cage of this size was large enough for the males to express a flight behavior similar to what can be seen in nature, while also being able to see the females (live females or dummies). We tried to perform mate choice experiments in a larger cage (7 m x 5 m x 3 m), but the trials were not conclusive because males did not find the dummies, depending on where they were flying in the cage. A 3 m x 3 m x 2 m cage is a good compromise, maximizing interactions while still allowing enough space to fly. We now describe Morpho male and female behavior in the methods (lines 613-618).

      (18) Line 546: Why are both tests needed (chi-square AND Fisher's exact)?

      As in our answer to recommendation #12, we used both tests to show the robustness of the statistical results. We only kept the Fisher’s test results to simplify the results.

    1. Author response:

      The following is the authors’ response to the original reviews

      eLife Assessment

      This valuable study investigates the role of HIF1a signalling in epicardial activation and neonatal heart regeneration in mice. Through a combination of genetic and pharmacological approaches, the authors show that stabilization of HIF1a enhances epicardial activation and extends the regenerative capacity of the heart beyond the typical neonatal window following myocardial infarction (MI). However, several aspects of the study remain incomplete and would benefit from further clarification and additional experimental support to solidify the conclusions.

      We reveal herein prolonged epicardial activation following myocardial infarction (MI) beyond post-natal days 1-7 (P1-P7) by genetic or pharmacological stabilisation of HIF-signalling. This extends the so-called “regenerative window” during an adult-like response to injury, leading to enhanced survived myocardium and functional improvement of the heart, even against a backdrop of persistent, albeit reduced, fibrosis. The epicardium is known to enhance cardiomyocyte proliferation and myocardial growth during heart development via trophic growth factor (for example, IGF-1, FGF, VEGF, TGFβ and BMP) signalling (reviewed in PMID:29592950) and epicardium-derived cell-conditioned medium reduces infarct size and improves heart function (PMID: 21505261). Further experiments, outside of the scope of the current study, are required to determine whether activated neonatal epicardium elicits similar paracrine support to sustain the myocardium and heart function after injury beyond P7 into adulthood.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The manuscript by Gamen et al. analyzed the functional role of HIF signaling in the epicardium, providing evidence that stabilization of the hypoxia signaling pathway might contribute to neonatal heart regeneration. By generating different conditional mouse mutants and performing pharmacological interventions, the authors demonstrate that stabilizing HIF signaling enhances cardiac regeneration after MI in P7 neonatal hearts.

      Strengths:

      The study presents convincing genetic and pharmacological approaches to the role of hypoxia signaling in enhancing the regenerative potential of the epicardium.

      Weaknesses:

      The major weakness is the lack of convincing evidence demonstrating the role of hypoxia signaling in EMT modulation in epicardial cells. Additionally, novel experimental approaches should be performed to allow for the translation of these findings to the clinical arena.

      We respectfully disagree that we have not convincingly demonstrated a role for HIF-signalling in promoting epicardial EMT. We adopt epicardial explant assays utilising a well characterised ex vivo protocol previously described for studying EMT in embryonic, neonatal and adult epicardium (PMID: 27023710, PMID: 12297106; PMID: 17108969, PMID: 19235142). These assays demonstrate in WT1<sup>CreERT2</sup>;Phd2<sup>fl/fl</sup> explants enhanced cobblestone to spindle-like change in cell morphology, increased cell migration, appearance of stress fibres and an up-regulation of the mesenchymal marker alpha-smooth muscle actin (αSMA); all parameters associated with EMT. In addition, our in vivo analyses of Wt1<sup>CreERT2</sup>;Phd2<sup>fl/fl</sup> hearts, in response to neonatal injury, reveal elevated numbers of WT1+ epicardial cells within the sub-epicardial region and underlying myocardium as is associated with active EMT and subsequent migration from the epicardium.

      Reviewer #2 (Public review):

      Summary:

      In this study, Gamen et al. investigated the roles of hypoxia and HIF1a signaling in regulating epicardial function during cardiac development and neonatal heart regeneration. They found that WT1<sup>+</sup> epicardial cells become hypoxic and begin expressing HIF1a from mid-gestation onward. During development, epicardial HIF1a signaling regulates WT1 expression and promotes coronary vasculature formation. In the postnatal heart, genetic and pharmacological upregulation of HIF1a sustained epicardial activation and improved regenerative outcomes.

      Strengths:

      HIF1a signaling was manipulated in an epicardium-specific manner using appropriate genetic tools.

      Weaknesses:

      There appears to be a discrepancy between some of the conclusions and the provided histological data. Additionally, the study does not offer mechanistic insight into the functional recovery observed.

      We respectfully disagree with the comment that our histological data does not support our conclusions and expand on this in the response to specific reviewer comments. We agree that further mechanistic experiments outside of the scope of the current study are required to identify precisely how activated neonatal epicardium results in increased healthy myocardium after injury beyond post-natal day 7 (P7).

      Reviewer #3 (Public review):

      Summary:

      The authors' research here was to understand the role of hypoxia and hypoxia-induced transcription factor Hif-1a in the epicardium. The authors noted that hypoxia was prevalent in the embryonic heart, and this persisted into neonatal stages until postnatal day 7 (P7). Hypoxic regions in the heart were noted in the outer layer of the heart, and expression of Hif-1a coincided with the epicardial gene WT1. It has been documented that at P7, the mouse heart cannot regenerate after myocardial infarction, and the authors speculated that the change in epicardial hypoxic conditions could play a role in regeneration. The authors then used genetic and pharmacological tools to increase the activity of Hif genes in the heart and noted that there was a significant improvement in cardiac function when Hif-1a was active in the epicardium. The authors speculated that the presence of Hif-1a improved cell survival.

      Strengths:

      A focus on hypoxia and its effects on the epicardium in development and after myocardial infarction. This study outlines the potential to extend the regenerative time window in neonatal mammalian hearts.

      We thank the reviewer for this positive endorsement and recognition of the importance of mechanistic insight into how to extend the window of neonatal heart regeneration.

      Weaknesses:

      While the observations of improved cardiac function are clear, the exact mechanism of how increased Hif-1a activity causes these effects is not completely revealed. The authors mention improved myocardium survival, but do not include studies to demonstrate this.

      We report an increase in healthy myocardium arising from prolonged activation of the epicardium during the neonatal window and following injury at post-natal day 7 (P7). We speculate this recapitulates the role of the epicardium during heart development which is known to be a source of trophic growth factors that can enhance myocardial growth. Further experiments are required, out-of-scope of this study, to define a mechanistic link between HIF-signalling, epicardial activation and myocardial survival in the setting of prolonged neonatal heart regeneration.

      There is an indication that fibrosis is decreased in hearts where Hif activity is prolonged, but there are no studies to link hypoxia and fibrosis.

      We believe the decreased fibrosis is a natural consequence of the increase in survived myocardium arising from the activated epicardium. There is strong precedent here following injury at post-natal day 1 (P1) in which fibrosis is evident early-on but is resolved over time with growth of the myocardium in the regenerating heart (PMID: 23248315).

      Recommendations for the authors:

      Reviewing Editor Comments:

      (1) Address issues related to image quality, colocalization, sample labeling, appropriate controls, and quantification - particularly in Figures 1, 2, 6, and Supplementary Figure 9. Increase sample size as noted by reviewers.

      The issues of co-localisation and sample labelling have been addressed under response to reviewers. We are unable to increase sample numbers but have clarified the number of regions per section and numbers of sections per heart analysed where appropriate.

      (2) Clarify the effects of epicardial HIF1a activation on neovascularization.

      We have removed reference in the abstract to an effect on neovascularisation.

      (3) Extend assessments of epicardial hypoxia and HIF1a expression to earlier embryonic stages, when epicardial EMT is more active.

      Our earliest timepoint of E12.5 marks the onset of epicardial EMT, and E13.5 is the stage with the most significant mobilisation of epicardium-derived cells (EPDCs) into the sub-epicardial region and underlying myocardium (PMID: 32359445). In the same study, E11.5 lineage tracing of epicardial cells is restricted to the outer layer of the heart; thus, our timepoints are representative in capturing both the onset and progression of in vivo EMT.

      (4) Strengthen EMT assays and mechanistic modeling. Provide evidence from physiologically relevant models, as current 2D culture assays do not adequately support conclusions about EMT. Include additional EMT markers and quantification where appropriate.

      We respectfully disagree that epicardial explants are not a valid assay for assessing EMT. As noted under responses to reviewers, such primary explants have been widely described elsewhere (PMID: 27023710, PMID: 12297106; PMID: 17108969, PMID: 19235142) and enable documentation of multiple parameters that are associated with active EMT, including the extent of cell migration, cobblestone (epithelial) to spindle-like (mesenchymal) changes in cell morphology, stress fibre formation and expression of alpha-smooth muscle actin as a mesenchymal marker. We support our explant findings with in vivo data. First, we reveal reduced WT1+ epicardium-derived cells (EPDCs) in the sub-epicardial region and underlying myocardium of WT1<sup>CreERT2/+</sup>;Hif1a<sup>fl/fl</sup> embryonic hearts (data in Figure 2), indicative of impaired epicardial EMT and migration of EPDCs. Second, following neonatal MI with pharmacological inhibition of PHD2, we observe the reciprocal phenotype of increased numbers of epicardium-derived cells emerging from the outer epicardial layer (data in Figure 6).

      (5) Strengthen mechanistic insights into the role of epicardial cells in the functional recovery observed in MI hearts.

      We agree that further experiments are required, out-of-scope of this study, to define a mechanistic link between HIF-signalling, epicardial activation and myocardial survival in the setting of prolonged neonatal heart regeneration.

      Reviewer #1 (Recommendations for the authors):

      The manuscript by Gamen et al. analyzed the functional role of HIF signaling in the epicardium, providing evidence that stabilization of the hypoxia signaling pathway might contribute to neonatal heart regeneration. By generating different conditional mouse mutants and performing pharmacological interventions, the authors demonstrate that stabilizing HIF signaling enhances cardiac regeneration after MI in P7 neonatal hearts. The study is potentially interesting, but it presents several major caveats.

      (1) One of the critical points reported in the early stages of this study is the early co-localization of Wt1, the hypoxic report (HP1), and HIF signaling pathways master regulators (i.e., HIF1a and HIF1b) during embryonic development. Figure 1 is meant to report such findings. However, unfortunately, I hardly see any co-localization at all in the Wt1+ epicardial cells for HP1, although some colocalization is seen for HIF1 and 2 alpha, and none of these data are quantified. Thus, it is hard to believe such co-localization.

      We respectfully disagree with this comment. We highlight cells in Figure 1 that are co-stained for WT1 and HP1. In addition, we identify HIF1-α and HIF2-α positive cells which reside either within the epicardium, as the outer cell layer, or within the underlying sub-epicardial region, respectively.

      (2) The authors claimed that they have analyzed the expression of the hypoxic report, as well as Wt1 and the HIF signaling pathways master regulators (i.e., HIF1a and HIF1b) in the AV groove, as compared to the apex, in embryonic heart ranging from E12.5 to E18.5 (Figure 1). Unfortunately, all images provided that are tagged as AV groove are rather misleading. They do not represent the AV groove but part of the right ventricular free wall. If the authors want to refer to the AV groove, AV cushions should be visible underneath.

      We have removed specific reference to the AV groove and refer to the highlighted regions as the “Base” of the heart.

      (3) The authors analyzed the hypoxic condition of the developing heart from E12.5 to E18.5. However, it remains unclear why the authors only explored the hypoxic conditions from E12.5 onwards, since epicardial EMT mainly occurs earlier than this time point, i.e., E10.5 onwards. Therefore, it would be needed to explore it already at this earlier time point.

      We respectfully disagree with the reviewer and refer to the comment above regarding the fact that E12.5 marks the onset of epicardial EMT and E13.5 is the stage with the most significant mobilisation of epicardium-derived cells (EPDCs) into the sub-epicardial region and underlying myocardium (PMID: 32359445).

      (4) The authors reported a conditional mouse model of HIF1alpha deletion by using the Wt1CreERT2 driver. Curiously, Wt1 is dependent on hypoxia signaling (i.e., HIF1a). Therefore, it is unclear whether there is a negative feedback loop between the deletion of Hif1alpha and the activation of the Cre driver might have functional consequences. Convincing evidence should be provided that such crosstalk does not interfere with Hif1alpha inactivation, and therefore, appropriate controls should be run in parallel.

      We discount a negative feedback loop in this instance based on the fact that we have utilised heterozygous mice for the WT1<sup>CreERT2/+</sup> line and observe a consistent and reproducible phenotype for the developing hearts on a Wt1<sup>CreERT2/+</sup>;Hif1a<sup>fl/fl</sup> background and following injury in Wt1<sup>CreERT2/+</sup>;Phd2<sup>fl/fl</sup> mice. Collectively this indicates that the WT1-CreERT2 driver is active in the context of diminishing HIF-1α and Phd2, respectively. In addition, we have carried out parallel experiments using epicardial explants derived from R26R-CreERT2;Phd2<sup>fl/fl</sup> (Figure 3) to circumvent any potential confounding issues; the results of which are consistent with increased epicardial EMT in support of our overall hypothesis.

      (5) On Figure 2a-f the authors reported that epicardial cells are diminished in Wt1CreERT2Hif1alpha mice as compared to controls. I am very sorry, but I do not see any difference. Furthermore, it is unclear to me how the authors quantified such differences, i.e., what marker signal did they use and how it was performed (Figure 2c and d)?

      We respectfully disagree with the reviewer and draw attention to the single channel panels of WT1+ staining in Figure 2, which show clear differences between numbers of epicardial cells in the mutant mice compared to controls (comparing magenta cells in panel a versus panel b). Quantification was carried out for numbers of WT1+ cells residing within the PDPN-positive epicardium (and underlying PDPN-negative myocardium) across multiple images from multiple sections and multiple hearts.

      (6) On Figure 2g, the authors reported differences in total vessel length. Are they referring to impaired microvasculature development? Or is this analysis also including major coronary vessels? What about the major coronary vessels and trees, is there any affection?

      This analysis refers to the microvasculature and not the major coronary arteries or coronary trees.

      (7) The authors reported that there might be some differences in EMT markers, but unfortunately, all of them are analyzed on 2D cultures, where no substrate for EMT is present, i.e., an underlying ECM bed. Thus, the authors cannot claim that EMT is altered. Additional experiments using either collagen substrate and/or Matrigel are required to fully demonstrate that EMT is impaired. Furthermore, quantitative analyses of such differences should be provided.

      The 2D cultures are epicardial explants from mutant versus wild type hearts and represent a widely adopted previously published ex-vivo assay for investigating epicardial EMT across embryonic to adult stages (PMID: 27023710, PMID: 12297106; PMID: 17108969, PMID: 19235142); including an assessment of the extent of migration and cobblestone (epithelial) to spindle-like (mesenchymal) cell morphologies, stress fibre formation and expression of alpha-smooth muscle actin as a mesenchymal marker. We do not understand the comment regarding an “underlying ECM bed” as the cells exhibit EMT routinely on tissue culture plastic and will deposit their own ECM during the culture time course and in response to EMT/cell migration. In terms of quantification this was carried out for scratch assay experiments, as a proxy for EMT and emergent mesenchymal cell migration, as presented in Figure 3i, j with significant enhanced scratch closure and cell migration following Molidustat treatment.

      (8) The description of data provided on Supplementary Figure 5 is spurious and should be removed. A note in the discussion might be sufficient.

      We respectfully disagree. The ChIP-seq data, in what is now Figure 2-figure supplement 3, highlights a HIF-1α binding site within the Wt1 locus suggesting putative upstream regulation of WT1 by HIF-1α. This provides a potential explanation as to how HIF-1α may activate the epicardium through up-regulation of Wt1/WT1.

      (9) On Figure 3, the authors further illustrate the change of EMT markers using ex vivo cardiac explants. They reported increased expression of Snai2 that, although statistically significant, is most likely of no biological relevance (increase of only 20% at transcript level). What about Snai1, Prrx1, and other EMT promoters? Are they also induced? As previously stated, these 2D cultures do not provide supporting evidence that EMT is occurring, thus 3D gel assays should be performed in which Z-axis analyses will provide evidence on the different migratory behaviour of those cells.

      We respectfully suggest that a 20% change in Snai2 expression is biologically meaningful with respect to EMT. This in turn is supported by associated cell migration, reduced ZO-1 expression, increased stress fibres and increased alpha-SMA as a mesenchymal marker; all properties associated with active EMT. Other suggested markers have not been validated as formally required for EMT, for example Snai1 (PMID: 23097346). The migratory capacity of targeted versus epicardial cells was assessed by combined explant and scratch assay experiments.

      (10) The description of single-cell analyses is very incomplete. Which mice were used for these analyses, wildtype control, or hypoxic mice? Please provide a clearer description of the samples used. Additionally, the entire rationale of these analyses is dubious. Doing single-cell analyses to analyze a couple or three markers in a very small cell population is rather ridiculous. qPCR might be far more appropriate and convincing, or a bulk RNAseq analysis of isolated epicardial cells.

      The single-cell analyses represent an unbiased assessment of different pathways in epicardial cells (identified bioinformatically) between intact P1 and P7 stages in wild type (control) hearts, with a focus on hypoxia-related gene expression and HIF-dependent pathways. They were not designed to analyse a small number of genes, but rather global differences in the hypoxic states between P1 and P7 hearts. Selected genes (Vegfa, Pdk3, Egln1 (Phd2)) were analysed to highlight the key differences in hypoxic signalling across the regenerative window. The fact that the hearts were uninjured/intact is clarified in the text and legends for Figure 4 and now Figure 4-figure supplement 1.

      (11) The analyses provided in Figure 5 are very interesting and their findings are very relevant. However, I would think that the complementary experimental approach should also be done, i.e, MI followed by activation with tamoxifen, since that situation would be more realistic in the clinical setting.

      Tamoxifen causes respiratory failure in neonates with MI, so the two cannot be combined at the same time or soon after surgery. Moreover, tamoxifen takes significant time to take effect on targeted gene down-regulation which may negate sufficient activation of the epicardium following injury.

      The experiments in Figure 5 were designed to demonstrate that prolonged heart regeneration could be elicited in a cell-specific (epicardial-specific) manner via a genetic approach. The pharmacological experiments in Figure 6 are complementary in this regard by demonstrating equivalent effects with drug (Molidustat) delivery to reduce PHD2 and stabilise HIF post-MI.

      (12) In Figure 6, expression of Wt1 is highly prominent in P7 controls, mainly restricted to the epicardial lining while in the experimental setting, such Wt1 expression is broadly distributed on the subepicardial space, nicely demonstrating epicardial activation. However, it is very surprising to see such Wt1 expression in controls, something that is not expected, as compared to the data reported in Figure 4g. Could the authors please reconcile these findings?

      Figure 6 represents the injury setting and Figure 4g the intact setting (as clarified above, in the text and revised figure legends). Hence in the latter WT1 expression is significantly reduced in the P7 heart, as anticipated. With injury at P7 we anticipate activation of WT1 in control hearts, albeit restricted to the epicardial layer (as occurs in adult hearts, PMID: 21505261). In contrast, following Molidustat-treatment of P7 hearts post-MI we observe extensive epicardial expansion into the sub-epicardial region and EPDC migration into the underlying myocardium (Figure 6b).

      Reviewer #2 (Recommendations for the authors):

      The role of hypoxia and HIF1a signaling in epicardial activation is an important topic, and the genetic approaches employed in this study are appropriate. However, several aspects of the study remain unclear and would benefit from further clarification or explanation by the authors:

      (1) The authors detected hypoxic regions using an anti-pimonidazole fluorescence-conjugated monoclonal antibody (HP1). The data would become more compelling if negative and positive controls were provided.

      We believe the HP1 staining is compelling in the images shown and is consistent with hypoxic regions of the developing heart. We reveal HP1 staining at cellular resolution with neighbouring cells positive and negative for the HP1 signal in the apex of the heart and within the epicardium and sub-epicardial regions at E12.5 (Figure 1a) and diminished/altered hypoxic/HP1 regional signal through subsequent developmental stages at E14.5-18.5 (Figure 1a-d).

      (2) Many HIF1a-positive cells in the AV groove region do not appear to overlap with HP1 staining (Figure 1a). Providing a low-magnification image of HIF1α expression would be helpful to better assess the extent of overlap with HP1 staining

      HIF-1 is highly unstable and hence detection of HIF-1+ cells will likely only sample of cells compared to HP1 which is a surrogate for broader regions of hypoxia.

      (3) Although the authors conclude that epicardial HIF1a deletion results in a significant reduction of WT1⁺ cells in both the epicardium and myocardium (Figure 2a-d), the provided images are not sufficiently clear to fully support this interpretation. Providing additional evidence to support this conclusion would be helpful.

      We respectfully disagree with the reviewer and draw attention to the single channel panels of WT1+ staining which show clear differences between numbers of epicardial cells in the mutant mice compared to controls (Figure 2a versus 2b; magenta WT1+ staining).

      (4) Similar to the point raised above, the authors' conclusion regarding the increased expression of WT1 following Molidustat treatment does not appear to be fully supported by the provided images (Figure 6b-f). Immunofluorescence staining for WT1 does not clearly demonstrate epicardial expression in the remote zone of either the control or Molidustat-treated hearts. In addition, while an increase of WT1<sup>+</sup> cells is observed in the infarct zone of the Molidustat-treated heart, it is somewhat unexpected that such expansion is not evident in the corresponding region of the control heart, given that epicardial cells typically expand near the infarct area. Clarification on these points would be helpful.

      Figure 6b reveals WT1 expression in controls (upper panel set) that is reactivated proximal to the infarct region, given that WT1 is not otherwise expressed in adult epicardium, but remains restricted to the epicardial layer (as occurs in injured adult mouse hearts, PMID: 21505261). This contrasts with what is observed in the Molidustat-treated P7 hearts post-MI, where we observe epicardial expansion and migration of WT1+ cells into the underlying myocardium (Figure 6b, lower panel set, infarct zone).

      (5) The authors conclude that WT1<sup>+</sup> cells in the myocardial tissue exhibit endothelial identity based on the colocalization of WT1 and EMCN signals (Supplementary Figure 9c). However, this interpretation is difficult to assess, as WT1 is a nuclear marker and EMCN is a membrane protein, which makes precise colocalization challenging to confirm with confidence. Additional supporting evidence may be necessary to substantiate this conclusion.

      WT1 is known to be upregulated in endothelial cells in response to injury as shown previously in several studies (for example, PMID: 25681586). Here we show clear co-localisation of nuclear WT1 and cytoplasmic Endomucin (EMCN) in what is now Figure 6-figure supplement 1c and would encourage the reviewer and readers to magnify the image by zooming in on the relevant co-stained panel.

      (6) The authors conclude that activation of epicardial HIF1a signaling has no effect on neovascularization in postnatal MI hearts (Figure 5c). However, the abstract states: "Finally, a combination of genetic and pharmacological stabilisation of HIF ... increased vascularisation, augmented infarct resolution and preserved function beyond the 7-day regenerative window" (Lines 38-41). Clarification regarding this apparent discrepancy would be appreciated.

      The abstract has been altered to remove the statement of increased vascularisation.

      (7) The study appears somewhat incomplete, as it lacks mechanistic insight into the functional recovery observed following epicardial Phd2 deletion and Molidustat treatment in postnatal MI hearts. Although the authors suggest a potential paracrine role of the epicardium in protecting cardiomyocytes from apoptosis, this hypothesis has not been experimentally addressed. Incorporating such analysis would help to reinforce the study's conclusions.

      Further experiments are required, which are out-of-scope of this study, to define a mechanistic link between the genetic or pharmacological stabilisation of HIF-signalling, epicardial activation and myocardial survival in the setting of prolonged neonatal heart regeneration.

      Other points:

      (1) Providing single-channel images for Figures 1a-d and 6g would be helpful for clarity and interpretation.

      We believe the combined channel views of co-staining for two markers on a background of DAPI staining to pin-point cell nuclei, are informative and support our conclusions.

      (2) Have the authors considered using AngioTool to quantify the number of vessels in Figure 5b-c?

      AngioTool™ was used to quantify the vessels, as we have used previously (PMID: 33462113), and this is now added to the methods and legend of Figure 2.

      Reviewer #3 (Recommendations for the authors):

      There are several areas where the manuscript can be improved, such that its conclusions can be solidified.

      (1) The authors highlight a point where blocking Phd2 can enhance survival of cardiac tissue, but did not report on survival markers. They surmised that apoptosis could be decreased in Phd2 mutant or Molidustat treatment but did not show this. The authors should determine if apoptosis is decreased in the myocardium and epicardium.

      We show evidence of increased levels of healthy myocardium in the genetic and pharmacological models of stabilised HIF-signalling. We exclude increased cardiac hypertrophy or increased cardiomyocyte proliferation as causative, so suggest as a reasonable alternative enhanced survival, albeit this need not necessarily be via an apoptotic pathway given the incidence of necrotic cell death during MI. We are unable to generate new surgeries and mutant/treated heart samples to analyse for apoptotic markers at this stage.

      (2) There appears to be no difference in cardiomyocyte proliferation in Molidustat-treated animals, but the experiment was only performed on 2 to 3 animals. This is too small a sample size to conclude from these results. The authors should increase the sample size to make this assertion.

      We respectfully disagree that we are unable to conclude no effect on cardiomyocyte proliferation. We analysed multiple heart regions per section, for EdU+/cTnT+ colocalised signals across several sections per heart, set against a consistency of effect on other parameters in hearts treated with Molidustat. We are unable to generate more P7 heart surgeries +/- Molidustat and +/- EdU at this stage.

      (3) It is curious as to how, after myocardial infarction, the fibrotic scar tissue is decreased in the Phd2 deletion but not as profound in Molidustat-treated mice at d21. Can the authors speculate why the difference exists and how this decrease arises? For example, are there decreased pro-inflammatory signals in Phd2 deleted mice? Is there decreased collagen deposition and ECM gene expression? Does macrophage recruitment into the infarct zone differ between mutant/treated vs WT?

      The representative images in Figure 6k reveal a trend towards reduced fibrosis with Molidustat treatment (Figure 6l), but across all hearts analysed this was not as significant as observed in the epicardial-specific deletion injured hearts (Figure 5g, h). This may be due to the relatively short half-life of Molidustat (approximately 4-10 hours, PMID: 32248614), the dosing regimen for the drug and/or the fact that it was not specifically delivered/targeted to the epicardium.

      (4) The magnified images in Figure 1 do not match the boxes in the whole heart images. It is unclear what the white boxes signify.

      The white boxes have been removed from Figure 1. The magnified image panels are from serial heart sections and this is now clarified in the Figure 1 legend.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Review

      GENERAL QUESTIONS:

      (1) For many enveloped viruses, the attachment factors - paradoxically - are also surface glycoproteins, often complexed with a distinct fusion protein. The authors note here that the glycoproteins do not inhibit the initial binding, but only limit the stability of the adhesive interface needed for subsequent membrane fusion and viral uptake. How these antagonistic tendencies might play out should be discussed.

      When the surface density of receptor molecules for a virus with glycans increases, the density of free glycans not bound to the virus increases along with the amount of virus adsorbed. However, if the total amount of glycans is considered to be a function of the receptor density, the reaction may become more complicated. This complication may also be affected by the prolonged infection. If the receptor density on the cell surface is high, the infection inhibitory effect of glycans may not be obtained in a system in which a high concentration of virus is supplied from the outside world for a long time. This is because once viruses have entered the cell, they accumulate inside the cell, and viral infection is affected by the total accumulated amount, which is the integration of the number of viruses that have entered over time. This distinction indicates that the virus entry reaction and the total amount of infection in the cell must be considered separately. This is an important point, but it was not clearly mentioned in the original manuscript.

      Our experiments were conducted under conditions that clearly allowed us to detect the virus-inhibiting function of glycans without being affected by the above points. In order to clarify these points, we will revise this article as follows, referring to an experiment that is somewhat related to this discussion (the Adenovirus infection experiment into HEK293T cells shown in Figure S1F).

      (Page 3, Introduction)

      While there are known examples of glycans that function as viral receptors (Thompson et al., 2019), these results demonstrate that a variety of glycoproteins negatively regulate viral infection in a wide range of systems. All of these results suggest that bulky membrane glycoproteins nonspecifically inhibit viral infection.

      (Page 20, Discussion)

      When the virus receptor is a glycoprotein or glycan itself, the inhibition of virus infection by glycans becomes more complex because the total amount of glycans is also a function of the receptor density. It is also important to note that the total amount of infection into a cell is the time integral of virus entry. Even if the probability of virus entry is significantly reduced by glycans, the cumulative number of virus entries may increase if high concentrations of virus continue to be supplied from outside the cell for a long period of time. In the case of Adenovirus, which continues to amplify in HEK293T cells after infection, we showed that MUC1 on the cell surface has an inhibitory effect on long-term cumulative infection (Supplementary Figure 1F). However, such an accumulation effect may be case-by-case depending on the virus-cell system, and may be more pronounced when the cell surface density of virus receptor molecules is high. As a result, if the virus receptor molecule is a glycan or glycoprotein and infection continues for a long period of time, the infection inhibition effect may not be observed despite an apparent increase in the total amount of glycans in the cell. In any case, our results clarified the factor of virus entry inhibition dependent on the total amount of glycans because appropriate conditions were set.
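
      The "time integral of virus entry" described above can be written out as a short worked expression. This is only an illustrative sketch: the symbols N, k, c and T are assumptions introduced here and do not appear in the manuscript.

```latex
% Illustrative notation (assumed, not from the manuscript):
%   N(T) : cumulative number of virus entries into a cell up to time T
%   k(t) : per-virus entry rate, which surface glycans are proposed to reduce
%   c(t) : concentration of virus supplied from outside the cell
\[
  N(T) = \int_{0}^{T} k(t)\, c(t)\, \mathrm{d}t
\]
% Special case with constant k and c:
\[
  N(T) = k\, c\, T
\]
```

      Under the constant-rate simplification, a glycan-dependent reduction in k can be offset by a larger c or a longer exposure time T, which is the sense in which entry inhibition and cumulative infection need to be considered separately.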

      (2) Unlike polymers tethered to a solid surface undergoing mushroom-to-brush transition in a density-dependent manner, the glycoproteins at the cell surface are of course mobile (presumably in a density-dependent manner). They can thus redistribute in spatial patterns, which serve to minimize the free energy. I suggest the authors explicitly address how these considerations influence the in vitro reconstitution assays seeking to assess the glycosylation-dependent protein packing.

      We performed additional experiments using lipid bilayers that had lost fluidity, and found that there is no significant difference in protein binding between fluid and nonfluid bilayers. The redistribution of molecules due to molecular fluidity may play some roles but not in our experimental systems. It suggests that glycoproteins can generate intermolecular repulsion even in fluid conditions such as cell membranes, just as they do on the solid phase. This experiment was also very useful because it allowed us to compare our results in the fluid bilayer with solid-state measurements of saturation molecular density and the brush transition. This comparison gave us confidence that in the reconstituted membrane system, even at saturation density, the membrane proteins are not as stretched as they are in the condensed brush state. We have therefore added a new paragraph and a new figure (Supplementary Fig. 5B) to discuss this issue, as follows:

      The molecular structural state of these proteins needs to be further discussed to estimate the contribution of f<sub>el</sub>, which represents resistance to molecular elongation. Our results suggest that these densely packed nonglycosylated molecules are no longer in a free mushroom state. However, their saturation density was several times lower than previously reported brush transition densities, such as 65000 µm<sup>-2</sup> for 17 kDa polyacrylamide (R<sub>F</sub> ~ 15 nm) on a solid surface (Wu et al., 2002). To compare our data on fluid bilayers with previously reported data on solid surfaces, we performed additional experiments with lipid bilayers that lost fluidity. No significant changes in protein binding between fluid and nonfluid bilayers were observed for either b-MUC1 or g-MUC1 molecules (Supplementary Figure 5B). This result suggests that membrane fluidity does not affect the average intermolecular distance or other relevant parameters that control molecular binding in the reconstituted system. Based on these results, we speculate that the saturated protein density observed in our experiments is lower than or at most comparable to the actual brush transition density. Thus, although these crowded proteins may be restricted from free random motion, they are not significantly extended as they would be in the condensed brush state, and the contribution of resistance to molecular extension, f<sub>el</sub>, is therefore expected to be small relative to the overall free energy of the system.

      (3) The discussion of the role of excluded volume in steric repulsion between glycoprotein needs clarification. As presented, it's unclear what the role of "excluded volume" effects is in driving steric repulsion? Do the authors imply depletion forces? Or the volume unavailable due to stochastic configurations of gaussian chains? How does the formalism apply to branched membrane glycoproteins is not immediately obvious.

      Regarding the excluded volume due to steric repulsion between glycoproteins, we considered the volume that cannot be used by glycans as Gaussian chains branching from the main chain. We would like to expand on this point by adding several papers that make similar arguments. We are glad you brought this up because we had not considered depletion forces: the excluded volume between glycoproteins should generate a depletion force, but in this case we believe this force will not have a significant effect on viruses that are larger than the glycoproteins. We also attempted to clarify the discussion in this section by focusing on intermolecular repulsion, and restructured paragraphs, which are also related to General Question 2 and Specific Question 2. The relevant part has been revised as follows (pages 15-16):

      To compare the packing of proteins with different molecular weights and R<sub>F</sub>, ………. These were smaller than the coverage of molecules at hexagonal close packing, which is ~90.7%. In contrast, the coverage of b-CD43 and b-MUC1 at saturated binding was estimated to be greater than 100% under this normalization standard, indicating that the mean projected sizes of these molecules in the surface direction were smaller than those expected from their R<sub>F</sub>. Thus, it is clear that glycosylation reduces the saturation density of membrane proteins, regardless of molecular size.

      Highly glycosylated proteins resisted densification, indicating that some intermolecular repulsion is occurring. In the framework of polymer brush theory, the intermolecular repulsion of densely packed highly glycosylated proteins is due to an increase in either f<sub>el</sub>, f<sub>int</sub> (d<R<sub>F</sub>), or both (Hansen et al., 2003; Wu et al., 2002). The intermolecular interaction term, f<sub>int</sub>, is regulated by intermolecular steric repulsion, which occurs when neighboring molecules cannot approach the excluded volume created by the stochastic configuration of the polymer chain (Attili et al., 2012; Faivre et al., 2018; Kreussling and Ullman, 1954; Kuo et al., 2018; Paturej et al., 2016). The magnitude of this steric repulsion depends largely on R<sub>F</sub> in dilute solutions, but the molecular structure may also affect it when molecules are densified on a surface. In other words, the glycans protruding between molecules can cause steric inhibition between neighboring proteins (Figure 5D). Such intermolecular repulsion due to branched side chains occurs only when the molecules are in close proximity and sterically interact on a two-dimensional surface, but not in dilute solution, and does not occur in unbranched polymers such as underglycosylated proteins (Figure 5D). Based on the above, we propose the following model for membrane proteins: Only when the membrane proteins are glycosylated does strong steric repulsion occur between neighboring molecules during the densification process, suppressing densification.

      The molecular structural state of these proteins needs to be further discussed to estimate the contribution of f<sub>el</sub>, which represents resistance to molecular elongation. Our results suggest that these densely packed nonglycosylated molecules are no longer in a free mushroom state. However, their saturation density was several times lower than previously reported brush transition densities, such as 65000 µm<sup>-2</sup> for 17 kDa polyacrylamide (R<sub>F</sub> ~ 15 nm) on a solid surface (Wu et al., 2002). To compare our data on fluid bilayers with previously reported data on solid surfaces, we performed additional experiments with lipid bilayers that lost fluidity. No significant changes in protein binding between fluid and nonfluid bilayers were observed for either b-MUC1 or g-MUC1 molecules (Supplementary Figure 5B). This result suggests that membrane fluidity does not affect the average intermolecular distance or other relevant parameters that control molecular binding in the reconstituted system. Based on these results, we speculate that the saturated protein density observed in our experiments is lower than or at most comparable to the actual brush transition density. Thus, although these crowded proteins may be restricted from free random motion, they are not significantly extended as they would be in the condensed brush state, and the contribution of resistance to molecular extension, f<sub>el</sub>, is therefore expected to be small relative to the overall free energy of the system.

      Note that this does not mean that glycoproteins cannot form condensed brush structures: in fact, highly glycosylated molecules (e.g., MUC1) can form brush structures in cells when such proteins are expressed at very high densities (Shurer et al., 2019). In these cells, ………. Such membrane deformation results in the increase of total surface area to reduce the density of glycoproteins, indicating that there is strong intermolecular repulsion between glycoproteins. In any case, the free energy of the system is determined by the balance between protein binding and insertion into the membrane, protein deformation, and repulsive forces between proteins, which determine the density of proteins depending on the configuration of the system. Thus, although strong intermolecular repulsions were prominently observed in our simplified system, this may not be the case in other systems. ……
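
      The ~90.7% coverage quoted above for hexagonal close packing is the standard area fraction for equal disks, π/(2√3). The sketch below reproduces that number and shows one way a fractional coverage could be computed from a surface density. The disk model, the function names, and the example density and radius are all assumptions made for illustration; the normalization standard actually used in the manuscript is not given in this response.

```python
import math


def hexagonal_packing_fraction() -> float:
    """Area fraction covered by equal disks in hexagonal close packing."""
    return math.pi / (2 * math.sqrt(3))


def disk_coverage(density_per_um2: float, radius_nm: float) -> float:
    """Fractional surface coverage, ASSUMING each molecule projects onto
    the membrane as a disk of the given radius (hypothetical model)."""
    radius_um = radius_nm / 1000.0  # convert nm to µm
    return density_per_um2 * math.pi * radius_um ** 2


if __name__ == "__main__":
    # Geometric upper limit for non-overlapping equal disks
    print(f"hexagonal limit: {hexagonal_packing_fraction():.3f}")
    # Hypothetical example values, not taken from the manuscript
    print(f"example coverage: {disk_coverage(1000.0, 10.0):.3f}")
```

      Under this disk model, a computed coverage above the hexagonal limit means the molecules must occupy a smaller projected area than the assumed radius implies, which matches the reasoning given above for b-CD43 and b-MUC1.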

      (4) The authors showed that glycoprotein expression inversely correlated with viral infection and link viral entry inhibition to steric hindrance caused by the glycoprotein. Alternative explanations would be that the glycoprotein expression (a) reroutes endocytosed viral particles or (b) lowers cellular endocytic rates and via either mechanism reduce viral infection. The authors should provide evidence that these alternatives are not occurring in their system. They could for example experimentally test whether non-specific endocytosis is still operational at similar levels, measured with fluid-phase markers such as 10kDa dextrans.

      The results of the experiment suggested by the reviewer are shown in the new Supplementary Figure 3B. (This results in the generation of a new Supplementary Figure 3, and the previous Supplementary Figures 4-5 are now renumbered as Supplementary Figures 5-6.) Endocytosis of 10 kDa dextran was attenuated by the expression of several large-sized molecules, but was not affected by the expression of many other glycoproteins that have the ability to inhibit infection. These results were clearly different from the results in which virus infection was inhibited more by the amount of glycan than by molecular weight. Therefore, we conclude that many glycoproteins inhibit virus infection through processes other than endocytosis. Based on the above, we have added the following to the manuscript: (p9 New paragraph:)

      We also investigated the effect of membrane glycoproteins on membrane trafficking, another process involved in viral infection. Expression of MUC1 with a higher number of tandem repeats reduced dextran transport in the fluid phase, while expression of multiple membrane glycoproteins that have infection inhibitory effects, including truncated MUC1 molecules, showed no effect on fluid phase endocytosis, indicating a molecular weight-dependent effect (Supplementary Figure 3B). The molecular weight-dependent inhibition of endocytosis may be due to factors such as steric inhibition of the approach of dextran molecules or a reduction in the transportable volume within the endosome. In any case, it is clear that many low molecular weight glycoproteins inhibit infection by disturbing processes other than endocytosis. Based on the above, we focus on the effect of glycoproteins on the formation of the interface between the virus and the cell membrane.

      (5) The authors approach their system with the goal of generalizing the cell membrane (the cumulative effect of all cell membrane molecules on viral entry), but what about the inverse? How does the nature of the molecule seeking entry affect the interface? For example, a lipid nanoparticle vs a virus with a short virus-cell distance vs a virus with a large virus-cell distance?

      Thank you for your interesting comment. If the molecular size of the ligand is large, it should affect virus adsorption and molecular exclusion from the interface. In lipid nanoparticle applications, controlling this parameter may contribute to efficiency. A related point is the influence of virus shell molecules that are not bound to the receptor. We will revise the text based on the above.

      Discussion (as a new paragraph after the paragraph added in Q1):

      In this study, we attempted to generalize the surface structure on the cell side, but the surface structure on the virus side may also have an effect. The efficiency of virus adsorption and the efficiency of cell membrane protein exclusion from the interface will change depending on the molecular length of the receptor-ligand pair, although receptor priming also has an effect. In addition, free ligands of the viral envelope or other coexisting glycoproteins may also have an effect, as they also need to be excluded from the virus-cell interface. In fact, there are reports that expression of CD43 and PSGL-1 on the virus surface reduces virus infection efficiency (Murakami et al., 2020). Such interface structure may be one of the factors that determine the infection efficiency that differs depending on the virus strain. More generally, modification of the surface structure may be effective for designing materials such as lipid nanoparticles that construct an interface with cells.

      SPECIFIC QUESTIONS:

      (1) The proposed mechanism indicates that glycosylation status does not produce an effect in the "trapping" of virus, but in later stages of the formation of the virus/membrane interface due to the high energetic costs of displacing highly glycosylated molecules at the vicinity of the virus/membrane interface. It is suggested to present a correlation between the levels of glycans in the Calu-3 cell monolayers and the number of viral particles bound to cell surface at different pulse times. Results may be quantified following the same method as shown in Figure 2 for the correlation between glycosylation levels and viral infection (in this case the resulting output could be number of viral particles bound as a function of glycan content).

      The results of this experiment are now shown as Supplementary Figure 2F and 2G. We compared the amount of virus bound after incubation for 10 minutes or for 3 hours as in the infection experiment, but no negative correlation was found between the total amount of glycans on the surface of the Calu-3 monolayer and the amount of virus bound. Interestingly, a slight positive correlation was detected, which may be due to concentrated expression of virus receptors in glycan-enriched cells. This result shows that glycoproteins do not strongly inhibit virus binding. We will amend the text as follows (see also Q6).

      (Page 10)

      Glycans could be one of the biochemical substances ……We found that a large number of SARS-CoV2-PP can still bind to cells even when cells expressed sufficient amounts of the glycoprotein that could account for the majority of glycans within these cells and inhibit viral infection (Figure 3A). Similarly, on the two-dimensional culture surface of Calu-3 cells, no negative correlation was observed between the number of viruses bound and the total amount of glycans on the cell surface (Supplementary Figure 2F-G). The slight positive correlation between bound virus and glycans may be due to higher expression levels of viral receptors in glycan-rich cells. ….

      (2) The use of the purified glycosylated and non-glycosylated ectodomains of MUC1 and CD-43 to establish a relationship between glycosylation and protein density into lipid bilayers on silica beads is an elegant approach. An assessment of the impact of glycosylation in the structural conformation of both proteins, for instance determining the Flory radius of the glycosylated and non-glycosylated ectodomains by the FRET-FLIM approach used in Figure 4 would serve to further support the hypothesis of the article.

      Unfortunately, the proposed experiment did not provide a strong enough FRET signal for analysis. This was due in part to the difficulty in constructing a bead-coated bilayer incorporating PlasMem Bright Red, which established a good FRET pair in cell experiments. We also tried other fluorescent molecules, but were unable to obtain a strong and stable FRET signal. Another reason may be that the curvature of the beads is larger than that of the cells, making it difficult to obtain a sufficient cumulative FRET effect from multiple membrane dyes. We plan to improve the experimental system in the future.

      On the other hand, we also found that in this system, the signal changes were very subtle, making it difficult to detect molecular conformational changes using FRET. After reconsidering general questions (2) and (3), we speculated that the molecular density in the experiment, even at saturation binding, was below or at most equivalent to the brush transition point. In other words, proteins on the bead-coated bilayer may not be significantly extended in the vertical direction. Therefore, the conformational changes of these proteins may not be large enough to be detected by the FRET assay. We updated Figure 3C and Figure 5D (model description) to better reflect the above discussion and introduced the following discussion in the manuscript.

      (page11)

      We introduced the framework of conventional polymer brush theory to study the structure of virus-cell interfaces containing proteins……. Numerous experimental measurements of the formation of polymer brushes have also been reported (Overney et al., 1996; Wu et al., 2002; Zhao and Brittain, 2000). In these measurements, the transition to a brush typically occurs at a density higher than that required to pack a surface with hemispherical polymers of diameter R<sub>F</sub>. This is the point at which the energy loss due to repulsive forces between adjacent molecules (f<sub>int</sub>) exceeds the energy required to stretch the polymer perpendicularly into a brush (f<sub>el</sub>).

      (page16)

      The molecular structural state of these proteins needs to be further discussed to estimate the contribution of f<sub>el</sub>, which represents resistance to molecular elongation. Our results suggest that these densely packed nonglycosylated molecules are no longer in a free mushroom state. However, their saturation density was several times lower than previously reported brush transition densities, such as 65000 µm<sup>-2</sup> for 17 kDa polyacrylamide (R<sub>F</sub> ~ 15 nm) on a solid surface (Wu et al., 2002). To compare our data on fluid bilayers with previously reported data on solid surfaces, we performed additional experiments with lipid bilayers that lost fluidity. No significant changes in protein binding between fluid and nonfluid bilayers were observed for either b-MUC1 or g-MUC1 molecules (Supplementary Figure 5B). This result suggests that membrane fluidity does not affect the average intermolecular distance or other relevant parameters that control molecular binding in the reconstituted system. Based on these results, we speculate that the saturated protein density observed in our experiments is lower than or at most comparable to the actual brush transition density. Thus, although these crowded proteins may be restricted from free random motion, they are not significantly extended as they would be in the condensed brush state, and the contribution of resistance to molecular extension, f<sub>el</sub>, is therefore expected to be small relative to the overall free energy of the system.

      Note that this does not mean that glycoproteins cannot form condensed brush structures: in fact, highly glycosylated molecules (e.g., MUC1) can form brush structures in cells when such proteins are expressed at very high densities (Shurer et al., 2019). In these cells, ………. Such membrane deformation results in an increase of total surface area that reduces the density of glycoproteins, indicating that there is strong intermolecular repulsion between glycoproteins. In any case, the free energy of the system is determined by the balance between protein binding and insertion into the membrane, protein deformation, and repulsive forces between proteins, which determine the density of proteins depending on the configuration of the system. Thus, although strong intermolecular repulsions were prominently observed in our simplified system, this may not be the case in other systems. ……
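The density comparison in the quoted passage can be checked with a short calculation. The sketch below is only an illustration, not part of the authors' analysis: it assumes each mushroom-state polymer occupies a disc of diameter R_F (as the text states) and computes the density at which such discs just touch, for a square and a hexagonal lattice. The function name and the lattice choices are assumptions introduced here; R_F = 15 nm and 65000 µm^-2 are taken from the text.

```python
import math

def mushroom_packing_density_per_um2(flory_diameter_nm: float,
                                     hexagonal: bool = False) -> float:
    """Density (molecules per um^2) at which discs of the given diameter just touch.

    Assumption: each polymer occupies a disc of diameter R_F on the surface.
    """
    area_per_molecule_nm2 = flory_diameter_nm ** 2        # square lattice cell
    if hexagonal:
        area_per_molecule_nm2 *= math.sqrt(3) / 2         # hexagonal close packing cell
    return 1e6 / area_per_molecule_nm2                    # 1 um^2 = 1e6 nm^2

R_F_NM = 15.0                   # from the text (Wu et al., 2002)
REPORTED_TRANSITION = 65000.0   # brush transition density in um^-2, from the text

square_density = mushroom_packing_density_per_um2(R_F_NM)
hex_density = mushroom_packing_density_per_um2(R_F_NM, hexagonal=True)
ratio_to_square = REPORTED_TRANSITION / square_density

print(f"square-lattice packing density: {square_density:.0f} per um^2")
print(f"hexagonal packing density:      {hex_density:.0f} per um^2")
print(f"reported transition / square packing: {ratio_to_square:.1f}")
```

Under these assumptions the reported transition density is more than ten times the simple packing density, which is consistent with the statement that the brush transition occurs above the density needed to cover the surface with mushroom-state polymers.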

      (3) The MUC1 glycoprotein is reported to have a dramatic effect in reducing viral infection shown in Fig 1F. On the contrary, in a different experiment shown in Fig2D and Fig2H MUC1 has almost no effect in reducing viral infection. It is not clear how these two findings can be compatible.

      The immunostaining results show that the density of MUC1 molecules is very low in the experimental system in Figure 2 (Figure 2C), which is supported by the SC-RNASeq data (as shown in Supplementary Figure 2A, MUC1 is not listed as a top molecule). In other words, the MUC1 expression level in this experiment is too low to inhibit virus infection. In addition, the Pearson correlation coefficient represents the strength of the linear relationship between two variables, so it is not the most appropriate indicator of correlation with the MUC1 expression level, which varies little (Figure 2D, 2F). In fact, even TOS analysis, which can detect correlation by focusing on the cells with the highest expression level, cannot detect the correlation (Figure 2H). Therefore, the MUC1 data in Figure 2D, 2F, and 2H will be annotated and corrected in the figure legend.

      Figure2 Legend:

      MUC1 has a small mean expression level and variance, and is more affected by measurement noise than other molecules when calculating the Pearson correlation coefficient (Figure 2C-2F). In addition, the number of cells in which expression can be detected is small, so no significant correlation was detected by TOS analysis (Figure 2H).
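The point about measurement noise can be illustrated with the standard attenuation relation for a Pearson coefficient: if one variable carries independent additive noise, the observed coefficient shrinks by the square root of the ratio of signal variance to total variance. The sketch below is an illustration only; the function name and all numerical values are hypothetical and are not taken from the authors' data.

```python
import math

def attenuated_pearson(true_r: float, signal_variance: float,
                       noise_variance: float) -> float:
    """Expected Pearson r when one variable has independent additive noise.

    Corr(X + e, Y) = Corr(X, Y) * sqrt(Var(X) / (Var(X) + Var(e)))
    """
    reliability = signal_variance / (signal_variance + noise_variance)
    return true_r * math.sqrt(reliability)

TRUE_R = -0.5          # hypothetical underlying correlation
NOISE_VARIANCE = 1.0   # hypothetical measurement noise, same for both proteins

# A protein whose expression varies widely between cells (hypothetical variance)
r_high_variance = attenuated_pearson(TRUE_R, signal_variance=4.0,
                                     noise_variance=NOISE_VARIANCE)
# A protein with little cell-to-cell variation (hypothetical variance)
r_low_variance = attenuated_pearson(TRUE_R, signal_variance=0.25,
                                    noise_variance=NOISE_VARIANCE)

print(f"observed r, high-variance protein: {r_high_variance:.3f}")
print(f"observed r, low-variance protein:  {r_low_variance:.3f}")
```

With identical noise, the low-variance protein shows a much weaker observed coefficient even though the underlying relationship is the same, which is the situation described for MUC1 in the legend.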

      (4) Why is there a shift in the use of the glycan marker? How does this affect the conclusions? For the infection correlation relating protein expression with glycan content the PNA-lectin was used together with flow cytometry. For imaging the infection and correlating with glycan content the SSA-lectin is used.

      For each cell line, we selected the lectin that could be measured over the widest dynamic range. This lectin is thought to recognize the predominant glycan species in the cell line (Fig. S1C, Fig. 2D). In our model, we believe that viral infection inhibition is not specific to the type of sugar, but is highly dependent on the total amount of glycans. If this hypothesis is correct, the reason we used different lectins in each experiment is simply to select the lectin that recognizes the most predominant glycan species that is most convenient for predicting the total amount of glycans in cells. This hypothesis is consistent with our observations, where the total amount of glycans estimated by different lectins could explain the infection inhibition in a similar way in the experiments in Figures 1 and 2, and the TOS analysis in Figure 2 showed that minor glycans also have an infection inhibitory effect. On the other hand, it is of course possible to predict the total amount of glycans more accurately by obtaining as much information on glycans as possible (related to Q5). Based on the above discussion, the manuscript will be revised as follows.

      Page5

      Using HEK293T cell lines exogenously expressing genes of these proteins tagged with fluorescent markers, their glycosylation was measured by binding of a lectin from Arachis hypogaea (PNA), and the number of these proteins in the cells was measured simultaneously. PNA was used for the measurement because it has a wider dynamic range than other lectins (Supplementary Figure 1C). This suggests that GalNAc recognized by PNA is predominantly present on glycans of HEK293T cells, especially on the termini of glycans that are amenable to lectin binding, compared to other saccharides. …

      page9  

      Our findings suggest that membrane glycoproteins nonspecifically inhibit viral infection, and we hypothesize that their inhibitory function is also nonspecific depending on the type of glycan. Our hypothesis is consistent with the observations in the TOS analysis. Although minor saccharide species in the system (such as GlcNAc and GalNAc recognized by DSA, WGA, or PNA) showed anticolocalization with infection, their scores were much lower than those of major saccharide species. This suggests that all major and minor saccharide species have an infection inhibitory effect, but cells enriched with minor type glycans are only partially present in the system, and the contribution of these cells to virus inhibition is also partial. It is also consistent with the observation that the amount of GalNAc recognized by PNA determines the virus infection inhibition in HEK 293T cells (Figure 1). Therefore, we believe that our assay using a single type of predominantly expressed lectin is still useful for estimating the total glycan content. Nevertheless, the virus infection rate may show a better correlation with a more accurately estimated total glycan in each cell. For example, the use of multiple lectins with appropriate calibration to integrate multiple signals to simultaneously detect a wider range of saccharide species would allow for more accurate estimation. It should be noted that the amount of bound lectin does not necessarily measure the overall glycan composition but likely reflects the sugar population at the free end of the glycan chain to which the lectin binds most.

      (5) The authors in several instances comment on the relevance and importance of the total glycan content. Nevertheless, these conclusions are often drawn when using only one glycan-binding lectin. In fact, the anti-correlation with viral infection is distinct for the various lectins (Fig 2D and Fig 2H). Would it make more sense to use a combination of lectins to get a full glycan spectrum?

      As stated in the answer to Q4, we believe that we were able to detect the infection-suppressing effect of the total glycan amount by using the measurement value of the major component glycan as an approximation. However, as you pointed out, if we could accurately measure the minor glycan components and add up their values, we believe that we could measure the total glycan amount more accurately. In order to measure multiple glycans simultaneously and with high accuracy, some kind of biochemical calibration may be necessary to compare the measurements of lectin-glycan pairs with different binding constants. We believe that these are very useful techniques, and would like to consider them as a future challenge. The corrections listed in Q4 are shown below.

      (Page 9)

      Nevertheless, the virus infection rate may show a better correlation with a more accurately estimated total glycan in each cell. For example, the use of multiple lectins with appropriate calibration to integrate multiple signals to simultaneously detect a wider range of glycans would allow for more accurate estimation. …….

      (6) Fig 3A shows virus binding to HEK cells upon MUC1 expression. Please provide the surface expression of the MUC1 so that the data can be compared to Fig 1F. Nevertheless, it is not clear why the authors used MUC expression as a parameter to assess virus binding. Alternatively, more conclusive data supporting the hypothesis would be the absence of a correlation between total glycan content and virus binding capacity.

      The relationship between the expression level of MUC1 in each cell and the amount of virus binding is shown in Supplementary Figure 3A. There is no correlation between the two. In HEK293T cells, many glycans are modified with MUC1, so MUC1 was used as the indicator for analysis (Supplementary Figure 1C). As you pointed out, it is better to use the amount of glycan as an indicator, so we analyzed the relationship between the amount of bound virus and the amount of glycan on the surface of the Calu-3 monolayer (Supplementary Figure 2F, 2G, introduced in the answer to Specific (Q1)). In any case, no correlation was found between virus binding and surface glycans. We will correct the manuscript as follows.

      (page 9)

      Glycans could be one of the biochemical substances that link the intracellular molecular composition and macroscopic steric forces at the cell surface. To clarify this connection, we further investigated the mechanism by which membrane glycoproteins inhibit viral infection. First, we measured viral binding to cells to determine which step of infection is inhibited. We found that a large number of SARS-CoV2-PP can still bind to cells even when cells expressed sufficient amounts of the glycoprotein that could account for the majority of glycans within these cells and inhibit viral infection (Figure 3A). Similarly, on the two-dimensional culture surface of Calu-3 cells, no correlation was observed between the number of viruses bound and the total amount of glycans on the cell surface (Supplementary Figure 2F-G). These results indicate that glycoproteins do not inhibit virus binding to cells, but rather inhibit the steps required for subsequent virus internalization.

      (7) While the use of the Flory model could provide a simplification for a (disordered) flexible structure such as MUC1, where the number of amino acids equals N in the Flory model, this generalisation will not hold for all the proteins. Because folding will dramatically change the effective polypeptide chain-length and reduce available positioning of the amino acids, something the authors clearly measured (Fig 4G), this generalisation is not correct. In fact, the generalisation does not seem to be required because the authors provide an estimation for the effective Flory radius using their FRET approach

      Current theories generalizing the Flory model to proteins are incomplete, and it is certainly not possible to accurately estimate the size of individual molecules undergoing different folding. However, we found such a generalized model to be useful in understanding the overall properties of membrane proteins. In our experiments, we were indeed able to obtain the R<sub>F</sub>s of some individual molecules by FRET measurements. However, this modeling made it possible to estimate the distribution range of the R<sub>F</sub>s, including for larger proteins that cannot be measured by FRET. For example, from our results, we can estimate that the upper limit of the R<sub>F</sub>s of the longest membrane proteins is about 10.5 nm, assuming that the proteins follow the Flory model in all respects except for the shortening of the effective length due to folding. These analyses are useful for physical modeling of nonspecific phenomena, as in our case.

      In order to discuss the balance between such theoretical validity and the convenience of practical handling, we revise the manuscript as follows.

      (page 13) 

      This shift in ν indicates that glycosylation increases the size of the protein at equilibrium, but the change in R<sub>F</sub> is slight, e.g., a 1.3-fold increase for one of the longest ectodomains with N = 4000 when these values of ν are applied. This calculation also gives a rough estimate of the upper limit of the R<sub>F</sub> of the extracellular domains of all membrane proteins in the human genome (approximately 10.5 nm). Physically, this change in ν by glycosylation may be caused by the increased intramolecular exclusion induced sterically between glycan chains. These estimated values of ν are much smaller than the value of 0.6 for polymers in good solvents, possibly due to protein folding or anchoring effects on the membrane. In fact, the ν of an intrinsically disordered protein in solution has been reported to be close to 0.6 (Riback et al., 2019; Tesei et al., 2024). Overall, these analyses using the Flory model provide information on the size distribution of membrane proteins and the influence of glycans, although the model cannot predict the exact size of each protein due to its specific folding.
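The fold change quoted above follows directly from the Flory scaling R_F = a·N^ν: for a fixed segment length a, changing the exponent from ν₁ to ν₂ multiplies R_F by N^(ν₂−ν₁). The sketch below works backwards from the numbers in the text (1.3-fold, N = 4000) to the exponent shift they imply. The fitted ν values themselves are not given in this response, so the baseline exponent and the segment length used here are hypothetical placeholders, and the function names are introduced only for illustration.

```python
import math

def flory_radius(n_residues: float, nu: float,
                 segment_length_nm: float = 1.0) -> float:
    """Flory scaling R_F = a * N**nu (segment length a is a placeholder value)."""
    return segment_length_nm * n_residues ** nu

def implied_delta_nu(fold_change: float, n_residues: float) -> float:
    """Exponent shift that yields the given fold change in R_F at chain length N."""
    return math.log(fold_change) / math.log(n_residues)

N_LONGEST = 4000       # from the text
FOLD_CHANGE = 1.3      # from the text

delta_nu = implied_delta_nu(FOLD_CHANGE, N_LONGEST)

# Consistency check with a hypothetical baseline exponent (not the authors' fit)
HYPOTHETICAL_NU = 0.3
recovered_fold = (flory_radius(N_LONGEST, HYPOTHETICAL_NU + delta_nu)
                  / flory_radius(N_LONGEST, HYPOTHETICAL_NU))

print(f"implied shift in nu: {delta_nu:.4f}")
print(f"recovered fold change in R_F: {recovered_fold:.3f}")
```

The implied shift is only a few hundredths, which illustrates why a measurable change in ν translates into a modest change in R_F even for the longest ectodomains.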

      MINOR COMMENTS/EDITS:

      (1) In Figures 2A and 2C, as well as Supplemental Figure 2C, the fluorescent images indicate that GFP expression differs among the various groups. Ideally, these should be at the same GFP expression level, as the glycan and antibody staining occurred post-viral infection. For instance, ACE2 is a well-known positive control and should enhance SARS-CoV-2 infection. Yet, based on the findings presented in Supplemental Figure 2C, ACE2 appears to correlate with the lowest infection rate. The relationship between the infection rate and key glycoproteins needs clearer quantification.

      We measured the virus inhibition effect specific to each molecule using a cell line expressing low levels of viral receptors and glycoproteins (Fig. 1). On the other hand, the system in Fig. 2 contains diverse viral receptors and glycoproteins and has not been genetically manipulated. (We apologize that there was a typo in our description of the experiment, which will be corrected, as shown below.) The variation in infection rate between samples was caused by multiple factors but was not related to the molecule for which the correlation was measured. The receptor-based normalization used in the experiment in Fig. 1 cannot be applied to the system in Fig. 2 due to the complexity of the gene expression profile. Therefore, instead of such parameter-based normalization, we applied Pearson correlation and TOS analysis. In the calculation of Pearson correlation, intensities are normalized. TOS analysis allows the analysis of colocalization between the groups with the highest fluorescence intensity. Therefore, samples that differ in overall infection rate can be reasonably compared by Pearson correlation, and samples that differ in the distribution of infected populations can be reasonably compared by TOS analysis. We extend the discussion on statistics and revise the manuscript as follows.

      (page 8-9)

      To test this hypothesis, we infected a monolayer of epithelial cells endogenously expressing highly heterogeneous populations of glycoproteins with SARS-CoV-2-PP, and measured viral infection from cell to cell visually by microscope imaging. …

      Pearson correlation is effective for comparing samples with varying scales of data because it normalizes the data values by the mean and variance. However, as observed in our experiments, this may not be the case when the distribution of data within a sample varies between samples. In addition, as has already been reported, the distribution of infected cells often deviates significantly from the normal distribution of data that is the premise of Pearson correlation (Russell et al., 2018) (Figure 2B). To further analyze data in such nonlinear situations, we applied the threshold overlap score (TOS) analysis (Figure 2G-H, Supplementary Figure 2E). This is one statistical method for analyzing nonlinear correlations, and is specialized for colocalization analysis in dual color images (Sheng et al., 2016). TOS analysis involves segmentation of the data based on signal intensity, as in other nonlinear statistics (Reshef et al., 2011). The computed TOS matrix indicates whether the number of objects classified in each region is higher or lower than expected for uniformly distributed data, which reflects co-localization or anti-localization in dual-color imaging data. For example, calculated TOS matrices show strong anti-localization for infection and glycosylation when both signals are high (Figure 2GH). This confirms that high infection is very unlikely to occur in cells that express high levels of glycans. The TOS analysis also yielded better anti-localization scores for some of the individual membrane proteins, especially those that are heterogeneously distributed across cells (Figure 2H). This suggests that TOS analysis can highlight the inhibitory function of molecules that are sparsely expressed among cells, reaffirming that high expression of a single type of glycoprotein can create an infection-protective surface in a single cell and that such infection inhibition is not protein-specific. 
      In contrast, for more uniformly distributed proteins such as the viral receptor ACE2, TOS analysis and Pearson correlation showed similar trends, although the two are mathematically different (Figure 2D, 2H). Because glycoprotein expression levels and virus-derived GFP levels were treated symmetrically in these statistical calculations, the same logic can be applied when considering the heterogeneity of infection levels among cells. Therefore, it is expected that TOS analysis can reasonably compare samples with different virus infection level distributions by focusing on cells with high infection levels in all samples.
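The two statistics discussed above can be contrasted on toy data. The sketch below computes a Pearson coefficient and a simplified overlap ratio for the top fraction of each signal: the observed fraction of cells that are high in both signals, divided by the fraction expected if the two signals were independent. This is a deliberately reduced stand-in for threshold-based colocalization analysis and does not reproduce the normalization of the published TOS method (Sheng et al., 2016). The function names and the per-cell values are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient; invariant to shifting and positive scaling."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

def top_fraction_mask(values, fraction):
    """True for cells whose value lies in the top `fraction` of the sample."""
    k = max(1, round(len(values) * fraction))
    cutoff = sorted(values, reverse=True)[k - 1]
    return [v >= cutoff for v in values]

def overlap_ratio(x, y, frac_x, frac_y):
    """Observed / expected overlap of the top fractions (1 = independent,
    below 1 = anti-localization, above 1 = co-localization)."""
    mask_x = top_fraction_mask(x, frac_x)
    mask_y = top_fraction_mask(y, frac_y)
    n = len(x)
    observed = sum(a and b for a, b in zip(mask_x, mask_y)) / n
    expected = (sum(mask_x) / n) * (sum(mask_y) / n)
    return observed / expected

# Hypothetical per-cell intensities: two glycan-rich cells, both poorly infected
glycan = [9, 8, 1, 1, 2, 1, 2, 1, 1, 2]
infection = [0, 1, 7, 8, 6, 9, 5, 4, 3, 2]

r = pearson(glycan, infection)
r_rescaled = pearson([10 * g + 3 for g in glycan], infection)  # same r after rescaling
ratio_top20 = overlap_ratio(glycan, infection, 0.2, 0.2)

print(f"Pearson r: {r:.3f} (rescaled: {r_rescaled:.3f})")
print(f"top-20% overlap ratio: {ratio_top20:.2f}")
```

In this toy sample the overlap ratio for the top 20% of both signals is zero, reflecting that no glycan-rich cell is among the most infected cells, while the Pearson coefficient summarizes the trend across all cells and is unchanged when one signal is rescaled.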

      (2) For clarity, the authors should consider separating introductory and interpretive remarks from the presentation of results. These seem to get mixed up. The introduction section could be expanded to include more details about glycoproteins, their relevance to viral infection, and explanations of N- and O-glycosylation.

      Following the suggestion, (1) we added an explanation of the relationship between glycoproteins and viral infection, and N-glycosylation and O-glycosylation to the Introduction section, and (2) moved the introductory parts in the Results section to the Introduction section, as follows.

      (1; page3)

      While there are known examples of glycans that function as viral receptors (Thompson et al., 2019), these results demonstrate that a variety of glycoproteins negatively regulate viral infection in a wide range of systems. These glycoprotein groups have no common amino acid sequences or domains. The glycans that modify these proteins include both the N-type, which binds to asparagine, and the O-type, which binds to serine and threonine. Furthermore, there have been no reports of infection-suppressing effects that depend on the specific monosaccharide type in the glycan. All of these results suggest that bulky membrane glycoproteins nonspecifically inhibit viral infection.

      (2 : Page 4-5)

      To confirm that glycans are a general chemical factor of steric repulsion, an extensive list of glycoproteins on the cell membrane surface would be useful. The wider the range of proteins to be measured, the better. Therefore, we collect information on glycoproteins in the genome and compile it into a list that is easy to use for various purposes. Then, by analyzing sample molecules selected from this list, it may be possible to infer the effect of the entire glycoprotein population on the steric inhibition of virus infection, despite the complexity and diversity of the glycome (Dworkin et al., 2022; Huang et al., 2021; Moremen et al., 2012; Rademacher et al., 1988). Elucidation of the mechanism by which glycans regulate steric repulsion will also be useful to quantitatively discuss the relationship between steric repulsion and intracellular molecular composition. For this purpose, we apply the theories of polymer physics and interface chemistry.

      Results

      List of membrane glycoproteins in human genome and their inhibitory effect on virus infection

      To test the hypothesis that glycans contribute to steric repulsion at the cell surface, we first generate a list of glycoproteins in the human genome and then measure the glycan content and inhibitory effect on viral infection of test proteins selected from the list (Figure 1A). To compile the list of glycoproteins, we ….

      (3) In the sentence, "glycoproteins expressed lower than CD44 or other membrane proteins including ERBB2 did not exhibit any such correlation, although ERBB2 expressed ~4 folds higher amount than CD44 and shared ~7% among all membrane proteins," it is unclear which protein has a higher expression level: CD44 or ERBB2? Furthermore, the use of the word "although" needs clarification.

      Corrected as follows:

      (page 8)

      ……showed a weak inverse correlation with viral infection; even such a weak correlation was not observed with other proteins, including ERBB2, which is approximately four-fold more highly expressed than CD44

      (4) In Supplementary Figure 5, please provide an explanation of the data in the figure legend, particularly what the green and red signals represent.

      Corrected as follows:

      STORM images of all analyzed cells, expressing designated proteins. The detected spots of SNAP-Surface Alexa 647 bound to each membrane protein are shown in red, and the spots of CF568-conjugated anti-mouse IgG secondary antibody that recognizes Spike on SARS-CoV2-PP are shown in green. For each cell, a two-color composite image and a CF568-only image are shown as a pair. Numbers on axes are coordinates in nanometers.

      (5) It would be good to see a comprehensive demonstration of the exact method for estimation of membrane protein density (in the SI), since this is an integral part of many of the analyses in this paper. The method is detailed in the Methods section in text and is generally acceptable, but this methodology can vary quite widely and would be more convincing with calibration data provided.

      We added flow cytometry and fluorometer data for calibration (Supplementary Figure 1L,M) and introduced a sentence explaining the procedure for obtaining the values used for calibration as follows:

      (page 54)

      …….Liposome standards containing fluorescent molecules (0.01–0.75 mol% perylene (Sigma), 0.1–1.25 mol% Bodipy FL (Thermo), and 0.005–0.1% DiD) as well as DOPC (Avanti Polar Lipids) were measured by flow cytometry (Supplementary Figure 1L). In parallel, the fluorescence signals of these liposomes and of known concentrations of recombinant mTagBFP2, AcGFP and TagRFP-657 proteins and SNAP-Surface 488 and Alexa 647 dyes (New England Biolabs) were measured with a fluorometer in the same excitation and emission ranges as in the flow cytometry assays (Supplementary Figure 1M). The ratio of the integrated fluorescence intensities in this range between the two dyes of interest is used to convert the signals measured in flow cytometry. Additional information needed for calibration is the size difference between liposomes and cells. The average diameter of liposomes is measured to be 130 nm, and the diameter of HEK 293T cells is estimated to be 13 µm (Furlan et al., 2014; Kaizuka et al., 2021b; Ushiyama et al., 2015). From these data, the signal from cells acquired by flow cytometry can be calibrated to molecular surface density. For example, the Alexa 647 signal acquired by flow cytometry can be converted to the signal of the same concentration of DiD dye using the fluorometer data, but the density of the dye is unknown at this point. This converted DiD signal can then be calibrated to the density on liposomes rather than cells using the liposome flow cytometry data. Finally, after adjusting for the size difference between liposomes and cells, the surface molecular density on cells is determined. By going through one cycle of these procedures, we obtain a calibration unit, such as 1 flow cytometry signal for a cell in the designated illumination and detection setting = 0.0272 mTagBFP2 µm<sup>-2</sup> on the cell surface.
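      The three calibration steps described above can be sketched as follows. This is only an illustrative reading of the procedure, not the authors' code: the function names, the area per lipid (0.7 nm², an assumed textbook-style value for DOPC), the treatment of liposomes and cells as smooth spheres, and the use of a single linear slope from the liposome standards are all assumptions introduced here.

```python
import math

def sphere_area(diameter):
    """Surface area of a sphere, in squared units of `diameter`."""
    return math.pi * diameter ** 2

def molecules_per_liposome(mol_percent, diameter_nm=130.0, area_per_lipid_nm2=0.7):
    # Two leaflets, bilayer thickness ignored; area_per_lipid_nm2 is an assumed value.
    lipids = 2.0 * sphere_area(diameter_nm) / area_per_lipid_nm2
    return lipids * mol_percent / 100.0

def cell_surface_density(cell_signal, membrane_dye_per_probe_ratio,
                         liposome_slope_per_mol_percent, cell_diameter_um=13.0):
    """Return molecules per square micrometer on the cell (hypothetical helper)."""
    # Step 1: fluorometer ratio converts the probe signal (e.g. Alexa 647)
    # into the signal an equal amount of membrane dye (e.g. DiD) would give.
    equivalent_signal = cell_signal * membrane_dye_per_probe_ratio
    # Step 2: the linear fit of the liposome standards gives signal per molecule.
    signal_per_molecule = liposome_slope_per_mol_percent / molecules_per_liposome(1.0)
    molecules_per_cell = equivalent_signal / signal_per_molecule
    # Step 3: account for size by dividing by the cell surface area.
    return molecules_per_cell / sphere_area(cell_diameter_um)
```

      With the diameters quoted in the text (130 nm liposomes, 13 µm cells), the surface areas differ by a factor of (13 / 0.13)² = 10,000, which is the size adjustment applied in the last step under the smooth-sphere assumption.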

      (Figure legend, Supporting Figure 1: )

      … L. Flow cytometry measurements for liposomes containing serially diluted dye-conjugated lipids and fluorescent membrane-incorporating molecules (Bodipy-FL, perylene, and DiD) at the indicated mol%. The linear fits shown were used for calibration.  M. Fluorescence emission spectra for equimolar molecules (50 µM for green and far-red channels, and 100 µM for blue channel), excited at 405 nm, 488 nm, and 638 nm, respectively. Membrane dyes were measured as incorporated in liposomes. Purified recombinant mTagBFP2 was used.

      (6) Fig 2A: The figure legend should describe the microscopy method for a quick and easy reference.

      Corrected as follows:

      (Figure legend, Figure 2)

      A. Maximum projection of Z-stack images at 1 µm intervals taken with a confocal microscope. SARS-CoV2-PP-infected, air-liquid interface (ALI)-cultured Calu-3 cell monolayers were chemically fixed and imaged by binding of Alexa Fluor 647-labeled Neu5AC-specific lectin from Sambucus sieboldiana (SSA) and GFP expression from the infecting virus.

      (7) Fig 2B: what is the color bar supposed to represent? Is it the pixel density per a particular value? Units and additional description are required. In addition, these are "arbitrary units" of fluorescence, but you should tell us if they've been normalized and, if so, how. They must have been normalized, since the values are between 0 and 1, but then why does the scale bar for SSA only go to 0.5?

      The color bar shows the number of pixels represented by each dot, which sets the scale of the density scatter plot. The scale on the X-axis was incorrect. These issues have been fixed in this revision, in the figure and in the legend, as follows.

      (Figure legend, Figure 2)

      B. Density scatter plot of normalized fluorescence intensities in all pixels in Figure 2A in both GFP and SSA channels. Color indicates the pixel density.  
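      For readers unfamiliar with density scatter plots, the idea in this legend can be sketched as follows. This is a hedged illustration only: the legend does not state how intensities were normalized, so min-max scaling to [0, 1] is assumed here, and the function names and bin count are hypothetical.

```python
from collections import Counter

def min_max_normalize(values):
    """Scale values to [0, 1]; assumed normalization, not stated in the source."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def density_counts(gfp, ssa, bins=50):
    """Count pixels per (GFP bin, SSA bin); the count is what the color encodes."""
    g = min_max_normalize(gfp)
    s = min_max_normalize(ssa)

    def to_bin(v):
        # The maximum value 1.0 is placed in the last bin.
        return min(int(v * bins), bins - 1)

    return Counter((to_bin(x), to_bin(y)) for x, y in zip(g, s))
```

      Each key of the returned counter is one dot position in the plot, and its value is the pixel density that the color bar reports.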

      (8) Fig 3D has a typo: this should most likely be "grafted polymer."

      (9) Fig 3E has a suspected typo: in the text, the author uses the word "exclusion" instead of "extrusion." The former makes more sense in this context.

      (10) Fig 5A has a typo: "Suppoorted" instead of Supported Lipid Bilayer.

      (11) Fig 7E-F has a suspected typo: Again, this should most likely be the word "exclusion" instead of "extrusion."

      Thank you so much for pointing out these mistakes, I have corrected them all as suggested.

      (12) Which other molecules are referred to, on page 6 (middle), that do not have an inhibitory effect? Please specify.

      We specified the molecules that have inhibitory effects, and revised as follows: 

      These proteins include those previously reported (MUC1, CD43) as well as those not yet reported (CD44, SDC1, CD164, F174B, CD24, PODXL) (Delaveris et al., 2020; Murakami et al., 2020). In contrast, other molecules (VCAM-1, EPHB1, TMEM123, etc.) showed little inhibitory effect on infection within the density range we used.

      (13) Fig 2 B: the color LUT is not labelled nor explained.

      Corrected as described in (7)

      (14) Please provide the scale bars for figures Fig 2A, C, E and Suppl Fig 2C, D.

      Corrected. 

      (15) Please provide the name for the example of a 200 aa protein that is meant to inhibit viral infection but is not bigger than ACE2. Also providing the densities in Fig 3A would help to correlate the data to Fig 1F.

      Corrected as follows: 

      (page 10)

      We found that a large number of SARS-CoV2-PP can still bind to cells even when cells expressed sufficient amounts of the glycoprotein (mean density ~50 µm<sup>-2</sup>) that could account for the majority of glycans within these cells and inhibit viral infection (Figure 3A). …..

      In our measurements, a protein with an extracellular domain of ~200 amino acids (e.g. CD164 (138 aa)) at a density of ~100 µm<sup>-2</sup> showed significant inhibition of viral infection. This molecule is shorter than the receptor ACE2 (722 aa),

      (16) In the experiments conducted in HeK cells expressing the different glycoproteins studies it is mentioned that results of infection were normalised by the amount ACE2 expression. Is the expression of receptor homogenous in the experiments conducted in Figure 2? Clarify in the methods if the expression of receptor has been quantified and somehow used to correct the intensity values of GFP used to determine infection.

      As also explained for Q1, the system in Fig. 2 contains diverse viral receptors and glycoproteins, and the receptor-based normalization used in the experiment in Fig. 1 cannot be applied. Instead, we applied Pearson correlation and TOS analysis. In the calculation of Pearson correlation, intensities are normalized. TOS analysis allows the analysis of colocalization between the groups with the highest fluorescence intensity. Therefore, in both cases of variation in overall infection rate and variation in the distribution of infected populations, samples with large variations can be reasonably compared by Pearson correlation and TOS analysis, respectively. We extend the discussion on statistics and revise the manuscript as follows.

      (page 8-9)

      Pearson correlation is effective for comparing samples with varying scales of data because it normalizes the data values by the mean and variance. However, as observed in our experiments, this may not be the case when the distribution of data within a sample varies between samples. In addition, as has already been reported, the distribution of infected cells often deviates significantly from the normal distribution of data that is the premise of Pearson correlation (Russell et al., 2018) (Figure 2B). To further analyze data in such nonlinear situations, we applied the threshold overlap score (TOS) analysis (Figure 2G-H, Supplementary Figure 2E). This is one statistical method for analyzing nonlinear correlations, and is specialized for colocalization analysis in dual color images (Sheng et al., 2016). TOS analysis involves segmentation of the data based on signal intensity, as in other nonlinear statistics (Reshef et al., 2011). The computed TOS matrix indicates whether the number of objects classified in each region is higher or lower than expected for uniformly distributed data, which reflects co-localization or anti-localization in dual-color imaging data. For example, calculated TOS matrices show strong anti-localization for infection and glycosylation when both signals are high (Figure 2GH). This confirms that high infection is very unlikely to occur in cells that express high levels of glycans. The TOS analysis also yielded better anti-localization scores for some of the individual membrane proteins, especially those that are heterogeneously distributed across cells (Figure 2H). This suggests that TOS analysis can highlight the inhibitory function of molecules that are sparsely expressed among cells, reaffirming that high expression of a single type of glycoprotein can create an infection-protective surface in a single cell and that such infection inhibition is not protein-specific. 
      In contrast, for more uniformly distributed proteins such as the viral receptor ACE2, TOS analysis and Pearson correlation showed similar trends, although the two are mathematically different (Figure 2D, 2H). Because glycoprotein expression levels and virus-derived GFP levels were treated symmetrically in these statistical calculations, the same logic can be applied when considering the heterogeneity of infection levels among cells. Therefore, it is expected that TOS analysis can reasonably compare samples with different virus infection level distributions by focusing on cells with high infection levels in all samples.
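      The statement that Pearson correlation normalizes by mean and variance can be illustrated with a short sketch. The intensity values below are invented for illustration and are not data from the study; TOS analysis is not implemented here.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Centering removes the mean; dividing by the spreads removes the scale.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

# Hypothetical per-cell intensities: glycan signal and virus-derived GFP.
glycan = [0.1, 0.4, 0.5, 0.9]
gfp = [0.8, 0.5, 0.6, 0.1]

r_raw = pearson(glycan, gfp)
# A sample with 10x brighter GFP and a constant offset gives the same coefficient.
r_rescaled = pearson(glycan, [10.0 * v + 3.0 for v in gfp])
```

      Because the coefficient is unchanged by a positive rescaling or an offset, samples with different overall infection levels can be compared; it does not, however, correct for differences in the shape of the distribution, which is the limitation the paragraph above addresses with TOS.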

      (17) Can you provide additional details about the method of thresholding to eliminate "background" localisations in STORM?

      Method section was corrected as follows: 

      (page 59)

      …Viral protein spots not close to cell membranes were eliminated by thresholding with nearby spot density for cell protein. Specifically, the entire image was pixelated with a 0.5µm square box and all viral protein signals within the box that had no membrane protein signals were removed. Also, viral protein spots only sparsely located were eliminated by thresholding with nearby spot density for viral protein. This thresholding process removed any detected viral protein spot that did not have more than 100 other viral protein spots within 1µm.
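      The two thresholding steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: coordinates are taken to be in nanometers, the function names are hypothetical, the neighbor count is applied to the spots that survive the first step (the text does not specify the order), and a brute-force neighbor search is used for clarity rather than speed.

```python
import math

def box_index(x, y, box_nm):
    """Index of the square box that contains the point (x, y)."""
    return (math.floor(x / box_nm), math.floor(y / box_nm))

def filter_viral_spots(viral, membrane, box_nm=500.0,
                       radius_nm=1000.0, min_neighbors=100):
    """Return the viral spots that pass both filters (hypothetical helper)."""
    # Step 1: drop viral spots in boxes that contain no membrane protein spot.
    occupied = {box_index(x, y, box_nm) for x, y in membrane}
    near_membrane = [p for p in viral if box_index(p[0], p[1], box_nm) in occupied]

    # Step 2: keep a spot only if it has more than `min_neighbors`
    # other viral spots within `radius_nm`.
    r2 = radius_nm ** 2
    kept = []
    for i, (x, y) in enumerate(near_membrane):
        neighbors = sum(
            1 for j, (u, v) in enumerate(near_membrane)
            if j != i and (x - u) ** 2 + (y - v) ** 2 <= r2
        )
        if neighbors > min_neighbors:
            kept.append((x, y))
    return kept
```

      For real localization tables with many thousands of spots, a spatial index such as a k-d tree would replace the quadratic loop; the filtering logic would stay the same.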

      (18) The article says "It was shown that the number of bound lectins correlated with the amount of glycans, not with number of proteins (Figure 1E)". Figure 1E correlates experimental PNA/mol with predicted glycosylation sites, not with the number of expressed proteins. Correct sentence with the right Figure reference.

      As you pointed out, the meaning of this sentence was not clear. We have amended it as follows to clarify our intention:

      (page 8)

      Since a wide range of glycoproteins inhibit viral infection, it is possible that all types of glycoproteins have an additive effect for this function. ……. In this cell line, this inverse correlation was most pronounced when quantifying N-acetylneuraminic acid (Neu5AC, recognized by lectins SSA and MAL) compared to the various types of glycans, while some other glycans also showed weak correlations (Supplementary Figure 2C). These results showed that the amount of virus infection in cells was anticorrelated with the amount of total glycans on the cell surface. As the amount of glycans is determined by the total population of the glycocalyx, the infection-inhibitory effect can be additive across glycoprotein populations, as we hypothesized.

      If the inhibitory effect is nonspecific and additive, the contribution of each protein is likely to be less significant. To confirm this, we also measured the correlation between the density of each glycoprotein and viral infection. CD44, which was shown to…….. Our results demonstrate that total glycan content is a better indicator than individual glycoprotein expression for assessing the infection-inhibitory effect generated by the cell membrane glycocalyx. These results are consistent with our hypothesis regarding the additive nature of the nonspecific inhibitory effects of each glycoprotein.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review): 

      Weaknesses: 

      (1) The authors claim that choroidal neovascular tuft phenotypes are similar in TgfbrR1 KO and TgfbrR2 KO mice. However, the phenotypes look more severe in the TgfbrR1 KO rather than TgfbrR2 KO mice. Can the authors show a quantitative comparison of the number of choroidal neovascular tufts per whole eye cross-section in both genotypes? 

      Thank you for asking about this.  Each VE-cad-CreER;TGFBR1 CKO/- and VE-cad-CreER;TGFBR2 CKO/- retina exhibits multiple zones of choroidal neovascularization.  The examples in Figure 1 and Figure 1 – Figure supplements 1 and 2 are mostly from retinas with loss of TGFBR1, but we could have chosen similar examples from retinas with loss of TGFBR2.  The quantification in the original version of Figure 1 – Figure supplement 1 panel C had a labeling error.  It actually showed the quantification of choroidal neovascularization (CNV) in the sum of both VE-cad-CreER;TGFBR1 CKO/- and VE-cad-CreER;TGFBR2 CKO/- retinas, not only in VE-cad-CreER;TGFBR1 CKO/- retinas as originally labeled.  The point that it made is that CNV is seen with loss of TGF-beta signaling but not in control retinas or retinas with loss of Norrin signaling.  We have now updated that plot by separating the data points for VE-cad-CreER;TGFBR1 CKO/- and VE-cad-CreER;TGFBR2 CKO/- retinas, so that they can be compared to each other.   The result shows ~2.5-fold more CNV in VE-cad-CreER;TGFBR2 CKO/- retinas compared to VE-cad-CreER;TGFBR1 CKO/-.  We think it likely that a more extensive sampling would show little or no difference between these two genotypes – but the data is what it is. This is now described in the Results section.

      We have also added a panel D to Figure 1- Figure supplement 1, which shows a retina flatmount analysis of CNV.  This is done by mounting the retina with the photoreceptor side up so that the outer retina can be optimally imaged. 

      (2) In the analysis of Sulfo-NHS-Biotin leakage in the retina to assess blood-retina barrier maturation. The authors claim that there is increased vascular leakage in the TgfbR1 KO mice. However, it does not seem like Sulfo-NHS-biotin is leaking outside the vessels. Therefore, it cannot be increased vascular permeability. Can the authors provide a detailed quantification of the leakage phenotype? 

      Thank you for raising this point.  Your comment prompted us to look at this question in greater depth with more experiments.  We have expanded Figure 2 to show and quantify a comparison between control (i.e. phenotypically WT), NdpKO, and TGFBR1 endothelial KO and we have expanded the associated part of the Results section (Figure 2C and D).  In a nutshell, control retinas show little Sulfo-NHS-biotin accumulation in or around the vasculature or in the parenchyma; NdpKO retinas show Sulfo-NHS-biotin accumulation in the vasculature and in the parenchyma (i.e., the area between the vessels); and VEcadCreER;Tgfbr1CKO/- retinas show Sulfo-NHS-biotin accumulation in the vascular tufts with minimal accumulation in the non-tuft vasculature and minimal leakage into the parenchyma.   The conclusion is that the bulk of the retinal vasculature in TGFBR1 endothelial KO mice is minimally or not at all leaky – very different from the situation with loss of Norrin/Frizzled4 signaling.

      (3) The immune cell phenotyping by snRNAseq is premature, as the number of cells is very small. The authors should sort for CD45+ cells and perform single-cell RNA sequencing. 

      Thank you for raising this point.  For the revised manuscript, we have performed additional snRNAseq analyses using the same tissue processing protocol as for our original snRNAseq data.  We have opted to homogenize the tissue and prepare nuclei (our original method) rather than dissociate the tissue and sort for CD45+ cells by FACS because the nuclear isolation approach is unbiased – we assume that nuclei from all cell types are present after tissue homogenization.  By contrast, we cannot be certain that CD45 FACS will capture the full range of immune cells since some cells may not express CD45, may express CD45 at low level, or may be tightly adherent to other cells, such as vascular endothelial cells.  Additionally, by following the original protocol, we can combine the original snRNAseq dataset and the new snRNAseq dataset.  In the revised manuscript we present the snRNAseq data from the combination of the original and the more recent snRNAseq datasets (revised Figure 4; N=628 immune cell nuclei).  The new analysis comes to the same conclusions as the original analysis: the immune cell infiltrate in the mutant retinas is composed of a wide variety of immune cells.

      (4) The analysis of BBB leakage phenotype in TgfbR1 KO mice needs to be more detailed and include tracers as well as serum IgG leakage. 

      As described in our response to query 2, we have conducted additional experiments to look at vascular leakage in control, VE-cad-CreER;TGFBR1 CKO/-, and NdpKO retinas.  We have also looked at Sulfo-NHS-biotin leakage in the VE-cad-CreER;TGFBR1 CKO/- brain, and it is indistinguishable from WT controls.  Since Sulfo-NHS-biotin is a low MW tracer (<1,000 Da), this implies that loss of TGF-beta signaling does not increase non-specific diffusion of either low or high MW molecules.  Therefore, the elevated levels of IgG in the brain parenchyma in young VE-cad-CreER;TGFBR1 CKO/- mice (Figure 8A) likely represent specific transport of IgG across the BBB.  Such transport is known to occur via Fc receptors expressed on vascular endothelial cells, although it is normally greater in the brain-to-blood direction than in the blood-to-brain direction.  For example, see Lafrance-Vanasse et al (2025) Leveraging neonatal Fc receptor (FcRn) to enhance antibody transport across the blood brain barrier.  Nat Commun. 16:4143.  This is now described in greater detail in the Results section.

      (5) A previous study (Zarkada et al., 2021, Developmental Cell) showed that EC-deletion of Alk5 affects the D tip cells. The phenotypes of those mice look very similar to those shown for TgfbrR1 KO mice. Are D-tip cells lost in these mutants by snRNAseq? 

      Please note: Alk5 is another name for TGFBR1.  This is noted in the second sentence of paragraph 4 of the Introduction.  The reviewer is correct: there are a lot of similarities because these are exactly the same KO mice.  Also, Zarkada and we used the same VEcadCreER to recombine the CKO allele.  The proposed snRNAseq analysis would serve as an independent check on the diving (D) tip vs stalk cell analyses published in Zarkada et al (2021) Specialized endothelial tip cells guide neuroretina vascularization and blood-retina-barrier formation. Dev Cell 56:2237-2251.  We have not gone in this direction because the question of tip vs. stalk cells and of subtypes of tip cells in WT vs. mutant retinas is beyond our focus on choroidal neovascularization and the role of immune cells and vascular inflammation.  The proposed snRNAseq analysis would also require a major effort since tip cells are rare and must be harvested from large numbers of early postnatal retinas followed by FACS enrichment for vascular endothelial cells.  Finally, we have no reason to doubt the results of Zarkada et al.

      Reviewer #2 (Public review): 

      Summary:

      The authors meticulously characterized EC-specific Tgfbr1, Tgfbr2, or double knockout in the retina, demonstrating through convincing immunostaining data that loss of TGF-β signaling disrupts retinal angiogenesis and choroidal neovascularization. Compared to other genetic models (Fzd4 KO, Ndp KO, VEGF KO), the Tgfbr1/2 KO retina exhibits the most severe immune cell infiltration. The authors proposed that TGF-β signaling loss triggers vascular inflammation, attracting immune cells - a phenotype specific to CNS vasculature, as non-CNS organs remain unaffected. 

      Strengths: 

      The immunostaining results presented are clear and robust. The authors performed well-controlled analyses against relevant mouse models. snRNA-seq corroborates immune cell leakage in the retina and vascular inflammation in the brain. 

      Weaknesses: 

      The causal link between TGF-β loss, vascular inflammation, and immune infiltration remains unresolved. The authors' model posits that EC-specific TGF-β loss directly causes inflammation, which recruits immune cells. However, an alternative explanation is plausible: Tgfbr1/2 KO-induced developmental defects (e.g., leaky vessels) permit immune extravasation, subsequently triggering inflammation. The observations that vein-specific upregulation of ICAM1 staining and the lack of immune infiltration phenotypes in the non-CNS tissues support the alternative model. Late-stage induction of Tgfbr1/2 KO (avoiding developmental confounders) could clarify TGF-β's role in retinal angiogenesis versus anti-inflammation. 

      Thank you for raising this point.  Your comment prompted us to look at this question in greater depth with more experiments.  We have expanded Figure 2 to show and quantify a comparison between control (i.e. phenotypically WT), NdpKO, and TGFBR1 endothelial KO and we have expanded the associated part of the Results section (Figure 2C and D).  In a nutshell, control retinas show little Sulfo-NHS-biotin accumulation in or around the vasculature or in the parenchyma; NdpKO retinas show Sulfo-NHS-biotin accumulation in the vasculature and in the parenchyma (i.e., the area between the vessels); and VEcadCreER;Tgfbr1CKO/- retinas show Sulfo-NHS-biotin accumulation in the vascular tufts with minimal accumulation in the non-tuft vasculature and minimal leakage into the parenchyma.   The conclusion is that the bulk of the retinal vasculature in TGFBR1 endothelial KO mice is minimally or not at all leaky – very different from the situation with loss of Norrin/Frizzled4 signaling.

      In the revised manuscript, we have expanded the Discussion section to address the two alternative hypotheses raised by the reviewer.  Here are the relevant data in a nutshell: (1) vascular leakage into the parenchyma, as measured with sulfo-NHS-biotin, in TGFBR1 endothelial CKO retinas is far less than in NdpKO retinas, where nearly all ECs convert to a fenestration+ (PLVAP+) phenotype and there is leakage of sulfo-NHS-biotin, (2) ICAM1 in ECs in TGFBR1 endothelial CKO retinas increases several-fold more than in NdpKO or Frizzled4KO retinas, (3) TGFBR1 endothelial CKO retinas have more infiltrating immune cells than NdpKO or Frizzled4KO retinas, and (4) in TGFBR1 endothelial CKO retinas large numbers of immune cells are observed within and adjacent to blood vessels.  We think that the simplest explanation for these data is that loss of TGF-beta signaling in ECs causes an endothelial inflammatory state with enhanced immune cell extravasation.  That said, the case for this model is not water-tight, and there could be less direct mechanisms at play.  In particular, this model does not explain why the inflammatory phenotype is limited to CNS (and especially retinal) vasculature.

      Regarding the last sentence of the reviewer’s comment (“Late stage induction…”), we have tried activating CreER recombination at different ages and we observe a large reduction in the inflammatory phenotype when recombination is initiated after vascular development is complete.   This observation suggests that the vascular developmental/anatomic defect – and perhaps the resulting retinal hypoxia response – is required for the inflammatory phenotype.  In the revised manuscript we have expanded the Results and Discussion sections to describe this observation.

      Reviewer #1 (Recommendations for the authors): 

      Suggestions for experiments: 

      (1) The authors need to show a quantitative comparison of the number of choroidal neovascular tufts per whole eye cross-section in both genotypes (TgfbR1 and TgfbR2 KO mice).

      Thank you for raising this point.  The quantification in the original version of Figure 1 – Figure supplement 1 panel C was mislabeled.  It quantifies choroidal neovascularization (CNV) in both VE-cad-CreER;TGFBR1 CKO/- and VE-cad-CreER;TGFBR2 CKO/- retinas, not VE-cad-CreER;TGFBR1 CKO/- retinas only as originally labeled.  The point it makes is that CNV is seen with loss of TGF-beta signaling but not in control retinas or retinas with loss of Norrin signaling.  We have now corrected that plot by separating the data points for VE-cad-CreER;TGFBR1 CKO/- and VE-cad-CreER;TGFBR2 CKO/- retinas, so that they can be compared to each other.   The result shows ~2.5-fold more CNV in VE-cad-CreER;TGFBR2 CKO/- retinas compared to VE-cad-CreER;TGFBR1 CKO/-.  This is now described in the Results section.

      (2) In the analysis of Sulfo-NHS-Biotin leakage in the retina to assess blood-retina barrier maturation. The authors should provide a detailed quantification of the leakage phenotype outside the vessels into the CNS parenchyma, both in the retina and brain, in TgfbR1 KO mice. 

      Thank you for raising this point.  There is no detectable Sulfo-NHS-biotin leakage into the brain parenchyma in VE-cad-CreER;TGFBR1 CKO/- mice.  We have expanded Figure 2 to show and quantify the data for retinal vascular leakage (Figure 2C and D).  The data show that in VE-cad-CreER;TGFBR1 CKO/- mice there is accumulation of Sulfo-NHS-biotin in the vascular tufts but minimal accumulation elsewhere in the retinal vasculature and minimal leakage of Sulfo-NHS-biotin into the retinal parenchyma.

      (3) The immune cell phenotyping by snRNAseq is premature, as the number of cells is very small. The authors should sort for CD45+ cells and perform single-cell RNA sequencing to ascertain these preliminary data. 

      Thank you for raising this point.  We have performed additional snRNAseq analyses using the same tissue processing protocol as for our original snRNAseq data to increase the numbers of cells.  We have opted to homogenize the tissue and prepare nuclei (our original method) rather than dissociate the cells and sort for CD45+ cells by FACS because the nuclear isolation approach is unbiased – we assume that nuclei from all cell types are present.  By contrast, we cannot be certain that CD45 FACS will capture the full range of immune cells, since some cells may not express CD45, may express CD45 at low level, or may be tightly adherent to other cells, such as vascular endothelial cells.  Additionally, by following the original protocol, we can combine the original snRNAseq dataset and the new snRNAseq dataset.  In the revised manuscript we present the snRNAseq data from the combination of the original and the more recent snRNAseq datasets (revised Figure 4; N=628 immune cell nuclei).  The new analysis comes to the same conclusion as in the original submission, namely that the immune cell infiltrate in the mutant retinas is composed of a wide variety of immune cells.  The Results section has been expanded to describe these new data and analyses.

      (4) The analysis of BBB leakage phenotype in TgfbR1 KO mice needs to be more detailed and include tracers as well as serum IgG leakage. 

      Sulfo-NHS-biotin leakage in the VE-cad-CreER;TGFBR1 CKO/- brain is minimal, and it is indistinguishable from WT controls.  Since Sulfo-NHS-biotin is a low MW tracer (<1,000 Da), this implies that loss of TGF-beta signaling does not increase non-specific diffusion of either low or high MW molecules.  Therefore, the elevated levels of IgG in the brain parenchyma in young VE-cad-CreER;TGFBR1 CKO/- mice (Figure 8A) likely represent specific transport of IgG across the BBB.  Such transport is known to occur via Fc receptors expressed on vascular endothelial cells, although it is normally greater in the brain-to-blood direction than in the blood-to-brain direction.  For example, see Lafrance-Vanasse et al (2025) Leveraging neonatal Fc receptor (FcRn) to enhance antibody transport across the blood brain barrier.  Nat Commun. 16:4143.  This is now described in greater detail in the Results section.

      (5) The authors should perform a more detailed RNAseq analysis of tip and stack (stalk) cells in TgfbrR1 KO mice to determine whether D tip cells are lost in these mutants by snRNAseq. 

      The proposed snRNAseq analysis would serve as an independent check on the diving (D) tip vs stalk cell analyses published by Zarkada et al, who analyzed the same VE-cad-CreER;TGFBR1 CKO/- mutant mice, although they refer to the TGFBR1 gene by its alternate name ALK5 [Zarkada et al (2021) Specialized endothelial tip cells guide neuroretina vascularization and blood-retina-barrier formation. Dev Cell 56:2237-2251].  We have not gone in this direction because the question of tip vs. stalk cells and of subtypes of tip cells in WT vs. mutant retinas is beyond our focus on choroidal neovascularization and the role of immune cells and vascular inflammation.  The proposed snRNAseq analysis would also require a major effort since tip cells are rare and must be harvested from large numbers of early postnatal retinas followed by FACS enrichment for vascular endothelial cells.

      Suggestions for improving the manuscript:  

      (6) The statement that ECs acquire properties of immune cells (Page 2, Line 90) is incorrect. Endothelial cells may acquire characteristics of antigen presenting cells. 

      Thank you for that correction.  Based on the review from Amersfoort et al (2022) (Amersfoort J, Eelen G, Carmeliet P. (2022) Immunomodulation by endothelial cells - partnering up with the immune system? Nat Rev Immunol 22:576-588) and the articles cited in it, we have changed the sentence to “Although vascular endothelial cells (ECs) are not generally considered to be part of the immune system, in some locations and under some conditions they acquire properties characteristic of immune cells, including secretion of cytokines, surface display of co-stimulatory or co-inhibitory receptors, and antigen presentation in association with MHC class II proteins (Pober and Sessa, 2014; Amersfoort et al., 2022).”  

      (7) The statement in Page 3, Line 100-101 [In CNS ECs, quiescence is maintained in part by the actions of astrocyte-derived Sonic Hedgehog, with the result that few immune cells other than resident microglia are found within the CNS (Alvarez et al., 2011).] is incomplete. Wnt signaling also suppresses the expression of leukocyte adhesion molecules from endothelial cells and therefore helps with immune cell quiescence. 

      Thank you for raising that point.  We have expanded that sentence to include Wnt signaling in CNS endothelial cells, as described in the following reference: Lengfeld JE, Lutz SE, Smith JR, Diaconu C, Scott C, Kofman SB, Choi C, Walsh CM, Raine CS, Agalliu I, Agalliu D. (2017) Endothelial Wnt/beta-catenin signaling reduces immune cell infiltration in multiple sclerosis. Proc Natl Acad Sci USA 114:E1168-E1177.

      (8) It may be beneficial for the reader to separate the results of the vascular phenotypes related to choroidal neovascularization compared to retinal vascular development. 

      Thank you for this suggestion.  The two topics are partly overlapping: choroidal neovascularization is described in Figure 1, and retinal development is described in Figures 1 and 2.  The challenge is that some of the same images illustrate both phenotypes, as in Figure 1, so the topics cannot be easily separated.

      (9) In addition to comparing the phenotypes in Tgfb signaling mutant mice with Wnt signaling and VEGF-A signaling mutants, the authors should compare and contrast their data with those found in Alk5 KO mice, as there are a lot of similarities. 

      The reviewer has alerted us to a nomenclature challenge which we will try to resolve in the introduction: Alk5 is just another name for TGFBR1.  The reviewer is correct: there are a lot of similarities between the present study and that of Zarkada et al (2021) because both use the same TGFBR1(=Alk5) CKO mice.

      Reviewer #2 (Recommendations for the authors): 

      Figure 2 

      For 2B, the authors should clarify whether the two regions shown in the Tgfbr1 KO retina (P14) represent central vs. peripheral areas, as phenotype severity varies. 

      For 2C, does the uneven biotin accumulation reflect developmental gradients (e.g., central-peripheral maturation timing)? 

      Thank you for raising these points.  Regarding Figure 2B, these images are all from the mid-peripheral retina, where the phenotype is moderately severe.  This is now noted in the figure legend.

      Regarding Figure 2C, the reviewer is correct that the pattern of Sulfo-NHS-biotin is uneven in VEcadCreER;Tgfbr1CKO/- retinas – it accumulates only in the tufts.  We have expanded Figure 2C to show a comparison between control (i.e., phenotypically WT), NdpKO, and TGFBR1 endothelial KO retinas, and we have expanded the associated part of the Results section.  In a nutshell, control retinas show little Sulfo-NHS-biotin accumulation in the vasculature or in the parenchyma; NdpKO retinas show Sulfo-NHS-biotin accumulation in the vasculature and in the parenchyma (i.e., the area between the vessels); and VEcadCreER;Tgfbr1CKO/- retinas show Sulfo-NHS-biotin accumulation in the vascular tufts with minimal accumulation in the non-tuft vasculature and minimal leakage into the parenchyma.  The conclusion is that the bulk of the retinal vasculature in TGFBR1 endothelial KO mice is not leaky – very different from the situation with loss of Norrin/Frizzled4 signaling.

      Figure 6 

      The claim that PECAM1+ rings on veins reflect EC-immune cell binding is uncertain, as PECAM1 is also known to be expressed by immune cells. The complete correlation of PECAM1 and CD45 staining signals suggests that a subset of immune cells upregulates PECAM1. The VEcadCreER;Tgfbr1 flox/-; SUN1:GFP reporter would be helpful to delineate EC-immune cell proximity. Super-resolution imaging with Z-stacks could also resolve spatial relationships (luminal vs. abluminal immune cell adhesion).

      Thank you for this comment.  The reviewer is correct that, at the resolution of these images, we cannot determine whether the PECAM1 immunostaining signal is derived from ECs, from leukocytes, or from both.  This is now stated in the Results section.  The PECAM1-rich endothelial ring structure associated with leukocyte extravasation has been characterized in various publications, for example in (1) Carman CV, Springer TA. (2004) A transmigratory cup in leukocyte diapedesis both through individual vascular endothelial cells and between them. J Cell Biol 167:377-388 and (2) Mamdouh Z, Mikhailov A, Muller WA. (2009) Transcellular migration of leukocytes is mediated by the endothelial lateral border recycling compartment. J Exp Med 206:2795-2808.  The ring structures visualized in Figure 6D by PECAM1 immunostaining conform to the ring structures described in these and other papers.  In showing these structures, our point is simply that they likely represent sites of leukocyte extravasation.  This is now clarified in the text.  We have also added some additional references on leukocyte extravasation and the ring structures.

      Figure 7 

      A time-course analysis of ICAM1 would strengthen the mechanistic model. Does ICAM1 upregulation precede immune infiltration (supporting inflammation as the primary defect)? Given that immune cells appear by P14 (per snRNA-seq), is ICAM1 elevated earlier? 

      This is an interesting idea, but based on what is known about leukocyte adhesion and extravasation we predict that there will not be a clean temporal separation between ICAM1 induction and leukocyte adhesion/infiltration.  That is, if the pro-inflammatory state causes an increase in the number of leukocytes, then as ICAM1 levels increase, leukocyte adhesion would also increase.  Similarly, if the presence of leukocytes increases the pro-inflammatory state, then as the number of leukocytes increases, the levels of ICAM1 would be predicted to increase.  Thus, we think that a time-course analysis is unlikely to provide a definitive conclusion.

      Figure 8-SF1 

      In brain slices, a transient pan-IgG accumulation suggests a self-resolving defect in the BBB. However, this BBB impairment appears to be spatiotemporally distinct from ICAM1 upregulation. ICAM1 staining is restricted to the lesion site, aligning with immune cell-driven inflammation. 

      Thank you for raising these points.  The reviewer is correct that these observations don’t fit together in a clear way.  There does not appear to be a general increase in brain vascular permeability in VE-cad-CreER;TGFBR1 CKO/- mice, as shown by sulfo-NHS-biotin.  However, there is a large and transient increase in IgG in the brain parenchyma, suggestive of a general vascular alteration, and – as the reviewer correctly notes – it is not accompanied by a generalized increase in ICAM1 vascular immunostaining.  At this point, we don’t have any real insight into the mechanistic basis of the transient IgG increase.

      Thank you for handling this manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review): 

      Most human traits and common diseases are polygenic, influenced by numerous genetic variants across the genome. These variants are typically non-coding and likely function through gene regulatory mechanisms. To identify their target genes, one strategy is to examine if these variants are also found among genetic variants with detectable effects on gene expression levels, known as eQTLs. Surprisingly, this strategy has had limited success, and most disease variants are not identified as eQTLs, a puzzling observation recently referred to as "missing regulation". 

      In this work, Jeong and Bulyk aimed to better understand the reasons behind the gap between disease-associated variants and eQTLs. They focused on immune-related diseases and used lymphoblastoid cell lines (LCLs) as a surrogate for the cell types mediating the genetic effects. Their main hypothesis is that some variants without eQTL evidence might be identifiable by studying other molecular intermediates along the path from genotype to phenotype. They specifically focused on variants that affect chromatin accessibility, known as caQTLs, as a potential marker of regulatory activity. 

      The authors present data analyses supporting this hypothesis: several disease-associated variants are explained by caQTLs but not eQTLs. They further show that although caQTLs and eQTLs likely have largely overlapping underlying genetic variants, some variants are discovered only through one of these mapping strategies. Notably, they demonstrate that eQTL mapping is underpowered for gene-distal variants with small effects on gene expression, whereas caQTL mapping is not dependent on the distance to genes. Additionally, for some disease variants with caQTLs but no corresponding eQTLs in LCLs, they identify eQTLs in other cell types. 

      Altogether, Jeong and Bulyk convincingly demonstrate that for immune-related diseases, discovering the missing disease-eQTLs requires both larger eQTL studies and a broader range of cell types in expression assays. It remains to be seen what fractions of the missing disease-eQTLs will be discovered with either strategy and whether these results can be extended to other diseases or traits.

      We thank the reviewer for their accurate summary of our study and positive review of our findings for immune-related diseases.

      It should be noted that the problem of "missing regulation" has been investigated and discussed in several recent papers, notably Umans et al., Trends in Genetics 2021; Connally et al., eLife 2022; Mostafavi et al., Nat. Genet. 2023. The results reported by Jeong and Bulyk are not unexpected in light of this previous work (all of which they cite), but they add valuable empirical evidence that mostly aligns with the model and discussions presented in Mostafavi et al. 

      We thank the reviewer for their positive review of our results and manuscript. As Reviewer #1 noted, whether our and others' observation extends to other diseases or traits is an open question. For instance, Figure 2b in Mostafavi et al., Nat. Genet. (2023) demonstrated that there was a spectrum of depletion of eQTLs and enrichment of GWAS signals in constrained genes across various tissues and traits, respectively. Therefore, gene expression constraint may play a larger or smaller role in different diseases or traits. That immune cell types and cell states are extremely diverse (Schmiedel et al., Cell (2018) and Calderon et al., Nat. Genet. (2019), just to name a few) likely adds to the complexity of gene regulation that contributes to immune-mediated disease.

      Reviewer #2 (Public Review): 

      Summary: 

      eQTLs have emerged as a method for interpreting GWAS signals. However, some GWAS signals are difficult to explain with eQTLs. In this paper, the authors demonstrated that caQTLs can explain these signals. This suggests that for GWAS signals to actually lead to disease phenotypes, they must be accessible in the chromatin. This implies that for GWAS signals to translate into disease phenotypes, they need to be accessible within the chromatin. 

      However, fundamentally, caQTLs, like GWAS, have the limitation of not being able to determine which genes mediate the influence on disease phenotypes. This limitation is consistent with the constraints observed in this study. 

      We thank the reviewer for their accurate summary of our results.

      (1) For reproducibility, details are necessary in the method section.

      Details about adding YRI samples in ATAC-seq: For example, how many samples are there, and what is used among public data? There is LCL-derived iPSC and differentiated iPSC (cardiomyocytes) data, not LCL itself. How does this differ from LCL, and what is the rationale for including this data despite the differences?

      Banovich et al., Genome Research (2018) (PMID: 29208628), who generated data using LCL-derived iPSCs and differentiated iPSCs (cardiomyocytes), also generated ATAC-seq data from 20 YRI LCL samples. We analyzed those data to identify open chromatin regions (i.e., ATAC-seq peaks) in LCLs and merged the regions with open chromatin regions identified with 100 GBR LCL samples from two studies by Kumasaka et al. (Nature Genetics (2016) PMID: 26656845 and Nature Genetics (2019) PMID: 30478436). However, we restricted the caQTL analysis to only the 100 GBR samples because of possible ancestry effects and batch effects. We attempted caQTL analysis with the 20 YRI samples as well, but the result was noisy, likely due to the smaller sample size and lower read depth of the ATAC-seq data.

      caQTL is described as having better power than eQTL despite having fewer samples. How does the number of ATAC peaks used in caQTL compare to the number of gene expressions used in eQTL?

      The number of ATAC peaks used in caQTL (99,320) is ~6.7 times greater than the number of genes (14,872) used in the eQTL analysis. Therefore, there is a higher chance of detecting a significant caQTL signal and a significant colocalization signal than there is for eQTLs. However, we reasoned that since distal eQTLs are more easily detected as caQTLs and since increasing the sample size of eQTLs through meta-analysis uncovered additional eQTL colocalization at loci with caQTL colocalization only, colocalized caQTLs are likely capturing disease-relevant regulatory effects.
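      As a quick check of the arithmetic above, the following minimal sketch recomputes the ratio of tested features from the two counts given in the response. The function name `feature_ratio` is hypothetical and used only for illustration.

```python
def feature_ratio(n_peaks: int, n_genes: int) -> float:
    """Ratio of ATAC peaks tested (caQTL) to genes tested (eQTL)."""
    return n_peaks / n_genes


# Counts quoted in the response: 99,320 ATAC peaks and 14,872 genes.
ratio = feature_ratio(99_320, 14_872)
print(f"caQTL features per eQTL feature: {ratio:.1f}")  # prints 6.7
```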

      Details about RNA expression data: In the method section, it states that raw data (ERP001942) was accessed, and in data availability, processed data (E-GEUV-1) was used. These need to be consistent.

      Thank you for pointing this out. We used the processed data from Expression Atlas (https://www.ebi.ac.uk/gxa/experiments/E-GEUV-1/Results), and that's what we meant by "We downloaded RNA expression level data of the LCL samples from the Expression Atlas." We have revised the “RNA expression data preparation” section in our manuscript to make the text clearer.

      How many samples were used (the text states 373, but how was it reduced from the original 465, and the total genotype is said to be 493 samples while ATAC has n=100; what are the 20 others?), and it mentions European samples, but does this exclude YRI?

      We thank the reviewer for pointing out these points of confusion. Our reported count of 493 samples included YRI samples with RNA-seq data or ATAC-seq data that we ultimately did not use for QTL analyses. There were 373 European samples with RNA-seq data that we used for eQTL analysis, and 100 GBR samples (including some that overlap with the 373 European samples) that we used for caQTL analysis. We have revised the text to clarify these points.

      (2) Experimental results determining which TFs might bind to the representative signals of caQTL are required.

      We agree that caQTL colocalization is just the start of elucidating the regulatory mechanism of a GWAS locus. Determining which TFs are bound and which TFs' binding is altered would be necessary to describe the causal regulatory mechanism. For this, we utilized the Cistrome database to search for TFs whose binding overlaps the colocalized caQTL peaks. We present the results of this analysis in Supplementary Table 3 and Supplementary Figure 4, both of which we have added in our revised manuscript. Overall, protein factors associated with active transcription, such as POLR2A, and several immune cell TFs, including RUNX3, SPI1, and RELA, were frequently detected in those peaks. Detecting these factors in most peaks supports the likelihood that the colocalized caQTL peaks are active cis-regulatory elements. These results are consistent with our observation of enriched caQTL-mediated heritability in regions with active histone marks (Figure 1).

      (3) It is stated that caQTL is less tissue-specific compared to eQTL; would caQTL performed with ATAC-seq results from different cell types, yield similar results?

      We thank the reviewer for the question. Calderon et al. (PMID: 31570894) observed that "most effects on allelic imbalance (of ATAC-seq) were shared regardless of lineage or condition". Yet, there were regions where a different cell type or state would show inaccessibility (Figure 4d in Calderon et al.). Thus, we expect that ATAC-seq results from different cell types (e.g., T cells, B cells, monocytes, etc.) would lead to additional caQTLs showing colocalization at cell-type-specific open chromatin. However, if a region is accessible in both cell types, a caQTL may be detected in both. Moreover, Alasoo et al., Nature Genetics (2018) (PMID: 29379200) observed that “many disease-risk variants affect chromatin structure in a broad range of cellular states, but their effects on expression are highly context specific.” In both studies, the authors investigated immune cell types, and there could be different observations in non-immune cell types and other diseases and traits.

      Reviewer #1 (Recommendations For The Authors): 

      I think it would strengthen the paper to explore gene-level differences in the discovery of caQTLs and eQTLs. For example, complex disease-relevant genes, on average, have more/longer regulatory domains (as shown by Wang and Goldstein, AJHG 2020; Mostafavi et al., Nat. Genet. 2023). Therefore, it is plausible that for such genes, caQTLs are much more easily discoverable than eQTLs due to (i) a larger mutational target size for caQTLs, and (ii) dispersion of expression heritability across multiple domains, which hampers the discovery of eQTLs but not caQTLs, which are studied independently of other domains in the region. In other words, discovered caQTLs and eQTLs likely vary in terms of their distance to genes (as the authors report), as well as their target genes.

      We thank the reviewer for the suggestion to explore gene-level differences. We expect that the tendency of complex disease-relevant genes to have, on average, more and longer regulatory domains helps explain our observations. We agree with both of your points: that many more regulatory elements are captured as accessible regions than there are expressed genes, and that genes often have multiple independent eQTLs, leading to dispersion of heritability. The gene-level trend that we described was the distance of the regulatory element from the gene. Additional analyses would be a relevant future direction.

      Also considering gene-level analysis, Mostafavi et al. show that the types of biases they report for eQTLs also apply to other molecular QTLs. It would be valuable to compare GWAS hits with versus without caQTL colocalization. Similarly, it would be insightful to compare GWAS hits with both colocalized caQTLs and eQTLs to GWAS hits with colocalized caQTLs but no eQTLs in any of the cell types. 

      We thank the reviewer for the comment. Investigating potential biases in the colocalized caQTLs would be useful, but we considered it beyond the scope of this work. In terms of biological factors, we demonstrated through mediated heritability analyses that more accessible chromatin (based on ATAC-seq read coverage) and regions with active histone marks were enriched for autoimmune disease associations (Figure 1). Furthermore, as greater distance of the regulatory variant from the transcription start site significantly reduced the cis-heritability, we would expect that distance would play a major role, similar to Mostafavi et al.’s conclusions.

      I don't think the argument for the role of natural selection contributing to the "missing regulation" is presented accurately. Specifically, large eQTLs acting on top trait-relevant genes are under stronger selection and thus, on average, segregate at lower frequencies. This makes them difficult to discover in eQTL assays. However, if not lost, they contribute as much, if not more, to trait heritability than weaker eQTLs at the same gene because their larger effects compensate for their lower frequency. At the most extreme, selection should have a "flattening" effect (e.g., see Simons et al., PLOS Biol 2018; O'Connor et al., AJHG 2019): weak and strong eQTLs at the same gene are expected to contribute equally to heritability. Therefore, the statement "Consequently, only weak eQTL variants, often in regions distal to the gene's promoter, may remain and affect traits" is not correct. If this turns out to be empirically true, other models, such as pleiotropic selection, need to explain it. 

      We thank the reviewer for the correction. We agree with the comment and have revised the sentences in the introduction accordingly.

      It is worth speculating why caQTLs may be more consistent across cell types than cis-eQTLs. Additionally, readers may infer from the paper that the focus should shift from eQTLs to caQTLs, which may not be the authors' intention. Perhaps these approaches are complementary: caQTLs can help with TSS-distal disease variants, while finding the target gene and regulatory context is more straightforward with eQTL colocalization. Addressing these points in the discussion will be helpful.

      We appreciate the reviewer's suggestion to clarify the advantages of incorporating cis-eQTLs and caQTLs. Our argument is exactly as you put it, and we added a paragraph on this in the Discussion.

      I believe the authors could do more to contextualize their findings within the existing literature on the subject, particularly Umans et al., Trends in Genetics 2021; Connally et al., eLife 2022; and Mostafavi et al., Nat. Genet. 2023. For instance, Umans et al. suggest that "if most standard eQTLs are generally benign, increasing sample size and adding more tissue types in an effort to identify even more standard eQTLs may not help us to explain many more disease risk mutations". Conversely, Mostafavi et al. argue for a multipronged approach, which appears more aligned with the authors' conclusions.

      We followed the reviewer’s suggestion to place our work in the context of existing literature on this topic. Moreover, we clarified what our recommendations for future data generation are.

      I thought Figures 1C-D were unclear. 

      We added a sentence in the figure legend describing that stronger and more significant enrichment indicates that mediated heritability is concentrated in that subset.

      Reviewer #2 (Recommendations For The Authors): 

      Complete workflow figures for caQTL calling and eQTL calling are required. 

      To improve clarity of the caQTL and eQTL calling workflow, we added Supplementary Figure 1.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      (1.1) The authors argue that low-level features in a feedback format could be decoded only from deep layers of V1 (and not superficial layers) during a perceptual categorization task. However, previous studies (Bergmann et al., 2024; Iamshchinina et al., 2021) demonstrated that low-level features in the form of feedback can be decoded from both superficial and deep layers. While this result could be due to perceptual task or highly predictable orientation feature (orientation was kept the same throughout the experimental block), an alternative explanation is a weaker representation of orientation in the feedback (even before splitting by layers there is only a trend towards significance; also Granger causality for orientation information in MEG part is lower than that for category in peripheral categorization task), because it is orthogonal to the task demand. It would be helpful if the authors added a statistical comparison of the strength of category and orientation representations in each layer and across the layers.

      We agree that the strength of feedback information is related to task demand. Specifically, we would like to highlight the relationship between task demand and feedback information in the superficial layer. Previous studies have shown that foveal feedback information is observed only when the task requires the identity information of the peripheral objects (Williams et al., 2008; Fan et al., 2016; Yu and Shim, 2016). In this study, we found that the deep layer represented both orientation and categorical feedback information, while the superficial layer only represented categorical information. This suggests that feedback information in the superficial layer may be related to (or enhanced by) the task demands. In other words, if the experimental design required participants to discriminate orientation rather than object identity, we would expect stronger orientation information in foveal V1 and significant decoding performance of orientation feedback information in the superficial layer of foveal V1. This assumption is consistent with the anatomical connections of the superficial layer, which not only receives feedback connections but also sends outputs to higher-level regions for further processing. This is also consistent with Iamshchinina et al.’s observation that, when orientation information had to be mentally rotated and reported (i.e., task-relevant), it was observed in both the superficial and deep layers of V1. Bergmann et al. observed illusory color information in the superficial layer of V1, which may reflect a combination of lateral propagation and feedback mechanisms in the superficial layer that support visual filling-in phenomena. 
      We have revised the discussion in the manuscript: In other words, if the experimental design required participants to discriminate orientation rather than object identity, we would expect stronger orientation information in foveal V1 and significant decoding performance of orientation feedback information in the superficial layer of foveal V1. Recent studies (Iamshchinina et al., 2021; Bergmann et al., 2024) have also highlighted the relationship between feedback information and neural representations in V1 superficial layer.

      To further demonstrate the laminar profiles of low- and high-order information, we have re-analyzed the data and added more fine-scale laminar profiles with statistical comparisons in the revised manuscript. The results again showed significant neural decoding performances in the deep layer of both category and orientation information, and only significant decoding performances of category information in the superficial layer.

      (1.2) The authors argue that category feedback is not driven by low-level confounding features embedded in the stimuli. They demonstrate the ability to decode orientations, particularly well represented by V1, in the absence of category discrimination. However, the orientation is not a category-discriminating feature in this task. It could be that the category-discriminating features cannot be as well decoded from V1 activity patterns as orientations. Also, there are a number of these category discriminating features and it is unclear if it is a variation in their representational strength or merely the absence of the task-driven enhancement that preempts category decoding in V1 during the foveal task. In other words, I am not sure whether, if orientation was a category-specific feature (sharpies are always horizontal and smoothies are vertical), there would still be no category decoding.

      The low-order features mentioned in the manuscript refer to visual information encoded intrinsically in V1, independent of task demands. In the foveal experiment, the task is to discriminate the color of fixation, which is unrelated to the category or orientation of the object stimuli. The results showed that only orientation information could be decoded from foveal V1. This indicates that low-order information, such as orientation, is strongly and automatically encoded in V1, even when it is irrelevant to the task. Meanwhile, category information could not be decoded, indicating that category information relies on feedback signals driven by attention to the objects or by a task involving them, both of which are absent in the fixation task. Other evidence indicates that category feedback is not driven by low-level features intrinsically encoded in V1. First, the laminar profiles of these two types of feedback information differ considerably (see response to 1.1). Second, only category feedback information was correlated with behavioral performance (MEG experiment). These findings demonstrate that category feedback information is task-driven and differs from the automatically encoded low-order information in foveal V1. The reviewer was unsure whether, “if orientation was a category-specific feature (sharpies are always horizontal and smoothies are vertical), there would still be no category decoding”. Our data showed that orientation could be automatically decoded in V1, regardless of task demand. Thus, if orientation was a category-specific feature in the foveal task (i.e., sharpies are always horizontal and smoothies are always vertical), category decoding would be successful in V1. However, in this scenario, the orientation and other shape features are not independent, which would prevent us from determining whether non-orientation shape features could be decoded in V1.

      Reviewer #2 (Public review):

      (2.1) While not necessarily a weakness, I do not fully agree with the description of the 2 kinds of feedback information as "low-order" and "high-order". I understand the motivation to do this - orientation is typically considered a low-level visual feature. But when it's the orientation of an entire object, not a single edge, orientation can only be defined after the elements of the object are grouped. Also, the discrimination between spikies and smoothies requires detecting the orientations of particular edges that form the identifying features. To my mind, it would make more sense to refer to discrimination of object orientation as "coarse" feature discrimination, and orientation of object identity as "fine" feature discrimination. Thus, the sentence on line 83, for example, would read "Interestingly, feedback with fine and coarse feature information exhibits different laminar profiles.".

      We agree that the object orientation (invariant to object category or identity) is defined on a larger spatial scale than local orientation features such as local edges; in this sense, the object orientation is a coarse feature. In contrast, the category-defining information is mainly contributed by the local shape information (i.e., little cubes vs. bumps), which is more fine-scale information. One way to look at this difference is that the object orientation information is mainly carried by low-spatial-frequency information and will survive low-pass filtering, hence “coarse”, while the object category information would largely be lost if the objects underwent low-pass spatial filtering.

      We believe the labels “low-order” and “high-order” are consistent with the typical use of these terms in the literature, referring to features intrinsically encoded in early visual cortex vs. in high-level object-sensitive cortical regions. The more important aspect of our results is their differential engagement in feedforward vs. feedback processing, with low-order features automatically represented in the early visual cortex during feedforward processing, while high-order features are represented through feedback processing. Results from the foveal fMRI experiment (Exp. 2) strongly support this assumption: when objects were presented at the fovea and the task was a fixation color task irrelevant to object information, foveal V1 could only represent orientation information, not category information. Notably, there was a dramatic difference in decoding performance in foveal V1 between Exp. 1 and Exp. 2, which ruled out the argument that both orientation and category information were driven by local edge information represented in V1.

      (2.2) Figure 2 and text on lines 185, and 186: it is difficult to interpret/understand the findings in foveal ROIs for the foveal control task without knowing how big the ROI was. Foveal regions of V1 are grossly expanded by cortical magnification, such that the central half-degree can occupy several centimeters across the cortical surface. Without information on the spatial extent of the foveal ROI compared to the object size, we can't know whether the ROI included voxels whose population receptive fields were expected to include the edges of the objects.

      The ROI of foveal V1 was defined using data from independent localizer runs. In each localizer run, flashing checkerboards of the same size as the objects in the task runs were presented at the fovea or in the periphery. The ROI of foveal V1 was identified as the voxels responsive to the foveal checkerboards. In other words, the ROI of foveal V1 included the voxels whose population receptive fields covered the entire object in the foveal visual field.

      We included a figure in the revised manuscript comparing the activation maps induced by the foveal object stimulus in the task runs with the ROI coverage defined by the localizer runs. 

      (2.3) Line 143 and ROI section of the methods: in order for the reader to understand how robust the responses and analyses are, voxel counts should be provided for the ROIs that were defined, as well as for the number (fraction) of voxels excluded due to either high beta weights or low signal intensity (lines 505-511).

      In the revised manuscript, we have included the number of voxels in each ROI and the criteria for voxel selection:

      For each ROI, the number of voxels depended on the size of the activated region, as estimated from the localizer data. The numbers are as follows: foveal V1, 2185 ± 389; peripheral V1, 1294 ± 215; LOC, 3451 ± 863; and pIPS, 5154 ± 1517. To avoid the signals of large vessels, a portion of voxels was removed based on the distribution of large vessels: foveal V1, 22.5% ± 6.6%; peripheral V1, 6.8% ± 3.9%; LOC, 16.1% ± 8.1%; and pIPS, 5.1% ± 3.2%. For the decoding analysis, the top 500 responsive voxels in each ROI were selected to balance the voxel numbers across different ROIs for training and testing the decoder.
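
      The voxel-balancing step described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `select_top_voxels` is hypothetical, and the per-voxel responsiveness measure (for example, a localizer t-value) is an assumption, since the text only states that the top 500 responsive voxels in each ROI were kept.

```python
import numpy as np


def select_top_voxels(responses, k=500):
    """Return the indices of the k most responsive voxels in one ROI.

    responses: 1-D array with one responsiveness value per voxel
               (assumed here to be a localizer statistic).
    k:         number of voxels to keep; capped at the ROI size.
    """
    responses = np.asarray(responses, dtype=float)
    k = min(k, responses.size)
    # argpartition places the k largest values in the last k slots
    # without fully sorting the array.
    top = np.argpartition(responses, -k)[-k:]
    # Sort the indices so the voxel order is stable across runs.
    return np.sort(top)
```

      Applying the same `k` to every ROI gives the decoder the same number of input features per region, which is the stated purpose of the selection.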

      (2.4) I wasn't able to find mention of how multiple-comparisons corrections were performed for either the MEG or fMRI data (except for one Holm-Bonferroni correction in Figure S1), so it's unclear whether the reported p-values are corrected.

      For the fMRI results, there is strong evidence showing that feedback information is sent to the foveal V1 during a peripheral object task (Williams et al., 2008; Fan et al., 2016; Yu and Shim, 2016). In addition, anatomical and functional evidence shows that the superficial and deep layers of V1 receive feedback information during visual processing. Therefore, in the current study, we specifically examined two types of feedback information in the superficial and deep layers of foveal V1, and did not apply multiple-comparison correction to the decoding results.

      Regarding the MEG results, since we did not have a strong prior about when feedback information would arrive in the foveal V1, a cluster-based permutation method was used to correct for multiple comparisons in each time course. Specifically, for each time point, the sign of the effect for each participant was randomly flipped 50000 times to obtain the null hypothesis distribution for each time point. Clusters were defined as continuous significant time points in the real and flipped time series, and the effects in each cluster were summed to create a cluster-based effect. The most significant cluster-based effect in each flipped time series was then used to generate the corrected null hypothesis distribution.
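
      The sign-flip cluster correction described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the test is one-sided, the cluster-forming threshold is taken as the 95th percentile of the pointwise null distribution, and the default number of flips is kept small for speed (the text above reports 50000).

```python
import numpy as np


def cluster_masses(effect, threshold):
    """Return (start, stop, mass) for each run of contiguous time points
    whose effect exceeds the threshold; stop is exclusive."""
    above = effect > threshold
    clusters = []
    start = None
    for t, flag in enumerate(above):
        if flag and start is None:
            start = t
        elif not flag and start is not None:
            clusters.append((start, t, float(effect[start:t].sum())))
            start = None
    if start is not None:
        clusters.append((start, len(effect), float(effect[start:].sum())))
    return clusters


def sign_flip_cluster_test(data, n_perm=2000, alpha=0.05, seed=0):
    """Cluster-corrected one-sided test on a time course.

    data: (n_subjects, n_times) array of per-participant effects,
          e.g. decoding accuracy minus chance (an assumed input format).
    Returns a list of (start, stop, mass, corrected_p) per observed cluster.
    """
    rng = np.random.default_rng(seed)
    n_subj = data.shape[0]
    observed = data.mean(axis=0)

    # Randomly flip the sign of each participant's whole time course.
    flips = rng.choice([-1.0, 1.0], size=(n_perm, n_subj))
    null = flips @ data / n_subj  # shape: (n_perm, n_times)

    # Pointwise null distribution gives the cluster-forming threshold.
    threshold = np.quantile(null, 1 - alpha, axis=0)

    # Largest summed effect in each flipped series forms the corrected null.
    null_max = np.array([
        max((m for _, _, m in cluster_masses(row, threshold)), default=0.0)
        for row in null
    ])

    results = []
    for start, stop, mass in cluster_masses(observed, threshold):
        p = (np.sum(null_max >= mass) + 1) / (n_perm + 1)
        results.append((start, stop, mass, p))
    return results
```

      Comparing each observed cluster with the maximum cluster effect per flipped series is what controls the family-wise error rate across time points.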

      We included these clarifications in the Significance testing section of the revised manuscript.

      Reviewer #1 (Recommendations for the authors):

      It would be helpful if the authors could elaborate more on the fMRI decoding results in higher-order visual areas in the Discussion (there are recent studies also investigating higher-order visual areas (Carricarte et al., 2024) and associative areas (Degutis et al., 2024)) and relate it to the MEG information transmission results between the areas overlapping with the regions recorded in the fMRI part of the study.

      We have discussed the fMRI decoding results in the LOC and IPS in the revised manuscript: 

      In the current study, fMRI signals from early visual cortex and two high-level brain regions (LOC and pIPS) were recorded. Neural dynamics of these regions were extracted from MEG signals. Decoding analyses based on fMRI and MEG signals consistently showed that object category information could be decoded from both regions. These findings raise an important question:  Further Granger causality analysis indicates that the feedback information in foveal V1 was mainly driven by signals from the LOC. Layer-specific analysis showed that category information could be decoded in the middle and superficial layers of the LOC. A reasonable interpretation of this result is that feedforward information from the early visual cortex was received by the LOC’s middle layer, then the category information was generated and fed back to foveal V1 through the LOC’s superficial layer. A recent study (Carricarte et al., 2024) found that, in object selective regions in temporal cortex, the deep layer showed the strongest fMRI responses during an imagery task. Together, the results suggest that the deep and superficial layers correspond to different feedback mechanisms. It is worth noting that other cortical regions may also generate feedback signals to the early visual cortex. The current study did not have simultaneously recorded fMRI signals from the prefrontal cortex, but it has been shown that feedback signals can be traced back to the prefrontal cortex during complex cognitive tasks, such as working memory (Finn et al., 2019; Degutis et al., 2024). Further fMRI studies with submillimeter resolution and whole-brain coverage are needed to test other potential feedback pathways during object processing.

      The behavioral performance seems quite low (67%), could authors explain the reasons for it?

      We designed the object stimuli to be difficult to distinguish on purpose. Some of our pilot data showed that the more involved the participants were in the peripheral object task, the easier the foveal feedback information was to decode. It is reasonable to assume that if the peripheral objects were easily distinguishable, the feedback mechanism may not be fully recruited during object processing. Furthermore, since we were decoding category and orientation information rather than identity information, the difficulty of distinguishing two objects from the same category and with the same orientation would not affect the decoding of category and orientation information in the neural signals.

      Reviewer #2 (Recommendations for the authors):

      (1) Line 52: the meaning of the sentence starting with "However, ..." is not entirely clear. Maybe the word "while" is missing after the first comma?

      (2) Line 224. If I'm understanding the rationale for the MEG analysis correctly, it was not possible to localize foveal regions, but the cross-location decoding analysis was used to approximate the strength and timing of feedback information. If this is the case, "neural representations in the foveal region" were not extracted.

      (3) Figure 4. The key information is too small to see. The lines indicating where decoding performance was significant are quite thin but very important, and the text next to them indicating onset times of significant decoding is in such a small font size I needed to zoom in to 300% to read it (yes, my eyes are getting old and tired). Increasing the font size used to represent key information would be nice.

      (4) Figure 4 caption. Line 270 describes the line color in the plots as yellow, but that color is decidedly orange to my eye.

      (5) Line 340/341: Papers that define and describe feedback-receptive fields seem important to cite here:

      Keller, A. J., Roth, M. M., & Scanziani, M. (2020). Feedback generates a second receptive field in neurons of the visual cortex. Nature, 582(7813), 545-549.

      Kirchberger, L., Mukherjee, S., Self, M. W., & Roelfsema, P. R. (2023). Contextual drive of neuronal responses in mouse V1 in the absence of feedforward input. Science advances, 9(3), eadd2498.

      (6) Lines 346-350: this sentence seems to have some missing or misused words, because the syntax isn't intact.

      (7) Line 367: supports should be support.

      We thank the reviewers for the comments and have corrected them in the manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review): 

      Wang, Junxiu et al. investigated the underlying molecular mechanisms of the insecticidal activity of betulin against the peach aphid, Myzus persicae. There are two important findings described in this manuscript: (a) betulin inhibits the gene expression of GABA receptor in the aphid, and (b) betulin binds to the GABA receptor protein, acting as an inhibitor. The first finding is supported by RNA-Seq and RNAi, and the second is supported by MST and electrophysiological assays. Further investigations on the betulin binding site on the receptor protein provided a fundamental discovery that T228 is the key amino acid residue for its affinity, thereby acting as an inhibitor, backed up by site-directed mutagenesis of the heterologously-expressed receptor in E. coli and by CRISPR-genome editing in Drosophila.

      Although the manuscript does have strengths in principle, the weaknesses do exist: the manuscript would benefit from more comprehensive analyses to fully support its key claims in the manuscript. In particular:

      (1) The Western blotting results in Figure 5A & B appear to support the claim that betulin inhibits GABR gene expression (L26), as a decrease in target protein levels is often indicative of suppressed gene expression. The result description for Figure 5A & B is found in L312-L316, within Section 3.6 ("Responses of MpGABR to betulin"), where MST and voltage-clamp assays are also presented. It seems the observed decrease in MpGABR protein content is due to gene downregulation, rather than a direct receptor protein-betulin interaction. However, this interpretation lacks discussion or analysis in either the corresponding results section or the Discussion. In contrast, Figures 5C-F are specifically designed to illustrate protein-betulin interactions. Presenting Figure 5A & B alongside these panels might lead to confusion, as they support distinct claims (gene expression vs. protein binding/inhibition). Therefore, I recommend moving Figure 5A & B either to the end of Figure 3 or to a separate figure altogether to improve clarity and logical flow. A minor point in the Western blotting experiment is that although GAPDH was used as a reference protein, there is no explanation in the corresponding M&M section.

      We thank the reviewer for the concise and accurate summary and appreciate the constructive feedback on the article’s strengths and weaknesses.

      (a) According to your suggestion, the original Figure 5A and B have been inserted into Figure 3, following Figure 3D. The original Figure 3E-I has been saved as a new figure, to illustrate the RNAi assay.

      (b) “GAPDH was used as a reference protein” has been supplied in the M&M section; see Line 209.

      (2) The description of the electrophysiological recording experiment is unclear regarding the use of GABA. I didn't realize that GABA, the true ligand of the GABA receptor, was used in this inhibition experiment until I reached the Results section (L321), which states, "In the presence of only GABA, a fast inward current was generated." Crucially, no details are provided on the experiment itself, including how GABA was applied (e.g., concentration, duration, whether GABA was treated, followed by betulin, or vice versa). This information is essential for reproducibility. Please ensure these details are thoroughly described in the corresponding M&M section.

      We thank the reviewer for the valuable comments.

      (a) Detailed information on how to apply GABA has been added to the corresponding M&M section (Lines 260-263): After 3 days of incubation, the oocytes were used for electrophysiological recording. GABA was dissolved in 1 × Ringer's solution to prepare 100 µM GABA solution. Subsequently, the 100 µM GABA solutions containing different concentrations of betulin (0, 5, 10, 20, 40, 80, 160, 320 µM) were used to perfuse the oocytes.

      (b) Additionally, we also checked other contents of M&M section to ensure that sufficient detail has been supplied.

      (3) The phylogenetic analysis, particularly concerning Figures 4 and 6B, needs significant attention for clarity and representativeness. First, your claim that MpGABR is only closely related to CAI6365831.1 (L305-L310) is inconsistent with the provided phylogenetic tree, which shows MpGABR as equally close to Metopolophium dirhodum (XP_060864885.1) and Acyrthosiphon pisum (XP_008183008.2). Therefore, singling out only Macrosiphum euphorbiae (CAI6365831.1) is not supported by the data. Second, the representation of various insect orders is insufficient. All 11 sequences in the Hemiptera category (in both Figure 4 and Figure 6B) are exclusively from the Aphididae family. This small subset cannot represent the highly diverse Order Hemiptera. Consequently, statements like "only THR228 was conserved in Hemiptera" (L338), "The results of the sequence alignment revealed that only THR228 was conserved in Hemiptera" (L430), or "THR228... is highly conserved in Hemiptera" (L486) are not adequately supported. Third, similar concerns apply to the Diptera order, which includes 10 Drosophila and 2 mosquito samples (not diverse or representative enough), and likely to other orders as well. Thereby, the Figure 6B alignment should be revised accordingly to reflect a more accurate representation or to clarify the scope of the analysis. Fourth, there's a discrepancy in the phylogenetic method used: the M&M section (L156) states that MEGA7, ClustalW, and the neighbor-joining method were used, while the Figure 4 caption mentions that MEGA X, MUSCLE, and the Maximum likelihood method were employed. This inconsistency needs to be clarified and made consistent throughout the manuscript. Fifth, I have significant concerns about the phylogenetic tree itself (Figure 4). A small glitch was observed at the Danaus plexippus node, which raises suspicion regarding potential manipulation after tree construction. More critically, the tree, especially within Coleoptera, does not appear to be clearly resolved. I am highly concerned about whether all included sequences are true GABR orthologs or if the dataset includes partial or related sequences that could distort the phylogeny. Finally, for Figure 6B, both protein (XP_) and nucleotide (XM_) sequences were mixed. I recommend using the protein sequences instead of nucleotide sequences in this figure panel, as protein sequences are more directly informative.

      We thank the reviewer for the careful reading and valuable comments.

      (a) Firstly, according to your comments, the phylogenetic analysis has been re-performed with more representative species from each Order (Fig. 5 and Fig. 7B). The results revealed that only THR228 was conserved across 11 species in the Aphididae family of Hemiptera. Therefore, expressions like "only THR228 was conserved in Hemiptera" have been revised to “among the four residues, only THR228 was conserved across 11 species in the Aphididae family of Hemiptera” (Line 106, Line 369, Line 477, and Lines 563-564).

      (b) We have modified the description of Fig. 5 (the original Fig. 4): MpGABR (XP_022173711.1) was found to be genetically closely related to CAI6365831.1 from Macrosiphum euphorbiae, XP_008183008.2 from Acyrthosiphon pisum, and XP_060864885.1 from Metopolophium dirhodum (Fig. 5 and Table S6). See Lines 342-346.

      (c) Phylogenetic analysis was performed using MEGA7 with multiple amino acid sequence alignment (ClustalW) and the neighbor-joining method. We have revised the Fig. 5 (the original Fig. 4) caption to make it accurate and consistent throughout the manuscript.

      (d) We are sorry about the small glitch at the Danaus plexippus node. After the phylogenetic tree was constructed, it was imported into Adobe Illustrator for coloring and classification annotation. There may have been operational errors while resizing the image, resulting in the small glitch. Besides, the unclear clustering of Coleoptera may be due to improper adjustment of the distance (in pixels) of branches from nodes. Again, thanks for your careful reading. We have rebuilt the phylogenetic tree.

      (e) Based on your suggestion, the sequence IDs have been unified as the protein sequence IDs (Fig. 5, Fig. 7B and Table S6).

      (4) The Discussion section requires significant revision to provide a more insightful and interpretative analysis of the results. Currently, much of the section primarily restates findings rather than offering deeper discussion. For instance, L409-L419 restate the results, followed by the short sentence "Collectively, these results suggest that betulin may have insecticidal effects on aphids by inhibiting MpGABR expression". It could be further expanded to make it beneficial to elaborate on proposed mechanisms by which gene expression might be suppressed, including any potential transcription factors involved. In contrast, while L422-L442 also initially summarize results, the subsequent paragraph (L445-L472) effectively discusses the potential mechanisms of inhibitory action and how mortality is triggered, which is a good model for other parts of the section. However, all the discussion ends up with a short statement, "implying that betulin acts as a CA of MpGABR" (L472), which appears to be a leap. The inference that betulin acts as a competitive antagonist (CA) is solely based on the location of its extracellular binding site, which does not exactly overlap with the GABA binding site. It needs stronger justification or actually requires further experimental validation. The authors should consider rephrasing this statement to acknowledge the need for additional studies to definitively confirm this mechanism of action.

      We appreciate the reviewer's careful reading and valuable feedback, which will certainly enhance the quality of our manuscript.

      (a) Possible reasons for the effect of betulin on MpGABR expression have been discussed in our manuscript (Lines 455-466): The regulation of gene expression is sophisticated and delicate (Pope and Medzhitov 2018). The regulatory network controlling GABR expression remains unclear. In adult rats, epileptic seizures have been reported to increase the levels of brain-derived neurotrophic factor (BDNF), which in turn prompted the transcription factors CREB and ICER to reduce the gene expression of the GABR α1 subunit (Lund et al. 2008). In Drosophila, it has been demonstrated that WIDE AWAKE, which regulates the onset of sleep, interacts with the GABR and upregulates its expression level (Liu et al. 2014). In the Drosophila brain, circular RNA circ_sxc was found to inhibit the expression of miR-87-3p through sponge adsorption, thereby regulating the expression of neurotransmitter receptor ligand proteins, including GABR, and ensuring the normal function of synaptic signal transmission in brain neurons (Li et al. 2024). However, it remains unclear how betulin reduces the expression of MpGABR, and further research is needed.

      (b) In the Discussion section, we acknowledged the need for further research to ultimately confirm the mechanism by which betulin competes with GABA for binding to MpGABR (Lines 532-535): Although the mechanism by which betulin competes with GABA for binding to MpGABR requires further experimental validation, our work may have provided a novel target for developing insecticides.

      (c) Besides, we have added a discussion of the sensitivity of the GABA receptor to betulin in the Discussion section (Lines 491-501): Studies on key amino acids that are crucial for GABR function have primarily focused on transmembrane regions. For instance, based on mutational research and a Drosophila GABR modeling approach, multiple key amino acids were identified as insecticide targets in the transmembrane domain (Nakao and Banba 2021). Guo et al. proposed that amino acid substitutions in the transmembrane domain 2 contribute to terpenoid insensitivity during plant-insect coevolution (Guo et al. 2023). However, these studies have neglected the extracellular domain. Our study indicated that betulin targets the THR228 site in the extracellular domain of MpGABR, which is conserved only in the Aphididae family. Therefore, betulin is speculated to be a specific insecticidal substance evolved by plants in response to aphid infestation. Besides, further verification is needed to determine whether betulin is toxic to other insect species.

      (d) Furthermore, the discussion of potential ecological risks of deploying betulin as a bioinsecticide has been elaborated in our manuscript (Lines 538-553): The development of bioinsecticides should not only focus on the toxic effects of active substance on target organisms, but also on their influence on the ecosystem (Haddi et al. 2020). Although our results indicate that betulin has specific toxicity to aphids, previous studies have reported that betulin and its derivatives had effects on Plutella xylostella L. (Huang et al. 2025), Aedes aegypti (de Almeida Teles et al. 2024), and Drosophila melanogaster (Lee and Min 2024). Therefore, further research is needed to determine whether there are other insecticidal mechanisms or off target effects of betulin. Additionally, betulin exhibits a wide range of pharmacological activities (Amiri et al. 2020), which have been used to treat various diseases, such as cancer (Lv 2023), glioblastoma (Li et al. 2022), inflammation (Szlasa et al. 2023) and hyperlipidemia (Tang et al. 2011). Before applying betulin in the field, it is necessary to fully verify and consider whether betulin has any impact on farmers' health. Furthermore, will betulin cause residue or diffusion in the process of field application? Will long-term application promote the evolution of resistance to aphids or other insects? These issues also need further experimental verification. In summary, before any field application, further research is needed on the environmental behavior, degradation process, and safety of betulin.

      Reviewer #2 (Public review):

      Summary:

      This important study shows that betulin from wild peach trees disrupts neural signaling in aphids by targeting a conserved site in the insect GABA receptor. The authors present a nicely integrated set of molecular, physiological, and genetic experiments to establish the compound's species-specific mode of action. While the mechanistic evidence is solid, the manuscript would benefit from a broader discussion of evolutionary conservation and potential off-target ecological effects.

      Strengths:

      The main strengths of the study lie in its mechanistic clarity and experimental rigor. The identification of a betulin-binding single threonine residue was supported by (1) site-directed mutagenesis and (2) functional assays. These experiments strongly support the specificity of action. Furthermore, the use of comparative analyses between aphids and fruit flies demonstrates an important effort to explore species specificity, and the integration of quantitative data further enhances the robustness of the conclusions.

      Weaknesses:

      There are several important limitations that need to be addressed. The manuscript does not explore whether the observed sensitivity to betulin reflects a broadly conserved feature of GABA receptors across animal lineages or a more lineage-specific adaptation. This evolutionary context is crucial for understanding the broader significance of the findings.

      In addition, while the compound's aphicidal effect is well established, the potential for off-target effects in non-target organisms - especially vertebrates - remains unaddressed, despite prior evidence that betulin interacts with mammalian GABAa receptors. There is little discussion on the ecological or environmental safety of exogenous betulin application, such as persistence, degradation, or exposure risks.

      We sincerely thank the reviewer for the time and effort dedicated to our manuscript's detailed review and assessment. The revision suggestions were constructive, and we have provided a point-by-point response to address them.

      (a) A brief introduction to the evolutionary conservation of GABA receptors has been added to the Introduction (Lines 90-98): A previous study proposed that vertebrate and human GABR genes maintain a broad and conservative gene clustering pattern, while in invertebrates, this pattern is missing, indicating that these gene clusters formed early in vertebrate evolution and were established after diverging from invertebrates. Notably, invertebrates each possess a unique GABR gene pair, which are homologous with human GABR α and β subunits, suggesting that the existing GABR gene cluster evolved from an ancestral α-β subunit gene pair (Tsang et al. 2006). During the coevolution of plants and insects, the duplications and amino acid substitutions in GABR may be beneficial for the adaptation to insecticides and terpenoid compounds (Guo et al. 2023).

      (b) We have added a discussion of the sensitivity of the GABA receptor to betulin in the Discussion section (Lines 491-501): Studies on key amino acids that are crucial for GABR function have primarily focused on transmembrane regions. For instance, based on mutational research and a Drosophila GABR modeling approach, multiple key amino acids were identified as insecticide targets in the transmembrane domain (Nakao and Banba 2021). Guo et al. proposed that amino acid substitutions in the transmembrane domain 2 contribute to terpenoid insensitivity during plant-insect coevolution (Guo et al. 2023). However, these studies have neglected the extracellular domain. Our study indicated that betulin targets the THR228 site in the extracellular domain of MpGABR, which is conserved only in the Aphididae family. Therefore, betulin is speculated to be a specific insecticidal substance evolved by plants in response to aphid infestation. Besides, further verification is needed to determine whether betulin is toxic to other insect species.

      (c) The discussion of potential ecological risks of deploying betulin as a bioinsecticide has been elaborated in our manuscript (Lines 538-553): The development of bioinsecticides should not only focus on the toxic effects of active substance on target organisms, but also on their influence on the ecosystem (Haddi et al. 2020). Although our results indicate that betulin has specific toxicity to aphids, previous studies have reported that betulin and its derivatives had effects on Plutella xylostella L. (Huang et al. 2025), Aedes aegypti (de Almeida Teles et al. 2024), and Drosophila melanogaster (Lee and Min 2024). Therefore, further research is needed to determine whether there are other insecticidal mechanisms or off target effects of betulin. Additionally, betulin exhibits a wide range of pharmacological activities (Amiri et al. 2020), which have been used to treat various diseases, such as cancer (Lv 2023), glioblastoma (Li et al. 2022), inflammation (Szlasa et al. 2023) and hyperlipidemia (Tang et al. 2011). Before applying betulin in the field, it is necessary to fully verify and consider whether betulin has any impact on farmers' health. Furthermore, will betulin cause residue or diffusion in the process of field application? Will long-term application promote the evolution of resistance to aphids or other insects? These issues also need further experimental verification. In summary, before any field application, further research is needed on the environmental behavior, degradation process, and safety of betulin.

      Reviewer #1 (Recommendations for the authors):

      (1) L28 Provide the full name of MST.

      Thanks for your suggestion. The full name of MST, microscale thermophoresis, has been supplied.

      (2) L87 in the Order Hemiptera.

      Thanks for your suggestion. Corrected.

      (3) L99 "Leaf bioassay" would be better to differentiate the greenhouse and field bioassays.

      Thanks for your suggestion. Corrected.

      (4) L104 It should be 7 doses, including the "0 mg/mL" control.

      Thanks for your suggestion. Corrected.

      (5) L104 Since the LC50 of pymetrozine is 1.0612 mg/mL, a wider range of doses should have been tested compared to the dose range of betulin.

      Thanks for your comment.

      (a) Firstly, seven doses (0, 0.0625, 0.125, 0.25, 0.5, 1, and 2 mg mL<sup>-1</sup>) were set to calculate the LC50 of betulin and pymetrozine. The LC50 values of betulin and pymetrozine are 0.1641 and 1.0612 mg mL<sup>-1</sup>, respectively; both fall within the tested range, indicating that the dose range is reasonable and the LC50 values are reliable.

      (b) To compare the control effects of betulin and pymetrozine against M. persicae, the LC50 concentrations of betulin (0.1641 mg mL<sup>-1</sup>) and pymetrozine (1.0612 mg mL<sup>-1</sup>) were used to treat M. persicae.

      (6) L109 Greenhouse and field bioassays.

      Thanks for your suggestion. Corrected.

      (7) L112 Tween-80 and acetone in L103. Keep the order consistent throughout the manuscript.

      Thanks for your suggestion. Corrected.

      (8) L122 Mortality was recorded at 1, 5, 9, and 14 days after treatment. Revise the other similar mistakes throughout the manuscript (e.g. L250, L254, L255, L256, L259, etc.).

      Thanks for your suggestion. Corrected.

      (9) L126 apterous instead of wingless (keep a consistent expression).

      Thanks for your suggestion. Corrected.

      (10) L138 Primer Premier?

      Thanks for your comment. Corrected.

      (11) L141 Add RPS18 primers in Table S2.

      Thanks for your comment. Corrected.

      (12) L155 MEGA7 vs. MEGAX (as described in the Figure 4 caption).

      Thanks for your comment. Corrected.

      (13) L156 NJ method vs. ML method (as described in the Figure 4 caption).

      Thanks for your comment. Corrected.

      (14) L157 2.7. RNAi assay (Remove "In vitro" and re-number the following M&M sections accordingly).

      Thanks for your comment. Corrected.

      (15) L163 Add dsGFP primers in Table S2.

      Thanks for your comment. Corrected.

      (16) L166 apterous instead of wingless (keep a consistent expression).

      Thanks for your comment. Corrected.

      (17) L172 Add the source of pET-B2M vector.

      pET-B2M vector was obtained from BGI (Shenzhen, China), which has been added in our manuscript (Line 194).

      (18) L195 coding sequence instead of cDNA.

      Thanks for your comment. Corrected.

      (19) L198 the mutations of R224A ...

      Thanks for your comment. Corrected.

      (20) L199 TYR), or T228R ...

      Thanks for your comment. Corrected.

      (21) L211 and 90 ng.

      Thanks for your comment. Corrected.

      (22) L213 genomic DNA instead of gDNA, because gDNA may be confused in the context of sgRNA.

      Thanks for your suggestion. Corrected.

      (23) L253 (Fig. 1A-B).

      Thanks for your comment. Corrected.

      (24) L268 Explain why these 15 DEGs were selected for qRT-PCR.

      Thanks for your comment. These 15 DEGs were randomly selected to serve as representative DEGs with different expression levels. The reason for selecting these 15 DEGs was added to the manuscript (Lines 295-296).

      (25) L287 What about GABRB? It has a TM domain.

      GABRB refers to “gamma-aminobutyric acid receptor subunit beta-like” annotated on NCBI. Theoretically, it should contain four transmembrane structural domains, while it has only one, indicating that it is incomplete.

      (26) L297 Add dsGFP as another control group.

      Thanks for your comment. Corrected.

      (27) L299 increased by 30.44% (Remove a comma).

      Thanks for your comment. Corrected.

      (28) L308 XM_022318019.1 (or protein accession number with XP_).

      Thanks for your comment. Corrected.

      (29) L338 that THR228 was conserved only in Hemiptera.

      Thanks for your comment. Since our original intention was to emphasize that THR228 is the only conserved residue among the four key amino acid residues, after careful consideration, we retained the expression "only THR228".

      (30) L342 or T228R.

      Thanks for your comment. Corrected.

      (31) L382 Is pyrhidone a general name for pymetrozine?

      Thanks for your comment. Corrected.

      (32) L450 Remove "and so on".

      Thanks for your comment. Corrected.

      (33) Figure 1D: Remove "Environment friendly". Replace the plant pot image on the right side with the one sprayed with pymetrozine, like the one in Figure 1F.

      Thanks for your comment. 

      (a) "Environment friendly" in Figure 1D has been removed.

      (b) We attempted to modify Figure 1D according to your suggestion. However, the modified Figure 1D is similar to Figure 1F and appears monotonous. Therefore, we have retained the original framework of Figure 1D.

      (34) Figure 2E 111036117 and 111041856 are in different IDs (XM_). I suggest keeping GeneID in Figure 2E and Table S2, as shown in Table S4.

      Thanks for your comment. Corrected.

      (35) Figure 2H: Add unit of the heatmap values. Or just add the title (e.g., expression level) on top of the bar.

      Thanks for your comment. Corrected.

      (36) Figure 3A: Add "aa" next to 700.

      Thanks for your comment. Corrected.

      (37) Figure 3E-G: Revise the tick marks on Y-axis: 0.0, 0.5, 1.0, and 1.5.

      Thanks for your comment. Corrected.

      (38) Figure 5C: Remove "1" and move "WT" up to the position where "1" was.

      Thanks for your comment. Corrected.

      (39) Figure 5D: Revise the tick marks on the Y-axis: 0.0, 0.5, 1.0, and 1.5.

      Thanks for your comment. Corrected.

      (40) Figure 5E: Remove the decimal. (e.g. 5 uM, 10 uM, 20 uM, etc.).

      Thanks for your comment. Corrected.

      (41) Figure 6B: What are the numbers next to the amino acid sequences? Provide the information in the figure caption.

      Thanks for your comment. The numbers next to the amino acid sequences indicate the position of the last residue of the key amino acids; this has been added to the figure caption.

      (42) Figure 6D: Revise the tick marks on the Y-axis: 0.0, 0.5, 1.0, and 1.5. The X-axis title should be betulin (see Figure 5D). In the figure caption at the 5th row from the top, R244A should be R224A.

      Thanks for your comment. Corrected.

      (43) Figure 7E: R122T (not R1272T).

      Thanks for your comment. Corrected.

      (44) Supplementary Figure 1: It should be Figure S1. Add dsGFP in the figure caption.

      Thanks for your comment. Corrected.

      (45) Figure S2: What are the two pink bars and the other bars in brown or blue? Add an appropriate explanation in the figure caption.

      Thanks for your comment. Corrected.

      (46) Table S1: r square?

      Thanks for your comment. It is “r square”, and this has been corrected.

      (47) Table S2: (a) Add horizontal lines to separate qPCR, RNAi, cloning, and heterologous expression from each other (b) Replace XM_022318017.1 and XM_022318019.1 with their corresponding GeneIDs, as shown in Table S4. (c) AK340444.1 is a sequence from another aphid (Acyrthosiphon pisum)-Revise it. (d) In the cloning primers, place MpGABR first, followed by MpGABRAP and MpGABRB, as shown in the manuscript and Table S5. (e) Also, in the cloning primers, MpGABRB and MpGABRAP use reverse primers without stop codon, while MpGABR uses stop codon (TCA = TGA in reverse)-Revise it accordingly. Otherwise, provide the reason.

      Thanks for your comment. Corrected.

      (48) Table S3: (a) Add "Drosophila melanogaster" and the target sequence ID in the table caption. Is it KF881792.1, as shown in Table S6? (b) Align the sequences to the left side. 

      Thanks for your comment. 

      (a) The GenBank accession number of the target sequence is KF881792.1 (Drosophila melanogaster). We have added this information to the Table S3 note.

      (b) It has been adjusted according to your suggestion.

      (49) Table S5: (a) Replace the accession numbers with GeneID, as shown in Table S4. AK340444.1 is a sequence from another aphid (Acyrthosiphon pisum). (b) Coding sequences with stop codon are 2082, 357, and 753, respectively, while the sequences without stop codon are 2079, 354, and 750, respectively. The lengths of the deduced amino acids are 693, 118, and 250. Revise accordingly.

      Thanks for your comment. Corrected.

      (50) Table S6: (a) Use GenBank No for protein sequences. There is no Gene ID in this table. (b) Order (instead of Class). (c) See my comment on the phylogenetic analysis above.

      Thanks for your comment. Corrected.

      (51) Table S7 (a) Add unit under "Binding Energy". (b) There are two ALA226 [Alkyl] with two different distances. (c) PHE227 at the bottom should be THR228?

      Thanks for your comment.

      (a) The unit of "Binding Energy" is kcal mol<sup>–1</sup>; it has been added to the table caption.

      (b) As shown in Figure 6A, there are two Alkyl interactions between ALA226 and betulin. Therefore, ALA226 [Alkyl] appears twice, with two different distances.

      (c) Similarly, there are two Pi-Alkyl interactions between PHE227 and betulin. Thus, there are two rows of PHE227 in the table.

      (52) Table S9 (a) R117T should be R122T. (b) r square?

      Thanks for your comment. Both (a) and (b) have been corrected.
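
      The length relationships raised in comment (49) on Table S5 can be checked with a short calculation. The sketch below is illustrative only: it assumes each coding sequence ends in a single three-nucleotide stop codon, it uses the lengths quoted in the comment, and the function name is hypothetical.

```python
def cds_lengths(with_stop_nt):
    """Return (length without stop codon in nt, deduced protein length in aa).

    Assumes the coding sequence ends in a single 3-nt stop codon.
    """
    if with_stop_nt % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    without_stop_nt = with_stop_nt - 3   # drop the stop codon
    protein_aa = without_stop_nt // 3    # one amino acid per remaining codon
    return without_stop_nt, protein_aa


# Lengths with stop codon, as quoted in comment (49)
for nt in (2082, 357, 753):
    print(nt, cds_lengths(nt))
```

      Under that assumption, the three quoted lengths with stop codon give the lengths without stop codon and the amino acid counts listed in the comment.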

      Reviewer #2 (Recommendations for the authors):

      (1) Introduction

      (a) It lacks a deeper biological and evolutionary framing of the GABA receptor system. As GABA receptors are highly conserved across animal taxa, the observed interaction between betulin and the aphid GABA receptor could have broader implications. This possibility is not addressed in the current version, which limits the reader's appreciation of the relevance of this mode of action.

      (b) Previous reports of betulin activity in mammalian systems are not mentioned in the introduction, even though they are directly relevant to concerns about off-target toxicity. Therefore, the introduction should be revised to (i) briefly introduce the evolutionary conservation of GABA receptors, and (ii) acknowledge that betulin may affect a broader range of organisms, which sets up the need for caution in its application.

      Thanks for your important suggestions.

      (a) A brief introduction to the evolutionary conservation of GABA receptors has been added to the Introduction (Lines 90-98): Previous study has proposed that vertebrate and human GABR genes maintain a broad and conservative gene clustering pattern, while in invertebrates, this pattern is missing, indicating that these gene clusters formed early in vertebrate evolution and were established after diverging from invertebrates. Notably, invertebrates each possess a unique GABR gene pair, which are homologous with human GABR α and β subunits, suggesting that the existing GABR gene cluster evolved from an ancestral α - β subunit gene pair (Tsang et al. 2006). During the coevolution of plants and insects, the duplications and amino acid substitutions in GABR may be beneficial for the adaptation to insecticides and terpenoid compounds (Guo et al. 2023).

      (b) The possible effects of betulin on a broader range of organisms have been acknowledged in the Introduction section (Lines 68-77): An immune stimulant, Ir-Bet, was prepared using iridium complex and betulin, which evoked ferritinophagy-enhanced ferroptosis, thereby activating anti-tumor immunity (Lv 2023). The anti-inflammatory effect of betulin has been reported in macrophages at lymphoma site in mice (Szlasa et al. 2023). Betulin has been found to improve hyperlipidemia and insulin resistance and decrease atherosclerotic plaques by inhibiting the maturation of sterol regulatory element-binding protein (Tang et al. 2011). Besides, betulin and its derivatives have been found to exhibit insecticidal activity against Plutella xylostella L. (Huang et al. 2025), Aedes aegypti (de Almeida Teles et al. 2024), and Drosophila melanogaster (Lee and Min 2024).

      (c) At the end of the Introduction, we note that betulin should be used with caution (Lines 111-112): However, given that betulin may affect a wider range of organisms, it should be used with caution.

      (2) Method

      Number of biological replicates in all assays and justification of thresholds used for significance in RNAi and survival experiments are not addressed in the manuscript.

      Thanks for your careful reading. We have checked the Materials and Methods section and added the number of biological replicates for all assays. In addition, the p-values for the significance analyses of the RNAi and survival experiments have been added to the manuscript.

      (3) Discussion

      (a) Consistent with the comments on the Introduction, the absence of discussion on (i) the evolutionary conservation of GABA receptor sensitivity to betulin, (ii) potential off-target effects in non-target insects and vertebrates (if so, this cannot be used as an "eco-friendly pesticide" as the authors stated in the manuscript), and (iii) ecological risks associated with the exogenous application of betulin limits both the interpretive depth and applied relevance of the study.

      (b) To strengthen the Discussion, the authors should consider addressing: (i) whether the observed sensitivity reflects a conserved pharmacological vulnerability across animal taxa or a lineage-specific adaptation; (ii) the potential ecological risks of deploying betulin as a bioinsecticide, and (iii) the need for future research into the environmental fate, degradation, and safety profile of betulin prior to any field-level application.

      Thank you for your valuable comments.

      (a) We have added the discussion of the sensitivity of GABA receptor to betulin in Discussion section (Lines 491-501): Studies on key amino acids that are crucial for GABR function has primarily focused on transmembrane regions. For instance, based on the mutational research and Drosophila GABR modeling approach, multiple key amino acids were identified as insecticide targets in the transmembrane domain (Nakao and Banba 2021). Guo et al. proposed that amino acid substitutions in the transmembrane domain 2 contribute to terpenoid insensitivity during plant-insect coevolution (Guo et al. 2023). However, these studies have neglected the extracellular domain. Our study signified that betulin targets the THR228 site in the extracellular domain of MpGABR, which is conserved only in the Aphididae family. Therefore, betulin is speculated to be a specific insecticidal substance evolved by plants in response to aphid infestation. Besides, further verification is needed to determine whether betulin is toxic to other insect species.

      (b) The discussion of potential ecological risks of deploying betulin as a bioinsecticide has been elaborated in our manuscript (Lines 538-551): The development of bioinsecticides should not only focus on the toxic effects of active substance on target organisms, but also on their influence on the ecosystem (Haddi et al. 2020). Although our results indicate that betulin had specific toxicity to aphids, previous studies have reported that betulin and its derivatives had effects on Plutella xylostella L. (Huang et al. 2025), Aedes aegypti (de Almeida Teles et al. 2024), and Drosophila melanogaster (Lee and Min 2024). Therefore, further research is needed to determine whether there are other insecticidal mechanisms or off target effects of betulin. Additionally, betulin exhibits a wide range of pharmacological activities (Amiri et al. 2020), which have been used to treat various diseases, such as cancer (Lv 2023), glioblastoma (Li et al. 2022), inflammation (Szlasa et al. 2023) and hyperlipidemia (Tang et al. 2011). Before applying betulin in the field, it is necessary to fully verify and consider whether betulin has any impact on farmers' health. Furthermore, will betulin cause residue or diffusion in the process of field application? Will long-term application promote the evolution of resistance to aphids or other insects? These issues also need further experimental verification. 

      (c) Additionally, at the end of the Discussion, we note that more research is needed before any field application of betulin (Lines 551-553): In summary, before any field application, further research on the environmental behavior, degradation process, and safety of betulin is needed.

      References

      Amiri S, Dastghaib S, Ahmadi M, Mehrbod P, Khadem F, Behrouj H, Aghanoori M, Machaj F, Ghamsari M, Rosik J, Hudecki A, Afkhami A, Hashemi M, Los M, Mokarram P, Madrakian T, Ghavami S. 2020. Betulin and its derivatives as novel compounds with different pharmacological effects. Biotechnology Advances 38: 107409.

      de Almeida Teles AC, dos Santos BO, Santana EC, Durço AO, Conceição LSR, Roman Campos D, de Holanda Cavalcanti SC, de Souza Araujo AA, dos Santos MRV. 2024. Larvicidal activity of terpenes and their derivatives against Aedes aegypti: a systematic review and meta-analysis. Environmental Science and Pollution Research 31: 64703-64718.

      Guo L, Qiao X, Haji D, Zhou T, Liu Z, Whiteman NK, Huang J. 2023. Convergent resistance to GABA receptor neurotoxins through plant–insect coevolution. Nature Ecology & Evolution 7: 1444-1456.

      Haddi K, Turchen LM, Viteri Jumbo LO, Guedes RN, Pereira EJ, Aguiar RW, Oliveira EE. 2020. Rethinking biorational insecticides for pest management: unintended effects and consequences. Pest Management Science 76: 2286-2293.

      Huang X, Hao N, Shu L, Wei Z, Shi J, Tian Y, Chen G, Yang X, Che Z. 2025. Preparation and insecticidal activities of betulin-cinnamic acid-related hybrid compounds and insights into the stress response of Plutella xylostella L. Pest Management Science 81: 4243-4255.

      Lee HY, Min KJ. 2024. Betulinic acid increases the lifespan of Drosophila melanogaster via Sir2 and FoxO activation. Nutrients 16: 441.

      Li Q, Wang L, Tang C, Wang X, Yu Z, Ping X, Ding M, Zheng L. 2024. Adipose tissue exosome circ_sxc mediates the modulatory of adiposomes on brain aging by inhibiting brain dme-miR-87-3p. Molecular Neurobiology 61: 224-238.

      Li Y, Wang Y, Gao L, Tan Y, Cai J, Ye Z, Chen A, Xu Y, Zhao L, Tong S, Sun Q, Liu B, Zhang S, Tian D, Deng G, Zhou J, Chen Q. 2022. Betulinic acid self-assembled nanoparticles for effective treatment of glioblastoma. Journal of Nanobiotechnology 20: 39.

      Liu S, Lamaze A, Liu Q, Tabuchi M, Yang Y, Fowler M, Bharadwaj R, Zhang J, Bedont J, Blackshaw S, Lloyd Thomas E, Montell C, Sehgal A, Koh K, Wu Mark N. 2014. WIDE AWAKE mediates the circadian timing of sleep onset. Neuron 82: 151-166.

      Lund IV, Hu Y, Raol YH, Benham RS, Faris R, Russek SJ, Brooks Kayal AR. 2008. BDNF selectively regulates GABAA receptor transcription by activation of the JAK/STAT pathway. Science Signaling 1: ra9.

      Lv M, Zheng Y, Wu J, Shen Z, Guo B, Hu G, Huang Y, Zhao J, Qian Y, Su Z, Wu C, Xue X, Liu H, Mao Z. 2023. Evoking ferroptosis by synergistic enhancement of a cyclopentadienyl iridium-betulin immune agonist. Angewandte Chemie International Edition 62: e202312897.

      Nakao T, Banba S. 2021. Important amino acids for function of the insect Rdl GABA receptor. Pest Management Science 77: 3753-3762.

      Pope SD, Medzhitov R. 2018. Emerging principles of gene expression programs and their regulation. Molecular Cell 71: 389-397.

      Szlasa W, Ślusarczyk S, Nawrot Hadzik I, Abel R, Zalesińska A, Szewczyk A, Sauer N, Preissner R, Saczko J, Drąg M, Poręba M, Daczewska M, Kulbacka J, Drąg Zalesińska M. 2023. Betulin and its derivatives reduce inflammation and COX-2 activity in macrophages. Inflammation 46: 573-583.

      Tang JJ, Li JG, Qi W, Qiu WW, Li PS, Li BL, Song BL. 2011. Inhibition of SREBP by a small molecule, betulin, improves hyperlipidemia and insulin resistance and reduces atherosclerotic plaques. Cell Metabolism 13: 44-56.

      Tsang SY, Ng SK, Xu Z, Xue H. 2006. The evolution of GABAA receptor–like genes. Molecular Biology and Evolution 24: 599-610.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      The authors study the steady-state solutions of ODE models for molecular signaling involving ligand binding coupled to multi-site phosphorylation at saturating ligand concentrations. Although the results are in principle general, the work highlights the receptor tyrosine kinases (RTK) as model systems. After presenting previous ODE model solutions, the authors present their own "kinetic sorting" model, which is distinguished by ligand-induced phosphorylation-dependent receptor degradation and the property that every phosphorylation state is signaling competent. The authors show that this model recovers the two types of non-monotonicity experimentally reported for RTKs: maximum activity for intermediate ligand affinity and maximum activity for intermediate kinase activity.

      The main contribution of the work is in demonstrating that their model can capture both types of non-monotonicity, whereas previous models could at most capture non-monotonicity of ligand binding.

      Strengths:

      The question of how energy-dissipating, and thus non-equilibrium, molecular systems can achieve steady-state solutions not accessible to equilibrium systems is of fundamental importance in biomolecular information processing and self-organization. Although the authors do not address the energy requirements of their non-equilibrium model, their comparative analysis of different alternative non-equilibrium models provides insight into the design choices necessary to achieve non-monotonic control, a property that is inaccessible at equilibrium.

      The paper is succinctly written and easy to follow, and the authors achieve their aims by providing convincing numerical solutions demonstrating non-monotonicity over the range of parameter values encompassing the biologically relevant regime.

      Weaknesses:

      (1) A key motivating framework for this work is the argument that the ability to tune to recognize intermediate ligand affinities provides a control knob for signal selection that is available to nonequilibrium systems. As such, this seems like a compelling type of ligand selectivity, which is a question of broad interest. However, as the authors note in the results, the previously published "limited signaling model" already achieves such non-monotonicity in ligand binding affinity. The introduction and abstract do not clearly delineate the new contributions of the model.

      We thank the reviewer for this comment. We apologize for any unclear language on our part. The purpose of our work was not to identify the unique reaction scheme to obtain non-monotonic dependence of network activity on ligand affinity and kinase activity. Rather, we were interested in exploring how such a dependence could arise from the interplay between two ubiquitous network motifs (multisite phosphorylation and active receptor degradation). Notably, as the reviewer later points out, previous models that incorporate only multisite phosphorylation only capture the non-monotonic dependence of network activity on ligand affinity and not kinase/phosphatase activity. We have now clarified this in the abstract (lines 14-16) and the introduction (lines 55-59).

      The novel benefit of the model introduced by the authors is that it also achieves a non-monotonic response to kinase activity. Because such non-monotonicity is observed for RTK, this would make the authors' model a better fit for capturing RTK behavior. However, the broad significance of achieving non-monotonicity to kinase activity is not motivated or supported by empirical evidence in the paper. As such, the conceptual significance of the modified model presented by the authors is not clear.

      We thank the reviewer for this comment. We agree that the ability of our model to reproduce non-monotonic dependence on kinase/phosphatase activity was not sufficiently motivated in the original submission. We have now added a brief mention of the biological motivation for non-monotonic kinase activity in the discussion (lines 229-247) to describe the potential biological significance of this behavior. In particular, non-monotonic kinase/phosphatase dependence may act as a safeguard, filtering out signaling cells with abnormally elevated kinase activity or suppressed phosphatase activity. In the presence of non-monotonic dependence on network activity, downstream signaling would remain contingent on extracellular cues, and cells with extreme kinase/phosphatase imbalances would fail to signal. This could prevent persistent, cue-independent activation, an especially important protective mechanism in pathways regulating metabolically taxing functions such as growth, proliferation, or mounting immune responses. Although direct experimental evidence for the widespread use of this mechanism is currently scarce, our motivation is supported both by similar regulatory behaviors of phosphatases, which arise through distinct mechanisms (such as CD45 in T-cell receptor signaling; Weiss, 2019) but highlight the potential biological use of this strategy, and by theoretical work on phosphorylation-dephosphorylation cycles, which demonstrates a similar effect in more general settings (Swain, 2013).

      (2) Whereas previous models used in the literature are schematized in Figure 1, the model proposed by the authors is missing (see line 97 of page 3). Without the schematic, the text description of the model is incomplete.

      We thank the reviewer for identifying this oversight; it has been corrected (see Figure 3 in the revised text).

      (3) The authors use the activity of the first phosphorylation site as the default measure of activity. This choice needs to be justified. Why not use the sum of the activities at all sites?

      We thank the reviewer for this comment. We in fact study all sites (Figure 5A in the resubmitted manuscript). Notably, as suggested by the reviewer, the concentration of the first site is indeed represented by the sum of concentrations of all phosphorylated species. The concentration of the 2<sup>nd</sup> site is represented by the sum of concentrations of all species except for the first one and so on (lines 153-155). 
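
      The per-site bookkeeping described in the response to point (3) can be sketched as a tail sum. This is a minimal illustration, not the authors' code: it assumes species are indexed by how many sites are phosphorylated and that sites are phosphorylated in order, and the function name and example concentrations are hypothetical.

```python
from itertools import accumulate


def site_activities(species):
    """Per-site activity from species concentrations.

    species[j - 1] is the concentration of receptors carrying exactly j
    phosphorylated sites (j = 1..N), assuming sites are phosphorylated in order.
    Entry k - 1 of the result is the activity of site k: the summed
    concentration of all species with at least k phosphorylated sites.
    """
    # Running sums taken from the most phosphorylated species backwards
    tail_sums = list(accumulate(reversed(species)))
    return tail_sums[::-1]


# Hypothetical concentrations (arbitrary units) for a three-site receptor
print(site_activities([5, 3, 2]))
```

      With this indexing, the first entry sums every phosphorylated species, the second omits the first species, and so on.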

      Reviewer #2 (Public review):

      Summary:

      In classical models of signaling networks, the signaling activity increases monotonically with the ligand affinity. However, certain receptors prefer ligands of intermediate affinity. In the paper, the authors present a new minimal model to derive generic conditions for ligand specificity. In brief, this requires multi-site phosphorylation and that high-affinity complexes be more prone to degrade. This particular type of kinetic discrimination allows for overcoming equilibrium constraints.

      Strengths:

      The model is simple, and it adds only a few parameters to classical generic models. Moreover, the authors vary these additional parameters in ranges based on experimental observations. They explain how the introduction of these new parameters is essential to ligand specificity. Their model quantitatively reproduces the ligand specificity of a certain receptor. Finally, they provide a testable prediction.

      Weaknesses:

      The naming of certain variables may be confusing to readers.

      We apologize for the confusion due to unclear presentation. We have clarified our definitions throughout the manuscript. 

      Reviewer #1 (Recommendations for the authors):

      (1) The abstract and introduction present the problem as if this model is solving the fundamental problem of non-monotonic dependence on ligand affinity. However, as the authors noted in their results, this problem has already been solved by a previous phosphorylation model with N-state degradation. What the authors' new model achieves is the additional experimentally observed non-monotonicity of kinase activity dependence. The abstract and introduction should be changed to reflect the actual novel contributions and also to motivate the biological significance of non-monotonic kinase activity dependence.

      We thank the reviewer for this comment. We apologize for any unclear language on our part. The purpose of our work was not to identify the unique reaction scheme to obtain non-monotonic dependence of network activity on ligand affinity and kinase activity. Rather, we were interested in exploring how such a dependence could arise from two ubiquitous network motifs (multisite phosphorylation and active receptor degradation). Notably, as the reviewer later points out, previous models that incorporate only multisite phosphorylation only capture the non-monotonic dependence of network activity on ligand affinity and not kinase/phosphatase activity. We have now clarified this in the abstract (lines 14-16) and the introduction (lines 55-59). We have also provided biological motivation behind non-monotonic kinase activity dependence (lines 229-247).

      (2) It is important to show (in the supplemental materials if needed) that the closest equilibrium analog to the model (for example, reversible rate constants from each of the activated states to an inactive state) does not achieve non-monotonicity with ligand affinity.

      We have added a model in the supplementary materials that represents a detailed balance Markov chain. In the model, we imagine that ligand bound receptors undergo a series of equilibrium transitions, all characterized by the same activation and inactivation rate. We show that at saturating ligand levels, the signaling output only depends on the ratio of the activation to the inactivation rate (i.e., the thermodynamic stability of the active site) (lines 466-488).

      (3) Schematics for earlier models are described in Figure 1. However, no schematic for the actual model proposed by the authors is shown. This should be added as a subpanel to Figure 1.

      We thank the reviewer for identifying our omission of our model schematic. We have included our model schematic as its own figure (Figure 3).

      (4) Minor: Figure 1 is referred to as "Figure ??" in line 97 of page 3.

      We thank the reviewer for identifying this error; it has been corrected.
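
      The equilibrium argument in the response to point (2) can be written out for one simple case. The sketch below is an assumption-laden illustration rather than the supplementary model itself: it takes a linear chain of N+1 ligand-bound states with a common activation rate k_+ and inactivation rate k_-, and these symbols are introduced here only for illustration.

```latex
% Assumed linear chain of N+1 ligand-bound states, n = 0, ..., N,
% with a common activation rate k_+ and inactivation rate k_- (illustrative symbols).
\begin{align}
  k_+ \, \pi_n &= k_- \, \pi_{n+1}
    && \text{(detailed balance between neighbouring states)} \\
  \pi_n &= \frac{K^n}{\sum_{m=0}^{N} K^m},
    \qquad K = \frac{k_+}{k_-}
    && \text{(normalised steady-state occupancy)}
\end{align}
```

      Under these assumptions, whichever states are counted as signaling competent, their summed occupancy is a function of K alone, and no ligand binding or unbinding rate appears once the receptor is saturated.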

      Reviewer #2 (Recommendations for the authors):

      (1) There is an inconsistency between Figure 2(a) and Equation (1), it suggests that p_N is \omega^N/(\omega+\delta)^N. This makes more sense with the model defined in the supplementary material.

      We thank the reviewer for identifying this error. Equation (1) has been updated to reflect the correct relationship.

      (2) The figure presenting the model of the authors appears to be missing.

      We thank the reviewer for identifying this error; it has been corrected (Figure 3 in the new manuscript).

      (3) The authors describe phosphorylation as irreversible in the intro, but then consider reversible phosphorylation in their model, which may be confusing to readers.

      We thank the reviewer for identifying this source of possible confusion. We have clarified that dephosphorylation is taken to be a distinct irreversible reaction, see lines 105 - 112.

      (4) The authors reuse similar names, e.g., network activity, kinase activity, signaling activity, activity. This is confusing.

      We apologize for the confusion. We note that, within the context of our model, there are important distinctions between signaling activity (the amount of signaling competent receptors) and kinase activity (value corresponding to the phosphorylation rate). We have attempted to use these different terms correctly and are happy to make clarifying corrections if there are any places where a term is misused.  

      (5) Several parameters are defined only in the captions of the figures, such as \beta and \rho.

      We thank the reviewer for identifying this omission; we have added the definitions of \beta and \rho to the main text (see line 129).

      (6) The sentence at line 137 lacks some words: "Below, we kinetic...".

      We thank the reviewer for identifying this error; we have added the missing words (“Below, we show how kinetic…”).

      (7) The sentence at line 183 lacks some words: "When kinase activity...".

      We thank the reviewer for identifying this error. We have now corrected it. 

      (8) Figure 5 is very small.

      We will work with the production team to increase the size of this figure.
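
      The expression quoted in comment (1) of this list, \omega^N/(\omega+\delta)^N, can be evaluated numerically to see how it behaves as N grows. This is a minimal sketch under stated assumptions: \omega and \delta are treated simply as non-negative rates, their meaning in the manuscript is not restated here, and the function name is hypothetical.

```python
def p_n(omega, delta, n_sites):
    """Evaluate omega**N / (omega + delta)**N for N = n_sites.

    omega and delta are treated as non-negative rates (an assumption made
    for illustration; their definitions are given in the manuscript).
    """
    if omega < 0 or delta < 0 or omega + delta == 0:
        raise ValueError("rates must be non-negative and not both zero")
    # Same quantity written as a per-step ratio raised to the N-th power
    return (omega / (omega + delta)) ** n_sites


# Hypothetical rates: the value shrinks geometrically as N increases
for n in (1, 2, 3):
    print(n, p_n(1.0, 1.0, n))
```

      Written this way, each additional site multiplies the value by the same factor \omega/(\omega+\delta).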

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):  

      Summary:

      The manuscript by Cupollilo et al describes the development, characterization, and application of a novel activity labeling system: fast labelling of engram neurons (FLEN). Several such systems already exist but this study adds additional capability by leveraging an activity marker that is destabilized (and thus temporally active) as well as being driven by the full-length promoter of cFos. The authors demonstrate the activity-dependent induction and time course of expression, first in cultured neurons and then in vivo in hippocampal CA3 neurons after one trial of contextual fear conditioning. In a series of ex vivo experiments, the authors perform patch clamp analysis of labeled neurons to determine if these putative engram neurons differ from non-labelled neurons using both the FLEN system as well as the previously characterized RAM system. Interestingly the early labelled neurons at 3 h post CFC (FLEN+) demonstrated no differences in excitability whereas the RAM-labelled neurons at 24h after CFC had increased excitability. Examination of synaptic properties demonstrated an increase in sEPSC and mEPSC frequencies as well as those for sIPSCs and mIPSCs which was not due to a change in the mossy fiber input to these neurons.

      Strengths:

      Overall the data is of high quality and the study introduces a new tool while also reassessing some principles of circuit plasticity in the CA3 that have been the focus of prior studies.

      Weaknesses:

      No major weaknesses were noted.

      Reviewer #2 (Public review): 

      Summary: 

      Cupollilo et al. investigate the properties of hippocampal CA3 neurons that express the immediate early gene cFos in response to a single foot shock. They compare ex-vivo the electrophysiological properties of these "engram neurons" labeled with two different cFos promoter-driven green markers: Their new tool FLEN labels neurons 2-6 h after activity, while RAM contains additional enhancers and peaks considerably later (>24 h). Since the fraction of labeled CA3 cells is comparable with both constructs, it is assumed (but not tested) that they label the same population of activated neurons at different time points. Both FLEN+ and RAM+ neurons in CA3 receive more synaptic inputs compared to non-expressing control neurons, which could be a causal factor for cFos activation, or a very early consequence thereof. Frequency facilitation and E/I ratio of mossy fiber inputs were also tested, but are not different in both cFos+ groups of neurons. One day after foot shock, RAM+ neurons are more excitable than RAM- neurons, suggesting a slow increase in excitability as a major consequence of cFos activation.

      Strengths: 

      The study is conducted to high standards and contributes significantly to our understanding of memory formation and consolidation in the hippocampus. Modifications of intrinsic neuronal properties seem to be more salient than overall changes in the total number of (excitatory and inhibitory) inputs, although a switch in the source of the synaptic inputs would not have been detected by the methods employed in this study.

      Weaknesses: 

      With regard to the new viral tool, a direct comparison between the new tool FLEN and existing cFos reporters is missing. 

      Reviewer #1 (Recommendations for the authors):

      I have only minor suggestions for the authors to consider. 

      (1) In the in vitro characterization, the percentage of labelled neurons seems very low after a powerful and prolonged activation. It was somewhat surprising and raised the question of how accurately the FLEN construct reflects endogenous cFOS activity. Could the authors speak to this?

      The reviewer is correct that the proportion of FLEN positive neurons, relative to mCherry positive neurons, is low compared to studies using viral infection with RAM vectors in neuronal cultures (Sorensen et al, 2016, Sun et al, 2020), in which it is around 70-80% following chemical stimulation. The authors do not, however, provide evidence for a comparison with endogenous c-Fos activity in cell cultures. The reason for the discrepancy in the effect of chemical stimulation of cultured neurons is not clear, but it may depend on culture conditions, which may vary between labs.

      FLEN was constructed using a mouse c-Fos promoter (-355 to +109) (Cen et al, 2003). To answer the reviewer’s question we performed an additional experiment in cultured neurons in which we found that 77.1 % of FLEN positive neurons were also c-fos positive neurons (using immunocytochemistry).

      (2) The authors compare the two labelling strategies and interpret their data with the presumption that both label a similar set of active neurons. This is particularly relevant when they suggest there might be a progressive increase in the excitability of active neurons with time. This is certainly a possibility, but the authors should also consider other possibilities that the two markers might label different populations of neurons. For example, if they require different thresholds for activation, it is possible that one is more sensitive to activity than the other. As these are unknown variables the authors should temper the interpretation accordingly.

      Indeed, the reviewer is correct that this limitation should be discussed. We have added this as a point of discussion in the text (lines 355-358). In the article describing the RAM strategy (Sorensen et al, 2016) the authors use RAM to label DG neurons activated during an experience in a context A (Figure 4). Exploiting the fact that engram cells are re-activated when the animal is re-exposed to the same environment as in training (memory recall), they performed c-Fos staining 90 minutes following either context A or context B re-exposure. The RAM-c-Fos overlap percentage was higher in A-A than in A-B (A-A was a bit more than 20%). This means that RAM has captured a group of cells during training that, at least in part, were re-activated during recall. This could in part support the assumption that RAM and c-Fos share a certain overlap. Of course, this was done in DG, while we worked in CA3. In addition, both strategies label in their great majority c-Fos+ neurons (see above answer to point #1). This cannot completely rule out the possibility that FLEN and RAM label partly distinct populations of activated cells.

      (3) An increase in the frequency of synaptic events is observed in neurons labelled with both markers. The authors propose that this may be due to an increase in synaptic contacts based on prior studies. However, as this is the first functional assessment why not consider changes in release probability as a mechanism for this finding? 

      We have added this as a possibility in the text (lines 362-363).

      (4) It would be useful to include plots of the average frequency of m/sEPSCs and m/sIPSCs in Figures 4 and 5. These figures could also be combined into a single figure.

      We agree with the reviewer that figure 4 and 5 could be merged into a single figure. In the revised version, figure 5A becomes panel C in figure 4. Text and figure descriptions were adjusted accordingly.

      Reviewer #2 (Recommendations for the authors): 

      (1) Abstract, line 24: "In contrast, FLEN+ CA3 neurons show an increased number of excitatory inputs." RAM+ neurons also show an increased number of excitatory inputs, so this is not "in contrast". Also, not just excitatory, but also inhibitory synaptic inputs are more numerous in cFos+ neurons. Please improve the summary of your findings.

      “In contrast” referred to the fact that FLEN+ neurons do not show differences in excitability as compared to FLEN- neurons, as mentioned in the previous sentence. We now provide a more explicit sentence to explain this point: “On the other hand, like RAM+ neurons, FLEN+ CA3 neurons show an increased number of excitatory inputs.”

      (2) Novel tool: Destabilized cFos reporters were introduced 23 years ago and are also part of the TetTag mouse. I am not sure that changing the green fluorescent protein to a different version merits a new acronym (FLEN). To convince the readers that this is more than a branding exercise, the authors should compare the properties (brightness, folding time, stability) of FLEN to e.g. the d2EGFP reporter introduced by Bi et al. 2002 (J Biotechnol. 93(3):231) and show significant improvements.

      We thank the reviewer for this comment, which compelled us to evaluate the features of other tools used to label neurons activated following contextual fear conditioning. The key properties of FLEN as compared to other tools used to label engrams are that: (i) it is a viral tool, as opposed to transgenic mice, (ii) a c-fos promoter drives the expression of a brightly fluorescent protein, allowing labelled neurons to be identified ex vivo for functional analysis, (iii) the fluorescent protein is rapidly destabilized, providing the possibility to label neurons only a few hours after their activation by a behavioural task.

      We did not find any viral tools providing the possibility to label c-fos activated neurons for functional assessment. We have not been able to find references for the use of the d2EGFP reporter introduced by Bi et al. 2002 in a behavioural context. One major difference and improvement is certainly the brightness of ZsGreen. In cell cultures, ZsGreen1 showed an 8.6-fold increase in fluorescence intensity as compared with EGFP (Bell et al, 2007).

      Amongst tools with comparable properties, eSARE was developed based on a synthetic Arc promoter driving the expression of a destabilized GFP (dEGFP) (Kawashima et al 2013). We initially used ESARE–dGFP but unfortunately, in our experimental conditions, we found that the signal-to-noise ratio was not satisfactory (number of cells labelled in the home cage vs. following contextual fear conditioning).

      We developed a viral tool to avoid the use of transgenic reporter lines, which require laborious breeding and are experimentally less flexible. Nevertheless, many transgenic mice based on the expression of fluorescent proteins under the control of IEG promoters have been developed and used. Some of these mice show a time course of expression of the transgene which is comparable to FLEN. For instance, in organotypic slices from Tet-Tag mice, the time course of EGFP expression follows endogenous cFOS expression with a small delay, and starts decaying after 4 hours (Lamothe-Molina et al, 2022). However, the fluorescence was too weak to visualize neurons in the slice (Christine Gee, personal communication), and imaging is performed after immunocytochemistry against GFP.

      Therefore, we feel that the name given to the FLEN strategy is legitimate. The features of the FLEN strategy were summarized in the discussion (Lines 318-322).

      (3) Line 214: "...FLEN+ CA3 PNs do not show differences in [...] patterns of bursting activity as compared to control neurons." It looks quite different to me (Figure 3E). Just because low n precludes meaningful statistical analysis, I would not conclude there is no difference.

      We agree with the reviewer that the data in Figure 3E are not conclusive due to small sample size, which limits the reliability of statistical comparison. Additionally, the classification of bursting neurons is highly dependent on the specific criteria used, which vary considerably across the literature. To avoid overinterpretation or misleading conclusions, we decided to remove the panel E of Figure 3 showing the fraction of bursting neurons. Nevertheless, we draw the attention to the more robust and interpretable results: RAM⁺ neurons exhibit an increase in firing frequency and a distinct action potential discharge pattern, data which we believe are informative of altered excitability.

      (4) Line 304: Remove the time stamp.

      This was done.

      (5) Line 334: "...results may be explained by an overall increased activity of CA1 neurons..." I don't understand - isn't CA1 downstream of CA3? 

      The reviewer is correct that the sentence was misleading. We removed the reference to CA1, as it was more of a general principle about neuronal activity.

      (6) Line 381: "resolutive", better use "sensitive". 

      This was changed.

      (7) Figure S3: Fear-conditioned animals were 3 days off Dox, controls only 2 days. As RAM expression accumulates over time off Dox, this is not a fair comparison.

      We thank the reviewer for pointing out the incorrect reporting of the experimental design in Figure S3 panel A (bottom), which could lead to misinterpretation of results. In fact, the two groups of mice (CFC vs. HC) underwent all experimental steps in parallel. Specifically, both groups were maintained on and off Doxycycline for the same duration and received viral injection on the same day. 48 hours after Dox withdrawal, the CFC group was trained for contextual conditioning, while the HC group remained in the home cage in the holding room. All animals were thus sacrificed 72 hours after Dox removal. We have corrected the figure to accurately reflect this timeline.

      (8) Please provide sequence information for c-cFos-ZsGreen1-DR. Which regulatory elements of the cFos promoter are included, is the 5' NTR included? This information is very important.

      The information is now provided in the Methods section.

      (9) Please provide the temperature during pharmacological treatments (TTX etc.) before fixation.

      The pharmacological treatment was performed in the incubator at 37°C, this is now indicated in the methods.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Major comments:

      (1) Interpretation of key results and relationship between different parts of the manuscript. The manuscript begins with an information-transmission ansatz which is described as “independent of the computational goal” (e.g. p. 17). While information theory indeed is not concerned with what quantity is being encoded (e.g. whether it is sensory periphery or hippocampus), the goal of the studied system is to *transmit* the largest amount of bits about the input in the presence of noise. In my view, this does not make the proposed framework “independent of the computational goal”. Furthermore, the derived theory is then applied to a DDC model which proposes a very specific solution to inference problems. The relationship between information transmission and inference is deep and nuanced. Because the writing is very dense, it is quite hard to understand how the information transmission framework developed in the first part applies to the inference problem. How does the neural coding diagram in Figure 3 map onto the inference diagram in Figure 10? How does the problem of information transmission under constraints from the first part of the manuscript become an inference problem with DDCs? I am certain that authors have good answers to these questions - but they should be explained much better.

      We are very thankful to the reviewer for highlighting the potential confusion surrounding these issues, in particular the relationship between the two halves of the paper – which was previously exacerbated by the length of the paper. We have now added further explanations at different points within the manuscript to better disentangle these issues and clarify our key assumptions. We have also significantly cut the length of the paper by moving more technical discussions to the Methods or Appendices. We will summarise these changes here and also clarify the rationale for our approach and point out potential disagreements with the reviewer.

      Key to our approach is that we indeed do not assume the entire goal of the studied neural system (whether part of the sensory system or not) is to transmit the largest amount of information about the stimulus input (in the presence of noise). In fact, general computations, including the inference of latent causes of inputs, often require filtering out or ignoring some information in the sensory input. It is thus not plausible that tuning curves in general (i.e. in an arbitrary part of the nervous system) are optimised solely with regard to the criterion of information transmission. Accordingly we do not assume they are entirely optimised for that purpose. However, we do make a key assumption or hypothesis (which like any hypothesis might turn out to be partly or entirely wrong): that (1) a minimal feature of the tuning curve (its scale or gain) is entirely free to be optimised for the aim of information transmission (or more precisely the goal of combating the detrimental effect of neural noise on coding fidelity), (2) other aspects of the population tuning curve structure (i.e. the shape of individual tuning curves and their arrangement across the population) are determined by (other) computational goals beyond efficient coding. (Conceptually, this is akin to the modularization between indispensable error correction and general computations in a digital computer, and the need for the former to be performed in a manner that is agnostic as to the computations performed.) We have added two paragraphs in the manuscript which present the above rationale and our key hypothesis or assumption. The first of these was added to the (second paragraph of the) Introduction section, and the second is a new paragraph following Eq. 1 (which is about the gain-shape decomposition of the tuning curves, and the optimisation of the former based on efficient coding) of Results.

      Our paper can be divided into two parts. In the first part, we develop a general, computationally agnostic (in the above sense, just as in the digital computer example), efficient coding theory. In the second part, we apply that theory to a specific form of computation, namely the DDC framework for Bayesian inference. The latter theory now determines the tuning curve shapes. When combined with the results of the first part (which dictate the tuning curve scale or gain according to efficient coding theory), this “homeostatic DDC” model makes full predictions for the tuning curves (i.e., both scale and shape) and how they should adapt to stimulus statistics.
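
      The division of labour described above (shape fixed by the computation, gain free to adapt) can be illustrated with a minimal numerical sketch. This is not the code or the optimisation used in the manuscript: the Gaussian shapes, the grid of stimuli, the target rate of 5 Hz and the helper names `gaussian_shapes` and `homeostatic_gains` are all assumptions made for illustration, and the gains are simply set to enforce firing rate homeostasis rather than derived from the efficient coding objective.

```python
import numpy as np

def gaussian_shapes(stimuli, centres, width):
    """Unit-peak Gaussian tuning-curve shapes, one row per neuron (illustrative choice)."""
    diff = stimuli[None, :] - centres[:, None]
    return np.exp(-0.5 * (diff / width) ** 2)

def homeostatic_gains(shapes, stimulus_probs, target_rate):
    """Gains chosen so that each neuron's stimulus-averaged rate equals target_rate."""
    mean_shape = shapes @ stimulus_probs  # average of each shape under the stimulus distribution
    return target_rate / mean_shape

stimuli = np.linspace(-3.0, 3.0, 61)
centres = np.linspace(-2.0, 2.0, 5)          # hypothetical preferred stimuli
shapes = gaussian_shapes(stimuli, centres, width=0.5)

# Two hypothetical environments: uniform, and one concentrated near s = 0.
uniform = np.full(stimuli.size, 1.0 / stimuli.size)
peaked = np.exp(-0.5 * (stimuli / 0.7) ** 2)
peaked /= peaked.sum()

gains_uniform = homeostatic_gains(shapes, uniform, target_rate=5.0)
gains_peaked = homeostatic_gains(shapes, peaked, target_rate=5.0)

# Full tuning curve = gain x shape; mean rates are held fixed by construction.
tuning_peaked = gains_peaked[:, None] * shapes
mean_rates_peaked = tuning_peaked @ peaked
```

      In this toy setting the neuron tuned to the over-represented stimulus (centre at 0) ends up with a lower gain in the peaked environment than in the uniform one, while the shapes themselves are untouched.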

      So to summarise, it is not the case that the problem of information transmission (or rather mitigating the effect of noise on coding fidelity under metabolic constraints), dealt with in the first part, has become a problem of Bayesian inference. Rather, the dictates of efficient coding for optimal gains for coding fidelity (under constraints) have been applied to and combined with a computational theory of inference.

      We have added new expository text before and after Eq. 17 in Sec. 2.7 (at the beginning of the second part of the paper on homeostatic DDCs) to again make the connection with the first part and the rationale for its combination with the original DDC framework more clear.

      With the changes outlined above, we believe and hope the connection between the two parts (which we agree with the reviewer, was indeed rather obscure previously) has been adequately clarified.

      (2) Clarity of writing for an interdisciplinary audience. I do not believe that in its current form, the manuscript is accessible to a broader, interdisciplinary audience such as eLife readers. The writing is very dense and technical, which I believe unnecessarily obscures the key results of this study.

      We thank the reviewer for this comment. We have taken several steps to improve the accessibility of this work for an interdisciplinary audience. Firstly, several sections containing dense, mathematical writing have now been moved out of the main text into appendices or the Methods section; in their place we have made efforts to convey the core of the results, and to provide intuitions, without going into unnecessary technical detail. Secondly, we have added additional figures to help illustrate key concepts or assumptions (see Fig. 1B clarifying the conceptual approach to efficient coding and homeostatic adaptation, and Fig. 8A describing the clustered population). Lastly, we have made sure to refer back to the names of symbols more often, so as to make the analysis easier to follow for a reader with an experimental background.

      (3) Positioning within the context of the field and relationship to prior work. While the proposed theory is interesting and timely, the manuscript omits multiple closely related results which in my view should be discussed in relationship to the current work. In particular, a number of recent studies propose normative criteria for gain modulation in populations:

      • Duong, L., Simoncelli, E., Chklovskii, D. and Lipshutz, D., 2024. Adaptive whitening with fast gain modulation and slow synaptic plasticity. Advances in Neural Information Processing Systems

      • Tring, E., Dipoppa, M. and Ringach, D.L., 2023. A power law describes the magnitude of adaptation in neural populations of primary visual cortex. Nature Communications, 14(1), p.8366.

      • Młynarski, W. and Tkačik, G., 2022. Efficient coding theory of dynamic attentional modulation. PLoS Biology

      • Haimerl, C., Ruff, D.A., Cohen, M.R., Savin, C. and Simoncelli, E.P., 2023. Targeted V1 co-modulation supports task-adaptive sensory decisions. Nature Communications

      The Ganguli and Simoncelli framework has been extended to a multivariate case and analyzed for a generalized class of error measures:

      • Yerxa, T.E., Kee, E., DeWeese, M.R. and Cooper, E.A., 2020. Efficient sensory coding of multidimensional stimuli. PLoS Computational Biology

      • Wang, Z., Stocker, A.A. and Lee, D.D., 2016. Efficient neural codes that minimize LP reconstruction error. Neural Computation, 28(12),

      We thank the reviewer again for bringing these works to our attention. For each, we explain whether we chose to include them in our Discussion section, and why.

      (1) Duong et al. (2024): We decided not to discuss this manuscript, as our assessment is that it is not very relevant to our work. That study starts with the assumption that the goal of the sensory system under study is to whiten the signal covariance matrix, which is not the assumption we start with. A mechanistic ingredient (but not the only one) in their approach is gain modulation. However, in their case it is the gains of computationally auxiliary inhibitory neurons that are modulated and not (as in our case) the gains of the (excitatory) coding neurons (i.e. those which encode information about the stimulus and whose response covariance is whitened). These key distinctions make the connection with our work quite loose, and we therefore did not discuss this work.

      (2) Tring et al. (2023): We have added a discussion of the results of this paper and its relationship to the results of our work and that of Benucci et al. This appears in the 7th paragraph of the Discussion. This study is indeed highly relevant to our paper, as it essentially replicates the Benucci et al. experiment, this time in awake mice (rather than anesthetised cats). However, in contrast to the results of Benucci et al., Tring et al. do not find firing rate homeostasis in mouse V1. A second, remarkable finding of Tring et al. is that adaptation mainly changes the scale of the population response vector, and only minimally affects its direction. While Tring et al. do not portray it as such, this behaviour amounts to pure stimulus-specific adaptation without the neuron-specific factor found in the Benucci et al. results (see Eq. 24 of our manuscript). As we discuss in our manuscript, when our homeostatic DDC model is based on an ideal-observer generative model, it also displays pure stimulus-specific adaptation with no neuronal factor. Our final model for Benucci’s data did contain a neural factor, because we used a non-ideal observer DDC (in particular, we assumed a smoother prior distribution over orientations compared to the distribution used in the experiment - which has a very sharp peak – as it is more natural given the inductive biases we expect in the brain). The resultant neural factor suppresses the tuning curves tuned to the adaptor stimulus. Interestingly, when gain adaptation is incomplete, and happens to a weaker degree compared to what is necessary for firing rate homeostasis, an additional neural factor emerges that is greater than one for neurons tuned to the adaptor stimulus. These two multiplicative neural factors can approximately cancel each other; such a theory would thus predict both deviation from homeostasis and approximately pure stimulus-specific adaptation. We plan to explore this possibility in future work.
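
      The distinction between pure stimulus-specific adaptation and adaptation with an additional neuron-specific factor can be made concrete with a small sketch. It assumes a simplified multiplicative form (adapted response = neuron factor × stimulus factor × unadapted response); the random rates, the factor ranges and the helper `cosine_similarity` are hypothetical and are not taken from either dataset or from the manuscript's Eq. 24.

```python
import numpy as np

def cosine_similarity(u, v):
    """Cosine of the angle between two population response vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

rng = np.random.default_rng(0)
n_neurons, n_stimuli = 50, 8

# Hypothetical unadapted responses: rows are neurons, columns are stimuli.
unadapted = rng.uniform(1.0, 10.0, size=(n_neurons, n_stimuli))

stimulus_factor = rng.uniform(0.4, 1.0, size=n_stimuli)  # one scalar per stimulus
neuron_factor = rng.uniform(0.4, 1.0, size=n_neurons)    # one scalar per neuron

# Pure stimulus-specific adaptation: each population vector is only rescaled.
pure_stimulus = unadapted * stimulus_factor[None, :]
# Adding a neuron-specific factor reweights neurons and so rotates the vector.
with_neuron = neuron_factor[:, None] * unadapted * stimulus_factor[None, :]

cos_pure = [cosine_similarity(unadapted[:, j], pure_stimulus[:, j])
            for j in range(n_stimuli)]
cos_mixed = [cosine_similarity(unadapted[:, j], with_neuron[:, j])
             for j in range(n_stimuli)]
```

      Under these assumptions the stimulus factor alone changes only the length of each population response vector (cosine similarity of one), whereas any non-constant neuron factor also changes its direction.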

      (3) Młynarski and Tkačik (2022): We are now citing and discussing this work in the Discussion (penultimate paragraph), in the context of a possible future direction, namely extending our framework to cover the dynamics of adaptation (via a dynamic efficient gain modulation and dynamic inference). We have noted there that Młynarski and Tkačik have used such a framework (which, while similar, has key technical differences with our approach) based on a task-dependent efficient coding objective to model top-down attentional modulation. By contrast, we have studied bottom-up and task-independent adaptation, and it would be interesting to extend our framework and develop a model to make predictions for the temporal dynamics of such adaptation.

      (4) Haimerl et al. (2023): We have elected not to include this work within our discussion either, as we do not believe it is sufficiently relevant to our work to warrant inclusion. Although this paper also considers gain modulation of neural activity, the setting and the aims of the theoretical work and the empirical phenomena it is applied to are very different from our case in various ways. Most importantly, this paper is not offering a normative account of gain modulation; rather, gain modulation is used as a mechanism for enabling fast adaptive readouts of task relevant information.

      (5) Yerxa et al. (2020): We have now included a discussion of this paper in our Discussion section. Note that, even though this study generalises the Ganguli and Simoncelli framework to higher dimensions, just like that paper it still places strict requirements (which are arguably even more stringent in higher dimensions) on the form of the tuning curves in the population, viz. that there exists a differentiable transform of the stimulus space which renders these unimodal curves completely homogeneous (i.e., of the same shape, and placed regularly and with uniform density).

      (6) Wang et al. (2016): We have included this paper in our discussion as well. As above, this paper does not consider general tuning curves, and places the same constraint on their shape and arrangement as in the Ganguli and Simoncelli paper.

      More detailed comments and feedback:

      (1) I believe that this work offers the possibility to address an important question about novelty responses in the cortex (e.g. Homann et al, 2021 PNAS). Are they encoding novelty per-se, or are they inefficient responses of a not-yet-adapted population? Perhaps it’s worth speculating about.

      We are not sure why the relatively large responses to “novel” or odd-ball stimuli should be considered inefficient or unadapted: in the context in which those stimuli are infrequent odd-balls (and thus novel or surprising when occurring), efficient coding theory would indeed typically predict a large response compared to the (relatively suppressed) responses to frequently occurring stimuli. Of course, if the statistics change and the odd-ball stimulus now becomes frequent, adaptation should occur and would be expected to suppress responses to this stimulus. As to the question of whether (large) responses to infrequent stimuli can or should be characterised as novelty responses: this is partly an interpretational or semantic issue – unless it is grounded in knowledge of how downstream populations use this type of coding in V1, which could then provide a basis for solidly linking them to detection of novelty per se. In short, our theory could be applied to Homann et al.’s data, but we consider that beyond the scope of the current paper.

      (2) Clustering in populations - typically in efficient coding studies, tuning curve distributions are a consequence of input statistics, constraints, and optimality criteria. Here the authors introduce randomly perturbed curves for each cluster - how to interpret that in light of the efficient coding theory? This links to a more general aspect of this work - it does not specify how to find optimal tuning curves, just how to modulate them (already addressed in the discussion).

      We begin by addressing the reviewer’s more general concern regarding the fact that our theory does not address the problem of finding optimal tuning curves, only that of modulating them optimally. As we expound within the updated version of the paper (see the newly expanded 3rd paragraph in Sec. 2.1 and the expanded 2nd paragraph in Introduction), it is not plausible that the sole function of sensory systems, and neural circuits more generally, is the transmission of information. There are many other computational tasks which must be performed by the system, such as the inference of the latent causes of sensory inputs. For many such tasks, it is not even desirable to have complete transmission of information about the external stimulus, since a substantial portion of that information is not important for the task at hand, and must be discarded. For example, such discarding of information is the basis of invariant representations that occur, e.g., in higher visual areas. So we recognise that tuning curve shapes are in general dictated and shaped by computational goals beyond transmission of information or error correction. As such, we have remained agnostic as to the computational goals of neural systems and therefore the shape of the tuning curve. We have made the assumption and adopted the postulate that those computational goals determine the shape of the tuning curves, leaving the gains to be adjusted freely for the purpose of mitigating the effect of noise on coding fidelity (this is similar to how error correction is done in computers independently of the computations performed). By assuming that those computational goals are captured adequately by the shape of tuning curves, we are left free to optimise the gains of those curves for purely information theoretic objectives. Finally, we note that the case where the tuning curve shapes are additionally optimised for information transmission is a special case of our more general approach. For further discussion, see the updated version of our introduction.

      We now turn to our choice to model clusters using random perturbations. This is, of course, a toy model for clustering tuning curves within a population. With this toy model we are attempting to capture the important aspects of tuning curve clusters within the population while not over-complicating the simulations. Within any neural population, there will be tuning curves that are similar; however, such curves will inevitably be heterogeneous, as opposed to completely identical. Thus, when we cluster together similar curves there will be an “average” cluster tuning curve (found by, e.g., normalising all individual curves and taking the average), which all other tuning curves within the cluster are deviations from. The random perturbations we apply are our attempt to capture these deviations. However, note that the perturbations are not fully random, but instead have an “effective dimensionality” which we vary over. By giving the perturbations an effective dimensionality, we aim to capture the fact that deviations from the average cluster tuning curve may not be fully random, and may display some structure.
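
      As a rough illustration of what perturbations with a limited effective dimensionality can look like, the following sketch builds a cluster of curves as a common average shape plus small deviations confined to a few directions in stimulus space. It is a simplified stand-in rather than the construction used in our simulations: the average shape, the amplitude 0.02 and the variable names (`effective_dim`, `basis`, `coeffs`) are assumptions made for this example.

```python
import numpy as np

rng = np.random.default_rng(2)
n_stimuli, cluster_size, effective_dim = 100, 12, 3

stimuli = np.linspace(-1.0, 1.0, n_stimuli)
# Hypothetical cluster-average tuning-curve shape (Gaussian bump on a baseline).
mean_curve = 0.2 + np.exp(-0.5 * (stimuli / 0.3) ** 2)

# Orthonormal perturbation directions in stimulus space (rows of `basis`).
q, _ = np.linalg.qr(rng.standard_normal((n_stimuli, effective_dim)))
basis = q.T                                   # shape (effective_dim, n_stimuli)

# Each neuron in the cluster mixes the same few directions with random weights.
coeffs = 0.02 * rng.standard_normal((cluster_size, effective_dim))
perturbations = coeffs @ basis                # rank at most effective_dim

cluster_curves = mean_curve[None, :] + perturbations
perturbation_rank = np.linalg.matrix_rank(perturbations)
```

      Varying `effective_dim` between 1 and the cluster size moves the deviations from fully structured (all neurons deviate along one direction) towards unstructured.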

      (3) Figure 8 - where do Hz come from as physical units? As I understand there are no physical units in simulations.

      We have clarified this within the figure caption. The within-cluster optimisation problem requires maximising a quadratic program subject to a constraint on the total mean spike count of the cluster. The objective for the quadratic program is however mathematically homogeneous. So we can scale the variables and parameters in a consistent way to be in units of Hz – i.e., turn them into mean firing rates, instead of mean spike counts, with an assumption on the length of the coding time interval. We fix this cluster firing rate to be k × 5 Hz, so that the average single-neuron firing rate is 5 Hz (based on empirical estimates – see our Sec. 2.5). This agrees with our choice of µ in our simulations (i.e., µ = 10) if we assume a coding interval of 0.1 seconds.

      (4) Inference with DDCs in changing environments. To perform efficient inference in a dynamically changing environment (as considered here), an ideal observer needs some form of posterior-prior updating. Where does that enter here?

      A shortcoming of our theory, in its current form, is that it applies only to the system in “steady-state”, without specifying the dynamics of how adaptation temporally evolves (we assume the environment has periods of relative stability that are of relatively long duration compared to the dynamical timescales of adaptation, and consider the properties of the well-adapted steady state population). Thus our efficient coding theory (which predicts homeostatic adaptation under the outlined conditions) is silent on the time-course over which homeostasis occurs. Likewise, the DDC theory (in its original formulation in Vertes & Sahani) is silent on dynamic updating of posteriors and considers only static inference with a fixed internal model. We now discuss a new future direction in the Discussion (where we cite the work of Młynarski and Tkačik) to point out that our theory can in principle be extended (based on dynamic inference and efficient coding) to account for the dynamics of attention, but this is beyond the scope of the current work.

      (5) Page 6 - ”We did this in such a way that, for all , the correlation matrices, (), were derived from covariance matrices with a 1/n power-law eigenspectrum (i.e., the ranked eigenvalues of the covariance matrix fall off inversely with their rank), in line with the findings of Stringer et al. (2019) in the primary visual cortex.” This is a very specific assumption, taken from a study of a specific brain region - how does it relate to the generality of the approach?

      Our efficient coding framework has been formulated without relying on any specific assumptions about the form of the (signal or noise) correlation matrices in cortex. The homeostatic solution to this efficient coding problem, however, emerges under certain conditions. But, as we demonstrate in our discussion of the analytic solutions to our efficient coding objective and the conditions necessary for the validity of the homeostatic solution, we expect homeostasis to arise whenever the signal geometry is sufficiently high-dimensional (among other conditions). By this we mean that the fall-off of the eigenvalues of the signal correlation matrix must be sufficiently slow. Thus, a fall-off in the eigenvalue spectrum slower than 1/n would favor homeostasis even more than our results. If the fall-off were faster, then whether or not (and to what degree) firing rate homeostasis becomes suboptimal depends on factors such as the steepness of the fall-off and also the size of the population. Thus (1) rate homeostasis does not require the specific 1/n spectrum, but that spectrum is consistent with the conditions for optimality of rate homeostasis, (2) in our simulations we had to make a specific choice, and relying on empirical observations in V1 was of course a well-justified choice (moreover, as far as we are aware, there have been no other studies that have characterised the spectrum of the signal covariance matrix in response to natural stimuli, based on large population recordings).
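
      To make the notion of a slow versus fast eigenvalue fall-off concrete, the sketch below constructs covariance matrices with a power-law eigenspectrum, converts one to a correlation matrix, and compares a standard summary of dimensionality. It is illustrative only: the random eigenvectors, the population size of 200, and the use of the participation ratio as a dimensionality measure are our assumptions for this example, not the procedure or the criterion used in the manuscript.

```python
import numpy as np

def power_law_covariance(n, exponent, rng):
    """Covariance with eigenvalues 1/rank**exponent and random orthonormal eigenvectors."""
    eigenvalues = 1.0 / np.arange(1, n + 1) ** exponent
    q, _ = np.linalg.qr(rng.standard_normal((n, n)))
    return (q * eigenvalues) @ q.T            # Q diag(eigenvalues) Q^T

def to_correlation(cov):
    """Rescale a covariance matrix so that its diagonal is one."""
    std = np.sqrt(np.diag(cov))
    return cov / np.outer(std, std)

def participation_ratio(matrix):
    """(sum of eigenvalues)^2 / (sum of squared eigenvalues): a dimensionality summary."""
    ev = np.linalg.eigvalsh(matrix)
    return ev.sum() ** 2 / (ev ** 2).sum()

rng = np.random.default_rng(1)
n = 200
cov_slow = power_law_covariance(n, 1.0, rng)   # 1/n spectrum
cov_fast = power_law_covariance(n, 2.0, rng)   # steeper 1/n^2 spectrum

corr_slow = to_correlation(cov_slow)
pr_slow = participation_ratio(cov_slow)
pr_fast = participation_ratio(cov_fast)
```

      With these choices the 1/n spectrum yields a markedly larger participation ratio than the steeper spectrum, which is the sense in which a slower fall-off corresponds to a higher-dimensional signal geometry.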

      Reviewer #2 (Public Review):

      Strengths:

      The problem of efficient coding is a long-standing and important one. This manuscript contributes to that field by proposing a theory of efficient coding through gain adjustments, independent of the computational goals of the system. The main result is a normative explanation for firing rate homeostasis at the level of neural clusters (groups of neurons that perform a similar computation) with firing rate heterogeneity within each cluster. Both phenomena are widely observed, and reconciling them under one theory is important.

      The mathematical derivations are thorough as far as I can tell. Although the model of neural activity is artificial, the authors make sure to include many aspects of cortical physiology, while also keeping the models quite general.

      Section 2.5 derives the conditions in which homeostasis would be near-optimal in the cortex, which appear to be consistent with many empirical observations in V1. This indicates that homeostasis in V1 might be indeed close to the optimal solution to code efficiently in the face of noise.

      The application to the data of Benucci et al 2013 is the first to offer a normative explanation of stimulus-specific and neuron-specific adaptation in V1.

      We thank the reviewer for these assessments.

      Weaknesses:

      The novelty and significance of the work are not presented clearly. The relation to other theoretical work, particularly Ganguli and Simoncelli and other efficient coding theories, is explained in the Discussion but perhaps would be better placed in the Introduction, to motivate some of the many choices of the mathematical models used here.

      We thank the reviewer for this comment; we have updated our introduction to make clearer the relationship between this work and previous works within efficient coding theory. Please see the expanded 2nd paragraph of Introduction which gives a short account of previous efficient coding theories and now situates our work and differentiates it more clearly from past work.

      The manuscript is very hard to read as is, it almost feels like this could be two different papers. The first half seems like a standalone document, detailing the general theory with interesting results on homeostasis and optimal coding. The second half, from Section 2.7 on, presents a series of specific applications that appear somewhat disconnected, are not very clearly motivated nor pursued in-depth, and require ad-hoc assumptions.

      We thank the reviewer for this suggestion. The reviewer is right to note that our paper contains both the exposition of a general efficient coding theory framework in addition to applications of that framework. Following your advice we have implemented the following changes. (1) significantly shortened or entirely moved some of the less central results in the second half of Results, to the Methods or appendices (this includes the entire former section 2.7 and significant shortening of the section on implementation of Bayes ratio coding by divisive normalisation). (2) We have added a new figure (Fig 1B) and two long pieces of text to the (2nd paragraph of) Introduction, after Eq. (1), and in Sec. 2.7 (introducing homeostatic DDCs) to more clearly explain and clarify the assumptions underlying our efficient coding theory, and its connection with the second half of the Results (i.e. application to DDC theory of Bayesian inference), and better motivate why we consider the homeostatic DDC.

      For instance, it is unclear if the main significant finding is the role of homeostasis in the general theory or the demonstration that homeostatic DDC with Bayes Ratio coding captures V1 adaptation phenomena. It would be helpful to clarify if this is being proposed as a new/better computational model of V1 compared to other existing models.

      We see the central contribution of our work as not just that homeostasis arises as a result of an efficient coding objective, but also that this homeostasis is sufficient to explain V1 adaptation phenomena - in particular, stimulus specific adaptation (SSA) - when paired with an existing theory of neural representation, the DDC (itself applied to orientation coding in V1). Homeostatic adaptation alone does not explain SSA; nor do DDCs. However, when the two are combined they provide an explanation for SSA. This finding is significant, as it unifies two forms of adaptation (SSA and homeostatic adaptation) whose relationship was not previously appreciated. Our field does not currently have a standard model of V1, and we do not claim to have provided one either; rather, different models have captured different phenomena in V1, and we have done so for homeostatic SSA in V1.

      Early on in the manuscript (Section 2.1), the theory is presented as general in terms of the stimulus dimensionality and brain area, but then it is only demonstrated for orientation coding in V1.

      The efficient coding theory developed in Section 2 is indeed general throughout: we make no assumptions regarding the shape of the tuning curves or the dimensionality of the stimulus. Further, our demonstrations of the efficient coding theory through numerical simulations make assumptions only about the form of the signal and noise covariance matrices. When we later turn our attention away from the general case, our choice to focus on orientation coding in V1 was motivated by empirical results demonstrating a co-occurrence of neural homeostasis and stimulus specific adaptation in V1.

      The manuscript relies on a specific response noise model, with arbitrary tuning curves. Using a population model with arbitrary tuning curves and noise covariance matrix, as the basis for a study of coding optimality, is problematic because not all combinations of tuning curves and covariances are achievable by neural circuits (e.g. https://pubmed.ncbi.nlm.nih.gov/27145916/ )

      First, to clarify, our theory allows for complete generality of neural tuning curve shapes, and assumes a broad family of noise models (which, while not completely arbitrary, includes cases of biological relevance and/or models commonly used in the theoretical literature). Within this class of noise covariance models, we have shown numerical results for different values of the model parameters, but more importantly, have analytically outlined the general properties and requirements on noise strength and structure (and its relationship to tuning curves and signal structure) under which homeostatic adaptation would be optimal. Regarding the point that not all combinations of tuning curves and noise covariances occur in biology or are achievable by neural circuits: (1) If we are guessing correctly the specific point of the reviewer’s reference to the review paper by Kohn et al. 2016, we have in fact prominently discussed the case of information-limiting noise, which corresponds to a specific relationship between signal structure (as determined by tuning curves) and noise structure (as specified by the noise covariance matrix). Our family of noise models includes that biologically relevant case and we have indeed paid it particular attention in our simulations and discussions (see discussion of Fig. 7 in Sec. 2.3, and that of aligned noise in Sec. 2.5). (2) As for the more general or abstract point that not all combinations of noise covariance and tuning curve structures are achievable by neural circuits, we can make the following comments. First, in lieu of a full theoretical or empirical understanding of the achievable combinations (which does not exist), we have outlined conditions for homeostatic adaptations under a broad class of noise models and arbitrary tuning curves. If some combinations within this class are not realised in biology, that does not invalidate the theoretical results, as the latter have been derived under more general conditions, which nevertheless include combinations that do occur in biology and are achievable by neural circuits (which, as pointed out, include the important case of aligned noise and signal structure – as reviewed in Kohn et al. – to which we have paid particular attention).

      The paper Benucci et al 2013 shows that homeostasis holds for some stimulus distributions, but not others i.e. when the ’adapter’ is present too often. This manuscript, like the Benucci paper, discards those datasets. But from a theoretical standpoint, it seems important to consider why that would be the case, and if it can be predicted by the theory proposed here.

      The theory we provide predicts that, under certain (specified) conditions, we ought to see deviation from exact homeostatic results; indeed, we provide a first order approximation to the optimal gains in this case which quantifies such deviations when they are small. Unfortunately, the form of this deviation depends on a precise choice of stimulus statistics (e.g. the signal correlation matrix, the noise correlation matrix averaged over all stimulus space, and other stimulus statistics), in contrast to the universality of the homeostatic solution when it is a valid approximation. In our model of Benucci et al.’s experiment, we restrict to a simple one-dimensional stimulus space (corresponding to oriented gratings), without specifying neural responses to all stimuli; as such, we are not immediately able to make predictions about whether the homeostatic failure can be predicted using the specific form of deviation from homeostasis. However, we acknowledge that this is a weakness of our analysis, and that a more complete investigation would address this question. For reasons of space, we elected not to pursue this further. We have added a paragraph to our Discussion (8th paragraph) explaining this.

      Reviewer#1 (Recommendations for the authors):

      (1) To make the article more accessible I would suggest the following:

      (a) Include a few more illustrations or diagrams that demonstrate key concepts: adaptation of an entire population, clustering within a population, different sources of noise, inference with homeostatic DDCs, etc.

      We thank the reviewer for this suggestion - we have added an additional figure panel (Figure 8, Panel A) to explain the concept of clustering within a population. We also added a new panel to Figure 1 (Figure 1B) which we hope will clarify the conceptual postulate underlying our efficient coding framework and its link to the second half of the paper.

      (b) Within the text refer to names of quantities much more often, rather than relying only on mathematical symbols (e.g. w,r,Ω, etc).

      We thank the reviewer for the suggestion; we have updated the text accordingly and believe this has improved the clarity of the exposition.

      (2) It is hard to distill which components of the considered theory are crucial to reproducing the experimental observations in Figure 12. Is it the homeostatic modulation, efficient coding, DDCs, or any combination of those or all of them necessary to reproduce the experiment? I believe this could be explained much better, also with an audience of experimentalists in mind.

      We have updated the text to provide additional clarity on this matter (see the pointers to these changes and additions in the revised manuscript, given above in response to your first comment). In particular, reproducing the experimental results requires combining DDCs with homeostatic modulation – with the latter a consequence of our efficient coding theory, and not an independent ingredient or assumption.

      (3) It would be good to comment on how sensitive the results are to the assumptions made, parameter values, etc. For example: do conclusions depend on statistics of neural responses in simulated environments? Do they generalize for different values of the constraint µ? This could be addressed in the discussion / supplementary material.

      This issue is already discussed extensively within the text - see Sec. 2.4, Analytical insight on the optimality of homeostasis, and Sec. 2.5, Conditions for the validity of the homeostatic solution to hold in cortex. In these sections, we outline that - provided a certain parameter combination is small - we expect the homeostatic result to hold. Accordingly, we anticipate that our numerical results will generalise to any settings in which that parameter combination remains small.

      (4) How many neurons/units were used for simulations?

      We apologise for omitting this detail; we used 10,000 units for our simulations. We have edited both the main text and the methods section to reflect this.

      (5) Typos etc: a) Figure 5 caption - the order of panels B and C is switched. b) Figure 6A - I suggest adding a colorbar.

      Thank you. We have relabelled the panels B and C in the appropriate figures so that the ordering in the figure caption is correct. We feel that a colourbar in figure 6A would be unnecessary, since we are only trying to convey the concept of uniform correlations, rather than any particular value for the correlations; as such we have elected not to add a colourbar. We have, however, added a more explicit explanation of this cartoon matrix in the figure caption, by referring to the colours of diagonal vs off-diagonal elements.

      Reviewer#2 (Recommendations for the authors):

      The text on page 10, with the perturbation analysis, could be moved to a supplement, leaving here only the intuition.

      We thank the reviewer for this suggestion; we have moved much of the argument into the appendix so as to not distract the reader with unnecessary technical details.

      Text before eq. 12 “...in cluster a maximize the objective...” should be ‘minimize’?

      The cluster objective as written is indeed maximised, as stated in the text. Note that, in the revised manuscript, this argument has been moved to an appendix to reduce the density of mathematics in the main text.

      Top of page 25 “S<sub>0</sub> and S<sub>0</sub>” should be “S<sub>0</sub> and S<sub>1</sub>”?

      Thank you, we have corrected the manuscript accordingly.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review): 

      In this manuscript, Chen et al. investigate the role of the membrane estrogen receptor GPR30 in spinal mechanisms of neuropathic pain. Using a wide variety of techniques, they first provide convincing evidence that GPR30 expression is restricted to neurons within the spinal cord, and that GPR30 neurons are well-positioned to receive descending input from the primary sensory cortex (S1). In addition, the authors put their findings in the context of the previous knowledge in the field, presenting evidence demonstrating that GPR30 is expressed in the majority of CCK-expressing spinal neurons. Overall, this manuscript furthers our understanding of neural circuitry that underlies neuropathic pain and will be of broad interest to neuroscientists, especially those interested in somatosensation. Nevertheless, the manuscript would be strengthened by additional analyses and clarification of data that is currently presented.

      Strengths: 

      The authors present convincing evidence for the expression of GPR30 in the spinal cord that is specific to spinal neurons. Similarly, complementary approaches including pharmacological inhibition and knockdown of GPR30 are used to demonstrate the role of the receptor in driving nerve injury-induced pain in rodent models. 

      Weaknesses: 

      Although steps were taken to put their data into the broader context of what is already known about the spinal circuitry of pain, more considerations and analyses would help the authors better achieve their goal. For instance, to determine whether GPR30 is expressed in excitatory or inhibitory neurons, more selective markers for these subtypes should be used over CamK2. Moreover, quantitative analysis of the extent of overlap between GPR30+ and CCK+ spinal neurons is needed to understand the potential heterogeneity of the GPR30 spinal neuron population, and to interpret experiments characterizing descending S1 inputs onto GPR30 and CCK spinal neurons. Filling these gaps in knowledge would make their findings more solid.

      Thank you very much for your constructive feedback.

      In response to your suggestion, we have used more specific markers to distinguish excitatory (VGLUT2) and inhibitory (VGAT) neurons via in situ hybridization. These analyses revealed that GPR30 is predominantly expressed in excitatory neurons of the superficial dorsal horn (SDH), as presented in the Results section (lines 117-120) and in Figure 2A-B.

      Additionally, we performed a quantitative analysis to determine the extent of co-localization between GPR30+ and CCK+ neurons. The data were included in the Results (lines 131–132) and Figure 2G.

      Reviewer #2 (Public review):

      Using a variety of experimental manipulations, the authors show that the membrane estrogen receptor G protein-coupled estrogen receptor (GPER/GPR30) expressed in CCK+ excitatory spinal interneurons plays a major role in the pain symptoms observed in the chronic constriction injury (CCI) model of neuropathic pain. Intrathecal application of selective GPR30 agonist G-1 induced mechanical allodynia and thermal hyperalgesia in male and female mice. Downregulation of GPR30 in CCK+ interneurons prevented the development of mechanical and thermal hypersensitivity during CCI. They also show the up modulation of AMPA receptor expression by GPR30. 

      Generally, the conclusions are supported by the experimental results. I also would like to see significant improvements in the writing and the description of results. 

      Methodological details for some of the techniques are rather sparse. For example, when examining the co-localization of various markers, the authors do not indicate the number of animals/sections examined. Similarly, when examining the effect of shGper1, it is unclear how many cells/sections/animals were counted and analyzed. 

      In other sections, there is no description of the concentration of drugs used (for example, Figure 4H). In Figures 4C-E, there is no indication of the duration of the recordings, the ionic conditions, the effect of glutamate receptor blockers, etc 

      Some results appear anecdotal in the way they are described. For example, in Figure 5, it is unclear how many times this experiment was repeated. 

      We sincerely appreciate your valuable feedback and thoughtful recommendations.

      To address your concerns regarding methodological transparency, we have added the following details to the revised manuscript:

      The number of animals and sections analyzed in co-localization studies.

      The number of cells/sections/animals used in each quantification following shGper1 treatment.

      The concentrations of drugs administered (e.g., in Figure 4H).

      Detailed recording conditions, including duration, ionic composition, and pharmacological conditions (Figures 4C-E).

      In addition, we have thoroughly revised the writing throughout the manuscript to enhance clarity and precision in the description of our findings.

      Reviewer #3 (Public review): 

      Summary: 

      The authors convincingly demonstrate that a population of CCK+ spinal neurons in the deep dorsal horn express the G protein-coupled estrogen receptor GPR30 to modulate pain sensitivity in the chronic constriction injury (CCI) model of neuropathic pain in mice. Using complementary pharmacological and genetic knockdown experiments they convincingly show that GPR30 inhibition or knockdown reverses mechanical, tactile, and thermal hypersensitivity, conditioned place aversion, and c-fos staining in the spinal dorsal horn after CCI. They propose that GPR30 mediates an increase in postsynaptic AMPA receptors after CCI using slice electrophysiology which may underlie the increased behavioral sensitivity. They then use anterograde tracing approaches to show that CCK and GPR30 positive neurons in the deep dorsal horn may receive direct connections from the primary somatosensory cortex. Chemogenetic activation of these dorsal horn neurons proposed to be connected to S1 increased nociceptive sensitivity in a GPR30-dependent manner. Overall, the data are very convincing and the experiments are well conducted and adequately controlled. However, the proposed model of descending corticospinal facilitation of nociceptive sensitivity through GPR30 in a population of CCK+ neurons in the dorsal horn is not fully supported. 

      Strengths: 

      The experiments are very well executed and adequately controlled throughout the manuscript. The data are nicely presented and supportive of a role for GPR30 signaling in the spinal dorsal horn influencing nociceptive sensitivity following CCI. The authors also did an excellent job of using complementary approaches to rigorously test their hypothesis. 

      Weaknesses: 

      The primary weakness in this manuscript involves overextending the interpretations of the data to propose a direct link between corticospinal projections signaling through GPR30 on this CCK+ population of spinal dorsal horn neurons. For example, even in the cropped images presented, GPR30 is present in many other CCK-negative neurons. Only about a quarter of the cells labeled by the anterograde viral tracing experiment from S1 are CCK+. Since no direct evidence is provided for S1 signaling through GPR30, this conclusion should be revised. 

      Thank you for your encouraging comments and critical insights.

      We fully acknowledge the concern regarding the proposed direct involvement of corticospinal projections in modulating nociceptive behavior via GPR30 in CCK+ neurons. While our anterograde tracing experiments suggest anatomical overlap, we agree that definitive evidence of functional connectivity is lacking.

      Accordingly, we have revised the Abstract, Discussion, and Graphical Abstract to present our findings more cautiously. We now describe our observations as indicating that S1 projections potentially interact with GPR30<sup>+</sup> spinal neurons, rather than asserting a definitive functional link.

      To support this revised interpretation, we performed additional quantitative analyses examining the co-localization among S1 projections, CCK+, and GPR30+ neurons. Furthermore, we clarified that the chemogenetic activation studies targeted a mixed neuronal population and did not exclusively manipulate CCK+ neurons.

      These changes aim to better align our conclusions with the presented data and provide a more nuanced framework for future investigations.

      Reviewer #1 (Recommendations for the authors): 

      Major corrections 

      (1) Figure 2: The authors conclude that GPR30 is mainly expressed in excitatory spinal neurons because they are labeled by a virus with a Camk2 promoter. While there is evidence that Camk2 is specific to excitatory neurons in the brain, based on RNAseq datasets (e.g. Linnarsson Lab, http://mousebrain.org/adolescent/genesearch.html ) this is less clear cut within the spinal cord. A more direct way to assess the relative expression of GPR30 in excitatory versus inhibitory neurons would be to perform immunohistochemistry or FISH with GPR30/Vglut2/Vgat. 

      Alternatively, if this observation is not crucial for the overall arc of the story, I recommend the authors eliminate these data, as they do not support the idea that GPR30 is mainly in excitatory neurons.

      We thank the reviewer for highlighting this important limitation. To strengthen our conclusion regarding the neuronal identity of GPR30-expressing cells, we performed fluorescent in situ hybridization (FISH) using vGluT2 (marker for excitatory neurons) and VGAT (marker for inhibitory neurons). The results confirmed that GPR30 is predominantly expressed in vGluT2-positive excitatory neurons within the spinal cord. These new data are presented in the revised manuscript (lines 117-120) and shown in Figure 2A-B.

      (2) (2a) Figure 2: The authors also report that GPR30 is expressed in most CCK+ spinal neurons. A more rigorous way to present the data would be to perform quantification and report the % of CCK neurons that are GPR30. 

      (2b) More importantly, it is unclear what % of GPR30 neurons are CCK+. These types of quantifications would provide useful insights into the heterogeneity of CCK and GPR30 neuron populations, and help align findings of the behavioral pharmacology experiments using GPR30 antagonists with the knockdown of Gper1 in CCK spinal neurons - for instance, does a population of GPR30+/CCK- neurons exist? If so, it would be worth discussing what role (if any) that population might play in nerve injury-induced mechanical allodynia.

      Understanding the breakdown of GPR30 populations becomes even more relevant when the authors characterize which cell types are targeted by descending projections from S1. It is clear that the vast majority of CCK+ neurons that receive descending input from S1 neurons are GPR30+, but there are many other GPR30+ neurons that do not receive input from S1 neurons presented in 5M. Is this simply because only a small fraction of CCK+/GPR30+ neurons are targeted by descending S1 projections, or could they represent a distinct population of GPR30 neurons?

      (2a) We appreciate the suggestion. Quantification showed that approximately 90% of CCK⁺ neurons express GPR30, and about 50% of GPR30⁺ neurons co-express CCK. These data are now provided in the revised Results (lines 131-132) and in Figure 2F-G.

      (2b) Indeed, our data reveal that a substantial portion of GPR30⁺ neurons do not co-express CCK. While this study focuses on GPR30 function in CCK⁺ neurons, we recognize the potential relevance of GPR30⁺/CCK⁻ populations. We have addressed this point in the Discussion (lines 303-306):

      “However, it should be noted that half of GPR30⁺ neurons are not co-localized with CCK⁺ neurons, and further studies are needed to explore the function of these GPR30⁺/CCK⁻ neurons in neuropathic pain.”

      Regarding descending input, our data in Figure 5 show that S1 projections selectively innervate a subset (~30%) of CCK⁺ neurons, most of which co-express GPR30. This suggests that S1-targeted CCK⁺/GPR30⁺ neurons may represent a functionally distinct population. We have added clarification to the revised manuscript, while acknowledging that further studies are needed to elucidate the roles of non-targeted GPR30⁺ neurons.
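
      The two directions of overlap reported above (the share of CCK⁺ neurons expressing GPR30, and the share of GPR30⁺ neurons co-expressing CCK) are computed from the same double-positive count but different denominators. The following minimal sketch illustrates this; the function name and the cell counts are hypothetical, chosen only so that the output matches the approximate percentages quoted in the response, and are not the counts from the study.

```python
def colocalization_percentages(n_marker_a, n_marker_b, n_double_positive):
    """Return (% of A cells that are also B, % of B cells that are also A)."""
    if n_marker_a <= 0 or n_marker_b <= 0:
        raise ValueError("cell counts must be positive")
    if n_double_positive > min(n_marker_a, n_marker_b):
        raise ValueError("double-positive count exceeds a single-marker count")
    pct_a_with_b = 100.0 * n_double_positive / n_marker_a
    pct_b_with_a = 100.0 * n_double_positive / n_marker_b
    return pct_a_with_b, pct_b_with_a


# Hypothetical counts for illustration only (not data from the manuscript):
# 100 CCK+ cells, 180 GPR30+ cells, 90 cells positive for both markers.
pct_cck_with_gpr30, pct_gpr30_with_cck = colocalization_percentages(
    n_marker_a=100, n_marker_b=180, n_double_positive=90
)
```

      With these illustrative counts, 90% of CCK⁺ cells are GPR30⁺ while only 50% of GPR30⁺ cells are CCK⁺, which is why the two figures must be reported separately.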

      (3) Throughout the manuscript both male and female mice were used in experiments. Rather than referring to male and female mice as different genders, it would be more appropriate to describe them as different sexes. 

      As suggested, we have replaced all instances of “gender” with “sex” throughout the revised manuscript.

      (4) Figure 5: To increase the ease of interpreting the figure, in panels 5J and 5N, it would be helpful to indicate directly on the figure panel which other marker was assessed in double-labeling analyses.

      We have revised Figures 5J and 5N to include clear labels identifying the markers used in double-labeling analyses, to improve interpretability.

      Minor corrections: 

      (1) Line 36, I believe the authors mean to say "GPER/GPR30 in spinal neurons", rather than just "spinal". 

      Corrected as suggested. The sentence now reads (line 34):

      “Here we showed that the membrane estrogen receptor G-protein coupled estrogen receptor (GPER/GPR30) in spinal neurons was significantly upregulated in chronic constriction injury (CCI) mice…”

      (2) There are minor grammatical errors throughout the manuscript that interfere with comprehension. Proofreading/editing of the English language use may be beneficial. 

      We have thoroughly revised the manuscript for clarity and corrected grammatical and syntactic errors to improve readability.

      (3) Line 169-170, reads "Known that EPSCs are mediated by glutamatergic receptors like AMPA receptors and several studies have been reported the relationship between GPR30 and AMPA receptor25,29". Rewriting the sentence such that it better describes what the known relationship is between GPR30 and AMPA would be helpful in setting up the rationale of the experiment in Figure 4. 

      We have rewritten this section to better clarify the rationale behind the electrophysiological experiments (lines 161-164):

      “Given that EPSCs are primarily mediated through glutamatergic receptors such as AMPA receptors, and emerging evidence suggesting that GPR30 enhances excitatory transmission by promoting clustering of glutamatergic receptor subunits, we examined whether GPR30 modulates EPSCs via AMPA receptor-dependent mechanisms.”

      (4) Line 198-199 "Then we explored the possible connections among GPR30, S1-SDH projections and CCK+ neuron." In the context of spinal circuitry, "connections" may raise the expectation that synaptic connectivity will be evaluated. What I think best describes what the authors investigated in Figure 5 is the "relationship" between GPR30, S1-SDH projections, and CCK+ neurons. 

      We have revised the sentence accordingly (lines 184-186):

      “Building on previous findings suggesting a functional interaction between S1-SDH projections and spinal CCK⁺ neurons, our current study aimed to further elucidate the structural relationship among GPR30, S1-SDH projections, and CCK⁺ neurons.”

      (5) Figure 5: To increase the ease of interpreting the figure, in panels 5J and 5N, it would be helpful to indicate directly on the figure panel which other marker was assessed in double-labeling analyses.

      We have added direct labels to figure panels to clarify double-labeled analyses in the revised Figure 5J and 5N.

      Reviewer #2 (Recommendations for the authors): 

      (1) Can the authors provide more detail about the distribution of CCK+ cells in the spinal cord and, in particular, the localization of double-stained (CCK/cfos) neurons? 

      We thank the reviewer for this suggestion. To better characterize the distribution of CCK⁺ neurons within the spinal dorsal horn (SDH), we performed immunostaining in CCK-tdTomato mice using lamina-specific markers: CGRP (lamina I), IB4 (lamina II), and NF200 (lamina III–V). Our results demonstrate that CCK⁺ neurons are primarily localized in the deeper laminae of the SDH. These findings are now described in the revised Results (lines 126–129) and shown in Figure 2E.

      In addition, we conducted c-Fos immunostaining in CCK-Ai14 mice and found increased activation of CCK⁺ neurons following CCI. This supports the involvement of CCK⁺ neurons in neuropathic pain. These data are included in the Results (lines 129–131) and Supplementary Figure S4.

      (2) Figure 2A. There is no formal quantification of the percentage of TdTomato+ neurons that are also CCK+. The description of these results is insufficient. 

      We appreciate this point and have revised the description of Figure 2A accordingly. To strengthen our analysis, we conducted additional FISH experiments with vGluT2 and VGAT probes. Quantification revealed that GPR30 is predominantly expressed in excitatory neurons (approximately 60%). These data are shown in the revised Results (lines 117-119) and Figures 2A-B and S3. This supports our conclusion that GPR30 is largely localized to excitatory spinal interneurons.

      (3) Figure 4H. What is the evidence that these are AMPA-mediated currents? This is not explained in the text. 

      Thank you for raising this point. We now provide detailed experimental procedures to clarify that the recorded EPSCs are AMPA receptor–mediated. Specifically, spinal slices from CCK-Cre mice were used, and excitatory postsynaptic currents were recorded in the presence of APV (100 μM, NMDA receptor blocker), bicuculline (20 μM, GABA_A receptor blocker), and strychnine (0.5 μM, glycine receptor blocker), ensuring that the observed currents were AMPA-dependent. These methodological details are now clearly described in the revised Results (lines 165–173) and supported by prior literature (Zhang et al., J Biol Chem 2012; Hughes et al., J Neurosci 2010).

      (1) Yan Zhang, Xiao Xiao, Xiao-Meng Zhang, Zhi-Qi Zhao, Yu-Qiu Zhang (2012). Estrogen facilitates spinal cord synaptic transmission via membrane-bound estrogen receptors: implications for pain hypersensitivity. J Biol Chem. Sep 28;287(40):33268-81.

      (2) Ethan G Hughes, Xiaoyu Peng, Amy J Gleichman, Meizan Lai, Lei Zhou, Ryan Tsou, Thomas D Parsons, David R Lynch, Josep Dalmau, Rita J Balice-Gordon (2010). Cellular and synaptic mechanisms of anti-NMDA receptor encephalitis. J Neurosci. 2010 Apr 28;30(17):5866-75.

      (4) What is the signaling mechanism leading to a larger amplitude of currents after G-1 infusion? 

      We thank the reviewer for this important question. G-1 is a selective agonist for GPR30. Based on previous studies by Luo et al. (2016), we speculate that activation of GPR30 may increase the clustering of glutamatergic receptor subunits at postsynaptic sites, thereby enhancing AMPA receptor-mediated currents. While our current study did not directly address the intracellular signaling cascade, we have incorporated this mechanistic speculation in the Discussion.

      Jie Luo, X.H., Yali Li, Yang Li, Xueqin Xu, Yan Gao, Ruoshi Shi, Wanjun Yao, Juying Liu, Changbin Ke (2016). GPR30 disrupts the balance of GABAergic and glutamatergic transmission in the spinal cord driving to the development of bone cancer pain. Oncotarget 7, 73462-73472. 10.18632/oncotarget.11867.

      (5) Figure 4I. Please include error bars. 

      We have revised Figure 4I to include error bars, as requested.

      (6) Line 198. What is the evidence that AAV2/1 EF1α FLP is an antegrade trans monosynaptic marker? 

      We thank you for this request. AAV2/1 has been widely used for anterograde monosynaptic tracing based on its properties (Wang et al., Nat Neurosci 2024; Wu et al., Neurosci Bull 2021): (1) it infects neurons at the injection site and undergoes active anterograde transport; (2) newly assembled viral particles are released at synapses and infect postsynaptic partners; (3) in the absence of helper viruses, the spread halts at the first synapse, ensuring monosynaptic restriction. We have elaborated on this in the revised manuscript (line 198), citing Wang et al. (Nat Neurosci 2024) and Wu et al. (Neurosci Bull 2021).

      (1) Hao Wang, Qin Wang, Liuzhe Cui, Xiaoyang Feng, Ping Dong, Liheng Tan, Lin Lin, Hong Lian, Shuxia Cao, Huiqian Huang, Peng Cao, Xiao-Ming Li (2024). A molecularly defined amygdala-independent tetra-synaptic forebrain-to-hindbrain pathway for odor-driven innate fear and anxiety. Nat Neurosci. 2024 Mar;27(3):514-526.

      (2) Zi-Han Wu, Han-Yu Shao, Yuan-Yuan Fu, Xiao-Bo Wu, De-Li Cao, Sheng-Xiang Yan, Wei-Lin Sha, Yong-Jing Gao, Zhi-Jun Zhang (2021). Descending Modulation of Spinal Itch Transmission by Primary Somatosensory Cortex. Neurosci Bull. 2021 Sep;37(9):1345-1350.

      (7) Figure 5G. I do not understand the logic of this experiment. A Cre AAV is injected in the S1 cortex. Why should this lead to the expression of tdTomato on a downstream (postsynaptic?) neuron? The authors should quote the literature that supports this anterograde transsynaptic transport.

      We appreciate this question. As described in previous studies (e.g., Wu et al., Neurosci Bull 2021), AAV2/1-Cre injected into the S1 cortex leads to Cre expression in projection targets due to transsynaptic anterograde transport. Subsequent injection of a Cre-dependent AAV (AAV2/9-DIO-mCherry) into the spinal cord enables specific labeling of postsynaptic neurons that receive input from S1. We have clarified this mechanism in line 206 and provided the appropriate citation.

      Zi-Han Wu, Han-Yu Shao, Yuan-Yuan Fu, Xiao-Bo Wu, De-Li Cao, Sheng-Xiang Yan, Wei-Lin Sha, Yong-Jing Gao, Zhi-Jun Zhang (2021). Descending Modulation of Spinal Itch Transmission by Primary Somatosensory Cortex. Neurosci Bull. 2021 Sep;37(9):1345-1350.

      (8) The same question arises when interpreting the results obtained in Figure 6.

      We thank the reviewer for the question, and we have addressed it in point (7).

      (9) Line 257. How do the authors envision that estrogen would change its modulation of GPR30 under basal and neuropathic conditions? Is there any evidence for this speculation? 

      We thank the reviewer for raising this thoughtful question. In the current study, we focused on pharmacologically manipulating GPR30 activity via its selective agonist and antagonist. We did not directly investigate how endogenous estrogen regulates GPR30 under physiological and neuropathic states. We have recognized this limitation and highlighted the need for future research to investigate this regulatory mechanism.

      (10-20) In my opinion, the entire manuscript needs a careful revision of the English language. While one can follow the text, it contains numerous grammatical and syntactic errors that make the reading far from enjoyable. I am highlighting just a few of the many errors. 

      We appreciate the reviewer’s honest assessment. The manuscript has undergone thorough language editing by a native English speaker to correct grammatical errors, improve clarity, and enhance overall readability. We also restructured several sections, particularly the Discussion, to improve logical flow.

      (21) The discussion of results is a bit disorganized, with disconnected sentences and statements, and somewhat repetitive. For example, lines 303 to 306 lack adequate flow. It is also quite long and includes general statements that add little to the discussion of the new findings (lines 326-333). 

      We agree and have revised the Discussion extensively. Disconnected or repetitive sentences (e.g., lines 303-306, 326-333) have been removed or rewritten. For instance, we added a new transitional paragraph (lines 307-311) to improve flow:

      “Abnormal activation of neurons in the SDH is a key contributor to hyperalgesia, and enhanced excitatory synaptic transmission is a major mechanism driving increased neuronal excitability. Therefore, we evaluated excitatory postsynaptic currents (EPSCs) and observed increased amplitudes in CCK⁺ neurons following CCI, suggesting elevated excitability in these neurons.”

      We also removed redundant generalizations to maintain a focused discussion of our novel findings.

      Reviewer #3 (Recommendations for the authors): 

      (1) What is the distribution of GPR30 throughout the spinal cord and DRG? The authors demonstrate that this can overlap with a CCK+ population, but there are many GPR30+ and CCK negative neurons, even in the cropped images presented. It would be helpful to quantify the colocalization with CCK. 

      We thank the reviewer for this important point. As shown in the revised manuscript, GPR30 is expressed in both the spinal cord and dorsal root ganglia (DRG). However, our updated data (Figure 1B) demonstrate that Gper1 mRNA levels in the DRG are not significantly altered after CCI, suggesting a limited involvement of DRG GPR30 in neuropathic pain. These results are described in the revised Results (line 94).

      Regarding spinal co-expression, we performed a detailed quantification. Approximately 90% of CCK⁺ neurons express GPR30, while about 50% of GPR30⁺ neurons are CCK⁺. These co-localization results are now included in the revised Results and presented in Figure 2G.
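
      The two percentages above differ because each uses a different denominator. A minimal Python sketch of the calculation follows. The function name and the cell counts are hypothetical, chosen only to illustrate the asymmetry; they are not counts from the study.

```python
def colocalization_percentages(n_cck, n_gpr30, n_double):
    """Return (% of CCK+ cells that are GPR30+, % of GPR30+ cells that are CCK+).

    All three arguments are cell counts from the same set of sections.
    """
    if n_cck <= 0 or n_gpr30 <= 0:
        raise ValueError("single-marker counts must be positive")
    if n_double < 0 or n_double > min(n_cck, n_gpr30):
        raise ValueError("double-positive count must lie between 0 and the smaller single-marker count")
    pct_cck_expressing_gpr30 = 100.0 * n_double / n_cck
    pct_gpr30_expressing_cck = 100.0 * n_double / n_gpr30
    return pct_cck_expressing_gpr30, pct_gpr30_expressing_cck


# Hypothetical counts (NOT data from the manuscript).
cck_pct, gpr30_pct = colocalization_percentages(n_cck=100, n_gpr30=180, n_double=90)
print(f"{cck_pct:.1f}% of CCK+ cells are GPR30+")
print(f"{gpr30_pct:.1f}% of GPR30+ cells are CCK+")
```

      Reporting both directions, as done here, makes clear that a marker can cover nearly all of one population while that population accounts for only part of the marker's expression.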

      (2) It is clear that CCI and GPR30 influence excitatory synaptic transmission in CCK+ neurons. However, these experiments do not fully support the authors' claims of a postsynaptic upregulation of AMPARs. Comparing amplitudes and frequencies of spontaneous EPSCs cannot necessarily distinguish a pre- vs postsynaptic change since some of these EPSCs can arise from spontaneous action potential firing. I suggest revising this conclusion. 

      We appreciate these insightful comments. We fully agree that our data from spontaneous EPSC recordings (sEPSCs) in CCK⁺ neurons are not sufficient to distinguish between pre- and postsynaptic mechanisms, as sEPSCs may include spontaneous presynaptic activity. Therefore, we have revised the text throughout the manuscript to avoid overstating conclusions related to postsynaptic AMPA receptor upregulation.

      (3) What is the rationale for the evoked EPSC experiments from electrical stimulation in "the deep laminae of SDH?" I do not think that this experiment can rule out a presynaptic contribution of GPR30 to the evoked responses, particularly if these are Gs-coupled at presynaptic terminals. Paired-pulse stimulations could help answer this question, otherwise, alternative interpretations, also related to the point above, should be provided. 

      We thank the reviewer for this thoughtful critique. Indeed, electrical stimulation of the deep SDH laminae does not exclude presynaptic involvement, especially considering that GPR30 is a G protein–coupled receptor (GPCR) and could act presynaptically. We agree that paired-pulse ratio (PPR) analysis would be more informative in distinguishing pre- from postsynaptic effects, but this was not performed due to technical limitations in our current experimental setup.

      Accordingly, we have revised our interpretations in both the Results and Discussion to acknowledge that our data do not rule out presynaptic contributions. We now state that GPR30 activation enhances EPSCs in CCK⁺ neurons, while further studies are needed to dissect the precise site of action.
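
      For context, the paired-pulse ratio mentioned above is conventionally the amplitude of the second evoked EPSC divided by that of the first. The Python sketch below illustrates the calculation under that assumption. The function name and the amplitudes are hypothetical and are not recordings from this study. A shift in PPR after agonist application is usually read as a change in presynaptic release probability, whereas a larger amplitude with an unchanged PPR is more consistent with a postsynaptic effect.

```python
def paired_pulse_ratio(first_amp_pa, second_amp_pa):
    """Return PPR = |second EPSC amplitude| / |first EPSC amplitude|.

    Amplitudes are in pA; inward currents may be given as negative values.
    """
    if first_amp_pa == 0:
        raise ValueError("first EPSC amplitude must be non-zero")
    return abs(second_amp_pa) / abs(first_amp_pa)


# Hypothetical amplitudes (NOT data from the manuscript).
ppr_baseline = paired_pulse_ratio(-100.0, -150.0)
ppr_agonist = paired_pulse_ratio(-200.0, -300.0)
print(ppr_baseline, ppr_agonist)
```

      In this illustrative case both EPSCs double in size while the ratio stays the same, the pattern that would argue against a change in release probability.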

      (4) I appreciate the challenging nature of the trans-synaptic viral labeling approaches, but the chemogenetic and Gper knockdown experiments do not selectively target this CCK+ population of deep dorsal horn neurons. The data are clear that each of these components (descending corticospinal projections, CCK neurons, and GPR30) can modulate nociceptive hypersensitivity, but I do not agree with the overall conclusion that each of are directly linked as the authors propose. I recommend revising the overall conclusion and title to reflect the convincing data presented. 

      We thank the reviewer for this critical observation. We agree that while our data show functional roles for descending cortical input, CCK⁺ neurons, and GPR30 in modulating pain hypersensitivity, the evidence does not establish a definitive direct circuit integrating all three components.

      In response, we have revised our conclusions to reflect this limitation. Specifically, we avoided claiming a direct functional link among S1 projections, CCK⁺ neurons, and GPR30. Instead, we now propose that GPR30 modulates neuropathic pain primarily through its action in CCK⁺ spinal neurons, with potential involvement of descending facilitation from the somatosensory cortex.

      Additionally, we have revised the manuscript title to better reflect our mechanistic focus: “GPR30 in spinal CCK-positive neurons modulates neuropathic pain.”

      Minor Corrections

      (1) The authors should refer to mice by sex, not gender. 

      Corrected throughout the manuscript.

      (2) Page 9, line 195: "significantly" is used to refer to co-localization of 28.1%. What is this significant to? 

      We have revised the sentence to accurately describe the observed percentage, without implying statistical significance:

      “Our co-staining results revealed that a high proportion of CCK⁺ S1-SDH postsynaptic neurons expressed GPR30” (line 198-199).

      (3) I recommend modifying some of the transition phrases like "by the way," "what's more," and "besides". 

      All informal expressions have been replaced with academic alternatives including “Furthermore,” “Additionally,” and “Moreover.”

      (4) Additional guides to mark specific laminae in the dorsal horn would be useful. 

      We added immunostaining with laminar markers (CGRP for lamina I and NF200 for lamina III–V), and these data are now shown in Figure 2E and described in the Results (lines 126-129).

      (5) Page 5, line 115: immunochemistry should be immunohistochemistry. 

      Corrected as suggested.

      (6) Page 6, line 136: "Confirming the structural connections" was not demonstrated here. Perhaps co-localization between GPR30 and CCK+. 

      The text was revised to “To functionally interrogate GPR30 and CCK⁺ neurons in neuropathic pain...” (line 133).

      (7) Page 8, line 166: unsure what "took and important role" means. 

      This phrasing was corrected for clarity and replaced with an accurate scientific description.

      (8) Page 8, line 168: "IPSCs of spinal CCK+ neurons" implies that they are sending inhibitory inputs. 

      We revised the term to “EPSCs” to correctly reflect excitatory synaptic currents in CCK⁺ neurons.

      (9) Page 8, line 169: "Known that EPSCs" is missing an introductory phrase. 

      The sentence was rewritten to include an appropriate introductory clause (lines 161–164):

      “Given that EPSCs are primarily mediated through glutamatergic receptors such as AMPA receptors...”

      (10) Page 10, line 227 and 228: "adequately" and "sufficiently" should be adequate and sufficient. 

      We corrected these terms to the proper adjective forms: “adequate” and “sufficient” (lines 224-225).

    1. Author response:

      (1) Maternal lactation assay and PVN oxytocin neuron identity

      Reviewers and editors noted that the maternal lactation assay felt out of place (Editors, R1, R2) and asked for clearer validation of AAV specificity in the PVN (R3). These issues are linked: the primary purpose of the lactation assay was to physiologically validate that the recorded neurons are oxytocinergic, as PVNOT neurons exhibit well-established pulsatile activity during lactation.

      In response, we will (i) explicitly frame the lactation assay as a validation experiment, (ii) streamline its presentation to sit naturally with our identity-validation rationale, and (iii) clarify our AAV targeting and expression controls; we will also address our oxytocin immunohistochemistry quantification and its limitations (we observed notable intra-individual and technical variability in oxytocin immunoreactivity), which motivated the complementary physiological approach.

      (2) Clarifications and analyses.

      The reviewers pointed to several analyses, inferences, and conclusions that should be clarified. We will clarify (i) the oxytocin histology in Figure 1 (marker definitions and quantification), (ii) the roles of floor versus ambient temperature, (iii) some of the quantitative links among behavioral state, neural activity, and body temperature (e.g., behavior bout duration vs. neural responses and Tb), and (iv) the computer vision methodology. These additions will address the reviewers’ requests for clearer inferences and presentation.

      (3) Optogenetic inhibition. 

      We appreciate the suggestion to include an inhibition experiment (Editors, R1, R2). While interesting, this is beyond the scope of the current revision. Our stimulation experiments were designed to functionally test a specific observation from calcium imaging, namely, that PVNOT neurons show bursts of heightened activity at transitions from quiescence to arousal/thermogenesis, and to assess causal sufficiency for thermogenic/arousal-related readouts. We will make this rationale explicit, discuss the scope limits of the current dataset, and note inhibition as an important direction for future work.

    1. Author response:

      Reviewer #1 (Public review):

      The topic is appealing given the rise in the aging population and the unclear role of BAT function in this process. Overall, the study uses several techniques, is easy to follow, and addresses several physiological and molecular manifestations of aging.  However, the study lacks an appropriate statistical analysis, which severely affects the conclusions of the work. Therefore, interpretation of the findings is limited and must be done with caution. 

      We greatly appreciate the reviewer’s encouragement. Our team is fully committed to maintaining clarity and rigor in the design, execution, and reporting of this study. We are grateful to the reviewers for bringing these issues to our attention. We also acknowledge that several statistical analyses could be redone to better emphasize our focus on the genetic effect of ADH5 deletion in mice of the same age, and we are working on this.

      Reviewer #2 (Public review):

      Strengths: 

      This research provides insight into the interplay between redox biology, proteostasis, and metabolic decline in aging. By identifying a specific enzyme that controls SNO status in BAT and further developing a therapy to target ADH5 in BAT to prevent age-related decline, the authors have identified a putative mechanism to combat age-related decline in BAT function. 

      We greatly appreciate the reviewer’s encouragement. 

      Weaknesses: 

      (1) Sex needs to be considered as a biological variable, at a minimum in the reporting of the phenotypes observed in this manuscript, but also potentially by further experimentation. 

      We thank the reviewer for the insightful remark, and we agree with the reviewer that sex needs to be considered as a biological variable. We will assess ADH5 expression in aged female mice.

      (2)  It would be helpful to know the extent of ADH5 loss in the adipose tissue of knockout mice, either by mRNA or by immunoblotting for ADH5. It could also be helpful to know if ADH5 is deleted from the inguinal adipose tissue of these mice, especially since they seem to accumulate fat mass as they age (Figure 2B). 

      We thank the reviewer for the comment/suggestion. Indeed, we have measured the ADH5 expression in both brown adipose tissue (BAT) and inguinal adipose tissue (iWAT). We regret that we did not include our results in the first submission and will provide these results in the revised manuscript.

      (3)  For Figure 4D, the ChIP, it would be better to show the IgG control pulldowns. Finally, it's not clear how these BAT samples were treated with HSF1A - was this done in vivo or ex vivo? 

      We thank the reviewer for their thoughtful comment and will provide detailed information in the revised manuscript.

      (4) I didn't understand what was on the y-axis in Figure 5A, nor how it was measured.

      We apologize for not making these critical points clearer in the first submission. In the revised manuscript we will include, in detail, the logistics of the experiments in the materials and methods section, figure annotation and figure legends.  

      (5) What happens to BAT protein S-nitrosylation in HSF1A-treated mice? 

      We thank the reviewer for the insightful remark, and we will measure general protein S-nitrosylation status in the BAT of HSF1A-treated mice.

      (6) Figure 1B: What is the age of the positive (ADH5BKO) and negative (Adh5 fl) mice? 

      We regret that we did not describe our results clearly in the first submission and will provide detailed information in the revised manuscript.

      (7) Figure 1F: Can you clarify what I'm looking at in the P16ink4a panels? The red staining? Is the blue staining DAPI? This is also a problem in Figures 3C, 3D and 5G, and 5I. Figure 4B looks great - maybe this could be used as an example?  

      We regret that we did not present results clearly in the first submission and will provide detailed information in the revised manuscript.

      (8) Figure 3B looks a bit odd. Can the approach to measuring IL-1β be clarified, and could the authors explain why they can't show units of mass for IL-1β levels? 

      We will provide detailed information in the revised manuscript.

      (9) Figure 2C and 2D: I don't really understand why the Heat or VO2 need to be expressed as fold changes. Can't these just be expressed with absolute units? 

      We thank the reviewer for the insightful comment. We will present these results as suggested in the revised manuscript.

    1. Author response:

      (1) Stable annual dynamics vs. episodic outbreaks

      We agree that RVF is classically described as producing periodic epidemics interspersed with long inter-epidemic periods, often linked to extreme rainfall events. Our model predicts more regular seasonal dynamics, which reflects the endemic transmission patterns we have observed in The Gambia through serological surveys. In the revision, we will:

      Clarify that while epidemics occur in other parts of sub-Saharan Africa, our results may indicate a different epidemiological narrative in The Gambia, with sustained but low-level circulation (hyperendemicity).

      Discuss how model assumptions (e.g. seasonality, homogeneous mixing) may bias results toward stable dynamics.

      Highlight the implications of this for interpretation and for public health decision-making.

      (2) Use of network analysis

      We acknowledge the reviewer’s concern. The network analysis was conducted descriptively to characterize cattle movement patterns and the structure of herd connections, but it was not formally incorporated into the model. In revisions we will:

      Clarify this distinction in the manuscript to avoid overinterpretation.

      Emphasize the need for future modelling work using finer-scale movement data, which could support more realistic herd metapopulation dynamics and better capture heterogeneity in transmission.

      (3) RVFV reproductive impacts

      While RVF outbreaks are known to cause abortions and neonatal deaths, these occur during relatively rare epidemics. In the Gambian context, where we’re not observing such large episodic outbreaks but rather low-level circulation, the annual impact of RVF infection on births is likely modest compared to baseline herd turnover. Moreover, cattle demography is partly managed, with replacement and movement buffering birth rates against short-term losses.

      Our model includes birth as a constant demographic process; it is reasonable to assume a stable population because we are not explicitly modelling outbreak-scale reproductive losses. This is consistent with other RVF transmission models that adopt a similar simplifying assumption. However, we will acknowledge this simplification as a limitation in the revised manuscript.

      (4) Missing ODEs for M herds in the dry season

      We thank the reviewer for identifying this omission. The ODEs for M herds in the dry season were not included in the appendix due to an oversight, though demographic turnover was incorporated in the model code. We will add the missing equations to the appendix.

      (5) Role of immunity loss and model structure (SIR vs. SIRS)

      We acknowledge that the decline of detectable antibodies over time (seropositivity decay/seroreversion) is an important consideration in RVFV serology, but whether this reflects true loss of protective immunity after natural infection remains unknown. Biologically, it is plausible that infected cattle develop long-lasting protection, as suggested by studies in humans, but there is an absence of longitudinal field data. From a modelling perspective, our aim was to predict the age-seroprevalence curve from FOI estimates and to assess how well the model reproduces the observed cross-sectional seroprevalence patterns. We therefore adopted a parsimonious SIR framework, treating loss of seropositivity as a potential explanation for the observed age disparity rather than modelling it as loss of immunity. In revisions we will:

      Clarify this rationale, emphasising that there is no direct evidence for waning immunity following natural RVFV infection in cattle, although evidence of seropositivity decay has been suggested in humans.

      Further discuss the seropositivity decay rates predicted in our survey and their possible relation to test sensitivity.

      Highlight that while a SIRS structure could generate different long-term dynamics, evaluating this requires stronger evidence for true immunity loss; we consider this an important future modelling direction.
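
      To illustrate how seropositivity decay changes an age-seroprevalence curve, the sketch below uses the textbook reversible catalytic model with a constant force of infection. This is an assumption for illustration only: it is not the fitted model from the manuscript, it omits disease-induced mortality, and the function name and parameter values are hypothetical.

```python
import math


def expected_seroprevalence(age_years, foi, seroreversion=0.0):
    """Expected proportion seropositive at a given age.

    Reversible catalytic model with constant rates (per year):
        P(a) = foi / (foi + w) * (1 - exp(-(foi + w) * a))
    where w is the seroreversion rate. With w = 0 this reduces to the
    simple catalytic model 1 - exp(-foi * a).
    """
    if age_years < 0 or foi < 0 or seroreversion < 0:
        raise ValueError("age and rates must be non-negative")
    total_rate = foi + seroreversion
    if total_rate == 0:
        return 0.0
    return foi / total_rate * (1.0 - math.exp(-total_rate * age_years))


# Hypothetical rates (NOT estimates from the manuscript).
for age in (1, 3, 5, 10):
    no_decay = expected_seroprevalence(age, foi=0.1)
    with_decay = expected_seroprevalence(age, foi=0.1, seroreversion=0.05)
    print(age, round(no_decay, 3), round(with_decay, 3))
```

      With seroreversion the curve plateaus at foi / (foi + w) rather than approaching 1, so older animals can show lower seroprevalence than a simple catalytic model predicts without any assumption about loss of protective immunity.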

      (6) RVFV induced mortality in serocatalytic model

      We thank the reviewer for this comment. Disease-induced mortality was included in the serocatalytic model through the mortality parameter (γ), but we recognise that this might not have been sufficiently clear in the text. In revisions we will clarify this in the Methods and Appendix.

      (7) Clarifying previous vs. current study components

      We will revise the Methods and Appendix to make clearer distinctions between our previous work (e.g. household survey data collection, seroprevalence estimates) and the analyses undertaken for this manuscript (e.g. model development and fitting).

      (8) Limitations paragraph

      We will expand the limitations section to specifically identify the assumptions contributing most to uncertainty. We will then outline how these may bias transmission dynamics and intervention estimates.

      (9) Movement ban simulations & suitability of model for vaccination interventions

      We appreciate the reviewer’s concerns regarding the movement ban simulation. On reassessment, we agree that our model structure might not be ideally suited to exploring movement bans. In the revised manuscript, we will remove this analysis and emphasize how our modelling framework is more suited to exploring cattle vaccination scenarios, including targeting of specific herd types (e.g. T vs. M vs. L). We note that we are currently developing separate work focused on vaccination strategies in cattle, where this model structure might be more directly applicable, and will reserve a deeper investigation of vaccination interventions for that forthcoming publication.

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public review):

      Shigella flexneri is a bacterial pathogen that is a globally significant cause of diarrhea. Shigella pathogenesis remains poorly understood. In their manuscript, Saavedra-Sanchez et al report their discovery that a secreted E3 ligase effector of Shigella, called IpaH1.4, mediates the degradation of a host E3 ligase called RNF213. RNF213 was previously described to mediate ubiquitylation of intracellular bacteria, an initial step in their targeting to xenophagosomes. Thus, Shigella IpaH1.4 appears to be an important factor in permitting evasion of RNF213-mediated host defense.

      Strengths:

      The work is focused, convincing, well-performed, and important. The manuscript is well-written.

      We would like to thank the reviewer for their time evaluating our manuscript and the positive assessment of the novelty and importance of our study. We provide a comprehensive response to each of the reviewer’s specific recommendations below and highlight any changes made to the manuscript in response to those recommendations.

      Reviewer #1 (Recommendations for the authors):

      (1) In the abstract (and similarly on p.10), the authors claim to have shown "IpaH1.4 protein as a direct inhibitor of mammalian RNF213". However, they do not show the interaction is direct. This, in my opinion, would require demonstrating an interaction between purified recombinant proteins. I presume that the authors are relying on their UBAIT data to support the direct interaction, but this is a fairly artificial scenario that might be prone to indirect substrates. I would therefore prefer that the 'direct' statement be modified (or better supported with additional data). Similarly, on p.7, the section heading states "S. flexneri virulence factors IpaH1.4 and IpaH2.5 are sufficient to induce RNF213 degradation". The corresponding experiment is to show sufficiency in a 293T cell, but this leaves open the participation of additional 293T-expressed factors. So I would remove "are sufficient to", or alternatively add "...in 293T cells".

      We agree with the reviewer and made the recommended changes to the text in the abstract, in the results section on page 7, and in the Discussion on page 11. During the revision of our manuscript two additional studies were published that provide convincing biochemical evidence for the direct interaction between IpaH1.4 and RNF213 (PMID: 40205224; PMID: 40164614). These studies address the reviewer’s concern extensively and are now briefly discussed and cited in our revised MS.

      (2) In the abstract the authors state "Linear (M1-) and lysine-linked ubiquitin is conjugated to bacteria by RNF213 independent of the linear ubiquitin chain assembly complex (LUBAC)." However, it is not shown that RNF213 is able to directly perform M1-ubiquitylation. It is shown that RNF213 is required for M1-linked ubiquitylation in IpaH1.4 or MxiE mutants, this is different than showing conjugation is done by RNF213 itself. This should be reworded.

      We agree and edited the text accordingly.

      (3) Introduction: one of the main points of the paper is that RNF213 conjugates linear ubiquitin to the surface of bacteria in a manner independent of the previously characterized linear ubiquitin conjugation (LUBAC) complex. This is indeed an interesting result, but the introduction does not put this discovery in much context. I would suggest adding some discussion of what was known, if anything, about the type of Ub chain formed by RNF213, and specifically whether linear Ub had previously been observed or not.

      We now provide context in the Introduction on page 3 and briefly discuss previous work that had implicated LUBAC in the ubiquitylation of cytosolic bacteria. We emphasize that LUBAC specifically generates linear (M1-linked) ubiquitin chains, while the types of ubiquitin linkages deposited on bacteria through RNF213-dependent pathways had remained unidentified.

      (4) Figure 3C: is the difference in 7KR-Ub between WT and HOIP KO cells significant? If so, the authors may wish to acknowledge the possibility that HOIP partially contributes to M1-Ub of MxiE mutant Shigella

      The frequencies at which bacteria are decorated with 7KR-Ub are not statistically different between WT and HOIP KO cells. We have included this information in the panel description of Figure 3.

      (5) On page 11, the authors state that "...we observed that LUBAC is dispensable for M1-linked ubiquitylation of cytosolic S. flexneri ∆ipaH1.4. We found that lysine-less internally tagged ubiquitin or an M1-specific antibody bound to S. flexneri ∆ipaH1.4 in cells lacking LUBAC (HOIL-1KO or HOIPKO) but failed to bind bacteria in RNF213-deficient cells". In fact, what is shown is that M1-ubiquitylation in ∆ipaH1.4 infection is RNF213-dependent (5E), but the work with lysine mutants, HOIP or HOIL-1 KOs are all with ∆mxiE, not ∆ipaH1.4 (3B) in this version of the manuscript. Ideally, the data with ∆ipaH1.4 could be added, but alternatively, the conclusion could be re-worded.

      We now include the data demonstrating that staining of ∆ipaH1.4 with an M1-specific antibody is unchanged from WT cells in HOIL-1 KO and HOIP KO cells. These data are shown in supplementary data (Fig. S3E) and referred to on page 9 of the revised manuscript.

      (6) The UBAIT experiment should be explained in a bit more detail in the text. The approach is not necessarily familiar to all readers, and the rationale for using Salmonella-infected ceca/colons is not well explained (and seems odd). Some appropriate caution about interpreting these data might also be welcome. Did HOIP or HOIL show up in the UBAIT? This perhaps also deserves some discussion.

      As expected, HOIP (listed under its official gene name Rnf31 in the table of Fig. S2B) was identified as a candidate IpaH1.4 interaction partner as the third most abundant hit from the UBAIT screen. Remarkably, Rnf213 was the hit with the highest abundance in the IpaH1.4 UBAIT screen. To address the reviewer’s comments, we now explain the UBAIT approach in more detail and provide the rationale for using intestinal protein lysates from Salmonella-infected mice. The text on page 8 reads as follows: “To investigate potential physical interactions between IpaH1.4 and IpaH2.5, we reanalyzed a previously generated dataset that employed a method known as ubiquitin-activated interaction traps (UBAITs) (32). As shown in Fig. S2A, the human ubiquitin gene was fused to the 3′ end of IpaH2.5, producing a C-terminal IpaH2.5-ubiquitin fusion protein. When incubated with ATP, ubiquitin-activating enzyme E1, and ubiquitin-conjugating enzyme E2, the IpaH2.5-ubiquitin "bait" protein is capable of binding to and ubiquitylating target substrates. This ubiquitylation creates an iso-peptide bond between the IpaH2.5 bait and its substrate, thereby enabling purification via a Strep affinity tag incorporated into the fusion construct (32). IpaH2.5-ubiquitin bait and IpaH3-ubiquitin control proteins were incubated with lysates from murine intestinal tissue. To detect interaction partners in a physiologically relevant setting, we used intestinal lysates derived from mice infected with Salmonella, which in contrast to Shigella causes pronounced inflammation in WT mice and therefore better simulates human Shigellosis in an animal model. Using UBAIT we identified HOIP (Rnf31) as a likely IpaH2.5 binding partner (Fig. S2B), thus confirming previous observations (28) and validating the effectiveness of our approach. Strikingly, we identified mouse Rnf213 as the most abundant interaction partner of the IpaH2.5-ubiquitin bait protein (Fig. S2B).
Collectively, our data and concurrent reports showing direct interactions between IpaH1.4 and human RNF213 (36, 37) indicate that the virulence factors IpaH1.4 and IpaH2.5 directly bind and degrade mouse as well as human RNF213.”

      (7) It would be helpful if the authors discussed their results in the context of the prior work showing IpaH1.4/2.5 mediate the degradation of HOIP. Do the authors see HOIP degradation? If indeed HOIP and RNF213 are both degraded by IpaH1.4 and IpaH2.5, are there conserved domains between RNF213 and HOIP being targeted? Or is only one the direct target? A HOIP-RNF213 interaction has previously been shown (https://doi.org/10.1038/s41467-024-47289-2). Since they interact, is it possible one is degraded indirectly? To help clarify this, a simple experiment would be to test if RNF213 degraded in HOIP KO cells (or vice-versa)?

      We appreciate the reviewer’s suggestions. We conducted the proposed experiments and found that WT S. flexneri infections result in RNF213 degradation in both WT and HOIP KO cells. Similarly, we found that HOIP degradation was independent of RNF213. We have included these data in Figs. 5A and S3B of our revised submission. A study published during revisions of our paper demonstrates that the LRR of IpaH1.4 binds to the RING domains of both RNF213 and LUBAC (PMID: 40205224). We refer to this work in our revised manuscript.

      Reviewer #2 (Public review):

      Summary:

      The authors find that the bacterial pathogen Shigella flexneri uses the T3SS effector IpaH1.4 to induce degradation of the IFNg-induced protein RNF213. They show that in the absence of IpaH1.4, cytosolic Shigella is bound by RNF213. Furthermore, RNF213 conjugates linear and lysine-linked ubiquitin to Shigella independently of LUBAC. Intriguingly, they find that Shigella lacking ipaH1.4 or mxiE, which regulates the expression of some T3SS effectors, are not killed even when ubiquitylated by RNF213 and that these mutants are still able to replicate within the cytosol, suggesting that Shigella encodes additional effectors to escape from host defenses mediated by RNF213-driven ubiquitylation.

      Strengths:

      The authors take a variety of approaches, including host and bacterial genetics, gain-of-function and loss-of-function assays, cell biology, and biochemistry. Overall, the experiments are elegantly designed, rigorous, and convincing.

      Weaknesses:

      The authors find that ipaH1.4 mutant S. flexneri no longer degrades RNF213 and recruits RNF213 to the bacterial surface. The authors should perform genetic complementation of this mutant with WT ipaH1.4 and the catalytically inactive ipaH1.4 to confirm that ipaH1.4 catalytic activity is indeed responsible for the observed phenotype.

      We would like to thank the reviewer for their time evaluating our manuscript and the positive assessment of our work, especially its scientific rigor. We conducted the experiment suggested by the reviewer and included the new data in the revised manuscript. As expected, complementation of the ∆ipaH1.4 with WT IpaH1.4 but not with the catalytically dead C338S mutant restored the ability of Shigella to efficiently escape from recognition by RNF213 (Figs. 5C-D).

      Reviewer #2 (Recommendations for the authors):

      The authors should perform genetic complementation of the ipaH1.4 mutant with WT ipaH1.4 and the catalytically inactive ipaH1.4 to confirm that ipaH1.4 catalytic activity is indeed responsible for the observed phenotype.

      We performed the suggested experiment and show in Figs. 5C-D that complementation of the ∆ipaH1.4 mutant with WT IpaH1.4 but not with the catalytically dead C338S mutant restored the ability of Shigella to efficiently escape from recognition by RNF213. These data demonstrate that the catalytic activity of IpaH1.4 is required for evasion of RNF213 binding to the bacteria.

      Reviewer #3 (Public review):

      Summary:

      In this study, the authors set out to investigate whether and how Shigella avoids cell-autonomous immunity initiated through M1-linked ubiquitin and the immune sensor and E3 ligase RNF213. The key findings are that the Shigella flexneri T3SS effector, IpaH1.4 induces degradation of RNF213. Without IpaH1.4, the bacteria are marked with RNF213 and ubiquitin following stimulation with IFNg. Interestingly, this is not sufficient to initiate the destruction of the bacteria, leading the authors to conclude that Shigella deploys additional virulence factors to avoid this host immune response. The second key finding of this paper is the suggestion that M1 chains decorate the mxiE/ipaH Shigella mutant independent of LUBAC, which is, by and large, considered the only enzyme capable of generating M1-linked ubiquitin chains.

      Strengths:

      The data is for the most part well controlled and clearly presented with appropriate methodology. The authors convincingly demonstrate that IpaH1.4 is the effector responsible for the degradation of RNF213 via the proteasome, although the site of modification is not identified.

      Weaknesses:

      (1) The work builds on prior work from the same laboratory that suggests that M1 ubiquitin chains can be formed independently of LUBAC (in the prior publication this related to Chlamydia inclusions). In this study, two pieces of evidence support this statement: fluorescence microscopy-based images and accompanying quantification in Hoip and Hoil knockout cells for association of M1-ub, using an antibody, to Shigella mutants, and the use of an internally tagged Ub-K7R mutant, which is unable to be incorporated into ubiquitin chains via its lysine residues. Given that clones of the M1-specific antibody are not always specific for M1 chains, and because it remains formally possible that the Int-K7R Ub can be added to the end of the chain as a chain terminator or as mono-ub, the authors should strengthen these findings relating to the claim that another E3 ligase can generate M1 chains de novo.

      (2) The main weakness relating to the infection work is that no bacterial protein loading control is assayed in the western blots of infected cells, leaving the reader unable to determine if changes in RNF213 protein levels are the result of the absent bacterial protein (e.g. IpaH1.4) or altered infection levels.

      (3) The importance of IFNgamma priming for RNF213 association to the mxiE or ipaH1.4 strain could have been investigated further as it is unclear if RNF213 coating is enhanced due to increased protein expression of RNF213 or another factor. This is of interest as IFNgamma priming does not seem to be needed for RNF213 to detect and coat cytosolic Salmonella.

      Overall, the findings are important for the host-pathogen field, cell-autonomous/innate immune signaling fields, and microbial pathogenesis fields. If further evidence for LUBAC independent M1 ubiquitylation is achieved this would represent a significant finding.

      We would like to thank the reviewer for their time evaluating our manuscript and the positive assessment of our work and its significance. We provide a comprehensive response to the main three critiques listed under ‘weaknesses’ and also have responded to each of the reviewer’s specific recommendations below. We highlight any changes made to the manuscript in response to those recommendations.

      (1) As the reviewer correctly pointed out, 7KR ubiquitin can be used not only for linear ubiquitylation but can also function as a donor ubiquitin, attached either as mono-ubiquitin to a substrate or to an existing ubiquitin chain as a chain terminator. To distinguish between 7KR INT-Ub signals originating from linear versus mono-ubiquitylation, we followed the reviewer’s advice and generated an N-terminally tagged 7KR INT-Ub variant. The N-terminal tag prevents linear ubiquitylation but still allows 7KR INT-Ub to be attached as a mono-ubiquitin. We found that the addition of this N-terminal tag significantly reduced the number of ΔmxiE bacteria decorated with 7KR INT-Ub but did not completely abolish this decoration. These data are shown in a new Fig. S1 and indicate that 7KR lacking the N-terminal tag is attached to bacteria both in the form of linear (M1-linked) ubiquitin and as donor ubiquitin, possibly as a chain terminator. While we cannot rule out that the anti-M1 antibodies used here cross-react with other ubiquitin linkages, we reason that the 7KR data strongly argue that linear ubiquitin is part of the ubiquitin coat encasing IpaH1.4-deficient cytosolic Shigella. Collectively, our data show that both linear and lysine-linked (especially K27 and K63) ubiquitin chains are part of the RNF213-dependent ubiquitin coat on the surface of IpaH1.4 mutants. Furthermore, our data strongly indicate that this ubiquitylation of IpaH1.4 mutants is independent of LUBAC.

      (2) We used GFP-expressing strains of S. flexneri for our infection studies and were therefore able to use GFP expression as a loading control. We have incorporated these data into our revised figures. These new data (Figs. 4A, 5A, and S3B) show that bacterial infection levels were comparable between WT and mutant infections and that therefore the degradation of RNF213 (or HOIP – see new data in Fig. S3B) is not due to differences in infection efficiency.

      (3) We agree with the reviewer that the mechanism by which RNF213 binds to bacteria is an important unanswered question. Similarly, whether other ISGs have auxiliary functions in this process or whether binding efficiencies vary between different bacterial species are important questions in the field. However, these questions go far beyond the scope of this study and were therefore not addressed in our revisions.

      Reviewer #3 (Recommendations for the authors):

      (1) An N-terminally tagged K7R-ub should be used as a control to test whether the signal found around the mutant shigella is being added via the N terminal Met into chains. As it is known that certain batches of the M1-specific antibodies are in fact not specific and able to detect other chain types, the authors should test the specificity of the antibody used in this study (eg against different di-Ub linkage types) and include this data in the manuscript.

      We agree with the reviewer in principle. The anti-linear ubiquitin (anti-M1) monoclonal antibody, clone 1E3, prominently used in this study was tested by the manufacturer (Sigma) by Western blotting analysis and according to the manufacturer “this antibody detected ubiquitin in linear Ub, but not Ub K11, Ub K48, Ub K63.” However, this analysis did not include all possible Ub linkage types and thus the reviewer is correct that the anti-M1 antibody could theoretically also detect some other linkage types. To address this concern, we added new data during revisions demonstrating that 7KR INT-Ub targeting to S. flexneri is largely dependent on the N-terminus (M1) of ubiquitin. Our combined observations therefore overwhelmingly support the conclusion that linear (M1-linked) as well as K-linked ubiquitin is being attached to the surface of IpaH1.4-deficient S. flexneri bacteria in an RNF213-dependent and LUBAC-independent manner.

      (2) The M1 signal detected on bacteria with the antibody is still present in either Hoip or Hoil KO’s but due to the potential non-specificity of the antibody, the authors should test whether K7R ub is detected on bacteria in the Hoil ko (in addition to Hoip KO). This would strengthen the authors’ data on LUBAC-independent M1 and is important because Hoil can catalyse non-canonical ubiquitylation.

      The specific linear ubiquitin-ligating activity of LUBAC is enacted by HOIP. We show that linear ubiquitylation of susceptible S. flexneri mutants, as assessed by anti-M1 ubiquitin staining or 7KR INT-Ub recruitment, occurs in HOIP KO cells at WT levels (Figs. 3B, 3C, S3E [new data]). In our view, these data unequivocally show that the observed linear ubiquitylation of cytosolic S. flexneri ipaH1.4 and mxiE mutants is independent of LUBAC.

      (3) For Figure 4A, do mxiE bacteria show similar invasion - authors should include a bacterial protein control to show levels of bacteria in WT and mxiE infected conditions. A similar control should be included in Figure 5A.

      We used GFP-expressing strains of S. flexneri for our infection studies and were therefore able to use GFP expression as a loading control. We have incorporated these data into our revised figures. These new data (Figs. 4A, 5A, and S3B) show that bacterial infection levels were comparable between WT and mutant infections and that therefore the degradation of RNF213 (or HOIP – see new data in Fig. S3B) is not due to differences in infection efficiency.

      (4) Can the authors speculate why IFNg priming is needed for the coating of Shigella mxiE mutant but not in the case of Salmonella or Burkholderia? Is this just amounts of RNF213 or something else?

      In our studies we did not directly compare ubiquitylation rates of cytosolic Shigella, Burkholderia, and Salmonella bacteria with each other under the same experimental conditions. However, such a direct comparison is needed to determine whether IFNgamma priming is required for RNF213-dependent bacterial ubiquitylation of some but not other pathogens. Two papers published during the revisions of our manuscript (PMID: 40164614, PMID: 40205224) report robust RNF213 targeting to IpaH1.4 Shigella mutants in unprimed HeLa cells (whereas we used A549 and HT29 cells). Therefore, differences in reagents, cell lines, and/or other experimental conditions may determine whether IFNgamma priming is necessary to observe substantial RNF213 translocation to cytosolic bacteria.

      (5) Typos - there are several, but this is hard to annotate with line numbers so the authors should proofread again carefully.

      We proofread the manuscript and corrected the small number of typos we identified.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Wang and colleagues present a study aimed at demonstrating the feasibility of repeated ultrasound localization microscopy (ULM) recording sessions on mice chronically implanted with a cranial window transparent to US. They provided quantitative information on their protocol, such as the number of contrast-enhancing microbubbles (MBs) required to get a clear image of the vasculature of a brain coronal section. Also, they quantified the co-registration quality over time-distant sessions and the vasodilator effect of isoflurane.

      Strengths:

      The study showed a remarkable performance in recording precisely the same brain coronal section over repeated imaging sessions. In addition, it sheds light on the vasodilator effect of isoflurane (an anesthetic whose effects are not fully understood) on the different brain vasculature compartments, although, as the Authors stated, some insights in this aspect have already been published with other imaging techniques. The experimental setting and protocol are very well described.

      Wang and co-authors submitted a revised version of their study, which shows improvements in the clarity of the data description.

      However, the flaws and limitations of this study are substantially unchanged.

      The main issues are:

      Statistics are still inadequate. The TOST test proposed in this revised version is not equivalent to an ANOVA. Indeed, multivariate analyses should be the most appropriate, given that some quantifications were probably made on multiple vessels from different mice. The 3 reviewers mentioned the flaws in statistics as the primary concern.

      Response 01: We thank the reviewer for raising this important point. We fully acknowledge the limitations of our current statistical analysis. We would like to clarify that the TOST procedure was applied exclusively to the measurements taken from the same vessel segment in the same animal across different time points, with the purpose of evaluating the consistency of vessel diameter measurements. We recognize that the statistical analysis in this study remains limited, which we have acknowledged as a key limitation in the manuscript. This constraint arises primarily from the limited number of animals, and our analysis should be interpreted as a representative case study rather than a generalized statistical conclusion. We have revised the manuscript to clarify these points and to more explicitly acknowledge the statistical limitations.

      (Line 329) “Our current study primarily focused on demonstrating the feasibility of longitudinal ULM imaging in awake animals, instead of conducting a systematic investigation of how isoflurane anesthesia alters cerebral blood flow. Due to the limited number of animals used, the analyses presented in this work should be interpreted as example case studies. While the trends observed across animals were consistent, the small sample size restricts the scope of statistical inference. For future work, it would be valuable to design more rigorous control experiments with larger sample sizes to systematically compare the effects of isoflurane anesthesia, awake states, and other anesthetics that do not induce vasodilation on cerebral blood flow.”

      No new data has been added, such as testing other anesthetics.

      Response 02: We acknowledge that the current study does not include data involving other anesthetics, and we have also discussed this point in our initial response. In fact, we did attempt to use other anesthetics such as ketamine. However, we found it difficult to draw reliable conclusions due to experimental limitations such as variable anesthesia recovery profiles and injection timing, as elaborated in the following paragraphs. Therefore, we decided not to include these data in the current study to avoid potential misinterpretation.

      One major limitation of our experimental setup is that imaging in the awake state is necessarily conducted after a brief period of isoflurane anesthesia. This brief anesthesia allows for the intravenous injection of microbubbles via the tail vein. Isoflurane is particularly suited for this purpose due to its rapid onset and offset. Mice recover quickly once the gas is withdrawn, which enables relatively consistent post-anesthesia imaging in the awake state.

      In contrast, other anesthetic agents present challenges. Their recovery profiles are slower, more variable, and less controllable. Reversal drugs can be administered to awaken the animals, but they add another source of variability. These factors may lead to greater fluctuations in cerebral hemodynamics and introduce uncertainty in the timing of bolus microbubble injection. As such, our current setup is not ideal for systematically comparing different anesthetics and could yield misleading results.

      A more appropriate strategy for comparing awake ULM imaging with different anesthetics would be to perform awake imaging first, followed by imaging under anesthesia. This would ensure that the awake condition is free from residual anesthetic effects. However, this method places higher demands on microbubble delivery, as no anesthesia can be used for the intravenous injection.

      To address this, we are actively exploring another solution using indwelling jugular vein catheterization. By surgically implanting a catheter into the jugular vein prior to imaging, we can establish a stable and reproducible route for microbubble delivery in fully awake animals without any anesthesia induction. This method has the potential to enable direct and reliable comparisons across different physiological states. However, the implementation of this technique and the associated experimental findings go beyond the scope of the current study and will be presented in a future manuscript.

      In the present work, we have emphasized the methodological limitations of our approach and clarified that our primary goal is to highlight the necessity and feasibility of awake-state ULM imaging. The focus is not to comprehensively characterize the effects of different anesthetic agents on microvascular brain flow. We appreciate your understanding and interest in this important future direction. 

      Based on these responses and the previous revision, we have further refined the discussion of the relevant limitations:

      (Line 324) “Although isoflurane is widely used in ultrasound imaging because it provides long-lasting and stable anesthetic effects, it is important to note that the vasodilation observed with isoflurane is not representative of all anesthetics. Some anesthesia protocols, such as ketamine combined with medetomidine, do not produce significant vasodilation and are therefore preferred in experiments where vascular stability is essential, such as functional ultrasound imaging. Our current study primarily focused on demonstrating the feasibility of longitudinal ULM imaging in awake animals, instead of conducting a systematic investigation of how isoflurane anesthesia alters cerebral blood flow. Due to the limited number of animals used, the analyses presented in this work should be interpreted as example case studies. While the trends observed across animals were consistent, the small sample size restricts the scope of statistical inference. For future work, it would be valuable to design more rigorous control experiments with larger sample sizes to systematically compare the effects of isoflurane anesthesia, awake states, and other anesthetics that do not induce vasodilation on cerebral blood flow.”

      (Line 347) “Another limitation of this study is the potential residual vasodilatory effect of isoflurane anesthesia on awake imaging sessions and the short imaging window available after bolus injection. The awake imaging sessions were conducted shortly after the mice had emerged from isoflurane anesthesia, required for the MB bolus injections. The lasting vasodilatory effects of isoflurane may have influenced vascular responses, potentially contributing to an underestimation of differences in vascular dynamics between anesthetized and awake state. In addition, since microbubbles are rapidly cleared from circulation, the duration of effective imaging is limited to only a few minutes, which also overlaps with the anesthesia recovery period, constraining the usable awake-state imaging window. Future improvement on microbubble infusion using an indwelling jugular vein catheter presents a promising alternative to address these limitations. This method allows for stable microbubble infusion without the need for anesthesia induction, ensuring that the awake imaging condition is free from residual anesthetic effects. Moreover, it has the potential to extend the duration of imaging sessions, offering a longer and more stable time window for data acquisition. Furthermore, by performing ULM imaging in the awake state first, instead of starting with anesthetized imaging, researchers can achieve a more rigorous comparison of how various anesthetics influence cerebral microvascular dynamics relative to the awake baseline.”

      The Authors still insist on using the term Vascularity which they define as: 'proportion of the pixel count occupied by blood vessels within each ROI, obtained by binarizing the ULM vessel density maps and calculating the percentage of the pixels with MB signal.'. Why not use apparent cerebral blood volume or just CBV? Introducing an unnecessary and redundant term is not scientifically acceptable. In this revised version, vascularity is also used to indicate a higher vascular density (Line 275), which does not make sense: blood vessels do not generate from the isoflurane to the awake condition in a few minutes. Rev2 also raised this point.

      Response 03: Thank you for revisiting this important point. We acknowledge that the term vascularity is difficult to interpret for readers, and we also recognize that we did not sufficiently justify its use in the earlier version.

      Based on your suggestion, we have now replaced all instances of “vascularity” with “fractional vessel area”. While the underlying definition remains the same, fractional vessel area offers a more intuitive description. The term “fractional” denotes that the vessel area is normalized to the total area of the selected ROI. This normalization is essential for fair comparisons across ROIs of different sizes, such as those used in Figures 4i–k to evaluate various brain regions. We would also like to clarify that this was not introduced as an unnecessary or redundant term, but rather as a more suitable metric for longitudinal ULM analysis. We did consider using apparent cerebral blood volume (CBV), estimated from microbubble counts. However, we found that it was less robust and less meaningful in the context of longitudinal ULM comparisons. Below we provide further justification for using the vessel area instead:

      (1) Using the vessel area is more robust:

      In longitudinal ULM comparisons, normalization across time points is essential to enable fair and meaningful comparisons. In our study, we normalized the data based on a cumulative 5 million microbubbles (e.g., Fig. 2). Other normalization strategies could also be adopted, as long as the resulting vascular maps reach a sufficiently saturated state. However, even with normalization, it remains important to use a quantitative metric that is minimally biased and invariant to experimental fluctuations across time points. Vessel area, derived from binarized vessel maps, is less sensitive to variations in acquisition time and microbubble concentration. This is because repeated microbubble trajectories through the same location are not counted multiple times. In contrast, apparent CBV, calculated from the microbubble counts, is more susceptible to different concentration conditions. Since repeated detections in the same location accumulate, the metric can be dependent on injection efficiency and imaging duration. While CBV may still be valid under well-controlled, steady-state conditions, we found the vessel area to be a more robust and reliable metric for longitudinal analysis under our current bolus-injection protocol.

      (2) Using the vessel area is more meaningful:

      Compared to CBV, the vessel area provides a more direct representation of structural characteristics such as vessel diameter. Anesthesia-induced vasodilation leads to an increase in vessel diameter. Although local diameter changes can be assessed by manually selecting vessel segments, this approach is labor-intensive and prone to selection bias. To enable a more comprehensive and objective assessment of such morphological changes, fractional vessel area provides a more informative alternative to CBV, as it captures diameter-related variations at a global or regional scale, and avoids potential biases associated with manually selecting specific vessels or regions.
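      The robustness argument in point (1) can be illustrated with a toy example. The following is a minimal sketch, not the authors' analysis code: it assumes a ULM density map stored as a nested list of per-pixel microbubble track counts, and the function names `apparent_cbv` and `fractional_vessel_area` are hypothetical. Doubling the number of detections along the same trajectories doubles the count-based metric but leaves the binarized area unchanged.

```python
# Toy comparison of a count-based metric and a binarized-area metric.
# Assumption: density_map[r][c] holds the number of microbubble tracks
# that crossed pixel (r, c). All names here are illustrative only.

def apparent_cbv(density_map):
    """Sum of microbubble counts over the ROI (count-based metric)."""
    return sum(sum(row) for row in density_map)

def fractional_vessel_area(density_map):
    """Fraction of ROI pixels with at least one microbubble detection."""
    total = sum(len(row) for row in density_map)
    occupied = sum(1 for row in density_map for v in row if v > 0)
    return occupied / total

# A 3x4 ROI with a single horizontal vessel in the middle row.
session_a = [
    [0, 0, 0, 0],
    [2, 3, 3, 2],
    [0, 0, 0, 0],
]
# Same vessel, but twice as many detections (e.g. a denser bolus).
session_b = [[2 * v for v in row] for row in session_a]

cbv_a, cbv_b = apparent_cbv(session_a), apparent_cbv(session_b)
fva_a = fractional_vessel_area(session_a)
fva_b = fractional_vessel_area(session_b)
```

      The invariance holds only when every vessel pixel has been visited at least once in both sessions, which is why the response stresses that the vascular maps must reach a sufficiently saturated state before comparison.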

      In response to: vascularity is also used to indicate a higher vascular density (Line 275), which does not make sense: blood vessels do not generate from the isoflurane to the awake condition in a few minutes.

      We agree that blood vessels cannot be generated in a few minutes. Vascularity (now fractional vessel area) should be interpreted as apparent vessel density, which reflects a probabilistic estimate of vessel density based on the detectable microbubbles.

      Both apparent vessel density and apparent CBV are indirect, sampling-based approximations of vascular features, and both are fundamentally limited by microbubble detection sensitivity. Low microbubble concentrations lead to underestimation of both CBV and vessel area. A change from zero to non-zero in these metrics does not imply the physical appearance or disappearance of vessels, but rather reflects a change in the likelihood of detecting flow in each region.

      In summary, while neither fractional vessel area (vascularity in previous versions) nor apparent CBV is a perfect metric due to the inherent limitations of ULM, we believe the vessel area provides a more robust and meaningful parameter for our longitudinal comparisons. We have revised the main text to include this explanation and acknowledge the limitations and interpretation of fractional vessel area more explicitly.

      Revision in Results:

      (Line 181) “To validate the broader applicability of our findings, we conducted ROI-based analyses using fractional vessel area and mean velocity as primary metrics. These metrics extended the analysis of vessel diameter and flow velocity to entire brain regions or selected ROIs, which provides a more objective assessment of cerebral blood flow changes at a global scale and reduces the bias associated with manually selecting vessel segments. For vessel area measurements, the term fractional denotes that the vessel area is normalized to the total area of the selected ROI. This normalization is essential for fair comparisons across ROIs of different sizes.”

      Revision in Methods: definition of fractional vessel area (previously vascularity)

      (Line 571) “In ROI-based analysis, we focused on two primary parameters: fractional vessel area and mean velocity. Fractional vessel area was defined as the proportion of the pixel count occupied by blood vessels within each ROI, obtained by binarizing the ULM vessel density maps and calculating the percentage of the pixels with MB signal. Mean velocity was calculated by averaging the velocity estimates of all non-zero pixels within the ROI. The velocity distribution within each ROI was also visualized using violin plots, as shown in Fig. 2, 4 and 6, to illustrate the range and density of flow velocity estimates across different acquisitions. In this study, we focused on these two metrics because they represent the most straightforward extension of single-vessel analysis to brain-wide vascular changes.”
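      The two ROI metrics defined in the Methods excerpt can be written out compactly. The sketch below is an illustration under stated assumptions rather than the authors' MATLAB implementation: the density map, velocity map, and ROI mask are assumed to be same-sized nested lists, velocity units are arbitrary, and the function name `roi_metrics` is hypothetical.

```python
# Sketch of the two ROI metrics described in the Methods excerpt.
# Assumptions: density_map holds per-pixel microbubble counts, velocity_map
# holds per-pixel velocity estimates (0 where no estimate exists), and
# roi_mask holds booleans selecting the ROI. Names are illustrative only.

def roi_metrics(density_map, velocity_map, roi_mask):
    """Return (fractional vessel area in percent, mean velocity) for one ROI."""
    roi_pixels = 0
    vessel_pixels = 0
    velocities = []
    for d_row, v_row, m_row in zip(density_map, velocity_map, roi_mask):
        for d, v, inside in zip(d_row, v_row, m_row):
            if not inside:
                continue
            roi_pixels += 1
            if d > 0:  # binarization: any microbubble signal counts as vessel
                vessel_pixels += 1
            if v != 0:  # only non-zero velocity pixels enter the mean
                velocities.append(v)
    if roi_pixels == 0:
        raise ValueError("ROI mask selects no pixels")
    area_percent = 100.0 * vessel_pixels / roi_pixels
    mean_velocity = sum(velocities) / len(velocities) if velocities else 0.0
    return area_percent, mean_velocity

density = [[0, 1, 4], [0, 0, 2]]
velocity = [[0.0, 5.0, 7.0], [0.0, 0.0, 9.0]]
mask = [[True, True, True], [False, True, True]]

area_percent, mean_velocity = roi_metrics(density, velocity, mask)
```

      In this toy ROI, three of the five selected pixels carry microbubble signal, and the mean is taken over those three velocity estimates only. Dividing by the ROI pixel count is what makes the area metric comparable across ROIs of different sizes.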

      We have made our ROI analysis code available on GitHub and added a “Code availability” section. We hope it can serve as a foundation and a worked example for users exploring different quantitative metrics in their own longitudinal ULM studies.

      (Line 578) “Code availability

      To support quantitative longitudinal analysis of ULM data, we developed an open-source MATLAB application (https://github.com/ekerwang/ULMQuantitativeAnalysis). This tool is designed to facilitate ROI-based analysis of ULM images for longitudinal comparisons. It supports multiple quantification metrics, including but not limited to vessel area and mean velocity used in this study. Users can select and adapt different metrics based on their specific applications, as a wide range of ULM-based quantification metrics have been developed for different pathological and pharmacological studies.”

      The long-term recordings mentioned by the Authors refer to the 3-week time frame analyzed in this study. However, within each acquisition, the time available from imaging is only a few minutes (< 10', referring to most of the plots showing time courses) after the animals' arousal from isoflurane and before bubbles disappear. This limitation should be acknowledged.

      Response 04: Thank you for this comment. We agree that the current imaging sessions are constrained by the short time window available after the animal’s arousal from isoflurane and before bubbles disappear. This limitation indeed restricts the duration of usable awake-state imaging in our current bolus injection protocol. As discussed earlier, we are actively exploring the use of a jugular vein catheterization approach to address this limitation. This approach has the potential to extend the imaging session duration and provide a longer, more stable time window. We have now acknowledged this limitation more explicitly in the revised Discussion section.

      (Line 347) “Another limitation of this study is the potential residual vasodilatory effect of isoflurane anesthesia on awake imaging sessions and the short imaging window available after bolus injection. The awake imaging sessions were conducted shortly after the mice had emerged from isoflurane anesthesia, required for the MB bolus injections. The lasting vasodilatory effects of isoflurane may have influenced vascular responses, potentially contributing to an underestimation of differences in vascular dynamics between anesthetized and awake states. In addition, since microbubbles are rapidly cleared from circulation, the duration of effective imaging is limited to only a few minutes, which also overlaps with the anesthesia recovery period, constraining the usable awake-state imaging window. Future improvements to microbubble infusion using an indwelling jugular vein catheter present a promising alternative to address these limitations. This method allows for stable microbubble infusion without the need for anesthesia induction, ensuring that the awake imaging condition is free from residual anesthetic effects. Moreover, it has the potential to extend the duration of imaging sessions, offering a longer and more stable time window for data acquisition. Furthermore, by performing ULM imaging in the awake state first, instead of starting with anesthetized imaging, researchers can achieve a more rigorous comparison of how various anesthetics influence cerebral microvascular dynamics relative to the awake baseline.”

      The more precise description of the number of mice and blood vessels analyzed in Figure 6 makes apparent the limited number of independent samples used to support the findings of this work, a limitation that should be acknowledged. The newly provided information added as Supplementary Figure 1 should be moved to the main text, possibly into the figure legends. The limited data in support of the findings was also highlighted by Rev2 and, indirectly, by Rev3.

      Response 05: We acknowledge the limited number of independent samples used in this study. In the revised manuscript, we have explicitly emphasized this limitation in the Discussion section. Specifically, we added the following statement:

      (Line 329) “Our current study primarily focused on demonstrating the feasibility of longitudinal ULM imaging in awake animals, instead of conducting a systematic investigation of how isoflurane anesthesia alters cerebral blood flow. Due to the limited number of animals used, the analyses presented in this work should be interpreted as example case studies. While the trends observed across animals were consistent, the small sample size restricts the scope of statistical inference. For future work, it would be valuable to design more rigorous control experiments with larger sample sizes to systematically compare the effects of isoflurane anesthesia, awake states, and other anesthetics that do not induce vasodilation on cerebral blood flow.”

      Following your suggestion, we have also moved the newly provided information (the table in Supplementary Figure 1) into the figure captions. In addition, we have revised the Methods section to ensure that this information is clear.

      (Line 406) “Eight healthy female C57 mice (8-12 weeks) were used for this study, numbered as Mouse 1 to Mouse 8. Three mice (Mouse 1–3) were used to compare imaging results between awake and anesthetized states (Fig. 3 and 4). Three additional mice (Mouse 4–6) underwent longitudinal imaging over a three-week period (Fig. 5 and 6). Among them, Mouse 4 was also used as an example to demonstrate the overall system schematic and saturation conditions (Fig. 1 and 2). Several mice (Mouse 2, 6, 7, and 8) exhibited suboptimal cranial window quality or image artifacts and were included to illustrate common surgical or imaging issues (Supplementary Fig. 1). The specific usage of each animal is also annotated in the corresponding figure captions.”

      Reviewer #2 (Public Review):

      The authors present a very interesting collection of methods and results using brain ultrasound localization microscopy (ULM) in awake mice. They emphasize the effect of the level of anesthesia on the quantifiable elements assessable with this technique (i.e. vessel diameter, flow speed, in veins and arteries, area perfused, in capillaries) and demonstrate the possibility of achieving longitudinal cerebrovascular assessment in one animal during several weeks with their protocol.

      The authors did a good job of rewriting the article based on the reviewers' comments. One of the messages of the first version of the manuscript was that variability in measurements (vessel diameter, flow velocity, vascularity) was much more pronounced under changes of anesthesia than when considering longitudinal imaging across several weeks. This message is now somewhat tempered, as longitudinal imaging seems to show a certain variability close to the order of magnitude observed under anesthesia. In that sense, the review process was useful in avoiding hasty conclusions and calls for further caution in ULM awake longitudinal imaging, in particular regarding precision of positioning and cancellation of tissue motion.

      Strengths:

      Even if the methods elements considered separately are not new (brain ULM in rodents, setup for longitudinal awake imaging similar to those used in fUS imaging, quantification of vessel diameters/bubble flow/vessel area), when masterfully combined as is done in this paper, they answer two long-running questions in the community: what is the impact of anesthesia on the parameters measured by ULM (and indirectly in fUS and other techniques)? Is it possible to achieve ULM in awake rodents for longitudinal imaging? The manuscript is well constructed, well written, and graphics are appealing.

      The manuscript has been much strengthened by the round of review, with more animals for the longitudinal imaging study.

      Weaknesses:

      Some weaknesses remain, not hindering the quality of the work, that the authors might want to answer or explain.

      When considering fig 4e and fig 4j together: it seems that in fig 4e the vascularity reduction in the cortical ROI is around 30% for downward flow, and around 55% for upward flow; but when grouping both cortical flows in fig 4j, the reduction is much smaller (~5%), even at the individual level (only mouse 1 is used in fig 4e). Can you comment on that?

      Response 06: Thank you for carefully pointing this out. This discrepancy arises primarily from differences in ROI selections.

      The vascularity metric (now termed fractional vessel area, following Reviewer 1’s comments) is calculated as the proportion of vessel-occupied pixels relative to the total ROI area. As such, it is best suited for longitudinal comparisons within the same ROI rather than across-ROI comparisons, particularly when the size and vessel composition of the ROIs differ.

      In Fig. 4e, the cortical ROI includes mostly the penetrating vessels, which are selected due to their clear distinction between upward (venous) and downward (arterial) flow directions. Pial vessels were intentionally excluded because flow direction alone does not reliably distinguish arteries from veins in these surface vessels. Thus, the goal of this analysis was to indicate arteriovenous differences, rather than to represent the full cortical vascular changes.

      In contrast, the ROIs used in Fig. 4j aim to provide a more comprehensive view of cortical vascular responses without distinguishing flow direction. For this reason, both penetrating and pial vessels are included. Since pial vessels showed relatively smaller vascularity changes within the coronal cross-sections analyzed in our study, their inclusion in the cortical ROI likely contributed to the smaller overall reduction in vascularity observed in Figure 4j.
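      The dilution effect described in this response can be made concrete with a small arithmetic sketch in Python. All pixel counts below are hypothetical and chosen only to show the mechanism; they are not taken from Fig. 4e or Fig. 4j, and "before" and "after" simply denote the two imaging conditions being compared.

```python
def fractional_area(vessel_pixels, roi_pixels):
    """Vessel-occupied pixels divided by total ROI pixels."""
    return vessel_pixels / roi_pixels


def relative_reduction(before, after):
    """Relative drop from one condition to the other (0.4 means 40%)."""
    return (before - after) / before


# Hypothetical counts for an ROI restricted to penetrating vessels.
pen_roi_pixels = 1000
pen_before, pen_after = 400, 240

# Hypothetical counts for pial vessels, which change very little.
pial_before, pial_after = 1200, 1180

# A broader cortical ROI that contains both vessel populations.
combined_roi_pixels = 3000

narrow = relative_reduction(
    fractional_area(pen_before, pen_roi_pixels),
    fractional_area(pen_after, pen_roi_pixels),
)
broad = relative_reduction(
    fractional_area(pen_before + pial_before, combined_roi_pixels),
    fractional_area(pen_after + pial_after, combined_roi_pixels),
)
print(f"penetrating-only ROI: {narrow:.1%}, combined ROI: {broad:.1%}")
```

      With these made-up numbers, the same absolute loss of vessel pixels is a large fraction of the penetrating-only ROI but a much smaller fraction once the nearly unchanged pial vessels are added, which is why the two panels need not show similar percentages.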

      To address this potential confusion, we have added further clarification in the Results section of the revised manuscript.

      (Line 209) “It is worth noting that prior analyses (Fig. 4d–h) aimed to illustrate arteriovenous differences. Since pial vessels are difficult to distinguish as arteries or veins based on flow direction in coronal plane imaging, they were excluded from the ROI selection in those analyses. In the current whole-brain comparisons (Fig. 4i-k), the cortical ROIs no longer exclude pial vessels, since distinguishing between arteries and veins is not required. This aims to provide a more comprehensive representation of cortical vasculature.”

      When considering fig 4e, fig 4j, fig 6e and fig 6i altogether, it seems that vascularity can be highly variable, whether it be under anesthesia or vascular imaging, with changes between 5 and 40%. Is this vascularity quantification worth it (namely, is it reliable enough, for example, to quantify changes in a pathological model requiring longitudinal imaging)?

      Response 07: Thank you for raising this important point. We found that imaging in the awake state is inherently more variable than under anesthesia. In contrast, anesthetized imaging offers a more controlled and stable physiological condition, as anesthesia suppresses many sources of variation. For pathological studies, if the vascular or hemodynamic changes induced by anesthesia do not interfere with the scientific question being addressed, imaging under anesthesia can still be a practical and effective approach, due to its experimental simplicity and better physiological consistency.

      The higher variability observed in awake imaging arises from both physiological fluctuations in animals and unavoidable experimental inconsistencies, such as small misalignment on the imaging plane across sessions. If the research question aims to avoid the confounding effects of anesthesia, then instead of suppressing variation through anesthesia, it is important to acknowledge the natural baseline variation in the awake state. However, efforts should be made to minimize technical sources of variation. We have added a brief discussion of this issue at the end of the manuscript to reflect this consideration.

      (Line 396) “However, it is also important to note that although longitudinal awake imaging shows promise for avoiding the confounding effects of anesthetics, imaging under anesthesia remains more convenient and controllable in many cases. For applications where the physiological question of interest is not sensitive to anesthesia-induced vascular effects, anesthetized imaging still offers a simpler and more stable approach. Because awake imaging inherently exhibits greater physiological variability, care must be taken at the experimental level to minimize confounding sources of variation, such as stress level of the animal or handling inconsistencies, to ensure that the measurements are physiologically meaningful.”

      Regarding whether fractional vessel area (formerly referred to as vascularity) is a worthwhile metric for longitudinal quantification: based on our experience and comparisons, we found vessel area to be relatively robust and informative (see also Response 02 to Reviewer 1 for details). However, we acknowledge that other quantitative metrics—such as microbubble count, tortuosity, or flow directionality—may be more suitable depending on the specific pathological model or research question. How these metrics perform in awake imaging and longitudinal disease models is indeed an open and important question. We hope our work can serve as a foundation to inspire further investigation in this direction. To facilitate such exploration, we have developed and open-sourced a MATLAB-based analysis tool that supports multiple quantitative ULM metrics for longitudinal comparison. We encourage users to adapt and extend this framework to evaluate different quantitative metrics.

      (Line 578) “Code availability

      To support quantitative longitudinal analysis of ULM data, we developed an open-source MATLAB application (https://github.com/ekerwang/ULMQuantitativeAnalysis). This tool is designed to facilitate ROI-based analysis of ULM images for longitudinal comparisons. It supports multiple quantification metrics, including but not limited to vessel area and mean velocity used in this study. Users can select and adapt different metrics based on their specific applications, as a wide range of ULM-based quantification metrics have been developed for different pathological and pharmacological studies.”

      Reviewer #2 (Recommendations For The Authors):

      Images in figure 4 lack color bars.

      Response 08: Thank you for pointing this out. The color bars for the images in Figure 4 are the same as those used in the corresponding images in Figure 3. We have now added the explanation of color bars to the revised version of Figure 4 caption.

      Fig 4d: upward and downward are probably swapped.

      Response 09: Thank you for pointing this out, and we apologize for the oversight. They were mistakenly swapped. We have corrected this error in the revised figure.

      No quantitative conclusions are drawn regarding the changes in vessel diameter under anesthesia? Is it not significant? If it is not then why bring changes in diameter to our attention in fig 3 (white arrows) and figure 4b?

      Response 10: Our intention in highlighting diameter changes in Figure 3 (white arrows) and Figure 4b was to provide an illustrative example of isoflurane-induced diameter changes at the single-vessel level. These examples are meant to serve as case studies, not as the basis for broad statistical conclusions.

      In the initial version of the manuscript, we attempted to draw quantitative conclusions by measuring vessel diameters from ten manually selected vessel segments at each location. However, based on feedback from other reviewers, we decided to remove this analysis in the revised version. Manual selection of vessel segments is highly subjective and prone to bias, limiting its reliability for quantitative interpretation.

      Instead, we focused on ROI-based analysis using fractional vessel area (formerly referred to as vascularity), which reflects widespread changes in vessel diameter across regions. It is a more generalizable and less biased metric for quantifying vascular diameter changes.

      We further explained this in the Results section:

      (Line 181) “To validate the broader applicability of our findings, we conducted ROI-based analyses using fractional vessel area and mean velocity as primary metrics. These metrics extended the analysis of vessel diameter and flow velocity to entire brain regions or selected ROIs, which provides a more objective assessment of cerebral blood flow changes at a global scale and reduces the bias associated with manually selecting vessel segments. For vessel area measurements, the term fractional denotes that the vessel area is normalized to the total area of the selected ROI. This normalization is essential for fair comparisons across ROIs of different sizes.”

      Line 210 "In summary, statistical analysis revealed a decrease in individual vessel diameter" this does not seem to be supported by this version of the manuscript as no analysis is done on a representative group of vessels for the diameter.

      Response 11: Thank you for pointing out this important issue. In line with our previous response (Response 10), we would like to clarify that the analysis of individual vessel diameter was intended to serve as an example study, rather than a statistically supported conclusion based on a group of vessels. To avoid confusion, we have removed the phrase “statistical analysis revealed a decrease in individual vessel diameter” from the manuscript. 

      The meaning of the *** in fig 6b and 6c should be clarified, as (1) it is not explicitly stated and (2) the equivalence test interpretation is less usual than that of other tests.

      Response 12: We thank the reviewer for pointing out this important issue. We agree that the use of asterisks (***) in Fig. 6b and 6c may have led to confusion, as such markers are typically associated with statistical significance in difference testing. In our case, the analysis was based on the two one-sided test (TOST) procedure to assess statistical equivalence, which is indeed less commonly used and could be misinterpreted.

      To address this, we have replaced the asterisks *** in the figure with the label “equiv.”, which more clearly reflects the intended interpretation. Additionally, we have revised the figure caption and the main text to explicitly state that these markers denote statistical equivalence (not difference) as determined by TOST, with the equivalence margin defined as three times the standard deviation of one week.

      (Figure 6 Caption) “Statistical analysis was performed using the two one-sided test (TOST) to evaluate consistency of measurement. The label “equiv.” indicates statistically equivalent measurements (p < 0.001), defined as interweek differences smaller than three times the standard deviation of one week.”

      (Line 240) “Statistical testing of equivalence was conducted using the two one-sided test (TOST) procedure, which evaluates whether the difference between two time points falls within a predefined equivalence margin. Specifically, equivalence is defined as the inter-week difference being smaller than three times the standard deviation of one week. A statistically significant result in TOST (p < 0.001) supports the interpretation that the measurements are statistically equivalent, which is denoted as “equiv.” in the figures.”
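      To make the TOST logic quoted above easier to follow, here is a minimal Python sketch under stated assumptions. It uses an unpaired two-sample form with a normal (large-sample) approximation so that it runs on the standard library alone; the manuscript does not say whether the test was paired or which reference distribution was used, and a t-distribution would be more appropriate for small samples. The function name `tost_equivalence` and the example measurements are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def tost_equivalence(week_a, week_b, margin_sd_multiple=3.0, alpha=0.001):
    """Two one-sided tests for equivalence of two sets of measurements.

    The equivalence margin is margin_sd_multiple times the standard
    deviation of week_a, mirroring the "three times the standard deviation
    of one week" rule. Returns (p_value, is_equivalent).
    """
    diff = mean(week_a) - mean(week_b)
    margin = margin_sd_multiple * stdev(week_a)
    se = sqrt(stdev(week_a) ** 2 / len(week_a) + stdev(week_b) ** 2 / len(week_b))
    if se == 0:
        raise ValueError("zero standard error: measurements have no spread")
    z = NormalDist()
    # H0 (lower): diff <= -margin; rejected when diff sits well above -margin.
    p_lower = 1.0 - z.cdf((diff + margin) / se)
    # H0 (upper): diff >= +margin; rejected when diff sits well below +margin.
    p_upper = z.cdf((diff - margin) / se)
    # Equivalence is claimed only if both one-sided nulls are rejected.
    p_value = max(p_lower, p_upper)
    return p_value, p_value < alpha


# Hypothetical measurements for two weeks with nearly identical means.
week_1 = [10.0, 10.5, 9.5, 10.2, 9.8, 10.1, 9.9, 10.0]
week_2 = [10.1, 10.4, 9.6, 10.3, 9.7, 10.0, 10.0, 10.1]
# The same data shifted by 2 units, far outside the margin.
week_shifted = [x + 2.0 for x in week_1]

p_same, equiv_same = tost_equivalence(week_1, week_2)
p_shift, equiv_shift = tost_equivalence(week_1, week_shifted)
print(equiv_same, equiv_shift)
```

      Note the direction of inference: unlike a conventional difference test, a small p-value here supports equivalence, which is why a label such as “equiv.” is less likely to be misread than asterisks.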

      Line 237 and following: please consider rephrasing into "To further generalize these findings and examine longitudinal variation in ROI-based analysis, we used Mouse 4 as an example to show the consistency of blood flow density across different flow directions in the cortex (Fig. 6d) and extended the quantitative analysis to all three mice (Fig. 6e) (individual ULM upward and downward flow images for all three mice over the three-week longitudinal study period can be found in Supplementary Fig. 4)." The paragraph will make much more sense.

      Response 13: We appreciate your helpful rephrasing. We have fully adopted your proposed revision to enhance the clarity and coherence of the text. The sentence now reads exactly as you recommended:

      (Line 250): “To further generalize these findings and examine longitudinal variation in ROI-based analysis, we used Mouse 4 as an example to show the consistency of blood flow density across different flow directions in the cortex (Fig. 6d) and extended the quantitative analysis to all three mice (Fig. 6e) (individual ULM upward and downward flow images for all three mice over the three-week longitudinal study period can be found in Supplementary Fig. 4).”

      Line 248: "While arterial and venous flow velocity distributions exhibit clear distinctions, their variations over the three weeks remained acceptable" the meaning of acceptable remains elusive.

      Response 14: Thank you for pointing out the ambiguity in the phrase “remained acceptable”. To improve clarity and precision, we have revised the sentence to provide a more informative description. The updated sentence now reads:

      (Line 261) “While arterial and venous flow velocity distributions exhibit clear distinctions, the distribution shapes remained relatively consistent across the three weeks. Specifically, variations in median velocity were within 1 mm/s. In contrast, anesthesia-induced changes can lead to velocity shifts exceeding 1 mm/s.”

      Line 253: consider rephrasing as "Despite subcortical regions showing the largest vascularity variability following anesthesia-induced changes, vascularity in those regions showed relatively stable values in the longitudinal study" as otherwise the link between the 2 parts of the sentence feels odd.

      Response 15: Thank you for your constructive suggestion regarding the logical flow of the sentence. We fully agree with your point and have revised the sentence exactly as you proposed.

      (Line 268) “Despite subcortical regions showing the largest vascularity variability following anesthesia-induced changes, vascularity in those regions showed relatively stable values in the longitudinal study.”

    1. Author response:

      Reviewer #1 (Public review):

      In this important study, the authors develop a suite of machine vision tools to identify and align fluorescent neuronal recording images in space and time according to neuron identity and position. The authors provide compelling evidence for the speed and utility of these tools. While such tools have been developed in the past (including by the authors), the key advancement here is the speed and broad utility of these new tools. While prior approaches based on steepest descent worked, they required hundreds of hours of computational time, while the new approaches outlined here are >600-fold faster. The machine vision tools here should be immediately useful to readers specifically interested in whole-brain C. elegans data, but also for more general readers who may be interested in using BrainAlignNet for tracking fluorescent neuronal recordings from other systems.

      I really enjoyed reading this paper. The authors had several ground truth examples to quantify the accuracy of their algorithms and identified several small caveats users should consider when using these tools. These tools were primarily developed for C. elegans, an animal with stereotyped development, but whose neurons can be variably located due to internal motion of the body. The authors provide several examples of how BrainAlignNet reliably tracked these neurons over space and time. Neuron identity is also important to track, and the authors showed how AutoCellLabeler can reliably identify neurons based on their fluorescence in the NeuroPAL background. A challenge with NeuroPAL, though, is the high expression of several fluorophores, which compromises behavioral fidelity. The authors provide some possible avenues where this problem can be addressed by expressing fewer fluorophores. While using all four channels provided the best performance, only using the tagRFP and CyOFP channels was sufficient for performance that was close to full performance using all 4 NeuroPAL channels. This result indicates that the development of future lines with less fluorophore expression could be sufficient for reliable neuronal identification, which would decrease the genetic load on the animal, but also open other fluorescent channels that could be used for tracking other fluorescent tools/markers. Even though these tools were developed for C. elegans specifically, they showed BrainAlignNet can be applied to other organisms as well (in their case, the cnidarian C. hemisphaerica), which broadens the utility of their tools.

      Strengths:

      (1) The authors have a wealth of ground-truth training data to compare their algorithms against, and provide a variety of metrics to assess how well their new tools perform against hand annotation and/or prior algorithms.

      (2) For BrainAlignNet, the authors show how this tool can be applied to other organisms besides C. elegans.

      (3) The tools are publicly available on GitHub, which includes useful README files and installation guidance.

      We thank the reviewer for noting these strengths of our study.

      Weaknesses:

      (1) Most of the utility of these algorithms is for C. elegans specifically. Testing their algorithms (specifically BrainAlignNet) on more challenging problems, such as whole-brain zebrafish, would have been interesting. This is a very, very minor weakness, though.

      We appreciate the reviewer’s point that expanding to additional animal models would be valuable. In the study, we have so far tested our approaches on C. elegans and jellyfish. Given that this is considered a ‘very, very minor weakness’ and that it does not directly affect the results or analyses in the paper, we think this might be better addressed in future work.

      (2) The tools are benchmarked against their own prior pipeline, but not against other algorithms written for the same purpose.

      We agree that it would be valuable to benchmark other labs’ software pipelines on our datasets. We note that most papers in this area, which describe those pipelines, provide the same performance metrics that we do (accuracy of neuron identification, tracking accuracy, etc), so a crude, first-order comparison can be obtained by comparing the numbers in the papers. But, we agree that a rigorous head-to-head comparison would require applying these different pipelines to a common dataset. We considered performing these analyses, but we were concerned that using other labs’ software ‘off the shelf’ on our data might not represent those pipelines in their best light when compared to our pipeline that was developed with our data in mind. Data from different microscopy platforms can be surprisingly different and we wouldn’t want to perform an analysis that had this bias. Therefore, we feel that this comparison would be best pursued by all of these labs collaboratively (so that they can each provide input on how to run their software optimally). Indeed, this is an important area for future study. In this spirit, we have been sharing our eat-4::GFP datasets (that permit quantification of tracking accuracy) with other labs looking for additional ways to benchmark their tracking software.

      We also note that there are not really any pipelines to directly compare against CellDiscoveryNet, as we are not aware of any other fully unsupervised approach for neuron identification in C. elegans.

      (3) Considerable pre-processing was done before implementation. Expanding upon this would improve accessibility of these tools to a wider audience.

      Indeed, some pre-processing was performed on images before registration and neuron identification -- understanding these nuances can be important. The pre-processing steps are described in the Results section and detailed in the Methods. They are also all available in our open-source software. For BrainAlignNet, the key steps were: (1) selecting image registration problems, (2) cropping, and (3) Euler alignment. Steps (1) and (3) were critically important and are extensively discussed in the Results and Discussion sections of our study (lines 142-144, 218-234, 318-323, 704-712). Step (2) is standard in image processing. For AutoCellLabeler and CellDiscoveryNet, the pre-processing was primarily to align the 4 NeuroPAL color channels to each other (i.e. make sure the blue/red/orange/etc channels for an animal are perfectly aligned). This is also just a standard image processing step to ensure channel alignment. Thus, the more “custom” pre-processing steps were extensively discussed in the study and the more “common” steps are still described in the Methods. The implementation of all steps is available in our open-source software.

      Reviewer #2 (Public review):

      Summary:

      The paper introduced the pipeline to analyze brain imaging of freely moving animals: registering deforming tissues and maintaining consistent cell identities over time. The pipeline consists of three neural networks that are built upon existing models: BrainAlignNet for non-rigid registration, AutoCellLabeler for supervised annotation of over 100 neuronal types, and CellDiscoveryNet for unsupervised discovery of cell identities. The ambition of the work is to enable high-throughput and largely automated pipelines for neuron tracking and labeling in deforming nervous systems.

      Strengths:

      (1) The paper tackles a timely and difficult problem, offering an end-to-end system rather than isolated modules.

      (2) The authors report high performance within their dataset, including single-pixel registration accuracy, nearly complete neuron linking over time, and annotation accuracy that exceeds individual human labelers.

      (3) Demonstrations across two organisms suggest the methods could be transferable, and the integration of supervised and unsupervised modules is of practical utility.

      We thank the reviewer for noting these strengths of our study.

      Weaknesses:

      (1) Lack of solid evaluation. Despite strong results on their own data, the work is not benchmarked against existing methods on community datasets, making it hard to evaluate relative performance or generality.

      We agree that it would be valuable to benchmark many labs’ software pipelines on some common datasets, ideally from several different research labs. We note that most papers in this area, which describe the other pipelines that have been developed, provide the same performance metrics that we do (accuracy of neuron identification, tracking accuracy, etc), so a crude, first-order comparison can be obtained by comparing the numbers in the papers. But, we agree that a rigorous head-to-head comparison would require applying these different pipelines to a common dataset. We considered performing these analyses, but we were concerned that using other labs’ software ‘off the shelf’ and comparing the results to our pipeline (where we have extensive expertise) might bias the performance metrics in favor of our software. Therefore, we feel that this comparison would be best pursued by all of these labs collaboratively (so that they can each provide input on how to run their software optimally). Indeed, this is an important area for future study. In this spirit, we have been sharing our eat-4::GFP datasets (that permit quantification of tracking accuracy) with other labs looking for additional ways to benchmark their tracking software.

      We also note that there are not really any pipelines to directly compare against CellDiscoveryNet, as we are not aware of any other fully unsupervised approach for neuron identification in C. elegans.

      (2) Lack of novelty. None of the three models incorporates state-of-the-art advances from its respective field. BrainAlignNet does not learn from the latest optical flow literature, relying instead on relatively conventional architectures. AutoCellLabeler does not utilize the advanced medNeXt3D architectures for supervised semantic segmentation. CellDiscoveryNet is presented as unsupervised discovery but relies on standard clustering approaches, with limited evaluation on only a small test set.

      We appreciate that the machine learning field moves fast. Our goal was not to invent entirely novel machine learning tools, but rather to apply and optimize tools for a set of challenging, unsolved biological problems. We began with the somewhat simpler architectures described in our study and were largely satisfied with their performance. It is conceivable that newer approaches would perhaps lead to even greater accuracy, flexibility, and/or speed. But, oftentimes, simple or classical solutions can adequately resolve specific challenges in biological image processing.

      Regarding CellDiscoveryNet, our claim of unsupervised training is precise: CellDiscoveryNet is trained end-to-end only on raw images, with no human annotations, pseudo-labels, external classifiers, or metadata used for training, model selection, or early stopping. The loss is defined entirely from the input data (no label signal). By standard usage in machine learning, this constitutes unsupervised (often termed “self-supervised”) representation learning. Downstream clustering is likewise unsupervised, consuming only image pairs registered by CellDiscoveryNet and neuron segmentations produced by our previously-trained SegmentationNet (which provides no label information).

      (3) Lack of robustness. BrainAlignNet requires dataset-specific training and pre-alignment strategies, limiting its plug-and-play use. AutoCellLabeler depends heavily on raw intensity patterns of neurons, making it brittle to pose changes. By contrast, current state-of-the-art methods incorporate spatial deformation atlases or relative spatial relationships, which provide robustness across poses and imaging conditions. More broadly, the ANTSUN 2.0 system depends on numerous manually tuned weights and thresholds, which reduces reproducibility and generalizability beyond curated conditions.

      Regarding BrainAlignNet: we agree that we trained on each species’ own data (worm, jellyfish), and based on our current state of knowledge we would suggest that other labs working on new organisms do the same. It would be fantastic if there were an alignment approach that generalized to all possible cases of non-rigid registration in all animals – an important area for future study. We also agree that pre-alignment was critical in worms and jellyfish, which we discuss extensively in our study (lines 142-144, 318-321, 704-712).

      Regarding AutoCellLabeler: the animals were not recorded in any standardized pose and were not aligned to each other beforehand – they were basically in a haphazard mix of poses and we used image augmentation to allow the network to generalize to other poses, as described in our study. It is still possible that AutoCellLabeler is somehow brittle to pose changes (e.g. perhaps extremely curved worms) – while we did not detect this in our analyses, we did not systematically evaluate performance across all possible poses. However, we do note that this network was able to label images taken from freely-moving worms, which by definition exhibit many poses (Figure 5D, lines 500-525); aggregating the network’s performance across freely-moving data points allowed it to nearly match its performance on high-SNR immobilized data. This suggests a degree of robustness of the AutoCellLabeler network to pose changes.

      Regarding ANTSUN 2.0: we agree that there are some hyperparameters (described in our study) that affect ANTSUN performance. We agree that it would be worthwhile to fully automate setting these in future iterations of the software.

      Evaluation:

      To make the evaluation more solid, it would be great for the authors to (1) apply the new method on existing datasets and (2) apply baseline methods on their own datasets. Otherwise, without comparison, it is unclear if the proposed method is better or not. The following papers have public challenging tracking data: https://elifesciences.org/articles/66410, https://elifesciences.org/articles/59187, https://www.nature.com/articles/s41592-023-02096-3.

      Please see our response to your point (1) under Weaknesses above.

      Methodology:

      (1) The model innovations appear incrementally novel relative to existing work. The authors should articulate what is fundamentally different (architectural choices, training objectives, inductive biases) and why those differences matter empirically. Ablations isolating each design choice would help.

      There are other efforts in the literature to solve the neuron tracking and neuron identification problems in C. elegans (please see paragraphs 4 and 5 of our Introduction, which are devoted to describing these). However, they are quite different in the approaches that they use, compared to our study. For example, for neuron tracking they use t->t+1 methods, or model neurons as point clouds, etc (a variety of approaches have been tried). For neuron identification, they work on extracted features from images, or use statistical approaches rather than deep neural networks, etc (a variety of approaches have been tried). Our assessment is that each of these diverse approaches has strengths and drawbacks; we agree that a meta-analysis of the design choices used across studies could be valuable.

      We also note that there are not really any pipelines to directly compare against CellDiscoveryNet, as we are not aware of any other fully unsupervised approach for neuron identification in C. elegans.

      (2) The pipeline currently depends on numerous manually set hyperparameters and dataset-specific preprocessing. Please provide principled guidelines (e.g., ranges, default settings, heuristics) and a robustness analysis (sweeps, sensitivity curves) to show how performance varies with these choices across datasets; wherever possible, learn weights from data or replace fixed thresholds with data-driven criteria.

      We agree that there are some ANTSUN 2.0 hyperparameters (described in our Methods section) that could affect the quality of neuron tracking. It would be worthwhile to fully automate setting these in future iterations of the software, ensuring that the hyperparameter settings are robust to variation in data/experiments.

      Appraisal:

      The authors partially achieve their aims. Within the scope of their dataset, the pipeline demonstrates impressive performance and clear practical value. However, the absence of comparisons with state-of-the-art algorithms such as ZephIR, fDNC, or WormID, combined with small-scale evaluation (e.g., ten test volumes), makes the strength of evidence incomplete. The results support the conclusion that the approach is useful for their lab's workflow, but they do not establish broader robustness or superiority over existing methods.

      We wish to remind the reviewer that we developed BrainAlignNet for use in worms and jellyfish. These two animals have different distributions of neurons and radically different anatomy and movement patterns. Data from the two organisms was collected in different labs (Flavell lab, Weissbourd lab) on different types of microscopes (spinning disk, epifluorescence). We believe that this is a good initial demonstration that the approach has robustness across different settings.

      Regarding comparisons to other labs’ C. elegans data processing pipelines, we agree that it will be extremely valuable to compare performance on common datasets, ideally collected in multiple different research labs. But we believe this should be performed collaboratively, with input from each lab, so that each software package is used in its best light, as described above.

      Impact:

      Even though the authors have released code, the pipeline requires heavy pre- and post-processing with numerous manually tuned hyperparameters, which limits its practical applicability to new datasets. Indeed, even within the paper, BrainAlignNet had to be adapted with additional preprocessing to handle the jellyfish data. The broader impact of the work will depend on systematic benchmarking against community datasets and comparison with established methods. As such, readers should view the results as a promising proof of concept rather than a definitive standard for imaging in deformable nervous systems.

      Regarding worms vs jellyfish pre-processing: we actually had the exact opposite reaction to that of the reviewer. We were surprised at how similar the pre-processing was for these two very different organisms. In both cases, it was essential to (1) select appropriate registration problems to be solved; and (2) perform initialization with Euler alignment. Provided that these two challenges were solved, BrainAlignNet mostly took care of the rest. This suggests a clear path for researchers who wish to use this approach in another animal. Nevertheless, we also agree with the reviewer’s caution that a totally different use case could require some re-thinking or re-strategizing. For example, the strategy of how to select good registration problems could depend on the form of the animal’s movement.

      Reviewer #3 (Public review):

      Context:

      Tracking cell trajectories in deformable organs, such as the head neurons of freely moving C. elegans, is a challenging task due to rapid, non-rigid cellular motion. Similarly, identifying neuron types in the worm brain is difficult because of high inter-individual variability in cell positions.

      Summary:

      In this study, the authors developed a deep learning-based approach for cell tracking and identification in deformable neuronal images. Several different CNN models were trained to: (1) register image pairs without severe deformation, and then track cells across continuous image sequences using multiple registration results combined with clustering strategies; (2) predict neuron IDs from multicolor-labeled images; and (3) perform clustering across multiple multicolor images to automatically generate neuron IDs.

      Strengths:

      Directly using raw images for registration and identification simplifies the analysis pipeline, but it is also a challenging task since CNN architectures often struggle to capture spatial relationships between distant cells. Surprisingly, the authors report very high accuracy across all tasks. For example, the tracking of head neurons in freely moving worms reportedly reached 99.6% accuracy, neuron identification achieved 98%, and automatic classification achieved 93% compared to human annotations.

      We thank the reviewer for noting these strengths of our study.

      Weaknesses:

      (1) The deep networks proposed in this study for registration and neuron identification require dataset-specific training, due to variations in imaging conditions across different laboratories. This, in turn, demands a large amount of manually or semi-manually annotated training data, including cell centroid correspondences and cell identity labels, which reduces the overall practicality and scalability of the method.

      We performed dataset-specific training for image registration and neuron identification, and we would encourage new users to do the same based on our current state of knowledge. This highlights how standardization of whole-brain imaging data across labs is an important issue for our field to address and that, without it, variations in imaging conditions could impact software utility. We refer the reviewer to an excellent study by Sprague et al. (2025) on this topic, which is cited in our study.

      However, at the same time, we wish to note that it was actually reasonably straightforward to take the BrainAlignNet approach that we initially developed in C. elegans and apply it to jellyfish. Some of the key lessons that we learned in C. elegans generalized: in both cases, it was critical to select the right registration problems to solve and to preprocess with Euler registration for good initialization. Provided that those problems were solved, BrainAlignNet could be applied to obtain high-quality registration and trace extraction. Thus, our study provides clear suggestions on how to use these tools across multiple contexts.

      (2) The cell tracking accuracy was not rigorously validated, but rather estimated using a biased and coarse approach. Specifically, the accuracy was assessed based on the stability of GFP signals in the eat-4-labeled channel. A tracking error was assumed to occur when the GFP signal switched between eat-4-negative and eat-4-positive at a given time point. However, this estimation is imprecise and only captures a small subset of all potential errors. Although the authors introduced a correction factor to approximate the true error rate, the validity of this correction relies on the assumption that eat-4 neurons are uniformly distributed across the brain - a condition that is unlikely to hold.

      We respectfully disagree with this critique. We considered the alternative suggested by the reviewer (in their private comments to the authors) of comparing against a manually annotated dataset. But this annotation would require manually linking ~150 neurons across ~1600 timepoints, which would require humans to manually link neurons across timepoints >200,000 times for a single dataset. These datasets consist of densely packed neurons rapidly deforming over time in all 3 dimensions. Moreover, a single error in linking would propagate across timepoints, so the error tolerance of such annotation would be extremely low. Any such manually labeled dataset would be fraught with errors and should not be trusted. Instead, our approach relies on a simple, accurate assumption: GFP expression in a neuron should be roughly constant over a 16 min recording (after bleach correction) and the levels will be different in different neurons when it is sparsely expressed. Because all image alignment is done in the red channel, the pipeline never “peeks” at the GFP until it is finished with neuron alignment and tracking. The eat-4 promoter was chosen for GFP expression because (a) the nuclei labeled by it are scattered across the neuropil in a roughly salt-and-pepper fashion – a mixture of eat-4-positive and eat-4-negative neurons is found throughout the head; and (b) it is in roughly 40% of the neurons, giving very good overall coverage. Our view is that this approach of labeling subsets of neurons with GFP should become the standard in the field for assessing tracking accuracy – it has a simple, accurate premise; is not susceptible to human labeling error; is straightforward to implement; and, since it does not require manual labeling, is easy to scale to multiple datasets. We do note that it could be further strengthened by using multiple strains each with different ‘salt-and-pepper’ GFP expression patterns.
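
      The arithmetic behind the annotation burden, and the reason a partial GFP label can still expose tracking errors, can be sketched as follows. This is a minimal illustration and not the correction factor used in the study: the function names are hypothetical, and the sketch assumes that a tracking error swaps a trace between two neurons whose eat-4 status is independent of each other, so that a swap is visible only when the two neurons differ in status.

```python
def manual_links_required(n_neurons: int, n_timepoints: int) -> int:
    """Number of neuron-to-timepoint assignments a human annotator would make."""
    return n_neurons * n_timepoints


def detectable_swap_fraction(p_positive: float) -> float:
    """Fraction of identity swaps that change GFP status.

    Assumption (not from the study): the two neurons involved in a swap are
    GFP-positive independently, each with probability p_positive.
    """
    if not 0.0 <= p_positive <= 1.0:
        raise ValueError("p_positive must lie in [0, 1]")
    return 2.0 * p_positive * (1.0 - p_positive)


def scaled_error_rate(observed_rate: float, p_positive: float) -> float:
    """Scale the rate of visible GFP switches up to an estimated total swap rate."""
    detectable = detectable_swap_fraction(p_positive)
    if detectable == 0.0:
        raise ValueError("no swaps are detectable when all neurons share one status")
    return observed_rate / detectable
```

      Under these assumptions, about 150 neurons over about 1600 timepoints correspond to 240,000 manual assignments, and a label present in about 40% of neurons makes roughly 48% of random swaps visible.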

      (3) Figure S1F demonstrates that the registration network, BrainAlignNet, alone is insufficient to accurately align arbitrary pairs of C. elegans head images. The high tracking accuracy reported is largely due to the use of a carefully designed registration sequence, matching only images with similar postures, and an effective clustering algorithm. Although the authors address this point in the Discussion section, the abstract may give the misleading impression that the network itself is solely responsible for the observed accuracy.

      Our tracking accuracy requires (a) a careful selection of registration problems, (b) highly accurate registration of the selected registration problems, and (c) effective clustering. We extensively discussed the importance of choosing the registration problems in the Results section (lines 218-234 and 318-321), Discussion section (lines 704-708), and Methods section (lines 955-970 and 1246-1250) of our paper. We also discussed the clustering aspect in the Results section (lines 247-259), Discussion section (lines 708-712), and Methods section (lines 1162-1206). In addition, our abstract states that BrainAlignNet needs to be “incorporated into an image analysis pipeline,” to inform readers that other aspects of image analysis need to occur (beyond BrainAlignNet) to perform tracking.

      (4) The reported accuracy for neuron identification and automatic classification may be misleading, as it was assessed only on a subset of neurons labeled as "high-confidence" by human annotators. Although the authors did not disclose the exact proportion, various descriptions (such as Figure 4f) imply that this subset comprises approximately 60% of all neurons. While excluding uncertain labels is justifiable, the authors highlight the high accuracy achieved on this subset without making clear that the reported performance pertains only to neurons that are relatively easy to identify. Furthermore, they do not report what fraction of the total neuron population can be accurately identified using their methods, an omission of critical importance for prospective users.

      The reviewer raises two points here: (1) whether AutoCellLabeler accuracy is impacted by ease of human labeling; and (2) what fraction of total neurons are identified. We address them one at a time.

      Regarding (1), we believe that the reviewer overlooked an important analysis in our study. Indeed, to assess its performance, one can only compare AutoCellLabeler’s output against accurate human labels – there is simply no way around it. However, we noted that AutoCellLabeler was identifying some neurons with high confidence even when humans had low confidence or had not even tried to label the neurons (Fig. 4F). To test whether these were in fact accurate labels, we asked additional human labelers to spend extra time trying to label a random subset of these neurons (they were of course blinded to the AutoCellLabeler label). We then assessed the accuracy of AutoCellLabeler against these new human labels and found that its labels were highly accurate (Fig. 4H). This suggests that AutoCellLabeler has strong performance even when some human labelers find it challenging to label a neuron. However, we agree that we have not yet been able to quantify AutoCellLabeler performance on the small set of neuron classes that humans are unable to identify across datasets.

      Regarding (2), we agree that knowing how many neurons are labeled by AutoCellLabeler is critical. For example, labeling only 3 neurons per animal with 100% accuracy isn’t very helpful. We wish to emphasize that we did not omit this information: we reported the number of neurons labeled for every network that we characterized in the study, alongside the accuracy of those labels (please see Figures 4I, 5A, and 6G; Figure 4I also shows the number of human labels per dataset, which the reviewer requested). We also showed curves depicting the tradeoff between accuracy and number of neurons labeled, which fully captures how we balanced accuracy and number of neurons labeled (Figures 5D and S4A). It sounds like the reviewer also wanted to know the total number of recorded neurons. The typical number of recorded neurons per dataset can also be found in the paper in Fig. 2E.
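
      The tradeoff between accuracy and number of neurons labeled can be illustrated with a confidence-threshold sweep. This is a minimal sketch, assuming each prediction carries a confidence score and a flag recording agreement with a trusted human label; the function name, the input format, and the toy values in the usage note are hypothetical and are not taken from the study.

```python
from typing import Iterable, List, Optional, Sequence, Tuple


def accuracy_coverage_curve(
    predictions: Sequence[Tuple[float, bool]],
    thresholds: Iterable[float],
) -> List[Tuple[float, int, Optional[float]]]:
    """For each threshold, return (threshold, number labeled, accuracy).

    predictions: (confidence, is_correct) pairs, one per neuron.
    Accuracy is None when no prediction passes the threshold.
    """
    curve = []
    for threshold in thresholds:
        # Keep only the labels the network would emit at this threshold.
        kept = [is_correct for confidence, is_correct in predictions
                if confidence >= threshold]
        n_labeled = len(kept)
        accuracy = sum(kept) / n_labeled if n_labeled else None
        curve.append((threshold, n_labeled, accuracy))
    return curve
```

      For example, with toy predictions `[(0.9, True), (0.8, True), (0.6, False), (0.4, True)]`, raising the threshold from 0.0 to 0.85 reduces the number of labeled neurons from 4 to 1 while the accuracy on the retained labels rises from 0.75 to 1.0.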

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review): 

      Summary: 

      The study investigated how individuals living in urban slums in Salvador, Brazil, interact with environmental risk factors, particularly focusing on domestic rubbish piles, open sewers, and a central stream. The study makes use of the step selection functions using telemetry data, which is a method to estimate how likely individuals move towards these environmental features, differentiating among groups by gender, age, and leptospirosis serostatus. The results indicated that women tended to stay closer to the central stream while avoiding open sewers more than men. Furthermore, individuals who tested positive for leptospirosis tended to avoid open sewers, suggesting that behavioral patterns might influence exposure to risk factors for leptospirosis, hence ensuring more targeted interventions. 

      Strengths: 

      (1) The use of step selection functions to analyze human movement represents an innovative adaptation of a method typically used in animal ecology. This provides a robust quantitative framework for evaluating how people interact with environmental risk factors linked to infectious diseases (in this case, leptospirosis). 

      (2) Detailed differentiation by gender and serological status allows for nuanced insights, which can help tailor targeted interventions and potentially improve public health measures in urban slum settings. 

      (3) The integration of real-world telemetry data with epidemiological risk factors supports the development of predictive models that can be applied in future infectious disease research, helping to bridge the gap between environmental exposure and health outcomes. 

      Weaknesses: 

      (1) The sample size for the study was not calculated, although it was a nested cohort study. 

      We thank Reviewer #1 for highlighting this weakness. We will make sure that this is explained in the next version of the manuscript. At the time of recruiting participants, we found no literature on how to perform a sample size calculation for movement studies involving GPS loggers and associated methods of analysis. Therefore, we aimed to recruit as many individuals as possible within the resource constraints of the study.  

      “Participants who were already enrolled in the cohort study were recruited to take part in the movement analysis study. At the time of recruitment, we found no published scientific studies detailing how to perform sample size calculations for research using GPS data in humans. Therefore, we opted to use convenience sampling instead. A target of 30 people per study area, balanced by gender and blind to their serological status, was chosen for this study.” [Lines 163 - 169]

      (2) The step‐selection functions, though a novel method, may face challenges in fully capturing the complexity of human decision-making influenced by socio-cultural and economic factors that were not captured in the study. 

      We agree with Reviewer #1 that this model may fail to capture the full breadth of human decision-making when it comes to moving through local environments. We included a section discussing the aspect of violence and how this influences residents’ choices, along with some possibilities on how to record and account for this. Although it is outside of the scope of this study, we believe that coupling these quantitative methods with qualitative studies would provide a comprehensive understanding of movement in these areas.

      (3) The study's context is limited to a specific urban slum in Salvador, Brazil, which may reduce the generalizability of its findings to other geographical areas or populations that experience different environmental or socio-economic conditions. 

      We thank the reviewer for highlighting this limitation. We have made this more clear in the discussion section: 

      “As a result, the findings are biased towards the more represented individuals, limiting their generalisability. Additionally, all participants are from specific areas in Salvador, which may further limit the generalisability to similar contexts.” [Lines 561 - 564]

      (4) The reliance on self-reported or telemetry-based movement data might include some inaccuracies or biases that could affect the precision of the selection coefficients obtained, potentially limiting the study's predictive power. 

      We agree that telemetry data has inherent inaccuracies, which we have tried to account for by using only those data points within the study areas. We would like to clarify that there is no self-reported movement data used in this study. All movement data was collected using GPS loggers.  

      (5) Some participants with less than 50 relocations within the study area were excluded without clear justification, see line 149. 

      We found that the SSF models would not run properly if there weren’t enough relocations. Therefore, we decided to remove these individuals from the analysis. They are also removed from any descriptive statistics presented. We have now clarified this in the manuscript.  

      “Individuals with less than 50 relocations within the study area were excluded from the analysis to ensure good model convergence. Details of these excluded individuals can be found in Supplementary Material I.” [Lines 183 – 186]
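
      The exclusion rule can be expressed as a simple count of relocations per participant. This is a minimal sketch and not the code used in the study: the function name and the input format, a list of (participant ID, inside-study-area flag) pairs, are assumptions made for illustration.

```python
from collections import Counter
from typing import Iterable, List, Tuple

MIN_RELOCATIONS = 50  # threshold quoted in the manuscript text above


def split_by_relocation_count(
    fixes: Iterable[Tuple[str, bool]],
    min_relocations: int = MIN_RELOCATIONS,
) -> Tuple[List[str], List[str]]:
    """Return (kept, excluded) participant IDs, each sorted alphabetically.

    fixes: one (participant_id, inside_study_area) pair per GPS relocation.
    Only relocations inside the study area count towards the threshold.
    """
    counts: Counter = Counter()
    for participant_id, inside in fixes:
        # Adding 0 still registers participants with no fixes inside the area.
        counts[participant_id] += 1 if inside else 0
    kept = sorted(p for p, n in counts.items() if n >= min_relocations)
    excluded = sorted(p for p, n in counts.items() if n < min_relocations)
    return kept, excluded
```

      Returning the excluded IDs alongside the retained ones makes it straightforward to tabulate the demographics of excluded participants, and to check that descriptive statistics and models use the same set of individuals.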

      (6) Some figures are not clear (see Figure 4 A & B). 

      We have improved the resolution of the image and believe it is more clear now. Please let us know if the resolution still is not clear enough.  

      (7) No statement on conflict of interest was included, considering sponsorship of the study. 

      The conflict of interest forms for each author were sent to eLife separately. I believe these should be made available upon publication, but please reach out if these need to be re-sent.  

      Reviewer #2 (Public review): 

      Summary: 

      Pablo Ruiz Cuenca et al. conducted a GPS logger study with 124 adult participants across four different slum areas in Salvador, Brazil, recording GPS locations every 35 seconds for 48 hours. The aim of their study was to investigate step-selection models, a technique widely used in movement ecology to quantify contact with environmental risk factors for exposure to leptospires (open sewers, community streams, and rubbish piles). The authors built two different types of models based on distance and based on buffer areas to model human environmental exposure to risk factors. They show differences in movement/contact with these risk factors based on gender and seropositivity status. This study shows the existence of modest differences in contact with environmental risk factors for leptospirosis at small spatial scales based on socio-demographics and infection status. 

      Strengths: 

      The authors assembled a rich dataset by collecting human GPS logger data, combined with field-recorded locations of open sewers, community streams, and rubbish piles, and testing individuals for leptospirosis via serology. This study was able to capture fine-scale exposure dynamics within an urban environment and shows differences by gender and seropositive status, using a method novel to epidemiology (step selection).

      Weaknesses: 

      Due to environmental data being limited to the study area, exposure elsewhere could not be captured, despite previous research by Owers et al. showing that the extent of movement was associated with infection risk. Limitations of step selection for use in studying human participants in an urban environment would need to be explicitly discussed. 

      The environmental factors used in the study required research teams to visit the sites and map the locations. Given that individuals travelled throughout the city of Salvador, performing this task at a large scale would be unachievable. Therefore, we limited the data to only those points within the study area boundaries to avoid any biases from interactions with unrecorded environmental factors.  

      Reviewing Editor Comments: 

      The manuscript would benefit from clearer articulation of SSF assumptions, data exclusions, and buffer choices, as well as improvements in figure clarity, to strengthen its generalizability and impact. 

      Please see replies to Reviewer #2 below regarding the assumptions (2.3), data exclusions (2.1) and buffer choices (2.2). We have improved Figure 4 clarity, please let us know if this is not sufficient.  

      Reviewer #1 (Recommendations for the authors): 

      (1) Provide comprehensive details on telemetry data collection for improved data quality and reproducibility. 

      Details for this are included under the “Methods/GPS Data” section. We have included a sentence to explain that we used the GPS device manufacturer’s software to programme the devices. We believe this provides enough information on how to collect the data for reproducibility, but please let us know if there is further information that we could provide.

      “Individuals who consented to take part in this study were asked to wear GPS loggers for continuous periods of up to 48 hours, which could be repeated. The GPS loggers used were i-got U GT-600, set to record their location every 35 seconds. We used the manufacturer’s software to programme the devices. Data were collected between March and November 2022.” [Lines 172 - 176]

      (2) Check all figures and improve on clarity (see Figure 4). 

      We have updated Figure 4 and believe the resolution is better now. Please let us know if this is not the case from the reader’s perspective.

      (3) Revisit sentence structures to improve readability and reduce overly complex phrasing. 

      We have reviewed the manuscript and made some changes to improve readability. 

      Reviewer #2 (Recommendations for the authors): 

      I thank Ruiz Cuenca et al. for putting together this interesting manuscript on the use of step selection functions for understanding exposure to leptospires in urban Brazil. I thoroughly enjoyed reading it and have a few suggestions that may improve the manuscript. 

      I also apologise, but I was not able to find some of the supplementary materials, for instance, Supplementary Material I. That may have been my oversight. 

      To eLife: These should have been included with the submitted manuscript file. Please let me know if it has to be resubmitted to eLife.

      (1) Descriptive statistics 

      Some more descriptive statistics would be helpful. For instance, what was the leptospirosis infection status of the six individuals who were removed due to having <50 points inside the area? As part of the analysis relies on exposure, defined as GPS locations within a 20m buffer of open sewers, community streams, and rubbish piles, it would be good to have some descriptive statistics around this. How many visits to these different sites did people make, and how did these statistics vary by study area, age, gender, and leptospirosis infection status? 

      We thank Reviewer #2 for highlighting this. Thanks to their comment, we noticed a mistake in the code that excluded more individuals from the summary statistics table than were actually excluded from the full analysis. Only 2 individuals had fewer than 50 relocations across the whole day (5 am to 9 pm), and these were excluded from further analysis. The mistake has been rectified and the summary statistics updated (see Table 1).

      We have included the demographic details of excluded participants as a table in the supplementary material, which we have referenced in the manuscript. We have also explained that the exclusion is to aid model convergence, as we found that too few relocations would result in SSF models not working properly.

      “Individuals with less than 50 relocations within the study area were excluded from the analysis to ensure good model convergence. Details of these excluded individuals can be found in Supplementary Material I.” [Lines 183 – 186]

      We have also now included a table (Table 2) to show more descriptive statistics of how much time individuals spent within each of the environmental buffers.

      (2) Definitions of buffers 

      I was surprised that the authors chose a 20m buffer for each factor but 10m around the household. Could this be more clearly justified, especially given that there will be location errors in both the GPS location point and the GPS logger points? These buffers do appear quite small, particularly in an urban environment where obstruction from buildings can be expected to yield substantial GPS errors.

      The 20 meter buffer represents an intense interaction with the point of interest. This distance was decided after visiting the sites and seeing the points of interest in person. The 10 meter buffer accounts for the size of dwellings in these areas. We have included these explanations in the new manuscript:  

      “The buffer rasters, one for each factor, were created using a 20 meter buffer around each reference point. The size of this buffer was decided after visiting the study areas and represented an area within which it could be considered a strong interaction with the point of interest.” [Lines 198 – 202]

      “Buffer rasters were also created for each individual’s household location, with a 10 meter buffer around each location. This represented space within and immediately outside each house. This buffer size accounted for the size of dwellings in these study areas.” [Lines 205 - 208]
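
      The effect of the two buffer sizes can be illustrated with a point-distance test. Note that this sketch swaps in a direct great-circle distance check in place of the buffer rasters described in the manuscript, and it ignores GPS error; the function names, the coordinate format (latitude and longitude in decimal degrees), and the spherical Earth radius are assumptions made for illustration.

```python
import math
from typing import Iterable, Tuple

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius; spherical approximation

LatLon = Tuple[float, float]


def haversine_m(a: LatLon, b: LatLon) -> float:
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2.0) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2.0) ** 2)
    return 2.0 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


def within_buffer(fix: LatLon, reference_points: Iterable[LatLon],
                  radius_m: float = 20.0) -> bool:
    """True if the GPS fix lies within radius_m of any reference point."""
    return any(haversine_m(fix, ref) <= radius_m for ref in reference_points)
```

      A call such as `within_buffer(fix, sewer_points, radius_m=20.0)` would flag an interaction with an environmental factor, while `within_buffer(fix, [household], radius_m=10.0)` would flag time spent at or immediately outside the home; `sewer_points` and `household` are hypothetical inputs.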

      (3) Assumptions of the step selection function 

      Step selection functions (SSFs) rely on a number of assumptions. Whether these assumptions are met needs to be critically discussed within the article. (For a discussion of the assumptions, I am relying on points raised in this article: Integrated step selection analysis: bridging the gap between resource selection and animal movement (2015): Tal Avgar, Jonathan R. Potts, Mark A. Lewis, Mark S. Boyce, DOI: https://doi.org/10.1111/2041-210X.12528). 

      First, SSFs typically assume each step is independent, conditional only on the previous step (Markovian process). This is violated in circular movements, for instance. Circular movements are highly likely in human movement as people will leave and return to their homes during the day. While this is partially addressed by conducting separate analyses by time of day, circular journeys can still exist within these segments. 

      Second, SSFs do not account for goal-oriented behaviour like intentional destination-seeking. So, for instance, when someone executes a plan to visit a specific stream to fetch drinking water, such behaviour is poorly approximated using SSFs because SSFs compare observed steps to random alternatives drawn from a movement kernel, assuming movement is opportunistic rather than intentional. 

      This is true of SSFs that do not include movement attributes. However, our SSF includes both step lengths and turning angles, which, according to Avgar et al., should be enough to account for this goal-oriented behaviour. It may be clearer to call the model an integrated step selection function (iSSF), as Avgar et al. do, which we can change in the next version of the manuscript.
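      The two movement attributes mentioned here can be derived directly from consecutive telemetry fixes. The sketch below shows one way to compute them; it assumes planar coordinates in meters and regularly sampled fixes, and the function name `steps_and_turns` and the example track are hypothetical rather than taken from the manuscript.

```python
import numpy as np

def steps_and_turns(xy):
    """Compute step lengths and turning angles from an ordered sequence of (x, y) fixes."""
    xy = np.asarray(xy, dtype=float)
    deltas = np.diff(xy, axis=0)                      # displacement of each step
    step_lengths = np.hypot(deltas[:, 0], deltas[:, 1])
    headings = np.arctan2(deltas[:, 1], deltas[:, 0])  # direction of each step
    turns = np.diff(headings)                          # change in direction between steps
    turns = (turns + np.pi) % (2 * np.pi) - np.pi      # wrap to [-pi, pi)
    return step_lengths, turns

# Illustrative track: three steps of 5 m, ending with a sharp right-angle turn.
track = [(0, 0), (3, 4), (3, 9), (8, 9)]
lengths, turns = steps_and_turns(track)
```

      A track of n fixes yields n - 1 step lengths and n - 2 turning angles; the right-angle turn in the example is the kind of sharp, street-constrained movement raised in the reviewer's third point.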

      Third, turning angles in human movement are often sharp due to regular street layout, which can violate the assumptions of SSFs, which usually assume smooth, correlated movement. 

      As this paper proposes SSFs as a novel method to measure exposure to environmentally transmitted pathogens, a discussion on the extent to which assumptions of SSFs are valid for this purpose should be included in the paper. 

      We thank Reviewer #2 for highlighting these points. We have included a section discussing these assumptions in detail: 

      “Additionally, these models have some underlying assumptions that may be violated in this study. Step-selection functions assume each step is independent, conditioned on the previous step. This can be violated by circular journeys. Although we attempted to account for these by analysing specific periods of the day, a higher temporal resolution of analysis may be needed if circular journeys are still present within each period. Another assumption is that movement is smooth through the environment. In urban environments this may not hold true, as street layouts may force sharp corners in movements. The effect of violating this assumption is not immediately clear and requires further methodological research to understand its significance. Finally, we assumed that by including movement characteristics (step lengths and turning angles) into our models, we were accounting for goal-oriented behaviour. These assumptions need to be considered in future studies that attempt to use step-selection functions to analyse human mobility.” [Lines 593 - 607]

      (4) Abstract 

      While it is highlighted in the abstract that this "study introduces a novel method for analysing human telemetry data in infectious disease research, providing critical insights for targeted interventions", I did not see any discussion about how the findings can inform interventions. 

      We thank Reviewer #2 for highlighting this. We have now removed this wording from the abstract to avoid misunderstanding.  

      (5) Effect sizes 

      It would have helped me if there had been some discussion around the size of these effects. Especially for the distance-based models, the effects seem very small. Maybe this is a misinterpretation on my part, but it would help to contextualise if the observed effect were small or large. 

      We agree with Reviewer #2 on this point and have now included a paragraph explaining that these effect sizes are indeed very small. We believe that this may be linked to the spatial scale of the rasters used (1 meter), as the selection coefficients represent changes with regard to increasing distances of 1 meter. A change of 1 meter may not be meaningful for human mobility. However, given the focus on analysing fine-scale movement, we decided to keep the spatial scale of the rasters as small as possible.

      “It is important to highlight that the effect sizes of the selection coefficients for the distance based rasters are very small and could be considered negligible. This may be linked to the spatial scale used, as these values represent increases of 1 meter. A coarser scale may have produced larger effect sizes that may have been easier to conceptualise. However, given the focus on fine-scale movement, we decided to keep this spatial scale for the analysis.” [Lines 421 - 427]
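      The dependence of the apparent effect size on the unit of distance can be illustrated with a short calculation. In a conditional logistic step-selection model, exp(beta × delta) gives the relative selection strength between two otherwise identical steps whose covariate differs by delta. The coefficient used below is a hypothetical value chosen for illustration, not an estimate from the study.

```python
import math

def relative_selection_strength(beta_per_m, delta_m):
    """Relative selection strength for a covariate difference of delta_m meters."""
    return math.exp(beta_per_m * delta_m)

beta = -0.002  # hypothetical selection coefficient per 1 m of distance

rss_1m = relative_selection_strength(beta, 1)      # effect of moving 1 m further away
rss_100m = relative_selection_strength(beta, 100)  # same coefficient over 100 m
```

      Under this assumed coefficient, a 1 meter change is almost indistinguishable from no effect, while the same coefficient expressed over 100 meters corresponds to a visibly lower relative selection strength, which is why a coarser scale can make the effect easier to conceptualise.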

    1. Author response:

      We thank the reviewers for their constructive feedback on the article’s strengths and weaknesses. In response, we plan to strengthen our work in a revised version by (i) providing an additional example of our method’s implementation and (ii) framing our contribution more clearly as a continuation of the line of research that characterises neuronal models in terms of their bifurcation structure.

      Experimental validation, however, is beyond the scope of this study. Constructing experimental bifurcation diagrams remains a major challenge, particularly for unstable branches. Although some techniques exist to approximate branches of unstable steady states, unstable limit cycles are far more difficult to capture. Additionally, in practice, many factors vary during recordings, and generating reliable diagrams would require a large number of tightly controlled experimental repetitions whose stability often cannot be ensured. Two-dimensional bifurcation diagrams, as needed for the analysis in our manuscript, are even more challenging to obtain because the extensive and stable recordings would have to be available from the same cell at different values of the second parameter (such as different extracellular potassium concentrations). At this stage, our method can be applied to the reduction of detailed conductance-based models, which themselves are constrained by experimental data (for example, gating functions fitted to voltage-clamp recordings). This way, simple yet dynamically faithful phenomenological models for efficient use in network analysis and simulation can be derived from more complex, biophysical models. In contrast to the traditional voltage fitting approach, these models can also capture changes in additional parameters (such as extracellular potassium concentration).

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The manuscript by Raices et al., provides novel insights into the role and interactions between SPO-11 accessory proteins in C. elegans. The authors propose a model of meiotic DSBs regulation, critical to our understanding of DSB formation and ultimately crossover regulation and accurate chromosome segregation. The work also emphasizes the commonalities and species-specific aspects of DSB regulation.

      Strengths:

      This study capitalizes on the strengths of the C. elegans system to uncover genetic interactions between a large number of SPO-11 accessory proteins. In combination with physical interactions, the authors synthesize their findings into a model, which will serve as the basis for future work, to determine mechanisms of DSB regulation.

      Weaknesses:

      The methodology, although standard, lacks quantification. This includes the mass spectrometry data, along with the cytology. The work would also benefit from clarifying the role of the DSB machinery on the X chromosome versus the autosomes.

      • We have uploaded the MS data and added a summary table with the number of peptides and coverage.

      • We have added statistics to the comparisons of DAPI body counts.

      • We have provided additional images of the change in HIM-5 localization.

      • We have quantified the overlap (or lack thereof) between XND-1 and HIM-17 and the DNA axis.

      Reviewer #2 (Public Review):

      Summary:

      Meiotic recombination initiates with DNA double-strand break (DSB) formation, catalyzed by the conserved topoisomerase-like enzyme Spo11. Spo11 requires accessory factors that are poorly conserved across eukaryotes. Previous genetic studies have identified several proteins required for DSB formation in C. elegans to varying degrees; however, how these proteins interact with each other to recruit the DSB-forming machinery to chromosome axes remains unclear.

      In this study, Raices et al. characterized the biochemical and genetic interactions among proteins that are known to promote DSB formation during C. elegans meiosis. The authors examined pairwise interactions using yeast two-hybrid (Y2H) and co-immunoprecipitation and revealed an interaction between a chromatin-associated protein HIM-17 and a transcription factor XND-1. They further confirmed the previously known interaction between DSB-1 and SPO-11 and showed that DSB-1 also interacts with a nematode-specific HIM-5, which is essential for DSB formation on the X chromosome. They also assessed genetic interactions among these proteins, categorizing them into four epistasis groups by comparing phenotypes in double vs. single mutants. Combining these results, the authors proposed a model of how these proteins interact with chromatin loops and are recruited to chromosome axes, offering insights into the process in C. elegans compared to other organisms.

      Weaknesses:

      This work relies heavily on Y2H, which is notorious for having high rates of false positives and false negatives. Although the interactions between HIM-17 and XND-1 and between DSB-1 and HIM-5 were validated by co-IP, the significance of these interactions was not tested, and cataloging Y2H interactions does not yield much more insight.

      We appreciate that the reviewer recognized the value of our IP data, but we beg to differ that we rely too heavily on the Y2H. We also provide genetic analysis on bivalent formation to support the physical interaction data. We do acknowledge that there are caveats with Y2H, however, including that a subset of the interactions can only be examined with proteins in one orientation due to auto-activation. While we acknowledge that it would be nice to have IP data for all of the proteins using CRISPR-tagged, functional alleles, these strains are not all feasible (e.g. no functional rec-1 tag has been made) and are beyond the scope of the current work.

      Moreover, most experiments lack rigor, which raises serious concerns about whether the data convincingly supports the conclusions of this paper. For instance, the XND-1 antibody appears to detect a band in the control IP; however, there was no mention of the specificity of this antibody.

      We previously showed the specificity of this antibody in its original publication showing lack of staining in the xnd-1 mutant by IF (Wagner et al., 2010). To further address this, however, we have now included a new supplementary figure (Figure S1) demonstrating the specificity of the XND-1 antibody by Western blot. The antibody detects a distinct band in extracts from wild-type (N2) worms, but this band is absent in two independent xnd-1 mutant strains. This confirms that the antibody specifically recognizes XND-1, supporting the validity of the IP results shown in the main figures.

      Additionally, epistasis analysis of various genetic mutants is based on the quantification of DAPI bodies in diakinesis oocytes, but the comparisons were made without statistical analyses.

      We have added statistical analysis to all datasets where quantification was possible, strengthening the rigor and interpretation of our findings.

      For cytological data, a single representative nucleus was shown without quantification and rigorous analysis. The rationale for some experiments is also questionable (e.g. the rescue by dsb-2 mutants by him-5 transgenes in Figure 2), making the interpretation of the data unclear. Overall, while this paper claims to present "the first comprehensive model of DSB regulation in a metazoan", cataloging Y2H and genetic interactions did not yield any new insights into DSB formation without rigorous testing of their significance in vivo. The model proposed in Figure 4 is also highly speculative.

      Regarding the cytology, we provide new images and quantification of HIM-17 and XND-1 overlap with the DNA axes. We also added full germ line images showing HIM-5 localization in wild type and dsb-1 mutants, to provide a more complete and representative view of the observed phenotype. To further support our findings, we’ve also included images demonstrating that this phenotype is consistently observed both in live worms with the him-5::GFP transgene and in fixed worms with an endogenously tagged version of HIM-5.

      Reviewer #3 (Public Review):

      During meiosis in sexually reproducing organisms, double-strand breaks are induced by a topoisomerase-related enzyme, Spo11, which is essential for homologous recombination, which in turn is required for accurate chromosome segregation. Additional factors control the number and genome-wide distribution of breaks, but the mechanisms that determine both the frequency and preferred location of meiotic DSBs remain only partially understood in any organism.

      The manuscript presents a variety of different analyses that include variable subsets of putative DSB factors. It would be much easier to follow if the analyses had been more systematically applied. It is perplexing that several factors known to be essential for DSB formation (e.g., cohesins, HORMA proteins) are excluded from this analysis, while it includes several others that probably do not directly contribute to DSB formation (XND-1, HIM-17, CEP-1, and PARG-1).

      We respectfully disagree with the reviewer’s statement regarding the selection of factors included in our analysis. In this work, our focus was specifically on SPO-11 accessory factors: proteins that directly interact with or regulate SPO-11 activity during double-strand break formation. Cohesins and chromosome axis proteins (such as the HORMA domain proteins) are essential for establishing the correct chromosome architecture that supports DSB formation, but there is no evidence that they are direct accessory factors of SPO-11. Therefore, they were intentionally excluded from this study to maintain a clear and focused scope on proteins that more directly modulate SPO-11 function.

      Conversely, XND-1, HIM-17, CEP-1, and PARG-1 have all been implicated in regulating aspects of SPO-11-mediated DSB formation or its immediate environment. Although their contributions may involve broader chromatin or DNA damage response regulation, prior literature supports their inclusion as relevant modulators of SPO-11 activity, justifying their analysis within the context of this work.

      The strongest claims seem to be that "HIM-5 is the determinant of X-chromosome-specific crossovers" and "HIM-5 coordinates the actions of the different accessory factors subgroups." Prior work had already shown that mutations in him-5 preferentially reduce meiotic DSBs on the X chromosome. While it is possible that HIM-5 plays a direct role in DSB induction on the X chromosome, the evidence presented here does not strongly support this conclusion. It is also difficult to reconcile this idea with evidence from prior studies that him-5 mutations predominantly prevent DSB formation on the sex chromosomes, while the protein localizes to autosomes.

      HIM-5 is not the only protein that is autosomally enriched but preferentially affects the X chromosome: MES-4 and MRG-1 are both autosomally-enriched but influence silencing of the X chromosome. While HIM-5 appears autosomally-enriched, it does not appear to be autosomal-exclusive. While we would ideally perform ChIP to determine its localization on chromatin, this method for assaying DSB sites is likely insufficient to identify DSB sites which differ in each nucleus and for which there are no known hotspots in the worm.

      him-5 mutants confer an ~50% reduction in total number of breaks and a very profound change in break dynamics (seen by RAD-51 foci (Meneely et al., 2012)). Since the autosomes receive sufficient breaks in this context to attain a crossover in >98% of nuclei, this indicates that the autosomes are much less profoundly impacted by loss of DSB functions than is the X chromosome. Indeed, prior data from co-author Monica Colaiacovo showed that fewer breaks occur on the X (Gao, 2015), likely resulting from differences in the chromatin composition of the X and autosomes caused by X chromosome silencing.

      The conclusion that HIM-5 must be required for breaks on the X comes from the examination of DSB levels and their localization in different mutants that impair but do not completely abrogate breaks. In any situation where HIM-5 protein expression is affected (xnd-1, him-17, and him-5 null alleles), breaks on the X are reduced/eliminated. By contrast, in dsb-2 mutants, where HIM-5 expression is unaffected, both X and autosomal breaks are impacted equally. As discussed above, in the absence of HIM-5 function, there are ~15 breaks/nucleus. The Ppie-1::him-5 transgene is expressed to lower levels than Phim-5::him-5, but in the best case, the ectopic expression of this protein should give a maximum of ~15 breaks (the total # of breaks is thought to be ~30/nucleus). By these estimates, Ppie-1::him-5; him-17 and him-5 null mutants have the same number of breaks. Yet, in the former case, breaks occur on the X, whereas in the latter they do not. The best explanation for this discrepancy is that HIM-5 is sufficient to recruit the DSB machinery to the X chromosome.

      The one experiment that seems to elicit the conclusion that HIM-5 expression is sufficient for breaks on the X chromosome is flawed (see below). The conclusion that HIM-5 "coordinates the activities of the different accessory sub-groups" is not supported by data presented here or elsewhere.

      We have reorganized the discussion to more directly address the reviewers’ concerns. We raise the possibility that HIM-5 has an important role in bringing together SPO-11 and its interacting components (DSB-1/2/3) with the other DSB-inducing factors, including those factors that regulate DSB timing (XND-1), coordination with the cell cycle (REC-1), association with the chromosome axis (PARG-1, MRE-11), and coupling to downstream resection and repair (MRE-11, CEP-1).

      This raises a natural question: if HIM-5 has such a central role, why are the phenotypes of HIM-5 loss so mild? We propose that while the loss of DSBs on the X appears mild, more profound effects are seen in the total number, timing, and placement of the DSBs across the genome, all of which are diminished or altered in the absence of HIM-5. The phenotypes of him-5 loss are reminiscent of those observed in Prdm9-/- mice, where breaks are relocated to transcriptional start sites and show a significant delay in formation. As with PRDM9, which promotes proper DSB formation in most mammals, the comparatively subtle phenotypes of HIM-5 loss do not diminish its critical role.

      Like most other studies that have examined DSB formation in C. elegans, this work relies on indirect assays, here limited to the cytological appearance of RAD-51 foci and bivalent chromosomes, as evidence of break formation or lack thereof. Unfortunately, neither of these assays has the power to reveal the genome-wide distribution or number of breaks. These assays have additional caveats, due to the fact that RAD-51 association with recombination intermediates and successful crossover formation both require multiple steps downstream of DSB induction, some of which are likely impaired in some of the mutants analyzed here. This severely limits the conclusions that can be drawn. Given that the goal of the work is to understand the effects of individual factors on DSB induction, direct physical assays for DSBs should be applied; many such assays have been developed and used successfully in other organisms.

      We appreciate the reviewer’s thoughtful comments. We agree that RAD-51 foci are an indirect readout of DSB formation and that their dynamics can be influenced by defects in downstream repair processes. However, in C. elegans, the available methods for directly detecting DSBs are limited. Unlike other organisms, C. elegans lacks γH2AX, eliminating the possibility of using γH2AX as a DSB marker. TUNEL assays, while conceptually appealing, have proven unreliable and poorly reproducible in the germline context. Similarly, RPA foci do not consistently correlate with the number of DSBs and are influenced by additional processing steps.

      Given these limitations, RAD-51 foci remain the most widely accepted surrogate for monitoring DSB formation in C. elegans. While we fully acknowledge the caveats associated with this approach — particularly the potential effects of downstream repair defects — RAD-51 analysis continues to provide valuable insight into DSB dynamics and regulation, especially when interpreted in combination with other phenotypic assessments.

      Throughout the manuscript, the writing conflates the roles played by different factors that affect DSB formation in very different ways. XND-1 and HIM-17 have previously been shown to be transcription factors that promote the expression of many germline genes, including genes encoding proteins that directly promote DSBs. Mutations in either xnd-1 or him-17 result in dysregulation of germline gene expression and pleiotropic defects in meiosis and fertility, including changes in chromatin structure, dysregulation of meiotic progression, and (for xnd-1) progressive loss of germline immortality. It is thus misleading to refer to HIM-17 and XND-1 as DSB "accessory factors" or to lump their activities with those of other proteins that are likely to play more direct roles in DSB induction.

      It is clear that we will not reach agreement about the direct vs indirect roles here of chromatin remodelers/transcription factors in break formation. There is precedent in yeast (SPP1) and in mouse (Prdm9) for proteins that could also be described as transcription factors having roles in break formation by creating an open chromatin environment for the break machinery. We envision that these proteins function in the same fashion. The changes in histone acetylation in the xnd-1 mutants support such a claim.

      We do not know what the reviewer is referring to in the statement that “XND-1 and HIM-17 have previously been shown to be transcription factors that promote the expression of many germline genes.” While the Carelli et al. paper indeed shows a role for HIM-17 in expression of many germline genes, there is only one reference to XND-1 in that manuscript (Figure S3A), which shows that half of XND-1 binding sites overlap with the co-opted germline promoters. There is no transcriptional data at all on xnd-1 mutants, save our studies (referenced herein) showing that XND-1 regulates him-5 expression.

      For example, statements such as the following sentence in the Introduction should be omitted or explained more clearly: "xnd-1 is also unique among the accessory factors in influencing the timing of DSBs; in the absence of xnd-1, there is precocious and rapid accumulation of DSBs as monitored by the accumulation of the HR strand-exchange protein RAD-51."

      We are not sure what is confusing here. The distribution of RAD-51 foci is significantly altered in xnd-1 mutants and peak levels of breaks are achieved as nuclei leave the transition zone (Wagner et al., 2010; McClendon et al., 2016). There is no other mutation that causes this type of change in RAD-51 distribution.

      The evidence that HIM-17 promotes the expression of him-5 presented here corroborates data from other publications, notably the recent work of Carelli et al. (2022), but this conclusion should not be presented as novel here.

      We have clarified this in the text. We note that this paper showed alterations in him-5 levels by RNA-Seq but they did not validate these results with quantitative RT-PCR. Thus, our studies do provide an important validation of their prior results.

      The other factors also fall into several different functional classes, some of which are relatively well understood, based largely on studies in other organisms. The roles of RAD50 and MRE-11 in DSB induction have been investigated in yeast and other organisms as well as in several prior studies in C. elegans. DSB-1, DSB-2, and DSB-3 are homologs of relatively well-studied meiotic proteins in other organisms (Rec114 and Mei4) that directly promote the activity of Spo11, although the mechanism by which they do so is still unclear.

      Whilst we agree that we understand some of the functions of the homologs, there are clearly examples in other processes of conserved proteins adopting unique regulatory functions. We should not presume evolutionary conservation until proven. Indeed, the comparison between the Mer2 proteins becomes particularly relevant here. For example, the RMM complex in plants does not contain PRD3, although this protein is thought to function in DSB formation and repair (Lambing et al., 2022; Vrielynck et al., 2021; Thangavel et al., 2023). In Sordaria, as well, the Mer2 homolog has distinct functions (Tesse et al., 2017).

      Mutations in PARG-1 (a Poly-ADP ribose glycohydrolase) likely affect the regulation of polyADP-ribose addition and removal at sites of DSBs, which in turn are thought to regulate chromatin structure and recruitment of repair factors; however, there is no convincing evidence that PARG-1 directly affects break formation.

      Our prior collaborative studies on PARG-1 showed that it has a non-catalytic function that promotes DSBs and that is independent of the accumulation of PAR (Janisiw et al., 2020; Trivedi et al., 2022).

      CEP-1 is a homolog of p53 and is involved in the DNA damage response in the germline, but again is unlikely to directly contribute to DSB induction.

      We respectfully disagree with the reviewer’s statement. While CEP-1 is indeed a homolog of p53 and plays a major role in the DNA damage response, prior work from Brent Derry’s lab and from our group (Mateo et al., 2016) demonstrated that specific cep-1 separation-of-function alleles affect DSB induction and/or repair pathway choice independently of canonical DNA damage checkpoint activation. In particular, defects in DSB formation observed in certain cep-1 mutants can be rescued by exogenous irradiation, supporting a direct or closely linked role in promoting DSB formation rather than merely responding to damage. Thus, based on these functional data, we considered CEP-1 a relevant factor to include in our analysis. We have now clarified this rationale in the revised manuscript.

      HIM-5 and REC-1 do not have apparent homologs in other organisms and play poorly understood roles in promoting DSB induction. A mechanistic understanding of their functions would be of value to the field, but the current work does not shed light on this. A previous paper (Chung et al. G&D 2015) concluded that HIM-5 and REC-1 are paralogs arising from a recent gene duplication, based on genetic evidence for a partially overlapping role in DSB induction, as well as an argument based on the genomic location of these genes in different species; however, these proteins lack any detectable sequence homology and their predicted structures are also dissimilar (both are largely unstructured but REC-1 contains a predicted helical bundle lacking in HIM-5). Moreover, the data presented here do not reveal overlapping sets of genetic or physical interactions for the two genes/proteins. Thus, this earlier conclusion was likely incorrect, and this idea should not be restated uncritically here or used as a basis to interpret phenotypes.

      Actually, there is quite good bioinformatic analysis that the rec-1 and him-5 loci evolved from a gene duplication and that each share features of the ancestral protein (Chung et al., 2015). We are sorry if the reviewer casts aspersions on the prior literature and analyses. The homology between these genes with the ancestral protein is near the same degree as dsb-1, dsb-2, or dsb-3 to their ancestral homologs (<17%).

      DSB-1 was previously reported to be strictly required for all DSB and CO formation in C. elegans. Here the authors test whether the expression of HIM-5 from the pie-1 promoter can rescue DSB formation in dsb-1 mutants, and claim to see some rescue, based on an increase in the number of nuclei with one apparent bivalent (Figure 2C). This result seems to be the basis for the claim that HIM-5 coordinates the activities of other DSB proteins. However, this assay is not informative, and the conclusion is almost certainly incorrect. Notably, a substantial number of nuclei in the dsb-1 mutant (without Ppie-1::him-5) are reported as displaying a single bivalent (11 DAPI staining bodies) despite prior evidence that DSBs are absent in dsb-1 mutants; this suggests that the way the assay was performed resulted in false positives (bivalents that are not actually bivalents), likely due to inclusion of nuclei in which univalents could not be unambiguously resolved in the microscope. A slightly higher level of nuclei with a single unresolved pair of chromosomes in the dsb-1; Ppie-1::him-5 strain is thus not convincing evidence for rescue of DSBs/CO formation, and no evidence is presented that these putative COs are X-specific. The authors should provide additional experimental evidence - e.g., detection of RAD-51 and/or COSA-1 foci or genetic evidence of recombination - or remove this claim. The evidence that expression of Ppie-1::him-5 may partially rescue DSB abundance in dsb-2 mutants is hard to interpret since it is currently unknown why C. elegans expresses 2 paralogs of Rec114 (DSB-1 and DSB-2), and the age-dependent reduction of DSBs in dsb-2 mutants is not understood.

      We have removed this claim, in part because we have been unable to create the triple mutant strains needed to analyze COSA-1 foci.

      To the point about 11 vs 12 DAPI bodies: the literature is actually replete with examples of 11 DAPI bodies vs 12 in mutants with no breaks:

      Hinman et al., 2021: null allele of dsb-3 has an average of 11.6 +/- 0.6 breaks;

      Stamper et al, 2013, show just over 60% of dsb-1 nuclei with 12 DAPI bodies and 5-10% with 10 DAPI bodies. (Figure 1);

      In addition, we also previously showed (Machovina et al., 2016) that a subset of meiotic nuclei have a single RAD-51 focus and can achieve a crossover. RAD-51 foci in spo-11 were also reported in Colaiacovo et al., 2003.

      Several of the factors analyzed here, including XND-1, HIM-17, HIM-5, DSB-1, DSB-2, and DSB-3, have been shown to localize broadly to chromatin in meiotic cells. Coimmunoprecipitation of pairs of these factors, even following benzonase digestion, is not strong evidence to support a direct physical interaction between proteins.

      Similarly, the super-resolution analysis of XND-1 and HIM-17 (Figure 1EF) does not reveal whether these proteins physically interact with each other, and does not add to our understanding of these proteins' functions, since they are already known to bind to many of the same promoters. Promoters are also likely to be located in chromatin loops away from the chromosome axis, so in this respect, the localization data are also confirmatory rather than novel.

      While the binding to promoters would be expected to be on DNA loops, that has not been definitively shown in the worm germ line. The supplemental data of the Carelli paper suggests that there are ~250 binding sites for each protein at these coopted promoters. This could not account for crossover map seen in C. elegans.

      The reviewer correctly states that we do not reveal that these proteins interact, but we have shown that the two proteins co-IP and have a Y2H interaction. This interaction is supported by a recent publication (Blazickova et al., 2025) that corroborates this conclusion and identifies XND-1 in HIM-17 co-IPs, also in the presence of benzonase. We do now show, however, by immuno-localization that the two proteins appear to be adjacent, but non-overlapping. As now described in the text, AlphaFold 3 modeling and structural analysis suggest that the two proteins do interact directly and that the tagged 5’ end of HIM-17 used in our studies is likely to be at least 200 nm from the putative XND-1 binding interface, a distance that is consistent with our confocal images showing frequent juxtaposition of the two proteins.

      The phenotypic analysis of double mutant combinations does not seem informative. A major problem is that these different strains were only assayed for bivalent formation, which (as mentioned above) requires several steps downstream of DSB induction. Additionally, the basis for many of the single mutant phenotypes is not well understood, making it particularly challenging to interpret the effects of double mutants. Further, some of the interactions described as "synergistic" appear to be additive, not synergistic. While additive effects can be used as evidence that two genes work in different pathways, this can also be very misleading, especially when the function of individual proteins is unknown. I find that the classification of genes into "epistasis groups" based on this analysis does not shed light on their functions and indeed seems in some cases to contradict what is known about their functions.

      As described above, each of the proteins analyzed is thought to have a direct role in regulating meiotic DSB formation, and the single mutant phenotypes are consistent with this interpretation. In almost all, if not all, of these cases, IR-induced breaks suppress univalent phenotypes (or uncover a downstream repair defect, e.g. in mre-11), supporting this conclusion. Since this is not strict epistasis, we have changed the terminology from “epistasis groups” to “functional groups”.
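
      The additive-versus-synergistic distinction raised above can be made concrete by comparing the observed double-mutant phenotype with the value expected if the two single-mutant defects simply summed. The sketch below is illustrative only: the function name, the tolerance, the choice of an additive (rather than multiplicative) null model, and all numbers are assumptions, not data or methods from the manuscript.

```python
def classify_interaction(wt, mut_a, mut_b, double, tol=0.5):
    """Compare an observed double-mutant phenotype with the additive expectation.

    All arguments are phenotype means on the same scale, e.g. mean DAPI-body
    counts per oocyte (hypothetical values). `tol` is an arbitrary tolerance
    standing in for a proper statistical test.
    """
    effect_a = mut_a - wt            # defect attributable to mutant A alone
    effect_b = mut_b - wt            # defect attributable to mutant B alone
    expected = wt + effect_a + effect_b
    deviation = double - expected

    if abs(deviation) <= tol:
        return "additive"
    # A deviation in the same direction as the summed single-mutant effects
    # means the double mutant is worse than the additive expectation.
    if deviation * (effect_a + effect_b) > 0:
        return "synergistic"
    return "less than additive"


# Hypothetical example: wild type has 6 DAPI bodies, each single mutant
# shows a mild increase, and the double mutant is far more severe.
print(classify_interaction(wt=6.0, mut_a=7.0, mut_b=7.5, double=11.0))
```

      In practice the tolerance would be replaced by a statistical comparison across scored nuclei, and the conclusion can change with the scale on which the phenotype is measured.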

      The yeast two-hybrid (Y2H) data are only presented as a single colony. While it is understandable to use a 'representative' colony, it is ideal to include a dilution series for the various interactions, which is how Y2H data are typically shown.

      The Y2H data are presented as spots on a plate, each derived from three to four individual transformants per interaction tested; they are not individual colonies. The experiment was repeated in triplicate from different transformations. We have now made this clearer in the materials and methods section. This approach has been successfully used to examine protein interactions in our prior manuscripts on yeast and human proteins [Gaines et al (2015) Nat. Comms 6:7834; Kondrashova et al (2017) Cancer Discovery 7:984; Garcin et al (2019) PLoS Genetics 15:e1008355; Bonilla et al (2021) eLife 1: e68080; Prakash et al (2022) PNAS 119: e2202727119, etc].

      Additional (relatively minor) concerns about these data:

      (1) Several interactions reported here seem to be detected in only one direction - e.g., MRE-11-AD/HIM-5-BD, REC-1-AD/XND-1-BD, and XND-1-AD/HIM-17-BD - while no interactions are seen with the reciprocal pairs of fusion proteins. I'm not sure if some of this is due to pasting "positive" colony images into the wrong position in the grid, but this should be addressed.

      The asymmetry in the interactions observed is due to the well-known phenomenon in yeast two-hybrid (Y2H) assays where certain plasmids exhibit self-activation when fused in one orientation, making interpretation of reciprocal interactions challenging. In our experiment, some of the plasmids indeed showed self-activation in one direction, which likely accounts for the lack of interaction seen with the reciprocal pairs of fusion proteins. We have clarified this point in the Methods.

      (2) DSB-3 was only assayed in pairwise combinations with a subset of other proteins; this should be explained; it is also unclear why the interaction grids are not symmetrical about the diagonal.

      We have now completed the analysis by adding the interactions of DSB-3 with the remaining proteins that were missing from the initial set.

      (3) I don't understand why the graphic summaries of Y2H data are split among 3 different figures (1, 2, and 3).

      We chose to split the graphic summaries of the Y2H data across Figures 1, 2, and 3 because we felt this organization better aligns with the flow of the results presented in each figure. Each set of interactions is shown in the context of the specific experiments and findings discussed in those sections, which we believe helps provide a clearer and more logical presentation of the data.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Figure 1: B) The IP is difficult to interpret - there is a band of the corresponding size to XND-1 in the control lane calling into question the specificity of the IP/Western.

      We added a supplemental figure with the specificity of the antibody showing that there is a background non-specific band.

      C) More information about the mass spectrometry should be included. No indication of the number of times a peptide was identified, or the overall coverage of the identified proteins.

      Done

      This is important as in the results section (line 114) the authors indicate that there was "strong" interaction yet there is no way to assess this.

      D) Why wasn't hatching measured in the him-5p::him-5; him-17(ok424) strain?

      Great question. I guess we need to do this while back out for review. If anyone has suggestions of what to say here. Clearly we overlooked this point but do have the strain.

      E) Quantification of the cytology should be included.

      We have now quantified the overlap between XND-1 and HIM-17.

      Figure 2: C) Statistics should be included.

      Done

      E) Quantification should be included for the cytology. I recommend changing the eals15 to HIM-5.

      We included better images showing whole gonads instead of one or two nuclei. We were not sure what the reviewers want us to quantify here since the relocalization of the protein to the cytoplasm is very clear.

      I have a general issue with the use of the term epistasis - this is used to order gene function based on different mutant phenotypes, usually with null alleles. While I think the authors have valid points with how they group the different SPO-11 accessory proteins, I do not think they should use the word epistasis, but rather genetic interactions.

      We appreciate the reviewer's thoughts on this matter and have removed the term epistasis; we now use functional groups or genetic interactions throughout the text.

      Figure 4 and the nature of the X chromosome: First, I think it would help the non-C. elegans reader to include a little more information on the X chromosome with respect to its differences compared to the autosomes. I also think that, if possible, it would be beneficial to include a model of the X in Figure 4.

      We have added more about X/autosome differences in the introduction and in the discussion of HIM-5 function, and have added a figure showing the differences in the behavior of the X and the autosomes during DSB/crossover formation.

      Minor points:

      Abstract: Given the findings of Silva and Smolikove on SPO-11 breaks, I recommend removing "early" from line 28 in the Abstract.

      Done

      Introduction (line 93): I think "biochemical studies" is a stretch here - I recommend "interaction studies".

      Done

      Results: (lines 160-161): mutations are not required for breaks. Line 172, there is a problem with the sentence.

      Corrected

      Reviewer #2 (Recommendations For The Authors):

      Major comments:

      (1) Figure 1B - The signal for XND-1 seems to appear both in the control and him-17::HA IP. Have the authors tested the specificity of the XND-1 antibody?

      We included a supplementary figure demonstrating the specificity of the XND-1 antibody by Western blot. This was also previously published (Wagner et al., 2010).

      (2) Figure 1D - can the authors provide an explanation why the him-5p::him-5 transgene that drives a higher expression than pie-1p::him-5 fails to suppress the Him phenotype seen in him-17? What are the HIM-5 levels like in these two strains compared to N2 and him-17 null mutants? Can this information provide explanation for the differential effect of the him-5 transgenes?

      We previously reported that him-5p::him-5 drives higher expression than pie-1p::him-5 (McClendon et al, 2016).

      The reason that him-5p::him-5 does not rescue, despite its higher expression in wild type, is that HIM-17 directly regulates expression of him-5. Since HIM-17 does not regulate the pie-1 promoter, the pie-1p::him-5 construct can at least partially suppress the him-17 mutation.

      We have (hopefully) explained this better in the text.  

      (3) Line 102 - the subheading "HIM-5 is the essential factor for meiotic breaks in the X chromosome" may not be appropriate for this section. This is what has previously been known. However, the results in Figure 1 demonstrate that a him-5 transgene can partially rescue the him-17 and xnd-1 phenotype, but not that it is essential for meiotic DSB formation on X chromosomes.

      We think some of the concern here is semantic and have changed the phraseology to say that HIM-5 is SUFFICIENT for DSBs on the X… which had not previously been shown.

      Vis-à-vis the X chromosome, in all genetic backgrounds examined, the absence of HIM-5 consistently results in a complete lack of DSBs on the X. For instance, in dsb-2 mutants— where HIM-5 is still expressed—DSBs are reduced genome-wide, but the X chromosome occasionally retains breaks. In contrast, even a weak allele of him-17 results specifically in the loss of X chromosome breaks, underscoring a unique requirement for HIM-5 in promoting DSBs on the X. While Figure 1 shows that a him-5 transgene can partially rescue him-17 and xnd-1 phenotypes, the consistent observation that X breaks are absent without HIM-5 supports its classification as sufficient for DSB formation on the X chromosome.

      (4) Figure 1E - please consider enlarging the images and showing multiple examples.

      Done.

      I also suggest that the authors perform a more rigorous analysis to support the conclusion that XND-1 and HIM-17 localize away from the axis by quantifying multiple images and doing line-scan analysis.

      Provided. New images are included in both the main and supplemental figures, and quantification is included. There is no detectable overlap of the two proteins with one another or with the DNA axes (see quantification of overlap in Fig. 1).

      (5) Line 162 - This is the first mention of DSB-1, DSB-2, and DSB-3 in the paper. DSB-1 and DSB-2 are Rec114 homologs in C. elegans (Tesse et al., 2017), while DSB-3 is a homolog of Mei4 (Hinman et al., 2021). These proteins should be properly introduced in the introduction with appropriate citations.

      Done. We appreciate the reviewer pointing out that this was the first reference to these genes.

      (6) Line 169 - the rationale for this experiment is unclear. Why did the Y2H interaction between HIM-5 and DSB-1 prompt the authors to test the rescue of dsb-1 or dsb-2 phenotypes by the ectopic expression of him-5? Do the authors have evidence that HIM-5 level is reduced in dsb-1 or dsb-2 mutants?

      We have reorganized this section to better explain the motivation for looking at these interactions. We did see a difference in the localization of HIM-5 in the dsb-1 mutant animals and we did have a sense that HIM-5 was critical for breaks on the X. We reasoned that it could have independent functions in promoting breaks that were not yet appreciated, so we wanted to do this experiment.

      (7) Line 172 - "very slightly reduced". This claim requires statistical analysis.

      We added statistical analysis, but we also removed this claim.

      (8) Figures 2C and 2D - Can the authors provide an explanation why the pie-1p::him-5 transgene fails to suppress the phenotypes in dsb-1, while the him-5p::him-5 transgene can? Again, the rationale for these experiments is unclear. Because of this, the interpretation is also unclear.

      The difference in rescue between the pie-1p::him-5 and him-5p::him-5 transgenes likely reflects differences in expression levels. As previously shown (McClendon et al., 2016), the him-5p::him-5 construct results in significantly higher expression of HIM-5 protein compared to pie-1p::him-5. This elevated expression likely explains its ability to partially rescue the dsb-1 phenotype. In contrast, the lower expression driven by the pie-1 promoter is insufficient to compensate for the absence of dsb-1 function. We have clarified the rationale and interpretation of these experiments in the revised manuscript to better reflect this point.

      (9) Lines 184-185 - the data for endogenously tagged HIM-5::3xHA are not shown anywhere in the paper. This must be shown.

      We have added this in the supplemental figures.

      (10) Figure 2D and 2E - what does the localization of pie-1p::him-5::GFP (eaIs15) and him-5p::him-5::GFP (eaIs4) look like in wild-type and dsb-1 mutants? Are the cytoplasmic aggregates caused by increased levels of HIM-5 expression? Can the differential behavior of him-5 transgenes provide explanation for differential rescues?

      We now show both live and fixed images of Phim-5::him-5::gfp transgenes, as well as the localization of the endogenously HA-tagged HIM-5 locus (Figure 2 and S3). In all cases, the protein is initially nuclear and then absent from meiotic nuclei with similar timing. The Ppie-1::him-5 transgene was very difficult to image due to low expression (even in wild type), so it is not shown here. We presume it is the slightly elevated level of expression of the Phim-5::him-5::gfp that can explain the differential rescue.

      (11) Lines 221-222, where are the results shown? Please refer to Figure S3.

      Done

      (12) Figure S3 - these need statistical analyses.

      Done

      (13) Lines 230-231 - what about the rec-1; parg-1; cep-1 triple mutant?

      This is an excellent suggestion and one we have not yet pursued. Given the lack of strong phenotypes in all combinations of double mutants, we prioritized other experiments. However, we agree that examining the rec-1; parg-1; cep-1 triple mutant would provide a valuable test of whether these factors act in the same pathway, and we appreciate the reviewer highlighting this potential future direction.

      (14) Line 298 - I suggest the authors take a look at the Alphafold prediction of DSB-1/DSB-2/DSB-3 and the comparison to human and budding yeast Rec114/Mei4 complex in Guo et al., 2022 eLife, which could provide insights into the Y2H results.

      We thank the reviewer for these comments and have indeed used these interactions and predicted homologies to zero in on a region of interaction between these proteins that resembles what is seen in humans and yeast, in which a dimer of REC114-like proteins wraps around and stabilizes a central Mei4 helix. This is now shown in Figure 3H and 3I. Satisfyingly, this modeling predicts that a trimer composed of two DSB-1 proteins with DSB-3 is more stable than a DSB-1-DSB-2-DSB-3 trimer. This might explain why DSB-2 is not required in young adults and only becomes essential as DSB-1 levels drop in older animals (Rosu et al., 2013).

      (15) Can the authors introduce mutations within the DSB-1 interfaces that disrupt the interaction to either SPO-11 or DSB-2?

      We have begun to address this question by introducing targeted mutations within DSB-1. As shown in Figure 3E and 3F, mutations in the C-terminal region of DSB-1, which includes a core of four α-helices, disrupt its interaction with DSB-2 and DSB-3, but not with SPO-11. These findings suggest that the C-terminus mediates interactions specifically with DSB-2 and DSB-3.

      (16) Line 323 - The him-5 phenotypes are too weak to support the idea that it serves as the linchpin for the whole DSB complex. Do the authors have an explanation for why him-5 mutants exhibit X-chromosome-specific DSB defects?

      In our response to the reviewer above, and in the text, we have included a more detailed explanation of why we think HIM-5 has a key role in coordinating meiotic break formation. Although identified for its role on the X, the phenotypes associated with DSB formation in the mutant are really quite pleiotropic and severe.

      (17) Line 436 - C. elegans lacks DSB hotspots.

      Removed

      Minor comments:

      (1) Figure 1A - please show the raw data for the yeast two-hybrid.

      We show representative yeast colonies in Figure S3.

      (2) It looks like the labeling for Figure 1B and 1C are switched.

      Fixed.

      (3) Figure 1B - what does the red box indicate? Please explain it in the legend.

      It indicates the XND-1 band. We added that information in the legend.

      (4) Figure 1C - in the legend, it was noted that the results are from GFP pulldowns of HIM-17::GFP. However, the method for Figure 1B and the method section noted that HIM-17 was tagged with 3xHA, and the pull-down was performed using anti-HA affinity matrix. Please reconcile this discrepancy.

      That’s because they were done in two different sets of experiments. For the IPs we used a HIM-17::HA strain and for the MS, a HIM-17::GFP strain.

      (5) Also in Figure 1C - please call Table S2 in the main text when discussing the mass spec results. Also, it is not clear what HIM-17 and GFP indicate in the table. What makes CKU-80 different from the other proteins listed under GFP? Please explain more clearly in the legend.

      We have moved the table to the supplemental data, where we have included all of the peptide counts and gene coverage. We have included in the revised Methods the rationale for inclusion in this table, which explains why CKU-80 differs.

      (6) Line 527 - it is unclear what experiment was done for HIM-17. Please revise it to indicate that this is for "HIM-17 immunoprecipitation". Also please indicate the strain used for HIM-17 pull-down (AV280?).

      (7) Line 113- please be specific about how the HIM-17 IP was performed. Which epitope and strains are used for pull-downs?

      This indeed was AV280. This has been added to the text and methods.

      (8) Figure 1D- What does ND mean? In the text, it was stated that there was only a minor suppression of hatching rates. The hatching rate for him-5p::him-5; him-17 must have been measured, and the data must be presented.

      ND does mean not determined. We have removed the statement about “minor suppression”. We only tested the overall population dynamics in the Phim-5::him-5;him-17(ok424) and the DAPI body counts. The failure to suppress the latter suggests there would be no effect on hatching rates, although we did not test this directly. Since we had done this for the Ppie-1::him-5;him-17 strain, we provided this information to further support the claims of genetic rescue by ectopic expression.

      (9) Line 151 - please specify that STED was used.

      We have removed the STED images, and just show the confocal images with Lightning Processing.

      (10) Figure 1E- the authors suggested that HIM-17 and XND-1 mainly localize to autosomes but not the X chromosome. However, there is not enough evidence that the chromosome excluded from HIM-17 staining is indeed an X chromosome.

      (11) Figure 1E (Line 154) - what are the active chromatin markers examined? Where are the data?

      We have previously shown that the chromosome lacking XND-1 staining is the X (Wagner et al., 2010). The X is heterochromatic and chromatin marks associated with active transcription, including H3K4me3 and HTZ-1 (a variant H2A), preferentially localize to autosomes, effectively anti-marking the X chromosome. As shown in the new Figure 1E, a single chromosome has very little XND-1 and HIM-17 associated proteins. This is the X chromosome.

      (12) Line 172 - It should be a comma instead of the period after "In dsb-1 mutants".

      Fixed

      (13) Figure S3H-K - I suggest the authors indicate the alleles of mre-11 (null vs. iow1) on the graph, similarly to him-5(e1490) to avoid confusion.

      Done

      (14) Lines 294 and 600 - Guo et al. 2022 is now published in eLife. The authors must cite the published paper, not the preprint.

      Fixed

      (15) Line 407 - the reference Carelli et al., 2022 is missing.

      Added

      (16) Line 766 - please remove "is" before nuclear.

      Done

      Reviewer #3 (Recommendations For The Authors):

      Major issues:

      In my view, the most interesting mechanistic finding in the paper is the evidence that HIM-5 may not bind to chromatin in the absence of DSB-1. If validated, this would suggest that HIM-5 is likely to be directly involved in a process that promotes break formation, in contrast to factors such as HIM-17 and XND-1. It does not, however, support the idea that HIM-5 is at the top of a hierarchy of DSB factors, as it is interpreted here. More importantly, the data supporting this claim are unconvincing; only a single image of an unfixed gonad from an animal expressing HIM-5::GFP is shown. Immunofluorescence should be performed and the results must be quantified.

      We have provided additional images of the HIM-5 relocalization to show that we observed this in both fixed and live worms with two different tagged strains. The exclusion from the nucleus is seen in all scenarios. Whether the protein now accumulates exclusively in the cytoplasm or is destabilized is challenging to address with the fixed images due to the arbitrariness of defining “background” staining.

      More generally, this type of analysis, looking at the interdependence of different factors for their association with chromosomes, is much more informative than the genetic interaction data presented in the paper, which does not seem to provide any mechanistic insights into the functions of the factors analyzed. The paper could potentially be greatly improved through a more extensive, systematic analysis of the interdependence of DSB-promoting factors for their localization to chromosomes.

      We have at least added this for XND-1 and HIM-17 and show they are not interdependent for chromosome association. We also provide for the first time data on the localization of HIM-5 in the dsb-1 mutant. Many of the other interactions have already been shown in the literature and/or were not warranted based on the lack of genetic interaction we present here.

      Minor issues:

      The title is vague and inconclusive. A more concrete title summarizing the major findings would help readers to assess whether the work is of interest.

      We have discussed the title extensively with all authors and all would like to keep the current title.

      The authors claim that the expression of HIM-5 from a different promoter (Ppie-1::him-5) but not its endogenous promoter (Phim-5::him-5) can partially rescue the DSB defect in him-17 mutants. To support this claim, they should really quantify the germline expression of HIM-5 in wild-type, him-17, him-17; Ppie-1::him-5, and Phim-5::him-5; him-17.

      We had previously reported the expression in the N2 background of both transgenes (McClendon et al., 2016)

      Panel O appears to be missing from Figure S3.

      Fixed

      The evidence for chromosome fusions in cep-1; mre-11 mutants shown in S4D is not convincing and the claim should be removed unless stronger evidence can be obtained.

      A clearer image has been added

      The basis of the following statement is unclear: "Furthermore, rec-1;him-5 double mutants give an age-dependent severe loss of DSBs (like dsb-2 mutants) suggesting that the ancestral function of the protein may have a more profound effect on break formation." The manuscript does not seem to include data regarding age-dependent loss of DSBs and no other publication is cited to support this claim. The interpretation is also perplexing; I think that it may be predicated on the idea that REC-1 and HIM-5 are paralogs, but as stated above, this claim is not well supported and is likely specious.

      We have added the reference. This was shown in Chung et al., 2013 – the paper that presented the cloning of the rec-1 locus.

  2. Sep 2025
    1. Author response:

      We thank the reviewers and editors for their thoughtful and constructive feedback. We have carefully considered the comments and plan to revise the manuscript as follows:

      · Methods: We will expand the Methods section to provide additional details regarding the Pavlovian fear conditioning procedure, including instructions, experimental parameters, and the randomization process.

      · Figures and Statistical Reporting: We will break down some figures where appropriate and clearly display the distributions of key variables. We will also include additional statistical details in the main text and elaborate on the analyses where needed.

      · Language and Interpretation: We will revise the text to consistently use correlational rather than causal terminology, ensuring that our conclusions accurately reflect the findings from the fMRI data.

      · Computational Model of the Pulvinar: We will further elaborate on the assumptions and limitations of the intra-pulvinar model, discuss potential neural pathways and candidate regions (e.g., visual cortex), and highlight directions for future work, including studies in nonhuman primates to investigate anatomical connectivity.

      · Alternative Hypotheses of the mediodorsal thalamus-anterior pulvinar relationships: Other pulvinar subregions were already included as covariates in our hierarchical regression analyses, allowing us to account for anatomical proximity and shared variance. We will make this analysis more explicit and clarify the thinking process behind this analysis to allow readers to assess the specificity of the anterior pulvinar-mediodorsal thalamus relationship.

      · Limitations: We will add a dedicated subsection outlining key limitations, including considerations specific to fMRI studies.

      · Data Availability: All data and materials used in this study will be made available upon request from the corresponding author, subject to obtaining the necessary institutional authorization for the data-sharing agreement.

      We are confident that these revisions will enhance the clarity, transparency, and interpretability of the work, and we are grateful to the reviewers for their valuable suggestions. We will provide a detailed, point-by-point response along with the revised submission as soon as possible.

    1. Author response:

      Joint Public Review

      This manuscript puts forward the provocative idea that a posttranslational feedback loop regulates daily and ultradian rhythms in neuronal excitability. The authors used in vivo long-term tip recordings of the long trichoid sensilla of male hawkmoths to analyze spontaneous spiking activity indicative of the ORNs' endogenous membrane potential oscillations. This firing pattern was disrupted by pharmacological blockade of the Orco receptor. They then use these recordings together with computational modeling to predict that Orco receptor neuron (ORN) activity is required for circadian, not ultradian, firing patterns. Orco did not show a circadian expression pattern in a qPCR experiment, and its conductance was proposed to be regulated by cyclic nucleotide levels. This evidence led the authors to conclude that a post-translational feedback loop (PTFL) clockwork, associated with the ORN plasma membrane, allows for temporal control of pheromone detection via the generation of multi-scale endogenous membrane potential oscillations. The findings will interest researchers in neurophysiology, circadian rhythms, and sensory biology. However, the manuscript has limited experimental evidence to support its central hypothesis and is undermined by several questionable assumptions that underlie their data analysis and model builds, as well as insufficient biological data, including critical controls to validate and/or fully justify the model the authors are proposing.

      We thank the reviewers for their thorough and thoughtful comments and believe that the manuscript will be much stronger once we incorporate the requested changes.

      Please note that we used ORN as acronym for “olfactory receptor neuron” throughout the manuscript. ORNs contain odorant receptors (ORs), and in insects these ORs have to associate with the olfactory receptor co-receptor (Orco) in the cilium of the neuron to form functional OR-Orco complexes for odorant detection. Besides this chaperone function, Orco can form homomers with the potential to act as ionic pacemaker channels; a role which we explore in this study.

      Strengths:

      The study is notable for its combination of long-term in vivo tip recordings with computational modeling, which is technically challenging and adds weight to the authors' claims. The link between Orco, cyclic nucleotides, and circadian regulation is potentially important for sensory neuroscience, and the modeling framework itself - a stochastic Hodgkin-Huxley formulation that explicitly incorporates channel noise - is a solid and forward-looking contribution. Together, these elements make the study conceptually bold and of clear interest to circadian and olfactory biologists.

      Major weaknesses:

      At the same time, several limitations temper the conclusions. The pharmacological evidence relies on a single antagonist and concentration, without key controls. The circadian analysis is based on relatively small numbers of neurons, with rhythms detected only in subsets, and the alignment procedure used in constant darkness raises concerns of bias. The molecular evidence is sparse, with only three qPCR timepoints, and the model, while creative, rests on assumptions that are not yet fully supported by in vivo data.

      Please see our responses to the detailed comments.

      Detailed comments are provided below:

      (1) The role for Orco proposed in the authors' model largely stems from the effects seen following the administration of (a single dose) of the Orco antagonist, OLC15. However, this hypothesis is undercut by the lack of adequate pharmacological controls, including a basic multipoint OLC15 dose-response series in addition to the administration of blockers for the other channels that are embedded in their model, but which were ruled out as being involved in the modulation of biological rhythms. In addition, these studies would (ideally) also benefit from the inclusion of the same concentration (series) of an inactive OLC15 analog to better control for off-target effects.

      The Orco agonist VUAA1 (Jones et al., 2011) binds directly to Orco and increases the channel open time probability. In M. sexta hawkmoths, we have already published that VUAA1 increases the low spontaneous activity of ORNs in a dose-dependent fashion (Nolte et al., 2016). Chen and Luetje (2012) systematically varied the chemical structure of VUAA1 to identify new Orco ligands and discovered 22 Orco Ligand Candidates (OLC) that either activated or inhibited Orco. In their heterologous expression system, Orco was most sensitive to inhibition by OLC15. Based on these results, we published a dose-response curve of OLC15 inhibition (1-100 µM) using in vivo tip recordings of pheromone-sensitive long trichoid sensilla of M. sexta (Nolte et al., 2016). In that study, we could also demonstrate that OLC15 antagonizes the VUAA1 activation of Orco.

      Furthermore, we tested other published Orco antagonists in in vivo assays in intact hawkmoths, focusing on amiloride-derived antagonists, because we previously identified an amiloride-sensitive cation channel in hawkmoth ORNs. We found that, in contrast to OLC15, the amilorides HMA and MIA were not Orco-specific but instead affected different targets depending on time-of-day (Nolte et al., 2016). Based on those experiments and the dose-response curves, we determined that the Orco agonist VUAA1 (Jones et al., 2011) and the Orco antagonist OLC15 (Chen and Luetje, 2012) worked best in hawkmoth ORNs to target Orco pharmacologically. Based on comparative tests with other published Orco antagonists, we have since used a dose of 50 µM OLC15 in all further experiments.

      We will clarify the Methods section accordingly.

      (2) The expression pattern of Orco was assessed using qPCR at only three timepoints. Rhythmic transcripts can easily be missed with such sparse sampling (Hughes et al., 2017). A minimum of six evenly spaced timepoints across a 24-hour cycle would be required to confidently rule out circadian transcriptional regulation. In addition, the use of the timeless mRNA control from another study is not acceptable. Furthermore, qPCR analysis measures transcript abundance, not transcription, as the authors repeatedly state. Transcriptional studies would require nuclear run-off or, more recently, can be done with snRNAseq analysis. Taken together, these concerns undermine the authors' desire to rule out TTFL-based control that directly led them to implicate a PTTF-based model.

      We agree with the referees that more time points and a direct comparison between timeless and Orco mRNA levels should be included in this manuscript. We will include these additional qPCR experiments and edit the manuscript to make clear that we measure transcript abundance, but we will not perform snRNAseq analysis due to time and financial constraints. We are currently working on the transcriptional control of Orco, both during ontogeny and throughout the day, but this work in progress is beyond the scope of this manuscript.
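
      The sampling concern in point (2) can be illustrated with a minimal sketch, assuming a single-component 24-hour cosinor model and evenly spaced timepoints. This is not the analysis used in the manuscript. The function name `cosinor_fit` and the example values are hypothetical. With three timepoints the three-parameter model fits any data exactly, so no residual variance remains to test for rhythmicity; with six timepoints three residual degrees of freedom remain.

```python
import math


def cosinor_fit(values, period_h=24.0):
    """Least-squares fit of y = mesor + a*cos(w*t) + b*sin(w*t).

    Assumes the samples are evenly spaced and together span exactly one
    period; under that assumption the closed-form Fourier coefficients
    below are the least-squares solution.
    """
    n = len(values)
    if n < 3:
        raise ValueError("need at least 3 timepoints")
    w = 2.0 * math.pi / period_h
    times = [i * period_h / n for i in range(n)]
    mesor = sum(values) / n
    a = 2.0 / n * sum(y * math.cos(w * t) for y, t in zip(values, times))
    b = 2.0 / n * sum(y * math.sin(w * t) for y, t in zip(values, times))
    fitted = [mesor + a * math.cos(w * t) + b * math.sin(w * t) for t in times]
    rss = sum((y - f) ** 2 for y, f in zip(values, fitted))
    return {
        "amplitude": math.hypot(a, b),
        "rss": rss,               # residual sum of squares
        "residual_df": n - 3,     # datapoints minus fitted parameters
    }


# Hypothetical relative expression values, for illustration only.
three_points = cosinor_fit([3.0, 1.0, 4.0])
six_points = cosinor_fit([1.0, 5.0, 2.0, 6.0, 1.0, 5.0])
print(three_points)
print(six_points)
```

      Under these assumptions, the three-point fit leaves essentially no residual regardless of the values supplied, which is why a rhythm can be neither confirmed nor excluded from such data.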

      (3) The modelling presented is based on Orco as a ZT-dependent conductance tied to the cAMP oscillations that were reported by this group in the cockroach and from the presence and functionality in Manduca of homomeric Orco complexes that are devoid of tuning ORs. While these complexes have been generated in cell culture and other heterologous expression systems, as well as presumably exist in vivo in the Drosophila empty neuron and other tuning OR mutants, there is no evidence that these complexes exist in wild-type Manduca ORNs. While this doesn't necessarily undermine every aspect of their models, the authors should note the presence of Orco/OR complexes rather than Orco homomeric complexes.

      Our ELISAs found circadian oscillations in cAMP levels not only in antennae of the Madeira cockroach (Schendzielorz et al., 2014, 2012), but also in hawkmoth antennae (Schendzielorz et al., 2015). We will add the 2015 citation to the Modeling chapter in the Methods section to clarify this.

      We agree with the referees that we cannot distinguish between Orco homo- and heteromers in the different compartments of our hawkmoth ORNs. Thus, as the referee suggests, we will add text regarding the presence and localization of OR-Orco heteromers. However, we have indications that Orco homomers could indeed be present in the hawkmoth ORNs. In a heterologous expression system, MsexOrco expression alone was sufficient to increase intracellular Ca<sup>2+</sup> levels in response to VUAA1 application (Nolte et al., 2013). In differentiating primary cell cultures of hawkmoth antennae, Orco expression started during a developmental time window in which ORNs did not yet express pheromone receptors, and Orco affected spontaneous activity (Nolte et al., 2016). Thus, Orco homomers are present in developing hawkmoth ORNs during a time window in which ORNs already show spontaneous activity but Orco cannot yet heteromerize with pheromone receptors. However, we do not know whether and in what ratio homo- and heteromers of Orco and ORs are present in the respective sensillum compartments of adult hawkmoths (Nolte et al., 2013; Stengl, 1994; Stengl and Hildebrand, 1990).

      We will clarify our manuscript accordingly.

      (4) Some aspects of the authors' models, most notably the decision to phase align/optimize their DD and OLC15 recordings, are likely to bias their interpretations.

      There is consensus that insects display daily and circadian rhythms in pheromone-dependent mating, odor-gated feeding, and egg-laying behavior that phase-lock to environmental rhythms, corresponding with daily/circadian rhythms of sensory neuron physiology (e.g., Merlin et al., 2007; Rymer et al., 2007; Schendzielorz et al., 2015, 2012). However, circadian rhythms can easily be masked by stress, such as the disturbances during a very challenging long-term recording experiment over several days. In addition, we observed in our animal-rearing facility that in LD 17:7 light-dark cycles the originally nocturnal hawkmoths M. sexta distribute their activity patterns over the course of the day, so that we find nocturnal as well as diurnal individuals. Thus, light-dark cycles were not enough to ensure phase-synchronized behavioral rhythms, and it is very likely that the nocturnal hawkmoths rely heavily on pheromone/odor-dependent synchronization, as also found in other moth species (Ghosh et al., 2024). Here, we used isolated males that were never exposed to the female pheromones, so that their circadian activity patterns readily disperse. Therefore, it became necessary in free-running conditions to first determine the respective behavioral rhythm for each animal, and then to phase-align their activity patterns to allow for statistical analysis. Otherwise, circadian differences would average out in a free-running population. As requested by the referees in point (6), we will use additional tests for rhythmicity in each of our recordings and revise the manuscript accordingly.
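
      The averaging argument above can be sketched as follows. This is a minimal illustration under stated assumptions, not the alignment procedure used in the manuscript: every animal is assumed to share a 24-hour period, traces are hourly bins covering a whole number of periods, and the traces themselves are synthetic. The helper names `first_harmonic`, `phase_align`, and `mean_trace` are hypothetical.

```python
import math

PERIOD_H = 24.0


def first_harmonic(trace, period_h=PERIOD_H, bin_h=1.0):
    """Return (amplitude, acrophase in hours) of the period_h component.

    Assumes evenly spaced bins covering a whole number of periods.
    """
    n = len(trace)
    w = 2.0 * math.pi / period_h
    a = 2.0 / n * sum(y * math.cos(w * i * bin_h) for i, y in enumerate(trace))
    b = 2.0 / n * sum(y * math.sin(w * i * bin_h) for i, y in enumerate(trace))
    acrophase_h = (math.atan2(b, a) / w) % period_h
    return math.hypot(a, b), acrophase_h


def phase_align(trace, period_h=PERIOD_H, bin_h=1.0):
    """Circularly shift a trace so that its fitted peak falls in bin 0."""
    _, acrophase_h = first_harmonic(trace, period_h, bin_h)
    shift = int(round(acrophase_h / bin_h))
    n = len(trace)
    return [trace[(i + shift) % n] for i in range(n)]


def mean_trace(traces):
    """Bin-wise average across animals."""
    return [sum(column) / len(column) for column in zip(*traces)]


# Synthetic firing rates: four animals, same rhythm, dispersed peak times.
peak_times_h = [0, 6, 12, 18]
traces = [
    [10.0 + 5.0 * math.cos(2.0 * math.pi * (t - peak) / PERIOD_H) for t in range(48)]
    for peak in peak_times_h
]

raw_amplitude, _ = first_harmonic(mean_trace(traces))
aligned_amplitude, _ = first_harmonic(mean_trace([phase_align(tr) for tr in traces]))
print(raw_amplitude, aligned_amplitude)
```

      In this synthetic case the population average of the unaligned traces is flat although every individual is rhythmic, whereas the aligned average retains the individual amplitude. Because alignment on a fitted peak can also line up noise, it is best paired with a per-recording rhythmicity test.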

      Assuming that hawkmoths need pheromone presence as an additional Zeitgeber, we are currently working on a new set of experiments in which we attempt to improve synchronization by exposure to LD cycles and pheromone before DD and OLC15 recordings. We will add these experiments to the manuscript.

      (5) The tip recordings from long trichoid sensilla are critical aspects of this study. These recordings were carried out on upper sensillar tips located on the distal-most second annulus. Since there are approximately 80 annuli on the Manduca antennae, it is unclear whether the recordings are representative of the antennal response.

      We think the reviewers might have misinterpreted our description of the recording site. In the Methods, we state that we clip off the 20 most distal annuli (leaving a stump of about 60 annuli) and insert the reference electrode into the flagellum up to the second annulus from the cut end, i.e., the recording site is located at 2/3 – 3/4 of the antenna length as seen from the head of the animal. We will make this clearer in the Methods section.

      In addition, our lab showed with antibody staining against Orco that apparently all ORNs that innervate long and short trichoid sensilla along the whole flagellum express the same staining pattern (Nolte et al., 2016). Furthermore, our patch clamp recordings of primary cell cultures of whole male antennae found largely overlapping ion channel populations across ORNs. This would indicate that all ORNs, whether they express pheromone or general odorant receptors, could potentially share the same Orco-dependent spontaneous activity rhythms. In our lab, different experimenters who, over several years, recorded from long trichoid sensilla on different annuli did not detect obvious differences in either the spontaneous activity or the pheromone responses (cf. Dolzer et al., 2003; Gawalek and Stengl, 2018; Schneider et al., 2025). Thus, it is very likely that we are reporting a general encoding mechanism that is not locally restricted along the antennal flagellum.

      (5.1) The authors do not provide any data in support of their cAMP/cGMP-based Orco gating…

      There are publications supporting cyclic nucleotide gating of Orco in Drosophila, but only after previous phosphorylation via protein kinase C (PKC; reviewed in Wicher and Miazzi, 2021). Since Orco is highly conserved among insect species, it is likely that these PKC- and cGMP/cAMP-dependent regulations are present in other insect species. We are currently running thorough tip-recording experiments on the regulation of Orco gating, which are beyond the scope of this manuscript. However, we will add a set of experiments to this manuscript that demonstrates cAMP gating of Orco.

      (5.2)… and the PTTF model proposed is somewhat disappointing.

      For a detailed introduction of our PTFL membrane clock hypothesis please see our opinion paper (Stengl and Schneider, 2024).

      (5.3) The model seems to be influenced by their long-held proposal that insect olfactory signaling has a critical metabotropic component involving cyclic nucleotides, PKC, etc, a view that may be influenced by the use of Orco homomeric complexes generated in HEK cells.

      Indeed, we propose a metabotropic pheromone-transduction cascade, which in moths and cockroaches is based on G-protein-mediated activation of phospholipase C but not on adenylyl cyclase activation. Our hypothesis is not influenced by HEK cell heterologous expression studies of Orco but is supported by our own work comparing in vivo tip recordings of intact hawkmoths with patch clamp experiments on hawkmoth primary cell cultures of olfactory receptor neurons, which are able to respond to their species-specific pheromones in vitro (Schneider et al., 2025; Stengl, 2010; Stengl and Funk, 2013; Wicher and Miazzi, 2021). In addition, a multitude of publications by other laboratories with in vivo and in vitro studies using physiological, genetic, and immunocytochemical assays all support a metabotropic signal transduction cascade in insect olfaction (reviews: Stengl, 2010; Stengl and Funk, 2013; Wicher and Miazzi, 2021). In contrast, the hypothesis suggesting a solely ionotropic pheromone- and general odor-dependent transduction cascade for all insect species rests on very sparse experimental evidence, primarily from heterologous expression systems such as HEK cells that lack the insect’s WT molecular surroundings and thus cannot predict OR-Orco function in vivo. Furthermore, the ionotropic hypothesis relies heavily on the argument that an inverse 7TM receptor cannot couple to G-proteins, an argument that lacks careful support from biochemical and structural studies. In addition, the ionotropic hypothesis lacks support from carefully performed physiological in vivo studies in different insect species that pay attention to the distinct kinetic components of the ORNs' odor/pheromone responses and that employ physiological concentrations and durations of odor/pheromone stimuli (please see our most recent publication, Schneider et al., 2025).

      (5.4) Nevertheless, structural studies on Orco do not support a cyclic nucleotide binding site, although PKC-based phosphorylation has been implicated in the fine-tuning/adaptation of olfactory signaling.

      While structural studies did not find evidence for conserved known cyclic nucleotide binding sites on Orco, this does not exclude the presence of so far unknown binding sites, or of sites that become exposed only after a specific sequence of phosphorylations at the many phosphorylation sites on Orco. Indeed, physiological studies in Drosophila presented evidence for cyclic nucleotide dependence of Orco after previous PKC-dependent phosphorylation (Getahun et al., 2013). Our ongoing in vivo experiments in hawkmoths further corroborate a PKC- and cAMP-dependent modulation of Orco. These studies will be published in a follow-up publication.

      (6) Because only 5/11 LD and 7/10 DD animals showed daily rhythms, with averages lacking clear daily modulation, the methods are not sufficiently reliable enough to reveal novel underlying mechanisms of circadian rhythm generation. The reported results are therefore not yet reliable or quantifiable. To quantify their results, the authors should apply tests for circadian rhythmicity using methods such as RAIN, JTK CYCLE, MetaCycle, or Echo. The use of FFT and Wavelet is applauded, but these methods do not have tests of significance for rhythms and can be biased when analyzing data in which there could only be 1-3 circadian cycles. Because the conclusions appear to be based on 11-12 neurons that were recorded for 2-4 days, the reader is concerned that the methods are not yet perfected to provide strong evidence for circadian regulation of spontaneous firing of ORNs. The average data (e.g., Figure 3Bii and 3Cii) highlight the apparent lack of daily rhythms. In summary, the results would be more compelling if more than 50% of the recordings had significant circadian amplitudes and with similar periods and phases.

      The long-term tip recordings of intact hawkmoths are very challenging and take a very long time to accomplish, so we are very happy that we succeeded in obtaining so many of them (N=34). Since 5/11 LD recordings and 7/10 DD recordings revealed daily/circadian rhythmicity, and since many other physiological recordings at different ZTs by different members of our laboratory all revealed ZT-dependent pheromone transduction, we can be certain that the physiology of hawkmoth antennae is under strict circadian control. Please see also our response to (4) above, which comments on the phase dispersal of activity rhythms observed in our experiments, as well as in the behavior of hawkmoth males in the mating cage.

      Nevertheless, we will follow the advice of the referees to apply additional tests for significance of rhythms in spontaneous activity, and we are thankful for the tests suggested that we were not aware of.
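
      A significance test of the kind requested can be sketched as a permutation test on the 24-hour cosinor amplitude of a single recording. This is a deliberately simple stand-in and not an implementation of RAIN, JTK_CYCLE, MetaCycle, or Echo. It assumes hourly bins covering a whole number of 24-hour periods, and it assumes bins are exchangeable under the null hypothesis, which autocorrelated spike-rate series may violate. The function names and the synthetic trace are hypothetical.

```python
import math
import random


def amplitude_24h(trace, period_h=24.0, bin_h=1.0):
    """Amplitude of the period_h component (evenly spaced bins assumed)."""
    n = len(trace)
    w = 2.0 * math.pi / period_h
    a = 2.0 / n * sum(y * math.cos(w * i * bin_h) for i, y in enumerate(trace))
    b = 2.0 / n * sum(y * math.sin(w * i * bin_h) for i, y in enumerate(trace))
    return math.hypot(a, b)


def permutation_p_value(trace, n_perm=999, seed=0):
    """Fraction of time-shuffled traces whose amplitude reaches the observed one."""
    rng = random.Random(seed)
    observed = amplitude_24h(trace)
    shuffled = list(trace)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # destroys any time-of-day structure
        if amplitude_24h(shuffled) >= observed:
            hits += 1
    # Add-one correction keeps the estimate away from an impossible p = 0.
    return (hits + 1) / (n_perm + 1)


# Synthetic 48-hour trace: a 24-hour rhythm plus Gaussian noise.
noise = random.Random(1)
rhythmic = [
    10.0 + 5.0 * math.cos(2.0 * math.pi * t / 24.0) + noise.gauss(0.0, 1.0)
    for t in range(48)
]
p_rhythmic = permutation_p_value(rhythmic)
print(p_rhythmic)
```

      Reporting such a p-value per recording, alongside the fitted amplitude and phase, would let readers judge how many of the recordings are individually rhythmic rather than relying on the population average.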

      (7) The statement that circadian patterns of ORN firing are lost with the Orco antagonist (OLC15) is not strongly supported. The manuscript should be revised to quantify how Orco changed circadian amplitude in the 12 recorded neurons. Measures of circadian amplitude can avoid confusing/vague statements like Line 394 “low and high frequency bands appeared to merge during the activity phase around ZT 0 in the animals that showed clear circadian rhythms (N = 5 of 11 in LD)”. The conclusion that Orco blocks circadian firing appears to be contradicted by Figure 6, which indicates that ~6 of these neurons had circadian periods detected by wavelet. The manuscript would be strengthened with details about the specificity and reproducibility of the Orco antagonist. The authors quantify the gradual decrease in firing with the slope of a linear fit to estimate how the “effectiveness [of OLC15] increased over time.” They conclude that the drug “obliterated circadian rhythms and attenuated the spontaneous activity in several, but not all experiments (N = 8 of 12).” The report would be greatly strengthened with corroborating data from additional Orco antagonists and additional doses of OLC15 (the authors use only 50 uM OLC15).

      We will revise our data analysis, according to the valuable suggestions of the referees.

      However, based upon our previous studies with other Orco antagonists and different doses of OLC15 (Nolte et al., 2016) we found that 50 µM OLC15 is the best Orco antagonist dose in M. sexta to target Orco-dependent modulation of spontaneous action potential activity of hawkmoth olfactory receptor neurons. Please see also our response to (1).

      (8) The manuscript includes several statements that are more speculation than conclusion. For example, there is no evidence for tuning or plasticity in this report. Statements like the following should be removed or addressed with experiments that show changes in odor response specificity or sensitivity: "ORN signalosomes are highly plastic endogenous PTFL clocks comprising receptors for circadian and ultradian Zeitgebers that allow to tune into internal physiological and external environmental rhythms as basis for active sensing." (Discussion Line 622). The paper concludes that (line 380) "mean frequency of spontaneous spiking and the frequency of bursting expressed daily modulation, and are both most likely controlled via a circadian clock that targets the leak channel Orco." This is too bold given the available results.

      We will revise the discussion accordingly and clarify which statements are supported via published evidence and which are predictions based upon our novel hypothesis published in our opinion paper (Stengl and Schneider, 2024).

      (9.1) Because Orco conductance is modulated by cyclic nucleotides, it remains highly plausible that circadian regulation occurs upstream at the level of signaling pathways (e.g., calcium, calcium-binding proteins, GPCRs, cyclases, phosphodiesterases).

      We agree with the referees that it is very likely that there are multiple layers of interconnected feedback cycles that control Orco localization and activity. Our novel hypothesis suggests interlocked TTFL and PTFL control of physiological circadian rhythms, not strictly hierarchical TTFL control, which would require a daily turnover of membrane proteins and transcriptional control via the established TTFL clock in insect ORNs. We are currently searching for TTFL control at all levels of odor/pheromone transduction using ZT-dependent transcriptomics in combination with qPCR and single-nucleus transcriptomics, covering all the molecules suggested by the referees. These studies are ongoing, are very time-consuming and costly, and are beyond the scope of this manuscript.

      (9.2) The possibility that circadian oscillations of cyclic nucleotides are generated by the canonical TTFL mechanism has not been excluded. In fact, extensive work in Drosophila has demonstrated that the TTFL-based molecular clock proteins are required for circadian rhythms in olfaction.

      Our experiments that test circadian TTFL control at different levels of the cAMP transduction cascade in hawkmoth antennae are under way and will be part of another publication. We will revise our discussion accordingly.

      The published experiments on TTFL-dependent control of Drosophila olfaction that we are aware of (Krishnan et al., 1999; Tanoue et al., 2004) do not exclude interlinked PTFL and TTFL clocks. Krishnan et al. (1999) demonstrate that the TTFL clock in antennal olfactory receptor neurons correlates with circadian rhythms in odor responses measured in electroantennogram (EAG) recordings, not in single sensillum recordings as in our experiments. EAG recordings comprise not only voltage responses of the olfactory sensory neurons but also voltage changes generated in non-neuronal antennal cells such as trichogen and tormogen cells, which build the transepithelial potential gradient via V-ATPases that generate the high K<sup>+</sup> concentration in the sensillum lymph (Jain et al., 2024; Klein, 1992; Thurm and Küppers, 1980). In addition, EAG recordings most likely contain responses of neurons originating from somata in the brain that maintain central control of the antennae. Thus, EAG recordings are difficult to interpret.

      (11) A defining feature of circadian oscillators is the feedback mechanism that generates a time delay (e.g., PERIOD/TIMELESS repressing their own transcription). While the authors describe how cyclic nucleotides can regulate Orco conductance, they do not provide a convincing explanation of how Orco activity could, in turn, feed back into the proposed PTFL to sustain oscillations. For these reasons, the authors should consider:

      a) Providing a broader discussion of non-TTFL models of circadian rhythms (e.g., redox cycles, post-translational modifications).

      We will revise the discussion accordingly.

      b) Reassessing Orco expression using a higher-resolution temporal sampling (≥6 timepoints per 24 h).

      We will add those experiments to the revised version of the manuscript (see our response to (2)).

      c) Clarifying or revising the PTFL model to explicitly address how feedback would be achieved. Alternatively, the data may be more consistent with Orco conductance rhythms being regulated by post-translational mechanisms downstream of the canonical TTFL oscillator, as suggested by the Drosophila olfactory system literature.

      We will revise the manuscript accordingly.

      Minor weaknesses:

      (1) The authors should compare the firing patterns of ORN neurons to the bursts, clusters, and packets of retinal efferent spikes reported in Liu JS and Passaglia CL (2011; JBR). By comparing measures in moths to measures in Limulus, the authors might be able to address the question: Is the daily firing pattern of ORN neurons likely a conserved feature of circadian control of sensory sensitivity?

      We will revise the discussion accordingly.

      (2) The methods need further details. For example, it is unclear if or how single neuron activity was discriminated and whether the results were compromised by the relatively large environmental fluctuations in temperature (21-27 °C), humidity (35-60%), or other cues known to modulate spontaneous firing.

      We will clarify the Methods section.

      References

      Chen S, Luetje CW. 2012. Identification of New Agonists and Antagonists of the Insect Odorant Receptor Co-Receptor Subunit. PLOS ONE 7:e36784. doi:10.1371/journal.pone.0036784

      Dolzer J, Fischer K, Stengl M. 2003. Adaptation in pheromone-sensitive trichoid sensilla of the hawkmoth Manduca sexta. J Exp Biol 206:1575–1588. doi:10.1242/jeb.00302

      Gawalek P, Stengl M. 2018. The Diacylglycerol Analogs OAG and DOG Differentially Affect Primary Events of Pheromone Transduction in the Hawkmoth Manduca sexta in a Zeitgebertime-Dependent Manner Apparently Targeting TRP Channels. Front Cell Neurosci 12:218. doi:10.3389/fncel.2018.00218

      Getahun MN, Olsson SB, Lavista-Llanos S, Hansson BS, Wicher D. 2013. Insect Odorant Response Sensitivity Is Tuned by Metabotropically Autoregulated Olfactory Receptors. PLOS ONE 8:e58889. doi:10.1371/journal.pone.0058889

      Ghosh S, Suray C, Bozzolan F, Palazzo A, Monsempès C, Lecouvreur F, Chatterjee A. 2024. Pheromone-mediated command from the female to male clock induces and synchronizes circadian rhythms of the moth Spodoptera littoralis. Curr Biol 34:1414-1425.e5. doi:10.1016/j.cub.2024.02.042

      Jain K, Prelic S, Hansson BS, Wicher D. 2024. Expression of Drosophila melanogaster V-ATPases in Olfactory Sensillum Support Cells. Insects 15:1016. doi:10.3390/insects15121016

      Jones PL, Pask GM, Rinker DC, Zwiebel LJ. 2011. Functional agonism of insect odorant receptor ion channels. Proc Natl Acad Sci 108:8821–8825. doi:10.1073/pnas.1102425108

      Klein U. 1992. The insect V-ATPase, a plasma membrane proton pump energizing secondary active transport: immunological evidence for the occurrence of a V-ATPase in insect ion-transporting epithelia. J Exp Biol 172:345–354. doi:10.1242/jeb.172.1.345

      Krishnan B, Dryer SE, Hardin PE. 1999. Circadian rhythms in olfactory responses of Drosophila melanogaster. Nature 400:375–378. doi:10.1038/22566

      Merlin C, Lucas P, Rochat D, François M-C, Maïbèche-Coisne M, Jacquin-Joly E. 2007. An Antennal Circadian Clock and Circadian Rhythms in Peripheral Pheromone Reception in the Moth Spodoptera littoralis. J Biol Rhythms 22:502–514. doi:10.1177/0748730407307737

      Nolte A, Funk NW, Mukunda L, Gawalek P, Werckenthin A, Hansson BS, Wicher D, Stengl M. 2013. In situ Tip-Recordings Found No Evidence for an Orco-Based Ionotropic Mechanism of Pheromone-Transduction in Manduca sexta. PLOS ONE 8:e62648. doi:10.1371/journal.pone.0062648

      Nolte A, Gawalek P, Koerte S, Wei H, Schumann R, Werckenthin A, Krieger J, Stengl M. 2016. No Evidence for Ionotropic Pheromone Transduction in the Hawkmoth Manduca sexta. PLOS ONE 11:e0166060. doi:10.1371/journal.pone.0166060

      Rymer J, Bauernfeind AL, Brown S, Page TL. 2007. Circadian rhythms in the mating behavior of the cockroach, Leucophaea maderae. J Biol Rhythms 22:43–57. doi:10.1177/0748730406295462

      Schendzielorz J, Schendzielorz T, Arendt A, Stengl M. 2014. Bimodal Oscillations of Cyclic Nucleotide Concentrations in the Circadian System of the Madeira Cockroach Rhyparobia maderae. J Biol Rhythms 29:318–331. doi:10.1177/0748730414546133

      Schendzielorz T, Peters W, Boekhoff I, Stengl M. 2012. Time of Day Changes in Cyclic Nucleotides Are Modified via Octopamine and Pheromone in Antennae of the Madeira Cockroach. J Biol Rhythms 27:388–397. doi:10.1177/0748730412456265

      Schendzielorz T, Schirmer K, Stolte P, Stengl M. 2015. Octopamine Regulates Antennal Sensory Neurons via Daytime-Dependent Changes in cAMP and IP3 Levels in the Hawkmoth Manduca sexta. PLOS ONE 10:e0121230. doi:10.1371/journal.pone.0121230

      Schneider AC, Schröder K, Chang Y, Nolte A, Gawalek P, Stengl M. 2025. Hawkmoth Pheromone Transduction Involves G-Protein–Dependent Phospholipase Cβ Signaling. eNeuro 12:ENEURO.0376-24.2024. doi:10.1523/ENEURO.0376-24.2024

      Stengl M. 2010. Pheromone Transduction in Moths. Front Cell Neurosci 4:133. doi:10.3389/fncel.2010.00133

      Stengl M. 1994. Inositol-trisphosphate-dependent calcium currents precede cation currents in insect olfactory receptor neurons in vitro. J Comp Physiol A 174:187–194. doi:10.1007/BF00193785

      Stengl M, Funk NW. 2013. The role of the coreceptor Orco in insect olfactory transduction. J Comp Physiol A 199:897–909. doi:10.1007/s00359-013-0837-3

      Stengl M, Hildebrand JG. 1990. Insect olfactory neurons in vitro: morphological and immunocytochemical characterization of male-specific antennal receptor cells from developing antennae of male Manduca sexta. J Neurosci 10:837–847. doi:10.1523/JNEUROSCI.10-03-00837.1990

      Stengl M, Schneider AC. 2024. Contribution of membrane-associated oscillators to biological timing at different timescales. Front Physiol 14:1243455. doi:10.3389/fphys.2023.1243455

      Tanoue S, Krishnan P, Krishnan B, Dryer SE, Hardin PE. 2004. Circadian Clocks in Antennal Neurons Are Necessary and Sufficient for Olfaction Rhythms in Drosophila. Curr Biol 14:638–649. doi:10.1016/j.cub.2004.04.009

      Thurm U, Küppers J. 1980. Epithelial physiology of insect sensilla. In: Locke M, Smith DS, editors. Insect Biology in the Future. Academic Press. pp. 735–763. doi:10.1016/B978-0-12-454340-9.50039-2

      Wicher D, Miazzi F. 2021. Functional properties of insect olfactory receptors: ionotropic receptors and odorant receptors. Cell Tissue Res 383:7–19. doi:10.1007/s00441-020-03363-x

    1. Author response:

      Reviewer #1 (Public review):

      Summary: 

      In this study, the authors employed comprehensive proteomics and transcriptomics analysis to investigate the systemic and organ-specific adaptations to IF in males. They found that shared biological signaling processes were identified across tissues, suggesting unifying mechanisms linking metabolic changes to cellular communication, which revealed both conserved and tissue-specific responses by which IF may optimize energy utilization, enhance metabolic flexibility, and promote cellular resilience. 

      Strengths: 

      This study examined multiple organs, including the liver, brain, and muscle, and revealed both conserved and tissue-specific responses to IF.

      We appreciate the recognition of the study’s strengths and the opportunity to clarify the points raised.

      Weaknesses: 

      (1) Why did the authors choose the liver, brain, and muscle, but not other organs such as the heart and kidney? The latter are proven to be the largest consumers of ketones, which is also changed in the IF treatment of this study.

      We agree that the heart and kidney are critical organs in ketone metabolism. Our selection of the liver, brain, and muscle was guided by their distinct metabolic functions and relevance to systemic energy balance, neuroplasticity, and locomotor activity, key domains influenced by intermittent fasting (IF). These tissues also offer complementary perspectives on central and peripheral adaptations to IF. Notably, we have previously examined the effects of IF on the heart (eLife 12:RP89214), and we fully acknowledge the importance of the kidney. We intend to include it in future studies to broaden the scope and deepen our understanding of IF-induced systemic responses.

      (2) The proteomics and transcriptomics analyses were only performed at 4 months. However, a strong correlation between IF and the molecular adaptations should be time point-dependent.

      We appreciate this insightful comment. The 4-month time point was selected to capture long-term adaptations to IF, beyond acute or transitional effects. While we acknowledge that molecular responses to IF are time-dependent, our goal in this study was to establish a foundational understanding of sustained systemic and tissue-specific changes. We fully agree that a longitudinal approach would provide deeper insights into the temporal dynamics of IF-induced adaptations. To address this, we are currently undertaking a comprehensive 2-year study that is specifically designed to explore these time-dependent effects in greater detail.

      (3) The context lacks a "discussion" section, which would detail the significance and weaknesses of the study.

      We appreciate this observation. The manuscript was originally structured to emphasize results and interpretation within each section, but we recognize that a dedicated discussion section would enhance clarity and contextual depth. In the revised version, we will add a comprehensive discussion section addressing broader implications, limitations, and future directions of the study.

      (4) There is no confirmation for the proteomic and transcriptomic profiling. For example, the important changes in proteomics could be further identified by a Western blot. 

      We acknowledge the importance of orthogonal validation to support high-throughput findings. While our study primarily focused on uncovering systemic patterns through proteomic and transcriptomic profiling, we agree that targeted confirmation would strengthen the conclusions. To this end, we have included immunohistochemical validation of a key protein common to all three organs—Serpin A1C. Additionally, we are planning a dedicated follow-up study to expand functional validation of several key proteins identified in this manuscript, which will be pursued as a separate project.

      Reviewer #2 (Public review): 

      Summary: 

      Fan and colleagues measure proteomics and transcriptomics in 3 organs (liver, skeletal muscle, cerebral cortex) from male C57BL/6 mice to investigate whether intermittent fasting (IF; 16h daily fasting over 4 months) produces systemic and organ-specific adaptations. 

      They find shared signaling pathways, certain metabolic changes, and organ-specific responses that suggest IF might affect energy utilization and metabolic flexibility while promoting resilience at the cellular level.

      Strengths: 

      The fact that there are 3 organs and 2 -omics approaches is a strength of this study. 

      We appreciate the reviewer’s recognition of the breadth of our study design. By integrating proteomics and transcriptomics across three metabolically distinct organs, we aimed to provide a comprehensive view of systemic and tissue-specific adaptations to IF. This multi-organ, multi-omics approach was central to uncovering both conserved and divergent biological responses.

      Weaknesses: 

      (1) The analytical approach of the data generated by the present study is not well posed, because it doesn't help to answer key questions implicit in the experimental design. Consequently, the paper, as it is for now, reads as a mere description of results and not a response to specific questions.

      We thank the reviewer for this important observation. Our initial aim was to establish a foundational atlas of molecular changes induced by IF across key organs. However, we recognize that clearer framing of the biological questions would enhance interpretability. In the revised manuscript, we will restructure the introduction, results, and discussion to align more explicitly with specific hypotheses, particularly those related to energy metabolism, cellular resilience, and inter-organ signaling. We have also added targeted analyses and clarified how each dataset contributes to answering these questions.

      (2) The presentation of the figures, the knowledge of the literature, and the inclusion of only one sex (male) are all weaknesses.

      We appreciate this feedback and agree that these are important considerations. Regarding figure presentation, we will revise several figures for improved clarity, add more descriptive legends, and reorganize supplemental materials to better support the main findings. On the literature front, we will expand the discussion to include recent and relevant studies on IF, metabolic adaptation, and sex-specific responses. As for the use of only male mice, this was a deliberate choice to reduce hormonal variability and focus on establishing baseline molecular responses. We fully acknowledge the importance of sex as a biological variable and will soon be conducting studies in female mice to address this gap.

      Reviewer #3 (Public review):

      Summary: 

      Fan et al utilize large omics data sets to give an overview of proteomic and gene expression changes after 4 months of intermittent fasting (IF) in liver, muscle, and brain tissue. They describe common and distinct pathways altered under IF across tissues using different analysis approaches. The main conclusions presented are the variability in responses across tissues with IF. Some common pathways were observed, but there were notable distinctions between tissues.

      Strengths: 

      (1) The IF study was well conducted and ran out to 4 months, which was a nice long-term design. 

      (2) The multiomics approach was solid, and additional integrative analysis was complementary to illustrate the differential pathways and interactions across tissues. 

      (3) The authors did not overstep their conclusions and imply an overreached mechanism. 

      We sincerely thank the reviewer for acknowledging the strengths of our study design and analytical approach. We aimed to strike a careful balance between comprehensive data generation and cautious interpretation, and we appreciate the recognition that our conclusions were appropriately framed within the scope of the data.

      Weaknesses: 

      The weaknesses, which are minor, include the use of only male mice and the early start (6 weeks) of the IF treatment. See specifics in the recommendations section.

      We appreciate the reviewer’s thoughtful comments. The decision to use male mice and initiate IF at 6 weeks was based on minimizing hormonal variability and capturing early adult metabolic programming. We acknowledge that sex and developmental timing are important biological variables. To address this, we are conducting parallel studies in female mice and evaluating IF initiated at later life stages. These follow-up investigations will help determine the extent to which sex and timing influence the molecular and physiological outcomes of IF.

    1. Author response:

      We thank the editors and reviewers for their positive and constructive comments. The three most substantial points raised by the public review are the following:

      No explicit modelling of targeting of young men as a course to ending HIV. 

      We did not intend to imply that the epidemic could be ended by this alone, or even that targeting young men was the optimum strategy if resources were available for more general preventative interventions. The “last mile” for HIV will be a very complex scenario in which key populations will start to play an outsize role, and our modelling framework was not developed to consider it. As a result, we would not have confidence in modelling the decline of the viral population to zero. We shall be qualifying the existing language in the paper in order to make this clear.

      Subtype-specific disease progression data. 

      The criticism is that our modelling of disease progression was based on subtype B, while the HIV viral population in Zambia is overwhelmingly subtype C. Sensitivity to subtype has not been looked at in detail in this analysis as the literature suggests that the rate of CD4 decline does not differ between subtypes B and C.

      While some studies have shown differences in CD4 cell decline between subtypes, they have generally highlighted that subtype D progresses faster than other subtypes. Little evidence has been published on the differences between subtype B and C, and studies that do include both subtypes concluded that there was no significant difference in rates of CD4 decline between subtypes.

      The following publications report no significant difference in the rate of CD4 decline by subtype:
      - Klein et al. (2014) (N=9772)
      - Bouman et al. (2023) (although no subtype B)
      - Easterbrook et al. (2010) (N=861)

      While some studies have illustrated that "progression changes with HIV subtype", an interrogation of the underlying data highlights that subtype B is not included, e.g.
      - Kanki et al. (1999) looked at A versus "non-A subtype" but included no subtype B data.
      - Vasan et al. (2006) claim differences in rate of CD4 decline by subtype when compared to subtype D but include no subtype B data.
      - Baeten et al. (2007) claim subtype D has faster progression than subtype A but include no subtype B data.
      - Kiwanuka et al. (2008) claim differences in rate of CD4 decline but include no subtype B data.
      - Amornkul et al. (2013) has no subtype B data.

      Furthermore, to explain why we used subtype B data to parameterise the model: statistical analyses of CD4 count progression usually do not report parameters in a form that can be directly imported into models. Analysing summary statistics to include in models results in under-specified models of disease progression in simulations. For this reason we use the estimates from Cori et al. (2015), where the statistical analysis was specifically tailored to generate modelling parameters. The trade-off is therefore to use subtype C data with model misspecification, or subtype B data without; neither choice is perfect, and we chose the correctly specified subtype B estimates.

      The role of undiagnosed versus diagnosed and untreated subpopulations. 

      We will add an analysis to compare age differences in sources and recipients according to the diagnostic status of the source.

      The rest of the comments in the public review ask for improvements in data presentation (including some additional statistical analyses) and to make sure qualitative claims are fully justified. We are happy to oblige with these, and will make our thinking clear on all points in the full response.

    1. Author response:

      We thank all three reviewers for their positive comments and valuable suggestions for improving the manuscript. A detailed blood stage analysis of LSA3-deficient parasites was conducted with, and led by, collaborators at Ehime University in a separate study that is currently in revision at another journal and will be published separately. We intend to cite the complementary publication once it is accepted for publication and to revise the wording in the current manuscript in accordance with suggested feedback. These changes will be reflected in the revised manuscript to be submitted as the eLife Version of Record.

    1. Author response:

      We thank both reviewers for their valuable comments. We have prepared a point-by-point response below.

      Reviewer #1 (Public review):

      Weaknesses:

      (1) The conclusions regarding the links between neural and behavioral mechanisms are mostly well supported by the data. However, what is less convincing is the authors' argument that their study offers evidence of 'priming'. An important hallmark of priming, at least as is commonly understood by cognitive scientists, is that it is stimulus specific: i.e., a repeated stimulus facilitates response times (repetition priming), or a repeated but previously ignored stimulus increases response times (negative priming). That is, it is an effect on a subsequent repeated stimulus, not ANY subsequent stimulus. Because (prime or target) stimuli are not repeated in the current experiments, the conditions necessary for demonstrating priming effects are not present. Instead, a different phenomenon seems to be demonstrated here, and one that might be more akin to approach/avoidance behavior to a novel or salient stimulus following an appetitive/aversive stimulus, respectively.

      (2) On a similar note, the authors' claim that 'priming' per se has not been well studied in non-human animals is not quite correct and would need to be revised. Priming effects have been demonstrated in several animal types, although perhaps not always described as such. For example, the neural underpinnings of priming effects on behavior have been very well characterized in human and non-human primates, in studies more commonly described as investigations of 'response suppression'.

      We thank the reviewer for these critical comments. After careful consideration of both reviews, we agree that “priming” may not be the most accurate term to describe the behavioral phenomenon. We plan to revise our terminology throughout the manuscript accordingly to better capture the generalized nature of the effect we observe.

      (3) The outcome measure - i.e., difference scores between the two odors or odor and non-odor (i.e., the number of flies choosing to approach the novel odor versus the number approaching the non-odor (air)) - appears to be reasonable to account for a natural preference for odors in the mock-trained group. However, it does not provide sufficient clarification of the results. The findings would be more convincing if these relative scores were unpacked - that is, instead of analyzing difference scores, the results of the interaction between group and odor preference (e.g., novel or air) (or even within the pre- and post-training conditions with the same animals) would provide greater clarity. This more detailed account may also better support the argument that the results are not due to conditioning of the US with pure air.

      We use the PI score as a standard metric to quantify odor preference in all behavioral assays because it allows for robust comparison across different genetic or treatment groups under the same experimental setting. In the T-maze, real-time tracking of fly trajectories is technically difficult. With olfactory arenas, we showed some examples of fly distribution in quadrants over the entire odor choice test period (Figure 2—figure supplement 2) for both pre-trained and post-trained groups and discussed the trajectories in the Discussion. We will ensure this point is clarified in the revised text.

      Reviewer #2 (Public review):

      […] They finally recorded from different mushroom body output neurons, including the one (MBON-γ4γ5) likely affected by the increased activity of the corresponding γ4 reward dopaminergic neurons after shock preexposure. They recorded odour-evoked responses from these neurons before and after shock preexposure, but did not find any plasticity, while they found a logical effect during spaced cycles of aversive training.

      We thank the reviewer for the summary. We would like to clarify that we did, in fact, observe plasticity in MBON-γ4γ5 following shock exposure, as shown in Figure 4B.

      Overall, the study is very interesting with a substantial amount of behavioural analysis and in vivo 2-photon calcium imaging data, but some major (and some minor) issues have to be resolved to strengthen their conclusions.

      (1) According to neuropsychological work (Henson, Encyclopedia of Neuroscience (2009), vol. 7, pp. 1055-1063), "Priming refers to a change in behavioral response to a stimulus, following prior exposure to the same, or a related, stimulus. Examples include faster reaction times to make a decision about the stimulus, a bias to produce that stimulus when generating responses, or the more accurate identification of a degraded version of the stimulus". Or "Repetition priming refers to a change in behavioural response to a stimulus following re-exposure" (PMID: 18328508). I therefore do not think that the effects observed by the authors are really the investigation of the neural mechanisms of priming. To me, the effect they observed seems more related to sensitisation, especially for the activation of sweet-sensing neurons. For the shock effect, it could be a safety phenomenon, as in Jacob and Waddell, 2020, involving (as for sugar reward) different subsets for short-term and long-term safety.

      As noted in our response to Reviewer #1, we plan to revise our use of the term “priming” in the manuscript to more accurately interpret the behavioral phenomenon.

      (2) The author missed the paper from Thomas Preat, The Journal of Neuroscience, October 15, 1998, 18(20):8534-8538 (Decreased Odor Avoidance after Electric Shock in Drosophila Mutants Biases Learning and Memory Tests). In this paper, one of the effects observed by the authors has already been described, and the molecular requirement of memory-related genes is investigated. This paper should be mentioned and discussed.

      We thank the reviewer for bringing this important reference to our attention. We will cite the Preat (1998) paper and discuss its relevant findings in relation to our own in the revised manuscript.

      (3) Overall, the bidirectional effect they observed is interesting; however, their results are not always clear, and the use of a delta PI is sometimes misleading. The authors have mentioned that shocks induced attraction to the novel odour, while they should stick to the increase or decrease in preference/avoidance.

      The ΔPI is calculated either as (trained PI – mock PI) for different animals or as (post PI – pre PI) for the same animals, with the specific calculation clarified in each figure legend. A positive ΔPI signifies an increase in preference for the odor, which is equivalent to a relative attraction or a decrease in avoidance.
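
      For illustration only, the ΔPI arithmetic described above can be sketched in Python. The PI formula used here, (flies in odor - flies in air) / total flies, is an assumed conventional definition that the response does not spell out, and the function names and fly counts are hypothetical rather than taken from the manuscript.

```python
def preference_index(n_odor: int, n_air: int) -> float:
    """Assumed conventional PI: (n_odor - n_air) / (n_odor + n_air)."""
    total = n_odor + n_air
    if total == 0:
        raise ValueError("no flies were counted")
    return (n_odor - n_air) / total


def delta_pi(test_pi: float, reference_pi: float) -> float:
    """Delta PI = trained PI - mock PI (different animals)
    or post PI - pre PI (same animals)."""
    return test_pi - reference_pi


# Hypothetical counts for one group tested before and after shock exposure.
pre_pi = preference_index(n_odor=30, n_air=70)    # flies mostly avoid the odor
post_pi = preference_index(n_odor=45, n_air=55)   # avoidance is weaker
change = delta_pi(post_pi, pre_pi)

# A positive change means increased preference; here both PIs stay negative,
# so the odor is still avoided, only less strongly.
print(f"pre PI = {pre_pi:.2f}, post PI = {post_pi:.2f}, delta PI = {change:+.2f}")
```

      With these made-up counts the ΔPI is positive even though both PIs remain negative, which corresponds to a decrease in avoidance rather than a switch to attraction.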

      As not all experiments are done in parallel logic, it is not always easy to understand which protocol the authors are using. For example, only optogenetics is used in the appetitive preexposure. Does exposing flies to sugar or activating reward dopaminergic neurons also increase odour avoidance? The observed increased odour avoidance after optogenetic activation of sweet-sensing neurons involve reward (e.g., decreased response) and/or punishment (e.g., increased response) to increase odour avoidance?  

      We used different behavioral assays (T-maze or arena), stimuli (real shock or optogenetics), and protocols (different or same animal groups) to robustly demonstrate the phenomenon across platforms. We explained each protocol in the figures or text, and we will make them easier to follow in the revised version. We focused on activating a clean set of sugar-sensing neurons because this optogenetic stimulus is an effective and efficient substitute for real sugar. We agree that testing reward dopaminergic neuron activation is a logical extension and will consider adding these experiments in the revised work.

      The author should always statistically test the fly behavioural performances against 0 to have an idea of random choice or a clear preference toward an odour.

      Our primary focus is on the change in preference induced by training, rather than the innate odor preference itself, which can be highly variable due to physiological and environmental factors. Statistical testing against 0 for innate preference scores is not standard practice in this specific paradigm, as the critical question is whether a treatment alters behavior relative to a control.

      On the appetitive side, the internal hunger state would play an important role. The author should test it or at least discuss it.

      For appetitive experiments, we always starve the flies on 1% agar for two days prior to behavioral tests to standardize their hunger state. We will consider adding fed flies as control groups in the revised work.

      (4) The authors found a discrepancy between genetic backgrounds; sometimes the same odour can be attractive or aversive.

      We observed minor discrepancies in innate odor preferences across genetic backgrounds, which is a known and common occurrence. Different genotypes and temperatures can result in different baseline PI scores. However, the key finding is that the relative change in odor preference following an aversive stimulus is consistent: it increases the relative preference for an odor compared to air. This sometimes reverses valence (aversion to attraction) and other times simply reduces aversion. Our analysis focuses on this consistent, relative change.

      Different effects between the T-maze and the olfactory arena are found. The authors proposed that: "Punishment priming effect was still not detected, probably due to the insensitivity of the optogenetic arena". This is unclear to me, considering all prior work using this arena. The author should discuss it more clearly.

      The punishment effect with CS+ present was reliably detected in the T-maze (Figure 1A) but was not significant in the olfactory arena (Figure 2—figure supplement 1B-C). We hypothesize that the olfactory arena assay is less sensitive than the T-maze for detecting such subtle behavioral changes. This is evidenced by the fact that even classical odor-shock conditioning yields lower PI in the arena (typically ~0.4) than in the T-maze (~0.8), likely due to the greater distance flies must explore and travel. The higher variance in the arena may therefore mask more modest effects. Here the effect under investigation was induced by optogenetically activating only a small subset of aversive dopaminergic neurons, a stimulus that is likely weaker than full electric shock. This reduced stimulus strength may have contributed to the challenge of detecting a significant effect in the less sensitive arena paradigm.

      They mentioned that flies could not be conditioned with air and electric shock. However, flies could be conditioned with the context + shock, which is changing in the T-maze and not in the optogenetic arena.

      While flies can be conditioned to context, during the optogenetic stimulation period in the arena, the light is delivered uniformly across all four quadrants. Therefore, any potential context conditioning would be equivalent across the entire chamber and should not bias the final distribution of flies between the odor and air quadrants during the test, nor affect the calculated PI score.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Cai et al have investigated the role of msiCAT-tailed mitochondrial proteins that frequently exist in glioblastoma stem cells. Overexpression of msiCAT-tailed mitochondrial ATP synthase F1 subunit alpha (ATP5) protein increases the mitochondrial membrane potential and blocks mitochondrial permeability transition pore formation/opening. These changes in mitochondrial properties provide resistance to staurosporine (STS)-induced apoptosis in GBM cells. Therefore, msiCAT-tailing can promote cell survival and migration, while genetic and pharmacological inhibition of msiCAT-tailing can prevent the overgrowth of GBM cells.

      Strengths:

      The CAT-tailing concept has not been explored in cancer settings. Therefore, the present study provides new insights for widening the therapeutic avenue.

      Your acknowledgment of our study's pioneering elements is greatly appreciated.

      Weaknesses:

      Although the paper does have strengths in principle, the weaknesses of the paper are that these strengths are not directly demonstrated. The conclusions of this paper are mostly well-supported by data, but some aspects of image acquisition and data analysis need to be clarified and extended.

      We are grateful for your acknowledgment of our study’s innovative approach and its possible influence on cancer therapy. We sincerely appreciate your valuable feedback. In response, this updated manuscript presents substantial new findings that reinforce our central argument. Moreover, we have broadened our data analysis and interpretation, as well as refined our methodological descriptions.

      Reviewer #2 (Public Review):

      This work explores the connection between glioblastoma, mito-RQC, and msiCAT-tailing. They build upon previous work concluding that ATP5alpha is CAT-tailed and explore how CAT-tailing may affect cell physiology and sensitivity to chemotherapy. The authors conclude that when ATP5alpha is CAT-tailed, it either incorporates into the proton pump or aggregates and that these events dysregulate MPTP opening and mitochondrial membrane potential and that this regulates drug sensitivity. This work includes several intriguing and novel observations connecting cell physiology, RQC, and drug sensitivity. This is also the first time this reviewer has seen an investigation of how a CAT tail may specifically affect the function of a protein. However, some of the conclusions in this work are not well supported. This significantly weakens the work but can be addressed through further experiments or by weakening the text.

      We appreciate the recognition of our study's novelty. To address your concerns about our conclusions, we have revised the manuscript. This revision includes new data and corrections of identified issues. Our detailed responses to your specific points are outlined below.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) In Figure 1B, please replace the high-exposure blots of ATP5 and COX with representative results. The current results are difficult to interpret clearly. Additionally, it would be helpful if the author could explain the nature of the two different bands in NEMF and ANKZF1. Did the authors also examine other RQC factors and mitochondrial ETC proteins? I'm also curious to understand why CAT-tailing is specific to C-I30, ATP5, and COX-V, and why the authors did not show the significance of COX-V.

      We appreciate your inquiry regarding the data. Additional attempts were made using new patient-derived samples; however, these results did not improve upon the existing ATP5α, NDUS3 (C-I30), and COX4 signals presented in the figure. This is possibly because CAT-tail-modified mitochondrial proteins represent only a small fraction of the total proteins in these cells. It is acknowledged that the small tails visible above the prominent main bands are not particularly distinct. To address this, the revised version includes updated images to better illustrate the differences. We believe the assertion that GBM/GSCs possess CAT-tailed proteins is substantiated by a combination of subsequent experimental findings. The figure (refer to new Fig. 1B) serves primarily as an introduction. It is important to note that the CAT-tailed ATP5α plays a vital role in modulating mitochondrial potential and glioma phenotypes, a function which has been demonstrated through subsequent experiments.

      It is acknowledged that the CAT-tail modification is not exclusive to the ATP5α protein. ATP5α was selected as the primary focus of this study due to its prevalence in mitochondria and its specific involvement in cancer development, as noted by Chang YW et al. Future research will explore the possibility of CAT tails on other mitochondrial ETC proteins. Currently, NDUS3 (C-I30), ATP5α, and COX4 serve as examples confirming the existence of these modifications. It remains challenging to detect endogenous CAT-tailing, and bulk proteomics is not yet feasible for this purpose. COX4 is considered significant. We hypothesize that CAT-tailed COX4 may function similarly to the previously studied C-I30 (Wu Z, et al), potentially causing substantial mitochondrial proteostasis stress.

      Concerning RQC proteins, our blotting analysis of GBM cell lines now includes additional RQC-related factors. The primary, more prominent bands (indicated by arrowheads) are, in our assessment, the intended bands for NEMF and ANKZF1.  Subsequent blotting analyses showed only single bands for both ANKZF1 and NEMF, respectively. The additional, larger molecular weight band of NEMF, which was initially considered for property analysis (phosphorylation, ubiquitination, etc.), was not examined further as it did not appear in subsequent experiments (refer to new Fig. S1C).

      References:

      Chang YW, et al. Spatial and temporal dynamics of ATP synthase from mitochondria toward the cell surface. Communications biology. 2023;6(1).

      Wu Z, et al. MISTERMINATE Mechanistically Links Mitochondrial Dysfunction With Proteostasis Failure. Molecular cell. 2019;75(4).

      (2) In addition to Figure 1B, it would be interesting to explore CAT-tailed mETC proteins in cancer tissue samples.

      This is an excellent point, and we appreciate the question. We conducted staining for ATP5α and key RQC proteins in both tumor and normal mouse tissues. Notably, ATP5α in GBM exhibited a greater tendency to form clustered punctate patterns compared to normal brain tissue, and not all of it co-localized with the mitochondrial marker TOM20 (refer to new Fig. S3C-E). Crucially, we observed a significant increase in NEMF expression within mouse xenograft tumor tissues, alongside a decrease in ANKZF1 expression (refer to new Fig. S1A, B). These findings align with our observations in human samples.

      (3) Please knock down ATP5 in the patient's cells and check whether both the upper band and lower band of ATP5 have disappeared or not.

      This control was essential and has been executed now. To validate the antibody's specificity, siRNA knockdown was performed. The simultaneous elimination of both upper and lower bands upon siRNA treatment (refer to new Fig. S2A) confirms they represent genuine signals recognized by the antibody.

      (4) In Figure 1C and 1D, add long exposure to spot aggregation and oligomer. Figure 1D, please add the blots where control and ATP5 are also shown in NHA and SF (similar to SVG and GSC827).

      New data are included in the revised manuscript to address the queries. Specifically, the new Fig. 1D now displays the full set of blots as requested, featuring Control, ATP5α, AT3, and AT20. Our analysis reveals that AT20 aggregates exhibit higher expression and accumulation rates in GSC and SF cells.

      Fig. 1C has been updated to include experimental groups treated with cycloheximide and sgNEMF. Our results show that sgNEMF effectively inhibits CAT-tailing in GBM cell lines, whereas cycloheximide has no impact. After consulting with the Reporter's original creator and optimizing expression conditions, we observed no significant aggregates with β-globin-non-stop protein, potentially due to the length of endogenous CAT-tail formation (as noted by Inada, 2020, in Cell Reports). Our analysis focused on the ratio of CAT-tailed (red box blots) and non-CAT-tailed proteins (green box blots). Comparing these ratios revealed that both anisomycin treatment and sgNEMF effectively hinder the CAT-tailing process, while cycloheximide has no effect.

      (5) In Figure 1E, please double-check the results with the figure legend. ATP5A aggregated should be shown endogenously. The number of aggregates shown in the bar graph is not represented in micrographs. Please replace the images. For Figure 1E, to confirm the ATP5-specific aggregates, it would be better if the authors would show endogenous immunostaining of C-I30 and Cox-IV.

      Labels in Fig. 1E were corrected to reflect that the bar graph in Fig. 1F indicates the number of cells with aggregates, not the quantity of aggregates per cell. The presence of endogenous ATP5α is accurately shown. To address the specificity of ATP5α, immunostaining for endogenous NDUS3 was conducted. This revealed NDUS3 aggregation in GBM cells (SF and GSC) lacking TOM20, as demonstrated in the new Fig. S3A, B. These findings suggest NDUS3 also undergoes CAT-tailing modification, similar to ATP5α.

      (6) Figure 3A. Please add representative images in the anisomycin sections. It is difficult to address the difference.

      We appreciate your feedback. Upon re-examining the Calcein fluorescence intensity data in Fig. 3A, we believe the images accurately represent the statistical variations presented in Fig. 3B. To address your concerns more effectively, please specify which signals in Fig. 3A you find potentially misleading. We are prepared to revise or substitute those images accordingly.

      (7) Figure 3D. If NEMF is overexpressed, is the CAT-tailing of ATP 5 reversed?

      Thank you. Your prediction aligns with our findings. We have added data to the revised Fig. S6A, B, which demonstrate that both NEMF overexpression and ANKZF1 knockdown lead to elevated levels of CRC. This increase, however, was not statistically significant in GSC cells. A plausible explanation for this discrepancy is that the MPTP of GSC cells is already closed, so any additional increase in CAT-tailing activity does not result in further amplification.

      (8) Figure 3G. Why on the BN page are AT20 aggregates not the same as shown in Figure 2E?

      We appreciate your inquiry regarding the ATP5α blots, specifically those in the original Fig. 3G (left) and 2E (right). Careful observation of the ATP5α band placement in these figures reveals a high degree of similarity. Notably, there are aggregates present at the top, and the diffuse signals extend downwards. Given that this is a gradient polyacrylamide native PAGE, the concentration diminishes towards the top. Consequently, the non-rigid nature of the Blue Native PAGE gel may lead to slight variations in the aggregate signals; however, the overall patterns are very much alike. To mitigate potential misinterpretations, we have rearranged the blot order in the new Fig. 3M.

      (9) Figure 4D. The amount of aggregation mediated by AT20 is more compared to AT3. Why are there no such drastic effects observed between AT3 and AT20 in the TUNEL assay?

      The previous Figure 4D presents the quantification of cell migration from the experiment depicted in Figure 4C. But this is a good point. TUNEL staining results are directly influenced by mitochondrial membrane potential and the state of mitochondrial permeability transition pores (MPTP), not by the degree of protein aggregation. Our previous experiments showed comparable effects of AT3 and AT20 on mitochondria (Fig. 2E, 3K), which aligns with the expected similar outcomes on TUNEL staining. As for its biological nature, this could be very complicated. We hope to explore it in future studies.

      (10) Figure 5C: The role of NEMF and ANKZF1 can be further clarified by conducting Annexin-PI assays using FACS. The inclusion of these additional data points will provide more robust evidence for CAT-tailing's role in cancer cells.

      In response to your suggestion, we have incorporated additional data into the revised version.

      Using the Annexin-PI kit, we labeled apoptotic cells and detected them using flow cytometry (FACS). Our findings indicate that anisomycin pretreatment, NEMF knockdown (sgNEMF), and ANKZF1 upregulation (oeANKZF1) significantly increase the rate of STS-induced apoptosis compared to the control group (refer to new Fig. S9D-G).

      (11) Figure 5F: STS is a known apoptosis inhibitor. Why it is not showing PARP cleavage?

      Also, cell death analysis would be more pronounced, if it could be shown at a later time point. What is the STS and Anisomycin at 24h or 48h time-point? Since PARP is cleaved, it would also be better if the authors could include caspase blots.

      I guess what you meant to say here is "Staurosporine is a protein kinase inhibitor that can induce apoptosis in multiple mammalian cell lines." Our study observed PARP cleavage even in GSCs, which are typically more resistant to staurosporine-induced apoptosis (C-PARP in Fig. S9B). The ratio of C-PARP to total PARP increased. We selected a 180-minute treatment duration because longer treatments with STS + anisomycin led to a late stage of apoptosis and non-specific protein degradation (e.g., at 24 or 48 hours), making PARP comparisons less meaningful. Following your suggestion, we also examined caspase 3/7 activity in GSC cells treated with DMSO, CHX, and anisomycin. We found that anisomycin treatment also activated caspases (Fig. S9A).

      (12) In Figure 5, the addition of an explanation, how CAT-tailing can induce cell death, would add more information such as BAX-BCL2 ratio, and cytochrome-c release from the mitochondria.

      Thank you for your suggestion. In this study, we state that specific CAT-tails inhibit GSC cell death/apoptosis rather than inducing it. Therefore, we do not expect that examining BAX-BCL2 and mitochondrial cytochrome c release would offer additional insights.

      (13) To confirm the STS resistance, it would be better if the author could do the experiments in the STS-resistant cell line and then perform the Anisomycin experiments.

      Thank you. We should emphasize that our data primarily originates from GSC cells. These cells already exhibit STS-resistance when compared to the control cells (Fig. S8A-C).

      (14) It would be more advantageous if the author could show ATP5 CATailed status under standard chemotherapy conditions in either cell lines or in vivo conditions.

      This is an interesting question. It's worth exploring this question; however, GSC cells exhibit strong resistance to standard chemotherapy treatments like temozolomide (TMZ).

      Additionally, we couldn't detect changes in CAT-tailed ATP5⍺ and thus did not include that data.

      (15) In vivo (cancer mouse model or cancer fly model) data will add more weight to the story.

      We appreciate your intriguing question. An effective approach would be to test the RQC pathway's function using the Drosophila Notch overexpression-induced brain tumor model. However, Khaket et al. have conducted similar studies, stating, "The RNAi of Clbn, VCP, and Listerin (Ltn), homologs of key components of the yeast RQC machinery, all attenuated NSC over-proliferation induced by Notch OE (Figs. 5A and S5A–D, G)." This data supports our theory, and we have incorporated it into the Discussion. While the mouse model more closely resembles the clinical setting, it is not covered by our current IACUC proposal. We intend to verify this hypothesis in a future study.

      Reference:

      Khaket TP, Rimal S, Wang X, Bhurtel S, Wu YC, Lu B. Ribosome stalling during c-myc translation presents actionable cancer cell vulnerability. PNAS Nexus. 2024 Aug 13;3(8):pgae321.

      Reviewer #2 (Recommendations For The Authors):

      Figure 1B, C: To demonstrate that Globin, ATP5alpha, and C-130 are CAT-tailed, it is necessary to show that the high mobility band disappears after NEMF deletion or mutagenesis of the NFACT domain of NEMF. This can be done in a cell line. The anisomycin experiment is not convincing because the intensity of the bands drops and because no control is done to show that the effects are not due to translation inhibition (e.g. cycloheximide, which inhibits translation but not CAT tailing). Establishing ATP5alpha as a bonafide RQC substrate and CAT-tailed protein is critical to the relevance of the rest of the paper.

      Thank you for suggesting this crucial control experiment.

      To confirm the observed signal is indeed a bona fide CAT-tail, it's essential to demonstrate that NEMF is necessary for the CAT-tailing process. We have incorporated data from NEMF knockdown (sgNEMF) and cycloheximide treatment into the revised manuscript. Our findings show that both sgNEMF and anisomycin treatment effectively inhibit the formation of CAT-tailing signals on the reporter protein (Fig. 1C). Similarly, NEMF knockdown in a GSC cell line also effectively eliminated CAT-tails on overexpressed ATP5⍺ (Fig. S2B).

      In general, the text should be weakened to reflect that conclusions were largely gleaned from artificial CAT tails made of AT repeats rather than endogenously CAT-tailed ATP5alpha. CAT tails could have other sequences or be made of pure alanine, as has been suggested by some studies.

      Thank you for your reminder. We have reviewed the recent studies by Khan et al. and Chang et al., and we found their analysis of CAT tail components to be highly insightful. We concur with your suggestion regarding the design of the CAT tail sequence. We aimed to design a tail that maintained stability and resisted rapid degradation, regardless of its length. In the revised version, we clarify that our conclusions are based on artificial CAT tails, specifically those composed of AT repeat sequences (p. 9). We acknowledge that the presence of other sequence components may lead to different outcomes (p. 19).

      Reference:

      Khan D, Vinayak AA, Sitron CS, Brandman O. Mechanochemical forces regulate the composition and fate of stalled nascent chains. bioRxiv [Preprint]. 2024 Oct 14:2024.08.02.606406.

      Chang WD, Yoon MJ, Yeo KH, Choe YJ. Threonine-rich carboxyl-terminal extension drives aggregation of stalled polypeptides. Mol Cell. 2024 Nov 21;84(22):4334-4349.e7.

      Throughout the work (e.g. 3B, C), anisomycin effects should be compared to those with cycloheximide to observe if the effects are specific to a CAT tail inhibitor rather than a translation inhibitor.

      We agree that including cycloheximide control experiments is crucial. The revised version now incorporates new data, as depicted in Fig. S5A, B, illustrating alterations in the on/off state of MPTP following cycloheximide treatment. Furthermore, Fig. S6A, B present changes in Calcium Retention Capacity (CRC) under cycloheximide treatment. The consistency of results across these experiments, despite cycloheximide treatment, suggests that anisomycin's role is specifically as a CAT tail inhibitor, rather than a translation inhibitor.

      Line 110, it is unclear what "short-tailed ATP5" is. Do you mean ATP5alpha-AT3? If so this needs to be introduced properly. Line 132: should say "may indicate accumulation of CAT-tailed protein" rather than "imply".

      We acknowledge your points. We have clarified that the "short-tailed ATP5α" refers to ATP5α-AT3 and incorporated the requested changes into the revised manuscript.

      Figure 1C: how big are those potential CAT-tails (need to be verified as mentioned earlier)?

      They look gigantic. Include a ladder.

      In the revised Fig. 1D, molecular weight markers have been included to denote signal sizes. The aggregates in the previous Fig. 1C, also present in the control plasmid, are likely a result of signal overexposure. The CAT-tailed protein is observed just above the intended band in these blots. These aggregates have been re-presented in the updated figures, and their signal intensities quantified.

      Line 170: "indicating that GBM cells have more capability to deal with protein aggregation".

      This logic is unclear. Please explain.

      We appreciate your question and have thoroughly re-evaluated our conclusion. We offer several potential explanations for the data presented in Fig. 1D: (1) ATP5α-AT20 may demonstrate superior stability. (2) GSC (GBM) cells might lack adequate mechanisms to monitor protein accumulation. (3) GSC (GBM) cells could possess an increased adaptive capacity to the toxicity arising from protein accumulation. This discussion has been incorporated into the revised manuscript (lines 166-169).

      Line 177: how do you know the endogenous ATP5alpha forms aggregates due to CAT-tailing? Need to measure in a NEMF hypomorph.

      We understand your concern and have addressed it. Revised Fig. 3G, H demonstrates that a reduction in NEMF levels, achieved through sgNEMF in GSC cells, significantly diminishes ATP5α aggregation. This, in conjunction with the Anisomycin treatment data presented in revised Fig. 3E, F, confirms the substantial impact of the CAT-tailing process on this aggregation.

      Line 218: really need a cycloheximide or NEMF hypomorph control to show this specific to CAT-tailing.

      We have revised the manuscript to include data from sgNEMF and cycloheximide treatments, specifically Fig. 3G, H, and Fig. S5C, D, as detailed in our response above.

      Lines 249,266, Figure 5A: The mentioned experiments would benefit from controls including an extension of ATP5alpha that was not alanine and threonine, perhaps a gly-ser linker, as well as an NEMF hypomorph.

      We sincerely appreciate your insightful comments. In response, the revised manuscript now incorporates control data for ATP5α featuring a poly-glycine-serine (GS) tail. This data is specifically presented in Figs. S2E-G, S4E, S7A, D, E, and S8F, G. Our experimental findings consistently demonstrate that the overexpression of ATP5α, when modified with GS tails, had no discernible impact on protein aggregation, mitochondrial membrane potential, GSC cell mobility, or any other indicators assessed in our study.

      Figure S5A should be part of the main figures and not in the supplement.

      This has been moved to the main figure (Fig. 5C).

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      The authors tackled the public concern about E-cigarettes among young adults by examining the lung immune environment in mice using single-cell RNA sequencing, discovering a subset of Ly6G- neutrophils with reduced IL-1 activity and increased CD8 T cells following exposure to tobacco-flavored e-cigarettes. Preliminary serum cotinine (nicotine metabolite) measurements validated the effective exposure to fruit, menthol, and tobacco-flavored e-cigarettes with air and PG:VG serving as control groups. They also highlighted the significance of metal leaching, which fluctuated over different exposure durations to flavored e-cigarettes, underscoring the inherent risks posed by these products. The scRNAseq analysis of e-cig exposure to flavors and tobacco demonstrated the most notable differences in the myeloid and lymphoid immune cell populations. Differentially expressed genes (DEGs) were identified for each group and compared against the air control. Further subclustering revealed a flavor-specific rise in Ly6G- neutrophils and heightened activation of cytotoxic T cells in response to tobacco-flavored e-cigarettes. These effects varied by sex, indicating that immune changes linked to e-cig use are dependent on gender. By analyzing the expression of various genes and employing gene ontology and gene enrichment analysis, they identified key pathways involved in this immune dysregulation resulting from flavor exposure. Overall, this study affirmed that e-cigarette exposure can suppress the neutrophil-mediated immune response, subsequently enhancing T cell toxicity in the lung tissue of mice.

      Strengths:

      This study used single-cell RNA sequencing to comprehensively analyze the impact of e-cigarettes on the lung. The study pinpointed alterations in immune cell populations and identified differentially expressed genes and pathways that are disrupted following e-cigarette exposure. The manuscript is well written, the hypothesis is clear, the experiments are logically designed with proper control groups, and the data is thoroughly analyzed and presented in an easily interpretable manner. Overall, this study suggested novel mechanisms by which e-cigs impact lung immunity and created a dataset that could benefit the lung immunity field.

      Weaknesses:

      The authors included a valuable control group - the PG:VG group, since PG:VG is the foundation of the e-liquid formulation. However, most of the comparative analyses use the air group as the control. Further analysis comparing the air group to the PG:VG group, and the PG:VG group to the individual flavored e-cig groups will provide more clear insights into the true source of irritation. This is done for a few analyses but not consistently throughout the paper. Flavor-specific effects should be discussed in greater detail. For example, Figure 1E shows that the Fruit flavor group exhibits more severe histological pathology, but similar effects were not corroborated by the single-cell data.

      We thank the reviewer for this query. We agree that the PG:VG group is the foundation of the e-liquid formulation and hence comparisons with this group are important for understanding the effect of individual flavors on the cell population. Though we compared the flavored e-cig groups with the PG:VG group, we did not discuss these comparisons in detail within the manuscript, to avoid confusion in the interpretation of this study. However, we have now included the comparisons with the PG:VG group as Supplementary Files S13-S18 in our revised manuscript so that interested readers can interpret our omics data properly.

      While we agree that flavor-specific effects might be of interest, we did not explore them in detail, as fruit-flavored e-liquids have now been regulated/banned from sale in the US. Thus, from a regulatory point of view, the effects of tobacco-flavored e-liquids hold the most interest. Since fruit flavors were on the market at the time this study was conducted, we have still included the data. However, studying them further was not the focus of this work.

      The characterization of Ly6g+ vs Ly6g- neutrophils is interesting and potentially very impactful. Key results like this from scRNAseq analyses should be validated by qPCR and flow cytometry.

      Also, a recent study by Ruscitti et al reported Ly6g+ macrophages in the lung which can potentially confound the cell type analysis. A more detailed marker gene and sub-population analysis of the myeloid clusters could rule out this potential confounding factor.

      We agree with the reviewer that the loss of Ly6G on neutrophils is a very interesting finding, and we have designed a neutrophil-specific experiment to study the impact of e-cig exposure on neutrophil maturation and function, which will be discussed in subsequent work by our group. To address the concerns raised by the reviewer, we stained lung tissue samples from air- and tobacco-flavored e-cig aerosol-exposed mice for Ly6G and S100A8 (a universal marker for neutrophils) to assess the infiltration of Ly6G+ vs Ly6G- neutrophils within the lungs of exposed and unexposed mice. Results from this experiment showed that exposure to tobacco-flavored e-cig aerosol affects the neutrophil population within the mouse lungs. In fact, the changes were more pronounced in female mice. The data are now shown in Figure 4.

      Reviewer #2 (Public review):

      This study provides some interesting observations on how different flavors of e-cigarettes can affect lung immunology, however there are numerous flaws including a low number of replicates and a lack of effective validation methods which reduces the robustness and rigor of the findings.

      Strengths:

      The strength of the study is the successful scRNA-seq experiment which gives good preliminary data that can be used to create new hypotheses in this area.

      Weaknesses:

      The major weakness is the low number of replicates and the limited analysis methods. Two biological n per group is not acceptable to base any solid conclusions. Any validatory data was too little (only cell % data) and did not always support the findings (e.g. Figure 4D does not match 4C). Often n seems to be combined and only one data point is shown, it is not at all clear how the groups were analyzed and how many cells in each group were compared.

      We thank the reviewer for recognizing the strengths of this manuscript while pointing out the errors to allow us to improve our analyses. We understand that the low number of replicates in this work makes it difficult to draw solid conclusions from the analyses, but this was a pilot study to identify the changes in the mouse lung upon acute exposures to flavored e-cig aerosols at the single-cell level. So far, the e-cig field has been primarily focused on conducting toxicological studies to help regulatory bodies set standards and enforce laws to better regulate the manufacture, sale and distribution of e-cig products. However, adolescents and young adults are still getting access to these products, and there is little to no understanding of how this may affect lung health upon acute and chronic exposures. Single-cell technology is a powerful tool for analyzing gene expression changes within cell populations to study cell heterogeneity and function. Yet it is a costly tool, which makes conducting such analyses on large sample sizes impractical. This pilot study was designed to get some initial leads for our future studies involving larger sample sizes and chronic exposures. However, due to the vast information that is provided by a single-cell RNA sequencing experiment, we intend to share it with a larger audience to support research and further study in this area. We understand that the validations are limited in our current work and so we have now conducted co-immunostaining to validate the Ly6G+ and Ly6G- neutrophil populations. We have now complemented the single-cell findings with validating experiments using classical methods, including ELISA, immunostaining or flow cytometry, and revamped the whole manuscript. However, it is important to mention that such validations are sometimes challenging, as many of these techniques still investigate the whole tissue while the changes shown in single-cell analyses mainly pertain to a single cell type. This is illustrated by the flow cytometry results for neutrophils, where we use Ly6G as a marker to stain for neutrophils, although it is found only in the mature neutrophil population.

      Only 71,725 cells mean only 7,172 per group, which is 3,586 per animal - how many of these were neutrophils, T-cells, and macrophages? This was not shown and could be too low.

      We do agree that the number of cells could be too low. For this reason, we did not study gene expression variations at the finest level of cell identity. We classified the cell clusters into general annotations (myeloid, lymphoid, endothelial, stromal and epithelial) and identified the changes in gene expression. Of these, only two clusters (myeloid and lymphoid) with more than ~1000 cells per cell type per group were studied in detail. We have included the cell count information in the revised manuscript to allow better interpretation of our results. From a single-cell point of view, a cell count of ~3500, each with over 20000 features (genes), has good statistical strength and merit in our opinion.
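      The per-group and per-animal figures quoted in the reviewer's comment follow from simple division. The sketch below reproduces that arithmetic; the group count of 10 is an assumption inferred from the reviewer's own numbers (71,725 / 7,172), not a value stated in this response.

```python
# Worked arithmetic for the cell counts quoted in the reviewer's comment.
TOTAL_CELLS = 71_725
N_GROUPS = 10          # assumption: inferred from 71,725 / 7,172 in the comment
ANIMALS_PER_GROUP = 2  # the comment refers to two biological replicates per group


def cells_per_unit(total: int, n_units: int) -> int:
    """Average number of cells per unit, rounded down as in the comment."""
    return total // n_units


per_group = cells_per_unit(TOTAL_CELLS, N_GROUPS)
per_animal = cells_per_unit(per_group, ANIMALS_PER_GROUP)
print(per_group, per_animal)
```

      These are averages only; the actual number of cells per cluster and per sample depends on the quality-control filtering and is what the revised manuscript reports.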

      The dynamic range of RNA measurement using scRNA seq is known to be limited - how do we know whether genes are not expressed or just didn't hit detection? This links into the Ly6G negative neutrophil comment, but in general, the lack of gene expression in this kind of data should be viewed with caution, especially with a low n number and few cells.

      This is a well-taken point, and we thank the reviewer for this comment. We agree that the dynamic range of RNA measurement is limited and that low cell numbers could lead to bias. However, none of the clusters with counts lower than 150 were included in the differential gene analyses. To avoid confusion, we now show immunofluorescence results to validate the findings. We are confident that the inclusion of these validation experiments will convince the reviewer of the loss of the Ly6G marker from neutrophils and the lack of a proper neutrophilic response in exposed mouse lungs as compared to the controls.

      There is no rigorous quantification of Ly6G+ and Ly6G- cells in the flow cytometry data.

      We understand that flow-based quantification of our scRNA seq findings would be interesting. However, flow cytometry and the preparation of single-cell suspensions for sequencing were performed in parallel for this study. We used a basic flow panel with single markers to identify individual immune cell types. We did identify changes in the Ly6G population in our treated and control samples using scRNA seq and intend to exclude it as a marker for our future studies using flow cytometry. Unfortunately, the same analyses could not be performed for the current batch of samples. We have now included results from IHC staining to identify the Ly6G+ and Ly6G- populations in the lung tissues from control and treated mice in the revised manuscript to address some of the concerns raised here.

      Eosinophils are heavily involved in lung biology but are missing from the analysis.

      We used RBC lysis buffer to remove excess RBCs during lung digestion when preparing the single-cell suspension for scRNA seq in this study. Reports suggest that RBC lysis could adversely affect eosinophil number and function. We did not identify any cell cluster expressing eosinophil markers in our scRNA seq data, and we believe that our lung digestion protocol could be the reason for this. We have studied the eosinophil changes through flow cytometry in these samples and have found significant changes as well. However, due to our inability to find cell clusters for eosinophils in the scRNA seq data, we did not previously include these results in the manuscript. To avoid confusion and maintain transparency, we have now included the changes in eosinophils measured by flow cytometry in the revised manuscript (Figure S4).

      The figures had no titles so were difficult to navigate.

      We have now revamped the figures to make it easier for the readers to navigate.

      PGVG is not defined and not introduced early enough.

      We have made the necessary changes in the revised manuscript.

      Neutrophils are not well known to proliferate, so any claims about proliferation need to be accompanied by validation such as BrdU or other proliferation assays.

      We have now removed the cell cycle scoring information from the revised manuscript. Performing a BrdU assay was not possible for these tissues due to limited samples and resources. However, we may consider performing it in our future studies.

      It was not clear how statistics were chosen and why Table S2 had a good comparison (two-way ANOVA with gender as a variable) but this was not used for other data particularly when looking at more functional RNA markers (Table S2 also lacks the interaction statistic which is most useful here).

      We have now included the two-way ANOVA statistics (Supplementary File S3) for other data included in the revised manuscript. It is important to note that since we did not identify any significant changes upon two-way ANOVA, the interaction statistics were not available for the abovementioned statistical test. We have included the interaction information wherever available.
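      For readers unfamiliar with the interaction statistic requested by the reviewer, the following is a minimal sketch of a balanced two-way ANOVA (factor A = exposure, factor B = sex) written in plain Python. It is not the authors' analysis pipeline: the function name `two_way_anova`, the data layout, and the example values are all hypothetical and chosen only to illustrate how the main-effect and interaction sums of squares are separated.

```python
from itertools import product


def mean(values):
    return sum(values) / len(values)


def two_way_anova(cells):
    """Balanced two-way ANOVA with interaction.

    cells maps (level_a, level_b) -> list of replicate values.
    Returns sums of squares, degrees of freedom and F for A, B and A x B.
    """
    a_levels = sorted({a for a, _ in cells})
    b_levels = sorted({b for _, b in cells})
    n = len(cells[(a_levels[0], b_levels[0])])  # replicates per cell
    if n < 2 or any(len(cells[k]) != n for k in product(a_levels, b_levels)):
        raise ValueError("need a balanced design with >= 2 replicates per cell")

    grand = mean([x for values in cells.values() for x in values])
    a_mean = {a: mean([x for b in b_levels for x in cells[(a, b)]]) for a in a_levels}
    b_mean = {b: mean([x for a in a_levels for x in cells[(a, b)]]) for b in b_levels}
    cell_mean = {k: mean(v) for k, v in cells.items()}

    # Main effects, interaction, and within-cell (error) sums of squares.
    ss_a = n * len(b_levels) * sum((a_mean[a] - grand) ** 2 for a in a_levels)
    ss_b = n * len(a_levels) * sum((b_mean[b] - grand) ** 2 for b in b_levels)
    ss_ab = n * sum(
        (cell_mean[(a, b)] - a_mean[a] - b_mean[b] + grand) ** 2
        for a, b in product(a_levels, b_levels)
    )
    ss_err = sum((x - cell_mean[k]) ** 2 for k, v in cells.items() for x in v)

    df_a, df_b = len(a_levels) - 1, len(b_levels) - 1
    df_ab = df_a * df_b
    df_err = len(a_levels) * len(b_levels) * (n - 1)
    ms_err = ss_err / df_err
    if ms_err == 0:
        raise ValueError("zero within-cell variance; F is undefined")

    def row(ss, df):
        return {"ss": ss, "df": df, "F": (ss / df) / ms_err}

    return {
        "A": row(ss_a, df_a),
        "B": row(ss_b, df_b),
        "AxB": row(ss_ab, df_ab),
        "error": {"ss": ss_err, "df": df_err},
    }


# Hypothetical values (not data from the study): two exposures x two sexes, n = 2.
example = {
    ("air", "female"): [1.0, 3.0],
    ("air", "male"): [3.0, 5.0],
    ("tobacco", "female"): [5.0, 7.0],
    ("tobacco", "male"): [3.0, 5.0],
}
result = two_way_anova(example)
for term in ("A", "B", "AxB"):
    print(term, result[term])
```

      The sketch stops at the F statistics; converting them to p-values additionally requires the F distribution, which a statistics package would supply. With only two replicates per cell, the error term rests on very few degrees of freedom, which is the limitation the reviewer highlights.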

      Many statistics are only vs air control, but it would be more useful as a flavor comparison to see these vs PGVG. In some cases, the carrier PGVG looks worse than some of the flavors (which have nicotine).

      While we agree with the reviewer's comment, comparisons with PG:VG were not originally included because of the low cell numbers obtained for the PG:VG samples after quality control and filtering of the scRNA seq data. However, in response to the reviewer's question, we now provide the details of the comparisons with PG:VG as Supplementary Files S13-S18 in the revised manuscript.

      The n number is a large issue, but in Figures such as 4, 6, and 7 it could be a bigger factor. The number of significant genes identified has been determined by chance rather than any real difference, e.g. Is Il1b not identified in Fruit flavor vs air because there wasn't enough n, while in Air vs Tobacco, it randomly hit the significance mark. This is but an example of the problems with the analysis and conclusions.

      We agree in part with the concern raised here. In our opinion, an omics study is not necessarily aimed at finding changes at the transcript level with absolute certainty, but rather at identifying probable cell and gene targets to validate in subsequent work. We did not claim that our findings are absolute outcomes; rather, we note the limitation of sample number and the need for further research at every step. The strength of this work is that it is the first study of its kind to look at changes in the lung cell population at the single-cell level upon e-cig aerosol exposure. This study has provided us with interesting gene and cell targets that we are now validating in follow-up work. We still strongly believe that a dataset like this is a useful resource for a wider audience.

      The data in Figure 7A is confusing, if this is a comparison to air, then why does air vs air not equal 1? Even if this was the comparison to the average of air between males and females, then this doesn't explain why CCL12 is >1 in both. Is this z-score instead? Regardless the data is difficult to interpret in this format.

      We have now changed the format of data representation in the figure.

      Individual n was not shown for almost all experiments - e.g. Figure 1D - what is this representative of? Figure 2D - is this bulk-grouped data for all cells and all mice? The heatmaps are also pooled from 2n and don't show the variability.

      Wherever needed, the n number has been included in the figure legend. Additionally, the n number is shown in Figure 1A. However, with respect to the second comment, we respectfully disagree with the reviewer. Each scRNA seq dataset had 2 samples, one for male and another for female, which has been clearly shown in the current figures. The pooling of cells mentioned in the comment happened at the stage of preparation of the cell suspension from each sex/group at the start of the sequencing. We show the results of the pooled sample showing the variability amongst pooled samples, which we acknowledge is a shortcoming of our work. In terms of the representation of the heat maps and data analyses, we have included all the needed information to uphold the transparency of our study design and data visualization for each figure, and would like to keep the current representations. However, the validation cohort does not involve any pooling of samples and still agrees with most of the deductions made from this study. We are therefore confident that no overstatements have been made in this work and that we still provide a useful dataset to inform future research in this area.

      Reviewer #3 (Public review):

      This work aims to establish cell-type specific changes in gene expression upon exposure to different flavors of commercial e-cigarette aerosols compared to control or vehicle. Kaur et al. conclude that immune cells are most affected, with the greatest dysregulation found in myeloid cells exposed to tobacco-flavored e-cigs and lymphoid cells exposed to fruit-flavored e-cigs. The up-and-downregulated genes are heavily associated with innate immune response. The authors suggest that a Ly6G-deficient subset of neutrophils is found to be increased in abundance for the treatment groups, while gene expression remains consistent, which could indicate impaired function. Increased expression of CD4+ and CD8+ T cells along with their associated markers for proliferation and cytotoxicity is thought to be a result of activation following this decline in neutrophil-mediated immune response.

      Strengths:

      (1) Single-cell sequencing data can be very valuable in identifying potential health risks and clinical pathologies of lung conditions associated with e-cigarettes considering they are still relatively new.

      (2) Not many studies have been performed on cell-type specific differential gene expression following exposure to e-cig aerosols.

      (3) The assays performed address several factors of e-cig exposure such as metal concentration in the liquid and condensate, coil composition, cotinine/nicotine levels in serum and the product itself, cell types affected, which genes are up- or down-regulated and what pathways they control.

      (4) Considerations were made to ensure clinical relevance such as selecting mice whose ages corresponded with human adolescents so that the data collected was relevant.

      Weaknesses:

      The exposure period of 1 hour a day for 5 days is not representative of chronic use and this time point may be too short to see a full response in all cell types. The experimental design is not well-supported based on the literature available for similar mouse models.

      This study was not designed to study the effects of chronic exposures on lung tissues. We were interested in delineating the effect of acute exposures for which the proposed study design was chosen. Previous work by our group has performed similar exposures and has been well received by the community. We understand that chronic exposures will be interesting to look at, but that was beyond the scope of this pilot study. Longer / chronic exposures will be conducted considering disease modifying effects of e-cigarettes.

      Several claims lack supporting evidence or use data that is not statistically significant. In particular, there were no statistical analyses to compare results across sex, so conclusions stating there is a sex bias for things like Ly6G+ neutrophil percentage by condition are observational.

      We thank the reviewer for this observation, and we have now included the necessary validations and details of the sex-based statistical analyses in the revised version of this manuscript. 

      Statistical analyses lack rigor and are not always displayed with the most appropriate graphical representation.

      We thank the reviewer and have included all the necessary statistical details in the revised manuscript.

      Overall, the paper and its discussion are relatively limited and do not delve into the significance of the findings or how they fit into the bigger picture of the field.

      As pointed out by the reviewers themselves, the strength of this work is in the first ever scRNA seq analyses of mice exposed to differently flavored e-cig aerosols in vivo. We also show cell-specific differential gene expression and address some of the major questions raised in e-cig research, including the release of metals on a day-to-day basis from the same coil. The limited sample number makes it difficult to draw solid conclusions from this work, which has been discussed as a shortcoming. Nevertheless, the major strength of this work is not in identifying specific trends, but rather in determining the possible cell and gene targets to expand the study to longer (chronic) exposures with a larger sample group. We have mentioned the significance of the study with respect to vaping effects on cellular heterogeneity leading to deleterious effects.

      The manuscript lacks validation of findings in tissue by other methods such as staining.

      We have now included some validation experiments and revamped the revised manuscript to support scRNA seq findings.

      This paper provides a foundation for follow-up experiments that take a closer look at the effects of e-cig exposure on innate immunity. There is still room to elaborate on the differential gene expression within and between various cell types.

      We thank the reviewer for this observation. The cell numbers for some cell clusters (especially epithelial cells) were too low. So, though we have performed the differential gene expression analyses on all the cell clusters, we refrained from discussing them in the manuscript to avoid overinterpretation of our results. Only clusters with high enough (> 150) cells per sex per group were used to plot the heatmaps. We have now included the cell numbers for each cell type in the revisions to allow better interpretation of our data. Furthermore, the raw data from this study will be freely available to the public upon publication of this manuscript. This will enable interested readers to access the raw data and study the cell types of interest in detail based on their study requirements. This data will be a useful resource for all in this community to inform and design future studies.
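      The inclusion rule described above (more than 150 cells per sex per group) can be expressed as a simple filter. The sketch below is illustrative only: the function name `clusters_passing`, the data layout, and the example counts are hypothetical, and it assumes the threshold must be met in every sex/group combination, which the response does not state explicitly.

```python
MIN_CELLS = 150  # threshold quoted in the response


def clusters_passing(counts, min_cells=MIN_CELLS):
    """Return clusters whose cell count exceeds min_cells in every (sex, group).

    counts maps cluster name -> {(sex, group): number of cells}.
    """
    return [
        cluster
        for cluster, by_sample in counts.items()
        if by_sample and all(n > min_cells for n in by_sample.values())
    ]


# Hypothetical counts, not taken from the study.
example_counts = {
    "myeloid": {("female", "air"): 1200, ("male", "air"): 1100},
    "lymphoid": {("female", "air"): 1500, ("male", "air"): 1050},
    "epithelial": {("female", "air"): 90, ("male", "air"): 210},
}
print(clusters_passing(example_counts))
```

      In this toy example the epithelial cluster is dropped because one sample falls below the threshold, mirroring the reason given above for not discussing low-count clusters.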

      Recommendation For The Author:

      Major comments

      Mouse experiments are extremely variable and an n of 2 is not enough. Because of the complexity of separating male and female mice, the analyses are not adequately powered to support conclusions. The two-way ANOVA style approach to consider sex as a separate variable was a great idea in Table S2 - but this was not used elsewhere, and there is a need to show the interaction statistic (which would say if there is a flavor effect dependent on sex).

      We thank the reviewers for this recommendation. We agree that the experiments are highly variable. However, this is not merely an outcome of the small sample size (which we address as one of the limitations). It is important to mention that validating results from single-cell technologies using regular molecular biology techniques is challenging, and the results may not completely align, because the former examines single-cell populations whereas the latter examines a heterogeneous cell population. Nevertheless, in light of this comment, we have now toned down our conclusions and performed some extra experiments to validate the single-cell findings. We also provide the results from two-way ANOVA statistics for all the figures/experiments performed in this work.
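
      For readers unfamiliar with the interaction statistic the reviewers ask for, the following is a minimal sketch of a balanced two-way ANOVA (flavor x sex) written with the Python standard library only. It is not the authors' analysis pipeline; the function name `two_way_anova` and the toy values are assumptions made for illustration and are not data from the study.

```python
from itertools import product
from statistics import mean


def two_way_anova(data):
    """Balanced two-way ANOVA with interaction.

    data maps (level_a, level_b) -> list of replicate values. Every cell of
    the design must hold the same number (>= 2) of replicates.
    Returns F statistics and degrees of freedom for A, B, and A x B.
    """
    levels_a = sorted({a for a, _ in data})
    levels_b = sorted({b for _, b in data})
    cells = list(product(levels_a, levels_b))
    n = len(next(iter(data.values())))
    if n < 2 or any(len(data.get(c, [])) != n for c in cells):
        raise ValueError("design must be balanced with >= 2 replicates per cell")

    grand = mean(v for c in cells for v in data[c])
    mean_a = {a: mean(v for b in levels_b for v in data[(a, b)]) for a in levels_a}
    mean_b = {b: mean(v for a in levels_a for v in data[(a, b)]) for b in levels_b}
    mean_ab = {c: mean(data[c]) for c in cells}

    na, nb = len(levels_a), len(levels_b)
    ss_a = n * nb * sum((mean_a[a] - grand) ** 2 for a in levels_a)
    ss_b = n * na * sum((mean_b[b] - grand) ** 2 for b in levels_b)
    # Interaction: what remains of each cell mean after removing both main effects.
    ss_ab = n * sum((mean_ab[(a, b)] - mean_a[a] - mean_b[b] + grand) ** 2
                    for a, b in cells)
    ss_err = sum((v - mean_ab[c]) ** 2 for c in cells for v in data[c])

    df_a, df_b = na - 1, nb - 1
    df_ab, df_err = df_a * df_b, na * nb * (n - 1)
    ms_err = ss_err / df_err
    return {
        "F_a": (ss_a / df_a) / ms_err,
        "F_b": (ss_b / df_b) / ms_err,
        "F_interaction": (ss_ab / df_ab) / ms_err,
        "df": {"a": df_a, "b": df_b, "interaction": df_ab, "error": df_err},
    }


# Toy values (arbitrary units, NOT from the study): factor A = flavor, B = sex.
toy = {
    ("air", "F"): [1, 3], ("air", "M"): [1, 3],
    ("tobacco", "F"): [5, 7], ("tobacco", "M"): [1, 3],
}
result = two_way_anova(toy)
print(result)
```

      A large interaction F relative to its degrees of freedom would indicate that the flavor effect depends on sex. Converting F to a p-value needs the F distribution (for example `scipy.stats.f.sf`), which is omitted here to keep the sketch dependency-free. With n = 2 per cell, as in the toy design, the error term has very few degrees of freedom, which is the power limitation the reviewers raise.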

      More validatory data with PCR, immunostaining, and flow cytometry would be very helpful. This includes validating the neutrophil functional and phenotype data and the T-cell data by flow cytometry.

      To validate the presence of the Ly6G+ and Ly6G- neutrophil populations, we performed co-immunostaining experiments and proved that exposure to tobacco-flavored e-cig aerosols results in an increase in the cell percentages of both neutrophil populations in female mice. We also re-analyzed our flow cytometry data to align with the scRNA seq results. A multiplex protein assay was another technique used to show altered innate/adaptive immune responses upon exposure to differently flavored e-cig aerosols. Of note, considering the short duration of exposure, we did not identify significant changes in cell numbers or inflammatory responses. However, we have now validated our scRNA seq results using various techniques to draw meaningful conclusions.

      The in vivo experimental design seems to model very short-term exposure. In the literature, including the papers cited in the references, much longer time points are used, extending from several weeks to months of exposure. There seem to be few examples of papers using 5-day exposure and those that do are inspired by traditional cigarette smoke rather than e-cig aerosols or model acute exposure by making the daily duration longer. It is important to consider the possibility that the greatest number of up- or down-regulated genes are found in immune cell populations solely because they are the first to be affected by e-cig exposure and the other cell types just do not have time to become dysregulated in 5 days.

      We thank the reviewers for this comment. We do not refute the possibility that our observations of major changes in the immune cell population are due to the short duration of exposure. This was one of the first studies using single-cell technologies to look at cell-specific changes in the mouse lungs exposed to e-cig aerosols. Future experiments being conducted in our lab use a more controlled approach to mimic chronic exposures to e-cig aerosols in order to identify changes in other cell types and the long-term effects of e-cig exposures in vivo. Since this was not the focus of this work, we have not discussed it in detail.

      The validity of the claims pertaining to septal thickening and mean linear intercept (MLI) are questionable due to the poor lung inflation of the treatment group, which the authors acknowledge. Thus, MLI cannot be accurately used. It is contradictory to state that the fruit-flavored treatment group presented challenges with inflation but then concluded that there is a phenotype. In addition, inflation with low-melting agarose is not an ideal method because it does not use a liquid column to maintain constant pressure. For these metrics to be used and evaluated, it is imperative that all lobes are properly inflated. Therefore, these data should either be repeated or removed.

      We agree with this critique and have removed the MLI quantification from the revised manuscript. We also no longer make claims regarding substantial histological changes upon exposure. We suggest that further work is needed to better understand the effect of differently flavored e-cig aerosol exposure on mouse lungs.

      What is the purpose of analyzing cell cycle scores? Why is it relevant that neutrophils are in G2M-phase? Figure 3B shows that neutrophils are clearly in both G1- and G2M-phase and this cluster includes both Ly6G+ and Ly6G- subsets, so it does not seem accurate to claim that they are in the G2M-phase of the cell cycle, nor does it reveal anything novel about Ly6G- neutrophils. Is it possible that the cell cycle score is noting a point in differentiation when neutrophils acquire/begin expressing Ly6G? Ly6G expression in neutrophils has been found to be associated with differentiation and maturation. To rule out the possibility that this is a cell state being identified, differential gene expression between the 2 neutrophil subsets should be shown in a volcano plot. It would also be useful to stain for Ly6G+/- neutrophils using either IF or RNAscope to prove they are present. If the claim is that Ly6G- neutrophils are a "unique" population, it must be established to what extent they are unique. Immune cells cluster together on UMAPs, so what if these are a different cell type entirely, like another immature myeloid lineage, and this is an artifact of clustering? This could be clarified with a trajectory analysis and further subsetting of the immune population.

      We thank the reviewers for this comment. We now realize that analyzing the cell cycle scores was not serving the intended purpose in this work. Moreover, due to the use of pooled samples for scRNA seq analyses, it may not be best to perform such downstream analyses in our datasets. We have thus removed these graphs from the revised version and have tried to simplify the conclusions of our study to the readers. 

      Our main take home from this study is the increase in number of mature (Ly6G+) and immature (Ly6G-) neutrophils in tobacco-flavored e-cig aerosol exposed mouse lungs as compared to air control. This result was validated using co-immunofluorescence in the revised manuscript (Figure 4).

      In vivo validation of findings should be included, especially for the claimed changes. As of now, this paper serves more as a dataset that could be further explored by other groups, which in itself is valuable, but it is just one single cell sequencing experiment without validation.

      We thank the reviewers for this comment. We have used multiple techniques (flow cytometry, multiplex protein assay, co-immunofluorescence) in the revised manuscript to validate the scRNA seq findings. However, this was a preliminary study designed to generate a small dataset for future experiments, and we do not have the resources to add more validation experiments to this study. We are currently designing chronic e-cig exposure studies to elaborate upon certain hypotheses generated through this study.

      Minor Comments

      There are several examples of typos or small errors in the text that would benefit from proofreading. Examples: line 51 "in the many countries including (the) United States (US), (the) United Kingdom..."; on line 54, the reference cited states that 9.4% of middle schoolers are daily users, not 9.2%; on line 55 the reference cited states that these are the most commonly used flavors, not the most preferred, which explains why the percentages do not add up to 100; line 120 "the lungs were in a collapsed state than the other groups"; line 127 "to confirm out speculations"; line 136 "PGVG" instead of the previously used "PG:VG"; line 140 "(single cell capture))"; line 999 "result in" rather than "results in" for Figure 4 title, etc.

      We thank the reviewer for this comment. The manuscript has been thoroughly proofread and edited to avoid typos and grammatical errors.

      If this is a "pilot study" (as it is stated in the introduction) it is meant to assess the validity of experimental design on a small scale to later test a hypothesis. The authors should change the phrasing.

      We have now changed the phrasing as suggested.

      The introduction lacked the necessary context and background. Some information described in the results section could be addressed in the intro. For example: What is the significance of neutrophils having a Ly6G deficiency? Why was the exposure duration of 1 hour a day for 5 days chosen? Why use nose-only exposure when many models use whole-body exposure? Why look at cell-type-specific changes?

      We have made the necessary amendments in the introduction.

      Some figure titles only address certain panels rather than summarizing the figure as a whole. For example, the title of Figure 1 only refers to panel D and is unrelated to serum cotinine levels, septa thickening, or mean linear intercept. The text discussed conclusions about septa thickening and Lm values for the fruit-flavored treatment group, so they are equally relevant to the figure compared to the metal levels.

      We have now changed the Figures and Figure legends to summarize the figure.

      significance level is not defined in Figure 1 legend although it is used in Figure 1C.

      The Figure legend has now been updated.

      Figure 1E does not include a scale bar.

      We have now included the scale bar in updated figures.

      The multiplex ELISA shown in the experimental design schematic is not further discussed in the paper. Flow cytometry plots should be displayed in addition to the data they generated.

      The flow cytometry plots have now been included (Figures 3&5) and the results for Multiplex ELISA are shown as Figure S3D and lines 327-342 of the revised manuscript.

      In Figure 1F, a multivariate ANOVA should be used so that multiple groups can be compared across sex, rather than plotting in a sex-specific manner and claiming there exists a sex bias. The small sample size also introduces an issue because a p-value cannot be generated with so few samples.

      Per the suggestions made previously, figure 1F has now been removed from the revised manuscript.

      The protocol for achieving a single-cell suspension should be detailed in the methods section. As is, it only describes the sample collection and preparation. This could help elucidate to the reader why the UMAP shows such a large abundance of immune cells.

      We have now included the protocol in the revised manuscript.

      Clarify whether PG:VG was used as a control in the scRNA sequencing in addition to air to generate the UMAP in Figure 2A.

      Yes, PG:VG was used as one of the controls, which has now been illustrated as a groupwise comparison in Figure 2D. We have also included comparisons of the various treatment groups versus PG:VG to identify DEGs in myeloid and lymphoid clusters (Supplementary Files S13-S18).

      A UMAP should be shown for each treatment group/flavor. The overall UMAP in Figure 1A is good, but there could be another panel with separate projections for each condition.

      A groupwise UMAP has now been included in Figure 2D.

      In Figure 2C, relative cell percentage is not a reliable method to quantify cell type and the histogram is not a great way to visualize the data or its statistical significance. These claims should also be validated in tissue.

      We thank the reviewers for this comment and have tried to validate the findings using flow cytometry. However, we would like to add that the changes observed with single-cell technologies cannot always be validated using simple molecular biology techniques, because the markers used to specify cell clusters in scRNA seq are highly specific, which was not the case for the flow panel designed for this work. Our major purpose in using cell percentages was to show the flavor-specific changes in generalized cell populations in mouse lungs. We have therefore retained these graphs in the revised manuscript.

      Figure 2D could be better illustrated with a volcano plot to show which genes are being dysregulated rather than just how many. Knowing which genes are affected is more valuable than knowing just the number of genes.

      Figure 2D is no longer a part of the revised manuscript. For the other comparisons we have still used heatmaps as they also depict sex-specific changes in gene expressions, which would have been difficult to elucidate using volcano plots.

      Assuming Figure 3C is representative of all conditions, then Figures 3C and D demonstrate that Ly6G- neutrophils are present in all conditions including controls. To see whether they are truly present in different abundances between treatment and control groups, separate UMAPs of the neutrophil subsets should be made per condition or use a dot plot for Figure 3A. This also applies to Figure 3B.

      We thank the reviewers for pointing this out. We have now revamped the whole manuscript and used additional validation experiments to show the presence of Ly6G- and Ly6G+ neutrophil population upon exposure to tobacco-flavored e-cig aerosols. 

      Figure 3E shows that there is no statistically significant change in % of Ly6G+ neutrophils across treatment groups, but the text claims that there is "an increase in the levels of Ly6G+ neutrophils in lung digests from mouse lungs exposed to tobacco-flavored e-cig aerosols" (lines 207-209). The text also claims that "The observed increase was more pronounced in males as compared to females" (lines 209-210), but there was no statistical analysis across sexes to support this statement. It is clear that the change in % of Ly6G+ neutrophils is more pronounced in males than females, but it is still not statistically significant. This figure should also be repeated for analysis of Ly6G- neutrophils. Lines 272-274 mention that the % increase is higher for Ly6G- neutrophils than for Ly6G+ neutrophils, but there is not an analogous histogram to demonstrate this. The claims made in lines 275-280 are not clearly shown in any figure.

      We thank the reviewer for this query. This was an error on our part. We have now added sex-specific changes using scRNA seq, flow cytometry and co-immunofluorescence-based experiments to prove that more pronounced changes in the Ly6G+ and Ly6G- neutrophil populations occur in female mice and not in males.

      Figures 4 and 6 have an overwhelming amount of heatmaps. Volcano plots with downstream analyses could be used to make some of this data more legible. The main findings should be validated in vivo/in tissue.

      We have now revamped the figures and data distribution to make the data legible and to reduce the overwhelming amount of data on the slides.

      For Figure 5, show cell type by condition and do differential gene expression analysis displayed in a volcano plot. Then, stain tissue to validate the findings. Compare across sex during statistical analysis.

      The necessary changes have been made.

      Figure 6 error: panels E and F should be labeled as "tobacco" rather than "fruit".

      Error has now been fixed.

      Figure 7C can be placed in the supplemental materials.

      It has now been included in supplemental materials.

      The Figure 6E title should have been tobacco instead of fruit.

      This error has now been fixed.

      Line 381 mentioned the wrong subfigure. (Figure 7B instead of 7E).

      We have now made the necessary edits.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review): 

      Summary: 

      The authors revealed the cellular heterogeneity of companion cells (CCs) and demonstrated that the florigen gene FT is highly expressed in a specific subpopulation of these CCs in Arabidopsis. Through a thorough characterization of this subpopulation, they further identified NITRATE-INDUCIBLE GARP-TYPE TRANSCRIPTIONAL REPRESSOR 1 (NIGT1)-like transcription factors as potential new regulators of FT. Overall, these findings are intriguing and valuable, contributing significantly to our understanding of florigen and the photoperiodic flowering pathway. However, there is still room for improvement in the quality of the data and the depth of the analysis. I have several comments that may be beneficial for the authors. 

      Strengths: 

      The usage of snRNA-seq to characterize the FT-expressing companion cells (CCs) is very interesting and important. Two findings are novel: 1) Expression of FT in CCs is not uniform. Only a subcluster of CCs exhibits high expression level of FT. 2) Based on consensus binding motifs enriched in this subcluster, they further identify NITRATE-INDUCIBLE GARP-TYPE TRANSCRIPTIONAL REPRESSOR 1 (NIGT1)-like transcription factors as potential new regulators of FT. 

      We are pleased to hear that reviewer 1 noted the novelty and importance of our work. As reviewer 1 mentioned, we are also excited about the identification of a subcluster of companion cells with very high FT expression. We believe that this work is an initial step to describe the molecular characteristics of these FT-expressing cells. We are also excited to share our new findings on NIGT1s as potential FT regulators. We believe this finding will attract a broader audience, as the molecular factor coordinating plant nutrition status with flowering time remains largely unknown, although the phenomenon itself is well known.

      Weaknesses: 

      (1) Title: "A florigen-expressing subpopulation of companion cells". It is a bit misleading. The conclusion here is that only a subset of companion cells exhibit high expression of FT, but this does not imply that other companion cells do not express it at all. 

      We agree with this comment, as it was not our intention to imply that FT is not produced in companion cells other than the subpopulation we identified. We revised the title to reflect this point more accurately. The new title is “Companion cells with high florigen production express other small proteins and reveal a nitrogen-sensitive FT repressor.”

      (2) Data quality: Authors opted for fluorescence-activated nuclei sorting (FANS) instead of traditional cell sorting method. What is the rationale behind this decision? Readers may wonder, especially given that RNA abundance in single nuclei is generally lower than that in single cells. This concern also applies to snRNA-seq data. Specifically, the number of genes captured was quite low, with a median of only 149 genes per nucleus. Additionally, the total number of nuclei analyzed was limited (1,173 for the pFT:NTF and 3,650 for the pSUC2:NTF). These factors suggest that the quality of the snRNA-seq data presented in this study is quite low. In this context, it becomes challenging for the reviewer to accurately assess whether this will impact the subsequent conclusions of the paper. Would it be possible to repeat this experiment and get more nuclei?

      We appreciate this comment; we realized that we had not clearly explained the rationale for using single-nucleus RNA sequencing (snRNA-seq) instead of single-cell RNA-seq (scRNA-seq). As reviewer 1 mentioned, RNA abundance in scRNA-seq is higher than in snRNA-seq. To conduct scRNA-seq using plant cells, protoplasting is a necessary step. However, in our study, protoplasting has many drawbacks for isolating our target cells from the phloem. First, it is technically challenging to efficiently isolate protoplasts from phloem companion cells, which are deeply embedded in plant tissues. Typically, at least several hours of enzymatic incubation are required to obtain protoplasts from companion cells (often using semi-isolated vasculatures), and the efficiency of protoplasting vasculature cells remains low. Second, for our analysis, preserving the time-of-day information is also crucial. Therefore, we employed a more rapid isolation method. In the revised manuscript, we added four new sentences in the Introduction section to explain our rationale for choosing snRNA-seq given these technical limitations.

      Reviewer 1 also raised a concern about the quality of our snRNA-seq data, referring to the relatively low read counts per nucleus. Although we believe that shallow reads do not necessarily indicate low quality and are confident in the accuracy of our snRNA-seq data, as supported by the detailed follow-up experiments (e.g., imaging analysis in Fig. 4B), we agree that it is important to address this point in the revision and alleviate readers’ concerns regarding the data quality.

      We believe the primary reason for the low read counts per cell is the small amount of RNA present in each Arabidopsis vascular cell nucleus that we isolated. For bulk nuclei RNA-seq, we collected 15,000 nuclei, yet the total RNA amount was only approximately 3 ng. This indicates that each isolated nucleus contains a very limited amount of RNA (by a simple calculation, 3,000 pg / 15,000 nuclei = 0.2 pg/nucleus). It appears that cells and nuclei are still small in 2-week-old seedlings; thus, each nucleus may contain lower levels of RNA. During the optimization process, we also tried fixing the tissues in the hope of retaining nuclear RNA, but unfortunately, in our hands, fixation caused nuclei aggregation that hindered the sorting process, making it unsuitable for single-nucleus RNA-seq.
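
      The per-nucleus estimate above is a single unit conversion and division; a minimal sketch follows, using only the two figures quoted in the response (approximately 3 ng total RNA from 15,000 sorted nuclei). The function name `rna_per_nucleus_pg` is introduced here purely for illustration.

```python
PG_PER_NG = 1000.0  # 1 ng = 1,000 pg


def rna_per_nucleus_pg(total_rna_ng, n_nuclei):
    """Average RNA per nucleus in picograms, given total RNA in nanograms."""
    if n_nuclei <= 0:
        raise ValueError("n_nuclei must be positive")
    return total_rna_ng * PG_PER_NG / n_nuclei


# Figures quoted in the response: ~3 ng total RNA from 15,000 nuclei.
per_nucleus = rna_per_nucleus_pg(3, 15_000)
print(f"{per_nucleus:.1f} pg per nucleus")
```

      This is an average over the sorted population, so it says nothing about how RNA content varies between individual nuclei.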

      Reviewer 1 suggested that we repeat the same snRNA-seq experiment. We agree that having more cells increases the reliability of the data. However, to our knowledge, higher cell numbers enhance the confidence of clustering, but not the read counts per cell. In our snRNA-seq data, our target, FT-expressing cells, were observed in cluster 7, which projected at an obvious distance from other cell clusters. Therefore, we think that having more nuclei would not significantly help in separating the high FT-expressing cluster 7 cells from other types of cells, although we might obtain more DEGs from the cluster 7 cells. Considering the costs and time required for additional snRNA-seq experiments, we think that adding more follow-up molecular biology data is more practical. We clearly stated the limitations of our approach in the Discussion section: “A drawback of our snRNA-seq analysis was shallow reads per nucleus. It appears mainly due to the low abundance of mRNA in nuclei from 2-week-old leaves. Based on our calculation, the average mRNA level per nucleus is approximately 0.2 pg (3,000 pg mRNA from 15,000 sorted nuclei). Future technological advance is needed to improve the data quality.”

      In this revised version of the manuscript, we silenced FT gene expression using an amiRNA against FT driven by tissue-specific promoters [pROXY10, cluster 7; pSUC2, companion cells; pPIP2.6, cluster 4 (for the spatial expression pattern of PIP2.6, please see the new data shown in Fig. S8F); pGC1, guard cells]. Given that both FT and ROXY10 were highly expressed in cluster 7 of our snRNA-seq dataset, we anticipated the late flowering phenotype of pROXY10:amiRNA-ft. As we expected, pROXY10:amiR-ft but not pPIP2.6:amiR-ft lines showed delayed flowering phenotypes (Fig. S14A), supporting the validity of our snRNA-seq approach. We are also now more confident in the resolution of our snRNA-seq analysis, since cluster 4-specific PIP2.6 did not cause late flowering despite its higher basal expression than ROXY10 (Fig. S14B).

      (3) Another disappointment is that the authors did not utilize reporter genes to identify the specific locations of the FT-high expressing cells (cluster 7 cells) within the CC population in vivo. Are there any discernible patterns that can be observed? 

      In the original manuscript, as we showed only limited spatial images of overlap between FT and other cluster 7 genes in Fig. 4B, this comment is totally understandable. To respond to it, we added whole leaf images showing the spatial expression of FT and other cluster 7 genes (Fig. S12). These data indicate that cluster 7 genes including FT are expressed highly in minor veins in the distal part of the leaf but weakly in the main vein. We also added enlarged images of spatial expression of FT and cluster 7 genes (FLP1 and ROXY10) to note that those genes do not overlap completely (Fig. S13).

      In contrast to cluster 7 genes, genes highly expressed in cluster 4, such as LTP1 and MLP28, are reportedly highly expressed in the main leaf vein. To further confirm it, we established a transgenic line that expresses a GFP-fusion protein controlled by the promoter of a cluster 4-specific gene PIP2.6 (Fig. S8F). It also showed strong GFP signals in the main vein, consistent with previous observations of LTP1 and MLP28. In summary, FT-expressing cells (cluster 7 cells) are enriched in companion cells in the minor vein, and their expression patterns show a clear distinction from genes expressed in the main vein (e.g., cluster 4-specific genes).

      (4) The final disappointment is that the authors only compared FT expression between the nigtQ mutants and the wild type. Does this imply that the mutant does not have a flowering time defect particularly under high nitrogen conditions? 

      We agree with reviewer 1 that more experiments are required to conclude the role of NIGT1 on FT regulation, in addition to our Y1H data, flowering time data of NIGT1 overexpressors, and FT expression in NIGT1 overexpressors and nigtQ mutant.

      First, to test the direct regulation of NIGT1s on FT transcription, we conducted a transient luciferase (LUC) assay in tobacco leaves using effectors (p35S:NIGT1.2, p35S:NIGT1.4, and p35S:GFP) and reporters [pFT:LUC (FT promoter fused with LUC) and pFTm:LUC (the same FT promoter with mutations in NIGT1-binding sites fused with LUC)]. Our result showed that NIGT1.2 and NIGT1.4, but not GFP, decreased the activity of pFT:LUC but not pFTm:LUC (Fig. 5C). This indicates that NIGT1s directly repress the FT gene.

      Second, to address reviewer 1’s suggestion about the effect of the nigtQ mutation on flowering time, we grew WT and nigtQ plants on 20 mM and 2 mM NH<sub>4</sub>NO<sub>3</sub>. Under 20 mM NH<sub>4</sub>NO<sub>3</sub>, the nigtQ line bolted earlier than WT; under 2 mM NH<sub>4</sub>NO<sub>3</sub>, nigtQ and WT bolted at almost the same time (Fig. S17D and E). This result suggests that the nigtQ mutation affects flowering timing depending on nitrogen nutrient status. However, leaf numbers of bolted plants were not different between WT and nigtQ lines (Fig. S17E). Therefore, it appears that the nigtQ mutation accelerated the overall growth of plants rather than promoting flowering. We also measured flowering time by counting leaf numbers of the nigtQ and WT plants at bolting on nitrogen-rich soil. The mutant generated slightly more leaves than WT when they flowered (Fig. S17G). These results suggest that the NIGT-derived fine-tuning of FT regulation is conditional on higher nitrogen conditions.

      Minor: 

      (1) Abstract: "Our bulk nuclei RNA-seq demonstrated that FT-expressing cells in cotyledons and in true leaves differed transcriptionally.". This sentence is not informative. What exactly is the difference in FT-expressing cells between cotyledons and true leaves? 

      We modified the sentence to clarify the differences between cotyledons and true leaves. “Our bulk nuclei RNA-seq demonstrated that FT-expressing cells in cotyledons and true leaves showed differences especially in FT repressor genes.”

      (2) As a standard practice, to support the direct regulation of FT by NIGT1, the authors should provide EMSA and ChIP-seq data. Ideally, they should also generate promoter constructs with deletions or mutations in the NIGT1 binding sites. 

      To test the direct interaction of NIGT1 with the FT promoter sequences, we performed a transient reporter assay using an FT promoter-driven luciferase reporter (Fig. 5C). NIGT1.2 and NIGT1.4 repressed the FT promoter activity; however, with NIGT1 binding site mutations, this repression was not observed, indicating that NIGT1 binds to the cis-elements in the FT promoter to repress its transcription.

      (3) Sorting: Did the authors fix the samples before preparing the nuclei suspension? If not, could this be the reason the authors observed the JA-responsive clusters (Fig. 2J)? Please provide more details related to nuclei sorting in the Methods section. 

      We added a new subsection in the Materials and Methods section to explain the details of the nuclei sorting procedure. We did not include a sample fixation step. We tried formaldehyde fixation; however, it clumped nuclei, which was not suitable for snRNA-seq. Moreover, fixation steps generally reduce the read counts of single-cell RNA-seq according to the 10X Genomics guidelines.

      We agree that JA responses were triggered during the FANS nuclei isolation. Therefore, we added the following sentence: “Since our FANS protocol did not include a sample fixation step to avoid clumping, these cells likely triggered wounding responses during the chopping and sorting process (Fig. S1B).”

      Reviewer #2 (Public review): 

      This manuscript submitted by Takagi et al. details the molecular characterization of the FT-expressing cell at a single-cell level. The authors examined what genes are expressed specifically in FT-expressing cells and other phloem companion cells by exploiting bulk nuclei and single-nuclei RNA-seq and transgenic analysis. The authors found the unique expression profile of FT-expressing cells at a single-cell level and identified new transcriptional repressors of FT such as NIGT1.2 and NIGT1.4.

      Although previous researchers have known that FT is expressed in phloem companion cells, they have tended to neglect the molecular characterization of the FT-expressing phloem companion cells. To understand how FT, which is expressed in tiny amounts in phloem companion cells that make up a very small portion of the leaf, can be a key molecule in the regulation of the critical developmental step of floral transition, it is important to understand the molecular features of FT-expressing cells in detail. In this regard, this manuscript provides insight into the understanding of detailed molecular characteristics of the FT-expressing cell. This endeavor will contribute to the research field of flowering time. 

      We are grateful that reviewer 2 recognizes the importance of transcriptome profiling of FT-expressing cells at the single-cell level.

      Here are my comments on how to improve this manuscript. 

      (1) The most novel finding of this manuscript is the identification of NIGT1.2 as the upstream regulator of FT-expressing cluster 7 gene expression. The flowering phenotypes of the nigtQ mutant and the transgenic plants in which NIGT1.2 was expressed under the SUC2 gene promoter support that NIGT1.2 functions as a floral repressor upstream of the FT gene. Nevertheless, the expression patterns of NIGT1.2 genes do not appear to have much overlap with those of NIGT1.2-downstream genes in the cluster 7 (Figs S14 and F3). An explanation for this should be provided in the discussion section.

      We agree with reviewer 2 that the spatial expression patterns of NIGT1.2 and cluster 7 genes do not overlap much, and some discussion should be provided in the manuscript. Although we do not have a concrete answer for this phenomenon, we obtained the new data showing that NIGT1.2 and NIGT1.4 directly repress the FT gene in planta (Fig. 5C). As NIGT1.2/1.4 are negative regulators of FT, it is plausible that NIGT1.2/1.4 may suppress FT gene expression in non-cluster 7 cells to prevent the misexpression of FT. We added this point in the Results section.

      (2) To investigate gene expression in the nuclei of specific cell populations, the authors generated transgenic plants expressing a fusion gene encoding a Nuclear Targeting Fusion protein (NTF) under the control of various cell type-specific promoters. Since the public audience would not know about NTF without reading reference 16, some explanation of NTF is necessary in the manuscript. Please provide a schematic of constructs the authors used to make the transformants.

      As reviewer 2 pointed out, we lacked a clear explanation of why we used NTF in this study. NTF is the fusion protein that consists of a nuclear envelope targeting WPP domain, GFP, and a biotin acceptor peptide. It was initially designed for the INTACT (isolation of nuclei tagged in specific cell types) method, which enables us to isolate bulk nuclei from specific tissues. Although our original intention was to profile the bulk transcriptome of mRNAs that exist in nuclei of the FT-expressing cells using INTACT, we utilized our NTF transgenic lines for snRNA-seq analysis. To explain what NTF is to readers, we included a schematic diagram of NTF (Fig. S1A) and more explanation about NTF in the Results section.

      Again, we appreciate all reviewers’ careful and constructive comments. With these changes, we hope our revised manuscript is now satisfactory.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Reviewer #1 (Public review): 

      Summary: 

      The study by Klug et al. investigated the pathway specificity of corticostriatal projections, focusing on two cortical regions. Using a G-deleted rabies system in D1-Cre and A2a-Cre mice to retrogradely deliver channelrhodopsin to cortical inputs, the authors found that M1 and MCC inputs to direct and indirect pathway spiny projection neurons (SPNs) are both partially segregated and asymmetrically overlapping. In general, corticostriatal inputs that target indirect pathway SPNs are likely to also target direct pathway SPNs, while inputs targeting direct pathway SPNs are less likely to also target indirect pathway SPNs. Such asymmetric overlap of corticostriatal inputs has important implications for how the cortex itself may determine striatal output. Indeed, the authors provide behavioral evidence that optogenetic activation of M1 or MCC cortical neurons that send axons to either direct or indirect pathway SPNs can have opposite effects on locomotion and different effects on action sequence execution. The conclusions of this study add to our understanding of how cortical activity may influence striatal output and offer important new clues about basal ganglia function. 

      The conceptual conclusions of the manuscript are supported by the data, but the details of the magnitude of afferent overlap and causal role of asymmetric corticostriatal inputs on some behavioral outcomes may be a bit overstated given technical limitations of the experiments. 

      For example, after virally labeling either direct pathway (D1) or indirect pathway (D2) SPNs to optogenetically tag pathway-specific cortical inputs, the authors report that a much larger number of "non-starter" D2-SPNs from D2-SPN labeled mice responded to optogenetic stimulation in slices than "non-starter" D1 SPNs from D1-SPN labeled mice did. Without knowing the relative number of D1 or D2 SPN starters used to label cortical inputs, it is difficult to interpret the exact meaning of the lower number of responsive D2-SPNs in D1 labeled mice (where only ~63% of D1-SPNs themselves respond) compared to the relatively higher number of responsive D1-SPNs (and D2-SPNs) in D2 labeled mice. While relative differences in connectivity certainly suggest that some amount of asymmetric overlap of inputs exists, differences in infection efficiency and ensuing differences in detection sensitivity in slice experiments make determining the degree of asymmetry problematic. 

      It is also unclear if retrograde labeling of D1-SPN- vs D2-SPN- targeting afferents labels the same densities of cortical neurons. This gets to the point of specificity in some of the behavioral experiments. If the target-based labeling strategies used to introduce channelrhodopsin into specific SPN afferents label significantly different numbers of cortical neurons, might the difference in the relative numbers of optogenetically activated cortical neurons itself lead to behavioral differences? 

      We thank the reviewer for the comments and for raising additional interpretations of our results. We agree that determining the relative number of D1- versus D2-SPN starter cells would allow a more accurate estimate of connectivity. However, due to current technical limitations, achieving this level of precision remains challenging. As the reviewer also noted, differences in the number of cortical neurons targeting D1- versus D2-SPNs could introduce additional complexity to the functional effects observed in the behavioral experiments. Moreover, functional heterogeneity is likely to exist not only among cortical neurons projecting to striatal D1- or D2-SPNs, but also within the striatal D1- and D2-SPN populations themselves. Addressing these questions at the single-neuron level will require more refined viral tools in combination with improved recording and manipulation techniques. Despite these limitations, our results suggest that a subpopulation of cortical neurons selectively targets striatal D1-SPNs, supporting a functional dichotomy of pathway-specific corticostriatal subcircuits in the control of behavior.   

      Reviewer #2 (Public review): 

      Summary: 

      Klug et al. use monosynaptic rabies tracing of inputs to D1- vs D2-SPNs in the striatum to study how separate populations of cortical neurons project to D1- and D2-SPNs. They use rabies to express ChR2, then patch D1-or D2-SPNs to measure synaptic input. They report that cortical neurons labeled as D1-SPN-projecting preferentially project to D1-SPNs over D2-SPNs. In contrast, cortical neurons labeled as D2-SPN-projecting project equally to D1- and D2-SPNs. They go on to conduct pathway-specific behavioral stimulation experiments. They compare direct optogenetic stimulation of D1- or D2-SPNs to stimulation of MCC inputs to DMS and M1 inputs to DLS. In three different behavioral assays (open field, intra-cranial self-stimulation, and a fixed ratio 8 task), they show that stimulating MCC or M1 cortical inputs to D1-SPNs is similar to D1-SPN stimulation, but that stimulating MCC or M1 cortical inputs to D2-SPNs does not recapitulate the effects of D2-SPN stimulation (presumably because both D1- and D2-SPNs are being activated by these cortical inputs). 

      Strengths: 

      Showing these same effects in three distinct behaviors is strong. Overall, the functional verification of the consequences of the anatomy is very nice to see. It is a good choice to patch only from mCherry-negative non-starter cells in the striatum. This study adds to our understanding of the logic of corticostriatal connections, suggesting a previously unappreciated structure. 

      Weaknesses: 

      One limitation is that all inputs to SPNs are expressing ChR2, so they cannot distinguish between different cortical subregions during patching experiments. Their results could arise because the same innervation patterns are repeated in many cortical subregions or because some subregions have preferential D1-SPN input while others do not. 

      Thank you for raising this thoughtful concern. It is indeed not feasible to restrict ChR2 expression to a specific cortical region using the first-generation rabies-ChR2 system alone. A more refined approach would involve injecting Cre-dependent TVA and RG into the striatum of D1- or A2A-Cre mice, followed by rabies-Flp infection. Subsequently, a Flp-dependent ChR2 virus could be injected into the MCC or M1 to selectively label D1- or D2-projecting cortical neurons. This strategy would allow for more precise targeting and address many of the current limitations.

      However, a significant challenge lies in the cytotoxicity associated with rabies virus infection. Neuronal health begins to deteriorate substantially around 10 days post-infection, which provides an insufficient window for robust Flp-dependent ChR2 expression. We have tested several new rabies virus variants with extended survival times (Chatterjee et al., 2018; Jin et al., 2024), but unfortunately, they did not perform effectively or suitably in the corticostriatal systems we examined.

      In our experimental design, the aim is to delineate the connectivity probabilities from cortical neurons to D1- or D2-SPNs. The hypotheses we considered include the possibility that similar innervation patterns occur across multiple cortical subregions, that some subregions show preferential input to D1-SPNs while others do not, or a combination of both scenarios. This led us to perform a series of behavioral tests using optogenetic activation of the D1- or D2-projecting cortical populations to determine which might be the case.

      In the cortical areas we examined, MCC and M1, the behavioral results are consistent with our electrophysiological results. Specifically, when we stimulated the D1-projecting cortical neurons in either MCC or M1, mice exhibited facilitated locomotion in the open field test, the same effect as activation of D1-SPNs in the striatum alone (MCC: Fig 3C & D vs. I; M1: Fig 3F & G vs. L). Conversely, stimulation of D2-projecting MCC or M1 cortical neurons resulted in behavioral effects that appeared to combine characteristics of both D1- and D2-SPN activation in the striatum (MCC: Fig 3C & D vs. J; M1: Fig 3F & G vs. M). Similar results were observed in the ICSS test. Our interpretation of these results is that the activation of D1-projecting neurons in the cortex induces behavioral changes akin to D1 neuron activation, while activation of D2-projecting neurons in the cortex leads to a combined effect of both D1 and D2 neuron activation. This suggests that at least some cortical regions, namely the ones we tested, follow the hypothesis we proposed.

      There are also some caveats with respect to the efficacy of rabies tracing. Although they only patch non-starter cells in the striatum, only 63% of D1-SPNs receive input from D1-SPN-projecting cortical neurons. It's hard to say whether this is "high" or "low," but one question is how far from the starter cell region they are patching. Without this spatial indication of where the cells that are being patched are relative to the starter population, it is difficult to interpret if the cells being patched are receiving cortical inputs from the same neurons that are projecting to the starter population. The authors indicate they are patching from mCherry-negative neurons within the region of the mCherry-positive neurons, but since the mCherry population will include both true starter cells and monosynaptically connected cells, this is not perfectly precise. Convergence of cortical inputs onto SPNs may vary with distance from the starter cell region quite dramatically, as other mapping studies of corticostriatal inputs have shown specialized local input regions can be defined based on cortical input patterns (Hintiryan et al., Nat Neurosci, 2016, Hunnicutt et al., eLife 2016, Peters et al., Nature, 2021). 

      This is a valid concern regarding anatomical studies. Investigating cortico-striatal connectivity at the single-cell level remains technically challenging due to current methodological limitations. At present, we rely on rabies virus-mediated trans-synaptic retrograde tracing to identify D1- or D2-projecting cortical populations. This anatomical approach is coupled with ex vivo slice electrophysiology to assess the functional connectivity between these projection-defined cortical neurons and striatal SPNs. This enables us to quantify connection ratios, for example, the proportion of D1-projecting cortical neurons that functionally synapse onto non-starter D1-SPNs.

      To ensure the robustness of our conclusions, it is essential that both the starter cells and the recorded non-starter SPNs receive comparable topographical input from the cortex and other brain regions. Therefore, we carefully designed our experiments so that all recorded cells were located within the injection site, were mCherry-negative (i.e., non-starter cells), and were surrounded by ChR2-mCherry-positive neurons. This configuration ensured that the distance between recorded and starter cells did not exceed 100 µm, maintaining close anatomical proximity and thereby preserving the likelihood of shared cortical innervation within the examined circuitry.

      These methodological details are also described in the section on ex vivo brain slice electrophysiology, specifically in the Methods section, lines 453–459:

      “D1-SPNs (eGFP-positive in D1-eGFP mice, or eGFP-negative in D2-eGFP mice) or D2-SPNs (eGFP-positive in D2-eGFP mice, or eGFP-negative in D1-eGFP mice) that were ChR2-mCherry-negative, but in the injection site and surrounded by cells expressing ChR2-mCherry were targeted for recording. This configuration ensured that the distance between recorded and starter cells did not exceed 100 µm, maintaining close anatomical proximity and thereby preserving the likelihood of shared cortical innervation within the examined circuitry.”

      This experimental strategy was implemented to control for potential spatial biases and to enhance the interpretability of our connectivity measurements.

      A caveat for the optogenetic behavioral experiments is that these optogenetic experiments did not include fluorophore-only controls, although a different control (with light delivered in M1) is provided in Supplementary Figure 3. Another point of confusion is that other studies (Cui et al, J Neurosci, 2021) have reported that stimulation of D1-SPNs in DLS inhibits rather than promotes movement. This study may have given different results due to subtly different experimental parameters, including fiber optic placement and NA.

      We appreciate the reviewer’s thoughtful evaluation and comments. We have added a short discussion of Cui et al.’s study on optogenetic stimulation of D1-SPNs in the DLS (lines 341-343), which reports findings that contrast with ours and those of other studies.

      Reviewer #3 (Public review): 

      Review of resubmission: The authors provided a response to the reviews from myself and other reviewers. While some points were made satisfactorily, particularly in clarification of the innervation of cortex to striatum and the effects of input stimulation, many of my points remain unaddressed. In several cases, the authors chose to explain their rationale rather than address the issues at hand. A number of these issues (in fact, the majority) could be addressed simply by toning down the confidence in conclusions, so it was disappointing to see that the authors by and large did not do this. I repeat my concerns below and note whether I find them to have been satisfactorily addressed or not.

      In the manuscript by Klug and colleagues, the investigators use a rabies virus-based methodology to explore potential differences in connectivity from cortical inputs to the dorsal striatum. They report that the connectivity from cortical inputs onto D1 and D2 MSNs differs in terms of their projections onto the opposing cell type, and use these data to infer that there are differences in cross-talk between cortical cells that project to D1 vs. D2 MSNs. Overall, this manuscript adds to the overall body of work indicating that there are differential functions of different striatal pathways which likely arise at least in part by differences in connectivity that have been difficult to resolve due to difficulty in isolating pathways within striatal connectivity, and several interesting and provocative observations were reported. Several different methodologies are used, with partially convergent results, to support their main points. 

      However, I have significant technical concerns about the manuscript as presented that make it difficult for me to interpret the results of the experiments. My comments are below. 

      Major: 

      There is generally a large caveat to the rabies studies performed here, which is that both TVA and the ChR2-expressing rabies virus have the same fluorophore. It is thus essentially impossible to determine how many starter cells there are, what the efficiency of tracing is, and which part of the striatum is being sampled in any given experiment. This is a major caveat given the spatial topography of the cortico-striatal projections. Furthermore, the authors make a point in the introduction about previous studies not having explored absolute numbers of inputs, yet this is not at all controlled in this study. It could be that their rabies virus simply replicates better in D1-MSNs than D2-MSNs. No quantifications are done, and these possibilities do not appear to have been considered. Without a greater standardization of the rabies experiments across conditions, it is difficult to interpret the results. 

      This is still an issue. The authors point out why they chose various vectors. I can understand why the authors chose the fluorophores etc. that they did, yet the issues I raised previously are still valid. The discussion should mention that this is a potential issue. It does not necessarily invalidate results, but it is an issue. Furthermore, it is possible (in all systems) that rabies replicates better/more efficiently in some cells than others. This is one possible interpretation that has not really been explored in any study. I don't suggest the authors attempt to do that, but it should be raised as a potential interpretation. If the rabies results could mean several different things, the authors owe it to the readership to state all possible interpretations of data.

      We thank the reviewer for the comments and suggestions. Because the same fluorophore (mCherry) was used in both TVA- and ChR2-expressing viruses, it was not possible to distinguish true starter SPNs from TVA-only SPNs or monosynaptically labeled SPNs. This limitation makes it difficult to precisely assess the efficiency of rabies labeling and retrograde tracing in our experimental setup. Moreover, differences in rabies replication efficiency between D1- and D2-SPNs could potentially lead to an apparent lower connection probability from D1-projecting cortical neurons to D2-SPNs than from D2-projecting cortical neurons to D1-SPNs. We have added this clarification to the Discussion (lines 280-297).

      The authors claim using a few current clamp optical stimulation experiments that the cortical cells are healthy, but this result was far from comprehensive. For example, membrane resistance, capacitance, general excitability curves, etc are not reported. In Figure S2, some of the conditions look quite different (e.g., S2B, input D2-record D2, the method used yields quite different results that the authors write off as not different). Furthermore, these experiments do not consider the likely sickness and death that occurs in starter cells, as has been reported elsewhere. Health of cells in the circuit is overall a substantial concern that alone could invalidate a large portion, if not all, of the behavioral results. This is a major confound given those neurons are thought to play critical roles in the behaviors being studied. This is a major reason why first-generation rabies viruses have not been used in combination with behavior, but this significant caveat does not appear to have been considered, and controls e.g., uninfected animals, infected with AAV helpers, etc, were not included. 

      This issue remains unaddressed. I did not request clarity about experimental design, but rather, raised issues about the potential effects of toxicity. I believe this to be a valid concern that needs to be discussed in the manuscript, especially given what look visually like potential differences in S2. 

      We understand and appreciate the reviewer’s concern regarding the potential cytotoxicity of rabies virus infection. Although we performed the in vivo optogenetic behavioral experiments during a period when rabies-infected cells are generally considered relatively healthy, some deficits in starter cells may still occur and could contribute to the observed effects of optogenetic cortical stimulation. We have added this clarification to the Discussion (lines 298-306).

      The overall purity (e.g., EnvA pseudotyping efficiency) of the RABV prep is not shown. If there was a virus that was not well EnvA-pseudotyped and thus could directly infect cortical (or other) inputs, it would degrade specificity. This issue has not been addressed. Viral strain is irrelevant. The quality of the specific preparations used is what matters.

      While most of the study focuses on the cortical inputs, in slice recordings, inputs from the thalamus are not considered, yet likely contribute to the observed results. Related to this, in in vivo optogenetic experiments, technically, if the thalamic or other inputs to the dorsal striatum project to the cortex, their method will not only target cortical neurons but also terminals of other excitatory inputs. If this cannot be ruled out, stating that the authors are able to selectively activate the cortical inputs to one or the other population should be toned down.

      The authors added text to the discussion to address this point. While it largely does what is intended, based on the one study cited, I disagree with the authors' conclusions that it is "clear" that potential contamination from other sites does not play a role. The simplest interpretation is the one the authors state, and there is some supporting evidence to back up that assertion, but to me that falls short of making the point "clear" that there are no other interpretations. 

      The statements about specificity of connectivity are not well founded. It may be that in the specific case where they are assessing outside of the area of injections, their conclusions may hold (e.g., excitatory inputs onto D2s have more inputs onto D1s than vice versa). However, how this relates to the actual site of injection is not clear. At face value, if such a connectivity exists, it would suggest that D1-MSNs receive substantially more overall excitatory inputs than D2s. It is thus possible that this observation would not hold over other spatial intervals. This was not explored and thus the conclusions are over-generalized. e.g., the distance from the area of red cells in the striatum to recordings was not quantified, what constituted a high level of cortical labeling was not quantified, etc. Without more rigorous quantification of what was being done, it is difficult to interpret the results. 

      Again, the goal here would be to make a statement about this in the discussion to clarify limitations of the study. I don't expect the authors to re-do all of these experiments, but since they are discussing the corticostriatal circuits, which have multiple subdomains, this remains a relevant point. It has not been addressed. 

      The results in Figure 3 are not well controlled. The authors show contrasting effects of optogenetic stimulation of D1-MSNs and D2-MSNs in the DMS and DLS, results which are largely consistent with the canon of basal ganglia function. However, when stimulating cortical inputs, stimulating the inputs from D1-MSNs gives the expected results (increased locomotion) while stimulating putative inputs to D2-MSNs had no effect. This is not the same as showing a decrease in locomotion - showing no effect here is not possible to interpret. 

      I think that the caveat of showing no clear effects of inputs to D2 stimulation should be pointed out. Yes, I understand that the viruses appeared to express etc., but again it remains possible that the results are driven by a lack of e.g., sufficient ChR2 expression. Aside from a full quantification of the number of cells expressing ChR2, overlap in fiber placement and ChR2 expression (which I don't suggest), this remains a possibility and should be pointed out, as it remains a possibility. 

      In the light of their circuit model, the result showing that inputs to D2-MSNs drive ICSS is confusing. How can the authors account for the fact that these cells are not locomotor-activating, stimulation of their putative downstream cells (D2-MSNs) does not drive ICSS, yet the cortical inputs drive ICSS? Is the idea that these inputs somehow also drive D1s? If this is the case, how do D2s get activated, if all of the cortical inputs tested net activate D1s and not D2s? Same with the results in Figure 4 - the inputs and putative downstream cells do not have the same effects. Given potential caveats of differences in viral efficiency, spatial location of injections, and cellular toxicity, I cannot interpret these experiments. 

      The explanation the authors provide in their rebuttal makes sense, however this should be included in the discussion of the manuscript, as it is interesting and relevant. 

      We thank the reviewer for the valuable comments and suggestions. In line with the reviewer’s recommendation, we have incorporated these explanations into the Discussion (lines 242–279) to help interpret the complex behavioral outcomes of optogenetic stimulation of cortical neurons projecting to D1- or D2-SPNs.

      Reviewer #2 (Recommendations for the authors): 

      I appreciate the authors' responses, which helped clarify some experimental choices. I appreciate that the experiment in Fig S3 serves as a reasonable light control for optogenetics experiments. The careful comparison with methods in Cui et al. (2021) is useful, although not added to the main manuscript. Some of the other citations here don't really address the controversy, e.g. Kravitz et al. is in DMS, but perhaps fully addressing this issue is outside the scope of the current manuscript and awaits further experiments. I also appreciate the clarification for recording locations that "This configuration ensured that the distance between recorded and starter cells did not exceed 100 µm, maintaining close anatomical proximity and thereby preserving the likelihood of shared cortical innervation within the examined circuitry." However, the statement in the reviewer response does not seem to be added to the manuscript's methods, which I think would be helpful. The criteria for choosing recorded cells are still a bit fuzzy without a map of recording locations and histology. There is also a problem that mCherry-positive cells could be starter cells or could be monosynaptically traced cells, so it is hard to know the area of the starter cell population in these experiments for sure. My evaluation of the manuscript remains largely the same as the original. However, I have adjusted my public review a bit to incorporate the authors' responses. I still think this paper has valuable information, suggesting an interesting and previously unappreciated structure of corticostriatal inputs that I hope this group and others will continue to investigate and incorporate into models of basal ganglia function.

      We thank the reviewer for the valuable suggestions. We have now included a comparison with Cui et al. in the Discussion. In addition, we have added the criteria for selecting recorded cells to the Methods section: ‘This configuration ensured that the distance between recorded and starter cells did not exceed 100 µm, maintaining close anatomical proximity and thereby preserving the likelihood of shared cortical innervation within the examined circuitry.’

    1. Author Response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary: 

      This paper applies methods for segmentation, annotation, and visualization of acoustic analysis to zebra finch song. The paper shows that these methods can be used to predict the stage of song development and to quantify acoustic similarity. The methods are solid and are likely to provide a useful tool for scientists aiming to label large datasets of zebra finch vocalizations. The paper has two main parts: 1) establishing a pipeline/ package for analyzing zebra finch birdsong and 2) a method for measuring song imitation. 

      Strengths: 

      It is useful to see existing methods for syllable segmentation compared to new datasets.

      It is useful, but not surprising, that these methods can be used to predict developmental stage, which is strongly associated with syllable temporal structure.

      It is useful to confirm that these methods can identify abnormalities in deafened and isolated songs. 

      Weaknesses: 

      For the first part, the implementation seems to be a wrapper on existing techniques. For instance, the first section talks about syllable segmentation; they made a comparison between WhisperSeg (Gu et al, 2024), TweetyNet (Cohen et al, 2022), and amplitude thresholding. They found that WhisperSeg performed the best, and they included it in the pipeline. They then used WhisperSeg to analyze syllable duration distributions and rhythm of birds of different ages and confirmed past findings on this developmental process (e.g. Aronov et al, 2011). Next, based on the segmentation, they assign labels by performing UMAP and HDBSCAN on the spectrogram (nothing new; that's what people have been doing). Then, based on the labels, they claimed they developed a 'new' visualization - syntax raster (line 180). That was done by Sainburg et al. 2020 in Figure 12E and also in Cohen et al, 2020 - so the claim to have developed 'a new song syntax visualization' is confusing. The rest of the paper is about analyzing the finch data based on AVN features (which are essentially acoustic features already in the classic literature).

      First, we would like to thank this reviewer for their kind comments and feedback on this manuscript. It is true that many of the components of this song analysis pipeline are not entirely novel in isolation. Our real contribution here is bringing them together in a way that allows other researchers to seamlessly apply automated syllable segmentation, clustering, and downstream analyses to their data. That said, our approach to training TweetyNet for syllable segmentation is novel. We trained TweetyNet to recognize vocalizations vs. silence across multiple birds, such that it can generalize to new individual birds, whereas TweetyNet had previously only been used to annotate song syllables from birds included in its training set. Our validation of TweetyNet and WhisperSeg in combination with UMAP and HDBSCAN clustering is also novel, providing valuable information about how these systems interact, and how reliable the completely automatically generated labels are for downstream analysis. We have added a couple of sentences to the introduction to emphasize the novelty of this approach and validation.

      Our syntax raster visualization does resemble Figure 12E in Sainburg et al. 2020; however, it differs in a few important ways, which we believe warrant its consideration as a novel visualization method. First, Sainburg et al. represent the labels across bouts in real time; their position along the x axis reflects the time at which each syllable is produced relative to the start of the bout. By contrast, our visualization considers only the index of syllables within a bout (i.e., first syllable vs. second syllable, etc.) without consideration of the true durations of each syllable or the silent gaps between them. This makes it much easier to detect syntax patterns across bouts, as the added variability of syllable timing is removed. Considering only the sequence of syllables rather than their timing also allows us to more easily align bouts according to the first syllable of a motif, further emphasizing the presence or absence of repeating syllable sequences without interference from the more variable introductory notes at the start of a motif. Finally, instead of plotting all bouts in the order in which they were produced, our visualization orders bouts such that bouts with the same sequence of syllables will be plotted together, which again serves to emphasize the most common syllable sequences that the bird produces. These additional processing steps mean that our syntax raster plot has much starker contrast between birds with stereotyped syntax and birds with more variable syntax, as compared to the more minimally processed visualization in Sainburg et al. 2020. There do not appear to be any similar visualizations in Cohen et al. 2020.
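
      The three processing steps described above (index-based rows, alignment on the first motif syllable, and grouping of identical sequences) can be sketched roughly as follows. This is a minimal illustration and not the AVN implementation: the function name `syntax_raster`, the `motif_start` parameter, and the choice to order groups by frequency are all assumptions made for the example.

```python
from collections import Counter


def syntax_raster(bouts, motif_start):
    """Build raster rows from syllable-label sequences (hypothetical helper).

    bouts: list of lists of syllable labels, one inner list per bout.
    motif_start: label assumed to be the first syllable of the motif.
    Returns a list of equal-length rows padded with None.
    """
    # Position of the alignment syllable in each bout; bouts that lack it
    # are aligned at index 0 (an arbitrary choice for this sketch).
    offsets = [b.index(motif_start) if motif_start in b else 0 for b in bouts]
    max_offset = max(offsets, default=0)

    # Left-pad so the alignment syllable lands in the same column everywhere.
    aligned = [[None] * (max_offset - off) + list(b)
               for b, off in zip(bouts, offsets)]

    # Right-pad to a common width so the rows form a rectangular raster.
    width = max((len(row) for row in aligned), default=0)
    rows = [row + [None] * (width - len(row)) for row in aligned]

    # Plot identical sequences together, most frequent sequence first.
    counts = Counter(tuple(row) for row in rows)
    rows.sort(key=lambda row: (-counts[tuple(row)], [str(x) for x in row]))
    return rows


if __name__ == "__main__":
    example = [
        ["i", "i", "a", "b", "c"],
        ["i", "a", "b", "c"],
        ["i", "i", "a", "b", "c"],
        ["a", "b", "d"],
    ]
    for row in syntax_raster(example, motif_start="a"):
        print(row)
```

      Because only label order is used, syllable and gap durations never enter the raster, which is the property that distinguishes this layout from a real-time plot.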

      The second part may be something new, but there are opportunities to improve the benchmarking. It is about the pupil-tutor imitation analysis. They introduce a convolutional neural network that takes triplets as an input (each triplet is essentially 3 images stacked together such that you have (anchor, positive, negative)). The anchor is a reference spectrogram from, say, finch A; the positive is a different spectrogram with the same label as the anchor from finch A; and the negative is a spectrogram not related to A or with a different syllable label from A. The network is then trained to produce a low-dimensional embedding by ensuring the embedding distance between anchor and positive is less than that between anchor and negative by a certain margin. Based on the embedding, they then made use of earth mover distance to quantify the similarity in the syllable distribution among finches. They then compared their approach performance with that of sound analysis pro (SAP) and a variant of SAP. A more natural comparison, which they didn't include, is with the VAE approach by Goffinet et al. In this paper (https://doi.org/10.7554/eLife.67855, Fig 7), they also attempted to perform an analysis on the tutor pupil song.

      We thank the reviewer for this suggestion. We have included a comparison of our triplet loss embedding model to the VAE model proposed in Goffinet et al. 2021. We also included comparisons of similarity scoring using each of these embedding models combined with either earth mover’s distance (EMD) or maximum mean discrepancy (MMD) to calculate the similarity of the embeddings, as was done in Goffinet et al. 2021. As discussed in the updated results section of the paper and shown in the new Figure 6–figure supplement 1, the Triplet loss model with MMD performs best for evaluating song learning on new birds, not included in model training. We’ve updated the main text of the paper to reflect this switch from EMD to MMD for the primary similarity scoring approach.
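
      For readers unfamiliar with MMD, the following minimal sketch shows how a (biased) squared-MMD estimate can be computed between two sets of embeddings. It assumes a Gaussian (RBF) kernel with a hand-picked bandwidth and uses synthetic 8-dimensional data; the kernel, bandwidth, and variable names are illustrative assumptions, not the settings used in AVN or in Goffinet et al. 2021.

```python
import numpy as np


def rbf_kernel(x, y, bandwidth):
    """Gaussian kernel matrix between the rows of x and the rows of y."""
    sq_dist = (
        np.sum(x**2, axis=1)[:, None]
        + np.sum(y**2, axis=1)[None, :]
        - 2.0 * x @ y.T
    )
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-np.maximum(sq_dist, 0.0) / (2.0 * bandwidth**2))


def mmd_squared(x, y, bandwidth=3.0):
    """Biased estimate of the squared maximum mean discrepancy between x and y."""
    return (
        rbf_kernel(x, x, bandwidth).mean()
        + rbf_kernel(y, y, bandwidth).mean()
        - 2.0 * rbf_kernel(x, y, bandwidth).mean()
    )


# Synthetic stand-ins for syllable embeddings (rows = syllables, 8 dimensions).
rng = np.random.default_rng(0)
tutor = rng.normal(0.0, 1.0, size=(200, 8))
good_pupil = rng.normal(0.0, 1.0, size=(200, 8))  # same distribution as the tutor
poor_pupil = rng.normal(1.5, 1.0, size=(200, 8))  # shifted distribution

print("tutor vs. good pupil:", mmd_squared(tutor, good_pupil))
print("tutor vs. poor pupil:", mmd_squared(tutor, poor_pupil))
```

      A lower value indicates that the pupil's syllable distribution lies closer to the tutor's in the embedding space; in practice the bandwidth would need to be chosen with care.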

      Reviewer #2 (Public Review):

      Summary: 

      In this work, the authors present a new Python software package, Avian Vocalization Network (AVN) aimed at facilitating the analysis of birdsong, especially the song of the zebra finch, the most common songbird model in neuroscience. The package handles some of the most common (and some more advanced) song analyses, including segmentation, syllable classification, featurization of song, calculation of tutor-pupil similarity, and age prediction, with a view toward making the entire process friendlier to experimentalists working in the field.

      For many years, Sound Analysis Pro has served as a standard in the songbird field, the first package to extensively automate songbird analysis and facilitate the computation of acoustic features that have helped define the field. More recently, the increasing popularity of Python as a language, along with the emergence of new machine learning methods, has resulted in a number of new software tools, including the vocalpy ecosystem for audio processing, TweetyNet (for segmentation), t-SNE and UMAP (for visualization), and autoencoder-based approaches for embedding.

      Strengths: 

      The AVN package overlaps several of these earlier efforts, albeit with a focus on more traditional featurization that many experimentalists may find more interpretable than deep learning-based approaches. Among the strengths of the paper are its clarity in explaining the several analyses it facilitates, along with high-quality experiments across multiple public datasets collected from different research groups. As a software package, it is open source, installable via the pip Python package manager, and features high-quality documentation, as well as tutorials. For experimentalists who wish to replicate any of the analyses from the paper, the package is likely to be a useful time saver.

      Weaknesses: 

      I think the potential limitations of the work are predominantly on the software end, with one or two quibbles about the methods.

      First, the software: it's important to note that the package is trying to do many things, of which it is likely to do several well and few comprehensively. Rather than a package that presents a number of new analyses or a new analysis framework, it is more a codification of recipes, some of which are reimplementations of existing work (SAP features), some of which are essentially wrappers around other work (interfacing with WhisperSeg segmentations), and some of which are new (similarity scoring). All of this has value, but in my estimation, it has less value as part of a standalone package and potentially much more as part of an ecosystem like vocalpy that is undergoing continuous development and has long-term support. 

      We appreciate this reviewer’s comments and concerns about the structure of the AVN package and its long-term maintenance. We have considered incorporating AVN into the VocalPy ecosystem but have chosen not to for a few key reasons. (1) AVN was designed with ease of use for experimenters with limited coding experience top of mind. VocalPy provides excellent resources for researchers with some familiarity with object-oriented programming to manage and analyze their datasets; however, we believe it may be challenging for users without such experience to adopt VocalPy quickly. AVN’s ‘recipe’ approach, as you put it, is very easily accessible to new users, and allows users with intermediate coding experience to easily navigate the source code to gain a deeper understanding of the methodology. AVN also consistently outputs processed data in familiar formats (tables in .csv files which can be opened in Excel), in an effort to make it more accessible to new users, something which would be challenging to reconcile with VocalPy’s emphasis on their `dataset` classes. (2) AVN and VocalPy differ in their underlying goals and philosophies when it comes to flexibility vs. standardization of analysis pipelines. VocalPy is designed to facilitate mixing-and-matching of different spectrogram generation, segmentation, annotation etc. approaches, so that researchers can design and implement their own custom analysis pipelines. This flexibility is useful in many cases. For instance, it could allow researchers who have very different noise filtering and annotation needs, like those working with field recordings versus acoustic chamber recordings, to analyze their data using this platform. However, when it comes to comparisons across zebra finch research labs, this flexibility comes at the expense of direct comparison and integration of song features across research groups. This is the context in which AVN is most useful. It presents a single approach to song segmentation, labeling, and featurization that has been shown to generalize well across research groups, and which allows direct comparisons of the resulting features. AVN’s single, extensively validated, standard pipeline approach is fundamentally incompatible with VocalPy’s emphasis on flexibility. We are excited to see how VocalPy continues to evolve in the future, and recognize the value that both AVN and VocalPy bring to the songbird research community, each with their own distinct strengths, weaknesses, and ideal use cases.

      While the code is well-documented, including web-based documentation for both the core package and the GUI, the latter is available only on Windows, which might limit the scope of adoption. 

      We thank the reviewer for their kind words about AVN’s documentation. We recognize that the GUI’s exclusive availability on Windows is a limitation, and we would be happy to collaborate with other researchers and developers in the future to build a Mac-compatible version, should the demand present itself. That said, the Python package works on all operating systems, so non-Windows users still have the ability to use AVN that way.

      That is to say, whether AVN is adopted by the field in the medium term will have much more to do with the quality of its maintenance and responsiveness to users than any particular feature, but I believe that many of the analysis recipes that the authors have carefully worked out may find their way into other code and workflows. 

      Second, two notes about new analysis approaches:

      (1) The authors propose a new means of measuring tutor-pupil similarity based on first learning a latent space of syllables via a self-supervised learning (SSL) scheme and then using the earth mover's distance (EMD) to calculate transport costs between the distributions of tutors' and pupils' syllables. While to my knowledge this exact method has not previously been proposed in birdsong, I suspect it is unlikely to differ substantially from the approach of autoencoding followed by MMD used in the Goffinet et al. paper. That is, SSL, like the autoencoder, is a latent space learning approach, and EMD, like MMD, is an integral probability metric that measures discrepancies between two distributions. (Indeed, the two are very closely related: https://stats.stackexchange.com/questions/400180/earth-movers-distance-andmaximum-mean-discrepency.) Without further experiments, it is hard to tell whether these two approaches differ meaningfully. Likewise, while the authors have trained on a large corpus of syllables to define their latent space in a way that generalizes to new birds, it is unclear why such an approach would not work with other latent space learning methods.  

      We recognize the similarities between these approaches and have included comparisons of the VAE and MMD as in the Goffinet paper to our triplet loss model and EMD.  As discussed in the updated results section of the paper and shown in the new Figure 6–figure supplement 1, the Triplet loss model with MMD performs best for evaluating song learning on new birds, not included in model training. We’ve updated the main text of the paper to reflect this switch from EMD to MMD for the primary similarity scoring approach. 

      (2) The authors propose a new method for maturity scoring by training a model (a generalized additive model) to predict the age of the bird based on a selected subset of acoustic features. This is distinct from the "predicted age" approach of Brudner, Pearson, and Mooney, which predicts based on a latent representation rather than specific features, and the GAM nicely segregates the contribution of each. As such, this approach may be preferred by many users who appreciate its interpretability.  

      In summary, my view is that this is a nice paper detailing a well-executed piece of software whose future impact will be determined by the degree of support and maintenance it receives from others over the near and medium term.

      Reviewer #3 (Public Review):

      Summary: 

      The authors invent song and syllable discrimination tasks they use to train deep networks. These networks they then use as a basis for routine song analysis and song evaluation tasks. For the analysis, they consider both data from their own colony and from another colony the network has not seen during training. They validate the analysis scores of the network against expert human annotators, achieving a correlation of 80-90%. 

      Strengths: 

      (1) Robust Validation and Generalizability: The authors demonstrate a good performance of the AVN across various datasets, including individuals exhibiting deviant behavior. This extensive validation underscores the system's usefulness and broad applicability to zebra finch song analysis, establishing it as a potentially valuable tool for researchers in the field.

      (2) Comprehensive and Standardized Feature Analysis: AVN integrates a comprehensive set of interpretable features commonly used in the study of bird songs. By standardizing the feature extraction method, the AVN facilitates comparative research, allowing for consistent interpretation and comparison of vocal behavior across studies.

      (3) Automation and Ease of Use. By being fully automated, the method is straightforward to apply and should introduce barely an adoption threshold to other labs.

      (4) Human experts were recruited to perform extensive annotations (of vocal segments and of song similarity scores). These annotations released as public datasets are potentially very valuable. 

      Weaknesses: 

      (1) Poorly motivated tasks. The approach is poorly motivated and many assumptions come across as arbitrary. For example, the authors implicitly assume that the task of birdsong comparison is best achieved by a system that optimally discriminates between typical, deaf, and isolated songs. Similarly, the authors assume that song development is best tracked using a system that optimally estimates the age of a bird given its song. My issue is that these are fake tasks since clearly, researchers will know whether a bird is an isolated or a deaf bird, and they will also know the age of a bird, so no machine learning is needed to solve these tasks. Yet, the authors imagine that solving these placeholder tasks will somehow help with measuring important aspects of vocal behavior.  

      We appreciate this reviewer’s concerns and apologize for not providing sufficiently clear rationale for the inclusion of our phenotype classifier and age regression models in the original manuscript. These tasks are not intended to be taken as a final, ultimate culmination of the AVN pipeline. Rather, we consider the carefully engineered set of 55 interpretable features to be AVN’s final output, and these analyses serve merely as examples of how that feature set can be applied. That said, each of these models does have valid experimental use cases that we believe are important and would like to bring to the attention of the reviewer.

      For one, we showed how the LDA model that can discriminate between typical, deaf, and isolate birds’ songs not only allows us to evaluate which features are most important for discriminating between these groups, but also allows comparison of the FoxP1 knock-down (FP1 KD) birds to each of these phenotypes. Based on previous work (Garcia-Oscos et al. 2021), we hypothesized that FP1 KD in these birds specifically impaired tutor song memory formation while sparing a bird’s ability to refine their own vocalizations through auditory feedback. Thus, we would expect their songs to resemble those of isolate birds, who lack a tutor song memory, but not to resemble deaf birds who lack a tutor song memory and auditory feedback of their own vocalizations to guide learning. The LDA model allowed us to make this comparison quantitatively for the first time and confirm our hypothesis that FP1 KD birds’ songs are indeed most like isolates’. In the future, as more research groups publish their birds’ AVN feature sets, we hope to be able to make even more fine-grained comparisons between different groups of birds, either using LDA or other similar interpretable classifiers. 

      The age prediction model also has valid real-world use cases. For instance, one might imagine an experimental manipulation that is hypothesized to accelerate or slow song maturation in juvenile birds. This age prediction model could be applied to the AVN feature sets of birds having undergone such a manipulation to determine whether their predicted ages systematically lead or lag their true biological ages, and which song features are most responsible for this difference. We didn’t have access to data for any such birds for inclusion in this paper, but we hope that others in the future will be able to take inspiration from our methodology and use this or a similar age regression model with AVN features in their research. We have added a couple lines to the ‘Comparing Song Disruptions with AVN Features’ and ‘Tracking Song Development with AVN Features’ sections of the results to make this more clear. 

      Along similar lines, authors assume that a good measure of similarity is one that optimally performs repeated syllable detection (i.e. to discriminate same syllable pairs from different pairs). The authors need to explain why they think these placeholder tasks are good and why no better task can be defined that more closely captures what researchers want to measure. Note: the standard tasks for self-supervised learning are next word or masked word prediction, why are these not used here? 

      This reviewer appears to have misunderstood our similarity scoring embedding model and our rationale for using it. We will explain it in more depth here and have added a paragraph to the ‘Measuring Song Imitation’ section of the results explaining this rationale more briefly.

      First, nowhere are we training a model to discriminate between same and different syllable pairs. The triplet loss network is trained to embed syllables in an 8-dimensional space such that syllables with the same label are closer together than syllables with different labels. The loss function is related to the relative distance between embeddings of syllables with the same or different labels, not the classification of syllables as same or different. This approach was chosen because it has repeatedly been shown to be a useful data compression step (Schroff et al. 2015, Thakur et al. 2019) before further downstream tasks are applied on its output, particularly in contexts where there is little data per class (syllable label). For example, Schroff et al. 2015 trained a deep convolutional neural network with triplet loss to embed images of human faces from the same individual closer together than images of different individuals in a 128-dimensional space. They then used this model to compute 128-dimensional representations of additional face images, not included in training, which were used for individual facial recognition (this is a same vs. different category classifier), and facial clustering, achieving better performance than the previous state of the art. The triplet loss function results in a model that can generate useful embeddings of previously unseen categories, like new individuals’ faces, or new zebra finches’ syllables, which can then be used in downstream analyses. This meaningful, lower dimensional space allows comparisons of distributions of syllables across birds, as in Brainard and Mets 2008, and Goffinet et al. 2021.
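
      To make the distinction from a same/different classifier concrete, here is a minimal sketch of a standard triplet margin loss evaluated on already-computed embeddings. The margin value, the use of unsquared Euclidean distance, and the toy vectors are assumptions for illustration (Schroff et al. use squared distances); this is not the training code used for AVN.

```python
import numpy as np


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Mean triplet margin loss over a batch of embeddings (rows = triplets).

    The loss is zero once each anchor is closer to its positive than to its
    negative by at least `margin`; no same/different label is ever predicted.
    """
    d_pos = np.linalg.norm(anchor - positive, axis=1)
    d_neg = np.linalg.norm(anchor - negative, axis=1)
    return np.maximum(d_pos - d_neg + margin, 0.0).mean()


# Toy 8-dimensional embeddings for a single triplet.
anchor = np.zeros((1, 8))
same_label = np.zeros((1, 8))
same_label[0, 0] = 0.5   # same syllable label, distance 0.5 from the anchor
other_label = np.zeros((1, 8))
other_label[0, 0] = 3.0  # different syllable label, distance 3.0 from the anchor

well_separated = triplet_loss(anchor, same_label, other_label)
badly_embedded = triplet_loss(anchor, other_label, same_label)
print(well_separated, badly_embedded)
```

      Only the relative distances enter the loss, which is why the resulting embedding space can be reused for syllable types and birds that were not seen during training.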

      Next word and masked word prediction are indeed common self-supervised learning tasks for models working with text data, or other data with meaningful sequential organization. That is not the case for our zebra finch syllables, where every bird’s syllable sequence depends only on its tutor’s sequence, and there is no evidence for strong universal syllable sequencing rules (James et al. 2020). Rather, our embedding model is an example of a computer vision task, as it deals with sets of two-dimensional images (spectrograms), not sequences of categorical variables (like text). It is also not, strictly speaking, a self-supervised learning task, as it does require syllable labels to generate the triplets. A common self-supervised approach for dimensionality reduction in a computer vision task such as this one would be to train an autoencoder to compress images to a lower dimensional space, then faithfully reconstruct them from the compressed representation. This has been done using a variational autoencoder trained on zebra finch syllables in Goffinet et al. 2021. In keeping with the suggestions from reviewers #1 and #2, we have included a comparison of our triplet loss model with the Goffinet et al. VAE approach in the revised manuscript.

      (2) The machine learning methodology lacks rigor. The aims of the machine learning pipeline are extremely vague and keep changing like a moving target. Mainly, the deep networks are trained on some tasks but then authors evaluate their performance on different, disconnected tasks. For example, they train both the birdsong comparison method (L263+) and the song similarity method (L318+) on classification tasks. However, they evaluate the former method (LDA) on classification accuracy, but the latter (8-dim embeddings) using a contrast index. In machine learning, usually, a useful task is first defined, then the system is trained on it and then tested on a held-out dataset. If the sensitivity index is important, why does it not serve as a cost function for training?

      Again, this reviewer seems not to understand our similarity scoring methodology. Our similarity scoring model is not trained on a classification task, but rather on an embedding task. It learns to embed spectrograms of syllables in an 8-dimensional space such that syllables with the same label are closer together than syllables with different labels. We could report the loss values for this embedding task on our training and validation datasets, but these wouldn’t have any clear relevance to the downstream task of syllable distribution comparison where we are using the model’s embeddings. We report the contrast index as this has direct relevance to the actual application of the model and allows comparisons to other similarity scoring methods, something that the triplet loss values wouldn’t allow. 

      The triplet loss method was chosen because it has been shown to yield useful low-dimensional representations of data, even in cases where there is limited labeled training data (Thakur et al. 2019). While we have one of the largest manually annotated datasets of zebra finch songs, it is still quite small by industry deep learning standards, which is why we chose a method that would perform well given the size of our dataset. Training a model on a contrast index directly would be extremely computationally intensive and require many more pairs of birds with known relationships than we currently have access to. It could be an interesting approach to take in the future, but one that would be unlikely to perform well with a dataset size typical of songbird research.

      Also, usually, in solid machine learning work, diverse methods are compared against each other to identify their relative strengths. The paper contains almost none of this, e.g. authors examined only one clustering method (HDBSCAN).  

      We did compare multiple methods for syllable segmentation (WhisperSeg, TweetyNet, and amplitude thresholding) as this hadn’t been done previously. We chose not to perform an extensive comparison of different clustering methods, as Sainburg et al. 2020 already did so and we felt no need to duplicate this effort. We encourage this reviewer to refer to Sainburg et al.’s excellent work for comparisons of multiple clustering methods applied to zebra finch song syllables.

      (3) Performance issues. The authors want to 'simplify large-scale behavioral analysis' but it seems they want to do that at a high cost. (Gu et al 2023) achieved syllable scores above 0.99 for adults, which is much larger than the average score of 0.88 achieved here (L121). Similarly, the syllable scores in (Cohen et al 2022) are above 94% (their error rates are below 6%, albeit in Bengalese finches, not zebra finches), which is also better than here. Why is the performance of AVN so low? The low scores of AVN argue in favor of some human labeling and training on each bird.  

      Firstly, the syllable error rate scores reported in Cohen et al. 2022 are calculated very differently than the F1 scores we report here and are based on a model trained with data from the same bird as was used in testing, unlike our more general segmentation approach where the model was tested on different birds than were used in training. Thus, the scores reported in Cohen et al. and the F1 scores that we report cannot be compared. 

      The discrepancy between the F1seg scores reported in Gu et al. 2023 and the segmentation F1 scores that we report is likely due to differences in the underlying datasets. Our UTSW recordings tend to have higher levels of both stationary and non-stationary background noise, which make segmentation more challenging. The recordings from Rockefeller were less contaminated by background noise, and they resulted in slightly higher F1 scores. That said, we believe that the primary factor accounting for this difference in scores with Gu et al. 2023 is the granularity of our ‘ground truth’ syllable segments. In our case, if there was ever any ambiguity as to whether vocal elements should be segmented into two short syllables with a very short gap between them or merged into a single longer syllable, we chose to split them. WhisperSeg had a strong tendency to merge the vocal elements in ambiguous cases such as these. This results in a higher rate of false negative syllable onset detections, reflected in the low recall scores achieved by WhisperSeg (see Figure 2–figure supplement 1b), but still very high precision scores (Figure 2–figure supplement 1a). While WhisperSeg did frequently merge these syllables in a way that differed from our ground truth segmentation, it did so consistently, meaning it had little impact on downstream measures of syntax entropy (Figure 3c) or syllable duration entropy (Figure 3–figure supplement 2a). It is for that reason that, despite a lower F1 score, we still consider AVN’s automatically generated annotations to be sufficiently accurate for downstream analyses.
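
      The effect of merging on precision and recall can be illustrated with a small onset-matching sketch. The 10 ms tolerance, the greedy one-to-one matching rule, and the onset times below are assumptions chosen for illustration; they are not necessarily the evaluation settings used in the manuscript.

```python
def onset_scores(true_onsets, pred_onsets, tolerance=0.01):
    """Precision, recall and F1 for syllable onsets matched within `tolerance` seconds.

    Each ground-truth onset can be matched by at most one predicted onset.
    """
    unmatched = sorted(true_onsets)
    true_positives = 0
    for onset in sorted(pred_onsets):
        match = next((t for t in unmatched if abs(t - onset) <= tolerance), None)
        if match is not None:
            unmatched.remove(match)
            true_positives += 1

    precision = true_positives / len(pred_onsets) if pred_onsets else 0.0
    recall = true_positives / len(true_onsets) if true_onsets else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Ground truth splits two short elements (onsets at 0.25 s and 0.30 s);
# the hypothetical prediction merges them into one syllable starting at 0.25 s.
true_onsets = [0.10, 0.25, 0.30, 0.50]
pred_onsets = [0.10, 0.25, 0.50]

precision, recall, f1 = onset_scores(true_onsets, pred_onsets)
print(precision, recall, f1)
```

      Every predicted onset matches a true onset, so precision stays high, while the onset swallowed by the merge counts as a false negative and lowers recall.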

      Should researchers require a higher degree of accuracy and precision with their annotations (for example, to detect very subtle changes in song before and after an acute manipulation) we suggest they turn toward one of the existing tools for supervised song annotation, such as TweetyNet.

      (4) Texas bias. It is true that comparability across datasets is enhanced when everyone uses the same code. However, the authors' proposal essentially is to replace the bias between labs with a bias towards birds in Texas. The comparison with Rockefeller birds is nice, but it amounts to merely N=1. If birds in Japanese or European labs have evolved different song repertoires, the AVN might not capture the associated song features in these labs well.  

      We appreciate the reviewer’s concern about a bias toward birds from the UTSW colony. However, this paper shows that despite training (for the similarity scoring) and hyperparameter fitting (for the HDBSCAN clustering) on the UTSW birds, AVN performs as well if not better on birds from Rockefeller than from UTSW. To our knowledge, there are no publicly available datasets of annotated zebra finch songs from labs in Europe or in Asia, but we would be happy to validate AVN on such datasets, should they become available. Furthermore, there is no evidence to suggest that there is dramatic drift in zebra finch vocal repertoire between continents which would necessitate such additional validation. While we didn’t have manual annotations for this dataset (which would allow validation of our segmentation and labeling methods), we did apply AVN to recordings shared with us by the Wada lab in Japan, where visual inspection of the resulting annotations suggested comparable accuracy to the UTSW and Rockefeller datasets.

      (5) The paper lacks an analysis of the balance between labor requirement, generalizability, and optimal performance. For tasks such as segmentation and labeling, fine-tuning for each new dataset could potentially enhance the model's accuracy and performance without compromising comparability. E.g. How many hours does it take to annotate hundred song motifs? How much would the performance of AVN increase if the network were to be retrained on these? The paper should be written in more neutral terms, letting researchers reach their own conclusions about how much manual labor they want to put into their data.  

      With standardization and ease of use in mind, we designed AVN specifically to perform fully automated syllable annotation and downstream feature calculations. We believe that we have demonstrated in this manuscript that our fully automated approach is sufficiently reliable for downstream analyses across multiple zebra finch colonies. That said, if researchers require an even higher degree of annotation precision and accuracy, they can turn toward one of the existing methods for supervised song annotation, such as TweetyNet. Incorporating human annotations for each bird processed by AVN is likely to improve its performance, but this would require significant changes to AVN’s methodology, and is outside the scope of our current efforts.

      (6) Full automation may not be everyone's wish. For example, given the highly stereotyped zebra finch songs, it is conceivable that some syllables are consistently mis-segmented or misclassified. Researchers may want to be able to correct such errors, which essentially amounts to fine-tuning AVN. Conceivably, researchers may want to retrain a network like the AVN on their own birds, to obtain a more fine-grained discriminative method.  

      Other methods exist for supervised or human-in-the-loop annotation of zebra finch songs, such as TweetyNet and DAN (Alam et al. 2023). We invite researchers who require a higher degree of accuracy than AVN can provide to explore these alternative approaches for song annotation. Incorporating human feedback into AVN was never the goal of our pipeline, would require significant changes to AVN’s design and is outside the scope of this manuscript.

      (7) The analysis is restricted to song syllables and fails to include calls. No rationale is given for the omission of calls. Also, it is not clear how the analysis deals with repeated syllables in a motif, whether they are treated as two-syllable types or one.  

      It is true that we don’t currently have any dedicated features to describe calls. This could be a useful addition to AVN in the future. 

      What a human expert inspecting a spectrogram would typically call ‘repeated syllables’ in a bout are almost always assigned the same syllable label by the UMAP+HDBSCAN clustering. The syntax analysis module includes features examining the rate of syllable repetitions across syllable types, as mentioned in lines 222-226 of the revised manuscript. See https://avn.readthedocs.io/en/latest/syntax_analysis_demo.html#Syllable-Repetitions for further details.

      (8) It seems not all human annotations have been released and the instruction sets given to experts (how to segment syllables and score songs) are not disclosed. It may well be that the differences in performance between (Gu et al 2023) and (Cohen et al 2022) are due to differences in segmentation tasks, which is why these tasks given to experts need to be clearly spelled out. Also, the downloadable files contain merely labels but no identifier of the expert. The data should be released in such a way that lets other labs adopt their labeling method and cross-check their own labeling accuracy.  

      All human annotations used in this manuscript have indeed been released as part of the accompanying dataset. Syllable annotations are not provided for all pupils and tutors used to validate the similarity scoring, as annotations are not necessary for similarity comparisons. We have expanded our description of our annotation guidelines in the methods section of the revised manuscript. All the annotations were generated by one of two annotators. The second annotator always consulted with the first annotator in cases of ambiguous syllable segmentation or labeling, to ensure that they had consistent annotation styles. Unfortunately, we haven’t retained records about which birds were annotated by which of the two annotators, so we cannot share this information along with the dataset. The data is currently available in a format that should allow other research groups to use our annotations either to train their own annotation systems or check the performance of their existing systems on our annotations.  

      (9) The failure modes are not described. What segmentation errors did they encounter, and what syllable classification errors? It is important to describe the errors to be expected when using the method. 

      As we discussed in our response to this reviewer’s point (3), WhisperSeg has a tendency to merge syllables when the gap between them is very short, which explains its lower recall score compared to its precision on our dataset (Figure 2–figure supplement 1). In rare cases, WhisperSeg also fails to recognize syllables entirely, again lowering its recall score. TweetyNet hardly ever completely ignores syllables, but it does tend to occasionally merge syllables together or over-segment them. Whereas WhisperSeg does this very consistently for the same syllable types within the same bird, TweetyNet merges or splits syllables more inconsistently. This inconsistent merging and splitting has a larger effect on syllable labeling, as manifested in the lower clustering v-measure scores we obtain with TweetyNet compared to WhisperSeg segmentations. TweetyNet also has much lower precision than WhisperSeg, largely because TweetyNet often recognizes background noises (like wing flaps or hopping) as syllables whereas WhisperSeg hardly ever segments non-vocal sounds.

      Many errors in syllable labeling stem from differences in syllable segmentation. For example, if two syllables with labels ‘a’ and ‘b’ in the manual annotation are sometimes segmented as two syllables, but sometimes merged into a single syllable, the clustering is likely to find 3 different syllable types: one corresponding to ‘a’, one corresponding to ‘b’ and one corresponding to ‘ab’ merged. Because of how we align syllables across segmentation schemes for the v-measure calculation, this will look like syllable ‘b’ always has a consistent cluster label (or is missing a label entirely), but syllable ‘a’ can carry two different cluster labels, depending on the segmentation. In certain cases, even in the absence of segmentation errors, a group of syllables bearing the same manual annotation label may be split into 2 or 3 clusters (it is extremely rare for a single manual annotation group to be split into more than 3 clusters). In these cases, it is difficult to conclusively say whether the clustering represents an error, or if it actually captured some meaningful systematic difference between syllables that was missed by the annotator. Finally, sometimes rare syllable types with their own distinct labels in the manual annotation are merged into a single cluster. Most labeling errors can be explained by this kind of merging or splitting of groups relative to the manual annotation, not by occasional misclassifications of one manual label type as another.

      For examples of these types of errors, we encourage this reviewer and readers to refer to the example confusion matrices in Figure 2f and Figure 2–figure supplement 3b&e. We also added two paragraphs to the end of the ‘Accurate, fully unsupervised syllable labeling’ section of the Results in the revised manuscript.
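
      To make the effect of cluster splitting on the v-measure concrete, the following is a minimal sketch in Python that computes homogeneity, completeness, and their harmonic mean from two aligned label lists. The function names and the toy labels are illustrative assumptions, not part of AVN, and the sketch omits the alignment of syllables across segmentation schemes described above.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in nats) of a sequence of labels."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def conditional_entropy(target, given):
    """H(target | given) for two aligned label sequences."""
    n = len(target)
    joint = Counter(zip(given, target))
    given_counts = Counter(given)
    return -sum((c / n) * math.log(c / given_counts[g]) for (g, _), c in joint.items())


def v_measure(manual, cluster):
    """Harmonic mean of homogeneity and completeness."""
    h_manual, h_cluster = entropy(manual), entropy(cluster)
    homogeneity = 1.0 if h_manual == 0 else 1.0 - conditional_entropy(manual, cluster) / h_manual
    completeness = 1.0 if h_cluster == 0 else 1.0 - conditional_entropy(cluster, manual) / h_cluster
    total = homogeneity + completeness
    return 0.0 if total == 0 else 2.0 * homogeneity * completeness / total


# Toy example (hypothetical labels): four 'a' and four 'b' syllables.
manual = list("aaaabbbb")
consistent = [0, 0, 0, 0, 1, 1, 1, 1]  # one cluster per manual label
split = [0, 0, 2, 2, 1, 1, 1, 1]       # 'a' carries two different cluster labels

v_consistent = v_measure(manual, consistent)
v_split = v_measure(manual, split)
print(v_consistent, v_split)
```

      In the toy example every cluster is pure, so homogeneity stays at 1, but manual label ‘a’ is spread over two clusters, so completeness drops and the v-measure falls from 1.0 to about 0.8.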

      (10) Usage of Different Dimensionality Reduction Methods: The pipeline uses two different dimensionality reduction techniques for labeling and similarity comparison - both based on the understanding of the distribution of data in lower-dimensional spaces. However, the reasons for choosing different methods for different tasks are not articulated, nor is there a comparison of their efficacy.  

      We apologize for not making this distinction sufficiently clear in the manuscript and have added a paragraph to the ‘Measuring Song Imitation’ section of the Results explaining the rationale for using an embedding model for similarity scoring.

      We chose to use UMAP for syllable labeling because it is a common embedding methodology to precede hierarchical clustering and has been shown to result in reliable syllable labels for birdsong in the past (Sainburg et al. 2020). However, it is not appropriate for similarity scoring, because comparing EMD or MMD scores between birds requires that all the birds’ syllable distributions exist within the same shared embedding space. This can be achieved by using the same triplet loss-trained neural network model to embed syllables from all birds. It is not practical with UMAP, because all birds whose scores are being compared would need to be embedded in the same UMAP space, as distances between points cannot be compared across UMAPs. In practice, this would mean that every time a new tutor-pupil pair needs to be scored, their syllables would need to be added to a matrix with all previously compared birds’ syllables, a new UMAP would need to be computed, and new EMD or MMD scores between all bird pairs would need to be calculated using their new UMAP embeddings. This is very computationally expensive and quickly becomes infeasible without dedicated high-performance computing infrastructure. It also means that similarity scores couldn’t be compared across papers without recomputing everything each time, whereas EMD and MMD scores obtained with triplet loss embeddings can be compared, provided they use the same trained model (which we provide as part of AVN) to embed their syllables in a common latent space.

      (11) Reproducibility: are the measurements reproducible? Systems like UMAP always find a new embedding given some fixed input, so the output tends to fluctuate.

      There is indeed a stochastic element to UMAP embeddings which will result in different embeddings and therefore different syllable labels across repeated runs with the same input. We observed that v-measure scores were quite consistent within birds across repeated runs of the UMAP, and have added an additional supplementary figure to the revised manuscript showing this (Figure 2–figure supplement 4).

      Reviewer #1 (Recommendations For The Authors):

      (1) Benchmark their similarity score to the method used by Goffinet et al, 2021 from the Pearson group. Such a comparison would be really interesting and useful.  

      This has been added to the paper. 

      (2) Please clarify exactly what is new and what is applied from existing methods to help the reader see the novelty of the paper.  

      We have added more emphasis on the novel aspects of our pipeline to the paper’s introduction. 

      Minor:

      It's unclear if AVN is appropriate as the paper deals only with zebra finch song - the scope is more limited than advertised.

      We assume this is in reference to ‘Birdsong’ in the paper’s title and ‘Avian’ in Avian Vocalization Network. There is a brief discussion of how these methods are likely to perform on other commonly studied songbird species at the end of the discussion section.

      Reviewer #2 (Recommendations For The Authors):

      A few points for the authors to consider that might strengthen or inform the paper:

      (1) In the public review, I detailed some ways in which the SSL+EMD approach is unlikely to be appreciably distinct from the VAE+MMD approach -- in fact, one could mix and match here. It would strengthen the authors' claim if they showed via experiments that their method outperforms VAE+MMD, but in the absence of that, a discussion of the relation between the two is probably warranted.  

      This comparison has been added to the paper.

      (2) ll. 305-310: This loss of accuracy near the edge is expected on general Bayesian grounds. Any regression approach should learn to estimate the conditional mean of the age distribution given the data, so ages estimated from data will be pulled inward toward the location of most training data. This bias is somewhat mitigated in the Brudner paper by a more flexible model, but it's a general (and expected) feature of the approach.

      (3) While the online AVN documentation looks good, it might benefit from a page on design philosophy that lays out how the various modules fit together - something between the tutorials and the nitty-gritty API. That way, users would be able to get a sense of where they should look if they want to harness pieces of functionality beyond the tutorials.

      Thank you for this suggestion. We will add a page on AVN’s design philosophy to the online documentation. 

      (4) While the manuscript does compare AVN to packages like TweetyNet and AVA that share some functionality, it doesn't really mention what's been going on with the vocalpy ecosystem, where the maintainers have been doing a lot to standardize data processing, integrate tools, etc. I would suggest a few words about how AVN might integrate with these efforts.

      We thank the reviewer for this suggestion.

      (5) ll. 333-336: It would be helpful to provide a citation to some of the self-supervised learning literature this procedure is based on. Some citations are provided in methods, but the general approach is worth citing, in my opinion. 

      We have added a paragraph to the results section with more background on self-supervised learning for dimensionality reduction, particularly in the context of similarity scoring.

      (6) One software concern for medium-term maintenance: AVN docs say to use Python 3.8, and GitHub says the package is 3.9 compatible. I also saw in the toml file that 3.10 and above are not supported. It's worth noting that Python 3.9 reaches its end of life in October 2025, so some dependencies may have to be altered or changed for the package to be viable going forward.  

      Thank you for this comment. We will continue to maintain AVN and update its dependencies as needed.

      Minor points:

      (1) It might be good to note that WhisperSeg is a different install from AVN. May be hard for novice users, though there's a web interface that's available. 

      We’ve added a line to the methods section making this clear. 

      (2) Figure 6b: Some text in the y-axis labels is overlapping here. 

      This has been fixed. Thank you for bringing it to our attention. 

      (3) The name of the Python language is always capitalized.  

      We’ve fixed this capitalization error throughout the manuscript. Thank you.

      Reviewer #3 (Recommendations For The Authors):

      (1) I recommend that the authors improve the motivation of the chosen tasks and data or choose new tasks that more clearly speak to the optimizations they want to perform. 

      We have included more details about the motivation for our LDA classification analysis, age prediction model and embedding model for similarity scoring in the results of the revised manuscript, as discussed in more detail in the above responses to this reviewer. Thank you for these suggestions. 

      (2) They need to rigorously report the (classification) scores on the test datasets: these are the scores associated with the cost function used during training.  

      Based on this reviewer’s ‘Weaknesses: 3’ comment in the public reviews, we believe that they are referring to a classification score for the triplet loss model. As we explained in response to that comment, this is not a classification task, therefore there is no classification score to report. The loss function used to train the model was a triplet loss function. While we could report these values, they are not informative for how well this approach would perform in a similarity scoring context, as explained above. As such, we prefer to include contrast index and tutor contrast index scores to compare the models’ performance for similarity scoring, as these are directly relevant to the task and are established in the field for said task.

      (3) They need to explain the reasons for the poor performance (or report on the inconsistencies with previous work) and why they prefer a fully automated system rather than one that needs some fine-tuning on bird-specific data.

      We’ve addressed this comment in the public response to this reviewer’s weakness points 3, 5, and 6. 

      (4) They should consider applying their method to data from Japanese and European labs.  

      We’ve addressed this comment in the public response to this reviewer’s weakness point 4.

      (5) The need to document the failure modes and report all details about the human annotations.  

      We’ve added additional description of the failure modes for our segmentation and labeling approaches in the results section of the revised manuscript.

      Details: 

      The introduction is very vague, it fails to make a clear case of what the problem is and what the approach is. It reads a bit like an advertisement for machine learning: we are given a hammer and are looking for a nail.  

      We thank the reviewer for this viewpoint; however, we disagree and have decided to keep our Introduction largely unchanged. 

      L46 That interpretability is needed to maximize the benefits of machine learning is wrong, see self-driving cars and chat GPT.  

      This line states that ‘To truly maximize the benefits of machine learning and deep learning methods for behavior analysis, their power must be balanced with interpretability and generalizability’. We firmly believe that interpretability is critically important when using machine learning tools to gain a deeper scientific understanding of data, including animal behavior data in a neuroscience context. We believe that the introduction and discussion of this paper already provide strong evidence for this claim. 

      L64 What about zebra finches that repeat a syllable in the motif, how are repetitions dealt with by AVN?  

      This is already described in the results section in lines 222-226, and in the methods in the ‘Syntax Features: Repetition Bouts’ section.

      L107 Say a bit more here, what exactly has been annotated?  

      We’ve added a sentence in the introduction to clarify this. Line 113-115. 

      L112 Define spectrogram frames. Do these always fully or sometimes partially contain a vocalization? 

      Spectrogram frames are individual time bins used to compute the spectrogram using a short-time Fourier transform. As described in the ‘Methods: Labeling: UMAP Dimensionality Reduction’ section, our spectrograms are computed using ‘The short term Fourier transform of the normalized audio for each syllable […] with a window length of 512 samples and a hop length of 128 samples’. Given that the song files have a standard sampling rate of 44.1 kHz, this means each time bin represents 11.6 ms of song data, with successive frames advancing in time by 2.9 ms. Each frame therefore contains only a small fraction of a vocalization.
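
      The frame durations quoted above follow directly from the window length, hop length and sampling rate. The short Python sketch below reproduces that arithmetic; the helper names are hypothetical, and the frame count assumes no padding at the edges of the audio, which may differ from the convention used by a given spectrogram implementation.

```python
SAMPLE_RATE_HZ = 44_100  # sampling rate of the song files
WINDOW_LENGTH = 512      # samples per spectrogram frame
HOP_LENGTH = 128         # samples between the starts of successive frames


def samples_to_ms(n_samples, sample_rate_hz=SAMPLE_RATE_HZ):
    """Convert a number of audio samples to a duration in milliseconds."""
    return 1000.0 * n_samples / sample_rate_hz


def n_full_frames(n_samples, window=WINDOW_LENGTH, hop=HOP_LENGTH):
    """Number of complete windows that fit in the audio, assuming no padding."""
    if n_samples < window:
        return 0
    return 1 + (n_samples - window) // hop


frame_ms = samples_to_ms(WINDOW_LENGTH)  # about 11.6 ms per frame
hop_ms = samples_to_ms(HOP_LENGTH)       # about 2.9 ms between frame starts

# A hypothetical 100 ms syllable spans 4410 samples at 44.1 kHz.
frames_in_100ms = n_full_frames(4410)

print(round(frame_ms, 1), round(hop_ms, 1), frames_in_100ms)
```

      Under these assumptions, a 100 ms syllable is covered by a few dozen overlapping frames, which is why any single frame holds only a small part of a vocalization.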

      L122 The reported TweetyNet score of 0.824 is lower than the one reported in Figure 2a.  

      The center line in the box plot in Figure 2a represents the median of the distribution of TweetyNet v-measure scores. Given that there are a couple of outlying birds with very low scores, the mean (0.824 as reported in the text of the results section) is lower than the median. This is not an error.
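
      The gap between mean and median can be illustrated with a small Python sketch. The scores below are made-up values chosen only to show the effect of a couple of low outliers; they are not the per-bird v-measure scores from the manuscript.

```python
import statistics

# Hypothetical per-bird scores: most birds score high, two score very low.
scores = [0.92, 0.91, 0.90, 0.89, 0.88, 0.45, 0.40]

mean_score = statistics.mean(scores)      # pulled down by the two outliers
median_score = statistics.median(scores)  # middle value, unaffected by outlier size

print(f"mean={mean_score:.3f}, median={median_score:.3f}")
```

      Because the median depends only on the rank order of the scores, the two low values shift the mean but leave the median at the middle of the high-scoring group.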

      L155 Some of the differences in performance are very small, reporting of the P value might be necessary. 

      The differences between these methods’ validation scores are unlikely to be statistically significant. This doesn’t mean that we cannot use the mean/median values reported to justify favoring one method over another. This is why we’ve chosen not to report p-values here.

      L161 The authors have not really tested more than a single clustering method, failing to show a serious attempt to achieve good performance.  

      We’ve addressed this comment in the public response to this reviewer’s weakness point 2.

      L186 Did isolate birds produce stereotyped syllables that can be clustered? 

      Yes, they did. The validation for clustering of isolate bird songs can be found in Figure 2–figure supplement 4. 

      Fig. 3e: How were the multiple bouts aligned?

      This is described in lines 857-876 in the ‘Methods: Song Timing Features: Rhythm Spectrograms’ section of the paper.

      L199 There is a space missing in front of (n=8).  

      Thank you for bringing this to our attention. It’s been corrected in the updated manuscript. 

      L268 Define classification accuracy.  

      We’ve added a sentence in lines 953-954 of the methods section defining classification accuracy. 

      L325 How many motifs need to be identified, why does this need to be done manually? There are semiautomated methods that can allow scaling, these should be cited here. Also, the mention of bias here should be removed in favor of a more extensive discussion on the experimenter bias (traditionally) vs Texas bias (in this paper).

      All of the methods cited in this line have graphical user interfaces that require users to select a file containing song and manually highlight the start and end of each motif to be compared. The exact number of motifs required varies depending on the specific context (e.g. more examples are needed to detect more subtle differences or changes in song similarity) but it is fairly standard for reviewers to score 30 – 100 pairs of motifs.

      We’ve discussed the tradeoffs between full automation and supervised or human-in-the-loop methods in response to this reviewer’s public comment ‘weakness #5 and 6’. Briefly, AVN’s aim is to standardize song analysis, to allow direct comparisons between song features and similarity scores across research groups. We believe, as explained in the paper, that this can be best achieved by having different research groups use the same deep learning models, which perform consistently well across those groups. Introducing semi-automated methods would defeat this benefit of AVN.

      We’ve also addressed the question of ‘Texas bias’ in response to this reviewer’s public comment ‘Weakness #4’.

      L340 How is EMD applied? Syllables are points in 8-dim space, but now suddenly authors talk about distributions without explaining how they got from points to distributions. Same in L925.  

      We apologize for the confusion here. The syllable points in the 8-d space are collectively an empirical distribution, not a probability distribution. We referred to them simply as ‘distributions’ to limit technical jargon in the results of the paper, but have changed this to more precise language in the revised manuscript.
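
      As a minimal sketch of how a set of embedded syllables can be treated as an empirical distribution, the Python example below gives each point equal weight and computes the EMD between two equal-sized point sets by brute force over all one-to-one matchings. This is an illustration under stated assumptions, not the AVN implementation: the function name is hypothetical, the points are 2-dimensional rather than 8-dimensional for readability, the ground distance is assumed to be Euclidean, and the brute-force search is only feasible for a handful of points.

```python
import math
from itertools import permutations


def emd_equal_size(xs, ys):
    """EMD between two equal-sized point sets with uniform weights.

    With equal weights and equal set sizes, an optimal transport plan is a
    one-to-one matching, so the EMD is the smallest mean matched distance.
    """
    if len(xs) != len(ys):
        raise ValueError("this sketch only handles equal-sized point sets")
    n = len(xs)
    best_total = min(
        sum(math.dist(x, ys[j]) for x, j in zip(xs, perm))
        for perm in permutations(range(n))
    )
    return best_total / n


# Hypothetical embedded syllables for a pupil and a tutor.
pupil = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0)]
tutor = [(3.0, 0.0), (4.0, 0.0), (3.0, 2.0)]  # the pupil's points shifted by (3, 0)

emd_self = emd_equal_size(pupil, pupil)
emd_shifted = emd_equal_size(pupil, tutor)
print(emd_self, emd_shifted)
```

      Identical point sets give an EMD of zero, and shifting every point by the same vector gives an EMD equal to the length of that shift, which matches the intuition of EMD as the average distance that mass has to be moved.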

      L351 Why do authors now use 'contrast index' to measure performance and no longer 'classification accuracy'?  

      We’ve addressed this comment in the public response to this reviewer’s weakness points 1 and 2.

      Figure 6 What is the confusion matrix, i.e. how well can the model identify pupil-pupil pairings from pupil-tutor and from pupil-unrelated pairings? I guess that would amount to something like classification accuracy.

      There is no model classifying comparisons as pupil-pupil vs. pupil-tutor etc. These comparisons exist only to show the behavior of the similarity scoring approach, which consists of a dissimilarity measure (MMD or EMD) applied to low dimensional representations of syllables generated by the triplet loss model or VAE. This was clarified further in our public response to this reviewer’s weakness points 1 and 2.

      L487 What are 'song files', and what do they contain?   

      ‘Song files’ are .wav files containing recordings of zebra finch song. They typically contain a single song bout, but they can include multiple song bouts if they are produced close together, or incomplete song bouts if the introductory notes were very soft or the bouts were very long (>30s from the start of the file). Details of these recordings are provided in the ‘Methods: Data Acquisition: UTSW Dataset’ section of the manuscript.

      L497 Calls were only labelled for tweetynet but not for other tasks.  

      That is correct. The rationale for this is provided in the ‘Methods: Manual Song Annotation’ section of the manuscript. 

      L637 There is a contradiction (can something be assigned to the 'own manual annotation category' when the same sentence states that this is done 'without manual annotation'?) 

      We believe there is confusion here between automated annotation and validation. Any bird can be automatically annotated without the need for any existing manual annotations for that individual bird. However, manual labels are required to compare automatically generated annotations against for validation of the method.

      L970 Spectrograms of what? (what is the beginning of a song bout, L972).

      The beginning of a song bout is the first introductory note produced by a bird after a period without vocalizations. This is standard.

    1. Author response:

      Reviewer #1 (Public review):

      Summary:

      This study investigates how collective navigation improvements arise in homing pigeons. Building on the Sasaki & Biro (2017) experiment on homing pigeons, the authors use simulations to test seven candidate social learning strategies of varying cognitive complexity, ranging from simple route averaging to potentially cognitively demanding selective propagation of superior routes. They show that only the simplest strategy, equal route averaging, quantitatively matches the experimental data in both route efficiency and social weighting. More complex strategies, while potentially more effective, fail to align with the observed data. The authors also introduce the concept of "effective group size," showing that the chaining design leads to a strong dilution of earlier individuals' contributions. Overall, they conclude that cognitive simplicity rather than cumulative cultural evolution explains collective route improvements in pigeons.

      Strengths:

      The manuscript addresses an important question and provides a compelling argument that a simpler hypothesis is necessary and sufficient to explain findings of a recent influential study on pigeon route improvements, via a rigorous systematic comparison of seven alternative hypotheses. The authors should be commended for their willingness to critically re-examine established interpretations. The introduction and discussion are broad and link pigeon navigation to general debates on social learning, wisdom of crowds, and CCE.

      We thank the reviewer for their positive comments.

      Weaknesses:

      The lack of availability of codes and data for this manuscript, especially given that it critically examines and proposes alternative hypotheses for an important published work.

      We thank the reviewer for their comment. The code and data for our manuscript are an important aspect of the study, and we had intended to make them publicly available upon publication. The link to our code and data on figshare can be found here: (https://doi.org/10.6084/m9.figshare.28950032.v1). We will further add this link to the Data Availability Statement of our revised version.  

      Reviewer #2 (Public review):

      Summary:

      The manuscript investigates which social navigation mechanisms, with different cognitive demands, can explain experimental data collected from homing pigeons. Interestingly, the results indicate that the simplest strategy - route averaging - aligns best with the experimental data, while the most demanding strategy - selectively propagating the best route - offers no advantage. Further, the results suggest that a mixed strategy of weighted averaging may provide significant improvements.

      The manuscript addresses the important problem of identifying possible mechanisms that could explain observed animal behavior by systematically comparing different candidate models. A core aspect of the study is the calculation of collective routes from individual bird routes using different models that were hypothesized to be employed by the animals, but which differ in their cognitive demands.

      The manuscript is well-written, with high-quality figures supporting both the description of the approach taken and the presentation of results. The results should be of interest to a broad community of researchers investigating (collective) animal behavior, ranging from experiment to theory. The general approach and mathematical methods appear reasonable and show no obvious flaws. The statistical methods also appear sound.

      Strengths:

      The main strength of the manuscript is the systematic comparison of different meta-mechanisms for social navigation by modeling social trajectories from solitary trajectories and directly comparing them with experimental results on social navigation. The results show that the experimentally observed behavior could, in principle, arise from simple route averaging without the need to identify "knowledgeable" individuals. Another strength of the work is the establishment of a connection between social navigation behavior and the broader literature on the wisdom of crowds through the concept of effective group size.

      We thank the reviewer for their positive comments.

      Weaknesses:

      However, there are two main weaknesses that should be addressed:

      (1) The first concerns the definition of "mechanism" as used by the authors, for example, when writing "navigation mechanism." Intuitively, one might assume that what is meant is a behavioral mechanism in the sense of how behavior is generated as a dynamic process. However, here it is used at a more abstract (meta) level, referring to high-level categories such as "averaging" versus "leader-follower" dynamics. It is not used in the sense of how an individual makes decisions while moving, where the actual route followed in a social context emerges from individuals navigating while simultaneously interacting with conspecifics in space and time. In the presented work, the approach is to directly combine (global) route data of solitary birds according to the considered "meta-mechanisms" to generate social trajectories. Of course, this is not how pigeon social navigation actually works: they do not sit together before the flight and say, "This is my route, this is your route, let's combine them in this way." A mechanistic modeling approach would instead be some form of agent-based model that describes how agents move and interact in space and time. Such a "bottom-up" approach, however, has its drawbacks, including many unknown parameters and often strongly simplifying (implicit) assumptions. I do not expect the authors to conduct agent-based modeling, but at the very least, they should clearly discuss what they mean by "mechanism" and clarify that while their approach has advantages, such as naturally accounting for the statistical features of solitary routes and allowing a direct comparison of different meta-mechanisms, it is also limited, as it does not address how behavior is actually generated. For example, the approach lacks any explicit modeling of errors, uncertainty, or stochasticity more broadly (e.g., due to environmental influences). Thus, while the presented study yields some interesting results, it can only be considered an intermediate step toward understanding actual behavioral mechanisms.

      We thank the reviewer for their comment and thoughtful suggestions. We agree that the inherent behavioral mechanisms and the biological basis of these mechanisms cannot be determined through the navigational data alone. For instance, it remains unexplored whether pigeons are adapting their behavior based only on social cues from their partners or using other navigational features such as landmarks or roads, the location of the sun, geomagnetic cues or previously learnt routes. However, we do agree (as also pointed out by the reviewer) that these behavioral rules generate an emergent ‘meta-mechanism’ where the bird pairs are behaving as if their preferred routes are averaged during a flight. It will be important in future work to explore the biological basis of these mechanisms, but our current approach allows us to describe the mechanisms with any confidence only in a meta sense. Considering this, we believe that our analysis is a more top-down approach towards describing the outcomes of these underlying mechanisms in an abstract sense. We would also like to point the reviewer to Dalmaijer, 2024 [1], who used a bottom-up approach with naive agents and showed that cumulative route improvements emerged in the absence of any sophisticated communication in the same dataset, in agreement with our approach. Considering these points, we will make changes in our revised version to clearly elaborate on what the definition of ‘mechanism’ should include in line with the reviewer’s feedback.

      (2) While the presented study raises important questions about the applicability and viability of cumulative cultural evolution (CCE) in explaining certain animal behaviors such as social navigation, I find that it falls short in discussing them. What are the implications regarding the applicability of CCE to animal data and to previously claimed experimental evidence for CCE? Should these experiments be re-analyzed or critically reassessed? If not, why? What are good examples from animal behavior where CCE should not be doubted? Furthermore, what about the cited definitions and criteria of CCE? Are they potentially too restrictive? Should they be revised, and if so, how? Conversely, if the definitions become too general, is CCE still a useful concept for studying certain classes of animal behavior? I think these are some of the very important questions that could be addressed or at least raised in the discussion to initiate a broader debate within the community.

      We thank the reviewer for their comments and interesting questions regarding our study. We agree with the reviewer that our study opens up new avenues for critically analysing the criteria previous studies have used for providing evidence of CCE in non-human animals. In our literature review, we found that the field has usually thought about CCE in a ‘process’-focused manner (Reindl et al. [2]), with regard to individuals being able to compare strategies and select those resulting in higher individual fitness. This preferential selection of strategies, termed innovations, allows for the stereotypical ratcheting effect seen in CCE. In our study, we propose that in the case of homing pigeons, the ratcheting effect is more of a statistical outcome than a deliberate individual judgement. We believe that this strategy is also amenable to certain task types (which in our study was homing route choice) and may change for others (for example solving a puzzle box), and that the task also needs to be sufficiently complex for animals to benefit from the use of social information (Caldwell et al. 2008 [3]). Thus, we recommend that future work address which classes of problems fit well within the definition of “emergent” CCE and which ones don’t. Keeping this framework in mind, studies should clearly state what definition of CCE they are using and should be critically evaluated for their underlying task type and cognitive mechanisms before being deemed CCE. Considering these points, we will expand our discussion to highlight these key questions, which could be critical to consider in future research.

      References:

      (1) Dalmaijer ES (2024) Cumulative route improvements spontaneously emerge in artificial navigators even in the absence of sophisticated communication or thought. PLoS Biol. 22:e3002644.

      (2) Reindl, E., Gwilliams, A.L., Dean, L.G. et al. (2020) Skills and motivations underlying children’s cumulative cultural learning: case not closed. Palgrave Commun 6, 106.

      (3) Caldwell CA, Millen AE (2008) Studying cumulative cultural evolution in the laboratory. Phil. Trans. R. Soc. B 363:3529-3539.

    1. Author response:

      We thank the reviewers for their detailed and thoughtful comments on the manuscript.  In general, the reviewers found the data supporting the role of Enterovirus D68 proteases in disrupting the composition of the nuclear pore complex, the 2A protease disrupting nucleocytoplasmic transport of protein cargoes, and the mechanistic dissection of this process to be convincing and potentially relevant to the pathogenesis of AFM.  Reviewers requested additional experiments evaluating our observation that RNA export was not similarly impaired, particularly in the context of viral infection rather than solely expression of recombinant proteases.  They also requested that cleavage of POM121 and Nup98 by 2A protease, which was demonstrated in 2A<sup>pro</sup> transfected cells and in biochemical assays, also be demonstrated in motor neurons infected by EV-D68.  Finally, reviewers noted that while suggestive, the evidence falls short of demonstrating that the toxicity of 2A<sup>pro</sup> is mediated through nuclear pore complex dysfunction.

      To address these critiques, we aim to do the following:

      (1) Determine the impact of live virus infection on RNA export by repeating the ethinyl uridine pulse-chase assay in the setting of live virus infection.  We will also provide representative images for these data and the previously reported data from transfection with GFP-2A<sup>pro</sup> and GFP-3C<sup>pro</sup>.

      (2) Evaluate cleavage of POM121 and Nup98 in EV-D68-infected diMNs and inhibition of cleavage by telaprevir by Western blot.

      (3) Present motor neuron survival data in figure 4 as separate graphs for each of the viral strains tested, rather than pooling the data.  To clarify reviewer #3’s concern, these were not mixed cultures.

      We agree that we have not demonstrated conclusively that the mechanism by which 2A<sup>pro</sup> is toxic to motor neurons is via NPC dysfunction.  Future work will determine the extent to which NPC dysfunction contributes to 2A<sup>pro</sup>-mediated motor neuron toxicity versus other potential targets of 2A<sup>pro</sup>.  We feel that the additional experiments required to achieve this will be extensive and are beyond the scope of the present manuscript, which represents a key first step in this line of inquiry.

      In addition to the above, there were several points of disagreement between reviewers.  We would like to respond to those as follows:

      Reviewer #1: “The hypothesis that infection of motoneurons is the cause of EVD68-induced neurological complications so far is supported by only one autopsy report.  Other data suggest that infection of other cell types, such as astrocytes, and/or inflammatory cell infiltration in the CNS, are likely to be responsible for the symptoms.”

      Reviewer #3: “This study opens up a very intriguing hypothesis: that EV-D68 2Apro could be directly responsible for motor neuron cell death, mediated by POM121 and possibly Nup98 cleavage, that ultimately results in paralysis known as acute flaccid myelitis. This hypothesis notably does run counter to other published data showing that human neuronal organoids derived from iPSCs can support productive EV-D68 infection for weeks without cell death and that EV-D68-infected mice can have paralysis prevented by depletion of CD8 T cells, still with EV-D68 infection of the spinal cord. However, even if 2Apro is not ultimately responsible for motor neurons dying in human infections, that does not exclude the possibility that cleavage of nups could still disrupt motor neuron function. Notably, most children with AFM have some amount of motor function return after their acute period of paralysis, but most still have some residual paralysis for years to life. It is possible that 2A pro could mediate the acute onset of weakness, while T cells killing neurons could determine the amount of long-term, residual paralysis.”

      The infection of motor neurons is strongly supported not only by the aforementioned autopsy data[1], but also by mouse model data demonstrating replication of EV-D68 within motor neurons in the anterior horn of the spinal cord.[2] There are also extensive reports of electromyography and nerve conduction studies from human AFM patients demonstrating that the site of pathology is the spinal motor neuron.[3-10] By contrast, infection of astrocytes has been demonstrated only in primary murine astrocyte cultures in which no neurons were present.[11] Therefore, while the available data suggest that EV-D68 infection of astrocytes is possible, in the in vivo context of human and mouse spinal cord, tropism to motor neurons appears to be preferential.  The relative contributions to toxicity of neuron-autonomous vs non-autonomous processes such as glial dysfunction and inflammatory cell infiltration remain to be elucidated, and are not mutually exclusive.

      Our working hypothesis is more in line with that of Reviewer #3.  Motor neuron dysfunction and motor neuron death may ultimately prove to have dissociable causes, each of which may be neuron-autonomous, non-neuron-autonomous, or a mixture thereof.  The infection of motor neurons is likely the initiating event, with multiple downstream consequences.  Much additional work will be required to resolve this controversy.

      Reviewer #1: “Demonstrates a therapeutic effect of telaprevir, with neuroprotection independent of viral replication inhibition, adding translational value to the findings.”

      Reviewer #3: “The authors' claim that the neuroprotective effect of telaprevir is independent of its antiviral effect is not well-founded. Figure 4E (neuroprotection) was done with MOI 5, and Figure 4G (virus growth) was MOI 0.5. Telaprevir neuroprotection is not shown at MOI 0.5, nor is the neuroprotective effect correlated with inhibition of 2A cleavage of Nup98 or POM121.”

      The selection of MOIs for these two experiments was limited by technical considerations.  If the viral growth curve were to be performed at MOI 5, it would be confounded by cell death.  Further, a low MOI is required in order to allow multiple rounds of infection, replication, and spread within the culture, and is therefore more sensitive for assaying the effect of telaprevir on viral replication.  On the other hand, at MOI 0.5 diMN death is very gradual, and in the neuroprotection assay we would have lacked the statistical power to determine whether a rescue of this small magnitude of toxicity is significant.  The EC<sub>50</sub> of telaprevir is not expected to vary significantly at different MOIs.
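
      The MOI argument above can be illustrated with a minimal sketch. It assumes that the number of infectious particles per cell follows a Poisson distribution with mean equal to the MOI, a common textbook approximation that is not stated in the response itself; the function name `fraction_infected` is hypothetical and used only for illustration.

```python
import math


def fraction_infected(moi: float) -> float:
    """Expected fraction of cells receiving at least one infectious particle.

    Assumption (not from the response): particles per cell are Poisson
    distributed with mean equal to the MOI, so P(uninfected) = exp(-MOI).
    """
    return 1.0 - math.exp(-moi)


# Compare the two MOIs discussed in the response.
for moi in (0.5, 5.0):
    print(f"MOI {moi}: {fraction_infected(moi):.1%} of cells infected in the first round")
```

      Under this assumption, roughly 39% of cells would be infected in the first round at MOI 0.5, compared with more than 99% at MOI 5, which is consistent with the point that only the lower MOI leaves room for further rounds of replication and spread within the culture.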

      References:

      (1) Vogt, M. R. et al. Enterovirus D68 in the Anterior Horn Cells of a Child with Acute Flaccid Myelitis. N Engl J Med 386, 2059-2060 (2022). https://doi.org/10.1056/NEJMc2118155

      (2) Hixon, A. M. et al. A mouse model of paralytic myelitis caused by enterovirus D68. PLoS Pathog 13, e1006199 (2017). https://doi.org/10.1371/journal.ppat.1006199

      (3) Andersen, E. W., Kornberg, A. J., Freeman, J. L., Leventer, R. J. & Ryan, M. M. Acute flaccid myelitis in childhood: a retrospective cohort study. Eur J Neurol 24, 1077-1083 (2017). https://doi.org/10.1111/ene.13345

      (4) Elrick, M. J. et al. Clinical Subpopulations in a Sample of North American Children Diagnosed With Acute Flaccid Myelitis, 2012-2016. JAMA Pediatr 173, 134-139 (2018). https://doi.org/10.1001/jamapediatrics.2018.4890

      (5) Hovden, I. A. & Pfeiffer, H. C. Electrodiagnostic findings in acute flaccid myelitis related to enterovirus D68. Muscle Nerve 52, 909-910 (2015). https://doi.org/10.1002/mus.24738

      (6) Knoester, M. et al. Twenty-Nine Cases of Enterovirus-D68 Associated Acute Flaccid Myelitis in Europe 2016; A Case Series and Epidemiologic Overview. Pediatr Infect Dis J 38, 16-21 (2018). https://doi.org/10.1097/INF.0000000000002188

      (7) Martin, J. A. et al. Outcomes of Colorado children with acute flaccid myelitis at 1 year. Neurology 89, 129-137 (2017). https://doi.org/10.1212/WNL.0000000000004081

      (8) Saltzman, E. B. et al. Nerve Transfers for Enterovirus D68-Associated Acute Flaccid Myelitis: A Case Series. Pediatr Neurol 88, 25-30 (2018). https://doi.org/10.1016/j.pediatrneurol.2018.07.018

      (9) Van Haren, K. et al. Acute Flaccid Myelitis of Unknown Etiology in California, 2012-2015. JAMA 314, 2663-2671 (2015). https://doi.org/10.1001/jama.2015.17275

      (10) Natera-de Benito, D. et al. Acute Flaccid Myelitis With Early, Severe Compound Muscle Action Potential Amplitude Reduction: A 3-Year Follow-up of a Child Patient. J Clin Neuromuscul Dis 20, 100-101 (2018). https://doi.org/10.1097/CND.0000000000000217

      (11) Rosenfeld, A. B., Warren, A. L. & Racaniello, V. R. Neurotropism of Enterovirus D68 Isolates Is Independent of Sialic Acid and Is Not a Recently Acquired Phenotype. Mbio (2019). https://doi.org/10.1128/mBio

    1. Author response:

      Reviewer #1 (Public review):

      For summary:

      Thank you for your insightful and rigorous review. We fully agree with your core concern: establishing a causal link between MORC2 phase separation (PS) and its gene regulatory function is not only a key need in the phase separation field but also essential to elevating the overall utility of our work. To resolve the current gap in causal evidence, we will design experiments that explicitly distinguish the role of phase-separated condensates from soluble MORC2 complexes: We will generate a phase-separation-deficient but dimerization-competent MORC2 mutant by mutating key hydrophobic residues in the IDRa region (critical for IDR-IBD multivalent interactions driving phase separation) without disrupting the CC3 domain’s dimerization interface. In addition, we plan to investigate whether introducing a KS sequence[1] at the C-terminus can effectively attenuate the phase separation propensity of MORC2. These mutants will allow us to decouple “phase separation capacity” from “protein dimerization” (a prerequisite for both soluble complex formation and condensates).

      For strengths:

      We appreciate the reviewer’s recognition of our characterization of MORC2 phase separation and its structural basis. Our understanding of the CW domain’s function remains preliminary. Although we observed that the CW domain can influence condensate size, the IDR, IBD, and CC3 domains constitute the core structural elements driving phase separation. Consequently, the CW domain was not a primary focus of the current study. Nonetheless, investigating its functional contributions represents an interesting avenue for future work.

      For weaknesses:

      (1) We appreciate the reviewer’s rigorous concern. Our RNA-seq data were generated from fully independent transfections performed in triplicate across different time points and cell culture batches, aiming to maximize sample independence. However, for sensitive sequencing experiments, we observed that variability in transfection efficiency and cell culture across batches can introduce experimental differences, resulting in variable regulation of differentially expressed genes across samples. During differential gene analysis, p-value filtering excluded an additional 40 overlapping genes. In total, 61 genes overlapped with those reported in reference 22[2] (ZNF91, ZNF721, ZNF66, ZNF493, ZNF462, ZNF221, ZNF121, VGLL3, TUFT1, TLE4, TGFB2, SYS1-DBNDD2, STXBP6, SPRY2, SAMD9, ROR1, PTGES, PLK2, PLCXD2, PEA15, PDE2A, OLR1, NYAP2, NTN4, NRXN3, NEXN, MYLK, MPP7, MDGA1, MAMDC2, LBH, KRT80, ITGB8, IGFBP3, IGF2BP2, ICAM1, HIVEP3, GRB14, GPRC5A, GLCE, GJB3, GADD45B, GADD45A, FOXE1, FOSL1, FGF2, ETV5, ERBB3, DNAJC22, DIRAS1, DBNDD2, CXCL16, CRB2, COL9A3, CLDN1, BDNF, ATP8A1, AMOTL2, AHNAK2, ADAMTS16, ACSF2). To further enhance reproducibility, we will perform additional sequencing experiments.

      (2) Disease-associated mutants of MORC2

      At the current stage, the results for disease-associated mutations are descriptive. While we observed that certain mutations clustered at the N-terminus can affect MORC2 condensate formation, ATPase activity, and DNA binding, we did not identify a mechanistic explanation for these correlations. Notably, the T424R mutation, previously reported to significantly enhance ATPase activity, also increased both intracellular condensate formation and in vitro DNA binding in our experiments. In contrast, other mutations did not show such consistent effects. Previous studies have established that MORC2’s ATP-binding and DNA-binding activities are independent[2]. Our results further suggest that MORC2’s phase separation behavior is also independent of both ATP and DNA binding, although existing evidence hints at potential cross-regulatory interactions among these three functions.

      We are fully committed to implementing these revisions with strict rigor and plan to complete them within 8–10 weeks. We will submit a comprehensive response letter alongside the revised manuscript, explicitly mapping how each of your concerns has been addressed, and ensuring that our conclusions about MORC2 PS’s functional role are supported by solid, reproducible data. We believe these revisions will transform our study from a strong “mechanism-focused” work to a comprehensive one that bridges PS mechanisms and biological function—aligning with the high standards of the phase separation field. Thank you again for your invaluable guidance in improving our work.

      Reviewer #2 (Public review):

      For summary:

      Thank you for your thorough and constructive review of our manuscript. We fully agree with the key concerns you raised and have developed a detailed revision plan to address each point comprehensively. We will perform additional control and validation experiments to directly link MORC2’s condensate-forming capacity with its gene silencing function. At the current stage, the results for disease-associated mutations are descriptive. While we observed that certain mutations clustered at the N-terminus can affect MORC2 condensate formation, ATPase activity, and DNA binding, we did not identify a mechanistic explanation for these correlations. Notably, the T424R mutation, previously reported to significantly enhance ATPase activity[3], also increased both intracellular condensate formation and in vitro DNA binding in our experiments. In contrast, other mutations did not show such consistent effects. Previous studies have established that MORC2’s ATP-binding and DNA-binding activities are independent[4]. Our results further suggest that MORC2’s phase separation behavior is also independent of both ATP and DNA binding, although existing evidence hints at potential cross-regulatory interactions among these three functions.

      For strengths:

      We thank the reviewer for their appreciation of the key findings presented in this manuscript.

      For weaknesses:

      We thank the reviewer for their careful assessment of MORC2’s DNA-binding properties and its relationship with ATPase and transcriptional activities. We would like to offer the following clarifications to address these concerns, which will also be incorporated into the Discussion section of the revised manuscript.

      (1) Recent work by Tan et al.[4] similarly identified multiple DNA-binding sites in MORC2, consistent with our findings, though there are discrepancies in the precise binding regions. In particular, they reported that isolated CC1 and CC2 domains do not bind 60 bp dsDNA, which contrasts with our observations. We attribute this difference to the types of DNA used in the assays. In our study, we employed 601 DNA, a defined nucleosome-positioning sequence, which differs substantially from randomly designed short dsDNA. For instance, prior work by Christopher H. Douse et al.[3] also confirmed that MORC2’s CC1 domain can bind 601 DNA.

      (2) In the study by Fendler et al.[2], DNA binding was reported to reduce MORC2’s ATPase activity, an observation that appears inconsistent with the results presented in our Fig. 5j. A critical distinction between the two studies lies in the experimental systems used: Fendler et al. employed a truncated MORC2 construct (residues 1–603) and 35 bp double-stranded DNA (dsDNA), whereas our experiments utilized full-length MORC2 and 601 bp DNA (a sequence with high nucleosome assembly potential). These differences, including the absence of potentially regulatory C-terminal regions in the truncated construct and the varying length/structural properties of the DNA substrates, introduce variables that substantially complicate direct comparative analysis of ATPase activity outcomes.

      Separately, Douse et al.[3] demonstrated that the efficiency of HUSH complex-dependent epigenetic silencing decreases as MORC2’s ATP hydrolysis rate increases, implying an inverse relationship between ATPase activity and silencing function. Notably, our current work has not established a direct mechanistic link between MORC2 phase separation and its ATPase activity. Thus, we refrain from inferring that the effect of MORC2 phase separation on transcriptional repression is mediated through modulation of its ATPase function; this remains an important question to address in future studies.

      (3) Finally, we plan to perform additional experiments to rule out the potential effects of CC3 dimerization. We will generate a phase-separation-deficient but dimerization-competent MORC2 mutant by mutating key hydrophobic residues in the IDRa region (critical for IDR-IBD multivalent interactions driving phase separation) without disrupting the CC3 domain’s dimerization interface. In addition, we plan to investigate whether introducing a KS sequence[1] at the C-terminus can effectively attenuate the phase separation propensity of MORC2. These mutants will allow us to decouple “phase separation capacity” from “protein dimerization”.

      We are committed to implementing these revisions with strict rigor and plan to complete them within 8–10 weeks. We will submit a detailed response letter alongside the revised manuscript, explicitly mapping how each of your concerns has been addressed, and ensuring the Discussion section is robust, context-rich, and fully integrates our work with the existing literature. We believe these improvements will significantly enhance the reliability, contextual relevance, and impact of our study, and we sincerely thank you for guiding us to elevate its quality.

      Reviewer #3 (Public review):

      For summary:

      Thank you for your insightful review and constructive suggestions, which have been invaluable in refining our manuscript. We greatly appreciate your recognition of the study’s strengths, including its logical structure, integration of multi-disciplinary approaches (in vitro LLPS assays, cellular studies, NMR, and crystallography), and the establishment of a functional link between MORC2 phase separation, DNA binding, and transcriptional control. Your identification of areas needing stronger evidence has provided clear, actionable directions for improvement, and we are fully committed to addressing each point comprehensively.

      For Major comments:

      To strengthen the manuscript as per your recommendations:

      (1) For the characterization of IDR-IBD interactions in PS: We will perform systematic in vitro assays, including PS turbidity measurements and confocal imaging of MORC2 variants lacking IDR or IBD (ΔIDR, ΔIBD) and truncated constructs (IDR alone, IBD alone). These experiments will quantify how each domain individually or synergistically contributes to phase separation propensity (e.g., critical concentration, condensate size/distribution).

      (2) To assess DNA’s influence on PS: We will generate phase diagrams by testing a range of MORC2 concentrations (0.5–10 μM), alone or with 601 DNA (147 bp) at a range of concentrations (0–2 μM), using turbidity assays and microscopy to map phase boundaries. This will systematically clarify how DNA modulates MORC2 phase separation.

      We plan to complete these experiments within 3–4 weeks, with rigorous quantification and statistical analysis to support our conclusions. The revised manuscript will include a detailed response letter mapping each of your suggestions to specific data additions, ensuring enhanced robustness and conviction. We believe these revisions will significantly strengthen the study’s conclusions, and we sincerely thank you for guiding us to improve its quality.

      References:

      [1] Mensah, M. A., Niskanen, H., Magalhaes, A. P., Basu, S., Kircher, M., Sczakiel, H. L., Reiter, A. M. V., Elsner, J., Meinecke, P., Biskup, S., et al. (2023). Aberrant phase separation and nucleolar dysfunction in rare genetic diseases. Nature 614, 564-571. https://doi.org/10.1038/s41586-022-05682-1.

      [2] Fendler, N. L., Ly, J., Welp, L., Lu, D., Schulte, F., Urlaub, H., and Vos, S. M. (2024). Identification and characterization of a human MORC2 DNA binding region that is required for gene silencing. Nucleic Acids Res 53, gkae1273. https://doi.org/10.1093/nar/gkae1273.

      [3] Douse, C. H., Bloor, S., Liu, Y. C., Shamin, M., Tchasovnikarova, I. A., Timms, R. T., Lehner, P. J., and Modis, Y. (2018). Neuropathic MORC2 mutations perturb GHKL ATPase dimerization dynamics and epigenetic silencing by multiple structural mechanisms. Nat Commun 9, 651. https://doi.org/10.1038/s41467-018-03045-x.

      [4] Tan, W., Park, J., Venugopal, H., Lou, J. Q., Dias, P. S., Baldoni, P. L., Moon, K. W., Dite, T. A., Keenan, C. R., Gurzau, A. D., et al. (2025). MORC2 is a phosphorylation-dependent DNA compaction machine. Nat Commun 16, 5606. https://doi.org/10.1038/s41467-025-60751-z.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      In this manuscript by Lopez-Blanch and colleagues, 21 microexons are selected for a deep analysis of their impacts on behavior, development, and gene expression. The authors begin with a systematic analysis of microexon inclusion and conservation in zebrafish and use these data to select 21 microexons for further study. The behavioral, transcriptomic, and morphological data presented are for the most part convincing. Furthermore, the discussion of the potential explanations for the subtle impacts of individual microexon deletions versus loss-of-function in srrm3 and/or srrm4 is quite comprehensive and thoughtful. One major weakness: data presentation, methods, and jargon at times affect readability / might lead to overstated conclusions. However, overall this manuscript is well-written, easy to follow, and the results are of broad interest.

      We thank the Reviewer for their positive comments on our manuscript. In the revised version, we will try to improve readability, reduce jargon and avoid overstatements.  

      Strengths:

      (1) The study uses a wide variety of techniques to assess the impacts of microexon deletion, ranging from assays of protein function to regulation of behavior and development.

      (2) The authors provide comprehensive analyses of the molecular impact of their microexon deletions, including examining how host-gene and paralog expression is affected.

      Weaknesses:

      Major Points:

      (1) According to the methods, it seems that srrm3 social behavior is tested by pairing a 3mpf srrm3 mutant with a 30dpf srrm3 het. Is this correct? The methods seem to indicate that this decision was made to account for a slower growth rate of homozygous srrm3 mutant fish. However, the difference in age is potentially a major confound that could impact the way that srrm3 mutants interact with hets and the way that srrm3 mutants interact with one another (lower spread for the ratio of neighbour in front value, higher distance to neighbour value). This reviewer suggests testing het-het behavior at 3 months to provide age-matched comparisons for del-del, testing age-matched rather than size-matched het-del behavior, and also suggests mentioning this in the main text / within the figure itself so that readers are aware of the potential confound.

      Thank you for bringing up this point. For the tests shown in Figure 5, we indeed decided to match the pairs involving srrm3 mutant fish by fish size since we reasoned this would be more comparable to the other lines, both biologically and methodologically (in terms of video tracking, etc.). However, we are confident the results would be very similar if matched by age, since the differences in social interactions between the srrm3 homozygous mutants and their control siblings are very dramatic at any age. As an example, this can be appreciated, in line with the Reviewer's suggestion, in Videos S2 and S3, which show groups of five 5 mpf fish that are either srrm3 mutant or wild type. It can be observed that the behavior of 5 mpf WT fish (Video S3) is very similar to that of 1 mpf WT fish pairs, with very small interindividual distances, while the difference with respect to the srrm3 mutant group (Video S2) is dramatic. We nonetheless agree that this decision on the experimental design should be clearly stated in the main text and figure legend, and we have done so in the revised version.

      (2) Referring to srrm3+/+; srrm4-/- controls for double mutant behavior as "WT for simplicity" is somewhat misleading. Why do the authors not refer to these as srrm4 single mutants?

      This comment applies to Figure 4 as well as the associated figure supplements. We reasoned that this made the understanding of plots easier, but the Reviewer is correct that it can be misleading. As a middle ground, we have now changed Figure 4 to follow the nomenclature of Figure 3D (WD, HD, DD), which is further explained in the legend, but kept the original format in the figure supplements for consistency with the (many) other plots in those figures.

      (3) It's not completely clear how "neurally regulated" microexons are defined / how they are different from "neural microexons"? Are these terms interchangeable?

      Yes, they are interchangeable. We have now double checked the wording to avoid confusion and for consistency.

      (4) Overexpression experiments driving srrm3 / srrm4 in HEK293 cells are not described in the methods.

      We apologize for this omission. We now describe the data and associated methods in more detail in the revised version; however, please note that the data were obtained from a previous publication (Torres-Mendez et al, 2019), where the detailed methodology is reported.

      (5) Suggest including more information on how neurite length was calculated. In representative images, it appears difficult to determine which neurites arise from which soma, as they cross extensively. How was this addressed in the quantification?

      We have added further details to the revised version. With regard to the specific question, we would like to mention that this was not a common issue at the time points used in the manuscript (10 hap and 24 hap). At those stages, it was nearly always evident how to track each individual neurite. Dubious cases were simply ignored and not measured, as we aimed for 100 neurites per well. Of course, such complex cases become much more common at later time points (48 and 72 hap), which were not used in this study.

      Reviewer #2 (Public review):

      Summary:

      This manuscript explores in zebrafish the impact of genetic manipulation of individual microexons and two regulators of microexon inclusion (Srrm3 and Srrm4). The authors compare molecular, anatomical, and behavioral phenotypes in larvae and juvenile fish. The authors test the hypothesis that phenotypes resulting from Srrm3 and 4 mutations might in part be attributable to individual microexon deletions in target genes.

      The authors uncover substantial alterations in in vitro neurite growth, locomotion, and social behavior in Srrm mutants but not any of the individual microexon deletion mutants. The individual mutations are accompanied by broader transcript level changes which may resemble compensatory changes. Ultimately, the authors conclude that the severe Srrm3/4 phenotypes result from additive and/or synergistic effects due to the de-regulation of multiple microexons.

      Strengths:

      The work is carefully planned, well-described, and beautifully displayed in clear, intuitive figures. The overall scope is extensive with a large number of individual mutant strains examined. The analysis bridges from molecular to anatomical and behavioral read-outs. Analysis appears rigorous and most conclusions are well-supported by the data.

      Overall, addressing the function of microexons in an in vivo system is an important and timely question.

      Weaknesses:

      The main weakness of the work is the interpretation of the social behavior phenotypes in the Srrm mutants. It is difficult to conclude that the mutations indeed impact social behavior rather than sensory processing and/or vision which precipitates apparent social alterations as a secondary consequence. Interpreting the phenotypes as "autism-like" is not supported by the data presented.

      The Reviewer is absolutely right. It was not our intention to imply that these social defects should be interpreted simply as autistic-like. It is indeed very likely that the main reason for the social alterations displayed by the srrm3 mutants is their impaired vision. We have now added this discussion point explicitly in the revised version. 

      Reviewer #3 (Public review):

      Summary:

      Microexons are highly conserved alternative splice variants, the individual functions of which have thus far remained mostly elusive. The inclusion of microexons in mature mRNAs increases during development, specifically in neural tissues, and is regulated by SRRM proteins. Investigation of individual microexon function is a vital avenue of research since microexon inclusion is disrupted in diseases like autism. This study provides one of the first rigorous screens (using zebrafish larvae) of the functions of individual microexons in neurodevelopment and behavioural control. The authors precisely excise 21 microexons from the genome of zebrafish using CRISPR-Cas9 and assay the downstream impacts on neurite outgrowth, larvae motility, and sociality. A small number of mild phenotypes were observed, which contrasts with the more dramatic phenotypes observed when microexon master regulators SRRM3/4 are disrupted. Importantly, this study attempts to address the reasons why mild/few phenotypes are observed and identify transcriptomic changes in microexon mutants that suggest potential compensatory gene regulatory mechanisms.

      Strengths:

      (1) The manuscript is well written with excellent presentation of the data in the figures.

      (2) The experimental design is rigorous and explained in sufficient detail.

      (3) The identification of a potential microexon compensatory mechanism by transcriptional alterations represents a valued attempt to begin to explain complex genetic interactions.

      (4) Overall this is a study with a robust experimental design that addresses a gap in knowledge of the role of microexons in neurodevelopment.

      Thank you very much for your positive comments on our manuscript.

      Reviewer #1 (Recommendations for the authors):

      Minor Suggestions

      (1) Axes are often scaled differently even between panels in the same figure. For example in Figure 5 - supplement 10, the srrm3_17 y axis scales from 0-20, while the neighboring panels scale from ~1-2.5. This somewhat underrepresents the finding that srrm3 mutants have much larger inter-individual distances. Similarly, in the panel above (src_1), the y-axis is scaled to include a single point around 17cm. As a result, it appears at first glance that the src_1 trials resulted in much lower inter-individual distance. Suggest scaling all of these the same to improve readability.

      While the Reviewer is certainly correct, after careful consideration we decided to keep autoscaled axes, prioritizing visualization within plots (i.e. among genotypes within an experiment) over comparison across plots (i.e. among experiments and lines).

      (2) Attention to italicizing gene names.

      Thanks.

      (3) In many points in the methods, we are instructed to "see below." Suggest directing the reader to a particular section heading.

      We found only one such instance, and we directed the reader to the specific section, as suggested.

      (4) In Methods, remove "in the corpus callosum." This is not an accurate descriptor for the site at which Mauthner axons cross.

      This is absolutely correct, apologies for this mistake.

      Clarify:

      (1) In the results section, "tissue-specific regulation was validated..." - suggest mentioning that this was performed in adult tissues / describe dissection in the methods.

      Added.

      (2) In the results section, the meaning of "no event ortholog" is not clear. Does this mean that a microexon does not have a human homolog? If so, suggest stating more clearly.

      Correct. We have added additional information.

      (3) In the results, the authors state that 78% of microexons are affected by srrm3/4 loss-of-function. Suggest stating the method used here (e.g. RNA-seq in mutants as compared to siblings)

      Added.

      (4) It is not clear what "siblings for the main founders means" for example in 3D. Is this effectively the analysis of microexon knockouts across multiple independent lines? Are the lines pooled for stats, for example in 3C?

      The main founder corresponds to the line listed as _1, which is used by default in experiments where only one founder is used. We now explicitly state this.

      For 3C, the lines are not pooled for stats; the stats correspond only to the main founder for each line. However, for each main founder line, multiple experiments are usually analyzed together and the stats are done taking their data structure into account (i.e. not simply pooling the values).

      (5) The purpose and a general description of NanoBRET assays should be included in the results.

      We added the main purpose of the NanoBRET assays (testing protein-protein interactions).

      (6) Specify that baseline behavior is analyzed in the light.

      Added.

      (7) In Figure 4A, adult fish are schematized being placed into a 96-well plate. Suggest using the larval diagram as in Figure 6 for accuracy.

      Done.

      (8) In Figure 4, plot titles could be made more accessible, especially in 4 F. Suggest removing extraneous information / italicizing gene names, etc. In G, suggest writing out Baseline, Dark, and Light to make it more accessible. Same in 4B.

      We have implemented some of the suggestions. In particular, italics were not used, since we are referring to the founder line, not the gene.

      (9) Figure 6 legend B - after (barplots), suggest inserting the word "and", to make clear that barplots indicate host gene *and* closely related paralogs are indicated by dots.

      Done.

      (10) In methods: "To better capture all microexons..." This sentence is difficult to understand. Suggested edit: "we excluded *from our calculation?* tissues with known or expected partial overlap... from comparison (for example, ...).

      Done.

      (11) In the methods, "which were defined with similar parameters but -min_rep 2." Suggest spelling this out, e.g. "with similar parameters, but requiring sufficient read coverage in at least n=2 samples per valid tissue group, whereas we only required one.".

      Done.

      (12) RNA was extracted for event and knockout validations. What does event mean here?

      Event refers to the validation of the exon regulatory pattern in WT tissues. We added this information.

      Provide definitions for abbreviations:

      (1) (Figure 6) Delta corrected VST Expression.

      Done.

      (2) "Mic-hosting genes" paralogs.

      Done.

      (3) In Figure 1F, "emic" is not defined.

      Done.

      Misspellings:

      All corrected.

      (1) Figure 6B (percentile is spelled percentil).

      (2) Figure 6B legend (bottom or top decile*).

      (3) Figure 6D - Schizophrenia* genes.

      (4) In Zebrafish husbandry and genotyping: suggest "srrm3 mutants grew more slowly.".

      (5) In results, "reduced body size at 90pdf" > 90dpf.

      Reviewer #2 (Recommendations for the authors):

      (1) Characterization of microexon mutants (Figure 2): The semi-quantitative PCR with flanking primers (Figure 2, supplement1) is well-suited to assess successful deletion of the exon and enables detection of potential mis-splicing around the alternative segment. However, it does not quantify the impact on total transcript levels. The authors should complement those experiments with qPCR measures of the transcript levels - otherwise, it is difficult to link mutant phenotypes to isoforms (as opposed to alterations in the level of gene expression). This point is somewhat addressed in Figure 6 by the RNA Seq analysis but it might help to add data specifically in Figure 2.

      As the Reviewer says, this point is explicitly addressed in Figure 6, where we show the change in the host gene's expression that follows the removal of some microexons. We prefer to keep this in Figure 6, for consistency, as we believe this is not a direct (regulatory) consequence of the removal, but more likely a compensation effect.

      (2) Social behavior alterations in juvenile fish: The authors report "increased leadership" in Srrm3 mutant fish. However, these fish have impaired vision. Thus, "increased leadership" may simply reflect the fact that they do not perceive their conspecifics and, thus, do not follow them. The heterozygous conspecific will then mostly follow the Srrm3 mutant which appears as the mutant exhibiting an increase in leadership. Figure 5D suggests that Srrm3 del and het fish have the same ratio of "neighbor in front" which would be consistent with the hypothesis that the change in this metric is a consequence of a loss of following behavior due to a loss of vision. The authors should either adjust the discussion of this point or assess with additional experiments whether this is indeed a "social phenotype" or rather a secondary consequence of a loss of vision.

      The Reviewer is absolutely correct, and we have thus modified the short discussion directly related to these patterns.

      (3) The discussion centers on potential reasons why only mild phenotypes are observed in the single microexon mutants. One caveat of the phenotypic analysis provided in the manuscript is that it does not very deeply explore the phenotypic space of neuronal morphologies or circuit function. The behavioral and anatomical read-outs are rather coarse. There are no experiments exploring fine-structure of neuronal projections in vivo or synapse number, morphology, or function. Moreover, no attempts are made to explore which cell types normally express the microexons to potentially focus the loss-of-function analysis to these specific cell types. Of course, such analysis would substantially expand the scope of a study that already covers a large number of mutant alleles. However, the authors may want to add a discussion of these limitations in the manuscript.

      The Reviewer is correct. We aimed at covering this when referring to "(i) we may not be assessing the traits that these microexons are impacting, (ii) we may not have the sensitivity to robustly measure the magnitude of the changes caused by microexon removal". We have now added some of the specific points raised by the Reviewer as examples.

      (4) Note typos in Figure 6D: "schizoFrenia", "WNT signIalling"

      Done.

      Reviewer #3 (Recommendations for the authors):

      I only have a few minor suggestions for the authors.

      (1) It is interesting that a not insignificant number of microexon deletions (3/21) result in cryptic inclusions of intron fragments, and perhaps alludes to an as yet unreported molecular function of microexons in the regulation of host gene expression. Is it possible that microexon inclusion in these 3 genes could be important for expression? I think this requires some further discussion, as (if I'm not mistaken) microexons have thus far only been hypothesised to act as modulators of protein function, not as gene regulatory units.

      While we see that microexon removal can impact expression of the host gene (Figure 6), this is likely a compensatory mechanism (or so we suggest). We do not think these three cases are related to a putative physiological regulation, since the cryptic exons appear only in the deletion line. On the contrary, we think these are "regulatory artifacts" that originate in the non-WT mutated context, i.e. we removed the exon but some splicing signals remained in the intron, and these are then recognized by the spliceosome, which incorrectly includes a different piece of the intron.

      (2) The flow of the text accompanying the molecular investigation of microexon function for evi5b and vav in Figure 3 could be improved. The text currently fades out with a speculative explanation for the lack of evi5b interaction phenotype. This final sentence could be moved to the discussion and replaced with a more general summary of the data.

      We have now swapped the order in which these results are described and left out the discussion about evi5b's microexon function.

      (3) Is this a co-submission with Calhoun et al? If so, both papers should reference each other in the discussion and discuss the relative contributions of each.

      Done

      (4) "1 × 104 cells" in methods Nanobret paragraph should be superscript.

      Done

    1. Author response:

      We thank the reviewers for their primarily positive comments and the critiques about where the manuscript could be improved. We agree with the vast majority of points raised. In our revised submission, we will:

      • Clarify some of the wording such as “unified mechanism” so that our intended meaning is clear to all readers

      • Completely change figure 2, as we accept the critique that an X-Y plot is not the logical way to present this concept

      • Amend the legends of figures 1 and 3 so that the disease pathways we are attempting to illustrate are clear for all readers

      • Expand on the genetic interactions between humans and TB and cite the manuscripts suggested

      • Add further discussion on multiple disease endotypes, and the immunological events that may lead to these distinct end points, along with how this may inform treatment stratification approaches

      • Extend the discussion about trained immunity

      • Make specific changes to address each of the reviewers’ points in the recommendations to authors

      • In the minority of cases where we feel a change is not necessary, we will justify this in our response to reviews

    1. Author response:

      The following is the authors’ response to the previous reviews

      Reviewer #1 (Public review): 

      Summary: 

      Kang et al. provide the first experimental insights from holographic stimulation of auditory cortex. Using stimulation of functionally-defined ensembles, they test whether overactivation of a specific subpopulation biases simultaneous and subsequent sensory-evoked network activations. 

      Strengths: 

      The investigators use a novel technique to investigate the sensory response properties in functionally defined cell assemblies in auditory cortex. These data provide the first evidence of how acutely perturbing specific frequency-tuned neurons impacts the tuning across a broader population. Their revised manuscript appropriately tempers any claims about specific plasticity mechanisms involved. 

      Weaknesses: 

      Although the single cell analyses in this manuscript are comprehensive, questions about how holographic stimulation impacts population coding are left to future manuscripts, or perhaps re-analyses of this unique dataset. 

      Reviewer #2 (Public review): 

      The goal of HiJee Kang et al. in this study is to explore the interaction between assemblies of neurons with similar pure-tone selectivity in mouse auditory cortex. Using holographic optogenetic stimulation in a small subset of target cells selective for a given pure tone (PTsel), while optically monitoring calcium activity in surrounding non-target cells, they discovered a subtle rebalancing process: co-tuned neurons that are not optogenetically stimulated tend to reduce their activity. The cortical network reacts as if an increased response to PTsel in some tuned assemblies is immediately offset by a reduction in activity in the rest of the PTsel-tuned assemblies, leaving the overall response to PTsel unchanged. The authors show that this rebalancing process affects only the responses of neurons to PTsel, not to other pure tones. They also show that assemblies of neurons that are not selective for PTsel don't participate in the rebalancing process. They conclude that assemblies of neurons with similar pure-tone selectivity must interact in some way to organize this rebalancing process, and they suggest that mechanisms based on homeostatic signaling may play a role.

      The authors have successfully controlled for potential artefacts resulting from their optogenetic stimulation. This study is therefore pioneering in the field of the auditory cortex (AC), as it is the first to use single-cell optogenetic stimulation to explore the functional organization of AC circuits in vivo. The conclusions of this paper are very interesting. They raise new questions about the mechanisms that could underlie such a rebalancing process. 

      (1) This study uses an all-optical approach to excite a restricted group of neurons chosen for their functional characteristics (their frequency tuning), and simultaneously record from the entire network observable in the FOV. As stated by the authors, this approach is applied for the first time to the auditory cortex, which is a tour de force. However, such an approach is complex and requires precise controls to be convincing. The authors provide important controls to demonstrate the precision of their optogenetic methods. In particular, holographic patterns used to excite 5 cells simultaneously may be associated with out-of-focus laser hot spots. Cells located outside of the FOV could be activated, therefore engaging other cells than the targeted ones in the stimulation. This would be problematic in this study as their tuning may be unrelated to the tuning of the targeted cells. To control for such an effect, the authors have decoupled the imaging and the excitation planes, and checked for the absence of unwanted out-of-focus excitation (Suppl Fig1).

      (2) In the auditory cortex, assemblies of cells with similar pure-tone selectivity are linked together not only by their ability to respond to the same sound, but also by other factors. This study clearly shows that such assemblies are structured in a way that maintains a stable global response through a rebalancing process. If a group of cells within an assembly increases its response, the rest of the assembly must be inhibited to maintain the total response. 

      One surprising result is the clear boundary between assemblies: a rebalancing process occurring in one assembly does not affect the response in another assembly comprising cells tuned to a different frequency. However, this is slightly challenged by the data shown in Figure 3. 

      Figure 3B-left, for example, shows that, compared to controls, non-target 16 kHz-preferring neurons only decrease their response to a 16 kHz pure tone when the cells targeted by the opto stimulation also prefer 16 kHz, but not when the targeted cells prefer 54 kHz. However, the inverse is not entirely true. Again compared to controls, Figure 3B (right) shows that non-target 54 kHz-preferring neurons decrease their response to a 54 kHz pure tone when the targeted cells also prefer 54 kHz; however, they also tend to be inhibited when the targeted cells prefer 16 kHz.

      The authors suggest this may be due to the partial activation of 54 kHz-preferring cells by 16 kHz tones and propose examining the response of highly selective neurons. The results are shown in Figure 3F. It would have been more logical to show the same results as in Figure 3B, but with the left part restricted to highly 16 kHz-selective cells and the right part to highly 54 kHz-selective cells. However, the authors chose to pool all responses to 16 kHz and 54 kHz tones in every triplet of conditions (control, opto stimulation on 16 kHz-preferring cells and opto stimulation on 54 kHz-preferring cells), which blurs the result of the analysis. 

      We thank the reviewers for highlighting the strengths of our work and providing valuable feedback. We further developed our manuscript mainly in response to Reviewer 2’s point on the overall effect presented as the main result. One of the main reasons why we chose to pool all tone-preferring cells, rather than only highly selective cells, was to ensure that the observed effect was not driven by only a small group of neurons but rather at the population level, especially at the subject level for Figure 3B. While Figure 3F shows how cells highly selective for each frequency play a major role in the effect, we have now added results restricted to highly selective neurons as Supplementary Figure 3. The left panel restricts the population to neurons highly selective for 16 kHz and the right panel to neurons highly selective for 54 kHz, at the cell-population level, to emphasize the result.

      We appreciate the additional point raised by Reviewer 1 regarding the effect of stimulation on population coding. Our primary focus in this manuscript was to establish single-cell-level effects of holographic stimulation, and we believe that population coding analyses would benefit from a more cell-type-specific approach. We plan to pursue such analyses in follow-up studies where cell types can be better identified and linked to network dynamics.

      Reviewer #1 (Recommendations for the authors): 

      The authors have appropriately addressed my concerns. 

      As this dataset will be of general interest, it would be helpful to include a doi/link to their data repository in the data availability section. 

      We are currently uploading the data repository to the institutional server. We will provide the correct DOI or link as soon as it becomes available. In the meantime, we will share the data with anyone who contacts us directly.

      Reviewer #2 (Recommendations for the authors): 

      Many references to Figures have not been updated between the two versions of the manuscript. See lines 107, 128, 297, 321 and 346. 

      We apologize for the confusion caused by the mislabelled figures. We have now updated all the figure numbers accordingly.

      In the paragraph beginning on line 266, there is no explicit reference to Figure 3C. 

      We have now added a reference to Figure 3C in the main text (line 290).

      If the new analysis includes 15 FOV for stim on 54 kHz-preferring cells, as indicated in the rebuttal, the corresponding numbers should be corrected in lines 152 and 180. 

      We have now updated the number of FOVs accordingly.

      The added model is not explained well enough. How are the calcium traces simulated? It is difficult to ascertain whether the result shown in Figure 3C is merely a trivial consequence of the hypothesis that suppression is applied to co-tuned neurons or to all neurons. 

      We apologize for the lack of important details in the explanation of the model. We simulated time-varying sound-evoked calcium transients, applying different decay time constants (faster decay for co-tuned neurons and slower decay for non-co-tuned neurons) to closely match the real data. A more detailed explanation is now included in the manuscript (lines 644 – 650). Since our data do not currently allow us to identify specific cell types, we focused on modelling the stronger suppression observed in co-tuned neurons, in particular by adapting the stimulation effect of target cells from the real data. In this revision, we have added data showing that ‘Randomly selected cells’ from the two groups (co-tuned or non-co-tuned) did not exhibit any stimulation effect (added column in Figure 3D), further indicating that suppression specific to co-tuned neurons is the key factor underlying the observed effects in the real data. We hope to build on this work in future studies to identify cell-type-specific effects and their computational roles.
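      As an illustration of what group-specific decay time constants mean in practice, the following is a minimal Python sketch. It is not the authors' model: the function name `simulate_transient`, the instantaneous-rise exponential kernel, the 30 Hz sampling rate, and the two time-constant values are all assumptions chosen only to show a faster-decaying and a slower-decaying transient side by side.

```python
import numpy as np

def simulate_transient(onset_s, tau_decay_s, amplitude=1.0, duration_s=5.0, fs_hz=30.0):
    """Toy calcium transient: zero before onset, then a single-exponential decay.

    All parameters are illustrative assumptions, not values from the manuscript.
    """
    t = np.arange(0.0, duration_s, 1.0 / fs_hz)
    decay = amplitude * np.exp(-(t - onset_s) / tau_decay_s)
    trace = np.where(t >= onset_s, decay, 0.0)
    return t, trace

# Hypothetical time constants: faster decay for co-tuned, slower for non-co-tuned cells.
t, co_tuned = simulate_transient(onset_s=1.0, tau_decay_s=0.5)
_, non_co_tuned = simulate_transient(onset_s=1.0, tau_decay_s=1.5)

# The integrated response is smaller for the faster-decaying trace.
print(co_tuned.sum() < non_co_tuned.sum())
```

      A shorter decay constant reduces the integrated response even when the peak amplitude is identical, which is one reason the choice of time constant matters when comparing simulated and measured traces.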

      Although the rebuttal clearly states that experiments are carried out on awake animals, this information is still missing from the manuscript. 

      We have now stated ‘Fully awake animals’ in the experimental procedures.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review)

      The weaknesses are in the clarity and resolution of the data that forms the basis of the model. In addition to whole embryo morphology that is used as evidence for convergent extension (CE) defects, two forms of data are presented, co-expression and IP, as well as a strong reliance on IF of exogenously expressed proteins. Thus, it is critical that both forms of evidence be very strong and clear, and this is where there are deficiencies; 1) For the vast majority of experiments, general morphology and LWR were used as evidence of effects on convergent extension movements rather than Keller explants or actual cell movements in the embryo. 2) The study would benefit from high or super resolution microscopy, since in many cases the differences in protein localization are not very pronounced. 3) The IP and Western analysis data often show subtle differences, which are not apparent in some cases. 4) It is not clear how many biological repeats were performed or how and whether statistical analyses were performed.

      (1) To more objectively assess the convergent extension phenotypes, we developed a Fiji macro to automatically quantify the LWR in various injected Xenopus embryos, as detailed in the Methods section. We acknowledge that a limitation in the current manuscript is how to link our mechanistic model at the molecular level with the actual cellular behavior during convergent extension, and we plan to perform cell biological studies in the future to elucidate the link;

      (2) We have repeated some of the imaging experiments in DMZ explants using a Zeiss LSM 900 confocal equipped with Airyscan2 detector that can increase the resolution to ~100 nm. The new data are in Suppl. Fig. 4, 9, 11, 16;

      (3) We have repeated all IP and western blots at least three times and provided quantification and statistical analyses;

      (4) We have added the information on biological repeats and statistical analyses in all figures and figure legends.
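      To illustrate the kind of measurement referred to in point (1), here is a minimal Python sketch that estimates a length-to-width ratio from a binary embryo mask. It assumes that LWR denotes the length-to-width ratio, and it is not the authors' Fiji macro: the function name `length_width_ratio`, the principal-axis approach, and the toy mask are all hypothetical choices made for this example.

```python
import numpy as np

def length_width_ratio(mask):
    """Estimate length/width of a binary shape from its extents along principal axes."""
    ys, xs = np.nonzero(mask)
    coords = np.column_stack([xs, ys]).astype(float)
    coords -= coords.mean(axis=0)
    # Rows of vt are the principal axes of the pixel cloud.
    _, _, vt = np.linalg.svd(coords, full_matrices=False)
    projected = coords @ vt.T
    # Add one pixel so an axis-aligned run of N pixels has extent N
    # (this correction is only approximate for rotated shapes).
    extents = projected.max(axis=0) - projected.min(axis=0) + 1.0
    return extents.max() / extents.min()

# Toy mask: an axis-aligned 40 x 10 pixel rectangle.
toy_mask = np.zeros((20, 60), dtype=bool)
toy_mask[5:15, 10:50] = True
lwr = length_width_ratio(toy_mask)
print(round(lwr, 3))
```

      Measuring along principal axes rather than image axes keeps the ratio roughly independent of how the embryo happens to be oriented in the frame, although a curved embryo would need a midline-based length instead.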

      Reviewer #2 (Public Review):

      The protein localization experiments in animal cap assays are for the most part convincing, but with the caveat that the authors assume that the proteins are acting within the same cell. As Fzd and Vangl2 are thought to localize to opposite cell ends in many contexts, can the authors be sure that the effects they observe are not due to trans interactions? 

      In our previous publication, we provided evidence that Vangl is necessary and sufficient to recruit Dvl to the plasma membrane within the same cell (Figure 3 in 10.1093/hmg/ddx095). In a more recent publication (10.1038/s41467-025-57658-0), we further elucidated a mechanism through which Dvl oligomerization switches its binding from Vangl to Fz, and determined that binding of Dvl to Vangl and to Fz is differentially mediated by its PDZ and DEP domains, respectively. In the current manuscript, we also performed co-IP experiments under various conditions to demonstrate binding between Dvl and Vangl. We feel that this evidence together provides a strong argument for our model, in which Vangl2 acts within the same cell to sequester Dvl from Fz.

      With regard to the Dvl patches induced by Wnt11 (Fig. 3 and Suppl. Fig. 9), we injected EGFP- and mSc-tagged Dvl separately into adjacent blastomeres, and demonstrated that the Wnt11-induced patches arise from symmetrical accumulation of Dvl at the contact between two neighboring cells (Suppl. Fig. 9a-c’). This scenario is different from epithelial PCP, where Fz/Dvl and Vangl/Pk accumulate asymmetrically at the contact between two adjacent cells.

      The authors propose a model whereby Vangl2 acts as an adaptor between Dvl and Ror, to first prevent ectopic activation of signaling, and then to relay Dvl to Fzd upon Wnt stimulation. This is based on the observation that Ror2 can be co-IPed with Vangl2 but not Dvl; and secondly that the distribution of Ror2 in membrane patches after Wnt11 stimulation is broader than that of Fzd7/Dvl, while Vangl2 localizes to the edges of these patches. The data for both these points is not wholly convincing. The co-IP of Ror2 and Vangl2 is very weak, and the input of Dvl into the same experiment is very low, so any direct interaction could have been missed. Secondly, the broader distribution of Ror2 in membrane patches is very subtle, and further analysis would be needed to firm up this conclusion. 

      (1) We repeated the co-IP experiment with Myc-tagged Vangl or Dvl. Using the same anti-Myc antibody and experimental conditions (including the expression levels of Vangl, Dvl and Ror2), we still found that Ror2 could be pulled down by Vangl but not Dvl (Suppl. Fig. 15b). Whereas this confirms our previous conclusion, we acknowledge that a negative result does not fully exclude the possibility of direct binding between Ror and Dvl.

      (2) We re-analyzed the signal intensity of Dvl and Ror in Wnt11-induced patches. By quantifying the intensity ratio between Ror and Dvl along the patches, we found an increase of more than two-fold at the border of the patches (Fig. 7j, bottom panel). We interpret these data as suggesting that Ror accumulates to a higher level than Dvl at the patch borders.
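      As a rough illustration of a ratio analysis along a line profile, the Python sketch below divides one channel's intensity by the other at each position and compares the border of the profile with its center. It is not the authors' analysis pipeline: the function names, the background and `eps` parameters, the 20% border fraction, and the toy intensity values are all assumptions made for this example.

```python
import numpy as np

def intensity_ratio_profile(ror, dvl, background=0.0, eps=1e-9):
    """Pointwise Ror/Dvl ratio along a line profile after background subtraction."""
    ror = np.asarray(ror, dtype=float) - background
    dvl = np.asarray(dvl, dtype=float) - background
    # Guard against division by zero where the Dvl signal vanishes.
    return ror / np.maximum(dvl, eps)

def border_to_center_fold(ratio, border_fraction=0.2):
    """Mean ratio in the outer positions divided by the mean ratio in the center."""
    n = len(ratio)
    k = max(1, int(round(n * border_fraction)))
    border = np.concatenate([ratio[:k], ratio[-k:]])
    center = ratio[k:n - k]
    return border.mean() / center.mean()

# Toy profile across one patch: Ror is enriched at both ends, Dvl is flat.
ror_profile = [4, 4, 2, 2, 2, 2, 2, 2, 4, 4]
dvl_profile = [2] * 10
ratio = intensity_ratio_profile(ror_profile, dvl_profile)
fold = border_to_center_fold(ratio)
print(fold)
```

      Taking the ratio position by position, rather than comparing the two raw profiles, removes intensity changes shared by both channels so that only the relative enrichment remains.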

      A final caveat to these experiments is that in the animal cap assays, loss of function and gain of function both cause convergence and extension defects, so any genetic interactions need to be treated with caution i.e. two injected factors enhancing a phenotype does not imply they act in the same direction in a pathway, in particular as there are both cis/trans and positive/negative feedbacks between the PCP proteins. 

      We agree with the reviewer that a difficulty in studying PCP/non-canonical signaling is that both loss and gain of function of any of its components can cause convergence and extension defects. Genetic interactions, especially synergistic interactions, should be interpreted with caution. But we do want to point out that, in a number of cases, we were also able to demonstrate epistasis. For instance, we found that CE defects induced by Dvl2 over-expression can be rescued by Pk over-expression (Fig. 1e and f), whereas the severe CE defects induced by Vangl/Pk co-injection can be reciprocally rescued by Dvl2 over-expression (Fig. 1g). Likewise, we showed that CE defects induced by Fz2/Dvl2 co-injection can be rescued by wild-type Vangl2 but not the Vangl2 RH mutant (Suppl. Fig. 6b), and that Ror2 can rescue the CE defect induced by Vangl2 overexpression (Suppl. Fig. 14). Collectively, these functional interaction data consistently demonstrate an antagonism between Dvl/Fz/Ror2 and Vangl2/Pk, which correlates with our imaging and biochemical studies.

      As you can see from the reviews, the referees generally agree that your paper is a potentially valuable contribution to the field. Your observations are important because of the novel model based on the inhibitory feedback regulation between planar cell polarity (PCP) protein complexes. However, the reviewers also stated that the model is only partly supported by data because of insufficient clarity and missing controls in several experiments supporting the proposed model. The paper would be significantly improved if your conclusions are backed up by additional experimentation. Specifically, the referees wanted to see the reproducibility of the results shown in Figures 3, 4, 8, S3, S7, S12. 

      We hope that you are able to revise the paper along the lines suggested by the referees to increase the impact of your study on the current understanding of PCP signaling mechanisms. 

      We thank the reviewers for their careful reading of our manuscript and for their constructive critiques and suggestions. We have repeated the animal cap studies in original Figures 3, 4, 8 and S3 with DMZ explants, and the new data are in Supplementary Fig. 9, 11, 16 and 4, respectively. We also repeated the biochemical studies in original Figures S7 and S12, and the new data are in Supplementary Fig. 8 and 15.

      Reviewer #1 (Recommendations For The Authors):

      Major points: (1) The author conducted an analysis of the subcellular localization of PCP core proteins, including Vangl2, Pk, Fz, and Dvl, within animal cap explants (ectodermal explants). To validate the model proposing that 'non-canonical Wnt induces Dvl to transition from Vangl to Fz, while PK inhibits this transition, and they function synergistically with Vangl to suppress Dvl during Convergent Extension (CE),' it is crucial to assess the subcellular localization of PCP core proteins in dorsal marginal zone (DMZ) cells, which are known to undergo CE. Notably, the overexpression of Wnt11 alone, as employed by the author, does not induce animal cap elongation. Therefore, the use of animal cap explants may not be sufficient to substantiate the model during Convergent Extension (CE). Indeed, previous knowledge indicates that Vangl2 and Pk localize to the anterior region in DMZ explants. However, the results presented in this manuscript appear to differ from this established understanding. Consequently, to provide more robust support for the proposed model, it is advisable to replicate the key experiments (Figures 3, 4, 8, and Figure S3) using DMZ explants.

      We repeated the experiments in Figures 3, 4, 8 and Figure S3 with DMZ explants, and the new data are in new Supplementary Fig. 9, 11, 16 and 4, respectively. With regard to “previous knowledge indicates that Vangl2 and Pk localize to the anterior region in DMZ explants”, we are aware of Vangl/Pk localization to the anterior cell cortex in neural epithelium from the studies by the Sokol and Wallingford labs, but are not aware of similar reports in DMZ explants. When we examined the localization of a small amount of injected EGFP-mPk2 (0.1 ng mRNA) in DMZ explants, we saw a somewhat uniform distribution on the plasma membrane (Suppl. Fig. 4). In addition, in a related recent publication, we examined endogenous XVangl2 protein localization in activin-induced animal cap explants that do undergo CE. What we observed was that, whereas Dvl2 and Fz injected at low levels form clusters on the plasma membrane, endogenous XVangl2 remains uniformly distributed on the plasma membrane (Suppl. Fig. 3S-Z in 10.1038/s41467-025-57658-0). These observations may suggest potential differences in PCP protein localization during neural vs. mesodermal convergence and extension.

      (2) The author suggests that 'Vangl2 and Pk together synergistically disrupt Fz7-Dvl2 patches.' As shown in Figure 4 (panels J' to I'), it is evident that the co-expression of Pk and Vangl2 increases Fz7 endocytosis. Nevertheless, a significant amount of Fz7 still co-localizes with Dvl2. To strengthen the author's hypothesis, an additional, clearer assay is required, such as a fluorescence resonance energy transfer (FRET) assay.

      We appreciate this valuable advice. Since none of the tagged Fz/Dvl/Vangl proteins we had were suitable for FRET, we made proteins tagged with mClover and mRuby2, which were reported as an optimized FRET pair. But in our hands mRuby2 seems to require a very long time (~2 days) to mature and become detectable at room temperature, and is therefore not suitable for our Xenopus experiments. We are in the process of establishing a luciferase-based NanoBiT system to detect Fz-Dvl and Dvl-Vangl interactions in live cells and cell lysates, and will use it in future studies to investigate their interaction dynamics.

      For the current manuscript, we reason that a substantial reduction of Fz7-Dvl2 clusters with Vangl2/ Pk co-injection would still support our idea that Vangl2 and Pk act synergistically to sequester Dvl from Fz to prevent their clustering in response to non-canonical Wnt ligands.

      (3) The IP data is less clear and evident. A couple of examples are: a) Fig 2g where the authors report that the Vangl2 R177H variant reduced Vangl2 interaction with Pk and recruitment of Pk to the plasma membrane, but it appears that the variant interacts slightly better than WT Vangl2 with Pk. In Fig. S7a, the authors state that Pk overexpression can indeed significantly reduce Wnt11-induced dissociation of EGFP-Vangl2 and Flag-Dvl2 in the DMZ. However, there is a minimal impact when compared to the Wnt11 absent control. Based on the results presented in Fig S12a the authors indicate that Wnt11 reduces the association between Vangl2 and Dvl2, which can be discerned, but loss of Ror2 does not change this in any obvious way - but the authors indicate it does. In S12b, the authors have suggested that Ror and Dvl do not form a direct binding interaction. However, the interpretation of Figure S12b is not entirely convincing due to several issues. Notably, the expression levels of each protein appear inconsistent, the bands are not sufficiently clear, and there is the detection of three different tag proteins on a single blot. To strengthen the validity of these findings, it is advisable to repeat this experiment with improved quality. 

      We repeated all the co-IP and western blot analyses pointed out by the reviewer, and performed quantification and statistical analyses.

      Fig 2g had a mistake in the labeling and is replaced with new Figure 2g;

      Fig. S7a is replaced by new data in Supplementary Figure 8a and b;

      Fig. S12a and S12b are replaced by new data in Supplementary Figure 15a, a’ and b, respectively. In 15a and a’, we noticed a consistent decrease of Dvl2-Vangl2 co-IP in Xror2 morphants. The reason for this is not yet clear and will need further study.

      Minor points: (1) In all the whole embryo injection assays examining morphology, no Western analysis is performed to show roughly equivalent and appropriate levels of the various proteins are being expressed. Differences will affect the data. 

      Although we did not perform western analyses to examine the protein levels in the various functional interaction assays, we did examine how co-expression of Vangl2, mPk2 or Dvl2 may affect each other’s protein levels in Supplementary Fig. 2, which did not reveal any significant change when they were co-injected in different combinations.

      (2) The author's prior publication (Bimodal regulation of Dishevelled function by Vangl2 during morphogenesis, Hum Mol Genet. 2017) presented clear evidence of Vangl2 overexpression inducing Dvl2 membrane localization. However, Figure S4 in the current manuscript did not provide clear evidence of membrane localization. To strengthen the hypothesis that Vangl2-RH mutant also induces Dvl2 membrane localization, further comprehensive imaging analysis is needed. 

      We re-analyzed the imaging data and replaced old Figure S4 with a new Supplementary Fig. 5.

      (3) In Supplementary Figure 9, the authors propose that the overexpression of Vangl2/Pk induces Fz7 endocytosis, as indicated by its co-localization with FM4-64. However, it raises a question: how does the Fz7-GFP protein internalize into the cells without endocytosis, as seen in Figures S9a-c'? To enhance readers' understanding, a discussion addressing this point should be included. 

      We think that this might be a technical issue. As detailed in the Methods section, we only incubated the embryos transiently with FM4-64 for 30 minutes, and the embryos were subsequently washed and dissected in 0.1X MMR without the dye. Therefore, only the Fz7-GFP protein endocytosed during the 30-minute incubation would be labeled by FM4-64, whereas that endocytosed before or after the incubation would not. Alternatively, the very few Fz7-GFP puncta occasionally observed in the absence of Vangl2/Pk overexpression could be vesicles trafficking to the plasma membrane.

      (4) Statistical analyses are absent for several results, including those in Figure 2f, Figure S4d, and Figure S7b. 

      We repeated these experiments and included statistical analyses. The new data are in Figure 2f, Supplementary Fig. 5d and Supplementary Fig. 8b.

      (5) This manuscript lacks any results regarding Ck1. Therefore, it is advisable to consider removing the discussion or mention of CK1. 

      We agree. We have toned down the discussion of CK1 and removed CK1 from our model in Fig. 9.

      Reviewer #2 (Recommendations For The Authors):

      (1) In all the convergence and extension assays, the authors should report n numbers (i.e. number of animals), what statistical test is used, and what the error bars show. Ideally dot-plots would be used instead of bar charts as they give a better insight into the data distribution. It might be useful to give a section on the statistical analyses used in the M&M, including e.g. any power calculations carried out, as now required by many journals. 

      We have followed the advice and now use dot-plots for all the quantification analyses in the manuscript. The figure legends state the statistical test used and what the error bars show, and the number of embryos analyzed is given in each panel of the figures. We also provide more details in the Methods section on how the LWR quantification was carried out.

      (2) I think Figure 2g is wrongly labelled? FLAG bands are in all three lanes in the western blot, but not labelled as such in the schematic. 

      We corrected the schematic labeling in Figure 2g, and thank the reviewer for catching this mistake.

      (3) In Figure S7, the authors show that co-IP of Dvl and Vangl2 is reduced by Wnt11 and the effects of Wnt are blocked by Pk. Does Pk have any effect in the absence of Wnt? 

      We examined the effect of Pk over-expression on Dvl2-Vangl2 co-IP as advised, and did not see a significant impact in the absence of Wnt11 co-injection. The data is included in the new Supplementary Figure 8a. We interpret the data to suggest that “at least under the condition of our co-IP experiment, Pk may not directly impact the steady-state binding between Vangl and Dvl”.

      (4) In Figure 3, the authors show (as published previously) that Wnt11 induces patches of Dvl at the plasma membrane. It would be useful to see Dvl in the absence of Wnt and Vangl2/Dvl in the absence of Wnt. 

      Dvl is widely known as a cytoplasmic protein and its localization has been published by many labs over the past 20-30 years. In our recent publication (10.1038/s41467-025-57658-0 ), we also re-examined Dvl localization when injected at various dosages. So we did not feel it was necessary to show its localization in the absence of Wnt11 again, but included a reference to our prior publication. In regards to Vangl/Dvl distribution in the absence of Wnt11, the readers can see Suppl. Fig. 5b as an example, in addition to our previous publications referenced in the manuscript.

      (5) In the review figures, the difference in Fz7-GFP patch formation in d' and e' (vs e.g. a') is not very clear. Could the images be improved or (better) quantified in some way? 

      We assume that “review figures” refer to Figure 3 or 4? If so, we felt that Fz7-GFP patch formation was clear in Fig. 3d’, e’ or Fig. 4d’, e’. Nevertheless, we repeated these experiments in DMZ explants as advised by Reviewer 1, and additional examples of Fz7-EGFP patch formation can be seen in the new Suppl. Fig. 9d-f’ and Suppl. Fig. 11d-f’.

      (6) In Figure 6d, I'm concerned that the loss of flag-Dvl2 might occur via dephosphorylation in the IP reaction. Also the M&M don't include methodological details about buffers and whether phosphatase inhibitors were used. A compelling control would be anti-FLAG pulldown showing retention of phosphorylation. Also Figure 6f shows a reduced ratio of fast-to-slow migrating bands of Dvl with Vangl2/Pk - unless I have misunderstood, is this ratio the wrong way round? 

      We added co-IP buffer and protease inhibitor information in Methods.

      We agree that the concern about dephosphorylation during the IP reaction is valid, and that direct pull-down of Dvl to show the phosphorylated form is a compelling control. We therefore note that in Suppl. Fig. 8a and 15b, direct pull-down of Flag-Dvl or Myc-Dvl (with anti-Flag or anti-Myc) did show the slower migrating, phosphorylated form. Additional examples in which Vangl co-IPs only the faster migrating, unphosphorylated Dvl include Suppl. Fig. 15a and a related paper we published recently (Fig. 3R and R’ in 10.1038/s41467-025-57658-0 ).

      Finally, we did wrongly label Figure 6f in the last submission, and the ratio should have been “slow/fast”. We have made the correction, and we thank the reviewer for the meticulous reading of our manuscript.

      (7) In Figure 7, what does Ror2 look like in the absence of Wnt11? 

      We included new Figure 7a-c to show that without Wnt11 co-injection, Ror2 is uniformly distributed on the plasma membrane.

      (8) Also in Figure 7, Ror2 patches are said to be slightly wider than Dvl2 patches "reminiscent of Vangl2" - I wouldn't describe them as being similar. Vangl2 shows a distinct dip in the center of the Dvl patches, Ror2 does not show a dip, and is only (at best) in a slightly wider patch, and I would want to see further examples to be convinced that the localization domain is reproducibly wider. The merge of many samples in 7d may actually be making the distribution harder to see and if the Xror2 and Dvl2 intensities were normalized I'm not sure how different the curves would appear. (i.e. the Xror2 curve looks like a flattened version of the Dvl2 curve). 

      We have added an additional panel in the new Figure 7j to compare the intensity ratio of Ror2/Dvl2 along the patches, and this analysis reveals a more than two-fold increase of the ratio at the border region. This quantification may make a more convincing argument that at the patch border region, Dvl is diminished whereas Ror2 accumulates with Vangl2.
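
      As an illustration of the kind of quantification described above, the sketch below computes a pointwise intensity ratio along a line scan and compares the patch border with the patch center. It is a minimal sketch under stated assumptions, not the authors' pipeline: the function names, the made-up intensity values, and the choice of which positions count as "border" are all hypothetical.

```python
def ratio_profile(ror2, dvl2, eps=1e-9):
    """Pointwise Ror2/Dvl2 intensity ratio along a line scan.

    Both inputs are hypothetical, background-subtracted intensity
    profiles sampled at the same positions across a patch.
    """
    if len(ror2) != len(dvl2):
        raise ValueError("profiles must have the same length")
    # eps guards against division by zero where Dvl2 signal is absent.
    return [r / max(d, eps) for r, d in zip(ror2, dvl2)]


def mean(values):
    return sum(values) / len(values)


# Made-up profiles: the first and last positions stand for the patch
# border, the middle positions for the patch center (an assumption).
ror2 = [0.8, 1.0, 1.0, 1.0, 0.8]
dvl2 = [0.2, 1.0, 1.0, 1.0, 0.2]

ratios = ratio_profile(ror2, dvl2)
border = mean([ratios[0], ratios[-1]])
center = mean(ratios[1:-1])
fold_change = border / center  # values above 1 mean relative Ror2 enrichment at the border
```

      Taking the ratio, rather than overlaying two separately normalized curves, removes the dependence on how each channel is scaled, which is the concern the reviewer raised about the merged profiles.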

      (9) In Figure S12a, the authors suggest Wnt11 induced dissociation of Dvl from Vangl2 (by co-IP), and this is reduced after Ror2 MO. This would be more convincing with replicates and quantitation. 

      We have repeated this experiment with Vangl2 pull down and added quantification. The data is in the new Suppl. Fig. 15a.

      (10) In Figure S12b, the authors suggest Ror2 can co-IP Vangl2 but not Dvl. This is not very convincing, as the Dvl input band is very weak, and the Vangl2 co-IP band is very weak. 

      We repeated the co-IP experiment with Myc-tagged Vangl or Dvl. Using the same anti-Myc antibody and experimental condition (including the expression level of Vangl, Dvl and Ror2), we still found that Ror2 could be pulled down by Vangl but not Dvl (Suppl. Fig. 15b).

      (11) "Prickle" spelled "Prickel" in the abstract (and abbreviated to "PK" not "Pk" at one place in the abstract and several places in text) 

      We have corrected these typos.

      (12) Quite a lot of interesting observations are in supplemental figures. Normally it might be expected that extra data supporting a conclusion would be in supplemental, but here some of the supplemental data feels like it is more than simply additional evidence. For instance supplemental Figures 2 and 3 feel more than just supplemental (and Supplemental Figure 3 if merged with Figure 2 would make it easier for the reader). Moreover, for example, the description of the results in Figure 2 is punctuated by references to supplemental Figures 4 and 5 that contain key data to support the conclusions, which means the reader has to flick backwards and forwards from place to place in the manuscript to follow the argument. It is of course up to the authors, but in some cases putting supplemental data back into the main figures (for which there is no size or number limit) would increase clarity. 

      These are excellent points; in the resubmitted manuscript we have a total of 24 data figures, and we used 8 as main figures since we felt that they provide the most relevant and conclusive evidence to our model. We will consult the copy editors at eLife on how to arrange the rest as main vs. supporting figures when requesting publication as version of record.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Joint Public Review:

      Summary:

      This study investigates plasticity effects in brain function and structure from training in navigation and verbal memory.

      The authors used a longitudinal design with a total of 75 participants across two sites. Participants were randomised to one of three conditions: verbal memory training, navigation training, or a video control condition. The results show behavioural effects in relevant tasks following the training interventions. The central claim of the paper is that network-based measures of task-based activation are affected by the training interventions, but structural brain metrics (T2w-derived volume and diffusion-weighted imaging microstructure) are not impacted by any of the training protocols tested.

      Strengths:

      (1) This is a well-designed study which uses two training conditions, an active control, and randomisation, as appropriate. It is also notable that the authors combined data acquisition across two sites to reach the needed sample size and accounted for it in their statistical analyses quite thoroughly. In addition, I commend the authors on using pre-registration of the analysis to enhance the reproducibility of their work.

      (2) Some analyses in the paper are exhaustive and compelling in showcasing the presence of longitudinal behavioural effects, functional activation changes, and lack of hippocampal volume changes. The breadth of analysis on hippocampal volume (including hippocampal subfields) is convincing in supporting the claim regarding a lack of volumetric effect in the hippocampus.

      Weaknesses:

      (1) The rationale for the study and its relationship with previous literature is not fully clear from the paper. In particular, there is a very large literature that has already explored the longitudinal effects of different types of training on functional and structural neuroimaging. However, this literature is barely acknowledged in the Introduction, which focuses on cross-sectional studies. Studies like the one by Draganski et al. 2004 are cited but not discussed, and are clumped together with cross-sectional studies, which is confusing. As a reader, it is difficult to understand whether the study was meant to be confirmatory based on previous literature, or whether it fills a specific gap in the literature on longitudinal neuroimaging effects of training interventions.

      We thank the reviewer for these comments and feedback. 

      We want to clarify that, because our analysis plan was pre-registered, our approach was confirmatory rather than exploratory or justified post hoc. This confirmatory approach allowed us to critically evaluate theoretically novel and important hypotheses that no previous longitudinal intervention study had proposed or tested. We have now clarified this in the introduction.

      “This allowed us to address the following novel theoretical questions: 1) what neural changes, if any, result from an intensive within-participant intervention that improves memory or navigation skills in healthy young adults 2) if such changes occur, what is the degree of neural overlap between the acquisition of these cognitive skills.”

      “We pre-registered three novel and specific hypotheses, which are described in more detail here (https://osf.io/etxvj) ”

      We have also attempted to better separate cross-sectional and longitudinal studies. Due to space limitations, we have focused on interventional studies involving gray matter changes that could be relevant to navigation, episodic memory, or the hypothesized time frame we chose for the training. We also note that some of these studies are discussed in more depth in the Discussion.

      “Successful cognitive interventions suggest that targeted within-participant cognitive training, even for as little as 1-2 weeks, can result in improvements to specific cognitive functions, including changes in focal gray matter [4,23-27]; but see[28].”

      We have also added some additional citations to relevant cognitive intervention work, although we agree that this is an extensive literature, only a subset of which we are able to capture here:

      “In some instances, interventions may even generalize to areas not explicitly trained but closely related to the training (termed “near transfer”)[29-33].”

      (2.1) The main claim regarding the lack of changes in brain structure seems only partially supported by the analyses provided. The limited whole-brain evidence from structural neuroimaging makes it difficult to confirm whether there is indeed no effect of training. Beyond hippocampal analyses, many whole-brain analyses of both volumetric and diffusion-weighted imaging metrics are only based on coarse ROIs (for example, 34 cortical parcellations for grey matter analyses).

      Although vertex-wise analyses in FreeSurfer are reported, it is unclear what metrics were examined (cortical thickness? area? volume?). 

      We appreciate the reviewer’s thoughtful feedback. We apologize for the lack of clarity in the original manuscript regarding the type of metric used in the vertex-wise analysis. We confirm that these analyses were based on cortical volume, not thickness or area. To clarify this, we have explicitly stated in the revised Methods that the vertex-wise analyses were conducted on cortical volume using FreeSurfer’s mri_glmfit.

      In addition, in response to the concern regarding the coarse nature of the ROI-based analyses, we have re-analyzed the volumetric data using the more fine-grained Destrieux atlas, which contains 148 cortical ROIs (74 per hemisphere), instead of the original, coarser 34-region atlas. These more detailed analyses still revealed no significant volume changes from pre- to post-training in any of the three groups. We believe this provides stronger support for the lack of training-induced volumetric changes outside the medial temporal lobe.

      Relevant revisions have been made to the Results and Methods sections. Below is the updated content added to the manuscript:

      In Results:

      “We also analyzed gray matter volume changes outside of the medial temporal lobe using FreeSurfer (see Methods) to determine if any cortical or other relevant brain areas might have been affected by the training. We applied a vertex-wise analysis of cortical volume, again finding no significant differences across the entire cortex (see Methods). This finding was further validated using the Destrieux atlas, which includes 74 cortical parcellations per hemisphere (148 ROIs in total). Paired-sample t-tests revealed that none of the ROIs exhibited significant volume changes from pre- to post-test in any of the three groups (all ps > 0.542, FDR-corrected). These findings suggest that training did not result in any measurable cortical volumetric changes.”

      In Methods:

      “Whole-brain structural analyses were conducted using FreeSurfer (version 7.4.1; https://surfer.nmr.mgh.harvard.edu). T1-weighted anatomical images were processed using the longitudinal processing pipeline. Vertex-wise analyses of cortical volume were performed using FreeSurfer’s general linear modeling tool, mri_glmfit. Group-level comparisons were corrected for multiple comparisons using mri_glmfit-sim, which implements cluster-wise correction based on Monte Carlo simulations. A vertex-wise threshold of Z > 3.0 (corresponding to p < 0.001, two-sided) was applied to detect both positive and negative effects. Clusters were retained if they survived a cluster-wise corrected p < 0.05.

      In addition to vertex-wise analysis, cortical parcellation was performed using the Destrieux atlas (aparc.a2009s), which includes 74 cortical regions per hemisphere, yielding 148 ROIs in total. To account for variability in brain size, each ROI volume was normalized by estimated intracranial volume (ICV) and scaled by a factor of 100. Longitudinal comparisons were conducted using paired-sample t-tests. To correct for multiple comparisons, we applied FDR correction (q < 0.05).”
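
      The normalization and multiple-comparison steps described in the Methods excerpt above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes the per-ROI p values have already been obtained from paired-sample t-tests, that "FDR correction" refers to the Benjamini-Hochberg procedure (the excerpt does not name the procedure), and all function names and numbers are hypothetical.

```python
def normalize_by_icv(roi_volume_mm3, icv_mm3):
    """Express an ROI volume as a percentage of intracranial volume
    (ROI / ICV, scaled by a factor of 100)."""
    return roi_volume_mm3 / icv_mm3 * 100.0


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down so that adjusted values
    # never increase as the raw p values decrease.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw p values, one per ROI (pre- vs. post-training).
raw_p = [0.01, 0.04, 0.03, 0.20]
adj_p = benjamini_hochberg(raw_p)

# An ROI is flagged only if its adjusted p value falls below q = 0.05.
significant = [p < 0.05 for p in adj_p]
```

      With 148 ROIs per group, the adjustment matters: an uncorrected threshold of 0.05 would be expected to flag several regions by chance alone even if training had no effect.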

      (2.2) Diffusion-weighted imaging seems to focus on whole-tract atlas ROIs, which can be less accurate/sensitive than tractography-defined ROIs or voxel-wise approaches.

      We appreciate the reviewer’s important point regarding diffusion-weighted imaging (DWI) analysis. We focused primarily on atlas-defined tract-level ROIs derived from a standard white matter tract atlas as we did not feel that we had the resolution for more fine-grained analyses with our sequences. While this approach has the advantage of robust anatomical correspondence and improved interpretability, we agree that it may be less sensitive than tractography-defined or voxel-wise methods for detecting more subtle, localized training-related changes. Because of limitations in our DWI sequence, which was optimized to be shorter and identical between different scanners, we are not able to provide more fine-grained analysis of the DWI data.

      (3) Quality control of images is only mentioned for FA images in subject space. Given that most analyses are based on atlas ROIs, visual checks following registration are fundamental and should be described in further detail.

      Thank you for your thoughtful comment. We agree that visual quality control is critical when using atlas-based ROI analyses. In our study, we implemented comprehensive quality control procedures across all structural and functional imaging analyses.

      For hippocampal segmentation using ASHS, we performed manual visual inspections of each participant's subfield segmentation to verify the accuracy of the automated outputs. This is now clearly described in the revised Methods section:

      “Each participant's subfield segmentations were manually inspected to ensure the accuracy and reliability of the segmentation protocol.”

      For FreeSurfer-based hippocampal and cortical segmentation, we also conducted detailed visual inspections and manual edits following the standard FreeSurfer longitudinal pipeline. We have added the following description to the Methods section to clarify this process:

      “Visual quality control was conducted by three trained raters who systematically inspected skull stripping, surface reconstruction, and segmentation accuracy at both the within-subject template and individual timepoints. Manual edits were primarily applied to the within-subject template to correct segmentation errors—particularly in challenging regions such as the hippocampus—since corrections to the template automatically propagate to all timepoints. Raters followed standardized FreeSurfer longitudinal editing guidelines to ensure consistent and reproducible corrections across subjects. Discrepancies were resolved via consensus discussion. This quality control approach enhanced the accuracy and consistency of segmentation across longitudinal scans, thereby improving the reliability of morphometric analyses and atlas-based ROI extractions.”

      For functional MRI preprocessing, all registration steps—including transformations from individual functional runs to MNI space—were visually checked for each participant to ensure accurate alignment with the Schaefer atlas. We have clarified this point in the revised Methods section with the following statement:

      “Prior to ROI extraction, all registration steps—from individual functional space to MNI space—were visually inspected for each participant to confirm accurate alignment between the functional images and the atlas parcellation.”

      These additions now more clearly reflect the robust quality control procedures that were employed throughout our pipeline to ensure the validity of atlas-based analyses.

      Recommendations for the authors:

      (1) As a reader, I would have appreciated a short section in the methods regarding the preregistration and power analysis. Currently, it is not too straightforward to understand which analyses were included in the preregistration, and at what point in the project the pre-registration was written. Finding all the relevant information from OSF is feasible, but it would be more accessible if a summary of the information were available inside the text.

      We thank the reviewer for this valuable suggestion. We agree that providing a concise summary within the manuscript's methods section will significantly improve accessibility for readers. 

      The full preregistration is now explicitly referenced in the Methods:

      “Preregistration and Power Analysis

      This study was preregistered on the Open Science Framework (OSF; https://osf.io/etxvj). The preregistration was completed on October 30, 2023, after approximately 80% of data collection had been completed, but prior to any analysis of the primary outcome variables. The preregistration outlines the study hypotheses, design, target sample size, and planned behavioral and neuroimaging analyses, including longitudinal ROI comparisons and statistical correction procedures.

      A priori power analysis was conducted using G*Power 3.1 to estimate the required sample size for detecting a Group × Time interaction in a mixed-design ANOVA. Assuming a small-to-medium effect size (f = 0.35), we determined that 24 participants per group would provide 80% power to detect a significant effect at α = 0.05. To allow for potential attrition and data exclusion (e.g., due to excessive motion or incomplete datasets), we targeted recruitment of 30 participants per group across two study sites.

      All primary hypotheses, analytic plans, and inference criteria are documented in the preregistration. Exploratory analyses are clearly delineated in both the preregistration and the present manuscript.”

      (2) The relevance of the study for "disease" is mentioned in the Abstract but is absent in the Introduction. This may be worth removing?

      Thank you for pointing this out. We agree that the reference to "disease" in the Abstract was not well-supported in the Introduction. To maintain consistency and avoid overstatement, we have removed the mention of "disease" from the Abstract in the revised manuscript.

      In Abstract:

      “Training cognitive skills, such as remembering a list of words or navigating a new city, has important implications for everyday life.”

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1(Public Review):

      The correlation between rebound excitation and song structure (e.g., harmonic stack duration) may depend on outliers, such as birds with harmonic stacks >150ms.

      If in wild zebra finch, or even if in domesticated zebra finch including our birds and the birds from the other labs that we evaluated, the distribution of durations of longest harmonic stacks has a long tail, it is not apparent that birds with long duration harmonic stacks are properly considered as outliers. Examining the distribution of motif durations (a less derived statistic) in 33 birds (Fig. 2C) does not support the idea that birds with longer duration songs are outliers. Thus, we view the reviewer question as addressing whether there are different mechanisms operating in birds with long harmonic stacks than for other birds. Unfortunately, the numbers of long-duration harmonic stack birds are too small to give confidence in any statistical analysis of that group. Thus, we limited our re-analysis to the data excluding birds with harmonic stacks >150ms (which is arbitrary), examining how these birds influence our conclusions. We conclude that the influence of the excluded birds on the overall result is modest. The updated results are presented in Supplemental Figure 6, and the Results section has been revised to state:

      “We found that while some of the p values increased above 0.05 (p = 0.058 for rebound area vs. longest harmonic stack and p = 0.082 for sag ratio and longest harmonic stack), it remained significant for firing frequency and longest stack (Pearson’s R, p = 0.0017) and for sag ratio and motif duration (p = 0.024). However, when sag ratio was compared against the duration of the motif excluding the longest harmonic stack, there was no relationship (p = 0.85).”
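
      The re-analysis described above amounts to recomputing each correlation after dropping birds whose longest harmonic stack exceeds the cutoff. The sketch below shows that comparison for one pair of variables. It is a minimal sketch under stated assumptions: the per-bird values are made up, the function names are hypothetical, and only the correlation coefficient is computed (a p value would additionally require the t distribution).

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient for two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def exclude_long_stacks(stack_ms, values, cutoff_ms=150.0):
    """Drop birds whose longest harmonic stack exceeds the cutoff."""
    kept = [(s, v) for s, v in zip(stack_ms, values) if s <= cutoff_ms]
    return [s for s, _ in kept], [v for _, v in kept]


# Made-up per-bird data: longest harmonic stack (ms) and mean sag ratio.
stack_ms = [40.0, 60.0, 80.0, 100.0, 200.0]
sag_ratio = [0.10, 0.12, 0.11, 0.15, 0.30]

r_all = pearson_r(stack_ms, sag_ratio)
kept_stack, kept_sag = exclude_long_stacks(stack_ms, sag_ratio)
r_trimmed = pearson_r(kept_stack, kept_sag)
```

      Comparing the two coefficients side by side shows how much of the relationship is carried by the long-stack birds, which is the question the reviewer raised.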

      There is a disconnect between the physiological measurements and the HH model presented.

      We acknowledge that addressing this limitation would involve additional experimental and modeling assumptions. Rather than overextending our interpretations, we have clarified the limitations of the current study in the Discussion:

      “While this HH model provides a plausible framework for linking intrinsic properties to sequence propagation, it does not fully account for the observed relationship between IPs and song structure. A principal limitation constraining the current model is the absence of information for the same neurons combining characterization of both IPs and network activity during singing (or song playback), when HVC<sub>X</sub> express activity related to song features. Addressing this gap would require additional and challenging experiments and is beyond the scope of this study.”

      Although disynaptic inhibition between HVC<sub>X</sub> neurons and between HVC<sub>RA</sub> and HVC<sub>X</sub> neurons is well established, I am not aware of any data indicating direct synaptic connections between HVC<sub>X</sub> neurons.

      This is an important theoretical point about the reliance of the interval-detecting network model on HVC<sub>X</sub> neurons and about how the model would change if many of the HVC<sub>X</sub> neurons were swapped for HVC<sub>RA</sub> neurons. Connections from HVC<sub>RA</sub> neurons to HVC<sub>X</sub> neurons are established, whereas there is a relative paucity of evidence for HVC<sub>X</sub> to HVC<sub>X</sub> connectivity. This is based on work from Prather and Mooney, 2005 (among others), which used paired sharp electrode recordings to characterize connections in HVC. This work found very few HVC<sub>X</sub> - HVC<sub>X</sub> connections. However, if connected HVC<sub>X</sub> neurons are physically more distant from each other than are connected HVC<sub>RA</sub> - HVC<sub>X</sub> neurons, they would more likely be missed in blind paired recordings. Using different approaches, recent results from the Roberts lab (Trusel et al., eLife, 2025) support the existence of robust HVC<sub>X</sub> - HVC<sub>X</sub> connections.

      Reviewer #2(Public Review):

      The interpretation of p-values is rigid, and near-significant results (e.g., p = 0.06) are dismissed without discussion.

      We revised the text to reflect a more nuanced and consistent interpretation of p-values and updated the reporting to include exact values. For example, the Results section now states:

      “Nonetheless, the longest syllable duration was not significantly correlated with the average sag ratio for each bird (Pearson’s R: R<sup>2</sup> = 0.12, p = 0.065, Supplemental Fig. 2, top left panel), though it is trending toward significance (see Discussion)”

      The conclusion that harmonic stacks influence intrinsic properties lacks necessary controls.

      We have attempted to further clarify that harmonic stacks were used as a representative feature of temporal song structure rather than a unique determinant of intrinsic properties. The Discussion now states:

      “Although harmonic stacks provide a useful test case for studying temporal integration, our findings suggest that IPs are broadly linked to song duration and structure, rather than specific syllable types. This is also consistent with prior results that found all HVC<sub>X</sub> ion currents that were modeled were influenced by song learning[31].”

      The relationship between rebound area and experimentally tutored birds was not fully explored.

      We expanded the analysis to include rebound area in instrumentally tutored birds, which has now been incorporated into Figure 4C. These additional analyses also robustly support our hypotheses. The Results section has been updated to state:

      “We then evaluated the IPs of HVC<sub>X</sub> in the birds from the two groups. HVC<sub>X</sub> neurons from birds who sang unmodified songs (N = 5 birds, 31 neurons), which had shorter harmonic stacks and shorter overall duration, had lower sag ratios (Mann-Whitney: p = 0.025), firing frequency (Mann-Whitney, p = 0.0051) and rebound area (Mann-Whitney: p = 0.0003)”

      Reviewer #3 (Public Review):

      Limited data supports the claim that intrinsic properties influence temporal integration windows.

      While we agree that further data could strengthen this claim, we show that this can happen in principle (Figure 5), and we believe that testing it directly requires further experiments in vivo. We emphasize in the Discussion:

      “Our findings suggest that post-inhibitory rebound excitation in HVC<sub>X</sub> could expand temporal integration. Ultimately, experiments combining in vitro with in vivo recordings can directly quantify this effect. We hope our results motivate such experiments.”

      Technical Corrections

      (1) Fixed typographical errors (e.g., Line 177: corrected "r2 = 4" to "r2 = 0.4").

      (2) Revised figure legends for clarity (e.g., Figure 4E now includes tutoring design details).

      (3) Updated methods to specify how motifs were defined and measured.

      Revised Figures

      Figure 4: Updated to include analysis of rebound area in instrumentally tutored birds, reflecting the relationship between experimental tutoring and intrinsic properties.

      Supplemental Figure 6: Correlation analysis excluding outliers

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      This is a manuscript describing outbreaks of Pseudomonas aeruginosa ST 621 in a facility in the US using genomic data. The authors identified and analysed 254 P. aeruginosa ST 621 isolates collected from a facility from 2011 to 2020. The authors described the relatedness of the isolates across different locations, specimen types (sources), and sampling years. Two concurrently emerged subclones were identified from the 254 isolates. The authors predicted that the most recent common ancestor for the isolates can be dated back to approximately 1999 after the opening of the main building of the facility in 1996. Then the authors grouped the 254 isolates into two categories: 1) patient-to-patient; or 2) environment-to-patient using SNP thresholds and known epidemiological links. Finally, the authors described the changes in resistance gene profiles, virulence genes, cell wall biogenesis, and signaling pathway genes of the isolates over the sampling years.

      Strengths:

      The major strength of this study is the utilisation of genomic data to comprehensively describe the characteristics of a long-term Pseudomonas aeruginosa ST 621 outbreak in a facility. This fills the data gap of a clone that could be clinically important but easily missed from microbiology data alone.

      Weaknesses:

      The work would further benefit from a more detailed discussion on the limitations due to the lack of data on patient clinical information, ward movement, and swabs collected from healthcare workers to verify the transmission of Pseudomonas aeruginosa ST 621, including potential healthcare worker to patient transmission, patient-to-patient transmission, patient-to-environment transmission, and environment-to-patient transmission. For instance, the definition given in the manuscript for patient-to-patient transmission could not rule out the possibility of the existence of a shared contaminated environment. Equally, as patients were not routinely swabbed, unobserved carriers of Pseudomonas aeruginosa ST 621 could not be identified and the possibility of misclassifying the environment-to-patient transmissions could not be ruled out. Moreover, reporting of changes in rates of resistance to imipenem and cefepime could be improved by showing the exact p-values (perhaps with three decimal places) rather than dichotomising the value at 0.05. By doing so, readers could interpret the strength of the evidence of changes.

      Impact of the work:

      First, the work adds to the growing evidence implicating sinks as long-term reservoirs for important MDR pathogens, with direct infection control implications. Moreover, the work could potentially motivate investments in generating and integrating genomic data into routine surveillance. The comprehensive descriptions of the Pseudomonas aeruginosa ST 621 clones outbreak is a great example to demonstrate how genomic data can provide additional information about long-term outbreaks that otherwise could not be detected using microbiology data alone. Moreover, identifying the changes in resistance genes and virulence genes over time would not be possible without genomic data. Finally, this work provided additional evidence for the existence of long-term persistence of Pseudomonas aeruginosa ST 621 clones, which likely occur in other similar settings.

      We thank the reviewer for their thorough evaluation of our work, and for the suggested improvements. A main goal of this study was to show that integrating routine WGS in the clinic was a game changer for infection control efforts. We appreciate that this aspect was highlighted as a strength by this reviewer. While some of the weaknesses identified are inherent to the data (or lack thereof) available for this study, we have revised the manuscript to include a detailed discussion on limitations (sampling, thresholds of genetic relatedness, definition and categories etc.) that could influence the genomic inferences. We also provided exact p-values for the changes in rates of resistance, as requested. Finally, we have positively answered all the specific recommendations suggested by the reviewer and modified the manuscript accordingly.

      Reviewer #2 (Public Review):

      Summary:

      The authors present a report of a large Pseudomonas aeruginosa hospital outbreak affecting more than 80 patients with first sampling dates in 2011 that stretched over more than 10 years and was only identified through genomic surveillance in 2020. The outbreak strain was assigned to the sequence type 621, an ST that has been associated with carbapenem resistance across the globe. Ongoing transmission coincided with both increasing resistance without acquisition of carbapenemase genes as well as the convergence of mutations towards a host-adapted lifestyle.

      Strengths:

      The convincing genomic analyses indicate spread throughout the hospital since the beginning of the century and provide important benchmark findings for future comparison.

      The sampling was based on all organisms sent to the Multidrug-resistant Organism Repository and Surveillance Network across the U.S. Military Health System.

      Using sequencing data from patient and environmental samples for phylogenetic and transmission analyses as well as determining recurring mutations in outbreak isolates allows for insights into the evolution of potentially harmful pathogens with the ultimate aim of reducing their spread in hospitals.

      Weaknesses:

      The epidemiological information was limited and the sampling methodology was inconsistent, thus complicating the inference of exact transmission routes. Epidemiological data relevant to this analysis include information on the reason for sampling, patient admission and discharge data, and underlying frequency of sampling and sampling results in relation to patient turnover.

      We thank the reviewer for their thoughtful feedback on our manuscript and for highlighting the quality of the genomic analyses. We agree that the lack of patient epi data (e.g. date of admission and discharge) and the inconsistent sampling through the years are limitations of this study. We have revised the manuscript to acknowledge these limitations and discuss how not having this data complicates the inference of exact transmission routes. Finally, we have positively answered all the specific recommendations suggested by the reviewer and modified the manuscript accordingly.

      Reviewer #3 (Public Review):

      Summary:

      This paper by Stribling and colleagues sheds light on a decade-long P. aeruginosa outbreak of the high-risk lineage ST-621 in a US Military hospital. The origins of the outbreak date back to the late 90s and it was mainly caused by two distinct subclones SC1 and SC2. The data of this outbreak showed the emergence of antibiotic resistance to cephalosporin, carbapenems, and colistin over time highlighting the emerging risk of extensively resistant infections due to P. aeruginosa and the need for ongoing surveillance.

      Strengths:

      This study overall is well constructed and clearly written. Since detailed information on floor plans of the building and transfers between facilities was available, the authors were able to show that these two subclones emerged in two separate buildings of the hospital. The authors support their conclusions with prospective environmental sampling in 2021 and 2022 and link the role of persistent environmental contamination to sustaining nosocomial transmission. Information on resistance genes in repeat isolates for the same patients allowed the authors to detect the emergence of resistance within patients. The conclusions have broader implications for infection control at other facilities. In particular, the paper highlights the value of real-time surveillance and environmental sampling in slowing nosocomial transmission of P. aeruginosa.

      Weaknesses:

      My major concern is that the authors used fixed thresholds and definitions to classify the origin of an infection. As such, they were not able to give uncertainty measures around transmission routes nor quantify the relative contribution of persistent environmental contamination vs patient-to-patient transmission. The latter would allow the authors to quantify the impact of certain interventions. In addition, these results represent a specific US military facility and the transmission patterns might be specific to that facility. The study also lacked any data on antibiotic use that could have been used to relate to and discuss the temporal trends of antimicrobial resistance.

      We thank the reviewer for their evaluation of our work and for highlighting the broad implications of our findings regarding the application of real-time surveillance to suppress nosocomial transmission. We agree with the reviewer that fixed thresholds and definitions are imperfect to classify the origin of an infection. The design of this study (e.g. inconsistent sampling through time) was not conducive to providing a comprehensive/quantitative measurement of transmission routes. Thus, we decided to apply conservative thresholds of genetic relatedness and strict conditions (e.g. time between isolate collection, shared hospital location etc.) to favor specificity as our goal was simply to establish that cases of environment-to-patient transmission did happen. In the absence of a truth set, we have not performed sensitivity analysis, but we are conducting a follow-up study to compare inferences from MCMC models to our original fixed-thresholds predictions. This limitation is now discussed in the revised manuscript. Finally, we have positively answered all the specific recommendations suggested by the reviewer and modified the manuscript accordingly including the addition of Figure S3.

      Reviewer #1 (Recommendations For The Authors):

      The definitions used on lines 391-396 are necessarily somewhat arbitrary, but it would be helpful to have a little bit more justification for the choices made, particularly for the definition of environmental involving the "3x the number of years they were separated". It seems a little hard to square this with the more relaxed 10 SNP cutoff for a patient-to-patient designation. Are there reasons for thinking SNP differences associated with environmental transmission should be smaller than for patient-to-patient, or is the aim here just to set the bar higher for assuming an environmental source? Because these definitions are quite arbitrary, there could also be some value in exploring the sensitivity of the results to these assumptions.

      Thank you. We agree with the reviewers that SNP thresholds, albeit necessary, are arbitrary and that more discussion/justification was needed to put the genomic inferences in context. We have revised the manuscript to indicate that: 1/ the 10 SNP cutoff for a patient-to-patient designation was set to account for the known evolution rate of P. aeruginosa (inferred by BEAST at 2.987E-7 subs/site/year in this study and similar to previous estimates PMID: 24039595) and the observed within host variability (now displayed in revised Fig. 1E). We note that this SNP distance was not sufficient and that an epi link (patients on the same ward at the same time) needed to be established. 2/ the environment-to-patient definition was indeed set to be most conservative (nearly identical isolates in two patients from the same ward with no known temporal overlap for > 365 days). This was indeed done to favor high specificity as this inference relied solely on clinical isolates (i.e. the identical environmental strain in the patient-environment-patient chain was not sampled). For these clinical isolates to have acquired no/very little mutation in that much time, no/low replication is expected and, although unsampled, we propose this most likely happened on hospital surfaces.
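
      For illustration only, the two fixed-threshold definitions discussed above can be sketched in Python. This is not the authors' pipeline. The `Isolate` class, the function name and all parameter names are hypothetical. Treating "at the same time" as sampling dates at most 31 days apart, and reading the environmental SNP limit as 3 SNPs per year of separation (taken from the reviewer's paraphrase), are assumptions made for this sketch.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Isolate:
    """Hypothetical record for one primary isolate."""
    patient: str
    ward: str
    sampled: date  # sampling date (admission/discharge dates were not available)


def classify_pair(a, b, snps,
                  p2p_max_snps=10,        # cutoff stated in the response
                  overlap_days=31,        # ASSUMPTION: "same time" = within ~1 month
                  env_gap_days=365,       # "> 365 days" stated in the response
                  env_snps_per_year=3.0): # ASSUMPTION: from the reviewer's "3x" remark
    """Label a pair of isolates from two patients under fixed thresholds."""
    # Both definitions require two different patients on the same ward.
    if a.patient == b.patient or a.ward != b.ward:
        return "undetermined"
    gap_days = abs((a.sampled - b.sampled).days)
    # Patient-to-patient: closely related AND an epidemiological link.
    if gap_days <= overlap_days and snps <= p2p_max_snps:
        return "patient-to-patient"
    # Environment-to-patient: nearly identical despite a long gap with no overlap.
    if gap_days > env_gap_days and snps <= env_snps_per_year * gap_days / 365.25:
        return "environment-to-patient"
    # Conditions not met: no conclusion is drawn for this pair.
    return "undetermined"
```

      Pairs that satisfy neither rule are left unclassified rather than forced into a category, which mirrors the stated preference for specificity over sensitivity.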

      While the term "core genome" should be familiar to most readers, "shell genome" and "cloud genome" are less widely known, and an explanation of what these terms mean here would be helpful.

      Thank you. We have revised the manuscript to define the core, shell, and cloud genomes as gene sets found in ≥ 99%, ≥ 95% and ≥ 15% of isolates, respectively.

      In the first paragraph of the discussion, it could be added that in many cases for clinically important Gram negatives short read sequencing alone will fail to detect transmission events as outbreaks can be driven by plasmid spread with only very limited clonal spread (see, for example, https://www.nature.com/articles/s41564-021-00879-y )

      Thank you. We agree this is an important/emerging aspect of surveillance. However, the goal of this discussion point was to explain why such a large outbreak was missed prior to implementing WGS (short read) surveillance. We feel that discussing “plasmid outbreaks” (which are not at play here, and are relatively rare in P. aeruginosa compared to the Enterobacteriaceae) and the need for long-read sequencing would distract from the narrative.

      line 599 What does "Mock" mean here? Would it be more accurate to say it is a simplified floor plan?

      Thank you. “Mock” was changed to “simplified”

      IPAC abbreviation is only used once - spelling it out in full would increase readability.

      Revised manuscript was edited as suggested.

      MHS is only used twice.

      Revised manuscript was edited to spell out Military Health System

      Line 364: full stop missing.

      Revised manuscript was edited as suggested.

      Line 401: Bayesian rather than bayesian.

      Revised manuscript was edited as suggested.

      Reviewer #2 (Recommendations For The Authors):

      Thank you for giving me the opportunity to review this interesting manuscript.

      The conclusions of this paper are mostly well supported by the data presented, but epidemiological information was limited and the sampling methodology was inconsistent, thus complicating inference of exact transmission routes.

      Major issues:

      What was the baseline frequency of clinical and/or screening samples of Pseudomonas aeruginosa at the hospital? Neither Figure 1D nor Table S1 allows for differentiating between clinical and screening samples. Most isolates were cultured from clinical materials, and there is no information about the patients' length of stay and their respective sampling dates. Is there any possibility of finding out whether the samples were collected for clinical or screening purposes? Would it be possible to include the patients' admission data to determine whether the strains were imported into the hospital or related to a previous stay, e.g. among known carriers? Also, the issue of sampling dates vs. patient stay on the ward should be addressed, as there may be an overlap in patients' stay on the ward but no overlap in terms of sampling dates or even missing samples (missing links).

      We have revised the manuscript to address this important point: i) 16 isolates were from surveillance swabs and are labelled “Surveillance” in Table S1. The remaining 237 were clinical isolates; ii) unfortunately, because the sampling was done under a public health surveillance framework, we do not have access to historical patient data (admission/discharge date, wards, rooms, etc.) and we cannot calculate length of stay or better identify patient overlap. These limitations are now acknowledged in the discussion of the revised manuscript.

      In order to evaluate the extent of the outbreak, more epidemiological data would be useful. What is the size of the hospital, what is the average patient turnover, and what is the average length of stay in ICU and non-ICU? Is there any specialization besides the military label?

      We have revised the manuscript to indicate that Facility A is a 425-bed medical center and is the only Level 1 trauma center in the Military Health System. Unfortunately, the data to calculate length of stay, throughout the years, in ICU and non-ICU, was not available to us. This limitation is now also acknowledged in the discussion.

      Perhaps the authors could attempt to discuss the extent to which large outbreaks like these may be considered as part of unavoidable evolutionary processes within the hospital microbiome as opposed to accumulation and transmission of potentially harmful genes/clones, and differentiate between the putative community spread without any epidemiological links on the one hand, and hospital outbreaks that could be targeted by local infection prevention activities on the other hand.

      We respectfully disagree with the suggestion that this large outbreak “may be considered as part of unavoidable evolutionary processes within the hospital microbiome” and should be opposed to “transmission of potentially harmful genes/clones”. As a matter of fact, our data showed that infection control staff at Facility A responded with multiple interventions, including closing sinks, replacing tubing, and using foaming detergents. This resulted in slowing the spread of the ST621 outbreak with just 3 cases identified in 2022, 0 cases in 2023 and 1 case in 2024. This is now discussed in the revised manuscript.

      Page 5, lines 88-92 lines 101-104. It seems as if the outbreak was identified only by the means of genomic surveillance. This raises questions as to the rationale for sampling and sequencing, especially prior to 2020. Considering 11 cases per year between 2011 and 2016, one could assume such an outbreak would have been noticed without sequencing data.

      The MRSN was created in 2010, in response to the outbreak of MDR Acinetobacter baumannii in US military personnel returning from Iraq and Afghanistan. Between 2011 and 2017, the MRSN collected MDR isolates (mandate for all MDR ESKAPE but compliance varied between years and facilities) from across the Military Health System and, for select isolates (e.g. high-risk isolates carrying ESBLs or carbapenemases), performed molecular typing by PFGE. In 2017 the MRSN started to perform whole genome sequencing of its entire repository. In 2020, a routine prospective sequencing service was started and first detected the ST621 outbreak. A retrospective analysis of historical isolate genomes (2011-2019) identified additional cases. The first paragraph of the discussion lists possible factors to explain why the ST621 outbreak escaped detection by traditional approaches. We believe 11 cases per year is not a strong signal when stratified by month, wards, or both, especially for a clone lacking a carbapenemase and without a remarkable antibiotic susceptibility profile.

      Did the infection control personnel suspect transmission? If yes, was the sampling and submission of samples to the MRSN adapted based on the epidemiologic findings?

      The ST621 outbreak was unsuspected before the initial genomic detection in 2020. Until that point, MDR isolates only (Magiorakos et al PMID: 21793988) were collected but compliance was variable through time. Quickly thereafter (starting in 2021), complete sampling of all clinical P. aeruginosa (MDR or not) from Facility A was started. The manuscript was revised to clarify those details of the sampling strategy.

      Is there any information about how many environmental sites were sampled without evidence of ST621 / screening samples were cultured without evidence of Pseudomonas aeruginosa?

      For patient isolates, only 16 isolates were from surveillance swabs. The remaining 237 were clinical isolates. No denominator data was available to calculate P. aeruginosa and ST-621 positivity rate in surveillance swabs throughout the time period. For environmental isolates, a total of 159 swabs were taken from 55 distinct locations in 8 wards/units including the ER. This data is now included in the revised manuscript. However, a complete analysis of these swabs (positivity rate for ESKAPE pathogens, P. aeruginosa, per ward/floor/room, per swab type (sink drain, bed rail etc.) etc.) is beyond the scope of this study and is being performed as a follow up investigation.

      Page 5 line 89 and page 39 Figure S1B. Please describe how the allelic distance for the cluster threshold was selected.

      As indicated in the legend of Figure S1B, no thresholds were applied. All ST621 isolates ever sequenced by the MRSN were included. All except 3 isolates shared between 0-23 cgMLST allelic differences. The remaining 3 were distant by 88-89 allelic differences. The text was revised to clarify this point.

      Page 5 lines 99-100. Could the authors please provide some distribution measures (e.g. IQR).

      Done as requested. The revised manuscript now reads “…of just 38 single nucleotide polymorphisms (SNPs), and an IQR of 19 (Fig. 1A, Table S1).”

      Page 5 line 102. Could the authors please provide some distribution measures (e.g. IQR).

      Please see above. A chart was created and is now included as Fig. S2.

      Page 6 line 107 and page 34 figure 1c. In the text it is stated that isolates were collected in 27 wards, the figure 1C depicts 26 wards and n/a.

      Thank you for spotting this inconsistency. This has been fixed in the revised manuscript.

      Page 6 lines 117-118. Samples collected in the emergency room would imply samples collected on admission, already addressed previously. Did the authors investigate a potential import into the hospital from community reservoirs or were all these isolates collected among patients who had been previously admitted to the hospital and/or tested positive for the outbreak strain?

      We agree that samples collected in the ER imply samples collected on admission. Of the 29 ER isolates only 9 (31%) were primary isolates (first detection in a new patient) which suggests a majority were from returning patients at Facility A. Because the sampling was done under a public health surveillance framework, we do not have access to historical patient data (admission/discharge date, wards, rooms, etc.) to investigate/confirm that these 9 patients had previous visits at Facility A. This point is now discussed in the revised manuscript.

      Page 6 line 128. This could also represent increased selective pressure. However, according to Table S1, the 28 isolates collected in 2011 (the number does not match with Figure 1D) were from many different wards, thus indicating earlier spread throughout the hospital.

      Yes, we agree. Please note that Table S1 lists all isolates for 2011 whereas Figure 1D focuses on primary isolates (first isolate from each patient) only.

      Page 7 line 133. Both Figure 2 and the discussion section, page 13 line 296 suggest the year 2005 instead of 2004?

      Thank you for catching this typographical error. This was corrected to 2004 in the revised manuscript.

      Figure 1E. The figure should also depict intra-patient diversity for comparison.

      Thank you for this great suggestion. We have revised Figure 1E accordingly.

      Page 7, lines 146-147 Could the authors attempt explaining the upper part of the bimodal peaks?

      This is an all-vs-all SNP analysis for all inter-patient isolates. For each isolate, all distances to other isolates are reported, not only the smallest. The upper peaks represent comparisons to isolates from a different outbreak subclone (SC1 vs SC2).
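
      As a toy illustration of why an all-vs-all comparison produces two peaks, the sketch below computes pairwise SNP distances between isolates from different patients. The function names, the input layout and the ten-base sequences are all hypothetical, and sites with an ambiguous base (`N`) are ignored by assumption; the study's actual SNP-calling workflow is not reproduced here.

```python
from itertools import combinations


def snp_distance(seq_a, seq_b):
    """Count differing sites between two aligned sequences, ignoring 'N'."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for x, y in zip(seq_a, seq_b)
               if x != y and x != "N" and y != "N")


def inter_patient_distances(isolates):
    """Return {(id_a, id_b): distance} for every pair from different patients.

    `isolates` maps an isolate id to a (patient id, aligned sequence) tuple.
    """
    distances = {}
    for (id_a, (pat_a, seq_a)), (id_b, (pat_b, seq_b)) in combinations(
            isolates.items(), 2):
        if pat_a != pat_b:  # keep inter-patient comparisons only
            distances[(id_a, id_b)] = snp_distance(seq_a, seq_b)
    return distances


# Toy data: two isolates per subclone, each from a different patient.
toy = {
    "sc1_a": ("P1", "AAAAAAAAAA"),
    "sc1_b": ("P2", "AAAAAAAAAT"),
    "sc2_a": ("P3", "CCCCCAAAAA"),
    "sc2_b": ("P4", "CCCCCAAAAG"),
}
toy_distances = inter_patient_distances(toy)
```

      In the toy data, within-subclone pairs differ at a single site while between-subclone pairs differ at five or six, which is the kind of separation that shows up as a lower and an upper peak.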

      Page 7, line 150 This is a very small number considering the extent of the outbreak and suggests a large number of missing links. Or does this rather imply continuous import and evolution over time that does not necessarily represent transmission within the hospital?

      We believe all cases were due to transmission happening within the hospital. Based on conservative thresholds (genetic relatedness and epi link, or lack thereof) the precise origin from another patient (n=10) or a contaminated surface (n=12) can be inferred. For the remaining 60 patients, with the available sampling, the conditions we chose are not met and we simply do not conclude whether a direct patient-to-patient or an environmental origin was more likely.

      Page 8 line 155. What does the temporal overlap refer to - sampling date versus patient's stay on the ward? Please specify.

      The temporal overlap was investigated from sampling dates, as dates of patient admission/discharge were not available.

      Page 8, line 157: What does primary/serial isolate mean - first and follow-up samples of ST621 per patient?

      Yes. Primary isolate is used to designate the first isolate from a patient. Serial isolates designate follow-up samples of ST621.

      Page 8 line 165: Table S3 and Figure 3 only refer to environmental samples from three wards. Ward 20 rooms 2 and 18 as well as ward 1 rooms 1 and 6 were hotspots - is there any information on the specific infection control/disinfection measures? Addressed in discussion page 12, lines 273-275, but no information on what was actually done.

      The manuscript was revised to indicate the precise disinfection measures that were taken. A follow-up study is ongoing to assess long-term efficacy and monitor possible retrograde growth from previously contaminated sinks.

      Page 8 line 175: Evaluation of change in resistance fraction over time - There may have been a selection bias with an inconsistent number of strains sequenced per year.

      Yes, incomplete sampling and possible selection bias are now listed with other limitations of this study in the discussion of the revised manuscript.

      Page 9 line 183: The referral to Table S1 is unclear, I could not find the number and the specific isolates selected for long-read sequencing.

      Thank you. This has been added to the revised Table S1.

      Page 10 lines 217-225 and Figure 4C: Perhaps it is possible to better align what is written in the text and the caption of the figure. The caption does not clarify that only one patient develops colistin resistance (what was the reason to include the other patients?).

      Thank you. We have revised the text and the caption of the figure to clarify that only isolates from one patient developed colistin resistance. The isolates from the other patients on Fig. 4C are shown to provide context and accurately map the emergence of the PhoQE77fs mutation.  

      Page 10, lines 228-229 and Table S5: How is it possible to identify those 64 genes in Table S5?

      We have revised Table S5 to facilitate the identification of the 64 genes with ≥ 2 independently acquired mutations (excluding SYN). Specifically, we have added column E labeled “Counts independent mutations per locus (excluding SYN)”. A total of 205 rows (in this table each row is a variant) have a value ≥ 2 and these represent 64 genes (upon deduplication of locus tags).
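
      The filtering and deduplication step described above can be sketched as follows. This is a minimal example under stated assumptions, not the authors' code: the dictionary keys `locus_tag` and `effect`, the effect labels other than `SYN`, and the function name are hypothetical, and whether synonymous rows of a flagged locus are counted among the flagged rows in the real table is not stated, so they are excluded here.

```python
from collections import Counter


def recurrently_mutated_loci(variants, min_independent=2):
    """Return (flagged_rows, loci) for loci with repeated non-SYN variants.

    `variants` is a list of dicts, one per independently acquired variant,
    each with a 'locus_tag' and an 'effect' such as 'SYN' or 'MISSENSE'.
    """
    # Per-locus count of independent variants, excluding synonymous ones.
    counts = Counter(v["locus_tag"] for v in variants if v["effect"] != "SYN")
    # Rows whose locus reaches the threshold (synonymous rows left out).
    flagged_rows = [v for v in variants
                    if v["effect"] != "SYN"
                    and counts[v["locus_tag"]] >= min_independent]
    # Deduplicate locus tags to obtain the list of genes.
    loci = sorted({v["locus_tag"] for v in flagged_rows})
    return flagged_rows, loci


# Toy table with made-up locus tags.
toy_variants = [
    {"locus_tag": "L1", "effect": "MISSENSE"},
    {"locus_tag": "L1", "effect": "FRAMESHIFT"},
    {"locus_tag": "L1", "effect": "SYN"},
    {"locus_tag": "L2", "effect": "SYN"},
    {"locus_tag": "L2", "effect": "MISSENSE"},
    {"locus_tag": "L3", "effect": "NONSENSE"},
    {"locus_tag": "L3", "effect": "MISSENSE"},
]
toy_rows, toy_loci = recurrently_mutated_loci(toy_variants)
```

      The number of flagged rows is larger than the number of loci because each locus contributes at least two variant rows, which is why 205 rows can collapse to 64 genes.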

      Page 13, lines 280-281: Where is the information on chronic infection presented? Serial cultures would not necessarily mean chronic infection.

      Yes, we agree this was not the appropriate characterization and this was revised to ‘long-term’ infections.

      Page 14 line 306: Emergence of colistin resistance in a single patient, correct?

      Yes. This was further clarified in the text.

      Page 14 lines 315-320: This should go to the results section. In particular disinfection, closing, and replacing of tubing should be mentioned in the results section in reference to the results presented in Table S3.

      Thank you. We have considered this suggestion and have decided to leave this discussion as the closing paragraph of this publication. A follow-up study is ongoing to assess long-term efficacy of these interventions on the ST-621 but also other outbreak clones at Facility A.

      Methods

      Page 15 lines 330-333: Perhaps it is possible to avoid redundancy.

      Thank you. We have revised the text accordingly.

      Page 15 lines 341: Information on which isolates were subjected to long-read sequencing is missing.

      Thank you. This has been added to the revised Table S1.

      Page 16 line 345: Was there a particular reason why Newbler was chosen?

      No. At the time Newbler was the default assembler built in the MRSN bacterial genome analysis pipeline and QC processes.

      Page 16, line 357-358: What was the rationale for selecting this isolate as reference genome?

      This isolate was chosen because it was collected early in the outbreak and phylogenetic analysis revealed it had low root to tip divergence.

      Page 16 line 361: Why 310 isolates, if only 253 were assigned to the outbreak clone and only a subset of those were collected in facility A?

      This was a typographical error that has been corrected (it now reads “…set of 253 isolates.”) in the revised manuscript.

      Page 17 lines 387-395: What is the reason that intra-patient diversity was not included in the set of criteria for SNP distances?

      The observed within host variability (now displayed in revised Fig. 1E) was taken into consideration when setting SNP thresholds for categorizing patient-to-patient transmission or environment-to-patient event. This is now clarified in the revised manuscript.

      Page 17 line 392: How was the threshold of <=10 SNPs determined?

      The 10 SNP cutoff to infer a patient-to-patient transmission event was set to account for the known evolution rate of P. aeruginosa (inferred by BEAST at 2.987E-7 subs/site/year in this study, and similar to previous estimates PMID: 24039595) and the observed within host variability (now displayed in revised Fig. 1E). We note that this SNP distance was not sufficient and that an epi link (patients on the same ward within the same month) needed to be established.

      Page 17 line 395 and Figure 2: What was the assumed average mutation rate per genome per year?

      Thank you. The mean substitution rate inferred by BEAST was 2.987E-7 subs/site/year, similar to estimates from previous studies on P. aeruginosa outbreaks (e.g. PMID: 24039595).
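
      Because the question asks for a per-genome figure, a rough conversion from the per-site rate is sketched below. The genome length of 6.5 Mb is an illustrative assumption (P. aeruginosa genomes are typically between 6 and 7 Mb) and is not a value taken from the manuscript; if the rate was estimated over a shorter core alignment, the per-genome number would scale down accordingly. The constant and function names are hypothetical.

```python
# Per-site rate quoted in the response (BEAST estimate, subs/site/year).
RATE_PER_SITE_PER_YEAR = 2.987e-7

# ASSUMPTION: illustrative genome length in base pairs, not from the manuscript.
ASSUMED_GENOME_LENGTH_BP = 6_500_000


def expected_substitutions(years,
                           rate=RATE_PER_SITE_PER_YEAR,
                           length_bp=ASSUMED_GENOME_LENGTH_BP):
    """Expected substitutions accumulated along one lineage over `years`."""
    return rate * length_bp * years


per_genome_per_year = expected_substitutions(1)
```

      Under these assumptions the rate corresponds to roughly two substitutions per genome per year along a single lineage.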

      Reviewer #3 (Recommendations For The Authors):

      Please find (line-by-line comments) on each section of the manuscript below:

      Introduction

      Line 86: I am wondering why the authors state ">28 facilities" instead of the exact number of facilities from which these lineages were recovered.

      Thank you. Manuscript was revised to provide the exact number of facilities. It now reads “…recovered from 37 and 28 facilities, respectively.”

      Methods

      It's not clear to me which criteria were used for collecting these isolates (both prospective and retrospective). I understand that some of the data are described in more detail in Lebreton et al but I did not find the specific criteria for the collection of the isolates and I imagine that these might differ if different facilities. Would it be possible to comment on that and add a short paragraph in the Methods section?

      Thank you. This lack of clarity was also raised by other reviewers, and we have revised the manuscript to indicate that: (1) only MDR isolates (Magiorakos et al., PMID: 21793988) were collected from 2011 to 2020, with the same criteria for all facilities, although compliance varied through time and between facilities; and (2) starting in 2021, all P. aeruginosa isolates, irrespective of their susceptibility profile, were collected from Facility A.

      The data comes from a US Military hospital. Is this related to the US Veterans Affairs Healthcare system? Is there more detailed information about the demographics of the patient population?

      Facility A is part of the Military Health System (MHS) which provides care for active service members and their families. This is distinct from the US Veterans Affairs Healthcare system. Only limited patient data was accessible to us as this study was done as part of our public health surveillance activities. Patient age (avg. 57.2 +/- 21.0) and gender (ratio male/female 1.7) are provided in the revised manuscript. 

      Line 384ff: The origin of infection was inferred based on the SNP threshold and epidemiological links. However, recombination events can complicate the interpretation of SNP data. Have the authors attempted to account for this?

      Thank you. We agree that recombination events can complicate the interpretation of SNP data. We used Gubbins v2.3.1 to filter out recombination from the core SNP alignment, as indicated in the revised manuscript.

      The authors' definition of environment-to-patient transmission seems conservative (nearly identical strain and no known temporal overlap for > 365 days). Have the authors changed the threshold, performed sensitivity analyses, and tested how this would affect their results?

      Indeed, acknowledging that fixed thresholds have limitations in their ability to accurately predict the origin of infections, we took a conservative approach to favor specificity as our goal was simply to establish that cases of environment-to-patient transmission did happen. In the absence of a truth set, we have not performed sensitivity analysis, but we are conducting a follow-up study to compare inferences from MCMC models to our original predictions. This limitation is now discussed in the revised manuscript.
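
      The fixed-threshold logic discussed in this exchange can be sketched as a simple pairwise classifier. This is an illustration only, not the authors' implementation. The 10 SNP cutoff, the ward-and-month epidemiological link, and the 365-day gap come from the text above. The SNP ceiling for "nearly identical" strains and the 31-day reading of "same month" are assumptions introduced here, and all names are hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Isolate:
    patient: str
    ward: str
    collected: date


P2P_MAX_SNPS = 10        # cutoff stated in the response
ENV_MIN_GAP_DAYS = 365   # "no known temporal overlap for > 365 days"
EPI_WINDOW_DAYS = 31     # ASSUMPTION: "within the same month" read as <= 31 days
ENV_MAX_SNPS = 2         # ASSUMPTION: placeholder for "nearly identical"


def classify_pair(a: Isolate, b: Isolate, snp_distance: int) -> str:
    """Label the likely link between two isolates using fixed thresholds."""
    if a.patient == b.patient:
        return "same patient"
    gap_days = abs((a.collected - b.collected).days)
    epi_link = a.ward == b.ward and gap_days <= EPI_WINDOW_DAYS
    if snp_distance <= P2P_MAX_SNPS and epi_link:
        return "patient-to-patient"
    if snp_distance <= ENV_MAX_SNPS and gap_days > ENV_MIN_GAP_DAYS:
        return "environment-to-patient"
    # Conservative default: anything ambiguous stays unlabelled.
    return "undetermined"


a = Isolate("P1", "ICU", date(2015, 3, 1))
b = Isolate("P2", "ICU", date(2015, 3, 11))
c = Isolate("P3", "ICU", date(2016, 6, 1))
d = Isolate("P4", "Surgery", date(2015, 3, 5))
print(classify_pair(a, b, 5), classify_pair(a, c, 1), classify_pair(a, d, 0))
```

      Exposing the thresholds as named constants makes it easy to rerun the classification under alternative cutoffs, which is the kind of sensitivity analysis the reviewer describes.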

      The authors don't seem to incorporate the role of healthcare workers in the transmission process. Could they comment on this? I am assuming that environment-to-patient transmission could either be directly from the environment to the patient or via a healthcare worker. I think it's fine to make simplifying assumptions here but it would be great if this was explicitly described.

      Thank you for this suggestion. We have not sampled the hands of healthcare workers in this study. As a result, the reviewer is correct to say that we made the simplifying assumption that healthcare workers would be possible intermediates in either environment-to-patient or patient-to-patient transmissions, as previously described by others (PMID: 8452949). This limitation is now discussed in the revised manuscript.

      Page 5, line 100: What does "all vs all" mean? Based on the supplement, I assume it's the pairwise distance and then averaged across all of those. It would improve the readability of the manuscript if the authors could briefly define this term and then maybe refer to Table S1.

      Thank you. We have created Fig.S2 and revised the manuscript to state that ST-621 isolates from facility A belonged to the same outbreak clone with a distance (averaged all vs all pairwise comparison) of just 38 single nucleotide polymorphisms (SNPs), and an IQR of 19 (Fig. S2, Table S1).
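
      For readers unfamiliar with the term, an "all vs all" summary can be sketched as follows: compute the SNP distance for every unordered pair of isolates, then report the mean and the interquartile range of those distances. The toy sequences and the function names below are hypothetical, and the quartile convention (Python's default "exclusive" method) is an assumption that may differ from the one used for Table S1.

```python
from itertools import combinations
from statistics import mean, quantiles


def hamming(seq_a: str, seq_b: str) -> int:
    """Number of differing positions between two aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(seq_a, seq_b))


def all_vs_all_summary(alignment: dict[str, str]) -> tuple[float, float, list[int]]:
    """Mean and IQR of SNP distances over every unordered pair of isolates."""
    distances = [hamming(alignment[i], alignment[j])
                 for i, j in combinations(sorted(alignment), 2)]
    q1, _, q3 = quantiles(distances, n=4)  # default 'exclusive' method
    return mean(distances), q3 - q1, distances


# Toy core-SNP alignment (made-up data, four isolates, eight variable sites).
toy = {
    "A": "ACGTACGT",
    "B": "ACGTACGA",
    "C": "ACGAACGA",
    "D": "TCGAACGA",
}
avg, iqr, dists = all_vs_all_summary(toy)
print(f"pairs={len(dists)} mean={avg:.2f} IQR={iqr:.2f}")
```

      With n isolates there are n(n-1)/2 pairs, so the summary describes the spread of the whole clone rather than the distance to a single reference.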

      Figure 1D: It would be interesting to see additional figures in the supplement on the percentage of sequenced isolates per year and whether it varies across the different sources/sites. Is there any information on which isolates were chosen for sequencing?

      Lack of clarity in the sampling/sequencing scheme was raised by multiple reviewers and we have provided a thorough response to earlier comments. We also have revised the material and methods section accordingly. Finally, we have created Fig. S3 to show the percentage of sequenced isolates per year across different sources/sites, as suggested by the reviewer. No noticeable patterns were observed. 

      It seems like only a subset of all clinical isolates were sequenced. Would it be possible that SC2 was present already earlier but not picked up until a certain date?

      Although all isolates received by the MRSN were sequenced, compliance varied through time so it is true that not all clinical isolates were sequenced between 2011-2019. As such, we fully agree with this hypothesis and discuss this possibility as BEAST analysis placed the origin of SC2 in 2004 while the first detection of an SC2 isolate was in December 2012. This limitation is now discussed in the revised manuscript.

      Could the authors elaborate on whether the isolates resulted from single-colony picks? Is it possible that the different absence of a subclone is due to the fact that they picked only a colony?

      Yes, the isolates resulted from single-colony picks except when the presence of different colony morphologies was noted. In the latter case, a representative isolate for each colony morphology was processed. We have revised the methods to make that clear.

      Figure 2: It is difficult to see which nodes belong to which patient due to the small font size. I wonder if it was possible to color the nodes for each patient, to make it more readable.

      We tried coloring the nodes but with > 60 distinct patients/colors we decided it did not improve clarity. We have revised figure 2 to increase the font size.  

      Page 7-8, lines 154-155: Did the authors check whether there were isolates of the same strain (that were found in the environment) present in other patients elsewhere in the ward?

      Yes. In rare cases, we observed virtually genetically identical isolates from two patients collected in different wards. Because we only have access to clinical isolate data (collected from patient X in ward Y) and do not have access to patient data (admission/discharge date, wards, rooms, etc.), we cannot exclude the possibility that these patients overlapped in a room prior to the sampling of their P. aeruginosa isolates. We designed our fixed thresholds to be conservative. As a result, in this analysis, these cases are labelled as “undetermined”.

      Page 8: Do the authors have any information on antibiotic use during this timeframe? From the discussion, it seems like there is no patient-level prescription data. Is there any data on overall trends? How were trends in antibiotic use correlated with trends in antibiotic resistance?

      Unfortunately, patient-level prescription data (or any other data not linked to the bacterial specimens) was not accessible to us as this study was done as part of our public health surveillance activities.

      To infer the origin of infection, the authors used a static method with fixed thresholds and definitions. This study does not provide any uncertainty with their estimates. Maybe the authors could add a sentence in the discussion section that MCMC methods to infer transmission trees incorporating WGS could provide these estimates. These methods have not been applied to PA a lot but two examples where MCMC methods have been used without WGS (though the definition of environmental contamination may differ between these studies and this study).

      https://doi.org/10.1186/s13756-022-01095-x

      https://doi.org/10.1371/journal.pcbi.1006697

      Thank you for this great suggestion. We have revised the manuscript to include a discussion on the limitations of fixed thresholds to infer transmission chains/origins, and to discuss existing alternatives including MCMC methods. 

      Line 322-323: This sentence is a bit vague since not all of these HAI are due to P. aeruginosa. I would suggest citing a number that is specific to PA.

      Thank you. While our paper shows a particular example of protracted P. aeruginosa outbreak, the roll-out of routine WGS surveillance in the clinic will help prevent hospital-associated drug-resistant infections for more than this species. We believe that broadening the scope in the last sentence of the manuscript is important and we decline to revise as suggested.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Reviewer #1 (Public review):

      This report addresses a compelling topic. However, I have significant concerns, which necessitate a reassessment of the report's overall value.

      Anatomical Specificity and Stimulation Site:

      While the authors clarify that the ventral MGB (MGv) was the intended stimulation target, the electrode track (Fig. 1A) and viral spread (Fig. 2E) suggest possible involvement of the dorsal MGB (MGd) and broader area. Given that MGv-AI and MGd-AC pathways have distinct-and sometimes opposing-effects on plasticity, the reported LTP values (with unusually small standard deviations) raise concerns about the specificity of the findings. Additional anatomical verification would help resolve this issue.

      We thank the reviewer for highlighting the importance of anatomical specificity in MGv targeting. In the revised manuscript, we have taken several steps to address these issues:

      (1) Higher-magnification histology has been added to Figure 1A, clearly identifying the electrode tip localized within the MGv.

      (2) Figure 2E has been replaced with a new image showing viral expression largely confined to MGB, with minimal spread to surrounding structures.

      (3) In the Discussion, we explicitly acknowledge that although targeting was guided by stereotaxic coordinates and histological confirmation, some viral spread throughout the MGB occurred. We also discuss the possibility that both MGv-A1 and MGd-AC pathways may contribute to the recorded responses, which could influence the observed plasticity, as previously suggested by the reviewer.

      These additions and acknowledgments are now incorporated to ensure the reader can interpret the data with full consideration of anatomical targeting limitations.

      Results section:

      “Higher-magnification histology confirmed accurate MGv targeting (Figure 1A, lower-middle panel).”

      Discussion section:

      “Although our experiment targeting the MGv was guided by stereotaxic coordinates and verified post hoc, we acknowledge potential contributions from non-lemniscal medial geniculate nucleus dorsal (MGd) projections. Anatomical and physiological evidence indicates that MGv-AC projections provide rapid, frequency‑specific, tonotopically organized excitation, whereas MGd pathways target higher‑order auditory cortex with broader tuning, less precise tonotopy, longer response latencies, and greater context‑dependence, features that can differentially shape cortical sensory integration and plasticity (Lee and Sherman, 2010; Smith et al., 2012; Ohga et al., 2018; Lee, 2015; Hu, 2003). While the co-recruitment of lemniscal and non-lemniscal inputs may enhance the generality of our CCK-dependent mechanism, the differing response characteristics of these pathways suggest subtle differences in their relative engagement in the observed plasticity. Future pathway-specific manipulations will help clarify their respective contributions”

      Lee, C.C., and Sherman, S.M. (2010). Topography and physiology of ascending streams in the auditory tectothalamic pathway. Proceedings of the National Academy of Sciences 107, 372-377. doi:10.1073/pnas.0907873107.

      Smith, P.H., Uhlrich, D.J., Manning, K.A., and Banks, M.I. (2012). Thalamocortical projections to rat auditory cortex from the ventral and dorsal divisions of the medial geniculate nucleus. Journal of Comparative Neurology 520, 34-51.

      Ohga, S., Tsukano, H., Horie, M., Terashima, H., Nishio, N., Kubota, Y., Takahashi, K., Hishida, R., Takebayashi, H., and Shibuki, K. (2018). Direct Relay Pathways from Lemniscal Auditory Thalamus to Secondary Auditory Field in Mice. Cerebral Cortex 28, 4424-4439. 10.1093/cercor/bhy234.

      Lee, C.C. (2015). Exploring functions for the non-lemniscal auditory thalamus. Frontiers in Neural Circuits 9, 69.

      Hu, B. (2003). Functional organization of lemniscal and nonlemniscal auditory thalamus. Experimental Brain Research 153, 543-549. 10.1007/s00221-003-1611-5.

      Figure legend section:

      “Post-hoc histology at higher magnification (lower-middle) shows the electrode tip confined within the MGv. White lines delineate the MGv/MGd border based on cytoarchitectonic landmarks.”

      Statistical Rigor and Data Variability:

      The remarkably low standard deviations in LTP measurements are unexpected based on established variability in thalamocortical plasticity. The authors' response confirms these values are accurate, but further justification, such as methodological controls or replication-would bolster confidence in these results. Additionally, the comparison of in vivo vs. in vitro LTP variability requires more substantive support.

      We appreciate the reviewer's concern regarding the unusually small variability. We would like to clarify that the error bars in our figures represent Standard Error of the Mean (SEM) rather than Standard Deviations (SD). As SEM is derived from the SD while incorporating sample size, it is inherently smaller than SD, which may have led to the impression of unrealistically low variability. This has now been explicitly clarified in the figure legends and Methods.
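
      The relationship invoked here, SEM = SD / sqrt(n), can be illustrated with a short sketch. The values below are made up for illustration and are not data from the study; the function name is hypothetical.

```python
from math import sqrt
from statistics import stdev


def sem(values: list[float]) -> float:
    """Standard error of the mean: sample SD divided by sqrt(n)."""
    if len(values) < 2:
        raise ValueError("need at least two values")
    return stdev(values) / sqrt(len(values))


# Hypothetical normalized fEPSP slopes from nine sites (illustrative only).
slopes = [1.2, 1.5, 1.1, 1.6, 1.4, 1.3, 1.5, 1.2, 1.4]
print(f"SD  = {stdev(slopes):.3f}")
print(f"SEM = {sem(slopes):.3f}")
```

      With nine sites the SEM is one third of the SD, so error bars drawn as SEM look considerably tighter than the spread of the underlying points.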

      To illustrate the raw variability, we have added Supplementary Figure S1E showing unaveraged fEPSP slopes compared to SEM, corresponding to Figure S1C. This addition ensures transparency and allows readers to directly assess the quality and consistency of our recordings.

      Regarding the comparison between in vivo and in vitro LTP variability:

      We agree that clarifying the basis of our in vivo vs. in vitro variability comparison is important. For example, in Chen et al., 2019, using identical LTP induction protocols (Fig. J), the SED of in vitro slice measurements (Fig. K) was substantially larger than that of in vivo recordings (Fig. L).

      This difference likely reflects:

      (1) In vitro: neighboring data points within a single experiment are highly correlated; variability across experiments is large due to heterogeneous sensitivity to LTP induction (10–200% increase).

      (2) In vivo: lower correlation between neighboring data points, but each is averaged from 12 recordings over 2 min, reducing cross-trial variability; sensitivity to LTP induction is less variable across experiments (5–60% changes).

      We hope that these clarifications and additional data address the reviewer’s concerns regarding statistical rigor and data variability.

      Methods section:

      “The slopes of the evoked fEPSPs were calculated and normalized using a customized MATLAB script, and the group data were plotted as mean ± Standard Error of the Mean (SEM).”

      “All data are presented as mean ± SEM. Error bars and shaded areas represent SEM. Here, n represents the number of stimulation-recording sites and N represents the number of animals in each experiment. At each time point, fEPSPs were averaged across 12 consecutive trials (2 min) to reduce within-experiment fluctuation. Normalized time courses were then used for repeated-measures analyses.”
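
      The averaging and normalization described in this Methods excerpt can be sketched as below. The authors used a customized MATLAB script that is not shown here, so this Python version is only an illustration. The block size of 12 trials comes from the text; dividing by the mean of the baseline points, the number of baseline points, and the function names are all assumptions.

```python
from statistics import mean

TRIALS_PER_POINT = 12  # from the Methods text: 12 consecutive trials (2 min)


def block_average(slopes: list[float], block: int = TRIALS_PER_POINT) -> list[float]:
    """Average consecutive, non-overlapping blocks of trials.

    An incomplete trailing block is dropped.
    """
    n_blocks = len(slopes) // block
    return [mean(slopes[i * block:(i + 1) * block]) for i in range(n_blocks)]


def normalize_to_baseline(points: list[float], n_baseline_points: int) -> list[float]:
    """Express each time point relative to the mean of the baseline points.

    ASSUMPTION: normalization is a simple ratio to the site's own baseline mean.
    """
    baseline = mean(points[:n_baseline_points])
    return [p / baseline for p in points]


# Made-up example: 24 baseline trials followed by 24 potentiated trials.
raw_slopes = [1.0] * 24 + [1.5] * 24
time_points = block_average(raw_slopes)
normalized = normalize_to_baseline(time_points, n_baseline_points=2)
print(normalized)
```

      Averaging 12 trials per point reduces trial-to-trial noise within a recording, which is one reason the in vivo time courses appear smoother than single-trial data.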

      Figure legend section:

      “Data are mean ± SEM; error bars indicate SEM.”

      “(E) Unaveraged fEPSP slopes are shown for each time point, with individual data points corresponding to all sites included in Fig. 1C; mean ± SEM overlays are shown in black. Note that all individual data points are displayed in this figure, whereas in Figure S1C, only the averaged values are shown.”

      Viral Targeting and Specificity:

      The manuscript does not clearly address whether cortical neurons were inadvertently infected by AAV9. Given the potential for off-target effects, explicit confirmation (e.g., microphotograph of stimulation site) would strengthen the study's conclusions.

      We appreciate the request for quantitative confirmation of off-target cortical infection. We clarify that our histological verification was conducted by systematic sampling rather than exhaustive quantification. Under the same sampling procedure, we did not detect tdTomato-positive cortical somata after AAV9‑Syn‑ChrimsonR‑tdTomato injections into the MGB, whereas we observed rare EYFP-positive cortical somata after AAV9‑EF1a‑DIO‑ChETA‑EYFP (median < 1 cell per 0.4 × 0.4 mm² section, Supplementary Figure S1E). Although these observations do not constitute a formal statistical estimate, they were consistent across sampled sections and are in line with the low-level trans-synaptic transfer reported for AAV9. We have discussed their potential implications for data interpretation in the Discussion.

      We hope these clarifications and the newly presented histological evidence address the reviewer’s concerns and further strengthen the rigor of our study.

      Discussion section:

      “Another potential limitation of our study is the trans-synaptic transfer property of AAV9 (Figure S1F). To mitigate this risk, we carefully controlled the injection volume, rate, and viral expression time, while also verifying expression post hoc. Systematic sampling histological analysis detected no tdTomato-positive cortical somata in the ACx (Figure 2E lower panel), whereas rare EYFP-positive cortical somata were observed after AAV9-EF1a-DIO-ChETA-EYFP injections (median < 1 cell per 0.4 × 0.4 mm² section, Figure S1F, corresponding to Figure 2A upper-middle panel). These construct‑dependent observations align with occasional low‑level trans‑synaptic transfer reported for AAV9 (Zingg et al., 2017) and indicate that off‑target cortical infection was negligible for ChrimsonR and exceedingly rare for ChETA under our experimental conditions.”

      Zingg, B., Chou, X.L., Zhang, Z.G., Mesik, L., Liang, F., Tao, H.W., and Zhang, L.I. (2017). AAV-Mediated Anterograde Transsynaptic Tagging: Mapping Corticocollicular Input-Defined Neural Pathways for Defense Behaviors. Neuron 93, 33-47. 10.1016/j.neuron.2016.11.045.

      Figure legend:

      “Representative histological images demonstrating low-level transsynaptic spread following AAV9-EF1a-DIO-ChETA-EYFP injection into the MGv. Rare EYFP-positive cortical neurons were observed (median < 1 cell per 0.4 × 0.4 mm² section). Scale bar: 100 µm.”

      Integration of Prior Literature:

      The discussion of existing work is adequate but could be more comprehensive. A deeper engagement with contrasting findings would provide better context for the study's contributions.

      We appreciate the reviewer’s suggestion to engage more deeply with contrasting findings. In the revised Introduction and Discussion, we have:

      (1) Refocused the historical context toward adult auditory thalamocortical plasticity and explicitly contrasted it with visual and somatosensory cortices, noting that adult ACx exhibits weaker and more gated NMDAR dependence.

      (2) Positioned CCK–CCKBR signaling as a permissive/gating mechanism that can complement or partially compensate for postsynaptic NMDAR signaling, potentially reconciling variability across cortical areas and life stages.

      (3) Clarified the potential differential contributions of lemniscal (MGv) and non‑lemniscal (MGd) streams to plasticity expression and variability, acknowledging pathway-specific response properties.

      These additions are now integrated in the Introduction (paragraphs 2–3) and Discussion (sections “CCK Dependence of Thalamocortical Neuroplasticity in the ACx” and “Developmental and Age‑Dependent CCK‑Mediated Plasticity”), providing a more comprehensive and balanced context for our findings.

      Introduction section:

      “However, converging evidence shows that thalamocortical inputs retain a capacity for experience-dependent modification in adulthood. Sensory enrichment or deprivation can gate or reinstate thalamocortical plasticity. In the adult ACx, pairing sounds with neuromodulatory drive can reshape cortical representations. In vivo high-frequency stimulation (HFS) of dorsal lateral geniculate nucleus (LGN) or medial geniculate body (MGB) induces LTP in sensory cortices and has been linked to perceptual learning beyond the critical period. Notably, auditory thalamocortical plasticity appears less dependent on NMDA receptors compared to other cortical regions. The mechanisms underlying thalamocortical plasticity in the mature brain remain poorly understood.

      Cholecystokinin (CCK) and its receptor CCK-B receptor (CCKBR) are well positioned to influence thalamocortical transmission: Cck mRNA is abundant in MGB neurons and CCKBR is enriched in layer IV of ACx, the principal thalamorecipient layer.”

      Discussion section:

      “These findings suggest a potential involvement of CCK in thalamocortical plasticity. Our data extend this framework by identifying CCK–CCKBR signaling as a permissive modulator of adult thalamocortical LTP.”

      “We propose that CCKBR activation may trigger intracellular calcium release and AMPAR recruitment in parallel to, or partially compensating for, postsynaptic NMDAR signaling, while the complementarity of CCKBR and NMDARs may contribute to robust thalamocortical plasticity. This complementary arrangement may reconcile differences across developmental stages and cortical areas, and highlights neuropeptidergic signaling as a lever to re-enable adult thalamocortical plasticity.

      Notably, exogenous CCK alone failed to induce LTP in the absence of accompanying stimulation (Figure S2A and S2B), emphasizing that CCK functions as a modulator rather than a direct initiator of LTP. Activation of the thalamocortical pathway is also essential for LTP induction. Although our experiment targeting the MGv was guided by stereotaxic coordinates and verified post hoc, we acknowledge potential contributions from non-lemniscal medial geniculate nucleus dorsal (MGd) projections. Anatomical and physiological evidence indicates that MGv-AC projections provide rapid, frequency‑specific, tonotopically organized excitation, whereas MGd pathways target higher‑order auditory cortex with broader tuning, less precise tonotopy, longer response latencies, and greater context‑dependence, features that can differentially shape cortical sensory integration and plasticity. While the co-recruitment of lemniscal and non-lemniscal inputs may enhance the generality of our CCK-dependent mechanism, the differing response characteristics of these pathways suggest subtle differences in their relative engagement in the observed plasticity. Future pathway-specific manipulations will help clarify their respective contributions. Another potential limitation of our study is the trans-synaptic transfer property of AAV9 (Figure S1F). To mitigate this, we carefully controlled the injection volume, rate, and viral expression time, and conducted post-hoc histological analyses to minimize off-target effects, thereby reducing the likelihood of trans-synaptic transfer confounding the interpretation of our findings.”

      Therapeutic Implications:

      The authors' discussion of therapeutic potential is now appropriately cautious and well-reasoned.

      Conclusion:

      While the study presents intriguing findings, the concerns outlined above must be addressed to fully establish the validity and impact of the results. I appreciate the authors' efforts thus far and hope they can provide additional data or clarification to resolve these issues. With these revisions, the manuscript could make a valuable contribution to the field.

      Reviewer #2 (Public review):

      Summary:

      This work used multiple approaches to show that CCK is critical for long-term potentiation (LTP) in the auditory thalamocortical pathway. They also showed that the CCK mediation of LTP is age-dependent and supports frequency discrimination. This work is important because it opens up a new avenue of investigation of the roles of neuropeptides in sensory plasticity.

      Strengths:

      The main strength is the multiple approaches used to comprehensively examine the role of CCK in auditory thalamocortical LTP. Thus, the authors do provide a compelling set of data that CCK mediates thalamocortical LTP in an age-dependent manner.

      Weaknesses:

      There are some details that should be addressed, primarily regarding potential baseline differences in comparison groups. The behavioral assessment is relatively limited, but may be fleshed out in future work.

      We appreciate the reviewer’s suggestion regarding potential baseline differences. In our study, all groups underwent harmonized procedures, including identical exposure, timing, and acquisition parameters. Group allocation and data collection were performed under standardized conditions. For electrophysiology, baseline fEPSP measures and stimulation intensities were calibrated per site using consistent input-output procedures, with analyses based on normalized slopes relative to each site’s own baseline. For behavior, animals from the same litter served as both experimental and control groups, matched for handling conditions; startle/PPI data were acquired using identical hardware and timing settings. While no additional post hoc re-processing was performed, we have clarified these controls in the Methods to enhance transparency.

      We agree that the behavioral assessment is intentionally focused and does not encompass broader auditory perceptual functions (e.g., temporal processing). We now explicitly state this limitation and propose future studies to examine temporal acuity and cell-type-specific manipulations. These experiments will clarify how CCK-dependent thalamocortical plasticity generalizes to other perceptual domains.

      Reviewer #3 (Public review):

      Summary:

      Cholecystokinin (CCK) is highly expressed in auditory thalamocortical (MGB) neurons and CCK has been found to shape cortical plasticity dynamics. In order to understand how CCK shapes synaptic plasticity in the auditory thalamocortical pathway, they assessed the role of CCK signaling across multiple mechanisms of LTP induction with the auditory thalamocortical (MGB - layer IV Auditory Cortex) circuit in mice. In these physiology experiments that leverage multiple mechanisms of LTP induction and a rigorous manipulation of CCK and CCK-dependent signaling, they establish an essential role of auditory thalamocortical LTP on the co-release of CCK from auditory thalamic neurons. By carefully assessing the development of this plasticity over time and CCK expression, they go on to identify a window of time that CCK is produced throughout early and middle adulthood in auditory thalamocortical neurons to establish a window for plasticity from 3 weeks to 1.5 years in mice, with limited LTP occurring outside of this window. The authors go on to show that CCK signaling and its effect on LTP in the auditory cortex is also capable of modifying frequency discrimination accuracy in an auditory PPI task. In evaluating the impact of CCK on modulating PPI task performance, it also seems that in mice <1.5 years old CCK-dependent effects on cortical plasticity is almost saturated. While exogenous CCK can modestly improve discrimination of only very similar tones, exogenous focal delivery of CCK in older mice can significantly improve learning in a PPI task to bring their discrimination ability in line with those from young adult mice.

      Strengths:

      (1) The clarity of the results, along with the rigorous multi-angled approach, provide significant support for the claim that CCK is essential for auditory thalamocortical synaptic LTP. This approach uses a combination of electrical, acoustic, and optogenetic pathway stimulation alongside conditional expression approaches, germline knockout, viral RNA downregulation and pharmacological blockade. Through the combination of these experimental configurations, the authors demonstrate that high-frequency stimulation-induced LTP is reliant on co-release of CCK from glutamatergic MGB terminals projecting to the auditory cortex.

      (2) The careful analysis of the CCK, CCKB receptor, and LTP expression is also a strength that puts the finding into the context of mechanistic causes and potential therapies for age-dependent sensory/auditory processing changes. Similarly, not only do these data identify a fundamental biological mechanism, but they also provide support for the idea that exogenous asynchronous stimulation of the CCKBR is capable of restoring an age-dependent loss in plasticity.

      (3) Although experiments to simultaneously relate LTP and behavioral change or identify a causal relationship between LTP and frequency discrimination are not made, there is still convincing evidence that CCK signaling in the auditory cortex (known to determine synaptic LTP) is important for auditory processing/frequency discrimination. These experiments are key for establishing the relevance of this mechanism.

      Weaknesses:

      (1) Given the magnitude of the evoked responses, one expects that pyramidal neurons in layer IV are primarily those that undergo CCK-dependent plasticity, but the degree to which PV-interneurons and pyramidal neurons participate in this process differently is unclear.

      We agree with the reviewer that the relative contributions of pyramidal neurons and PV-interneurons to CCK-dependent thalamocortical plasticity remain to be determined. Our recordings primarily reflected excitatory postsynaptic activity from layer IV pyramidal neurons, given the fEPSP metrics used. As PV-interneurons are essential in shaping cortical inhibition and temporal precision, they may also be modulated by CCK release from thalamocortical inputs. We have explicitly acknowledged this limitation in the Discussion section of the manuscript and propose that future studies should employ cell-type-specific recording or manipulation approaches to dissect the respective roles of inhibitory and excitatory neuronal populations in CCK-dependent thalamocortical plasticity. We appreciate the reviewer’s suggestion and believe this is a valuable direction for ongoing research.

      (2) While these data support an important role for CCK in synaptic LTP in the auditory thalamocortical pathway, perhaps temporal processing of acoustic stimuli is as or more important than frequency discrimination. Given the enhanced responsivity of the system, it is unclear whether this mechanism would improve or reduce the fidelity of temporal processing in this circuit. Understanding this dynamic may also require consideration of cell type as raised in weakness #1.

      We acknowledge that the current study primarily examined frequency discrimination and did not directly assess temporal processing. Enhanced network responsivity could have variable effects on temporal precision, depending on the balance between excitation and inhibition. PV-interneurons, in particular, are known to support temporal fidelity in auditory processing (Nocon et al., 2023; Cai et al., 2018). We now discuss that future work should investigate how CCK modulation influences temporal coding at both the circuit and single-cell levels, and whether such changes align with or diverge from the mechanisms underlying frequency discrimination improvements.

      (3) In Figure 1, an example of increased spontaneous and evoked firing activity of single neurons after HFS is provided. Yet it is surprising that the group data are analyzed only for the fEPSP. It seems that single neuron data would also be useful at this point to provide insight into how CCK and HFS affect temporal processing and spontaneous activity/excitability, especially given the example in 1F.

      We appreciate the reviewer’s suggestion. While we recorded single-unit activity during HFS protocols, long-term stability over >1.5 hours was less consistent than for fEPSP measurements, leading to higher variability in spike-based metrics. We therefore used fEPSPs as our primary quantitative measure for robustness. We agree, however, that single-neuron data could yield valuable complementary insights. Future experiments combining stable single-unit recording with synaptic measurements will be conducted to better link cellular excitability and network plasticity.

      (4) The circuitry that determines PPI requires multiple brain areas, including the auditory cortex. Given the complicated dynamics of this process, it may be helpful to consider what, if anything, is known specifically about how layer IV synaptic plasticity in the auditory cortex may shape this behavior.

      We agree that PPI involves multiple cortical and subcortical nodes. In our paradigm, layer IV neurons receive segregated MGv inputs, and high-frequency activation of thalamocortical projections induces robust synaptic plasticity in layer IV. The potentiation at these synapses could amplify the cortical representation of weak prepulses, facilitating their detection and enhancing PPI performance. This interpretation is consistent with prior work showing that local CCK infusion combined with auditory stimuli can augment cortical responses (Li et al., 2014). We have expanded the Discussion to highlight that in aged animals, where baseline PPI performance is often reduced due to degraded auditory inputs (Ouagazzal et al., 2006; Young et al., 2010), restoring thalamocortical plasticity via CCK may partially compensate for sensory gating deficits. We further note that the exact contribution of layer IV to PPI circuitry warrants future investigation using pathway-specific perturbations.

      Comments on revisions:

      The manuscript is much improved and many of the issues or questions have been addressed. Ideally, evidence for the degree of transsynaptic spread for AAV9-Syn-ChrimsonR-tdTomato would also be provided in some form, since in the authors' response it sounds like some was observed, as expected.

      We thank the reviewer for this important point and for the opportunity to clarify. As requested, we have carefully examined the possibility of transsynaptic spread in our experiments:

      We clarify that our histological verification was conducted by systematic sampling rather than exhaustive quantification. Under the same sampling procedure, we did not detect tdTomato-positive cortical somata after AAV9‑Syn‑ChrimsonR‑tdTomato injections into the MGB, whereas we observed rare EYFP-positive cortical somata after AAV9‑EF1a‑DIO‑ChETA‑EYFP (median < 1 cell per 0.4 × 0.4 mm² section, see Figure 2A and Figure S1F), consistent with occasional low-level transsynaptic spread reported in the literature.

      We have updated the Discussion section to clearly report these findings and to emphasize the potential for vector- and construct-dependent variability in transsynaptic spread. We also explicitly acknowledge this technical limitation and discuss its implications for data interpretation.

      We hope these clarifications and additions address the reviewer’s concern regarding viral specificity and transsynaptic spread.

      Discussion section:

      “Another potential limitation of our study is the trans-synaptic transfer property of AAV9 (Figure S1F). To mitigate this risk, we carefully control the injection volume, rate, and viral expression time, while also verifying expression post-hoc. Systematic sampling histological analysis detected no tdTomato-positive cortical somata in the ACx (Figure 2E lower panel), whereas rare EYFP-positive cortical somata were observed after AAV9-EF1a-DIO-ChETA-EYFP injections (median < 1 cell per 0.4 × 0.4 mm² section, Figure S1F, corresponds to Figure 2A upper-middle panel). These construct-dependent observations align with occasional low-level trans-synaptic transfer reported for AAV9 (Zingg et al., 2017) and indicate that off-target cortical infection was negligible for ChrimsonR and exceedingly rare for ChETA under our experimental conditions.”

      Zingg, B., Chou, X.L., Zhang, Z.G., Mesik, L., Liang, F., Tao, H.W., and Zhang, L.I. (2017). AAV-Mediated Anterograde Transsynaptic Tagging: Mapping Corticocollicular Input-Defined Neural Pathways for Defense Behaviors. Neuron 93, 33-47. 10.1016/j.neuron.2016.11.045.

      Figure legend:

      " Representative histological images demonstrating low-level transsynaptic spread following AAV9-EF1a-DIO-ChETA-EYFP injection into the MGv. Rare EYFP-positive cortical neurons were observed (median < 1 cell per 0.4 × 0.4 mm² section). Scale bar: 100 µm."

      Reviewer #1 (Recommendations for the authors):

      Thank you for your efforts in revising the manuscript. While progress has been made, I have a few remaining concerns that I hope you can address to further strengthen the study.

      Focus of the Introduction:

      Auditory thalamocortical plasticity is known to be NMDA-dependent, albeit with weaker dependence during early development. Given that this work examines thalamocortical LTP in young adult and aged mice, I recommend refining the Introduction to place greater emphasis on auditory thalamocortical plasticity in the adult brain. The current discussion of somatosensory plasticity during early development, while interesting, seems less directly relevant to the present study. A sharper focus on the auditory system would better frame your research questions.

      We thank the reviewer for this constructive suggestion. We have revised the Introduction to emphasize adult auditory thalamocortical plasticity and to streamline content less directly related to our study. Specifically:

      (1) We now foreground evidence that thalamocortical inputs retain experience-dependent plasticity beyond the critical period in adult ACx, including neuromodulatory pairing, HFS-induced LTP, and experience-dependent reinstatement.

      (2) We explicitly note that adult auditory thalamocortical plasticity is more weakly NMDAR-dependent than in other cortices, thereby motivating our focus on CCK–CCKBR signaling as a permissive mechanism for adult LTP.

      (3) We have condensed the discussion of somatosensory plasticity during early development to a brief background and shifted the focus to adult auditory mechanisms and knowledge gaps that directly frame our research questions.

      These changes appear in the revised Introduction (paragraphs 2–3), which now provide a sharper rationale for investigating CCK‑dependent thalamocortical LTP in young adult and aged mice.

      Introduction section:

      “However, converging evidence shows that thalamocortical inputs retain a capacity for experience-dependent modification in adulthood. Sensory enrichment or deprivation can gate or reinstate thalamocortical plasticity. In the adult ACx, pairing sounds with neuromodulatory drive can reshape cortical representations. In vivo high-frequency stimulation (HFS) of dorsal lateral geniculate nucleus (LGN) or medial geniculate body (MGB) induces LTP in sensory cortices and has been linked to perceptual learning beyond the critical period. Notably, auditory thalamocortical plasticity appears less dependent on NMDA receptors compared to other cortical regions. The mechanisms underlying thalamocortical plasticity in the mature brain remain poorly understood.

      Cholecystokinin (CCK) and its receptor CCK-B receptor (CCKBR) are well positioned to influence thalamocortical transmission: Cck mRNA is abundant in MGB neurons and CCKBR is enriched in layer IV of ACx, the principal thalamorecipient layer.”

      Anatomical Specificity of MGv Targeting:

      The mouse MGv is a small and deep structure, and precise targeting is critical given the functional differences between MGv and MGd pathways. In the current figures:

      Fig. 1A suggests the electrode track may have approached the MGd.

      Fig. 2E indicates some viral spread beyond the MGB.

      Since MGv-AI and MGd-AC pathways exhibit distinct (and sometimes opposing) effects on plasticity, I encourage you to provide additional clarification or verification of the stimulated/infected regions. This would greatly enhance the interpretability of your LTP data.

      Please see above.

      Data Variability and Transparency:

      The reported thalamocortical LTP values exhibit remarkably small standard deviations, which is somewhat unexpected given typical experimental variability in such measurements. To address this concern, it would be helpful to include example raw traces of the recorded LTP (e.g., in a supplementary figure). This would allow readers to better evaluate the data quality and consistency.

      Please see above.

      Reviewer #2 (Recommendations for the authors):

      Overall, the authors did an excellent job of responding to our critiques, both in their direct responses and in the modified text. The modified text is also more readable than before. Two issues that the authors should consider addressing:

      (1) Unless I missed it, there is no commentary stated about the impact of using aged C57 mice, which lose their hearing, such that the effects seen in the older mice could be related to hearing loss rather than aging alone. Some discussion of this point should be made.

      We thank the reviewer for raising this important point. C57BL/6 mice are known to develop age-related hearing loss, which could potentially affect PPI performance in older animals. We note that in our internal screening we observed markedly reduced startle amplitudes and frequent negative PPI values in many mice >20 months, indicating severe auditory impairment. To minimize this confound a priori, we excluded mice older than 20 months and restricted the aged cohort to 17–19 months, which consistently exhibited robust startle responses and reliable PPI. While some degree of presbycusis may still be present in this age range in C57BL/6 mice, the improvement of PPI following CCK administration combined with acoustic exposure indicates that the auditory pathways remained sufficiently functional to support sensorimotor gating. In fact, the presence of partial hearing loss in these aged mice may have allowed us to better detect the beneficial effects of CCK, further highlighting its therapeutic potential for age-related deficits. The greater improvement in PPI observed in older mice, as compared to younger mice whose PPI in the control group is already high, likely reflects the combined effects of age-related hearing loss and CCK deficiency, with CCK-induced restoration of thalamocortical plasticity being the primary focus of our study. We have now added a discussion of this point in the revised manuscript.

      Discussion section:

      “In aged mice, PPI deficits are commonly observed due to impaired auditory processing. Notably, C57BL/6 mice exhibit age-related hearing loss (Johnson et al., 1997). Both age-associated changes in auditory function and CCK deficiency contribute to impaired sensory gating. The presence of partial hearing loss in aged mice may have facilitated the detection of CCK’s beneficial effects, further highlighting its therapeutic potential for age-related deficits. Our results suggest that enhanced thalamocortical plasticity mediated by CCK might partially compensate for these deficits by amplifying residual auditory signals in aged mice.”

      Johnson, K.R., Erway, L.C., Cook, S.A., Willott, J.F., and Zheng, Q.Y. (1997). A major gene affecting age-related hearing loss in C57BL/6J mice. Hearing Research 114, 83-92. https://doi.org/10.1016/S0378-5955(97)00155-X.

      (2) Minor point - I do not agree with the use of the term "ventral to bregma" to describe where the craniotomies were placed (e.g., line 599). The direction being described is more typically referred to as "lateral." If the authors prefer to use the term "ventral," perhaps additional clarification can be added.

      We thank the reviewer for pointing out this issue and apologize for any confusion. We agree that “ventral to bregma” is not the standard terminology and have revised the Methods section to use “below the temporal ridge”. We have also clarified that the craniotomy for accessing the auditory cortex was performed on the lateral aspect of the skull in rodents, just below the temporal ridge. We hope this revision resolves the ambiguity.

      Method section:

      “A craniotomy was performed over the temporal bone, as the auditory cortex is located on the lateral surface of the brain (coordinates: 1.5 to 3.0 mm below the temporal ridge and 2.0 to 4.0 mm posterior to bregma for mice; 2.5 to 6.5 mm below the temporal ridge and 3.0 to 5.0 mm posterior to bregma for rats) to access the auditory cortex.”

      “Six weeks after CCK-sensor virus injection, a craniotomy was performed to access the auditory cortex at the temporal bone (1.5 to 3.0 mm below the temporal ridge and 2.0 to 4.0 mm posterior to bregma), and the dura mater was opened.”

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The temporal regulation of neuronal specification and its molecular mechanisms are important problems in developmental neurobiology. This study focuses on Kenyon cells (KCs), which form the mushroom body in Drosophila melanogaster, in order to address this issue. Building on previous findings, the authors examine the role of the transcription factor Eip93F in the development of late-born KCs. The authors revealed that Eip93F controls the activity of flies at night through the expression of the calcium channel Ca-α1T. Thus, the study clarifies the molecular machinery that controls temporal neuronal specification and animal behavior.

      Strengths:

      The convincing results are based on state-of-the-art molecular genetics, imaging, and behavioral analysis.

      Weaknesses:

      Temporal mechanisms of neuronal specification are found in many nervous systems. However, the relationship between the temporal mechanisms identified in this study and those in other systems remains unclear.

      We will expand the Discussion section to compare the temporal mechanisms identified here with those in other nervous systems.

      Reviewer #2 (Public review):

      Summary:

      Understanding the mechanisms of neural specification is a central question in neurobiology. In Drosophila, the mushroom body (MB), which is the associative learning region in the brain, consists of three major cell types: γ, α'/β', and α/β Kenyon cells. These classes can be further subdivided into seven subtypes, together comprising ~2000 KCs per hemi-brain. Remarkably, all of these neurons are derived from just four neuroblasts in each hemisphere. Therefore, considerable effort has gone into understanding how neurons are specified in the fly MB.

      Over the past decade, studies have revealed that MB neuroblasts employ a temporal patterning mechanism, producing distinct neuronal types at different developmental stages. Temporal identity is conveyed through transcription factor expression in KCs. High levels of Chinmo, a BTB-zinc finger transcription factor, promote γ-cell fate (Zhu et al., Cell, 2006). Reduced Chinmo levels trigger expression of mamo, a zinc finger transcription factor that specifies α'/β' identity (Liu et al., eLife, 2019). However, the specification of α/β neurons remains poorly understood. Some evidence suggests that microRNAs regulate the transition from α'/β' to α/β fate (Wu et al., Dev Cell, 2012; Kucherenko et al., EMBO J, 2012). One hypothesis even proposes that α/β represents a "default" state of MB neurons, which could explain the difficulty in identifying dedicated regulators.

      The study by Chung et al. challenges this hypothesis. By leveraging previously published RNA-seq datasets (Shih et al., G3, 2019), they systematically screened BAC transgenic lines to selectively label MB subtypes. Using these tools, they analyzed the consequences of manipulating E93 expression and found that E93 is required for α/β specification. Furthermore, loss of E93 impairs MB-dependent behaviors, highlighting its functional importance.

      Strengths:

      The authors conducted a thorough analysis of E93 manipulation phenotypes using LexA tools generated from the Janelia Farm and Bloomington collections. They demonstrated that E93 knockdown reduces expression of Ca-α1T, a calcium channel gene identified as an α/β marker. Supporting this conclusion, one LexA line driven by a DNA fragment near EcR (R44E04) showed consistent results. Conversely, overexpression of E93 in γ and α'/β' Kenyon cells led to downregulation of their respective subtype markers.

      Another notable strength is the authors' effort to dissect the genetic epistasis between E93 and previously known regulators. Through MARCM and reporter analyses, they showed that Chinmo and Mamo suppress E93, while E93 itself suppresses Mamo. This work establishes a compelling molecular model for the regulatory network underlying MB cell-type specification.

      Weaknesses:

      The interpretation of E93's role in neuronal specification requires caution. Typically, two criteria are used to establish whether a gene directs neuronal identity:

      (1) gene manipulation shifts the neuronal transcriptome from one subtype to another, and

      (2) gene manipulation alters axonal projection patterns.

      The results presented here only partially satisfy the first criterion. Although markers are affected, it remains possible that the reporter lines and subtype markers used are direct transcriptional targets of E93 in α/β neurons, rather than reflecting broader fate changes. Future studies using single-cell transcriptomics would provide a more comprehensive assessment of neuronal identity following E93 perturbation.

      We do plan to conduct multi-omics experiments to provide a more comprehensive assessment of neuronal identity upon loss of function of E93. However, the omics results will be reported in a separate manuscript rather than in the revised manuscript.

      With respect to the second criterion, the evidence is also incomplete. While reporter patterns were altered, the overall morphology of the α/β lobes appeared largely intact after E93 knockdown. Overexpression of E93 in γ neurons produced a small subset of cells with α/β-like projections, but this effect warrants deeper characterization before firm conclusions can be drawn. While the results might be an intrinsic nature of KC types in flies, the interpretation of the reader of the data should be more careful, and the authors should also mention this in their main text.

      We will describe and interpret this part of the results more carefully in the main text.

    1. Author response:

      Reviewer 1:

      We appreciate the reviewer’s positive assessment and in revision will expand the Discussion to clarify some of the mechanistic insights of this work, as well as to include expanded treatment of related studies in other model systems.

      Reviewer 2:

      We are grateful for the reviewer’s thorough and supportive comments. We will carefully revise assertions and conclusions for objectivity. Additional analysis of the Zelda experiments will be performed and experimental data tables will be updated to report these results. For the point about providing “insight into models explaining why H3K27me3 is absent prior to NC14,” we have recently submitted a related preprint that addresses this issue directly (Degen, Gonzaga-Saavedra, and Blythe, bioRxiv 2025). In summary, we find evidence that a maternal PcG imprint is indeed maintained through cleavage divisions, albeit through lower-order methylation states (maximally H3K27me2). We chose not to include these additional results in this manuscript to maintain the focus of this study on ZGA. Our revision of this manuscript will include a section in the Discussion that synthesizes the conclusions of the two studies.

      Reviewer 3:

      We thank the reviewer for recognizing the strength of our data and conclusions, and we agree that our results help settle conflicting claims in the field. We will emphasize Zelda’s context-dependent effects more clearly in the revised manuscript.

      References:

      Degen EA, Gonzaga-Saavedra N, Blythe SA. Lower-order methylation states underlie the maintenance and re-establishment of Polycomb modifications in Drosophila embryogenesis. bioRxiv [Preprint]. 2025 Jul 29:2025.07.25.666882. doi: 10.1101/2025.07.25.666882. PMID: 40766521; PMCID: PMC12324246.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review): 

      Summary: 

      Taber et al report the biochemical characterization of 7 mutations in PHD2 that induce erythrocytosis. Their goal is to provide a mechanism for how these mutations cause the disease. PHD2 hydroxylates HIF1a in the presence of oxygen at two distinct proline residues (P564 and P402) in the "oxygen degradation domain" (ODD). This leads to the ubiquitylation of HIF1a by the VHL E3 ligase and its subsequent degradation. Multiple mutations have been reported in the EGLN1 gene (coding for PHD2), which are associated with pseudohypoxic diseases that include erythrocytosis. Furthermore, 3 mutations in PHD2 also cause pheochromocytoma and paraganglioma (PPGL), a neuroendocrine tumour. These mutations likely cause elevated levels of HIF1a, but their mechanisms are unclear. Here, the authors analyze mutations from 152 case reports and map them on the crystal structure. They then focus on 7 mutations, which they clone in a plasmid and transfect into PHD2-KO to monitor HIF1a transcriptional activity via a luciferase assay. All mutants show impaired activation. Some mutants also show impaired stability in pulse chase turnover assays (except A228S, P317R, and F366L). In vitro purified PHD2 mutants display a minor loss in thermal stability and some propensity to aggregate. Using MST technology, they show that P317R is strongly impaired in binding to HIF1a and HIF2a, whereas other mutants are only slightly affected. Using NMR, they show that the PHD2 P317R mutation greatly reduces hydroxylation of P402 (HIF1a NODD), as well as P564 (HIF1a CODD), but to a lesser extent. Finally, BLI shows that the P317R mutation reduces affinity for CODD by 3-fold, but not NODD.

      Strengths: 

      (1) Simple, easy-to-follow manuscript. Generally well-written. 

      (2) Disease-relevant mutations are studied in PHD2 that provide insights into its mechanism of action. 

      (3) Good, well-researched background section. 

      Weaknesses: 

      (1) Poor use of existing structural data on the complexes of PHD2 with HIF1a peptides and various metals and substrates. A quick survey of the impact of these mutations (as well as analysis by Chowdhury et al, 2016) on the structure and interactions between PHD2 and peptides of HIF1a shows that the P317R mutation interferes with peptide binding. By contrast, F366L will affect the hydrophobic core, and A228S is on the surface, and it's not obvious how it would interfere with the stability of the protein.

      Thank you for the comment.  We have further analyzed the mutations on the available PHD2 crystal structures in complex with HIFα to discern how these substitution mutations may impact PHD2 structure and function.  This analysis has been added into the discussion.

      (2) To determine aggregation and monodispersity of the PHD2 mutants using size-exclusion chromatography (SEC), equal quantities of the protein must be loaded on the column. This is not what was done. As an aside, the colors used for the SEC are very similar and nearly indistinguishable. 

      Agreed. We have performed an additional experiment as suggested by the reviewer to further assess aggregation and hydrodynamic size.  The colors used in the graph were changed for clearer differentiation between samples.

      (3) The interpretation of some mutants remains incomplete. For A228S, what is the explanation for its reduced activity? It is not substantially less stable than WT and does not seem to affect peptide hydroxylation. 

      We agree with the reviewer that the causal mechanisms for some of the tested disease-causing mutants remain unclear.  The negative findings also raise the notion, perhaps considered controversial, that there may be other substrates of PHD2 that are impacted by certain mutations and contribute to disease pathogenesis.  A brief paragraph discussing this has been included in the discussion.

      (4) The interpretation of the NMR prolyl hydroxylation is tainted by the high concentrations used here. First of all, there is likely a typo in the method section; the final concentration of ODD is likely 0.18 mM, and not 0.18 uM (PNAS paper by the same group in 2024 reports using a final concentration of 230 uM). Here, I will assume the concentration is 180 uM. Flashman et al (JBC 2008) showed that the affinity of the NODD site (P402; around 10 uM) for PHD2 is 10-fold weaker than CODD (P564, around 1 uM). This likely explains the much faster kinetics of hydroxylation towards the latter. Now, using the MST data, let's say the P317R mutation reduces the affinity by 40-fold; the affinity becomes 400 uM for NODD (above the protein concentration) and 40 uM for CODD (below the protein concentration). Thus, CODD would still be hydroxylated by the P317R mutant, but not NODD.

      The HIF1α concentration was indeed an oversight, which will be corrected to 0.18 mM.  The study by Flashman et al.[1] showing PHD2 having a lower affinity to the NODD than CODD likely contributes to the differential hydroxylation rates via PHD2 WT.  We showed here via MST that PHD2 P317R had a Kd of 320 ± 20 uM for HIF1αCODD, which should have led to a severe enzymatic defect, even at the high concentrations used for NMR (180 uM).  However, we observed only a subtle reduction in hydroxylation efficiency in comparison to PHD2 WT.  Thus, we performed another binding method using BLI that showed a mild binding defect on CODD by PHD2 P317R, consistent with NMR data.  The perplexing result is the WT-like binding to the NODD by PHD2 P317R, which appears inconsistent with the severe defect in NODD hydroxylation via PHD2 P317R as measured via NMR.  These results suggest that there are supporting residues within the PHD2/NODD interface that help maintain binding to NODD but compromise the efficiency of NODD hydroxylation upon PHD2 P317R mutation.
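      The concentration argument in point (4) can be made concrete with a rough occupancy estimate. The sketch below is only an illustration under stated assumptions: a simple 1:1 equilibrium with the ODD peptide in large excess over the enzyme, so that occupancy is [S] / ([S] + Kd). It ignores co-substrates (2OG, Fe, O2) and catalytic turnover, and the function and variable names are hypothetical rather than taken from the manuscript. The input values are the ones quoted in the exchange above (180 uM ODD, roughly 10 uM and 1 uM wild-type affinities for NODD and CODD, and an assumed 40-fold loss for P317R).

```python
from typing import Dict, Optional


def fraction_bound(substrate_um: float, kd_um: float) -> float:
    """Equilibrium occupancy for simple 1:1 binding with substrate in excess."""
    if substrate_um < 0 or kd_um <= 0:
        raise ValueError("concentration must be >= 0 and Kd must be > 0")
    return substrate_um / (substrate_um + kd_um)


# Values quoted in the review; treated here as rough, illustrative inputs.
ODD_UM = 180.0                              # assumed ODD concentration in the NMR assay
WT_KD_UM = {"NODD": 10.0, "CODD": 1.0}      # approximate wild-type affinities
FOLD_LOSS = 40.0                            # assumed affinity loss for P317R


def occupancy_summary(
    substrate_um: float = ODD_UM,
    wt_kd_um: Optional[Dict[str, float]] = None,
    fold_loss: float = FOLD_LOSS,
) -> Dict[str, Dict[str, float]]:
    """Occupancy per site for wild type and for a mutant with weakened binding."""
    kds = WT_KD_UM if wt_kd_um is None else wt_kd_um
    return {
        site: {
            "WT": fraction_bound(substrate_um, kd),
            "P317R": fraction_bound(substrate_um, kd * fold_loss),
        }
        for site, kd in kds.items()
    }


if __name__ == "__main__":
    for site, occ in occupancy_summary().items():
        print(f"{site}: WT {occ['WT']:.2f}, P317R {occ['P317R']:.2f}")
```

      Under these assumptions the mutant occupancy is about 0.31 for NODD (Kd 400 uM) and about 0.82 for CODD (Kd 40 uM), which is the asymmetry the reviewer describes. With the MST-derived Kd of 320 uM for CODD, the same formula gives about 0.36, which is why a Kd of that size would be expected to produce a clear defect at 180 uM.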

      (5) The discrepancy between the MST and BLI results does not make sense, especially regarding the P317R mutant. Based on the crystal structures of PHD2 in complex with the ODD peptides, the P317R mutation should have a major impact on the affinity, which is what is reported by MST. This suggests that the MST is more likely to be valid than BLI, and the latter is subject to some kind of artefact. Furthermore, the BLI results are inconsistent with previous results showing that PHD2 has a 10-fold lower affinity for NODD compared to CODD. 

      The reviewer’s structural prediction that the P317R mutation should cause a major binding defect, while in agreement with our MST data, is incongruent with our NMR data and the data from Chowdhury et al.[2] that showed efficient hydroxylation of CODD via PHD2 P317R.  Moreover, we have attempted to model NODD and CODD on the apo PHD2 P317R structure and found that the mutation had no major impact on CODD, while the mutated residue could clash with NODD, causing a shift in peptide positioning on the protein.  However, these modeling predictions, like any in silico projections, would need experimental validation.  As mentioned in our preceding response, we also performed BLI, which showed that PHD2 P317R had a minor binding defect for CODD, consistent with the NMR results and findings by Chowdhury et al[2].  NODD binding was also measured with BLI, as purified NODD peptides were not amenable for soluble-based MST assay, and this showed similar Kd values for PHD2 WT and P317R.  Considering the absence of NODD hydroxylation via PHD2 P317R as measured by NMR and modeling on apo PHD2 P317R, we posit that P317R causes deviation of NODD from its original orientation that may not affect binding due to the other interactions from the surrounding elements but unfortunately disallows NODD from turnover.  Further study would be required to validate such a notion, which we feel is beyond the scope of this manuscript.

      (6) Overall, the study provides some insights into mutants inducing erythrocytosis, but the impact is limited. Most insights are provided on the P317R mutant, but this mutant had already been characterized by Chowdhury et al (2016). Some mutants affect the stability of the protein in cells, but then no mechanism is provided for A228S or F366L, which have stabilities similar to WT, yet have impaired HIF1a activation. 

      We thank the reviewer for raising these and other limitations.  We have expanded on the shortcomings of the present study but would like to underscore that the current work using the recently described NMR assay along with other biophysical analyses suggests a previously under-appreciated role of NODD hydroxylation in the normal oxygen-sensing pathway.  

      Reviewer #2 (Public review): 

      Summary: 

      Mutations in the prolyl hydroxylase, PHD2, cause erythrocytosis and, in some cases, can result in tumorigenesis. Taber and colleagues test the structural and functional consequences of seven patient-derived missense mutations in PHD2 using cell-based reporter and stability assays, and multiple biophysical assays, and find that most mutations are destabilizing. Interestingly, they discover a PHD2 mutant that can hydroxylate the C-terminal ODD, but not the N-terminal ODD, which suggests the importance of N-terminal ODD for biology. A major strength of the manuscript is the multidisciplinary approach used by the authors to characterize the functional and structural consequences of the mutations. However, the manuscript had several major weaknesses, such as an incomplete description of how the NMR was performed, a justification for using neighboring residues as a surrogate for looking at prolyl hydroxylation directly, or a reference to the clinical case studies describing the phenotypes of patient mutations. Additionally, the experimental descriptions for several experiments are missing descriptions of controls or validation, which limits their strength in supporting the claims of the authors.

      Strengths: 

      (1) This manuscript is well-written and clear. 

      (2) The authors use multiple assays to look at the effects of several disease-associated mutations, which support the claims. 

      (3) The identification of P317R as a mutant that loses activity specifically against NODD, which could be a useful tool for further studies in cells. 

      Weaknesses: 

      Major: 

      (1) The source data for the patient mutations (Figure 1) in PHD2 is not referenced, and it's not clear where this data came from or if it's publicly available. There is no section describing this in the methods. 

      Clinical and patient information on disease-causing PHD2 mutants was compiled from various case reports and summarized in an Excel sheet found in the Supplementary Information.  The case reports are cited in this Excel file.  A reference to the supplementary data has been added to the Figure 1 legend and to the introduction.

      (2) The NMR hydroxylation assay. 

      A. The description of these experiments is really confusing. The authors have published a recent paper describing a method using 13C-NMR to directly detect prolyl-hydroxylation over time, and they refer to this manuscript multiple times as the method used for the studies under review. However, it appears the current study is using 15N-HSQC-based experiments to track the CSP of neighboring residues to the target prolines, so not the target prolines themselves. The authors should make this clear in the text, especially on page 9, 5th line, where they describe proline cross-peaks and refer to the 15N-HSQC data in Figure 5B. 

      As the reviewer mentioned, the assay that we developed directly measures the target proline residues.  This assay is ideal when mutations near the prolines are studied, such as A403, Y565 (He et al[3]).  In this previous work, we observed that the shifting of the target proline cross-peaks due to change in electronegativity on the pyrrolidine ring of proline in turn impacted the neighboring residues[3], which meant that the neighboring residues can be used as reporter residues for certain purposes.  In this study, we focused on investigating the mutations on PHD2 while leaving the sequence of the HIF-1α unchanged by using solely 15N-HSQC-based experiments without the need for double-labeled samples.  Nonetheless, we thank the reviewer for pointing out the confusion in the text and we have corrected and clarified our description of this assay.

      B. The authors are using neighboring residues as reporters for proline hydroxylation, without validating this approach. How well do CSPs of A403 and I566 track with proline hydroxylation? Have the authors confirmed this using their 13C-NMR data or mass spec? 

      For previous studies, we performed intercalated 15N-HSQC and 13C-CON experiments for the kinetic measurements of wild-type HIF-1α and mutants.  We observed that the shifting pattern of A403 and I566 in the 15N-HSQC spectra aligned well with the ones of P402 and P564, respectively, in the 13C-CON spectra.  Representative data has been added to Supplemental Data.

      C. Peak intensities. In some cases, the peak intensities of the end point residue look weaker than the peak intensities of the starting residue (5B, PHD2 WT I566, 6 ct lines vs. 4 ct lines). Is this because of sample dilution (i.e., should happen globally)? Can the authors comment on this? 

      This is an astute observation by the reviewer.  We checked and confirmed that for all kinetic datasets, the peak intensities of the end point residue are always slightly lower than those of the starting residue.  This includes PHD2 A228S and P317R in 5B, although the difference is less obvious than for PHD2 WT.  We agree with the reviewer that sample dilution is a factor, as a total volume of 16 microliters of reaction components was added to the solution to trigger the reaction after the first spectrum was acquired.  It is also likely that the rate of prolyl hydroxylation becomes extremely slow once only a small amount of substrate remains in the system.  The reaction would therefore not reach 100% completion, and the sensitive NMR experiment detects this residual unreacted substrate.
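      The size of the dilution effect alone can be estimated with a short calculation. The sketch below is illustrative only: the 16 microliter addition is taken from the response above, whereas the initial sample volume of 500 microliters is a hypothetical value chosen for the example and is not stated in the response.

```python
def dilution_factor(initial_volume_ul: float, added_volume_ul: float) -> float:
    """Return the fraction of the original concentration left after dilution."""
    if initial_volume_ul <= 0 or added_volume_ul < 0:
        raise ValueError("volumes must be positive (initial) and non-negative (added)")
    return initial_volume_ul / (initial_volume_ul + added_volume_ul)


ADDED_UL = 16.0            # volume of reaction components, from the response above
ASSUMED_SAMPLE_UL = 500.0  # hypothetical NMR sample volume, an assumption

factor = dilution_factor(ASSUMED_SAMPLE_UL, ADDED_UL)
loss_percent = (1.0 - factor) * 100.0
print(f"Expected global intensity loss from dilution: {loss_percent:.1f}%")
```

      Under this assumed sample volume, dilution would account for a global intensity loss of roughly 3%; any larger, residue-specific difference would need another explanation, such as incomplete conversion.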

      (3) Data validating the CRISPR KO HEK293A cells is missing. 

      We thank the reviewer for noting this oversight.  Western blots validating PHD2 KO in HEK293A cells have been added to the Supplementary Data file.

      (4) The interpretation of the SEC data for the PHD2 mutants is a little problematic. Subtle alterations in the elution profiles may hint at different hydrodynamic radii, but as the samples were not loaded at equal concentrations or volumes, these data seem more anecdotal, rather than definitive. Repeating this multiple times, using matched samples, followed by comparison with standards loaded under identical buffer conditions, would significantly strengthen the conclusions one could make from the data. 

      Agreed.  We have performed an additional experiment as suggested with equal volume and concentration of each PHD2 construct loaded onto the SEC column for better assessment of aggregation.  Notably, our conclusion remained unchanged.

      Minor: 

      (1) Justification for picking the seven residues is not clearly articulated. The authors say they picked 7 mutants with "distinct residue changes", but no further rationale is provided. 

      Additional justification for the selection of the mutants has been added to the ‘Mutations across the PHD2 enzyme induce erythrocytosis’ section.  Briefly, some mutants were chosen based on their frequency in the clinical data and their presence in potential mutational hot spots.  Various mutations were noted at W334 and R371, while F366L was identified in multiple individuals.  Additionally, 9 cases of PHD2-driven disease were reported to be caused by mutations located between residues 200 and 210, while 13 cases were reported between residues 369 and 379, so G206C and R371H were chosen to represent potential hot spots.  To examine a potential genotype-phenotype relationship, two of the mutants responsible for neuroendocrine tumor development, A228S and H374R, were also selected.  Finally, mutations located close to or on catalytic core residues (P317R, R371H, and H374R) were chosen to test for suspected defects.   

      (2) A major finding of the paper is that a disease-associated mutation, P317R, can differentially affect HIF1 prolyl-hydroxylation; however, additional follow-up studies have not been performed to test this in cells or to validate the mutant in another method. Is it the position of the proline within the catalytic core, or the identity of the mutation that accounts for the selectivity? 

      This is the very question that we are currently addressing but as a part of a follow-up study.  Indeed, one thought is that the preferential defect observed could be the result of the loss of proline, an exceptionally rigid amino acid that makes contact with the backbone twice, or the addition of a specific amino acid, namely arginine, a flexible amino acid with an added charge at this site.  Although beyond the scope of this manuscript, we will investigate whether such and other characteristics in this region of PHD2/HIF1α interface contribute to the differential hydroxylation. 

      Reviewer #3 (Public review): 

      Summary: 

      This is an interesting and clinically relevant in vitro study by Taber et al., exploring how mutations in PHD2 contribute to erythrocytosis and/or neuroendocrine tumors. PHD2 regulates HIFα degradation through prolyl-hydroxylation, a key step in the cellular oxygen-sensing pathway. 

      Using a time-resolved NMR-based assay, the authors systematically analyze seven patient-derived PHD2 mutants and demonstrate that all exhibit structural and/or catalytic defects. Strikingly, the P317R variant retains normal activity toward the C-terminal proline but fails to hydroxylate the N-terminal site. This provides the first direct evidence that N-terminal prolyl-hydroxylation is not dispensable, as previously thought. 

      The findings offer valuable mechanistic insight into PHD2-driven effects and refine our understanding of HIF regulation in hypoxia-related diseases. 

      Strengths: 

      The manuscript has several notable strengths. By applying a novel time-resolved NMR approach, the authors directly assess hydroxylation at both HIF1α ODD sites, offering a clear functional readout. This method allows them to identify the P317R variant as uniquely defective in NODD hydroxylation, despite retaining normal activity toward CODD, thereby challenging the long-held view that the N-terminal proline is biologically dispensable. The work significantly advances our understanding of PHD2 function and its role in oxygen sensing, and might help in the future interpretation and clinical management of associated erythrocytosis. 

      Weaknesses: 

      (1) There is a lack of in vivo/ex vivo validation. This is actually required to confirm whether the observed defects in hydroxylation, especially the selective NODD impairment in P317R, are sufficient to drive disease phenotypes such as erythrocytosis.

      We thank the reviewer for this comment, and while we agree with this statement, the objective of this study per se was to elucidate the structural and/or functional defect caused by the various disease-associated mutations on PHD2.  The subsequent study would be to validate whether the identified defects, in particular the selective NODD impairment, would lead to erythrocytosis in vivo.  However, we feel that such a study would be beyond the scope of this manuscript.

      (2) The reliance on HRE-luciferase reporter assays may not reliably reflect the PHD2 function and highlights a limitation in the assessment of downstream hypoxic signaling. 

      Agreed.  All experimental assays and systems have limitations.  The HRE-luciferase assay used in the present manuscript also has limitations, such as the continuous expression of exogenous PHD2 mutants driven by a CMV promoter.  Thus, we applied several additional biophysical methodologies to interrogate the disease-causing PHD2 mutants.  The discussion of the limitations of the luciferase assay has been expanded in the revised manuscript. 

      (3) The study clearly documents the selective defect of the P317R mutant, but the structural basis for this selectivity is not addressed through high-resolution structural analysis (e.g., cryo-EM). 

      We thank the reviewer for the comment.  While solving the structure of PHD2 P317R in complex with HIFα substrate is beyond the scope of this study, a structure of PHD2 P317R in complex with a clinically used inhibitor has been solved (PDB:5LAT).  In analyzing this structure and that of PHD2 WT in complex with NODD, Chowdhury et al[2] stated that P317 makes hydrophobic contacts with the LXXLAP motif on HIFα and R317 is predicted to interact differently with this motif.  While this analysis does not directly elucidate the reason for the preferential NODD defect, it supports the possibility that the P317R substitution may be more detrimental for enzymatic activity on NODD than on CODD.  We have discussed this notion in the revised manuscript. 

      (4) Given the proposed central role of HIF2α in erythrocytosis, direct assessment of HIF2α hydroxylation by the mutants would have strengthened the conclusions. 

      We thank the reviewer for this comment, but we feel that such a study would be beyond the scope of the present study.  We observed that the PHD2 binding patterns to HIF1α and HIF2α were similar, and we have previously assigned >95% of the amino acids in HIF1α ODD for NMR study[3]. Thus, we first focused on the elucidation of possible defects in disease-associated PHD2 mutants using HIF1α as the substrate, with the supposition that an identified deregulation on HIF1α could be extended to the HIF2α paralog.  However, we agree with the reviewer that future studies should examine the impact of PHD2 mutants directly on HIF2α.  

      References:

      (1) Flashman, E. et al. Kinetic rationale for selectivity toward N- and C-terminal oxygen-dependent degradation domain substrates mediated by a loop region of hypoxia-inducible factor prolyl hydroxylases. J Biol Chem 283, 3808-3815 (2008).

      (2) Chowdhury, R. et al. Structural basis for oxygen degradation domain selectivity of the HIF prolyl hydroxylases. Nat Commun 7, 12673 (2016).

      (3) He, W., Gasmi-Seabrook, G.M.C., Ikura, M., Lee, J.E. & Ohh, M. Time-resolved NMR detection of prolyl-hydroxylation in intrinsically disordered region of HIF-1alpha. Proc Natl Acad Sci U S A 121, e2408104121 (2024).

      Reviewer #1 (Recommendations for the authors): 

      (1) To increase the impact and significance of this work, I would recommend determining the mechanism by which A228S and F366L impair PHD2. Are these mutations affecting interactions with proteins other than HIF1a? Furthermore, does the F366L mutation affect the hydroxylation rate? This should be measured. The authors should also perform a more in-depth structural analysis of these mutations and perhaps use AlphaFold to identify how these sites may be involved in other interactions. 

      We thank the reviewer for the recommendations.  A paragraph discussing the quandary of A228S and F366L has been added to the discussion, as well as an in-depth structural analysis of each selected mutant.  While AlphaFold is excellent at predicting protein structures overall, its capability to predict the effect of single point mutations, such as those in this study, is limited.  Therefore, it was not utilized for this paper.

      (2) For the aggregation assay, I recommended injecting the same quantity of protein on the SEC. If the aggregation-prone mutants' yields were too low, then reduced amounts of the other mutants should be injected. 

      Agreed.  An additional experiment was performed in which similar concentrations of each mutant protein were loaded onto the SEC column and chromatograms were normalized according to the molecular concentration.  Results from this experiment have replaced the previously performed aggregation assay.  Notably, the data from the revised experiment did not change the outcome or conclusion of the study.

      (3) For the NMR kinetics data, the authors should discuss the impact of affinities and concentrations on the reaction rate and incorporate this analysis framework to interpret their data. 

      Done.  As discussed in depth in response to Public Reviewer 1’s fourth comment, we observed only a subtle reduction in hydroxylation efficiency of HIF1aCODD by PHD2 P317R in comparison to PHD2 WT.  Upon performing BLI, we found PHD2 P317R displays only a mild binding defect on the CODD and NODD.  The WT-like binding to the NODD by PHD2 P317R appears to be inconsistent with the severe defect in NODD hydroxylation via PHD2 P317R as measured via NMR.   These results suggest that there are supporting residues within the PHD2/NODD interface that help maintain binding to NODD but compromise the efficiency of NODD hydroxylation upon PHD2 P317R mutation.

      Reviewer #2 (Recommendations for the authors): 

      It is unclear where the source data came from describing the patient mutations, or if it is publicly available. Several minor issues were noted with several of the figures or methods: 

      (1) Figure 2C. It is not clear what data are being compared for significance. The lines don't seem to clearly distinguish this. 

      Done.  The significance lines have been adjusted in the figure to better convey which data are being compared.

      (2) Please incorporate the calculated biophysical constants (KD, TM, etc, average +/- std dev) from the tables into the figures or figure legends that show the data from which they are calculated.  

      Done.  References to the corresponding tables have been added to the appropriate figure legends.

      (3) Figure 3C, the data for F366L do not appear normalized in the same way as the other constructs. 

      CD melt values for F366L were normalized in the same way as other constructs but due to noisier data acquired between 25-37°C, the top value of the sigmoidal curve is slightly higher than the other constructs (F366L: 1.066, WT: 1.007, A228S: 1.000, P317R: 1.015, R371H: 1.005). 
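      To illustrate why a higher fitted top value shifts the low-temperature plateau of the normalized curve, the sketch below evaluates a two-state Boltzmann sigmoid. This is an assumption made for illustration and is not necessarily the model used for the fits in Figure 3C. The top values are those quoted above; the bottom plateau, midpoint, and slope are hypothetical placeholders.

```python
import math


def boltzmann(temp_c: float, top: float, bottom: float, tm: float, slope: float) -> float:
    """Two-state sigmoid: approaches `top` well below tm and `bottom` well above it."""
    return bottom + (top - bottom) / (1.0 + math.exp((temp_c - tm) / slope))


# Fitted top values quoted in the response above.
TOP_VALUES = {"F366L": 1.066, "WT": 1.007, "A228S": 1.000, "P317R": 1.015, "R371H": 1.005}

# Hypothetical parameters, chosen only to draw an example curve.
ASSUMED_BOTTOM = 0.0
ASSUMED_TM_C = 50.0
ASSUMED_SLOPE = 2.0

for name, top in TOP_VALUES.items():
    plateau = boltzmann(25.0, top, ASSUMED_BOTTOM, ASSUMED_TM_C, ASSUMED_SLOPE)
    print(f"{name}: fitted top = {top:.3f}, model value at 25 °C = {plateau:.3f}")
```

      With these placeholder parameters the model value at 25 °C is essentially equal to the fitted top, so noise in the 25-37 °C region that raises the fitted top would lift the whole low-temperature plateau without implying a different normalization procedure.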

      (4) For Figure 1B, it would be helpful to highlight the mutants characterized in the current study with a different color/symbol to help show the number of cases. 

      Done.  Dots representing the selected mutants have been highlighted in red in Figure 1B.

      (5) A description of the isotopic labeling of PHD2 is missing from the methods.

      Due to the nature of the NMR assay, no isotopic labeling was required for PHD2.

      Reviewer #3 (Recommendations for the authors): 

      (1) To further strengthen the manuscript, the authors could consider exploring the relevance of their in vitro findings in a more physiological context. 

      We thank the reviewer for the suggestion, and we will certainly consider furthering our investigation in a more physiological context for future studies.

      (2) If technically feasible, integrating direct analyses of HIF2α regulation by the PHD2 mutants would better reflect the clinical phenotype, given the known importance of HIF2α in erythrocytosis. 

      We agree that HIF2α is important in the context of erythrocytosis, but through MST we observed no difference in binding pattern between HIF1 and HIF2 and the selected PHD2 mutants.  As we had previously assigned >95% of residues for HIF1α ODD for NMR assay, we analyzed HIF1 with the supposition that any defects observed would likely apply to HIF2.  However, we agree that future studies on the impact of PHD2 mutants directly on HIF2 would be beneficial to supplement our understanding of pseudohypoxic disease.

      (3) Additionally, although perhaps more suitable for future work or discussion, structural modeling or high-resolution structural studies of the P317R variant could offer valuable insight into the observed NODD selectivity defect. 

      We thank the reviewer for the suggestion. While solving the structure of PHD2 P317R in complex with NODD is beyond the scope of this manuscript, a crystal structure of PHD2 P317R in complex with an inhibitor has been solved and insights from this structure have been added to the discussion. 

      (4) Finally, a brief clarification or discussion of the limitations of the luciferase reporter assay, especially in the context of aggregation-prone mutants, would help readers better interpret the functional data. 

      We thank the reviewer for the suggestion.  The limitations of the luciferase reporter assay in regard to its inability to detect defects with aggregation-prone mutants have been elaborated on in the discussion.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Reviewer #2 (Public Review):

      Summary:

      The authors show that a combination of arginine methyltransferase inhibitors synergize with PARP inhibitors to kill ovarian and triple negative cancer cell lines in vitro and in vivo using preclinical mouse models.

      Strengths and weaknesses

      The experiments are well-performed, convincing and have the appropriate controls (using inhibitors and genetic deletions) and use statistics.

      They identify the DNA damage protein ERCC1 to be reduced in expression with PRMT inhibitors. As ERCC1 is known to be synthetic lethal with PARPi, this provides a mechanism for the synergy. They use cell lines only for their study in 2D as well as xenograft models.

      We sincerely thank Reviewer #2 for the thoughtful and constructive comments during both rounds of review, which have significantly improved the quality of our manuscript, and for the kind recognition of the scientific quality of our work: “The experiments are well-performed, convincing and have the appropriate controls (using inhibitors and genetic deletions) and use statistics.” In response, we have incorporated new results from additional experiments into the figures (Figures 6M and 6N) and made comprehensive revisions throughout the text, figures, and supplementary materials. Following the reviewer’s valuable suggestions, we also revised the Discussion section. In the “Recommendations for the authors” sections, we have provided detailed point-by-point responses to each comment, which were instrumental in guiding our revisions. We believe these updates have substantially strengthened the manuscript and fully addressed all reviewer concerns.

      Reviewer #2 (Recommendations for the authors): 

      Although the authors have addressed each recommendation from the reviewer, further revisions of the manuscript are still necessary, as outlined below.

      Add these additional comments in the text to further enhance the comprehension and clarity of the data.

      (1) If the authors kept the tumors of various sizes in Figure 7I, it would be important to assess the protein and/or mRNA level of ERCC1 to further support their mechanism.

      Question (1): Please add the figures of new experiments (treatment diagram, curves for tumor volume and qRT-PCR data) to Figure 6.

      We thank the reviewers for their constructive suggestions. In response to the reviewers’ comments, we have added the treatment diagram and qPCR results to Figure 6. In this experiment, we shortened the treatment duration to seven days to assess early molecular responses to therapy rather than downstream effects. As expected, such short-term treatment did not result in significant differences in tumor growth among groups. The new results are now presented in Figure 6, panels M and N. The corresponding results and figure legends will also be included in the revised version of the manuscript.

      (2) Figure 2G: please explain why two bands remain for sgPRMT1.

      Question (2): In the answer, the authors stated, "Upon knockdown of the major isoforms by CRISPR/Cas9, expression of this minor isoform may have increased as part of a compensatory feedback mechanism, rendering it detectable by immunoblotting." Please put the statement into the discussion section.

      We sincerely thank the reviewers for their thoughtful and constructive suggestions. In response to these comments, we have carefully revised the manuscript and incorporated the corresponding information into the Discussion section to provide greater clarity and context for our findings.

      (3) (Previously point 5) What is the link with ERCC1 splicing because reduced overall ERCC1 expression is clear?

      Question (5): Please add the explanation you provide of links between ERCC1 splicing and PRMTi into the discussion section.

      "Furthermore, as shown in Figure 4G, we observed a reduction in the total ERCC1 mRNA reads following PRMTi treatment. This decrease may be attributed, at least in part, to the instability of the alternatively spliced ERCC1 transcripts, which could be more prone to degradation. In combination with the transcriptional downregulation of ERCC1 induced by PRMT inhibition, these alternative splicing events may lead to a further reduction in functional ERCC1 protein levels. This dual impact on ERCC1 expression, through both decreased transcription and the generation of unstable or nonfunctional isoforms, likely contributes to the enhanced cellular sensitivity to PARP inhibitors observed in our study."

      We sincerely thank the reviewers for their thoughtful and constructive suggestions. In response to these comments, we have carefully revised the manuscript and incorporated the corresponding information into the Discussion section to provide greater clarity and context for our findings.

      (4) (Previously 6) Figure 7J: From the graph, it seems like Olaparib+G715 and G715+G025 have a similar effect on tumor volume (two curves overlap). Please discuss.

      Question (6): In the answer, the authors stated, "Our in vitro and in vivo findings, together with previously published data, consistently demonstrate that GSK715 is more potent than both GSK025 and Olaparib. Notably, treatment with GSK715 alone led to significantly greater inhibition of tumor growth compared to either GSK025 or Olaparib administered individually. This higher potency of GSK715 also explains the comparable levels of tumor suppression observed in the combination groups, including GSK715 plus Olaparib and GSK715 plus GSK025. These results suggest that GSK715 is likely the primary driver of efficacy in the two drug combination settings." Please put the statement in the corresponding result section for Figure 6J.

      We sincerely thank the reviewers for their thoughtful and constructive suggestions. In response to these comments, we have carefully revised the manuscript and incorporated the corresponding information into the result section for Figure 6J to provide greater clarity and context for our findings.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review): 

      Summary: 

      In this manuscript entitled "Molecular dynamics of the matrisome across sea anemone life history", Bergheim and colleagues report the prediction, using an established sequence analysis pipeline, of the "matrisome" - that is, the compendium of genes encoding constituents of the extracellular matrix - of the starlet sea anemone Nematostella vectensis. Re-analysis of an existing scRNA-Seq dataset allowed the authors to identify the cell types expressing matrisome components at different developmental stages. Last, the authors apply time-resolved proteomics to provide experimental evidence of the presence of the extracellular matrix proteins at three different stages of the life cycle of the sea anemone (larva, primary polyp, adult) and show that different subsets of matrisome components are present in the ECM at different life stages with, for example, basement membrane components accompanying the transition from larva to primary polyp and elastic fiber components and matricellular proteins accompanying the transition from primary polyp to the adult stage. 

      Strengths: 

      The ECM is a structure that has evolved to support the emergence of multicellularity and different transitions that have accompanied the complexification of multicellular organisms. Understanding the molecular makeup of structures that are conserved throughout evolution is thus of paramount importance. 

      The in-silico predicted matrisome of the sea anemone has the potential to become an essential resource for the scientific community to support big data annotation efforts and understand better the evolution of the matrisome and of ECM proteins, an important endeavor to better understand structure/function relationships. This study is also an excellent example of how integrating datasets generated using different -omic modalities can shed light on various aspects of ECM metabolism, from identifying the cell types of origin of matrisome components using scRNA-Seq to studying ECM dynamics using proteomics. 

      We greatly appreciate the positive feedback regarding the design of our study and the evolutionary significance of our findings.

      Weaknesses: 

      My concerns pertain to the three following areas of the manuscript: 

      (1) In-silico definition of the anemone matrisome using sequence analysis: 

      a) While a similar computational pipeline has been applied to predict the matrisome of several model organisms, the authors fail to provide a comprehensive definition of the anemone matrisome: In the text, the authors state the anemone matrisome is composed of "551 proteins, constituting approximately 3% of its proteome" (see page 6, line 14), but Figure 1 lists 829 entries as part of the "curated" matrisome, Supplementary Table S1 lists the same 829 entries and the authors state that "Here, we identified 829 ECM proteins that comprise the matrisome of the sea anemone Nematostella vectensis" (see page 17, line 10). Is the sea anemone matrisome composed of 551 or 829 genes? If we refer to the text, the additional 278 entries should not be considered as part of the matrisome, but what is confusing is that some are listed as glycoproteins and the "new_manual_annotation" proposed by the authors and that refer to the protein domains found in these additional proteins suggest that in fact, some could or should be classified as matrisome proteins. For example, shouldn't the two lectins encoded by NV2.3951 and NV2.3157 be classified as matrisome-affiliated proteins? Based on what has been done for other model organisms, receptors have typically been excluded from the "matrisome" but included as part of the "adhesome" for consistency with previously published matrisomes; the reviewer is left wondering whether the components classified as "Other" / "Receptor" should not be excluded from the matrisome and moved to a separate "adhesome" list. 

      In addition to receptors, the authors identify nearly 70 glycoproteins classified as "Other". Here, does other mean "non-matrisome" or "another matrisome division" that is not core or associated? If the latter, could the authors try to propose a unifying term for these proteins? Unfortunately, since the authors do not provide the reasons for excluding these entries from the bona fide matrisome (list of excluding domains present, localization data), the reader is left wondering how to treat these entries. 

      Overall, the study would gain in strength if the authors could be more definitive and, if needed, even propose novel additional matrisome annotations to include the components for now listed as "Other" (as was done, for example, for the Drosophila or C. elegans matrisomes). 

      The reviewer is correct to point out the confusing terminology used throughout our manuscript, where both the total of 829 proteins constituting the curated list of ECM domain proteins and the actual matrisome (excluding "others") were referred to as "matrisomes". In general, we followed the example set by Naba & Hynes in their 2012 paper (Mol Cell Proteomics. 2012 Apr;11(4):M111.014647. doi: 10.1074/mcp.M111.014647), where they define the "matrisome" as encompassing all components of the extracellular matrix ("core matrisome") and those associated with it ("matrisome-associated" proteins). This corresponds to our group of 551 proteins, comprising both core matrisome and matrisome-associated proteins. The Naba & Hynes paper also contains the inclusive and exclusive domain lists for the matrisome that we applied for our dataset. In the revised manuscript, we have now labelled the group of 829 proteins as "curated ECM domain proteins/genes", which includes all proteins positively selected for containing a bona fide ECM domain. After excluding non-matrisomal proteins such as receptors, we arrive at the 551 proteins that constitute the "Nematostella matrisome". We have maintained this terminology throughout the revised manuscript and have revised Figures 1B and 4B accordingly.

      Regarding the category of "other" proteins, which by definition are not part of the matrisome although containing ECM domains, we have taken the reviewer's advice and classified these in more detail. We categorized all receptors as "adhesome" (202 proteins).  The remaining group of “other” secreted ECM domain proteins were then further subcategorized. Those exhibiting significant matches in the ToxProt database were subclassified as "putative venoms" (15 proteins). This group also includes the two lectins (NV2.3951 and NV2.3157), which had been originally shifted to the “other” category due to their classification as venoms. We categorized as “adhesive proteins” (28 proteins) factors such as coadhesins that due to their domain architecture resemble bioadhesive proteins described in proteomic studies of other invertebrate species, such as corals or sponges (see also https://doi.org/10.1016/j.jprot.2022.104506). Further sub-categories are stress/injury response proteins (9 proteins) and ion channels (6 proteins). The remaining 17 proteins were categorized as “uncharacterized ECM domain proteins”. These include highly diverse proteins possessing either single ECM domains or novel domain combinations. We decided to retain those in our dataset as candidates for future functional characterization.

      b) It is surprising that the authors are not providing the full currently accepted protein names to the entries listed in Supplementary Table S1 and have used instead "new_manual_annotation" that resembles formal protein names. This liberty is misleading. In fact, the "new_manual_annotation" seems biased toward describing the reason the proteins were positively screened for through sequence analysis, but many are misleading because there is, in fact, more known about them, including evidence that they are not ECM proteins. The authors should at least provide the current protein names in addition to their "new_manual_annotations". 

      c) To truly serve as a resource, the Table should provide links to each gene entry in the Stowers Institute for Medical Research genome database used and some sort of versioning (this could be added to columns A, B, or D). Such enhancements would facilitate the assessment of the rigor of the list beyond the manual QC of just a few entries. 

      d) Since UniProt is the reference protein knowledge database, providing the UniProt IDs associated with the predicted matrisome entries would also be helpful, giving easy access to information on protein domains, protein structures, orthology information, etc. 

      e) In conclusion, at present, the study only provides a preliminary draft that should be more rigorously curated and enriched with more comprehensive and authoritative annotations if the authors aspire the list to become the reference anemone matrisome and serve the community. 

      Table S1 has been updated to include links to the respective Stowers Institute IDs (first two columns), as well as SwissProt IDs and current descriptions from both the Stowers Institute (SI) and Swissprot.

      In our manual annotations, we prioritized these over automated ones due to the considerable effort invested in examining each sequence individually. The cnidaria-specific minicollagens and NOWA proteins might serve as an example. According to the SI descriptions, the minicollagens are annotated as “keratin-associated protein, predicted or hypothetical protein, collagen-like protein and pericardin”. We classified these as minicollagens on the basis of overall domain architecture and of signature domains and sequence motifs, such as minicollagen cysteine-rich domains (CRDs) and polyproline stretches (doi: 10.1016/j.tig.2008.07.001). NOWA is a CTLD/CRD-containing protein that is part of nematocyst tubules (doi:10.1016/j.isci.2023.106291). The first two NOWA isoforms, according to the SI descriptions, were annotated as aggrecan and brevican core proteins, which is very misleading. We therefore feel that our manual annotations better serve the cnidarian research community in classifying these proteins.

      Automated annotations of ECM proteins often rely on similarities between individual domains, neglecting overall domain composition. For example, Swissprot descriptions annotate 31 TSP1 domain-containing proteins in our list as "Hemicentin-1", but closer inspection reveals that only one sequence (NV2.24790) qualifies as Hemicentin-1 due to its characteristic vWFA, Ig-like, TSP1, G2 nidogen, and EGF-like domain architecture. Regarding novel protein annotations, NV2.650 might serve as an example. While SI descriptions annotate this protein as "epidermal growth factor" based on the presence of several EGF-like domains, our analysis reveals two integrin alpha N-terminal domains that classify this sequence as integrin-related. We have therefore assigned a description (Secreted integrin-N-related protein) that references this defining domain and avoids misclassification within the EGF family.

      In cases where the automated annotation (including those in GenBank) matched our own findings, we adopted the existing description, as seen with netrin-1 (NV2.7734). We acknowledge that our manual annotations are not flawless and will be refined by future research. Nonetheless, we offer them as an approximation to a more accurate definition of the identified protein list.

      (2) Proteomic analysis of the composition of the mesoglea during the sea anemone life cycle: 

      a) The product of 287 of the 829 genes proposed to encode matrisome components was detected by proteomics. What about the other ~550 matrisome genes? When and where are they expressed? The wording employed by the authors (see line 11, page 13) implies that only these 287 components are "validated" matrisome components. Is that to say that the other ~550 predicted genes do not encode components of the ECM? This should be discussed. 

      Obviously, our wording was not sufficiently accurate here. In the revised Fig. 1B we indicated that 210 of the 551 matrisome (core and associated) proteins were confirmed by mass spectrometry. In total, 287 proteins were identified by mass spectrometry, meaning that 77 of those are non-matrisomal proteins belonging to the “adhesome” (47) and “other” (30) groups. The fact that the remaining 542 proteins of the matrisome predicted by our in silico analysis could not be identified has two major reasons: (1) Our study was focussed on the molecular dynamics of the mesoglea. Therefore, only mesogleas were isolated for the mass spectrometry analysis and nematocysts were mostly excluded by extensive washing steps. As nematocysts contribute significantly to the predicted matrisome, this group of proteins is underrepresented in the mass spectrometry analysis. (2) A significant fraction of the predicted ECM proteins constitutes soluble factors and transmembrane receptors. These might not be necessarily part of the mesoglea isolates. In addition, the isolation and solubilization method we applied might have technical limitations. Although we used harsh conditions for solubilizing the mesoglea samples (90°C and high DTT concentrations), we cannot exclude that we missed proteins which resisted solubilization and thus trypsinization. We confirmed that all genes predicted by the in silico analysis have transcriptomic profiles as demonstrated in supplementary table S4. We have clarified these points in the revised results part (p.6) and also revised the statement in line 16, page 13.
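      The counts quoted in this response can be cross-checked with a short tally. The sketch below only restates the numbers given above (829 curated ECM domain proteins, 551 matrisome proteins, and 210 + 47 + 30 proteins detected by mass spectrometry); the variable and function names are illustrative and do not come from the manuscript or its supplementary tables.

```python
# Counts as quoted in the author response; nothing here is re-derived from raw data.
CURATED_ECM_DOMAIN_PROTEINS = 829   # full curated list
MATRISOME_PROTEINS = 551            # core matrisome + matrisome-associated
DETECTED_BY_MS = {"matrisome": 210, "adhesome": 47, "other": 30}


def tally(detected, curated_total):
    """Return (total detected, non-matrisomal detected, curated proteins not detected)."""
    total_detected = sum(detected.values())
    non_matrisomal = total_detected - detected["matrisome"]
    not_detected = curated_total - total_detected
    return total_detected, non_matrisomal, not_detected


total_detected, non_matrisomal, not_detected = tally(
    DETECTED_BY_MS, CURATED_ECM_DOMAIN_PROTEINS
)
# Fraction of the 551 matrisome proteins confirmed by mass spectrometry.
matrisome_coverage = DETECTED_BY_MS["matrisome"] / MATRISOME_PROTEINS

print(f"detected in total:        {total_detected}")
print(f"non-matrisomal detected:  {non_matrisomal}")
print(f"curated, not detected:    {not_detected}")
print(f"matrisome coverage by MS: {matrisome_coverage:.1%}")
```

      Read this way, the 542 undetected proteins are counted against the full curated list of 829 (829 − 287), while the 210 confirmed proteins are counted against the 551 matrisome proteins.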

      b) Can the authors comment on how they have treated zero TMT values or proteins for which a TMT ratio could not be calculated because unique to one life stage, for example? 

      We did not include these proteins in the analysis of the respective statistical comparison. This involved only very few proteins (about 10).  

      c) Could the authors provide a plot showing the distribution of protein abundances for each matrisome category in the main figure 4? In mammals, the bulk of the ECM is composed of collagens, followed by fibrillar ECM glycoproteins, the other matrisome components being more minor. Is a similar distribution observed in the sea anemone mesoglea? 

      We have included such a plot showing protein abundances across life stages and protein categories (Fig. 4A). Collagens and basement membrane proteoglycans (perlecan) are the most abundant protein categories in the core matrisome while secreted factors dominate in the matrisome-associated group.

      d) Prior proteomic studies on the ECM of vertebrate organisms have shown the importance of allowing certain post-translational modifications during database search to ensure maximizing peptide-to-spectrum matching. Such PTMs include the hydroxylation of lysines and prolines that are collagen-specific PTMs. Multiple reports have shown that omitting these PTMs while analyzing LC-MS/MS data would lead to underestimating the abundance of collagens and the misidentification of certain collagens. The authors may want to reanalyze their dataset and include these PTMs as part of their search criteria to ensure capturing all collagen-derived peptides. 

      Thank you for this suggestion. We have re-analyzed our dataset including lysine and proline hydroxylation as PTM. While we obtained in total 70 more proteins using this approach, this additional group did not contain any large collagen or minicollagen we had not detected before. We only obtained two additional collagen-like proteins with very short triple helical domains (V2t013973001.1, NV2t024002001.1), one being a fragment. We don’t feel this justifies implementing a re-analysis of the proteome in our study.

      e) The authors should ensure that reviewers are provided with access to the private PRIDE repository so the data deposited can also be evaluated. They should also ensure that sufficient meta-data is provided using the SRDF format to allow the re-use of their LCMS/MS datasets. 

      We apologize for not providing the reviewer access in our initial submission and have asked the editorial office to forward the PRIDE repository link to all reviewers immediately after receiving the reviews. We did upload a metadata.csv file with the proteomics dataset. This file contains an annotation of all TMT labels to the samples and conditions and replicates used in the manuscript. It contains similar information as an SRDF format file. In addition, the search output files on protein and psm level have been provided. So, from our point of view, we provided all necessary information to reproduce the analysis.

      (3) Supplementary tables: 

      The supplementary tables are very difficult to navigate. They would become more accessible to readers and non-specialists if they were accompanied by brief legends or "README" tabs and if the headers were more detailed (see, for example, Table S2, what does "ctrl.ratio_Larvae_rep2" exactly refer to? Or Table S6 whose column headers using extensive abbreviations are quite obscure). Similarly, what do columns K to BX in Supplementary Table S1 correspond to? Without more substantial explanations, readers have no way of assessing these data points. 

      We have revised the tables and removed any redundant data columns. We also included detailed explanations of the used abbreviations, both in the headers and in a separate README file. Some of the information was apparently lost during the conversion to pdf files. We will therefore upload the original .xls files when submitting the revised manuscript.

      Reviewer #2 (Public review): 

      This work set out to identify all extracellular matrix proteins and associated factors present within the starlet sea anemone Nematostella vectensis at different life stages. Combining existing genomic and transcriptomic datasets, alongside new mass spectrometry data, the authors provide a comprehensive description of the Nematostella matrisome. In addition, immunohistochemistry and electron microscopy were used to image whole mount and decellularized mesoglea from all life stages. This served to validate the de-cellularization methods used for proteomic analyses, but also resulted in a very nice description of mesoglea structure at different life stages. A previously published developmental cell type atlas was used to identify the cell type specificity of the matrisome, indicating that the core matrisome is predominantly expressed in the gastrodermis, as well as cnidocytes. The analyses performed were rigorous and the results were clear, supporting the conclusions made by the authors. 

      Thank you. We greatly appreciate the positive assessment of our study.

      Reviewer #3 (Public review): 

      Summary: 

      This manuscript by Bergheim et al investigates the molecular and developmental dynamics of the matrisome, a set of gene products that comprise the extracellular matrix, in the sea anemone Nematostella vectensis using transcriptomic and proteomic approaches. Previous work has examined the matrisome of the hydra, a medusozoan, but this is the first study to characterize the matrisome in an anthozoan. The major finding of this work is a description of the components of the matrisome in Nematostella, which turns out to be more complex than that previously observed in hydra. The authors also describe the remodeling of the extracellular matrix that occurs in the transition from larva to primary polyp, and from primary polyp to adult. The authors interpret these data to support previously proposed (Steinmetz et al. 2017) homology between the cnidarian endoderm with the bilaterian mesoderm. 

      Strengths: 

      The data described in this work are robust, combining both transcriptome and proteomic interrogation of key stages in the life history of Nematostella, and are of value to the community. 

      Thank you for your positive assessment of our dataset. 

      Weaknesses: 

      The authors offer numerous evolutionary interpretations of their results that I believe are unfounded. The main problem with extending these results, together with previous results from hydra, into an evolutionary synthesis that aims to reconstruct the matrisome of the ancestral cnidarian is that we are considering data from only two species. I agree with the authors' depiction of hydra as "derived" relative to other medusozoans and see it as potentially misleading to consider the hydra matrisome as an exemplar for the medusozoan matrisome. Given the organismal and morphological diversity of the phylum, a more thorough comparative study that compares matrisome components across a selection of anthozoan and medusozoan species using formal comparative methods to examine hypotheses is required. 

      Specifically, I question the author's interpretation of the evolutionary events depicted in this statement: 

      "The observation that in Hydra both germ layers contribute to the synthesis of core matrisome proteins (Epp et al. 1986; Zhang et al. 2007) might be related to a secondary loss of the anthozoan-specific mesenteries, which represent extensions of the mesoglea into the body cavity sandwiched by two endodermal layers." 

      Anthozoans and medusozoans are evolutionary sisters. Therefore, the secondary loss of "anthozoan-like mesenteries" in hydrozoans is at least as likely as the gain of this character state in anthozoans. By extension, there is no reason to prefer the hypothesis that the state observed in Nematostella, where gastroderm is responsible for the synthesis of the core matrisome components, is the ancestral state of the phylum. Moreover, the fossil evidence provided in support of this hypothesis (Ou et al. 2022) is not relevant here because the material described in that work is of a crown group anthozoan, which diversified well after the origin of Anthozoa. The phylogenetic structure of Cnidaria has been extensively studied using phylogenomic approaches and is generally well supported (Kayal et al. 2018; DeBiasse et al. 2024). Based on these analyses, anthozoans are not on a "basal" branch, as the authors suggest. The structure of cnidarian phylogeny bifurcates with Anthozoa forming one clade and Medusozoa forming the other. From the data reported by Bergheim and coworkers, it is not possible to infer the evolutionary events that gave rise to the different matrisome states observed in Nematostella (an anthozoan) and hydra (a medusozoan). Furthermore, I take the observation in Fig 5 that anthozoan matrisomes generally exhibit a higher complexity than other cnidarian species to be more supportive of a lineage-specific expansion of matrisome components in the Anthozoa, rather than those components being representative of an ancestral state for Cnidaria. Whatever the implication, I take strong issue with the statement that "the acquisition of complex life cycles in medusozoa, that are distinguished by the pelagic medusa stage, led to a secondary reduction in the matrisome repertoire." 
      There is no causal link in any of the data or analyses reported by Bergheim and co-workers to support this statement and, as stated above, while we are dealing with limited data, insufficient to address this question, it seems more likely to me that the matrisome expanded in anthozoans, contrasting with the authors' conclusions. While the discussion raises many interesting evolutionary hypotheses related to the origin of the cnidarian matrisome, which is of vital interest if we are to understand the origin of the bilaterian matrisome, a more thorough comparative analysis, inclusive of a much greater cnidarian species diversity, is required if we are to evaluate these hypotheses. 

      DeBiasse MB, Buckenmeyer A, Macrander J, Babonis LS, Bentlage B, Cartwright P, Prada C, Reitzel AM, Stampar SN, Collins A, et al. 2024. A Cnidarian Phylogenomic Tree Fitted With Hundreds of 18S Leaves. Bulletin of the Society of Systematic Biologists [Internet] 3. Available from: https://ssbbulletin.org/index.php/bssb/article/view/9267

      Epp L, Smid I, Tardent P. 1986. Synthesis of the mesoglea by ectoderm and endoderm in reassembled hydra. J Morphol [Internet] 189:271-279. Available from: https://pubmed.ncbi.nlm.nih.gov/29954165/ 

      Kayal E, Bentlage B, Sabrina Pankey M, Ohdera AH, Medina M, Plachetzki DC, Collins AG, Ryan JF. 2018. Phylogenomics provides a robust topology of the major cnidarian lineages and insights on the origins of key organismal traits. BMC Evol Biol [Internet] 18:1-18. Available from: https://bmcecolevol.biomedcentral.com/articles/10.1186/s12862-018-1142-0

      Ou Q, Shu D, Zhang Z, Han J, Van Iten H, Cheng M, Sun J, Yao X, Wang R, Mayer G. 2022. Dawn of complex animal food webs: A new predatory anthozoan (Cnidaria) from Cambrian. The Innovation 3:100195 

      Steinmetz PRH, Aman A, Kraus JEM, Technau U. 2017. Gut-like ectodermal tissue in a sea anemone challenges germ layer homology. Nature Ecology & Evolution 2017 1:10 [Internet] 1:1535-1542. Available from: https://www.nature.com/articles/s41559-017-0285-5

      Zhang X, Boot-Handford RP, Huxley-Jones J, Forse LN, Mould AP, Robertson DL, Li L, Athiyal M, Sarras MP. 2007. The collagens of hydra provide insight into the evolution of metazoan extracellular matrices. J Biol Chem [Internet] 282:6792-6802. Available from: https://pubmed.ncbi.nlm.nih.gov/17204477/ 

      We agree with the reviewer that only the analysis of several additional anthozoan and medusozoan representatives will yield a valid basis for a reconstruction of the ancestral cnidarian matrisome and allow statements about ancestral or novel features within the phylum. We have therefore revised our statements in the discussion part of the manuscript by implementing the cited literature and also findings from medusozoan genome analysis (e.g. Gold et al., 2018) demonstrating that changes in gene content are as common in the anthozoans as in medusozoans, which questioned the previously stated “basal” state of Nematostella or of anthozoans in general.

      Reviewer #1 (Recommendations for the authors): 

      (1) In Figure 2A, an "o" is missing in the labeling of the "developing cnidcytes" population. 

      Thank you, we have corrected the typo.

      (2) It would be helpful to have the different life stages indicated as headers of the heat maps presented in Figure 4. 

      We have included symbolic representations for the different life stages on top of the heat maps in addition to the respective labels at the bottom.

      Reviewer #2 (Recommendations for the authors): 

      Important changes: 

      (1) Figure 2B The x-axis tissue names should be changed to something more easily readable/understandable - some are clear, but others are not. Perhaps abbreviations could be expanded in the legend. 

      We have expanded the legend in Fig. 2B to render it more easily readable. We have also rotated the maps in A to have them aligned with the ones in Fig. 3B.

      (2) Figure 3B This figure would be improved by the inclusion of cluster names, to understand better the mapping. 

      We have added relevant cluster names to Fig. 3B and as stated above aligned the orientation of the maps in Fig. 2B and Fig. 3B.

      (3) Figure 3C As with 2B, I find the y-axis cnidocyte cell state names to be unclear at times. Perhaps abbreviations could be expanded in the legend. 

      All abbreviations were expanded in the Fig. 3C axis labels.

      (4) Many of the supplementary tables are not well exported or easily readable as is (gene names are truncated, headers truncated, etc), which means that they may not be easily usable by researchers in the field interested in following up on this work in other contexts. Indeed, to be more usable, please consider sharing these supplementary data as .csv files, for example, instead of as .pdfs. 

      We are sorry for this inconvenience, which was obviously caused by the conversion to pdf files. We will upload the original csv files when submitting the revised manuscript.

      Smaller nitpicky comments: 

      (5) Page 2 line 4 & page 3 line 7: Please consider a term other than "pre-bilaterian". The drawing/ordering of a phylogeny of extant species is not meaningful in terms of more or less ancestral. e.g. if the tips are flipped in the drawing of the tree, can we say that bilaterians are pre-cnidarians? What does that mean? 

      We have used that term on the basis that cnidarians existed before the appearance of bilaterians according to the fossil record and molecular phylogenies (McFadden et al., 2021; Adoutte et al., 2000; Cavalier-Smith et al., 1996; Collins, 1998; Kim et al., 1999; Medina et al., 2001; Wainright et al., 1993). To acknowledge remaining uncertainties in the timing of origin of animals, we will use the term “early-diverging metazoans” instead, which is widely accepted in the cnidarian community. 

      (6) Page 3 line 9 I was confused by the use of "gastrula-shaped body" to describe cnidarians, which are on the whole very morphologically diverse and don't all resemble gastrulae (that can also be quite diverse). 

      This term is sometimes used to refer to the diploblastic cnidarian body plan (outer ectoderm, inner endoderm) with a mouth that corresponds to the blastopore. To avoid misunderstandings, we changed it in the revised manuscript to “Cnidarians, the sister group to bilaterians, are characterized by a simple body plan with a central body cavity and a mouth opening surrounded by tentacles.”

      Reviewer #3 (Recommendations for the authors): 

      (1) In general, I felt there was a lot of discussion about protein structure and diversity that is difficult to follow without a figure. I think some of the information in Supplementary Figures S5, S9, and S11 should be in the main figures. 

      Following the reviewer’s suggestion, we have integrated Fig. S5 (collagens) into the main Fig. 2 and Fig. S9 (polydoms) into Fig. 4. As metalloproteases are not extensively discussed in the manuscript (and also due to the large size of the figure) we have kept Fig. S11 as a supplementary figure.

      (2) Page 3, Line 7: The use of the term "pre-bilaterian" is inappropriate. Cnidarians and bilaterians are evolutionary sisters. Therefore, each lineage derives from the same split and is the same age. The cnidarian lineage is not older than the bilaterian lineage. 

      Following a similar request by reviewer 2 we have replaced this term by “early diverging metazoans”.

      (3) Page 5, Line 10. How were in silico matrisomes from early-branching metazoan species predicted? 

      We applied the same bioinformatic pipeline as for the Nematostella matrisome. We clarified this in the respective methods part.

      (4) Page 16, Line 8: This should be Thus. 

      Obviously, the wording of this sentence was ambiguous. We changed it to “In contrast, the adult mesoglea is significantly enriched in elastic fiber components, such as fibrillins and fibulin. This compositional shift likely adds to the visco-elastic properties (Gosline 1971a, b) of the growing body column (Fig. 4B,D, supplementary table S7).”

    1. Author response:

      We thank the editors and reviewers for their encouraging comments and constructive feedback. We will revise the text to enhance clarity as suggested. New experiments are planned to address questions raised regarding the time course of responses to the hit compounds. We also intend to examine additional endogenous readouts of the integrated stress response, including effects on translation. The effects of lead compound 20 will be examined in a wider range of cells, including primary cells.

    1. Author response:

      We are going to modify the text following the reviewers’ comments and perform direct embryo labelling experiments to test the contraction of the two “belts” proposed in our model. We feel that this aspect is feasible in a reasonable time and important for the model proposed. We appreciate the relevance of using this framework to identify molecular drivers of the regionalized tissue behaviours uncovered and how these might be altered in mutant models, but feel that these aspects demand efforts beyond a reasonable revision period.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review): 

      Summary: 

      The manuscript by Senn and colleagues presents a comprehensive study on developing synthetic gene circuits targeting mutant RAS-expressing cells. This study aims to exploit these RAS-targeting circuits as cancer cell classifiers, enabling the selective expression of an output protein in correlation with RAS activity. The system is based on the bacterial two-component system NarX/NarL. A RAS-binding domain, the RBDCRD domain of the RAS effector protein CRAF, is fused to the histidine kinase domain, which carries an inactivating amino acid exchange either in its ATP-binding site (N509A) or in its phosphorylation site (H399Q). Dimerization or nanocluster formation of RAS-GTP reconstitutes an active histidine kinase sensor dimer that phosphorylates the response regulator NarL. The phosphorylated DNA-binding protein NarL, fused to the transcription activator domain VP48, binds its responsive element and induces the expression of the output protein. In comparison to mutated RAS, the effect of the RAS activator SOS-1 and the RAS inhibitor NF1 on the sensing ability as well as the tunability of the RAS sensor were examined. A RAS targeting circuit with an AND gate was designed by expressing the RAS sensor proteins under the control of defined MAPK response elements, resulting in a large increase in the dynamic range between mutant and wild-type RAS. Finally, the RAS targeting circuits were evaluated in detail in a set of twelve cancer cell lines expressing endogenous levels of mutant or wild-type RAS or oncogenes affecting RAS signaling upstream or downstream. 

      Strengths: 

      This proof-of-concept study convincingly demonstrates the potential of synthetic gene circuits to target oncogenic RAS in tumor cell lines and to function, at least in part, as an RAS mutant cell classifier. 

      Weaknesses: 

      The use of an appropriate "therapeutic gene" might revert the oncogenic properties of RAS mutant cell lines. However, a therapeutic strategy based on this four-plasmid-based system might be difficult to implement in RAS-driven solid cancers. 

      Thank you for the insightful comments. We agree that the delivery of a four-plasmid system represents a major challenge for translating RAS-targeting circuits into therapeutic applications. Reducing the number of plasmids –ideally consolidating all components onto a single vector– will be critical for clinical implementation.

      Viral delivery is generally the most efficient strategy for DNA-based therapies, but viral vectors have limited packaging capacities, which differ by virus type [1]. The RAS_sensor_F.L.T. circuit under the EF1α promoter requires ~7.7 kb for the sensing components alone, excluding the output gene. This exceeds the packaging limit of adeno-associated virus (AAV) and is at the upper boundary for lentiviral vectors but could potentially be accommodated by larger vectors such as γ-retroviruses, poxviruses, or herpesviruses [1]. Co-transduction with dual AAVs [2] or ongoing engineering to expand packaging capacity [3] may also offer future solutions. An additional route to reduce construct size could be alternative splicing, especially given redundancy between the two NarX fusion proteins [4]. 

      An advantage of our current architecture is that synthetic response elements replace constitutive promoters, reducing construct size. For example, the MAPK-driven PY2_NarX&NarL circuits range between 4.9 and 5.2 kb depending on the transactivation domain, bringing them within AAV packaging limits for the sensor module [5], though co-delivery of the output gene would still be necessary. For lentiviruses, this is within the packaging capacity of 8 kb [1] and would allow for inclusion of ~3 kb output genes.
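      The size argument above comes down to simple subtraction against the vector's packaging capacity. The following minimal sketch uses only the figures quoted in this response (an 8 kb lentiviral capacity, ~7.7 kb for the EF1α-driven sensor, and 4.9 to 5.2 kb for the MAPK-driven circuits); the dictionary keys and the helper name are illustrative labels, not identifiers from the study.

```python
# Sizes in kilobases, as quoted in the response; treat them as approximate.
LENTIVIRAL_CAPACITY_KB = 8.0

SENSOR_SIZES_KB = {
    "RAS_sensor_F.L.T. (EF1a promoter)": 7.7,
    "PY2_NarX&NarL (smallest variant)": 4.9,
    "PY2_NarX&NarL (largest variant)": 5.2,
}


def remaining_capacity_kb(capacity_kb, sensor_kb):
    """Space left for an output gene once the sensing module is packaged."""
    return round(capacity_kb - sensor_kb, 1)


headroom = {
    name: remaining_capacity_kb(LENTIVIRAL_CAPACITY_KB, size)
    for name, size in SENSOR_SIZES_KB.items()
}

for name, free_kb in headroom.items():
    print(f"{name}: {free_kb} kb left for the output gene")
```

      With these inputs, the EF1α-driven sensor leaves only a fraction of a kilobase, whereas the MAPK-driven circuits leave roughly 3 kb, which matches the estimate given for lentiviral delivery.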

      Still, assembling multiple modules onto a single vector introduces new challenges, including possible crosstalk or interference between neighboring promoters [6]. For example, placing the output gene too close to MAPK response elements may trigger unwanted MAPK-dependent expression, potentially bypassing the intended AND-gate logic. Moreover, expressing three genes under separate response elements may shift expression ratios and reduce circuit functionality. Nonetheless, the absence of constitutive promoters and the RAS-dependence of MAPK response elements could provide partial robustness, since even unintended activation would still reflect RAS signaling to some extent. Further, our data (Fig. 1d) show that some deviation in component levels can be tolerated, provided all parts are sufficiently expressed. Nonetheless, assembling the circuit on a single vector will require careful design and rigorous validation to ensure optimal performance. 

      While addressing this is beyond the scope of the current study, we agree that future efforts should focus on vector consolidation and delivery strategies. We now include a paragraph discussing these challenges in the revised manuscript.

      Reviewer #2 (Public review): 

      The manuscript describes an interesting approach towards designing genetic circuits to sense different RAS mutants in the context of cancer therapeutics. The authors created sensors for mutant RAS and incorporated feed-forward control that leverages endogenous RAS/MAPK signaling pathways in order to dramatically increase the circuits' dynamic range. The modularity of the system is explored through the individual screening of several RAS binding domains, transmembrane domains, and MAPK response elements, and the author further extensively screened different combinations of circuit components. This is an impressive synthetic biology demonstration that took it all the way to cancer cell lines. However, given the sole demonstrated output in the form of fluorescent proteins, the authors' claims related to therapeutic implications require additional empirical evidence or, otherwise, expository revision. 

      Thank you very much for the thoughtful evaluation, precise critique, and constructive suggestions.

      As correctly noted, our study initially focused on developing and optimizing input sensors and processing units for synthetic gene circuits targeting mutated RAS. To address the concern regarding therapeutic relevance, we have now incorporated functional validation using a clinically relevant output protein: herpes simplex virus thymidine kinase (HSV-TK), which converts ganciclovir into a cytotoxic compound. We replaced the mCerulean reporter with HSV-TK and tested the resulting RAS-targeting circuits in both RAS-mutant and wild-type cancer cell lines. The results, now presented in a new chapter (Figure 8 and Supplementary Fig. 14), demonstrate robust killing of RAS-mutant cells and support the potential therapeutic utility of these circuits.

      Major comments: 

      "These therapies are limited to cancers with KRASG12C mutations" is technically accurate. However, in this fast-moving field, there are examples such as MRTX1133 which holds the promise to target the very G12D mutation that is the focus of this paper. There are broader efforts too. It would help the readers better appreciate the background if the authors could update the intro to reflect the most recent landscape of RAS-targeting drugs. 

      Thank you for this helpful suggestion. We have updated the introduction to reflect the rapidly evolving landscape of RAS-targeting therapies, including the development of inhibitors for non-G12C mutations such as KRASG12D (e.g., MRTX1133). Given the pace and breadth of these advances, we also refer readers to a recent comprehensive review that provides an in-depth overview of current RAS-targeting strategies.

      Only KRASG12D was used as a model in the design and optimization work of the genetic circuits. Other mutations should be quite experimentally feasible and comparisons of the circuits' performances across different KRAS mutations would allow for stronger claims on the circuits' generalizability. Particularly, the cancer cell line used for circuit validation harbored a KRASG13D mutation. While the data presented do indeed support the circuit's "generalizability," the model systems would not have been consistent in the current set of data presented. 

      To further support the generalizability of our RAS sensor, we titrated plasmid doses for a panel of oncogenic RAS variants, including multiple KRAS mutants as well as HRAS<sup>G12D</sup> and NRAS<sup>G12D</sup>. Across all tested variants, we observed concentration-dependent activation of the RAS sensor. At 1.67 ng/well, the sensor output for all oncogenic RAS variants was at least as high as that for KRAS<sup>G12D</sup>, suggesting that the behavior observed in our initial design and optimization is representative of a broader set of RAS mutations.

      We also noted that high overexpression of wildtype HRAS and NRAS can lead to substantial activation of the sensor, exceeding that observed with wildtype KRAS. This underscores the importance of considering all RAS isoforms when assessing circuit specificity and avoiding potential off-target activation in healthy cells.

      In Figure 2a, the text claims that "inactivation of endogenous RAS with NF1 resulted in a lower YFP/RBDCRD-NarX expression," but Figure 2a does not show a statistically significant reduction in expression of SYFP (measured by "membrane-to-total signal ratio [RU]").

      Thank you for pointing this out. We repeated the experiment to reassess the effect of NF1 on RBDCRD-NarX-SYFP2 expression and were able to confirm statistical significance. Accordingly, we have replaced Figure 2a with updated data. To facilitate better visual comparison across conditions, we also standardized the y-axis range across all relevant flow cytometry plots.

      The therapeutic index of the authors' systems would be better characterized by a functional payload, other than fluorescent proteins, that, for example, induces cell death or immune responses.

      Thank you for this insightful comment. We agree that fluorescent reporters are limited to approximating expression levels, and that a functional output protein is more appropriate for assessing therapeutic potential. To address this, we replaced mCerulean with the therapeutic suicide gene HSV-TK and tested the circuits in RAS-mutant and wild-type cancer cell lines. These experiments demonstrate that our circuits can express functional proteins and induce cell death in two RAS-mutant cell lines while showing low toxicity in a RAS wild-type cell line (new chapter including Fig. 8 and Supplementary Fig. 14).

      Comparing the confluence of cells transfected with the RAS-targeting circuits to that of cells transfected with the non-toxic GFP-output negative control or the constitutively expressed EF1α-HSV-TK positive control allowed us to estimate the killing strength of the circuits in each cell line. In RAS-mutant HCT-116, the confluence curves were similar to the positive control, indicating effective killing (Fig. 8b). At a lower DNA dose in HCT-116, or in SW620 with lower transfection efficiency, the killing of transfected RAS-driven cancer cells was less pronounced, falling approximately midway between the controls (Fig. 8g&j). In the RAS wild-type cell line, Igrov-1, cells transfected with the RAS circuits showed continued growth similar to the non-toxic negative control (Fig. 8d), suggesting low toxicity.

      While this may indicate low circuit activation in Igrov-1, an alternative explanation for the low toxicity could be insufficient transfection efficiency. Testing in SW620, which had a transfection efficiency similar to that of Igrov-1 (Supplementary Fig. 14a), showed that this moderate transfection efficiency was sufficient for RAS-circuit-dependent killing (Fig. 8d & 8g), supporting the notion of low activation in Igrov-1 and selective cytotoxicity in RAS-driven cancer cells.

      Nonetheless, it is important to note that comparisons between the cell lines need to be interpreted cautiously because of inter-cell line differences in transfection, growth, and HSV-TK/ganciclovir (GCV)-sensitivity (Supplementary Fig. 14) and further validation will be essential. 

      A conclusive assessment will require more efficient delivery strategies, such as viral vectors (as discussed above). Efficient delivery would allow us to investigate selectivity in a more realistic setting with patient-derived RAS-mutant cancer and healthy cells, as well as testing in an in vivo model. While beyond the scope of the current study, we view this as a critical direction for future work and have therefore added a paragraph about it to our discussion.

      Regarding data presented in "Mechanism of action" (Figure 2), the observations are interesting and consistent across different fluorescent reporters. However, with regard to interpretations of the underlying molecular mechanisms, it is not clear whether the different output levels in 2b, 2c, and 2d are due to the pathway as described by the authors or simply from varied expression levels of RBDCRD-NarX itself (2a) that is nonlinearly amplified by the rest of the circuit. From a practical standpoint, this caveat is not critical with respect to the signal-to-noise ratios in later parts of the paper. From a mechanistic interpretation standpoint, claims put forth in this section are not clearly substantiated. Some additional controls would be nice. For example, if the authors express NarXs that constitutively dimerize on the membrane, what would the RasG12D responsiveness look like? Does RasG12D alter the input-output curve of NarL-RE? How would Figure 4f compare to a NarX constitutively dimerized control that only relies on transcriptional amplification of the Ras-dependent promoters?

      This is a great point. We agree that the observed differences in output levels (Fig. 2) could arise from non-linear amplification due to increased expression of RBDCRD-NarX, rather than RAS binding or dimerization alone. To further investigate this possibility, we performed titrations of KRAS<sup>G12D</sup> in combination with the functional RAS sensor and a series of constitutively active and inactive control constructs (Supplementary Fig. 4).

      Inactive controls lacking NarX dimerization showed only a modest increase in output expression, similar to direct mCerulean expression under the EF1α promoter. Transfection of the output plasmid alone, with NarL, or with NarL and non-RAS-binding RBD<sup>R89L</sup>CRD<sup>C168S</sup>-NarX, resulted in minimal RAS-dependent increases (Supplementary Fig. 4a). Importantly, after normalization using the EF1α-driven mCherry transfection control, these effects were fully or even slightly over-compensated (Supplementary Fig. 4b). This shows that the data presented throughout the manuscript do not include the effect of EF1α-dependent increased leakiness, but also that, because of the normalization, we potentially underestimate the dynamic range of the RAS-targeting circuits.

      In contrast, constitutively dimerizing NarX controls (both membrane-bound and cytosolic dimerized via the FKBP–FRB system) exhibited a more pronounced RAS-dependent increase in output, even after normalization, confirming the presence of non-linear amplification (up to 3–4-fold). However, this effect was still lower than that achieved with the functional RAS-binding sensor (8-fold at 1.67 ng/well KRAS<sup>G12D</sup>; 14-fold at 5–15 ng/well), indicating that the increased expression of the sensor parts does not fully explain the effect we see. Instead, RAS binding and dimerization further amplify the response and are necessary for full activation (Supplementary Fig. 4b).
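
      The normalization and fold-change arithmetic referred to in this response can be illustrated with a minimal sketch. The function names and all numeric values below are hypothetical and chosen only for illustration; they are not the authors' analysis pipeline or measured data. The sketch assumes that the output reporter signal is divided by the signal of a constitutively expressed transfection control before the mutant and wild-type conditions are compared.

```python
def normalized_output(output_signal, transfection_control):
    """Divide the output reporter signal by the transfection-control signal.

    Both arguments are hypothetical per-sample fluorescence values
    (e.g. an output reporter and a constitutive control reporter).
    """
    if transfection_control <= 0:
        raise ValueError("transfection control signal must be positive")
    return output_signal / transfection_control


def fold_change(norm_mutant, norm_wildtype):
    """Ratio of normalized output between the mutant and wild-type conditions."""
    if norm_wildtype <= 0:
        raise ValueError("wild-type normalized output must be positive")
    return norm_mutant / norm_wildtype


# Made-up numbers: the raw output rises 10-fold, but the control reporter
# also rises (50 -> 62.5), so the normalized fold change is smaller.
raw_fold = 1000 / 100
norm_fold = fold_change(normalized_output(1000, 62.5),
                        normalized_output(100, 50))
print(raw_fold, norm_fold)
```

      With these made-up values, the normalized fold change is lower than the raw one, which is the sense in which normalization against a control that itself responds to RAS can lead to underestimating the dynamic range.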

      We also addressed the reviewer’s suggestion by testing the MAPK response elements used in Fig. 4f with constitutively dimerizing NarX. These controls generally showed lower fold changes between KRAS<sup>G12D</sup> and KRAS<sup>WT</sup> than the corresponding RAS-binding circuits (Supplementary Fig. 7), with one exception: the combination of SRE_NarX and PY2_NarL-VP48.

      Together, these data show that non-linear amplification via increased expression and dimerization contributes to output activation. However, RAS binding and induced dimerization of the NarX sensor are required for full functionality and enhanced signal strength. This underscores that integrating the MAPK response elements with the binding-based RAS sensor into RAS-targeting circuits generally improves the distinction between cells with KRAS<sup>G12D</sup> and KRAS<sup>WT</sup>, and that it was the combination that allowed us to reach maximal fold changes.

      It's also possible that these Ras could affect protein production at the post-transcriptional or even post-translational levels, which were not adequately considered. 

      Thank you for this comment. We now mention in the manuscript the potential mechanisms by which (over-)activated RAS or MAPK signaling can increase protein synthesis. We cite relevant reports of the mechanisms we found, including upregulation of translational initiation and machinery[10]  and ribosomal biogenesis[11].

      The text claims that "in contrast to what we saw in HEK293 overexpressing RAS (Figure 5d), the "AND-gate" RAS-targeting circuits do not generate higher output than the EF1a-driven, binding-triggered RAS sensor in HCT-116. Instead, the improved dynamic range results from decreased leakiness in HCT-116k.o." Comparing the experiment from Figure 5d, which looks at activation in KRASG12D and KRASWT, to the experiments in Figure 6b-d, which look at activation in HCT-116WT and HCT-116KO, is misleading. In Fig. 5d, cells are transfected with KRASG12D and KRASWT to emulate high levels of mutant RAS and high levels of wild-type RAS. In Figures 6b-d, HCT-116WT has endogenous levels of mutant RAS, while the HCT-116KO is a knock-out cell line, and does not have mutant or WT RAS. Therefore, the improved dynamic range or "decreased leakiness in HCT-116KO" in comparison to Figure 5d is more comparable to the NF1 condition from Figure 2, which deactivates endogenous RAS. While this may not be feasible, the most accurate comparison would have been an HCT-116KO line with KRASWT stably integrated.

      Thank you for this input. We understand that comparing the results from HEK293 cells transfected with KRAS<sup>G12D</sup> or KRAS<sup>WT</sup> (Fig. 5d) to those from HCT-116<sup>WT</sup> and HCT-116<sup>k.o.</sup> cells (Fig. 6b–d) may be misleading if interpreted as a direct comparison of RAS signaling levels. Our intent was not to compare HEK293 with KRAS<sup>WT</sup> directly to HCT-116<sup>k.o.</sup>, but rather to contrast the behavior of the EF1α-driven RAS sensor and the MAPK-responsive RAS-targeting circuits within each cell line context.

      Specifically, we observed that in HEK293 cells expressing KRAS<sup>G12D</sup>, the MAPK-based RAS-targeting circuits produced higher output than the EF1α-expressed RAS sensor. In contrast, in HCT-116<sup>WT</sup> cells, the EF1α-expressed RAS sensor resulted in higher output levels than the RAS-targeting circuits. Despite this, the MAPK-driven circuits showed an improved dynamic range compared to the EF1α-expressed RAS sensor in HCT-116, due to the reduced background expression in the HCT-116<sup>k.o.</sup> cells. We have revised the manuscript text to clarify this distinction.

      We agree that an HCT-116<sup>k.o.</sup> cell line with stable integration of KRAS<sup>WT</sup> would provide a more direct comparison. Nonetheless, HCT-116<sup>k.o.</sup> cells still express endogenous NRAS and HRAS, both of which are capable of activating the RAS sensor (as shown in Fig. 1g). Therefore, we believe that HCT-116<sup>k.o.</sup> cells are more comparable to HEK293 with KRAS<sup>WT</sup> than to the NF1 condition in Fig. 2, in which all endogenous RAS isoforms are inactivated.

      We couldn't locate the citation or discussion of Figure 4d in the text. Conversely, based on the text description, Figure 6g would contain exciting results. But we couldn't find Figure 6g anywhere ... unless it was a typo and the authors meant Figure 6f, in which case the cool results in Figure S8 could use more elaboration in the main text. 

      Thank you for this helpful observation. The figure references were indeed incorrect due to a typo. The results discussed in the text refer to Figure 6f (not 6g), which is now Figure 7a in the revised version. To further highlight these findings, we have added a new Figure 7b that better illustrates how different MAPK response elements enabled us to identify, for each RAS-mutant cell line, a RAS-targeting circuit that showed stronger activation than in all RAS wild-type lines. We have also expanded the corresponding section in the main text to elaborate on these results and their significance.

      Reviewer #3 (Public review): 

      Summary: 

      Mutations that result in consistent RAS activation constitute a major driver of cancer. Therefore, RAS is a favorable target for cancer therapy. However, since normal RAS activity is essential for the function of normal cells, a mechanism that differentiates aberrant RAS activity from normal activity is required to avoid severe adverse effects. To this end, the authors designed and optimized a synthetic gene circuit that is induced by active RAS-GTP. The circuit components, such as RAS-GTP sensors, dimerization domains, and linkers, were optimized. To enhance the circuit selectivity and dynamic range, the authors designed a synthetic promoter comprised of MAPK-responsive elements to regulate the expression of the RAS sensors, thus generating a feed-forward loop regulating the circuit components. Circuit outputs with respect to circuit design modification were characterized in standard model cell lines using basal RAS activity, active RAS mutants, and RAS inactivation.

      This approach is interesting. The design is novel and could be implemented for other RAS-mediated applications. The data support the claims, and while this circuit may require further optimization for clinical application, it is an interesting proof of concept for targeting aberrant RAS activity.

      Strengths: 

      Novel circuit design, thorough optimization and characterization of the circuit components, solid data.

      Weaknesses: 

      This manuscript could significantly benefit from testing the circuit performance in more realistic cell lines, such as patient-derived cells driven by RAS mutations, as well as in corresponding non-cancer cell lines with normal RAS activity. Furthermore, testing with therapeutic output proteins in vitro, and especially in vivo, would significantly strengthen the findings and claims. 

      Thank you very much for the thoughtful and supportive comments. We fully agree with the reviewer’s suggestions for improving the translational potential of the RAS-targeting circuits.

      As a first step toward therapeutic relevance, we replaced the fluorescent reporter with HSV-TK, a clinically validated suicide gene, and demonstrated killing in RAS-mutant cancer cell lines. This is described above and in the new section of the manuscript (Figure 8).

      We also agree that testing in patient-derived cancer cells and especially healthy cells with wild-type RAS activity will be essential. However, testing in primary or patient-derived cells presents delivery challenges: transient transfection of our current four-plasmid system is unlikely to achieve sufficient expression. As discussed in our response to Reviewer #1, development of a more efficient delivery strategy, such as viral vector-based delivery, is a necessary next step.

      Once a delivery system is established, identifying relevant off-target tissues throughout the body with high physiological RAS signaling will be key to assessing selectivity. While comparative data on RAS activation across healthy tissues are scarce[12,13], recent atlases of transcription factor activity[14,15] provide insights to identify off-target cells with high activation of RAS-dependent transcription factors and may even approximate RAS activity across healthy tissue. Alternatively, our single-input sensors for RAS and MAPK pathway activity could be used in vivo to identify off-target cells based on endogenous activity.

      Once relevant target and off-target cells have been identified, patient-derived cancer and healthy cells can help select and adapt cancer-specific RAS-targeting circuits and nominate therapeutic candidates for further safety and efficacy assessment[6,8].

      Reviewer #1 (Recommendations for the authors): 

      For the most part, the data in this study are very convincing and very well presented. The cartoons make it easier to understand the complex experimental setups. 

      (1) Did the authors use wild-type Sos-1 or a constitutively active membrane-bound catalytic domain in their studies? How is Sos-1 activated in case wild-type Sos-1 was used?

      Thank you for this feedback. We used the constitutively active catalytic domain of Sos-1 (AA 564–1049; PDB ID 2II0).

      (2) Figure 1f: In case of KRAS-G12D, it looks like the output expression does not really correlate with the RAS-GTP level. Can the authors give an explanation? 

      Thank you for this interesting question. We believe the observed discrepancy arises primarily from differences in the sensitivity and readout dynamics of the two assays. The RAS-GTP pulldown ELISA appears insufficiently sensitive to detect small changes in RAS-GTP levels at lower KRAS<sup>G12D</sup> plasmid doses (0.19, 0.56, or 1.67 ng). Only at 5 ng and 15 ng do we observe clear increases in RAS-GTP signal (25% and 700%, respectively). In contrast, the RAS sensor shows strong activation already in the 0.56–5 ng range but begins to saturate at higher doses (see Figure 1f and Figure 1e).

      Beyond the differing technical sensitivities of the ELISA (plate reader) and flow cytometry, an important conceptual distinction may further explain this behavior: the RAS sensor likely integrates RAS signaling over time. Once NarX binds RAS-GTP and dimerizes, it activates NarL, triggering mCerulean expression. If the rate of mCerulean production exceeds its degradation, signal accumulates throughout the assay duration. Thus, the flow cytometry readout reflects time-integrated signaling, allowing small differences in RAS-GTP to be amplified into measurable differences in output—especially at low input levels. This may explain why flow cytometry detects circuit activation earlier and more steeply than the pulldown assay, which provides a snapshot of RAS-GTP abundance at a single time point and saturates less readily at high input levels.

      Together, these factors likely explain the observed differences in signal dynamics: the RAS sensor exhibits steep activation followed by saturation at high plasmid doses (flow cytometry), while the ELISA shows limited sensitivity at low doses but a broader linear range at higher doses.
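
      The time-integration argument can be made concrete with a toy production/degradation model. This is a minimal sketch under stated assumptions, not a model fitted by the authors: it assumes the reporter level P follows dP/dt = k·a − d·P, where a is a constant sensor activity, and all parameter names and values are hypothetical.

```python
def simulate_reporter(activity, production_rate, degradation_rate, hours, dt=0.01):
    """Forward-Euler integration of dP/dt = production_rate * activity
    - degradation_rate * P, starting from P(0) = 0.

    All parameters are hypothetical; the model only illustrates how a
    reporter accumulates while production exceeds degradation.
    """
    steps = int(round(hours / dt))
    level = 0.0
    for _ in range(steps):
        level += dt * (production_rate * activity - degradation_rate * level)
    return level


# Hypothetical parameters: weak constant sensor activity, slow reporter turnover.
k, d, a = 1.0, 0.02, 0.1
level_24h = simulate_reporter(a, k, d, hours=24)
level_48h = simulate_reporter(a, k, d, hours=48)
steady_state = k * a / d  # limit approached at long times
print(level_24h, level_48h, steady_state)
```

      In this toy model the reporter keeps accumulating between 24 h and 48 h while staying below its steady-state limit, whereas a snapshot measurement of the activity itself would return the same value at both time points. Saturation of the sensor at high input is not modeled here.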

      (3) Figure 2b: It appears that even in the case of KRAS-G12D and Sos-1, only a few cells are positive. Does this result depend on low cell density, low transfection efficiency, or a wide range of the expression level? As a control, nuclear staining could be shown. 

      Thank you for this question. In the experiment shown in Figure 2b, our goal was to assess the membrane localization of the RBDCRD-NarX-SYFP2 construct, which serves as a proxy for the RAS-bound sensor. To enable accurate computational segmentation and separation of membrane signal from adjacent cells, we intentionally reseeded cells at low density in glass-bottom plates for confocal imaging.

      The observed variability in signal likely reflects a combination of transient transfection and heterogeneous expression levels. While the overall transfection efficiency was approximately 70%, expression varied between individual cells. To account for this, we analyzed the membrane-to-total signal ratio per cell, which internally normalizes the membrane signal to the total cellular expression of SYFP2 and controls for differences in transfection efficiency.
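
      The per-cell normalization described above can be sketched as follows. This is an illustrative sketch only: the function name, the flat list of pixel intensities, and the boolean membrane mask are assumptions made for the example and do not reproduce the authors' segmentation workflow.

```python
def membrane_to_total_ratio(pixel_intensities, membrane_mask):
    """Fraction of a single cell's total signal that lies in membrane pixels.

    pixel_intensities: intensities of all pixels belonging to one cell.
    membrane_mask: same-length booleans, True where a pixel is assigned
    to the membrane (hypothetical output of a segmentation step).
    """
    if len(pixel_intensities) != len(membrane_mask):
        raise ValueError("intensities and mask must have the same length")
    total = sum(pixel_intensities)
    if total <= 0:
        raise ValueError("cell has no signal")
    membrane = sum(v for v, m in zip(pixel_intensities, membrane_mask) if m)
    return membrane / total


# Two made-up cells with the same localization pattern but a 10-fold
# difference in expression level.
mask = [True, True, False, False, False, False]
dim_cell = [1, 1, 2, 2, 2, 2]
bright_cell = [10, 10, 20, 20, 20, 20]
print(membrane_to_total_ratio(dim_cell, mask),
      membrane_to_total_ratio(bright_cell, mask))
```

      Both made-up cells yield the same ratio despite the difference in brightness, which is why a per-cell ratio is insensitive to cell-to-cell variation in expression level.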

      In response to the reviewer’s suggestion, we have updated the figure to include nuclear staining to aid interpretation. We would like to emphasize, however, that the images are intended to illustrate subcellular localization per cell, not expression frequency or intensity across the population.

      Minor points 

      (1) Figure 1b: "The third plasmid expresses NarL, .." should be changed to "The third plasmid expresses NarL-VP48, .." 

      Done

      (2) Figure 1c, right part: The orange arrow should be labeled NarX-H399Q (not N509A). 

      Done

      (3) Supplementary Table 6 and 7: [cells/wells] - should probably be [cells 10*3/well]. 

      Thank you for these points. We updated the manuscript accordingly.

      Reviewer #2 (Recommendations for the authors): 

      Minor comments: 

      (1) N509A seems mislabeled in Figure 1b. 

      (2) It would help the readers if the authors could elaborate a bit on what is known about the RBD and CRD mutations used here. 

      Thank you for the input. We added a paragraph to the paper to expand on the effect of these commonly used mutations.

      (3) The KRASWT&Sos1 condition is not explained within the text for Figure 1f, which is the first figure with the KRASWT&Sos1 condition, but rather later on for Figure 2a. Adding a description of this condition to the discussion of Figure 1f would add clarity to this figure. 

      Thank you, we corrected this.

      (4) Citing AlphaFold2 structural predictions as having "revealed that longer linkers between the sensor's RBDCRD and NarX-derived domains could bring the NarX domains into closer proximity" is probably an overstatement. AlphaFold2 generally has low confidence in the placement of long flexible linkers, and the longer linkers in the illustration could facilitate NarX and NarL being even farther apart than they are in the original design. 

      Thank you for this input. We agree that AlphaFold2 predictions generally have low confidence in the placement of long, flexible linkers, and we did not intend to imply that the structural models were predictive of actual linker conformations. Rather, the models were used heuristically to generate the hypothesis that longer linkers might facilitate better positioning of the NarX domains for dimerization.

      As described in the Methods, we manually rotated the flexible linker regions to explore plausible conformations. These exploratory models showed that with a short (1x GGGGS) linker, it was more challenging to bring the NarX domains into close proximity, whereas longer linkers allowed greater positional flexibility. This modeling exercise provided a structural rationale for experimentally testing longer linkers. We have revised the manuscript text to clarify that the structural predictions were used to motivate linker design, not to validate or predict structural outcomes.

      (5) Figure 3b shows that the fold change (KRASG12D/KRASWT) is higher at shorter linker lengths and lower at longer linker lengths, and that the output expression of mCerulean is lower at shorter linker lengths and higher at longer linker lengths. Having a bar plot with the output expression mCerulean levels comparing KRASG12D and KRASWT next to each other would be a significantly more informative representation of this data. In particular, the readers might be interested in understanding the effect of linker length on off-target activation from the sensor, which is not clear from this figure. 

      Thank you for the suggestion. We adapted Figure 3b to better present this. 

      (6) While it is implied that the sentence "Among the tested binding domains, the Ras association domain (RA) of the natural RAS effector Rassf5, the RAS association domain 2 (RA2) of the phospholipase C epsilon (PLCe)33, and the synthetic RAS binder K5534 showed a slightly higher or similar dynamic range." is comparing these RAS binding domains to RBDCRD, for clarity it should be noted what the point of reference is for this "slightly higher or similar dynamic range." 

      (7) Claims are made throughout the text that require supporting data, and thus require a reference to a figure, but there are a few instances where the reference is several sentences after the discussion of data and findings begins. For example, the discussion of Figure 3c begins with the claim "Among the tested binding domains, the Ras association domain (RA) of the natural RAS effector Rassf5, the RAS association domain 2 (RA2) of the phospholipase C epsilon (PLCe)33, and the synthetic RAS binder K5534 showed a slightly higher or similar dynamic range," but there is no reference to the data or figure being discussed until the end of the discussion of Figure 3c. This formatting is also present in Figure 3d and Figure 6f. 

      Thank you for mentioning these imprecisions and inconsistencies. We addressed them in the manuscript.

      (8) In Figures 5d and 5e, the formatting of underscores and dashes is occasionally inconsistent within the text. (ex. "PY2_NarX_FLT or PY2_NarL-FLT" on page 13.). 

      Thank you for this precise observation. The formatting differences were intentional and reflect distinct design principles. Specifically:

      An underscore (e.g., PY2_NarX_FLT) denotes that two separate proteins are expressed: here, PY2-driven RBDCRD-NarX and EF1α-driven NarL-F.L.T.

      A dash (e.g., PY2_NarL-F.L.T.) indicates a fusion protein, i.e., PY2-driven NarL-F.L.T. combined with EF1α-driven RBDCRD-NarX.

      This notation is used to distinguish expression sources and fusion constructs while avoiding redundancy with the base circuit (EF1α_NarX + EF1α_NarL-VP48). We hope the schematic diagrams included in each relevant figure help the reader interpret these combinations.

      (9) The text claims that "loss-of-function mutations in RBDCRD decreased activation. However, the dynamic range was only 3-fold" and attributes this claim to Figure 6a. For a claim about specific fold-change activation, one would expect a corresponding figure with quantitative measurements of this fluorescence to be referenced. 

      Thank you for this remark. We made a supplementary figure (Supplementary Fig. 11) to show the quantitative measurement of the 3-fold dynamic range between HCT-116<sup>WT</sup> and HCT-116<sup>k.o.</sup> when using the EF1α-expressed RAS sensor with NarL-VP48.

      (10) The claim of this Figure 2d is that the effect of RAS-GTP levels on mCerulean output is amplified in comparison to Figures 2a, 2b, and 3c, representing expression, RAS binding, and dimerization respectively. While visually this might be true from the figure, the readers might be confused by the lack of significance between the control and the NF1 condition, alongside the variation between the triplicates. Could this experiment be repeated to gain clearer data and to support their claim more effectively? 

      Thank you for this important observation. To address the concern regarding variability and statistical significance in Figure 2d, we repeated the experiment using 24-well plates to increase the number of cells analyzed per condition. This improved the consistency of the data and allowed us to reduce variability across replicates. As a result, we now observe a statistically significant difference between the control and the NF1 condition. The updated results are shown in the revised Figure 2.

      (11) The readers might be less familiar with the concept of "composability" than "modularity" and it would be good to explain it if the authors did intend to use the former. 

      Thank you for this comment. We changed it to modularity to avoid confusion. 

      References

      (1) Shahryari, A., Burtscher, I., Nazari, Z. & Lickert, H. Engineering Gene Therapy: Advances and Barriers. Advanced Therapeutics vol. 4 Preprint at https://doi.org/10.1002/adtp.202100040 (2021).

      (2) McClements, M. E. & MacLaren, R. E. Adeno-Associated Virus (AAV) Dual Vector Strategies for Gene Therapy Encoding Large Transgenes. Yale Journal of Biology and Medicine vol. 90 (2017).

      (3) Wagner, H. J., Weber, W. & Fussenegger, M. Synthetic Biology: Emerging Concepts to Design and Advance Adeno-Associated Viral Vectors for Gene Therapy. Advanced Science vol. 8 Preprint at https://doi.org/10.1002/advs.202004018 (2021).

      (4) Doshi, J., Willis, K., Madurga, A., Stelzer, C. & Benenson, Y. Multiple Alternative Promoters and Alternative Splicing Enable Universal Transcription-Based Logic Computation in Mammalian Cells. Cell Rep 33, 108437 (2020).

      (5) Wu, Z., Yang, H. & Colosi, P. Effect of genome size on AAV vector packaging. Molecular Therapy 18, 80–86 (2010).

      (6) Dastor, M. et al. A Workflow for in Vivo Evaluation of Candidate Inputs and Outputs for Cell Classifier Gene Circuits. ACS Synth Biol 7, 474–489 (2018).

      (7) Preuß, E. et al. TK.007: A novel, codon-optimized HSVtk(A168H) mutant for suicide gene therapy. Hum Gene Ther 21, 929–941 (2010).

      (8) Angelici, B., Shen, L., Schreiber, J., Abraham, A. & Benenson, Y. An AAV gene therapy computes over multiple cellular inputs to enable precise targeting of multifocal hepatocellular carcinoma in mice. Sci Transl Med 13, (2021).

      (9) Mesnil, M. & Yamasaki, H. Bystander Effect in Herpes Simplex Virus-Thymidine Kinase/Ganciclovir Cancer Gene Therapy: Role of Gap-Junctional Intercellular Communication. Cancer Research vol. 60 http://aacrjournals.org/cancerres/articlepdf/60/15/3989/2478218/ch150003989.pdf (2000).

      (10) Proud, C. G. Ras, PI3-kinase and mTOR signaling in cardiac hypertrophy. Cardiovascular Research vol. 63 403–413 Preprint at https://doi.org/10.1016/j.cardiores.2004.02.003 (2004).

      (11) Azman, M. S. et al. An ERK1/2-driven RNA-binding switch in nucleolin drives ribosome biogenesis and pancreatic tumorigenesis downstream of RAS oncogene. EMBO J 42, (2023).

      (12) von Lintig, F. C. et al. Ras activation in normal white blood cells and childhood acute lymphoblastic leukemia. Clin Cancer Res 6, 1804–10 (2000).

      (13) Guha, A., Feldkamp, M. M., Lau, N., Boss, G. & Pawson, A. Proliferation of human malignant astrocytomas is dependent on Ras activation. Oncogene 15, 2755–2765 (1997).

      (14) Pan, L. et al. HTCA: a database with an in-depth characterization of the single-cell human transcriptome. Nucleic Acids Res 51, D1019–D1028 (2023).

      (15) Pan, L. et al. Single Cell Atlas: a single-cell multi-omics human cell encyclopedia. Genome Biol 25, (2024).

    1. Author response:

      The following is the authors’ response to the current reviews.

      Reviewer 1:

      While BAP1 mutant UM cell lines were included for some of the experiments, it seems the in-vivo data mentioned in the response to the reviewers comment is missing? The authors stated that "MP46 (Supplementary Fig. 3a) is BAP1-null uveal melanoma cell line with no detectable protein expression (Amirouchene-Angelozzi et al., Mol Oncol 2014), and we have observed strong tumor growth inhibition in this CDX model with our BAF ATPase inhibitor." But the CDX model data shown in Figure 4 is from 92.1 cells. If this data is available, then the manuscript would benefit from its addition.

      We thank the reviewer for bringing this to our attention. As the reviewer mentioned, we show the 92-1 CDX model in our manuscript. Additionally, strong tumor growth inhibition in the MP46 CDX model treated with our BAF ATPase inhibitor can be found in Vaswani et al., 2025 (PMID:39801091, https://pubmed.ncbi.nlm.nih.gov/39801091/).

      Reviewer 3:

      Supplementary Figure 2C

      Is the T910M mutation in the parental MP41 cells heterozygous? If so, the authors should indicate this in the figure legend. If this is a homozygous mutation, the authors should explain how the inhibitors suppress SMARCA4 activity in cells that have a LOF mutation.

      We thank the reviewer for bringing this to our attention. We updated the figure legend accordingly to reflect the genotype of the mutations highlighted in the table.


      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      The presented study by Centore and colleagues investigates the inhibition of BAF chromatin remodeling complexes. The study is well-written, and includes comprehensive datasets, including compound screens, gene expression analysis, epigenetics, as well as animal studies. This is an important piece of work for the uveal melanoma research field, and sheds light on a new inhibitor class, as well as a mechanism that might be exploited to target this deadly cancer for which no good treatment options exist.

      Strengths:

      This is a comprehensive and well-written study.

      Weaknesses:

      There are minimal weaknesses.

      We thank the reviewer for the positive comments.

      Reviewer #2 (Public Review):

      Summary:

      The authors generate an optimized small molecule inhibitor of SMARCA2/4 and test it in a panel of cell lines. All uveal melanoma (UM) cell lines in the panel are growth-inhibited by the inhibitor, making them the focus of the paper. This inhibition is correlated with the loss of promoter occupancy of key melanocyte transcription factors, e.g. SOX10. SOX10 overexpression and a point mutation in SMARCA4 can rescue growth inhibition exerted by the SMARCA2/4 inhibitor. Treatment of a UM xenograft model results in growth inhibition and regression, which correlates with reduced expression of SOX10 but no discernible toxicity in the mice. Collectively the data suggest a novel treatment of uveal melanoma.

      Strengths:

      There are many strengths of the study including the strong challenge of the on-target effect, the assays used, and the mechanistic data. The results are compelling as are the effects of the inhibitor. The in vivo data is dose-dependent and doses are low enough to be meaningful and associated with evidence of target engagement.

      Weaknesses:

      The authors introduce the field stating that SMARCA4 inhibitors are more effective in SMARCA2 deficient cancers and the converse. Since the desirable outcome of cancer therapy would be synthetic lethality it is not clear why a dual inhibitor is desirable. Wouldn't this be associated with more side effects? It is not known how the inhibitor developed here impacts normal cells, in particular T cells which are essential for any durable response to cancer therapies in patients. Another weakness is that the UM cell lines used do not molecularly resemble metastatic UM. These UM most frequently have mutations in the BAP1 tumor suppressor gene. It is not clear if the described SMARCA2/4 inhibitor is efficacious in BAP1 mutant UM cell lines in vitro or BAP1 mutant patient-derived xenografts in vivo.

      We thank the reviewer for their insightful and constructive comments. As we demonstrate in Fig. 1d, uveal melanoma cells are selectively and deeply sensitive to BAF ATPase inhibition, which provides a therapeutic window. This is confirmed in Fig. 4a-c, where we demonstrate robust tumor growth inhibition at a dose that was well tolerated in the xenograft study. FHD-286, a dual BRM/BRG1 inhibitor similar to FHT-1015 with optimized physical properties, has been evaluated in a Phase I trial in patients with metastatic uveal melanoma (NCT04879017), and a manuscript describing the results of this clinical trial is currently in preparation.

      As the reviewer mentioned, BAP1 loss is a signature of metastatic uveal melanoma. MP38 is a BAP1 mutant uveal melanoma cell line, and we demonstrated growth inhibition and robust caspase 3/7 activity in response to FHT-1015 (Supplementary Fig. 3a and 3f). MP46 (Supplementary Fig. 3a) is a BAP1-null uveal melanoma cell line with no detectable protein expression (Amirouchene-Angelozzi et al., Mol Oncol 2014), and we have observed strong tumor growth inhibition in this CDX model with our BAF ATPase inhibitor.

      Reviewer #3 (Public Review):

      Summary:

      This manuscript reports the discovery of new compounds that selectively inhibit SMARCA4/SMARCA2 ATPase activity and that work through a different mode than previously developed SMARCA4/SMARCA2 inhibitors. They also demonstrate the anti-tumor effects of the compounds on uveal melanoma cell proliferation and tumor growth. The findings indicate that the drugs exert their effects by altering chromatin accessibility at binding sites for lineage-specific transcription factors within gene enhancer regions. In uveal melanoma, altered expression of the transcription factor SOX10 and of SOX10 target genes underlies the anti-proliferative effects of the compounds. This study is significant because the discovery of new SMARCA4/SMARCA2 inhibitory compounds that can abrogate uveal melanoma tumorigenicity has therapeutic value. In addition, the findings provide evidence for the therapeutic use of these compounds in other transcription factor-dependent cancers.

      Strengths:

      The strengths of this manuscript include biochemical evidence that the new compounds are selective for SMARCA4/SMARCA2 over other ATPases and that the mode of action is distinct from a previously developed compound, BRM014, which binds the RecA lobe of SMARCA2. There is also strong evidence that FHT1015 suppresses uveal melanoma proliferation by inducing apoptosis. The in vivo suppression of tumor growth without toxicity validates the potential therapeutic utility of one of the new drugs. The conclusion that FHT1015 primarily inhibits SMARCA4 activity and thereby suppresses chromatin accessibility at lineage-specific enhancers is substantiated by ATAC-seq and ChIP-seq studies.

      Weaknesses:

      The weaknesses include a lack of more precise information on which SMARCA4/SMARCA2 residues the drugs bind. Although the I1173M/I1143M mutations are evidence that the critical residues for binding reside outside the RecA lobe, this site is conserved in CHD4, which is not affected by the compounds. Hence, this site may be necessary but not sufficient for drug binding or specifying selectivity. A more precise evaluation of the region specifying the effect of the new compounds would strengthen the evidence that they work through a novel mode and that they are selective. Another concern is that the mechanisms by which FHT1015 promotes apoptosis rather than simply cell cycle arrest are not clear. Does SOX10 or another lineage-specific transcription factor underlie the apoptotic effects of the compounds?

      We thank the reviewer for the valuable comments.

      We believe that our dual ATPase inhibitor is selective and additional insights into binding specificity and selectivity for earlier stage compounds of this series were recently published in Vaswani et al., 2025 (PMID:39801091, https://pubmed.ncbi.nlm.nih.gov/39801091/).

      The reviewer also poses a great question regarding the mechanism of apoptosis. The mechanism of apoptosis is extremely complex, but we observed a decrease in pro-survival BCL-2 protein expression in response to FHT-1015 in the experiment corresponding to Supplementary Fig. 5e. In the experiment described in Fig. 3k, we also monitored caspase 3/7 activity over time, and SOX10 overexpression rescued 92-1 cells from FHT-1015-induced apoptosis. This suggests that SOX10 is an important mediator of the response to BAF ATPase inhibition, including apoptosis induced by FHT-1015.

      Additional Reviews:

      The referees would like to draw the authors' attention to the following issues that would best benefit from additional revision. 

      The clinical relevance of the study would be strengthened by the use of uveal melanoma cell lines with BAP1 mutations that better represent metastatic uveal melanoma. The use of patient-derived xenografts would also be pertinent and would be a useful addition. Similarly, attention to the effects of the inhibitor on non-cancerous proliferative cells such as blood/T/immune cells would also strengthen the manuscript. As the study reports the administration of one of the inhibitors in mice for the xenograft experiments, it would be important to assess any potential effects on blood cell counts and better discuss the eventual toxicity or lack of toxicity and how it was assessed. 

      The authors should better explain how SOX10 over expression can rescue viability in the presence of the inhibitor. Similarly given the critical roles of BRG1, SOX10, and MITF in cutaneous melanoma some specific discussion on the sensitivity of cutaneous melanoma cells to the inhibitor should be considered, and potential differences with uveal melanoma highlighted. 

      Aside from these issues, the authors are urged to consider the other points mentioned below. 

      Reviewer #1 (Recommendations For The Authors): 

      Figure 1d, as well as the text in the manuscript referring to this figure, would benefit from indicating specific cell lines used for UM. The same for the sentence in line 153. 

      We thank the reviewer for bringing this to our attention. We have added the cell line names and updated the manuscript accordingly.

      For any of the studies conducted, is there any link with the genetics of UM? E.g. BAP1 wildtype/BAP1 mutant? 

      As addressed above in the public review section, MP38 is a BAP1 mutant uveal melanoma cell line, and we demonstrated growth inhibition and robust caspase 3/7 activity in response to FHT-1015 (Supplementary Fig. 3a and 3f). MP46 (Supplementary Fig. 3a) is a BAP1-null uveal melanoma cell line with no detectable protein expression (Amirouchene-Angelozzi et al., Mol Oncol 2014), and we have observed strong tumor growth inhibition in this CDX model with our BAF ATPase inhibitor.

      Row 191 - How were peaks classified as enhancer-occupied? 

      We used the annotatePeaks function of the HOMER package to annotate genomic locations, together with H3K27ac ChIP-seq, to annotate peaks as enhancer-occupied. We thank the reviewer for pointing this out and have updated the manuscript to include this information.
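
      The classification described above can be illustrated with a minimal sketch. It is not the pipeline used in the manuscript: the rule shown here (a peak whose annotation is not a promoter and which overlaps an H3K27ac peak is called enhancer-occupied) is an assumption made for illustration, as are the `Peak` type, the function names, and the annotation strings such as "promoter-TSS" and "Intergenic".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Peak:
    """A genomic interval with a free-text annotation (hypothetical container)."""
    chrom: str
    start: int
    end: int
    annotation: str  # assumed to resemble a HOMER-style label, e.g. "Intergenic"


def overlaps(a_start, a_end, b_start, b_end):
    """True when two half-open intervals [start, end) share at least one base."""
    return a_start < b_end and b_start < a_end


def is_enhancer_occupied(peak, h3k27ac_peaks):
    """Assumed rule: a non-promoter peak that overlaps any H3K27ac peak."""
    if peak.annotation.lower().startswith("promoter"):
        return False
    return any(
        peak.chrom == mark.chrom
        and overlaps(peak.start, peak.end, mark.start, mark.end)
        for mark in h3k27ac_peaks
    )


if __name__ == "__main__":
    marks = [Peak("chr1", 100, 500, "")]
    candidate = Peak("chr1", 400, 600, "Intergenic")
    print(is_enhancer_occupied(candidate, marks))
```

      A linear scan over H3K27ac peaks is adequate for a sketch; a genome-scale analysis would normally use an interval index or a dedicated tool.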

      Row 259, the two cell lines should be named, also in Figure 3i. 

      We have added the cell line names and updated the manuscript accordingly.

      Reviewer #2 (Recommendations For The Authors): 

      As a proof of concept, this study is truly excellent and the authors should be commended. However, it is desirable that new knowledge in cancer is translated to the clinic. To this end there are a few things needed to strengthen the study. 

      I am rephrasing my statements from the public review to say that I would recommend testing the inhibitor in T cells (side effects) and BAP1 mutant cell lines (for clinical relevance). 

      As addressed in the public review section, MP38 is a BAP1 mutant uveal melanoma cell line, and we demonstrated growth inhibition and robust caspase 3/7 activity in response to FHT-1015 (Supplementary Fig. 3a and 3f). MP46 (Supplementary Fig. 3a) is a BAP1-null uveal melanoma cell line with no detectable protein expression (Amirouchene-Angelozzi et al., Mol Oncol 2014), and we have observed strong tumor growth inhibition in this CDX model with our BAF ATPase inhibitor.

      Regarding concerns about potential side effects on T cells, we observed an increase in both CD4 and CD8 T-cell populations in the peripheral blood and the spleen when naïve, non-tumor-bearing CD-1 mice were dosed with the SMARCA2/4 dual ATPase inhibitor FHD-286 once daily for 14 days. FHD-286 is a compound similar to FHT-1015 described in Vaswani et al., 2025 (PMID:39801091, https://pubmed.ncbi.nlm.nih.gov/39801091/). In addition, FHD-286 has been tested in tumor-bearing syngeneic models. When B16F10 tumor-bearing C57BL/6 mice were dosed with FHD-286 for 10 days, we observed an increase in CD69+ activated CD8 T-cell infiltration in the tumor microenvironment (doi:10.1136/jitc-2022-SITC2022.0888).

      Reviewer #3 (Recommendations For The Authors): 

      (1) Determine drug binding by crystal structure or generate additional SMARCA4 or SMARCA2 mutations in the region near I1173/I1143 that are not conserved in CHD4 and test them in an ATPase assay for effects on drug inhibition. For example, Q1166 in SMARCA4 and Q1136 in SMARCA2 could be changed to Alanine as in CHD4. Would this abrogate drug inhibition?

      We believe that our dual ATPase inhibitor is selective and additional insights into binding specificity and selectivity for earlier stage compounds of this series were recently published in Vaswani et al., 2025 (PMID:39801091, https://pubmed.ncbi.nlm.nih.gov/39801091/).

      (2) The finding that SOX10 can rescue the antiproliferative effects of FHT1015 suggests that SMARCA4 is primarily needed for SOX10 expression. However, the co-occupancy of SMARCA4 and SOX10 at enhancers suggests that they cooperate to promote chromatin accessibility. It is unclear how over-expression of SOX10 can promote chromatin accessibility in drug-inhibited cells since SOX10 does not have chromatin remodeling activity. ATAC-seq in cells over-expressing SOX10 and treated with the drug could identify SOX10-dependent targets that do not require SMARCA4 activity and clarify the mechanism. It would also be informative to determine if SOX10 over-expression abrogates the effects of FHT1015 on both cell cycle and apoptosis, helping to resolve whether it is a partial or complete rescue of proliferation. 

      We agree that running ATAC-seq in cells overexpressing SOX10 would clarify this mechanism. However, shifts in corporate strategy deprioritized any further experiments for this project. One potential mechanism by which SOX10 overexpression can partially rescue the BAF inhibition phenotype is that overexpressed SOX10 localizes to open chromatin regions (mostly promoters) across the genome. We know from our ATAC-seq data (Fig. 2) that BAF inhibition leads to loss of chromatin accessibility at SOX10 enhancer sites, while promoter regions are only partially affected. Therefore, we think that overexpression of SOX10 would allow upregulation of its target genes via binding to the promoter regions. In this model, the enhancer-driven SOX10 target genes are likely to remain silenced.

      (3) Although the in vivo studies indicate that the drugs are well-tolerated, additional in vitro studies to determine the effects of the drug on the proliferation/survival of non-cancerous cells would further validate their therapeutic utility.

      Author Response: The reviewer raises a critical question. FHD-286, a dual BRM/BRG1 inhibitor similar to FHT-1015 with optimized physical properties, has been evaluated in a Phase I trial in patients with metastatic uveal melanoma (NCT04879017), and it was well tolerated at continuous daily doses of up to 7.5 mg QD and at intermittent doses of up to 17.5 mg QD. A manuscript describing the results of this clinical trial is currently in preparation.

    1. Author response:

      Reviewer #1 (Public review):

      It appears obvious that with no or a little fitness penalty, it becomes beneficial to have MHC-coding genes specific to each pathogen. A more thorough study that takes into account a realistic (most probably non-linear in gene number) fitness penalty, various numbers of pathogens that could grossly exceed the self-consistent fitness limit on the number of MHC genes, etc, could be more informative.

      The reviewer seems to be referring to the cost of excessively high presentation breadth.  Such a cost is irrelevant to the inferior fitness of a polymorphic population with heterozygote advantage compared to a monomorphic population with merely doubled gene copy number.  It is relevant to the possibility of a fitness valley separating these two states, but this issue is addressed explicitly in the manuscript.

      An addition or removal of one of the pathogens is reported to affect "the maximum condition", a key ecological characteristic of the model, by an enormous factor 10^43, naturally breaking down all the estimates and conclusions made in [RS]. This observation is not substantiated by any formulas, recipes for how to compute this number numerically, or other details, and is presented just as a self-standing number in the text.

      It is encouraging that the reviewer agrees that this observation, if correct, would cast doubt on the conclusions of Siljestam and Rueffler.  I would add that it is not the enormity of this factor per se that invalidates those conclusions, but the fact that the automatic compensatory adjustment of c<sub>max</sub> conceals the true effects of removing a pathogen, which are quite large.

      I am not sure why the reviewer doubts that this observation is correct.  The factor of 2.7∙10<sup>43</sup> was determined in a straightforward manner in the course of simulating the symmetric Gaussian model of Siljestam and Rueffler with the specified parameter values.  A simple way to determine this number is to have the simulation code print the value to which c<sub>max</sub>  is set, or would be set, by the procedure of Siljestam and Rueffler for different parameter values.  In another section of this response I will describe how to do this with the simulation code written and used by Siljestam and Rueffler; doing so confirms the value that I obtained with my own code.  Furthermore, I will now give a theoretical derivation of this factor.

      As specified by Siljestam and Rueffler, the positions of the m pathogens in (m-1)-dimensional antigenic space correspond to the vertices of a regular simplex centered at the origin, with distance between vertices equal to 1.  The squared distance from the origin to each of the m vertices of such a simplex is (m-1)/(2m) (https://polytope.miraheze.org/wiki/Simplex).  Thus, the sum of the m squared distances is (m-1)/2.  For the (0, 0) homozygote, condition is multiplied by a factor of exp(-(vr)<sup>2</sup>/2) for each pathogen, where r is the distance from the origin.  It follows that, with v=20, all the pathogens together decrease condition by a factor of exp(20<sup>2</sup>∙(m-1)/4) = exp(100∙(m-1)).  Thus, increasing or decreasing m by 1 changes this value by a factor of exp(100) = 2.7∙10<sup>43</sup>.
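
      The derivation above can be checked numerically with a short sketch. This is an independent illustration, not the simulation code of Siljestam and Rueffler or my own; the function names are hypothetical, and the simplex is constructed in m dimensions from scaled basis vectors, which is one convenient way to obtain unit edge lengths.

```python
import math


def simplex_vertices(m):
    """Vertices of a regular simplex with unit edges, centered at the origin.

    The m vectors e_i / sqrt(2) in R^m are pairwise at distance 1; subtracting
    their common centroid centers the simplex without changing edge lengths.
    """
    scale = 1.0 / math.sqrt(2.0)
    centroid = scale / m  # every coordinate of the centroid has this value
    return [
        [(scale if i == j else 0.0) - centroid for j in range(m)]
        for i in range(m)
    ]


def log_condition_penalty(m, v):
    """Sum over pathogens of (v * r)^2 / 2 for a genotype at the origin."""
    total_r_squared = sum(x * x for vertex in simplex_vertices(m) for x in vertex)
    return (v ** 2 / 2.0) * total_r_squared


if __name__ == "__main__":
    for m in (7, 8, 9):
        # Expected to be close to 100 * (m - 1) when v = 20.
        print(m, log_condition_penalty(m, 20.0))
    # Factor by which the penalty changes when m changes by 1.
    print(math.exp(100))
```

      Up to floating-point error, the penalties for m = 7, 8 and 9 are 600, 700 and 800 in natural-log units, and exp(100) is about 2.7∙10<sup>43</sup>.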

      This begs the conclusion that the branching remains robust to changes in c_max that span 4 decades as well.

      That shows only that the results are not extremely sensitive to c<sub>max</sub> or K.  They are, nonetheless, exquisitely sensitive to m and v.  This difference in sensitivities is the reason that a relatively small change to m leads to such a large compensatory change in c<sub>max</sub>, a change large enough to have a major effect on the results.

      As I wrote above, there is no explanation behind this number, so I can only guess that such a number is created by the removal or addition of a pathogen that is very far away from the other pathogens. Very far in this context means being separated in the x-space by a much greater distance than 1/\nu, the width of the pathogens' gaussians. Once again, I am not totally sure if this was the case, but if it were, some basic notions of how models are set up were broken. It appears very strange that nothing is said in the manuscript about the spatial distribution of the pathogens, which is crucial to their effects on the condition c.

      I did not explicitly describe the distribution of pathogens in antigenic space because it is exactly the same as in Siljestam and Rueffler, Fig. 4: the vertices of a regular simplex, centered at the origin, with unity edge length.

      The number in question (2.7∙10<sup>43</sup>) pertains to the Gaussian model with v=20.  As specified by Siljestam and Rueffler, each pathogen lies at a distance of 1 from every other pathogen, so the distance of any pathogen from the others is indeed much greater than 1/v.  This condition holds, however, for most of the parameter space explored by Siljestam and Rueffler (their Fig. 4), and for all of the parameter space that seemingly supports their conclusions.  Thus, if this condition indicates that “basic notions of how models are set up were broken”, they must have been broken by Siljestam and Rueffler.

      Overall, I strongly suspect that an unfortunately poor setup of the model reported in the manuscript has led to the conclusions that dispute the much better-substantiated claims made in [SD].

      The reviewer seems to be suggesting that my simulations are somehow flawed and my conclusions unreliable.  I will therefore describe how my conclusions about sensitivity to parameter values can be verified using the simulation code provided by Siljestam and Rueffler themselves, with only small, easily understood modifications.  I will consider adding this description as a supplement when I revise the manuscript.

      The starting point is the Matlab file MHC_sim_Dryad.m, available at https://doi.org/10.5061/dryad.69p8cz98j.  First, we can add a line that prints the value of the variable logcmax, which represents the natural logarithm of cmax determined and used by the code.  Below line 116 (‘prework’), add the line ‘logcmax’ (with no semicolon).

      Now, at the Matlab prompt, execute MHC_sim_Dryad(false, 8, 20, 1) to run the simulation for the Gaussian model with m=8, v=20, and K=1.  The output will indicate that logcmax=700, in accord with the theoretical factor exp(100*(m-1)) derived above.  The allelic diversity, n<sub>e</sub>, will rise to a steady-state level of about 140, as in the red curve of my Fig. 2.

      Now lower m to 7, i.e., run MHC_sim_Dryad(false, 7, 20, 1).  The output will indicate that logcmax=600.  This confirms that lowering m by 1 causes the code to lower the value of c<sub>max</sub> by a factor exp(100)=2.7∙10<sup>43</sup>, which must also be the factor by which the condition of the most fit homozygote would increase without this adjustment.

      With the change of m to 7 and the compensatory change in c<sub>max</sub>, steady-state allelic diversity remains high.  But what if m changes but c<sub>max</sub> remains the same, as it would in reality?

      To find out, we can fix the value of c<sub>max</sub> to the value used with m=8 by adding the following line below the line previously added: ‘logcmax = 700’.  With this additional modification in place, executing MHC_sim_Dryad(false, 7, 20, 1) confirms that without a compensatory change to c<sub>max</sub>, lowering m from 8 to 7 mostly eliminates allelic diversity, in accord with the corresponding curve in my Fig. 2.  Similarly, raising m from 8 to 9, or changing v from 20 to 19.5 or 20.5 (executing MHC_sim_Dryad(false, 8, 19.5, 1) or MHC_sim_Dryad(false, 8, 20.5, 1)), largely eliminates diversity, confirming the other results in my Fig. 2.  Results for the bitstring model can also be confirmed, though this requires additional changes to the code.
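
      To give a sense of scale for these perturbations, the following sketch tabulates how far the fixed value logcmax = 700 lies from the value implied by each parameter set. It assumes that the closed form v<sup>2</sup>(m-1)/4 derived earlier in this response also describes the automatic adjustment for the non-integer values of v; only the values 700 (m=8) and 600 (m=7) were read from the simulation output, and the names below are hypothetical.

```python
def implied_logcmax(m, v):
    """log c_max implied by the closed form v^2 * (m - 1) / 4 (an assumption)."""
    return v ** 2 * (m - 1) / 4.0


# Value printed by the simulation for m = 8, v = 20, then held fixed.
FIXED_LOGCMAX = 700.0

# Parameter sets discussed in the text: the baseline and four perturbations.
PARAMETER_SETS = [(8, 20.0), (7, 20.0), (9, 20.0), (8, 19.5), (8, 20.5)]


def mismatch_table():
    """Return (m, v, implied logcmax, fixed minus implied) for each set."""
    return [
        (m, v, implied_logcmax(m, v), FIXED_LOGCMAX - implied_logcmax(m, v))
        for m, v in PARAMETER_SETS
    ]


if __name__ == "__main__":
    for m, v, implied, gap in mismatch_table():
        print(f"m={m}, v={v}: implied logcmax={implied:.4f}, gap={gap:+.4f}")
```

      Under this assumption, changing v by 0.5 shifts the implied value by roughly 35 natural-log units (665.4375 for v=19.5 and 735.4375 for v=20.5), and changing m by 1 shifts it by 100.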

      Thus, the extreme sensitivity of the results of Siljestam and Rueffler to parameter values can be verified with the code that they used for their simulations, indicating that my conclusions are not consequences of my having done a “poor setup of the model”.

      Response to Reviewer #2 (Public review):

      (1) The statement that the model outcome of Siljestam and Rueffler is very sensitive to parameter values is, in this form, not correct. The sensitivity is only visible once a strong assumption by Siljestam and Rueffler is removed. This assumption is questionable, and it is well explained in the manuscript by J. Cherry why it should not be used. This may be seen as a subtle difference, but I think it is important to pin down the exact nature of the problem (see, for example, the abstract, where this is presented in a misleading way).

      I appreciate the distinction, and the importance of clearly specifying the nature of the problem.  However, Siljestam and Rueffler do not invoke the implausible assumption that changes to the number of pathogens or their virulence will be accompanied by compensatory changes to c<sub>max</sub>.  Rather, they describe the adjustment of c<sub>max</sub> (Appendix 7) as a “helpful” standardization that applies “without loss of generality”.  Indeed, my low-diversity results could be obtained, despite such adjustment, by combining the small change to m or v with a very large change to K (e.g., a factor of 2.7∙10<sup>43</sup>).  In this sense there is no loss of generality, but the automatic adjustment of c<sub>max</sub> obscures the extreme sensitivity of the results to m and v.

      (2) The title of the study is very catchy, but it needs to be explained better in the text.

      I had hoped that the final paragraph of the Discussion would make the basis for the title clear.  I will consider whether this can be clarified in a revision.

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1:

      Comments on revised version: 

      I have reviewed the revised manuscript and read the rebuttal. The authors have carefully addressed my concerns. There is however one point that needs further work: 

      This follows up on my major point #1 in my initial review. I had asked the authors to demonstrate that FOLFIRI + AZD are less toxic to untransformed colorectal cells than colorectal cancer cell lines. It is good to see that the authors took my advice and show effects of the drug treatments on the untransformed colorectal cell line CCD841. It seems to be less sensitive to AZD and FOLFIRI in the figure in the rebuttal. What surprises me is that I cannot find these new figures anywhere in the revised manuscript. Also, the data seem preliminary, because I do not see any standard errors in the graphs, and I cannot find a description of the time of drug incubation. I ask the authors to make sure that the CCD841 data are reproducible, and make sure they incorporate the data in the revised manuscript. 

      We thank the reviewer for this insightful comment. In the initial revised version of the manuscript, we did not include results from the untransformed colorectal cell line CCD841, as those experiments had only been performed once and were considered preliminary. However, we fully agree with the reviewer on the importance of including these data.

      To address this, we have repeated the experiments in CCD841 cells to ensure reproducibility. We now report the results from three independent experiments testing the combination of AZD2858 and FOLFIRI on healthy epithelial colon cells. These results are shown in Supplementary Figure S7, where blue matrices represent cell viability and black matrices reflect the level of synergy between AZD2858 and FOLFIRI.

      Our results confirm that, individually, each drug has little to no effect on healthy cells, and no consistent synergistic interaction was observed, except in Experiment 1, which could not be reproduced. Importantly, the drug concentrations used were identical to those applied in the cancer cell experiments, allowing for direct comparison between normal and malignant cell responses.

      Reviewer #2:

      Comments on latest version: 

      Morano et al. have revised their manuscript in response to the points raised by reviewer #3 as follows.

      (1) Fig. 2E: Correcting the previously erroneous labelling of this Fig. makes it match the textual description. 

      (2) Figs 3A and B: The revised textual description of the flow cytometry BrdU incorporation is now precise. 

      (3) Fig. 3E: Removing the suspect WB images is a pragmatic decision that does not significantly affect the overall conclusions of the paper. 

      (4) Fig. 3D: Despite its puzzling appearance this data is now described accurately in the text as "DSBs remained elevated after the combined treatment" rather than "increased after the combined treatment". A more convincing increase in the presumed damaged DNA band is evident in Fig. 4D when AZD2858 is combined with a much lower concentration of SN38 (1.5nM) which could mean that the concentration used in Fig. 3D (300nM) induced maximal damage that could not be further enhanced. 

      We thank the reviewer for their thoughtful comments and constructive feedback, which have helped us improve the clarity and rigor of the manuscript.

      Reviewer #3:

      Comments on latest version: 

      The authors have addressed most of the concerns that I raised in the first round of revision and I have no further questions. I appreciate the authors' efforts in carrying out preliminary in vivo work, although, as the authors pointed out, the compound does not seem to be effective in vivo. Future work is desired to address this and to clarify the significance of the work. 

      We thank the reviewer for acknowledging our efforts in addressing the previous concerns. We also appreciate the recognition of our preliminary in vivo work. While these results suggest limited in vivo efficacy of the compound at this stage, we agree that additional studies will be necessary to fully evaluate its therapeutic relevance. We consider this an important next step and are committed to pursuing it in future work.

    1. Author response:

      General Statements

      In this paper we demonstrate that the lipid packing of the plasma membrane has a major impact on the stability of caveolae. Using interdisciplinary techniques, we show that the widely used dynamin inhibitor Dyngo-4a adsorbs to and inserts into lipid bilayers, leading to decreased lipid packing and hence reduced caveolae dynamics and internalization, even in cells lacking dynamin. We have added experiments that validate that Dyngo-4a treatment does not result in fragmentation or disassembly of the caveolae. A FRAP assay of cytosolic caveolae has been employed to address questions concerning scission. Moreover, as suggested by the reviewers, we have also included new simulation data that show and expand on the fact that Dyngo-4a positions itself in the lipid leaflet similarly to cholesterol and preferentially associates with cholesterol clusters, affecting the spatial distribution of cholesterol in the membrane. We believe that these added data have greatly improved the paper and strengthened our conclusion that lipid packing is a critical determinant in the balance between internalization and stable plasma membrane association of membrane vesicles.

      As requested, we have expanded the introduction to provide more detailed information about previous findings in the field. Changes and additions to the text have been highlighted in red for easier tracking.

      Point-by-point description of the revisions

      Reviewer #1 (Evidence, reproducibility and clarity):

      The authors use Dyngo-4a, a known dynamin inhibitor, to test its influence on caveolar assembly and surface mobility. They investigate whether it incorporates into membranes with Quartz-Crystal Microbalance, and they investigate how it is organized in membranes using simulations. Finally, they use lipid-packing sensitive dyes to investigate lipid packing in the presence of Dyngo-4a, membrane stiffness using AFM and membrane undulation using fluorescence microscopy. They also use a measure they call "caveola duration time" to claim that something happens to caveolae after Dyngo-4a addition and using this parameter, they do indeed see an increase in it in response to Dyngo-4a, which is reduced back to the baseline after addition of cholesterol.

      Overall, the authors claim: 1) Dyngo-4a inserts into the membrane and this 2) results in "a dramatic dynamin-independent inhibition of caveola scission". 3) Dyngo-4a was inserted and positioned at the level of cholesterol in the bilayer and 4) Dyngo-4a-treatment resulted in decreased lipid packing in the outer leaflet of the plasma membrane 5) but Dyngo-4a did not affect caveola morphology, caveolae-associated proteins, or the overall membrane stiffness 6) acute addition of cholesterol counteracts the block in caveola scission caused by Dyngo-4a.

      Overall, in this reviewer's opinion, claims 1, 3, 4, 5 are well-supported by the presented data from electron and live cell microscopy, QCM-D and AFM.

      However, there is no convincing assay for caveolar endocytosis presented besides the "caveola duration" which although unclearly described seems to be the time it takes in imaging until a caveolae is not picked up by the tracking software anymore in TIRF microscopy.

      Since the main claim of the paper is a mechanism of caveolar endocytosis being blocked by Dyngo-4a, a true caveolar internalization assay is required to make this claim. This means either the intracellular detection of caveolar cargo that is not surface connected or the quantification of caveolar movement from TIRF into epifluorescence detection in the fluorescence microscope. Otherwise, the authors could remove the claim and just claim that caveolar mobility is influenced.

      We thank the reviewer for the constructive comments, and we very much appreciate the positive critique. We have now included a FRAP experiment of endocytic Cav1-GFP supporting the effect on internalization. In addition, we are currently performing CTxB HRP experiments to quantify the number of caveolae at the PM using EM. For reasons beyond our control we have not managed to finish these in time; they will be included in the manuscript once they are ready, hopefully before long.

      Reviewer #1 (Significance):

      A number of small molecule inhibitors of the GTPase dynamin exist and are commonly used tools in the investigation of endocytosis. This goes so far that the use of some of these inhibitors alone is considered in some publications as sufficient to declare a process to be dynamin-dependent. However, this is not correct, as there are considerable off-target effects, including the inhibition of caveolar internalization by a dynamin-independent mechanism. This is important, as for example the influence of dynamin small molecule inhibitors on chemotherapy resistance is currently investigated (see for example Tremblay et al., Nature Communications, 2020).

      The investigation of the true effect of small molecules discovered as and used as specific inhibitors and their off-target effects is extremely important and this reviewer applauds the effort. It is important that inhibitors are not used alone, but that other means of targeting a mechanism are exploited as well in functional studies. The audience here is thus, besides membrane biophysicists interested in the immediate effect of the small molecule Dyngo-4a, also cell biologists and everyone using dynamin inhibitors to investigate cellular function.

      Reviewer #2 (Evidence, reproducibility and clarity):

      This manuscript uses the small molecule dynamin inhibitors dynasore and dyngo to show, in dynamin triple knockout cells, that these inhibitors impact lipid packing and organization in the plasma membrane. Data showing that dyngo affects caveolin dynamics using TIRF microscopy are also shown and are interpreted to reflect inhibition of caveolae scission from the membrane.

      This data showing that dyngo and dynasore target membrane order is quite compelling and argues that the effects of these inhibitors are not dynamin specific and that inhibition of endocytosis by these small molecule inhibitors is dynamin-independent. The in vitro and in vivo data they provide is convincing.

      Similarly, the data showing that dynasore and dyngo affect caveolin dynamics and clathrin endocytosis (transferrin) is quite convincing and argues that altered lipid packing is impacting membrane dynamics at the plasma membrane.

      What is less convincing is the conclusion that dyngo is preventing caveolae scission from the membrane. Study of caveolae endocytosis is based on a TIRF assay that has inherent limitations:

      - Caveolae are defined as bright cav1-positive spots in diffraction limited TIRF and their disappearance presumed to be endocytic events. Cav1 spots are presumed to be caveolae but the authors do not consider that they may be flat non-caveolar oligomers. The diffraction limited TIRF approach interprets the large structures as caveolae but evidence to that effect is lacking.

      This is a valid comment and to address this we have now included data showing colocalization of cavin1 and EHD2 to the Cav1-GFP spots. We can however not determine if they are flat or invaginated. We do have extensive experience imaging caveolae using TIRF microscopy and carefully chose cells that display low expression of fluorescently labelled caveolin to avoid non-caveolar structures.

      - The analysis (and the diagram presented in figure 4) considers that caveolae can either diffuse laterally in the membrane or internalize and does not consider that caveolae can flatten and possibly fragment in the membrane. Is it not possible that loss of Cav1 spots is a fragmentation event and not necessarily a scission event?

      This is a good question. However, fragmentation and disassembly would result in shorter track durations, and this is not what is observed in the data. We have now also included data showing that cavin1 is persistently associated with the Cav1 spots identified as caveolae during Dyngo-4a treatment, indicating that these are caveolae. Furthermore, IF stainings showing colocalization of Cav1-GFP with cavin1 or EHD2 after Dyngo-4a treatment have also been added. We have now also expanded on the different interpretations of the data in the results section.

      - The analysis is based on overexpression of Cav1-GFP that may alter the stoichiometry between Cav1 and cavin1 such that while caveolae may be expressed, larger non-caveolar structures may accumulate.

      Yes, this is correct. We have specifically imaged cells expressing low levels of Cav1-GFP to avoid the accumulated non-caveolar structures that can be spotted in cells with high expression.

      - Cav1 has been shown to be internalized via the CLIC pathway (Chaudary et al, 2014) and if dyngo is impacting clathrin then maybe it is also impacting CLIC endocytosis and thereby Cav1 endocytosis via this pathway?

      Dyngo-4a has been shown to not affect CLIC endocytosis (McCluskey et al., 2013) and in our data we do not see internalization following Dyngo-4a treatment.

      - The longer Cav1 TIRF track time and shorter displacement with dyngo is consistent with inhibition of caveolae scission. However, as the authors discuss, could not reduced membrane undulations due to dyngo's impact on membrane order be responsible for the longer tracks? Alternatively, perhaps the altered lipid packing is corralling Cav1 movement and reducing non-caveolar Cav1 endocytosis, resulting in shorter tracks of longer duration? The proposed interaction of dyngo with cholesterol could prevent scission but also stabilize large (flat?) Cav1 oligomers in the membrane, perhaps reducing Cav1 oligomer fragmentation.

      We completely agree that membrane undulations contribute to instability of the TIRF field and therefore to disruption of Cav1-GFP tracks, as we discuss in the results section and as has been described in previous work (Larsson et al., 2023). Yet, we have also shown that internalization of caveolae results in shorter tracks (Hubert et al., 2020; Larsson et al., 2023; Mohan et al., 2015). Furthermore, the tracked Cav1-GFP spots are persistently positive for cavin1 both with and without Dyngo-4a treatment, showing that the majority do not disassemble or become internalized by other pathways. Additionally, the added IF stainings after 30 min Dyngo-4a treatment also show that the Cav1-GFP spots remain positive for cavin1 and EHD2, just as in ctrl-treated cells.

      My point here is not to discredit the data but only to suggest that the TIRF approach used is an indirect measure of caveolae scission from the membrane that requires substantiation using other approaches.

      We appreciate these comments and have tried to address these by adding new data and discussions on the interpretation of the tracking data in the results section.

      Dyngo is certainly generally affecting lipid packing via cholesterol and thereby affecting Cav1 dynamics in the plasma membrane. The claim of caveolae scission should be qualified and alternative possibilities considered and discussed. If the authors persist in arguing that dyngo is affecting caveolae scission then the effect should be substantiated by accumulation of caveolae by quantitative EM and high spatial and temporal resolution imaging of Cav1 and cavin1 to define the endocytic events. As the latter represents a new, and potentially very challenging, line of experimentation, I would suggest that it is beyond the scope of the current study. As indicated above the additional experiments are not necessary and qualification of the claims would be sufficient.

      We have now included a FRAP experiment of endocytic Cav1-GFP supporting the effect on internalization. We are also currently performing CTxB HRP experiments to quantify the number of caveolae at the PM using EM. For reasons beyond our control we have not managed to finish these in time; they will be included in the manuscript once they are ready, hopefully before long.

      Other points

      Figure 1C - Cav1 positive spots cannot be interpreted to be caveolae from diffraction limited confocal images. Same comment applies to Fig 4G - caveola? duration.

      We completely agree with this and that the claims should be qualified. We have added IF stainings showing that the Cav1-GFP structures are also positive for cavin1. We have now clarified that we cannot distinguish between flat or different curved states of caveolae using this methodology. We have also changed the labelling of Fig. 4G.

      Figure 4C - it is not clear why this EM data is not quantified - for both the number of caveolae and clathrin coated pits - as this would help clarify the interpretation of the effect reported.

      We are currently performing CTxB HRP experiments to quantify the number of caveolae using EM. For reasons beyond our control we have not managed to finish these in time; they will be included in the manuscript once they are ready, hopefully before long.

      Figure 4D - the AFM experiments should perhaps be repeated as the non-significant effect of dyngo on the Young's modulus may be a result of insufficient n values.

      We would like to clarify that, to ensure the robustness of our AFM measurements, we performed the experiments with sufficient biological and technical replicates. Specifically, each data point shown in Figure 4D represents a Young’s modulus value averaged from approximately sixty force-distance curves per cell. For each condition, we collected force-distance maps on eight to nine individual cells, obtained from two separate petri dishes per day. We repeated this process on two independent days. In total, we analysed thirty-one cells for the DMSO control and thirty-three cells for the Dyngo-4a treatment. We performed “Student’s t-test with Welch’s correction” to assess the statistical significance of the difference between the two conditions, as described in the main text. We believe that the sample size and statistical approach are sufficient to support the conclusions presented. Furthermore, we also analysed cell stiffness by calculating the slope of the linear portion of the force-distance curves. This analysis also did not reveal any statistically significant differences between the conditions (data not shown), further supporting our conclusion that Dyngo-4a treatment does not significantly alter the Young’s modulus under our experimental conditions.
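
As a minimal sketch of the Welch-corrected t-test named above, the statistic and the Welch-Satterthwaite degrees of freedom can be computed as follows. The function name `welch_t` and the example values are hypothetical and are not data from the study; a p-value would additionally require the t distribution (for instance `scipy.stats.ttest_ind` with `equal_var=False`).

```python
import math
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Return (t statistic, Welch-Satterthwaite degrees of freedom)."""
    n_a, n_b = len(sample_a), len(sample_b)
    # Squared standard error of each group mean (unbiased sample variance / n).
    se2_a = variance(sample_a) / n_a
    se2_b = variance(sample_b) / n_b
    t_stat = (mean(sample_a) - mean(sample_b)) / math.sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation; does not assume equal variances.
    dof = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1)
    )
    return t_stat, dof


# Hypothetical per-cell Young's modulus values (kPa), for illustration only.
dmso_cells = [2.1, 2.4, 1.9, 2.6, 2.2]
dyngo_cells = [2.3, 2.0, 2.5, 2.2, 2.4]
t_stat, dof = welch_t(dmso_cells, dyngo_cells)
```

Because the degrees of freedom are estimated from the two sample variances, the test remains valid when the groups differ in size and spread, as with the thirty-one and thirty-three cells described above.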

      Reviewer #2 (Significance):

      This data showing that dyngo and dynasore target membrane order is quite compelling and argues that the effects of these inhibitors are not dynamin specific and that inhibition of endocytosis by these small molecule inhibitors is dynamin-independent. The in vitro and in vivo data they provide is convincing.

      Similarly, the data showing that dynasore and dyngo affect caveolin dynamics and clathrin endocytosis (transferrin) is quite convincing and argues that altered lipid packing is impacting membrane dynamics at the plasma membrane.

      What is less convincing is the conclusion that dyngo is preventing caveolae scission from the membrane.

      Reviewer #3 (Evidence, reproducibility and clarity):

      Larsson et al present experimental and computational data on the role of Dyngo4a (a compound that was developed to inhibit dynamin) on the dynamics of caveolae. The manuscript mostly documents effects of Dyngo on caveolae, with one experiment to suggest a mechanism for this result. This one rather unconvincing result forms the focus of the manuscript contributing to a disconnect between the data and the presentation. Additionally, there are concerns with data interpretation. The writing could also benefit from revision to address grammar mistakes, strengthen referencing, and increase precision. Overall, the manuscript requires substantial revisions before being considered for publication. The central claim, in particular, needs stronger evidence to support the proposed mechanism.

      We thank the reviewer for the thorough review and for experimental suggestions that we believe have strengthened our data further.

      Significant issues (in approximate order of importance):

      (1) The data supporting the central mechanistic explanation appears limited. There is no evidence that Dyngo remains in one leaflet

      The simulations show that the energy barrier for moving between the bilayer leaflets is very high. Furthermore, simulations of C-Laurdan have shown that it does not readily flip between membrane leaflets (Barucha-Kraszewska et al., 2013), supporting that it reports on the outer lipid leaflet when added to cells. We have, however, now changed this and state that Dyngo-4a decreased the lipid order in the plasma membrane.

      - the GP of the PM is very low compared to previous measurements,

      The absolute GP values will vary between setups depending on which filters are used, so they are not comparable between laboratories. What is of importance is that we found a significant difference in the relative GP values between cells treated with Dyngo-4a and control cells. It is this change that we report. We have not previously performed any GP measurements on this cell type, so it is unclear which previous measurements Reviewer #3 is referring to.
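
To illustrate why absolute GP values depend on the setup while differences between conditions remain informative, here is a minimal sketch of the generalized polarization calculation in its common two-channel form. The function name, the `g_factor` calibration parameter and all intensities are assumptions for illustration, not values or code from the study.

```python
def generalized_polarization(i_ordered, i_disordered, g_factor=1.0):
    """GP = (I_o - G * I_d) / (I_o + G * I_d), bounded between -1 and +1.

    i_ordered    -- intensity in the short-wavelength (ordered) channel
    i_disordered -- intensity in the long-wavelength (disordered) channel
    g_factor     -- setup-specific calibration factor (assumed 1.0 here)
    """
    denominator = i_ordered + g_factor * i_disordered
    if denominator == 0:
        raise ValueError("both channels are empty")
    return (i_ordered - g_factor * i_disordered) / denominator


# Hypothetical mean plasma-membrane intensities (arbitrary units).
gp_control = generalized_polarization(1200.0, 800.0)
gp_treated = generalized_polarization(1100.0, 900.0)
# A negative difference corresponds to decreased lipid order after treatment.
delta_gp = gp_treated - gp_control
```

Changing the filter bands or the calibration factor shifts both values, which is why only the comparison between conditions acquired on the same setup is interpreted.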

      - effects on other membranes are not explored,

      The order of the intracellular membranes is, as expected, lower than that of the plasma membrane. Differentiating intracellular membranes of interest, such as endocytotic vesicles, from other intracellular membranes would be very difficult. More importantly, our study is focused on what is happening in the plasma membrane, where caveolae reside, and effects on other membranes would be of minor interest for plasma membrane dynamics.

      - dynamin-directed effects of Dyngo are not considered,

      In the discussion section we discuss the difficulties with disentangling dynamin-direct and indirect effects.

      (2) The QCM-D measurements and claims require explanation as several aspects remain unclear. In Fig S2, the 'softness' (what does this mean?) changes by 4-fold with DMSO alone (what does this mean?), then fractionally more with Dyngo. Then fractionally more again when Dyngo is removed (why?). Then it remains somewhat higher when both Dyngo and DMSO are removed, which is somehow interpreted as Dyngo remaining in the bilayer, but not DMSO.

      We understand the confusion of the reviewer and hope our explanations provide clarity. QCM-D measurements are based on an oscillating quartz crystal sensor. Specifically, what is measured are alterations in the oscillation frequency (ΔF) and in the rate of energy dissipation from the sensor surface (ΔD). This allows the measurement of: 1) materials adsorbing to the sensor surface, 2) changes in the viscoelastic properties of a solution in contact with the sensor surface, and 3) changes in the material adsorbed to the sensor surface upon exposure to different solutions. The ratio ΔD/-ΔF reports the mechanical softness or rigidity of an adsorbed material, in this case the SLB.

      A “buffer shift” is the term used when there is no adsorption to the sensor surface, but rather an effect from altering the solution above the sensor surface. One reason is that different solutions can have different densities (e.g., a DMSO-buffer mixture vs buffer alone), which impacts the oscillations of the sensor. It was observed that the DMSO-buffer mixture alone gave a large buffer shift in comparison to the adsorption of Dyngo-4a into the SLB, thereby muddling the data interpretation. Thus, in Fig. S2 the system was first equilibrated with the DMSO-buffer mixture prior to addition of the Dyngo-4a solution to allow for clearer visualization of the two events. In QCM-D, to assess whether something has made a permanent change to the system, one changes back to the solutions used before the addition; thus, we first washed with a DMSO-buffer mixture followed by buffer alone. Control experiments were carried out in which no Dyngo-4a was added (also shown in Fig. S2). The control shows that the same “buffer shift” from the DMSO-buffer mixture occurs in both systems and that upon returning to a buffer-only condition there is no permanent change to the system caused by exposure to DMSO. In contrast, once the system that received Dyngo-4a is changed back to a buffer-only system, we see that mass has been added to the system (ΔF) with little change to the dissipation (ΔD), thereby resulting in a lower ratio of ΔD/-ΔF, which is to say that the SLB after the adsorption of Dyngo-4a was more rigid than the SLB without Dyngo-4a.
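
The ΔD/-ΔF reasoning above can be sketched with a few lines of arithmetic. The function name and the shift values below are hypothetical and chosen only to show the direction of the effect; they are not readings from Fig. S2.

```python
def softness_ratio(delta_d, delta_f):
    """Return ΔD / (-ΔF) for an adsorbed layer.

    delta_d -- dissipation shift relative to the bare sensor (units of 1e-6)
    delta_f -- frequency shift in Hz (negative when mass is added)
    """
    if delta_f == 0:
        raise ValueError("no frequency shift, ratio is undefined")
    return delta_d / (-delta_f)


# Hypothetical buffer-only readings before and after exposure to the compound.
slb_before = softness_ratio(delta_d=0.50, delta_f=-25.0)
# More mass (larger negative ΔF) with almost unchanged dissipation.
slb_after = softness_ratio(delta_d=0.52, delta_f=-29.0)
# A lower ratio is read as a more rigid adsorbed layer.
became_more_rigid = slb_after < slb_before
```

Comparing both readings in buffer alone is what removes the density-driven buffer shift from the comparison.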

      These interpretations are difficult to grasp, as the authors seem to be implying simple amphiphilic partitioning into the membrane, which should all be removable by efficient washing.

      Amphiphilic partitioning is not fully reversible by “efficient washing”; it depends on partitioning coefficients.

      I do not doubt that this compound interacts with membranes, but the quantifications appear ambiguous. A bilayer with 16 mol% (or worse, 30% if all in one leaflet) Dyngo is very unlikely (to remain a bilayer). Even if such a bilayer was conceivable, the authors are claiming an ADDITION of Dyngo that would INCREASE the area of one leaflet by 30%, which needs explanation as it appears unlikely.

      We understand that our attempt to provide numbers in the results section for the amount of binding observed in QCM-D can easily be interpreted as the amount observed to insert into the PM. However, as discussed in the discussion, we also see aggregates of Dyngo-4a that associate with the membrane in the simulations, which likely contribute to the binding observed in QCM-D prior to washing. The precise amount of membrane-inserted Dyngo-4a is difficult to measure, as we discuss in the text. To make this clearer, we have now moved all these details to the discussion section, where we elaborate on this. Furthermore, since Dyngo-4a, like cholesterol, intercalates between the head groups of the lipids, the area would not increase in direct proportion to the mol%.

      Also, there are no replicates shown, so unclear how reproducible these effects are?

      For clarity, only single experiments are shown. However, multiple experiments were performed and the range in measured values for 3 technical repeats can be observed in the standard deviations found in the main text (e.g., 6 ± 2 mol%).

      (3) The simulations are insufficiently described and difficult to interpret. How big are these systems? Why do the figures show the aqueous system with lateral boundaries?

      There are no explicit boundaries used in the simulations, periodic boundary conditions are applied in all three dimensions. The lateral boundaries observed in the figures correspond to the simulation box edges and are a visual artifact of 2D projections with QuickSurf representation. No artificial wall or constraints were introduced laterally. Additional technical details, including the system size and periodic boundary conditions have now been added to the methods section.

      It seems quite important that multiple Dyngo molecules aggregate rather than partition into membranes - is this likely to occur in experiment?

      Yes, this is important, and with the additional simulation experiments suggested by Reviewer #3 it has been clarified that the aggregates contribute a great deal to the change in lipid packing of lipid bilayers containing cholesterol. However, it is hard to test aggregation in the cellular system, but we believe that this happens and contributes to the effect on membranes. We have now emphasized the effect of the aggregates in the text.

      PMF simulations are strongly suggesting that Dyngo does not spontaneously cross membranes, which is inconsistent with its drug-like amphiphilicity (cLogP~2.5 is optimally suited for membrane permeation) and known effects on intracellular proteins. This suggests an artefact in these PMFs.

      As stated in the submitted version of the manuscript, logP was used to validate the topology and the observed value was in very good agreement with cLogP. Moreover, this validation complemented the standard procedure of CHARMM-GUI ligand modelling, which provided a reasonable penalty score (around 20) for the Dyngo-4a topology. POPC and cholesterol molecules are standard in the force field and validated by numerous studies. The parameters used for the membrane simulations, and for AWH in particular, are very common for this type of study. Thus, we do not see what may cause any artifacts in the free energy profile construction. In fact, the amphiphilicity of the molecule may be one of the key reasons that the Dyngo-4a molecule remains at the aqueous interface of the membrane and does not cross the membrane spontaneously. Also, we believe that the energy barrier of 40-60 kJ/mol is not prohibitively high and Dyngo-4a molecules may still overcome the barrier eventually, though we expect the majority to reside in the upper leaflet.
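
To put the quoted 40-60 kJ/mol barrier on a common scale, the sketch below computes the Boltzmann factor exp(-ΔG/RT). The temperature of 310 K is an assumption (the simulation temperature is not stated here), and the factor is only the relative statistical weight of the barrier top, not a crossing rate, which would also depend on a prefactor.

```python
import math

R_KJ_PER_MOL_K = 8.314462618e-3  # molar gas constant in kJ mol^-1 K^-1


def boltzmann_factor(barrier_kj_per_mol, temperature_k=310.0):
    """Relative weight exp(-ΔG / RT) of a state ΔG above the minimum."""
    return math.exp(-barrier_kj_per_mol / (R_KJ_PER_MOL_K * temperature_k))


# The two ends of the barrier range quoted in the response.
weight_low_barrier = boltzmann_factor(40.0)
weight_high_barrier = boltzmann_factor(60.0)
```

The exponential dependence means the 20 kJ/mol spread in the quoted range changes this weight by several orders of magnitude, so the conclusion about crossing is sensitive to where in the range the true barrier lies.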

      The authors should experimentally measure the permeation of Dyngo through bilayers (or lack thereof), to more robustly support their finding that Dyngo does not cross membranes spontaneously.

      We thank the reviewer for the suggestion; however, this is technically very challenging and would require the establishment of precise systems, which is beyond the scope of this manuscript.

      (4) Why not measure effect of Dyngo on lipid packing directly and more broadly in model membranes?

      With the added modelling experiments supporting the previous simulations and the calculated GP values from the C-Laurdan experiments on cellular plasma membrane, we do not find it necessary to include more model membrane experiments than the existing ones on lipid monolayers and supported lipid bilayers.

      (5) Statistics should not be done on individual cells (n>26), but rather on independent experiment (N=3?)

      We have performed the statistics on live cell particle tracking according to previous literature on similar systems (Boucrot et al., 2011; Larsson et al., 2023; Shvets et al., 2015; Stoeber et al., 2012).

      (6) Fig 1G is important but rather unclear. Firstly, these kymographs are an odd way to show that the caveolae are not moving. More importantly, caveolae in normal cells have been shown to be quite stable and immobile (eg doi: 10.1074/jbc.M117.791400), yet here they are claimed to be very mobile.

      Although this might be an unconventional way to depict dynamic processes, we believe that it is a very illustrative way to show track stability over time in bulk, rather than just a kymograph over a few structures in a cell. Furthermore, we are not claiming that caveolae are very mobile but rather the opposite: that they are very stable, in agreement with previous work (Boucrot et al., 2011; Larsson et al., 2023; Mohan et al., 2015). We have now edited the text to make this even clearer.

      Also, if Dyngo prevents caveolae scission, there should be more of them at the membrane - why no quantification like Fig 1C to show accumulation of caveolae upon Dyngo treatment? Or directly counting caveolae via EM, as in Fig 4C?

      We are currently performing CTxB HRP experiments using EM. For reasons beyond our control we have not managed to finish these in time; they will be included in the manuscript once they are ready, hopefully before long. However, Dynasore has previously been shown, by EM, to increase the number of caveolae at the PM (Moren et al., 2012; Sinha et al., 2011).

      (7) The writing can be made more precise and referencing could be strengthened.

      The introduction was written in a short format, and we have now extended this and made it more precise.

      Some examples:

      (a) 'scissoned' is not a word in English,

      Thanks, we have now changed this.

      (b) what is meant by "Cav1 assembly is driven by high chol content"? There are many types of caveolin assemblies.

      We agree that this can be made more precise and have now clarified this in the introduction.

      (c) "This generates a unique membrane domain with distinct lipid packing and a very high curvature." Unclear what 'this' refers to and there is no reference here, so what is the evidence for either of these claims? Caveolin-8S oligomers are not curved. Perhaps 'this' is caveolae, but they are relatively large and also not very highly curved and I am unaware of measurements of lipid packing therein.

      Caveolae are around 50 nm, which in biology is a very high membrane curvature. It has been extensively shown that caveolae have a distinct lipid composition highly enriched in cholesterol and sphingolipids, which will thereby also generate a unique lipid packing compared to the surrounding membrane. Yet, the reviewer is correct that lipid packing has not been measured in a caveola, owing to obvious technical challenges. Thus, we have now changed the text to “special lipid composition”.

      The sentence following that one again makes a specific, but unreferenced, claim.

      (d) intro claims that lipid packing is critical for fission, but it is unclear quite what is meant by this claim. The references do not help, as they are often about the basic biophysics of lipids, rather than how packing affects fission.

      We have now edited the text.  

      (e) intro strongly implies that caveolae remain membrane attached because of stalled scission. How strong is the evidence for this? The fact that EHD2 is at the neck is not definitive,

      We used the term stalled scission to describe that omega-shaped membrane invaginations do not all undergo scission in the same automatic way as clathrin-coated vesicles. We have now changed this in the text. Caveolae are shown to be released (undergo scission) and be detected as internal caveolae if the protein EHD2 is removed. Hence this must be interpreted as EHD2 stalling scission. The evidence includes data compiled over the last 12 years by others and us, including for example: 1) Caveolae with EHD2 have a longer duration time (Larsson et al., 2023; Mohan et al., 2015; Moren et al., 2012; Stoeber et al., 2012), and knockdown of EHD2 results in more internalized caveolae as measured by CTxB HRP using EM (Moren et al., 2012) and a shorter duration time at the PM (Hubert et al., 2020; Larsson et al., 2023; Mohan et al., 2015; Stoeber et al., 2012). 2) EHD2 overexpression results in fewer internalized caveolae as measured by CTxB HRP using EM (Stoeber et al., 2012). 3) Overexpression or acute addition of purified EHD2 via microinjection counteracts lipid-induced scission of caveolae and hence results in caveolae stabilization at the PM (Hubert et al., 2020). It is very hard to see how the release and internalization of caveolae could result from anything other than scission. EHD2 has been found around the rim of caveolae (Matthaeus et al., 2022) and overexpression of EHD2 oligomerizing mutants has been shown to expand the caveola neck (Hoernke et al., 2017; Larsson et al., 2023).

      (f) unclear what is meant by 'lipid packing frustration' and how Dyngo supposedly induces it.

      Lipid packing frustration refers to what is usually called a lipid packing defect. However, since lipid membranes are described as a fluid system, they should not have defects, and we therefore believe that lipid packing frustration is more accurate. Nevertheless, we have now changed the text and use “decreased lipid packing” or “decreased lipid order” throughout to describe the effect on the plasma membrane.

      (8) IF of Cav1 is insufficient to claim puncta as caveolae. Co-stained puncta of caveolin with cavin are much stronger evidence. Same issue for Cav1-GFP puncta.

      We agree and have now provided IF showing cavin1 and EHD2 colocalization with Cav1-GFP in untreated and Dyngo-4a-treated cells.

      (9) Fig 3E claims that "preferred position of Dyngo-4a was closer to the head groups" but the minimum looks to be in a similar place as Fig 3B without cholesterol.

      We appreciate the reviewer’s observation. The PMF minima in the POPC and POPC:Chol membranes are indeed close in absolute position (~1.1–1.2 nm from the bilayer center). However, as clarified in the revised text, the presence of cholesterol leads to a slight shift of Dyngo-4a closer to the headgroup region and broadens the positional distribution. This is also evident from the added density profiles (Fig. S3A) and is now described more precisely in the manuscript.

      Critically, these results do not support the notion that Dyngo affects lipid packing sufficiently, which is not measured in the simulations (though could be).

      We thank the reviewer for the excellent suggestion. In response, we have now included a detailed analysis of Dyngo-4a’s effect on lipid packing in the simulations. As described in the revised manuscript, we measured deuterium order parameters, area per lipid (APL), and lipid–Dyngo–cholesterol spatial distributions (Figs. 3-H, S3C-E). The results demonstrate that Dyngo-4a decreases lipid order in POPC:Chol membranes. Both single molecules and clusters reduce the order parameter by up to 0.04 units, particularly in the upper leaflet, where Dyngo-4a resides. The reduction is most pronounced in the midchain region of the sn1 tail and around the double bond of the sn2 tail. These effects were accompanied by increased APL in POPC:Chol membranes and by colocalization of Dyngo-4a near cholesterol-rich regions. Together, these data confirm that Dyngo-4a perturbs membrane organization and lipid packing in a composition-dependent manner. We believe these additions directly address the concern and demonstrate that the simulations indeed support the conclusion that Dyngo-4a modulates lipid packing.
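
For readers unfamiliar with the two packing metrics named above, the following is a minimal sketch of their textbook definitions: the order parameter as the average of (3cos²θ - 1)/2 over C-H bond vectors relative to the bilayer normal, and the area per lipid as the lateral box area divided by the number of lipids in one leaflet. The function names and inputs are illustrative assumptions and do not reproduce the authors' analysis routines, whose treatment of cholesterol in the APL is not specified here.

```python
import math


def order_parameter(bond_vectors, normal=(0.0, 0.0, 1.0)):
    """Average of (3 cos^2(theta) - 1) / 2 over C-H bond vectors.

    theta is the angle between each bond vector and the bilayer normal.
    """
    normal_length = math.sqrt(sum(c * c for c in normal))
    total = 0.0
    for vector in bond_vectors:
        length = math.sqrt(sum(c * c for c in vector))
        dot = sum(a * b for a, b in zip(vector, normal))
        cos_theta = dot / (length * normal_length)
        total += 0.5 * (3.0 * cos_theta ** 2 - 1.0)
    return total / len(bond_vectors)


def area_per_lipid(box_x_nm, box_y_nm, lipids_per_leaflet):
    """Lateral box area divided by the number of lipids in one leaflet."""
    return box_x_nm * box_y_nm / lipids_per_leaflet
```

In practice the order parameter is averaged over all lipids and frames for each carbon position, and its magnitude is reported; a drop of a few hundredths, as described above, indicates looser chain packing.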

      Finally, the simulation data do not show "that Dyngo-4a is competing with cholesterol"; it is unclear what 'competition' means in this context, but regardless, the data only shows that Dyngo sits at a similar location as cholesterol.

      We agree with the reviewer that “competition” was an imprecise term. We have rephrased the relevant sections to clarify that Dyngo-4a and cholesterol localize to overlapping regions and exhibit spatial coordination. As now stated in the manuscript, cholesterol appears to partially displace Dyngo-4a from its preferred depth seen in pure POPC, broadens its membrane distribution, and alters lipid packing. According to the order parameters there is an interplay between chol and Dyngo-4a and the heatmaps show that the distribution of chol in the membrane gets less uniform in the presence of Dyngo-4a. These interactions suggest that Dyngo-4a perturbs cholesterol-rich domains.

      Because new analysis routines were added to the study, we have also added their details to the Methods section.

      (10) AFM measures the stiffness of the cell (as correctly explained in Results section) not "overall stiffness of the PM" as stated in the Discussion.

      We thank the reviewer for pointing this out. We have now corrected this in the Discussion section.

      (11) Fig2A: what was the starting lipid surface pressure? How does Dyngo insertion depend on initial lipid packing?

      The starting lipid surface pressure was 20 mN m<sup>-1</sup>, which we have now incorporated in the figure legend. We performed several such experiments with starting pressures ranging from 20 to 23 mN m<sup>-1</sup>, which gave consistent results, as described in the Materials and Methods section. Given that we also performed QCMD analysis and simulations on bilayers, which showed that Dyngo-4a adsorbed and inserted, respectively, we have not performed a titration of starting pressures to determine an MIP for Dyngo-4a.

      (12) Fig 4B is a strange approach to measure membrane motion. Why not RMSD or some other displacement based method? As its shown, it implies that the area of the cell changes.

      The method that we used quantifies the area of the cell that is attached to (or close to) the glass and is thereby visible in TIRF microscopy. This area indeed changes over time, which has been frequently observed and used to describe and quantify mobility and the formation of lamellipodia and filopodia, among other things. We agree that RMSD can also be used to analyze the data before and after treatments, and we have now included RMSD analysis in the manuscript.
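
      As an illustration of the displacement-based measure the reviewer proposes, the sketch below computes a root-mean-square deviation between two sets of matched 2D points, for example points sampled along a cell outline in two frames. The response does not specify how RMSD was applied, so the point correspondence between frames and the function name are assumptions, not a description of the analysis in the manuscript.

```python
import math

def rmsd(points_a, points_b):
    """Root-mean-square deviation between two equally sized lists of (x, y) points.

    Assumes point i in points_a corresponds to point i in points_b.
    """
    if len(points_a) != len(points_b) or not points_a:
        raise ValueError("point lists must be non-empty and of equal length")
    total = 0.0
    for (xa, ya), (xb, yb) in zip(points_a, points_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2
    return math.sqrt(total / len(points_a))
```

      Unlike the attached-area measure, this value is zero only when every matched point is stationary, so it can register shape changes that leave the total area unchanged.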

      Reviewer #3 (Significance):

      The title, abstract, and introduction of the manuscript are largely framed around lipid packing, but most of the data investigate other unexpected effects of treating cells with Dyngo4a. The only measurement for lipid packing (or any other membrane properties) is Fig 4E-F. Therefore, this paper is effectively an investigation of an artefact of a common reagent, which itself could be a valuable contribution. However, the mechanism to explain its effect requires stronger evidence, and its broad biological significance needs further exploration.

      Overall, the impact of documenting the effects of Dyngo4a on membranes appears modest but may be valuable to the membrane trafficking community.

      Barucha-Kraszewska, J., S. Kraszewski, and C. Ramseyer. 2013. Will C-Laurdan dethrone Laurdan in fluorescent solvent relaxation techniques for lipid membrane studies? Langmuir. 29:1174-1182.

      Boucrot, E., M.T. Howes, T. Kirchhausen, and R.G. Parton. 2011. Redistribution of caveolae during mitosis. J Cell Sci. 124:1965-1972.

      Hoernke, M., J. Mohan, E. Larsson, J. Blomberg, D. Kahra, S. Westenhoff, C. Schwieger, and R. Lundmark. 2017. EHD2 restrains dynamics of caveolae by an ATP-dependent, membrane-bound, open conformation. Proc Natl Acad Sci U S A. 114:E4360-E4369.

      Hubert, M., E. Larsson, N.V.G. Vegesna, M. Ahnlund, A.I. Johansson, L.W. Moodie, and R. Lundmark. 2020. Lipid accumulation controls the balance between surface connection and scission of caveolae. Elife. 9.

      Larsson, E., B. Moren, K.A. McMahon, R.G. Parton, and R. Lundmark. 2023. Dynamin2 functions as an accessory protein to reduce the rate of caveola internalization. J Cell Biol. 222.

      Matthaeus, C., K.A. Sochacki, A.M. Dickey, D. Puchkov, V. Haucke, M. Lehmann, and J.W. Taraska. 2022. The molecular organization of differentially curved caveolae indicates bendable structural units at the plasma membrane. Nat Commun. 13:7234.

      McCluskey, A., J.A. Daniel, G. Hadzic, N. Chau, E.L. Clayton, A. Mariana, A. Whiting, N.N. Gorgani, J. Lloyd, A. Quan, L. Moshkanbaryans, S. Krishnan, S. Perera, M. Chircop, L. von Kleist, A.B. McGeachie, M.T. Howes, R.G. Parton, M. Campbell, J.A. Sakoff, X. Wang, J.Y. Sun, M.J. Robertson, F.M. Deane, T.H. Nguyen, F.A. Meunier, M.A. Cousin, and P.J. Robinson. 2013. Building a better dynasore: the dyngo compounds potently inhibit dynamin and endocytosis. Traffic. 14:1272-1289.

      Mohan, J., B. Moren, E. Larsson, M.R. Holst, and R. Lundmark. 2015. Cavin3 interacts with cavin1 and caveolin1 to increase surface dynamics of caveolae. J Cell Sci. 128:979-991.

      Moren, B., C. Shah, M.T. Howes, N.L. Schieber, H.T. McMahon, R.G. Parton, O. Daumke, and R. Lundmark. 2012. EHD2 regulates caveolar dynamics via ATP-driven targeting and oligomerization. Mol Biol Cell. 23:1316-1329.

      Shvets, E., V. Bitsikas, G. Howard, C.G. Hansen, and B.J. Nichols. 2015. Dynamic caveolae exclude bulk membrane proteins and are required for sorting of excess glycosphingolipids. Nat Commun. 6:6867.

      Sinha, B., D. Koster, R. Ruez, P. Gonnord, M. Bastiani, D. Abankwa, R.V. Stan, G. Butler-Browne, B. Vedie, L. Johannes, N. Morone, R.G. Parton, G. Raposo, P. Sens, C. Lamaze, and P. Nassoy. 2011. Cells respond to mechanical stress by rapid disassembly of caveolae. Cell. 144:402-413.

      Stoeber, M., I.K. Stoeck, C. Hanni, C.K. Bleck, G. Balistreri, and A. Helenius. 2012. Oligomers of the ATPase EHD2 confine caveolae to the plasma membrane through association with actin. EMBO J. 31:2350-2364.

    1. Author response:

      Reviewer #1 (Public review):

      We greatly appreciate Reviewer #1’s accurate public review of our study on the kinesin motor using the DNA origami nanospring (NS). With respect to the strengths, we fully agree with Reviewer #1’s comments. Regarding the weakness, we would like to respond as follows.

      It is true that, unlike optical tweezers, our method does not provide real-time data display. Optical tweezers enable real-time observation and manipulation of kinesin molecules at arbitrary time points. Achieving real-time observation and manipulation is indeed an important challenge for the future development of the NS technique. On the other hand, Iwaki et al. (our co-corresponding author) have already investigated dynamic properties of motor proteins under load, such as the step size and force–velocity relationship of myosin VI, using the NS. We are now preparing high spatiotemporal resolution microscopy experiments on the KIF1A system to measure its step size and force–velocity relationship, which inherently require such resolution.

      Reviewer #2 (Public review):

      We would like to thank Reviewer #2 for providing a highly accurate assessment of the strengths of our experiments. Regarding the weaknesses, we would like to respond as follows.

      First, Iwaki et al. (our co-corresponding author) have already succeeded in observing the stepping motion of myosin VI using the nanospring (NS) in their previous work. We are also currently preparing high spatiotemporal resolution microscopy experiments to observe the stepping motion of KIF1A in our system. Second, while it is true that the NS does not follow Hooke’s law, it is possible to design and construct NSs with an appropriate dynamic range by tuning the spring constant to match the forces exerted by protein molecules. Finally, we agree that our first observation of the stall plateau in KIF1A using the NS is a meaningful achievement. However, with respect to the suggestion that “increasing validity requires also studying kinesin-1,” we have a somewhat different perspective. The validity of the NS method has already been thoroughly examined in the previous work on myosin VI by Iwaki et al., where results were compared with those obtained using optical tweezers. Moreover, the focus of this manuscript is on KAND caused by KIF1A mutations. From this perspective, although we appreciate the suggestion, we consider it important to keep the present study focused on KIF1A and its implications for KAND.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):  

      (1) To distinguish autophagosomes from autolysosomes, the authors used vps16 RNAi cells, which are supposed to be fusion deficient. However, the extent to which fusion is actually inhibited by knockdown of Vps16A is not shown. The co-localization rate of Atg8 and Lamp1 should be shown (as in Figure 8). Then, after identifying pre-fusion autophagosomes and lysosomes, the localization of each should be analyzed.

      Thank you for this insightful comment. We analyzed the colocalization of 3xmCherry-Atg8a and GFP-Lamp1, which label autophagic structures and lysosomes, respectively, in Vps16A RNAi fat body cells. As expected, Vps16A silencing markedly reduced the overlap between these two signals, indicating a strong block in autophagosome–lysosome fusion. Moreover, both 3xmCherry-Atg8a and GFP-Lamp1 became more perinuclearly localized compared to the control (luciferase RNAi) cells.
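
      The response does not state which colocalization metric was used. As one common option, the sketch below computes a Pearson correlation coefficient between two lists of pixel intensities, one per channel, taken from the same pixels. The function name and the flat-list input are assumptions made for illustration.

```python
import math

def pearson_colocalization(channel_a, channel_b):
    """Pearson correlation between paired pixel intensities of two channels."""
    if len(channel_a) != len(channel_b) or len(channel_a) < 2:
        raise ValueError("channels must have equal length of at least 2")
    n = len(channel_a)
    mean_a = sum(channel_a) / n
    mean_b = sum(channel_b) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(channel_a, channel_b))
    var_a = sum((a - mean_a) ** 2 for a in channel_a)
    var_b = sum((b - mean_b) ** 2 for b in channel_b)
    if var_a == 0 or var_b == 0:
        raise ValueError("a channel with constant intensity has no defined correlation")
    return cov / math.sqrt(var_a * var_b)
```

      With such a metric, a strong fusion block would be expected to lower the coefficient relative to control cells, where many structures carry both labels.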

      It is also possible that autophagosomes and lysosomes are tethered by factors other than HOPS (even if they are not fused). If this is the case, autophagosomal trafficking would be affected by the movement of lysosomes.  

      Thank you for raising this possibility. While we cannot fully exclude that autophagosomes might be indirectly transported via tethering to lysosomes, we consider this unlikely. We believe that in Drosophila fat cells, autophagosomes and lysosomes rapidly fuse once in close proximity. Therefore, even if alternative tethering mechanisms exist, they are unlikely to permit prolonged joint trafficking without fusion.

      (2) The authors analyze autolysosomes in Figures 6 and 7. This is based on the assumption that autophagosome-lysosome fusion takes place in cells without vps16A RNAi. However, even in the presence of Vps16A, both pre-fusion autophagosomes and autolysosomes should exist. This is also true in Figure 8H, where the fusion of autophagosomes and lysosomes is partially suppressed in knockdown cells of dynein, dynactin, Rab7, and Epg5. If the effect of fusion is to be examined, it is reasonable to distinguish between autophagosomes and autolysosomes and analyze only autolysosomes.  

      Thank you for this careful observation. The 3xmCherry-Atg8a reporter is well suited to identify both autophagosomes and autolysosomes, as the mCherry fluorophore is resistant to degradation in the acidic environment of autolysosomes. Notably, mCherry-Atg8a–positive autolysosomes appear larger and brighter than pre-fusion autophagosomes, which are typically smaller and dimmer, especially under fusion-deficient conditions (e.g., Figure 4). Therefore, we use these morphological differences as a proxy to distinguish between the two.

      To improve structural assignment, we incorporated endogenous Lamp1 staining (Figure 10) and a Lamp1-GFP reporter (Figure 10—figure supplement 1). Vesicles positive for mCherry-Atg8a but negative for Lamp1 are considered pre-fusion autophagosomes. Structures double-positive for mCherry-Atg8a and Lamp1 represent autolysosomes, while Lamp1-positive, Atg8a-negative vesicles correspond to non-autophagic lysosomes. To clarify these interpretations, we revised the Results section and explained these reporters in more detail.
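
      The assignment rules described above amount to a small decision table, sketched here as a function. The function name, the boolean inputs, and the "unlabeled" fallback are hypothetical; how a vesicle is scored as positive for each marker (for example, by an intensity threshold) is not specified in the response and is left outside the sketch.

```python
def classify_vesicle(atg8a_positive, lamp1_positive):
    """Assign a vesicle class from the presence of the two markers."""
    if atg8a_positive and lamp1_positive:
        return "autolysosome"
    if atg8a_positive:
        return "pre-fusion autophagosome"
    if lamp1_positive:
        return "non-autophagic lysosome"
    return "unlabeled"  # neither marker detected; not a class named in the text
```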

      (3) In this study, only vps16a RNAi cells were used to inhibit autophagosome-lysosome fusion. However, since HOPS has many roles besides autophagosome-lysosome fusion, it would be better to confirm the conclusion by knockdown of other factors (e.g., Stx17 RNAi).  

      Thank you for this valuable suggestion. We initially considered using Syntaxin17 RNAi; however, our recent findings indicate that loss of Syx17 results in a HOPS-dependent tethering lock between autophagosomes and lysosomes (DOI: 10.1126/sciadv.adu9605). In this case, tethered vesicles would likely move together, confounding the interpretation of autophagosome-specific trafficking.

      Therefore, we turned to other SNAREs such as Vamp7 and Snap29. One Snap29 RNAi was located on the appropriate chromosome needed for our genetic experiments. We generated a transgenic fly line expressing both Snap29 RNAi and the mCherry-Atg8a reporter under a fat body-specific R4 promoter. When we tested our key trafficking hits in this background, we observed similar autophagosome localization phenotypes as in Vps16A RNAi cells. These results, now included in the revised manuscript (see Figure 6), confirm that the observed transport phenotypes are not specific to Vps16A or HOPS complex loss.

      (4) Figure 8: Rab7 and Epg5 are also known to be directly involved in autophagosome-lysosome tethering/fusion. Even if the fusion rate is reduced in the absence of Rab7 and Epg5, it may not be the result of defective autophagosome movement, but may simply indicate that these molecules are required for fusion itself. How do the authors distinguish between the two possibilities?

      Thank you for this important point. While Rab7 and Epg5 indeed participate in autophagosome–lysosome tethering and fusion, our data suggest they also contribute to autophagosome movement. This is evident from the distinct phenotypes observed upon Rab7 or Epg5 RNAi compared to Vps16A or SNARE RNAi. Depletion of Vps16A, Syx17, Vamp7, or Snap29 (factors involved specifically in fusion) results in perinuclear accumulation of autophagosomes. In contrast, Rab7 or Epg5 RNAi leads to a dispersed autophagosome pattern throughout the cytoplasm.

      These differences suggest that Rab7 and Epg5 play additional roles in positioning autophagosomes. Supporting this, our co-immunoprecipitation experiments show that Epg5 interacts with dynein motors. Therefore, we propose that Rab7 and Epg5 influence both autophagosome fusion and their microtubule-based transport.

      Reviewer #2 (Public review):  

      One limitation of the study is the genetic background that serves as the basis for the screening. In addition to preventing autophagosome-lysosome fusion, disruption of Vps16A has been shown to inhibit endosomal maturation and block the trafficking of components to the lysosome from both the endosome and Golgi apparatus. Additional effects previously reported by the authors include increased autophagosome production and reduced mTOR signaling. Thus Vps16A-depleted cells have a number of endosome, lysosome, and autophagosome-related defects, with unknown downstream consequences. Additionally, the cause and significance of the perinuclear localization of autophagosomes in this background is unclear. Thus, interpretations of the observed reversal of this phenotype are difficult, and have the caveat that they may apply only to this condition, rather than to normal autophagosomes. Additional experiments to observe autophagosome movement or positioning in a more normal environment would improve the manuscript.  

      Thank you for highlighting this limitation. We attempted time-lapse imaging of live fat body cells expressing 3xmCherry-Atg8a and GFP-Lamp1 to visualize the movement and fusion events of pre-fusion autophagosomes (3xmCherry-Atg8a positive and GFP-Lamp1 negative) and lysosomes (GFP-Lamp1 positive). Despite different experimental setups and durations of starvation, we observed no vesicle movement at all, so live imaging of larval Drosophila fat tissue will require time-consuming optimization of in vitro culture conditions. Consistent with this, we did not find any published data in which organelle motility in fat body cells was successfully observed. Nuclear positioning in fat body cells was investigated in detail in an excellent study; however, the authors were able to observe only very little movement of the nuclei by live imaging (Zheng et al. Nat Cell Biol. 2020 Mar;22(3):297-309. doi: 10.1038/s41556-020-0470-7), further highlighting the technical difficulties of live or time-lapse imaging in this tissue type.

      Specific comments  

      (1) Several genes have been described that when depleted lead to perinuclear accumulation of Atg8-labeled vesicles. There seems to be a correlation of this phenotype with genes required for autophagosome-lysosome fusion; however, some genes required for lysosomal fusion such as Rab2 and Arl8 apparently did not affect autophagosome positioning as reported here. Thus, it is unclear whether the perinuclear positioning of autophagosomes is truly a general response to disruption of autophagosome-lysosome fusion, or may reflect additional aspects of Vps16A/HOPS function. A few things here would help. One would be an analysis of Atg8a vesicle localization in response to the depletion of a larger set of fusion-related genes. Another would be to repeat some of the key findings of this study (effects of specific dynein, dynactin, rabs, effectors) on Atg8a localization when Syx17 is depleted, rather than Vps16A. This should generate a more autophagosome-specific fusion defect.  

      Thank you for this insightful suggestion. We recently discovered that Syx17 depletion induces a HOPS-dependent tethering lock between autophagosomes and lysosomes (DOI: 10.1126/sciadv.adu9605), making it unsuitable for modeling autophagosome-specific fusion defects. In contrast, Vamp7 and Snap29 knockdowns do not appear to cause such a tethering lock. We were able to generate a suitable Drosophila line using a Snap29 RNAi transgene located on a compatible chromosome. Upon testing key hits from our screen in this background, we found that autophagosomes redistributed similarly, supporting our conclusions. These new results have been included in the revised manuscript (see Figure 6).

      Third, it would greatly strengthen the findings to monitor pre-fusion autophagosome localization without disrupting fusion. Such vesicles could be identified as Atg8a-positive Lamp-negative structures. The effects of dynein and rab depletion on the tracking of these structures in a post-induction time course would serve as an important validation of the authors' findings.  

      Thank you for this helpful suggestion. As described above, we attempted time-lapse imaging of 3xmCherry-Atg8a and GFP-Lamp1-expressing fat body cells under various conditions to identify motile pre-fusion autophagosomes. However, we did not observe any vesicle movement, regardless of the starvation duration or experimental setup. As this likely reflects technical limitations of ex vivo fat body imaging, we were unable to achieve live tracking of autophagosome dynamics without introducing perturbations. This limitation is now discussed in the revised manuscript.

      (2) The authors nicely show that depletion of Shot leads to relocalization of Atg8a to ectopic foci in Vps16A-depleted cells; they should confirm that this is a mislocalized ncMTOC by colabeling Atg8a with an MTOC component such as MSP300. The effect of Shot depletion on Atg8a localization should also be analyzed in the absence of Vps16A depletion.  

      Thank you for this positive comment. We co-labeled Atg8a with the minus-end microtubule marker Khc-nod-LacZ in both shot single knockdown and shot; vps16A double knockdown cells. Ectopic Khc-nod-LacZ-positive MTOC foci were clearly visible in both conditions, and Atg8a-positive autophagosomes accumulated around these structures. These findings confirm that Shot depletion induces ectopic MTOC formation, which correlates with autophagosome relocalization. The new data have been incorporated into the revised manuscript (see Figure 1O-S).

      (3) The authors report that depletion of Dynein subunits, either alone (Figure 6) or co-depleted with Vps16A (Figure 2), leads to redistribution of mCherry-Atg8a punctae to the "cell periphery". However, only cell clones that contact an edge of the fat body tissue are shown in these figures. Furthermore, in these cells, mCherry-Atg8a punctae appear to localize only to contact-free regions of these cells, and not to internal regions of clones that share a border with adjacent cells. Thus, these vesicles would seem to be redistributed to the periphery of the fat body itself, not to the periphery of individual cells. Microtubules emanating from the perinuclear ncMTOC have been described as having a radial organization, and thus it is unclear that this redistribution of mCherry-Atg8a punctae to the fat body edge would reflect a kinesin-dependent process as suggested by the authors.  

      Thank you for this detailed observation. We frequently observe autophagosomes accumulating in contact-free peripheral regions of dynein-depleted cells, resulting in an asymmetric distribution. While previous studies describe a radial microtubule organization in fat body cells, none of them directly labels MT plus ends, toward which kinesin-based transport is directed.

      To further explore this, we overexpressed an HA-tagged kinesin, Klp98A-3xHA, in both control and Vps16A RNAi backgrounds. Immunolabeling revealed that Klp98A localizes to the contact-free peripheral regions in both conditions, consistent with the distribution of autophagosomes in dynein knockdown cells. This supports our interpretation that kinesin-dependent transport drives autophagosome redistribution in the absence of dynein, and that fat body cells exhibit subtle asymmetries in MT polarity that influence this transport. These new results have been included in the revised manuscript (see Figure 3G, H).

      (4) To validate whether the mCherry-Atg8a structures in Vps16A-depleted cells were of autophagic origin, the authors depleted Atg8a and observed a loss of mCherry-Atg8a signal from the mosaic cells (Figure S1D, J). A more rigorous experiment would be to deplete other Atg genes (not Atg8a) and examine whether these structures persist.  

      Thank you for the suggestion to further validate our reporter. We depleted Atg1, a key kinase required for phagophore initiation, in the Vps16A RNAi background. This completely abolished the punctate mCherry-Atg8a distribution in knockdown cells (see Figure 1—figure supplement 1E, K), confirming that the labeled structures are indeed of autophagic origin.

      (5) The authors found that only a subset of dynein, dynactin, rab, and rab effector depletions affected mCherry-Atg8a localization, leading to their suggestion that the most important factors involved in autophagosome motility have been identified here. However, this conclusion has the caveat that depletion efficiency was not examined in this study, and thus any conclusions about negative results should be more conservative.  

      Thank you for this constructive feedback. We agree that negative results must be interpreted conservatively due to potential differences in knockdown efficiency. We have revised our conclusions accordingly, clarifying that the factors identified are key for autophagosome motility, while acknowledging the possibility of false negatives.

      Reviewer #3 (Public review):  

      Major concerns:

      (1) The localization of EPG5 should be determined. The authors showed that EPG5 colocalizes with endogenous Rab7. Rab7 labels late endosomes and lysosomes. Previous studies in mammalian cells have shown that EPG5 is targeted to late endosomes/lysosomes by interacting with Rab7. EPG5 promotes the fusion of autophagosomes with late endosomes/lysosomes by directly recognizing LC3 on autophagosomes and also by facilitating the assembly of the SNARE complex for fusion. In Figure 5I, the EPG5/Rab7-colocalized vesicles are large and they are likely to be lysosomes/autolysosomes.

      Thank you for suggesting that we improve our Epg5 localization data. We performed triple immunostaining for Atg8a, Lamp1-3xmCherry, and Epg5-9xHA in S2R+ cells. In addition to triple-positive structures (likely representing autolysosomes), we observed Atg8a and Epg5-9xHA double-positive vesicles that lacked Lamp1-3xmCherry signal, which likely correspond to pre-fusion autophagosomes. Based on these results, we propose that in addition to arriving via the endocytic route, Epg5 may also reach lysosomes through autophagosomes. These findings have been included in the revised manuscript (see Figure 5K).

      (2) The experiments were performed in Vps16A RNAi KD cells. Vps16A knockdown blocks fusion of vesicles derived from the endolysosomal compartments such as fusion between lysosomes. The pleiotropic effect of Vps16A RNAi may complicate the interpretation. The authors need to verify their findings in Stx17 KO cells, as it has a relatively specific effect on the fusion of autophagosomes with late endosomes/lysosomes.  

      Thank you for this valuable suggestion. We initially considered Syntaxin17 for validation; however, we recently found that loss of Syx17 leads to a HOPS-dependent tethering lock between autophagosomes and lysosomes, which would confound interpretation, as autophagosomes remain tethered to lysosomes (DOI: 10.1126/sciadv.adu9605). Therefore, Syntaxin17 loss is not suitable for our purpose. Among the remaining fusion SNAREs, one RNAi line targeting Snap29 was available on a compatible chromosome for generating Drosophila lines equivalent to those used in the screen. We established this Snap29 RNAi-containing tester line and crossed it with our top hits. We observed that autophagosome motility was comparable to that in the Vps16A RNAi background, further supporting our conclusions. These results have been incorporated into the revised manuscript (see Figure 6).

      (3) Quantification should be performed in many places such as in Figure S4D for the number of FYVE-GFP labeled endosomes and in Figures S4H and S4I for the number and size of lysosomes.  

      Thank you for pointing this out. We performed the suggested quantifications and statistical analyses for FYVE-GFP labeled endosomes, as well as for the number and size of lysosomes. The updated data are now presented in the revised Figure 5—figure supplement 1.

      (4) In this study, the transport of autophagosomes is investigated in fly fat cells. In fat cells, a large number of large lipid droplets accumulate and the endomembrane systems are distinct from that in other cell types. The knowledge gained from this study may not apply to other cell types. This needs to be discussed.

      Thank you for raising this important point. We agree that our findings may not be fully generalizable to all cell types. Given that the organization of the microtubule network depends on both cell function and developmental stage, it is plausible that the molecular machinery described here operates differently elsewhere. We now mention this limitation in the Discussion.

      Minor concerns:  

      (5) Data in some panels are of low quality. For example, the mCherry-Atg8a signal in Figure 5C is hard to see; the input bands of Dhc64c in Figure 5L are smeared.  

      Thank you for pointing this out. We repeated the experiment shown in Figure 5C and replaced the panel with a clearer image. The smeared Dhc64C input bands in Figure 5L result from the unusually large size of this protein, which affects its electrophoretic migration. We mentioned this point in the corresponding figure legend.

      (6) In this study, both 3xmCherry-Atg8a and mCherry-Atg8a were used. Different reporters make it difficult to compare the results presented in different figures.  

      Thank you for this comment. Both 3xmCherry-Atg8a and mCherry-Atg8a are well-established reporters that behave similarly as autophagic markers. Nevertheless, to avoid confusion, we ensured that each figure uses only one type of reporter consistently, which is now clearly indicated in the revised manuscript.

      (7) The small autophagosomes presented in Figures such as in Figure 1D and 1E are not clear. Enlarged images should be presented.  

      Thank you for your suggestion. We repeated these experiments and replaced the relevant panels with higher-quality images, including enlarged insets to better visualize small autophagosomes. These updated figures are now included in the revised manuscript.

      (8) The authors showed that Epg5-9xHA coprecipitates with the endogenous dynein motor Dhc64C. Is Rab7 required for the interaction?  

      Thank you for this insightful question. We tested this by co-transfecting S2R+ cells with Epg5-9xHA and different forms of Rab7: wild-type, GTP-locked (constitutively active), and GDP-locked (dominant-negative). Our results indicate that the strength of Epg5-Dhc interaction does not change in the presence of either GTP-locked or GDP-locked Rab7. However, we believe that Epg5 and dynein are recruited to the vesicle membranes via Rab7 in vivo, so we did not include these results in the revised manuscript.

      (9) The perinuclear lysosome localization in Epg5 KD cells has no indication that Epg5 is an autophagosome-specific adaptor.

      Thank you for this important comment. Accordingly, we have toned down our statements about Epg5 functions throughout the revised manuscript.

      Reviewer #1 (Recommendations for the authors):  

      (1) Figure 6: What do "autolysosome maturation" and "small autolysosomes" mean? Do different numbers of lysosomes fuse to a single autophagosome?

      Thank you for highlighting this point. We concluded that the formation of smaller autolysosomes—compared to controls—is likely due to a defect in autolysosome maturation, as is often the case. We had not explicitly considered whether a different number of lysosomes fuse with each autophagosome during this process. We clarified this issue in the revised manuscript.

      (2) Figure 5A shows that the localization of endogenous Atg8 requires Epg5, but the data is not as clear as for mCherry-Atg8 (Figure 4B). Why the difference?  

      Thank you for this question. The difference arises because the mCherry-Atg8a reporter strongly labels autolysosomes, as the mCherry fluorophore remains stable in acidic compartments. As a result, mCherry-Atg8a labels both autophagosomes and autolysosomes, but the strong autolysosomal signal originating from the surrounding GFP-negative, non-RNAi cells can make accumulated autophagosomes appear fainter in fusion-defective cells (as in Figure 4). In contrast, endogenous Atg8a is degraded in lysosomes and therefore labels only autophagosomes. The two experiments can therefore look slightly different, but because in both cases autophagosomes no longer accumulate in the perinuclear region of Vps16A, Epg5 double RNAi cells, we can conclude that Epg5 is required for autophagosome positioning. We explained this difference between the two methods in the revised manuscript where it first appears (Figure 1B and Figure 1—figure supplement 1A).

      (3) Blue letters on the black micrographs are hard to see. Some of the other letters are also small and hard to read.  

      Thank you for this suggestion. We improved the visibility and readability of the labels in the revised figures.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      In this manuscript, the authors employ a combined proteomic and genetic approach to identify the glycoprotein QC factor malectin as an important protein involved in promoting coronavirus infection. Using proteomic approaches, they show that the non-structural protein NSP2 and malectin interact in the absence of viral infection, but not in the presence of viral infection. However, both NSP2 and malectin engage the OST complex during viral infection, with malectin also showing reduced interactions with other glycoprotein QC proteins. Malectin KD reduces replication of coronaviruses, including SARS-COV2. Collectively, these results identify Malectin as a glycoprotein QC protein involved in regulating coronavirus replication that could potentially be targeted to mitigate coronavirus replication.

      Overall, the experiments described appear well performed and the interpretations generally reflect the results. Moreover, this work identifies Malectin as an important pro-viral protein whose activity could potentially be therapeutically targeted for the broad treatment of coronavirus infection. However, there are some weaknesses in the work that, if addressed, would improve the impact of the manuscript.

      Notably, the mechanism by which malectin regulates viral replication is not well described. It is clear from the work that malectin is a pro-viral protein in the work presented, but the mechanistic basis of this activity is not pursued. Some potential mechanisms are proposed in the discussion, but the manuscript would be strengthened if additional insight was included. For example, is the UPR activated to higher levels in infected cells depleted of malectin? Do glycosylation patterns of viral (or non-viral) proteins change in malectin-depleted cells? Additional insight into this specific question would significantly improve the manuscript.

      We concur with the reviewer that the mechanism by which Malectin regulates viral replication is an important point to elucidate further. Our proteomics data were able to offer additional insight into the questions posed here. We examined the upregulation of protein markers of the UPR and other stress response pathways in cells depleted of MLEC (Fig. S15D). We find that the UPR pathways are moderately but insignificantly upregulated, while the Heat Shock Factor 1 (HSF1) pathway is moderately and significantly upregulated. The fold-change increase of these marker proteins is relatively small, so while upregulation of this pathway may contribute to the suppression of CoV replication, it may not fully explain the phenotype.

      In addition, to address the second question, we compared the glycosylation patterns of endogenous proteins in MLEC-KD cells (Fig. S15E-G). We found that there is a small increase in abundance of glycopeptides associated with LAMP2, SERPINH1, RDX, RPL3/5, CADM4, and ITGB1; however, these fold changes are small and not statistically significant. These results indicate there is relatively little modification of endogenous glycoproteins upon MLEC depletion. These findings support a more direct role for MLEC in regulating viral replication.

      We added the following section to the manuscript text to discuss these results:

      “In uninfected cells, MLEC KD leads to relatively little proteome-wide changes, with MLEC being the only protein significantly downregulated and no other proteins significantly upregulated, supporting the specificity of MLEC KD in MHV suppression (Fig. S15C). To determine whether MLEC KD alters general host proteostasis, we further examined the levels of protein markers of stress pathways based on previous gene pathway definitions (Davies et al., 2023; Grandjean et al., 2019; Shoulders et al., 2013) (Fig. S15D). We find that there are modest but significant increases in protein levels associated with the Heat Shock Factor 1 (HSF1) pathway, while the Unfolded Protein Response (UPR) pathways are largely unmodified.

      We also probed the effect of MLEC KD on endogenous protein glycosylation. We find that there is only a small increase in abundance of glycopeptides, including those associated with the ribosome (Rpl3, Rpl5), a cytoskeletal protein (Rdx), the integrin Itgb1, and the ER-resident chaperone Serphinh1 (Fig. S15E-G).”

      “Our proteomics data reveals that there is only a modest increase in the Heat Shock Factor 1 (HSF1) pathway, while the Unfolded Protein Response is relatively unchanged (Fig. S15D). In addition, there are only minor increases in endogenous glycopeptide levels (Fig. S15E-G). Together, these results indicate that while MLEC KD leads to some alterations in ER proteostasis and host glycosylation, these changes are modest and may not be the primary mechanism by which MLEC KD hinders CoV replication.”

      Further, the evidence for increased interactions between OST and malectin during viral infection is fairly weak, despite being a major talking point throughout the manuscript. The reduced interactions between malectin and other glycoproteostasis QC factors is evident, but the increased interactions with OST are not well supported. I'd recommend backing off on this point throughout the text, instead, continuing to highlight the reduced interactions.

      We agree that the fold-change increase of OST interactions with malectin is small compared to the fold-change decrease of other glycoproteostasis factors. We have modified the text to de-emphasize this point and instead highlight the reduced interactions:

      “Further, MHV infection retains the association of MLEC with the OST complex while titrating off other interactors, potentially leading to more efficient glycoprotein biogenesis.”

      I was also curious as to why non-structural proteins, nsp2 and nsp4, showed robust interactions with host proteins localized to both the ER and mitochondria? Do these proteins localize to different organelles or do these interactions reflect some other type of dysregulation? It would be useful to provide a bit of speculation on this point.

      We also find these ER and mitochondrial protein interactions curious, which we initially reported on (Davies, Almasy et al. 2020 ACS Infectious Diseases). In this prior report, we found that when expressed in HEK293T cells, SARS-CoV-2 nsp2 and nsp4 have partial localization to mitochondrial-associated ER membranes (MAMs), as determined by subcellular fractionation. Given that malectin has also been shown to have MAMs localization (Carreras-Sureda, et al. 2019 Nature Cell Biology), we have added additional text in the Discussion to speculate on this point:

      “Additionally, MLEC has also been shown to localize to ER-mitochondria contact sites (MAMs)(Carreras-Sureda et al., 2019), which regulate mitochondrial bioenergetics. We have previously shown that SARS-CoV-2 nsp2 and nsp4 can partially localize to MAMs(Davies et al., 2020), so these viral proteins may also dysregulate MLEC and MAMs activity to promote infection.”

      Again, the overall identification of malectin as a pro-viral protein involved in the replication of multiple different coronaviruses is interesting and important, but additional insights into the mechanism of this activity would strengthen the overall impact of this work.

      Thank you for this endorsement. We hope the additional analyses and discussion points in the revised manuscript further home in on a direct mechanistic function for MLEC in modulating viral replication.

      Reviewer #2 (Public Review):

      Summary:

      A strong case is presented to establish that the endoplasmic reticulum carbohydrate binding protein malectin is an important factor for coronavirus propagation. Malectin was identified as a coronavirus nsp2 protein interactor using quantitative proteomics and its importance in the viral life cycle was supported by using a functional genetic screen and viral assays. Malectin binds diglucosylated proteins, an early glycoform thought to transiently exist on nascent chains shortly after translation and translocation; yet a role for malectin has previously been proposed in later quality control decisions and degradation targeting. These two observations have been difficult to reconcile temporally. In agreement with results from the Locher lab, the malectin interactome shown here includes a number of subunits of the oligosaccharyltransferase complex (OST). These results place malectin in close proximity to both the co-translational (STT3A or OST-A) and post-translational (STT3B or OST-B) complexes. It follows that malectin knockdown was associated with coronavirus Spike protein hypoglycosylation.

      Strengths:

      Strengths include using multiple viruses to identify interactors of nsp2 and quantitative proteomics along with multiple viral assays to monitor the viral life cycle.

      Weaknesses:

      Malectin knockdown was shown to be associated with Spike protein hypoglycosylation. This was further supported by malectin interactions with the OSTs. However, no specific role of malectin in glycosylation was discussed or proposed.

      We have emphasized our hypotheses on this point in the discussion and added a summary figure to highlight the specific role of malectin.

      Given the likelihood that malectin plays a role in the glycosylation of heavily glycosylated proteins like Spike, it is unfortunate that only 5 glycosites on Spike were identified using the MS deamidation assay when Spike has a large number of glycans (~22 sites). The mass spec data set would also include endogenous proteins. Were any heavily glycosylated endogenous proteins hypoglycosylated in the MS analysis in Fig 5D?

      Thank you for this suggestion. We compared the glycosylation patterns of endogenous proteins in MLEC-KD cells (Fig. S15E-G). We found that there is a small increase in abundance of glycopeptides associated with LAMP2, SERPINH1, RDX, RPL3/5, CADM4, and ITGB1; however, these fold changes are small and not statistically significant. These results indicate there is relatively little modification of endogenous glycoproteins upon MLEC depletion. We added the following sections:

      “We also probed the effect of MLEC KD on endogenous protein glycosylation. We find that there is only a small increase in abundance of glycopeptides, including those associated with the ribosome (Rpl3, Rpl5), a cytoskeletal protein (Rdx), the integrin Itgb1, and the ER-resident chaperone Serphinh1 (Fig. S15E-G).”

      “Our proteomics data reveals that there is only a modest increase in the Heat Shock Factor 1 (HSF1) pathway, while the Unfolded Protein Response is relatively unchanged (Fig. S15D). In addition, there are only minor increases in endogenous glycopeptide levels (Fig. S15E-G). Together, these results indicate that while MLEC KD leads to some alterations in ER proteostasis and host glycosylation, these changes are modest and may not be the primary mechanism by which MLEC KD hinders CoV replication.”

      The inclusion of the nsp4 interactome and its partial characterization is a distraction from the storyline that focuses on malectin and nsp2.

      We believe the nsp4 comparative interactome and functional genomics data offer a rich resource for further functional investigation by others, if made public. While we found the malectin and nsp2 storyline the most compelling to pursue, we believe the inclusion of the nsp4 data strengthens the overall approach, in agreement with Reviewer #3’s comments.

      Reviewer #3 (Public Review):

      Summary:

      In this study, Davies and Plate set out to discover conserved host interactors of coronavirus non-structural proteins (Nsp). They used 293T cells to ectopically express flag-tagged Nsp2 and Nsp4 from five human and mouse coronaviruses, including SARS-CoV-1 and 2, and analyzed their interaction with host proteins by affinity purification mass-spectrometry (AP-MS). To confirm whether such interactors play a role in coronavirus infection, the authors measured the effects of individual knockdowns on replication of murine hepatitis virus (MHV) in mouse Delayed Brain Tumor cells. Using this approach, they identified a previously undescribed interactor of Nsp2, Malectin (Mlec), which is involved in glycoprotein processing and shows a potent pro-viral function in both MHV and SARS-CoV-2. Although the authors were unable to confirm this interaction in MHV-infected cells, they show that infection remodels many other Mlec interactions, recruiting it to the ER complex that catalyzes protein glycosylation (OST). Mlec knockdown reduced viral RNA and protein levels during MHV infection, although such effects were not limited to specific viral proteins. However, knockdown reduced the levels of five viral glycopeptides that map to Spike protein, suggesting it may be affected by Mlec.

      Strengths:

      This is an elegant study that uses a state-of-the-art quantitative proteomic approach to identify host proteins that play critical roles in viral infection. Instead of focusing on a single protein from a single virus, it compares the interactomes of two viral proteins from five related viruses, generating a high confidence dataset. The functional follow-ups using multiple live and reporter viruses, including MHV and CoV2 variants, convincingly depict a pro-viral role for Mlec, a protein not previously implicated in coronavirus biology.

      Weaknesses:

      Although a commonly used approach, AP-MS of ectopically expressed viral proteins may not accurately capture infection-related interactions. The authors observed Mlec-Nsp2 interactions in transfected 293T cells (1C) but were unable to reproduce those in mouse cells infected with MHV (3C). EIF4E2/GIGYF2, two bona fide interactors of CoV2 Nsp2 from previous studies, are listed as depleted compared to negative controls (S1D). Most other CoV2 Nsp2 interactors are also depleted by the same analysis (S1D). Previously reported MERS Nsp2 interactors, including ASCC1 and TCF25, are also not detected (S1D). Furthermore, although GIGYF2 was not identified as an interactor of MHV Nsp2/4 in human cells (S1D), its knockdown in mouse cells reduced MHV titers about 1000-fold (S4). The authors should attempt to explain these discrepancies.

      We acknowledge these limitations in AP-MS from ectopically expressed viral proteins and have addressed these discrepancies with further elaboration in the text:

      “A limitation of our study is the initial overexpression of individual proteins for AP-MS, in which we find some variation between our data with other AP-MS studies. We sought to overcome these variations by focusing on conserved interactors and testing interactions in a live infection context.”

      “We also found GIGYF2-KD strongly suppressed MHV infection, despite GIGYF2 not interacting with MHV nsp2 (Fig. S1D), highlighting the importance of proteostasis factors in infection regardless of direct PPIs.”

      More importantly, the authors were unable to establish a direct link between Mlec and the biogenesis of any viral or host proteins, by mass-spectrometry or otherwise. Although it is clear that Mlec promotes coronavirus infection, the mechanism remains unclear. Its knockdown does not affect the proteome composition of uninfected cells (S15B), suggesting it is not required for proteome maintenance under normal conditions. The only viral glycopeptides detected during MHV infection originated from Spike (5D), although other viral proteins are also known to be glycosylated. Cells depleted for Mlec produce ~4-fold less Spike protein (4E) but no more than 2-fold less glycosylated spike peptides (5D), compounding the interpretation of Mlec effects on viral protein biogenesis. Furthermore, Spike is not essential for the pro-viral role of Mlec, given that Mlec knockdown reduces replication of SARS-CoV-2 replicons that express all viral proteins except for Spike (6A/B).

      Thank you, these are all important points. We have acknowledged these compounding factors in the Discussion:

      “Concurrently, knockdown of MLEC leads to impediment of nsp production and aberrant glycosylation of other viral proteins like Spike, though it should be noted that the decrease in Spike glycopeptides is compounded by the overall decrease in Spike protein. Given that MLEC is pro-viral in a SARS-CoV-2 replicon model lacking Spike (Fig. 6), MLEC can promote CoV replication independent of Spike production.”

      Any of the observed effects on viral protein levels could be secondary to multiple other processes. Interventions that delay infection for any reason could lead to an imbalance of viral protein levels because Spike and other structural proteins are produced at a much higher rate than non-structural proteins due to the higher abundance of their cognate subgenomic RNAs. Similarly, the observation that Mlec depletion attenuates MHV-mediated changes to the host proteome (S15C/D) can also be attributed to indirect effects on viral replication, regardless of glycoprotein processing. In the discussion, the authors acknowledge that Mlec may indirectly affect infection through modulation of replication complex formation or ER stress, but do not offer any supporting evidence. Interestingly, plant homologs of Mlec are implicated in innate immunity, favoring a more global role for Mlec in mammalian coronavirus infections.

      We examined the upregulation of protein markers of the UPR and other stress response pathways in cells depleted of MLEC (Fig. S15D). We find that the UPR pathways are moderately but insignificantly upregulated, while the Heat Shock Factor 1 (HSF1) pathway is moderately and significantly upregulated. The fold-change increase of these marker proteins is relatively small, so while upregulation of this pathway may contribute to the suppression of CoV replication, it may not fully explain the phenotype. Please also see the similar points brought up by Reviewer 1 (comment 1). We added the following section to the manuscript text to discuss these results:

      “In uninfected cells, MLEC KD leads to relatively little proteome-wide changes, with MLEC being the only protein significantly downregulated and no other proteins significantly upregulated, supporting the specificity of MLEC KD in MHV suppression (Fig. S15C). To determine whether MLEC KD alters general host proteostasis, we further examined the levels of protein markers of stress pathways based on previous gene pathway definitions (Davies et al., 2023; Grandjean et al., 2019; Shoulders et al., 2013) (Fig. S15D). We find that there are modest but significant increases in protein levels associated with the Heat Shock Factor 1 (HSF1) pathway, while the Unfolded Protein Response (UPR) pathways are largely unmodified.

      “Our proteomics data reveals that there is only a modest increase in the Heat Shock Factor 1 (HSF1) pathway, while the Unfolded Protein Response is relatively unchanged (Fig. S15D). […] Together, these results indicate that while MLEC KD leads to some alterations in ER proteostasis and host glycosylation, these changes are modest and may not be the primary mechanism by which MLEC KD hinders CoV replication.”

      Finally, the observation that both Nsp2 (3C) and Mlec (3E/F) are recruited to the OST complex during MHV infection neither supports nor refutes any of these alternate hypotheses, given that Mlec is known to interact with OST in uninfected cells and that Nsp2 may interact with OST as part of the full-length unprocessed Orf1a, as it co-translationally translocates into the ER. Therefore, the main claims about the role of Mlec in coronavirus protein biogenesis are only partially supported.

      We have acknowledged this point in the Discussion. 

      “We find that nsp2 interacts with several OST complex members, including DDOST, STT3A, and RPN1, though whether this is as part of the uncleaved Orf1a polyprotein during co-translational ER translocation or as an individual protein is unclear.”

      Reviewer #2 (Recommendations For The Authors):

      What is the proof that MLEC is a type I membrane protein? If it is strictly sequence analysis, this conclusion should be tempered in the text.

      Our response: We have added appropriate evidence on the biochemical characterization of MLEC topology from Galli et al., 2011, and cryo-EM structural characterization by Ramírez et al., 2019.

      “As it was surprising that nsp2, a non-glycosylated, cytoplasmic protein, would interact with MLEC, an integral ER membrane protein with a short two amino acid cytoplasmic tail(Galli et al., 2011; Ramírez et al., 2019), we assessed a broader genetic interaction between nsp2 and MLEC.”

      Validation of some of the nsp2 and malectin interactome components by pulldowns should be included.

      Our response: The interactions between nsp2 and Ddost, Stt3A, and Rpn1 passed a stringent confidence filter in our AP-MS experiment (Fig. 3C) based on several replicates. For this reason, we do not believe additional validation by Western blotting will offer much useful information.

      NGI-1 inhibition of glycosylation looks to be very weak in Fig. 5B and Fig. S14B.

      Our response: It is important to note that the NGI-1 inhibition assay used a suboptimal NGI-1 concentration to prevent full suppression of MHV infection, which we have found previously. We have added this justification in the Methods section and associated figure legend (Fig. S14A).

      “The 5 uM NGI-1 dosage was chosen as it resulted in partial inhibition of glycosylation while not completely blocking MHV infection.”

      “This dosage and timing were chosen to partially inhibit the OST complex without fully ablating viral infection, as NGI-1 has been shown previously to be a potent positive-sense RNA virus inhibitor (Puschnik et al., 2017) (Fig. S14).”

      Summary model figure at the end would help to communicate the conclusions.

      Our response: Thank you for this suggestion. We agree and have added a summary model figure at the end as suggested.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Given that there are different mutations identified at different CDK12 sites as illustrated in Figure 1B it would be nice to know which ones have been functionally classified as pathogenic and for which ones that the pathogenicity has not been determined. This would be especially interesting to perform in light of the differences in the LOH scores and WES data presented - specifically, are the pathogenic mutations vs the mutations for which true pathogenicity is unknown more likely to display LOH or TD?

      Alterations were classified as pathogenic when they resulted in a frameshift, a nonsense mutation, or an amino acid change likely to alter function (according to ANNOVAR). Four patients were called CDK12<sup>BAL</sup> but were negative for TDP signatures. Three of these had CDK12 mutations downstream of the kinase domain, which may be less likely to ablate protein activity. Most functionally validated pathogenic mutations include disruption of the kinase domain (PMID: 25712099). We added a sentence to the Results section (under “Identification of genomic characteristics that associate with CDK12 loss in prostate cancer”) to highlight this caveat on pathogenic mutation calls.

      For the cell inhibition studies with the CDK12/13 inhibitor, more details characterizing the specificity of this molecule to these targets would be useful. Additionally, could the authors perform short-term depletion studies with a PROTAC to the target or short shRNA or non-selected pool CRISPR deletion studies of CDK12 in these same cell lines to complement their pharmacological studies with genetic depletion studies? Also perhaps performing these same inhibitor studies in CDK12/13 deleted cells to test the specificity of the molecule would be useful.

      We are not aware of a CDK12-specific PROTAC, and generating such a reagent is beyond the scope of the present study. Regarding the specificity of the CDK12/13 inhibitor molecules, additional information on the specificity and in vivo dose selection was added to the Results section (under “CDK13 is synthetic lethal in cells with biallelic CDK12 loss”). Cells with CDK12-KO did not tolerate CDK13-KO, so we were unable to generate double knockouts to test for CDK12/13 inhibitor non-specific effects.

      Additionally, expanding these studies to additional prostate cancer cell lines or organoid models would strengthen the conclusions being made. More information should be provided about the dose and schedule chosen, and the rationale for choosing those doses and schedules for the in vivo studies should be presented and discussed. Was there evidence of maximal inhibition of the target CDK12/13 at the dose tested, given the very modest tumor growth inhibition noted in these studies?

      With respect to additional acute CDK12 loss models, our Tet-inducible shCDK12 models show only minor growth slowdown and do not appear to phenocopy the strong arrest or apoptosis seen with CDK12 KO or inhibition, respectively. Future work is ongoing to generate CDK12-degron regulated cell lines. We added a new immunoblot panel showing that acute CRISPR/sgRNA targeting of CDK12 does indeed lead to BRCA2 and ATM protein decrease (Fig. S4g), providing some orthogonal genomic targeting evidence of the acute HR gene effect.  We are continuing efforts to collect and generate additional CDK12<sup>BAL</sup> cell models, in both 2D and 3D culture systems, but none are presently available. We added a 3D culture drug dose curve with LuCaP189.4 exposed to THZ531 (Fig. S7m), which confirms heightened sensitivity vs two CDK12-intact lines. 

      Regarding assessment of CDK12 targets: as we are not aware of any unique CDK12 substrates, precise CDK12 inhibition by the compounds in tumors is a fair question but difficult to measure. We dosed mice using the same protocol as detailed in the original report testing SR4835 in mice (PMID: 31668947). We performed immunoblots on lysates from PDX tumors treated for 3 and 28 days and did not see any consistent decreases in pRPB1(Ser2) or ATM or increases in γH2A.X (data not shown). However, we did see increases in APA usage and downregulation of DNA repair transcripts with three-day treatment (Fig. 6k-l), as would be expected from on-target acute effects.

      Reviewer #2 (Public review):

      One caveat that continues to be unclear as presented, is the uncoupling of cell cycle/essentiality of CDK12/13 from HR-directed mechanisms. Is this purely a cell cycle arrest phenotype acutely with associated down-regulation of many genes?

      Untangling the effects of cell arrest on HR gene expression is difficult, given that many HR genes, including BRCA2, are S/G2 linked. We attempted to account for those effects in the acute CDK12 inhibition experiment by including a palbociclib (CDK4/6i) control, which caused cell arrest and decreased BRCA1/2 RNA expression with no apparent 5’/3’ transcript imbalance as determined by qPCR (Fig. 4e,g). Though overall BRCA1 and BRCA2 mRNA levels are lower in the stable 22Rv1-CDK12-KO2 and KO5 lines, they do not show selective 3’ loss (Fig. 5c), suggesting the downregulation in these lines is mostly due to their slower growth (Fig. S4k) and not intronic polyA usage.

      While the RAD51 loading ssRNA experiments are informative, the Tet-inducible knockdown of BRCA2 and CDK12 is confusing as presented in Figure 5, shBRCA2 + and -dox are clearly shown. However, were the CDK12_K02 and K05 also knocked down using inducible shRNA or a stable knockout? The importance of this statement is the difference between acute and chronic deletion of CDK12. Previously, the authors showed that acute knockdown of CDK12 led to an HR phenotype, but here it is unclear whether CDK12K02/05 are acute knockdowns of CDK12 or have been chronically adapted after single cell cloning from CRISPR-knockout. 

      As a clarification, the 22Rv1-CDK12-KO2 and 22Rv1-CDK12-KO5 are stable CRISPR knockout clonal lines that were expanded from single cells. We added a new figure to include more validation of these lines (Fig. S5). We tried multiple times to reproduce the HRd phenotype and PARPi sensitivity with siRNA and inducible shRNA lines but were unable to see clear sensitivity differences, despite seeing the expected shifts with shBRCA2 controls (data not shown). It is possible the degree of knockdown (~80%), timing (8 days), or specific cell lines used in our experiments were not sufficient to expose the acute phenotype by this method.

      However, we were able to see acute HR gene decreases by inhibitor treatment (Fig. 4) or acute CRISPR (Fig. S4g).

      Given the multitude of lines, including some single-cell clones with growth inhibitory phenotypes and ex-vivo derived xenografts, the variability of effects with SR4835, ATM, ATR, and WEE1 inhibitors in different models can be confusing to follow. Overall, the authors suggest that the cell lines differ in therapeutic susceptibility as they may have alternate and diverse susceptibilities. It may be possible that the team could present this more succinctly and move extraneous data to the supplement.  

      We appreciate the complexity of the data and attempted to use multiple models to report consistency and variability. We are not able to ascertain what data would be extraneous, and elected to present data we view as relevant in the main figures while moving supporting data to the supplement.

      The in-vitro data suggests that SR4835 causes growth inhibition acutely in parental lines such as 22RV1. However, in vivo, tumor attenuation appears to be observed in both CDK12 intact and deficient xenografts, LuCAP136 and LuCaP 189.4 (albeit the latter is only nominally significant). Is there an effect of PARPi inhibition specifically in either model? What about the 22RV1-K02/05? Do these engraft? Given the role of CDK12/13 in RNAP II, these data might suggest that the window of susceptibility in CDK12 (mutant) tumors may not be that different from CDK12 intact tumors (or intact tissue) when using dual CDK12/13 inhibitors but rather represent more general canonical essential functions of CDK12 and CDK13 in transcription. From a therapeutic development strategy, the authors may want to comment in the discussion on the ability to target CDK13 specifically.

      Though the response of the CDK12<sup>BAL</sup> models to some compounds is variable, we believe those mixed results are important and future studies may be able to better explain why some show shifts in sensitivity while others do not. We hope future studies with additional models will help determine which sensitivities are more consistently true, and perhaps provide explanations for differences between models.

      Regarding SR4835, we find, and others have reported, a toxic (i.e. apoptotic) effect for in vitro treatment with dual CDK12/13 inhibitors (Fig. 4f, S4e,f); in fact, that may be why previous studies have used short timepoints in cell culture assays with these dual inhibitors. In mice, SR4835 was tolerated well but only LuCaP 189.4 showed statistically significant decreases in tumor volume and weight (Fig. 6j). We did not test PARPi responses in the PDX models, nor did we attempt engrafting the 22Rv1-CDK12-KO cell lines, but both would be worthwhile experiments in the future. Beyond CDK12<sup>BAL</sup> tumors, we agree that CDK12/13 inhibitors could be effective in cancer therapies more generally (e.g. triggering acute HRd, loss of RNAP2 phosphorylation). We added a line to the discussion section about ongoing efforts to combine PARPi and CDK12/13i, which we expect to be synergistic in CDK12-intact tumors due to the acute loss phenotype. We certainly agree that development of a specific CDK13 inhibitor would be the ideal therapeutic option for CDK12<sup>BAL</sup> tumors. However, CDK12 and CDK13 are 43% conserved at the protein level (PMID: 26748711), with 92% conservation in the active site (PMID: 30319007), and there are no available pharmacologic inhibitors that discriminate between CDK12 and CDK13.

      Reviewer #3 (Public review):

      It is generally assumed that CDK12 alterations are inactivating, but it is noteworthy that homozygous deletions are comparatively uncommon (Figure 1a). Instead many tumors show missense mutations on either one or both alleles, and many of these mutations are outside of the kinase domain (Figure 1b). It remains possible that the CDK12 alterations that occur in some tumors may retain residual CDK12 function, or may confer some other neomorphic function, and therefore may not be accurately modeled by CDK12 knockout or knockdown in vitro. This would also reconcile the observation that knockout of CDK12 is cell-essential while the human genetic data suggest that CDK12 functions as a tumor suppressor gene.

      Thank you for the feedback. It is a keen observation that homozygous deletions of CDK12 are not typical, though many mutations are upstream frameshifts that are expected to lead to loss of functional protein and mRNA via nonsense-mediated decay. LuCaP189.4, our only natural mutant model, has two upstream frameshifts leading to complete protein loss (Fig 5b, S4h-i). We also added a caveat previously mentioned (in response to Reviewer 1) that mutations downstream of the kinase domain may be less likely to be fully pathogenic. For upstream missense mutations, neomorphic function remains an intriguing possibility that cannot be ruled out and would not be captured in our current models. Hopefully additional models can be developed, both natural and engineered, to help dissect that question in future studies.

      It is not entirely clear whether CDK12 altered tumors may require a co-occurring mutation to prevent loss of fitness, either in vitro or in vivo (e.g. perhaps one or more of the alterations that occur as a result of the TDP may mitigate against the essentiality of CDK12 loss).

      We concur. Another caveat with the CRISPR models, beyond reliance on upstream frameshift mutations, is the simultaneous loss of alleles. In human tumors, there may be a period of single copy loss before the second hit that may provide a window for adaptation. It is possible that sequential loss is far easier for a cell to tolerate than acute bi-allelic inactivation. We agree that the question of what (if any) cooperating genetic alterations are required to tolerate CDK12 loss is an important one that we plan to explore further in future work.

      Recommendations for Authors:

      Reviewer #1 (Recommendations for Authors):

      The authors have thoroughly addressed all issues of data availability, reagents, in vivo protocols, and animal approvals associated with the studies presented in this manuscript. Specific comments and experimental suggestions that in my opinion would strengthen the conclusions of this interesting and compelling manuscript are included above.

      Reviewer #2 (Recommendations for the authors):

      The authors were thorough in their studies. As a general note, switching between the cell lines is often overwhelming in interpreting the data given cell-to-cell variability in response. If possible, consolidating the text/conclusions in results would improve the readability of the manuscript.

      The variety of cell lines and models is perhaps expansive at times, but we hope the inclusion of these different models helps support the conclusions. 

      Is it possible to knockout CDK12 acutely using a degron-based approach, instead of utilizing an inhibitor that targets both CDK12/13?

      There is a HeLa cell line made with analog-sensitive CDK12 (Bartkowiak, Yan, and Greenleaf 2016) but we were unaware of any such prostate lines at the time of this work. We are attempting to develop engineered prostate lines with specific CDK12 degradation but do not yet have them available.

      How do the authors address a lower BRCA1/2 level in for example 22RV1-K05, does this cell line have increased sensitivity to PARPi over its parental 22RV1 line? Could this be added to Figure 5h/i?

      The lower BRCA2 levels in 22Rv1-CDK12-KO5 are likely due to the slower growth rate (Fig. S4k), as BRCA2 expression is S/G2 linked. While the overall BRCA2 mRNA level is lower in the KO5 line, we do not observe the 5’/3’ transcript imbalance (Fig. 5c). The 22Rv1-CDK12-KO lines did not show increased sensitivity to carboplatin, while inducible shBRCA2 did (Fig. S7a), so we do not believe this lower BRCA2 level confers functional HRd. We did test the KO lines with olaparib (Fig. S7d) and saw a modest increase in sensitivity compared to parental 22Rv1, but not to the extent measured in the BRCA1 mutant line UWB1.289.

      What is the clonality of the LuCAP 189.4 lines upon derivation? Is biallelic CDK12 loss seen in all cells?

      We do not know the exact clonality of the LuCaP 189.4 PDX or CL model, but we do see highly uniform CDK12 protein loss in these cells (quantified by IHC staining, data not shown).

      The authors state that 22RV1-K02/05 has an increased growth arrest to CDK13 inhibition. However, in Figure 6h, it appears the viability is not significantly different compared to the parental 22RV1 line. Similar aspects noted in 189.4-vec/CDK12?

      We found that 22Rv1 KO2/KO5 have growth arrest with sgCDK13 and cell death with CDK12/13 inhibitor. We did notice that SR4835 did not show the differential effects we anticipated (Fig. 6h), as was seen with THZ531 (Fig. 6i). SR4835 is a non-covalent inhibitor, while THZ531 is a covalent binder, so there are some functional differences between these compounds that might explain the lack of differential effects in the isogenic lines in a 4 day in vitro assay. We included the SR4835 in vitro data because it was used for the in vivo experiment. THZ531 is not suited for animal use.

      Could the authors comment on SR4835 response in vivo as a function of tumor growth rate?

      The in vivo SR4835 treated LuCaP189.4 did show signs of reduced proliferation with decreased Cell Cycle and DNA Replication in the RNA-seq signatures, but a more detailed investigation into cell cycle arrest vs apoptotic response has yet to be fully explored. We plan to conduct additional PDX experiments if we can obtain a selective CDK13 inhibitor. 

      Do the authors explore TDPs in their isogenic cell lines?

      We performed low coverage WGS on the 22Rv1 KO clones and added that to the paper (Fig. S5c). We did not see any obvious signs of TDP. We suspect the phenotype takes longer to accumulate and is not apparent within the ~20 passages our clones underwent in culture. This would be consistent with the tumor analysis (Fig. 2b) showing increase in TDs from primary to metastatic tumors, suggesting TDs accumulate over time.

      A future study may allow for screening synthetic lethals in the context of CDK12 loss in the presence or absence of SR4835 inhibition.

      We are actively pursuing experiments to identify new synthetic lethal targets by CRISPR and drug screens in CDK12 loss models and hope to report those in a future study.

      Reviewer #3 (Recommendations for the authors):

      As discussed above, the authors may wish to adjust their terminology to "CDK12-altered" rather than "CDK12 lost" or "CDK12-inactivated" to leave open the possibility that some tumors may retain residual CDK12 function or adopt neomorphic functions.

      Thank you for the additional comments and feedback. The possibility of neomorphic CDK12 allele function is important. As our models were all complete protein loss mutations, we decided to retain “biallelic loss” as our preferred nomenclature, but the note is well taken.

      The plots in Figures 1f-h are interesting and suggest that certain cancer genes (especially oncogenes) are recurrently gained in CDK12-altered tumors. It may be interesting to look at this on the individual level rather than the cohort level to see whether any groups of oncogenes tend to be gained together in an individual patient - this could inform whether certain combinations of cancer drivers cooperate with CDK12 alteration to drive oncogenesis.

      Thank you for the idea of looking at the patient-level for TDP-enriched oncogenes. A preliminary assessment did not identify recurrent co-gained oncogenes. We will continue these analyses as additional patient tumors with CDK12 alterations are identified. 

      The finding that AR gene or enhancer are recurrently gained with TDP is interesting and I am curious whether the authors have thoughts on whether these alterations can also be seen in the 1-2% of CDK12-altered primary prostate cancers that are treatment naïve, and where AR pathway alterations are not as frequently seen.

      We did not focus on CDK12 altered primary prostate cancers, but we did check if there is AR amplification enrichment in the 6 CDK12<sup>BAL</sup> cases of the TCGA-PRAD dataset and did not identify enrichment. However, with such small numbers we would hesitate to draw any hard conclusions. 

      It could be interesting to more comprehensively characterize some of the CDK12 KO-adapted lines in Figure 5 (e.g. by WES or WGS) to determine whether they exhibit the TDP and/or whether they have acquired any secondary mutations that allow them to adapt to CDK12 loss.

      We are planning to do further genomics characterization of the CDK12-KO lines and will hopefully include that in a future study. Genomic analyses of the 22Rv1 clones (see copy number plots in Fig. S5c) did not identify a TDP. We plan to repeat the genomic assessments over additional cell passages and we have planned additional experiments designed to understand why some cells tolerate CDK12 loss and others do not.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Hurtado et al. show that Sox9 is essential for retinal integrity, and its null mutation causes the loss of the outer nuclear layer (ONL). The authors then show that this absence of the ONL is due to apoptosis of photoreceptors and a reduction in the numbers of other retinal cell types such as ganglion cells, amacrine cells, and horizontal cells. They also describe that Müller Glia undergoes reactive gliosis by upregulating the Glial Fibrillary Acidic Protein. The authors then show that Sox9+ progenitors proliferate and differentiate to generate the corneal cells through Sox9 lineage-tracing experiments. They validate Sox9 expression and characterize its dynamics in limbal stem cells using an existing single-cell RNA sequencing dataset. Finally, the authors argue that Sox9 deletion causes progenitor cells to lose their clonogenic capacity by comparing the sizes of control and Sox9-null clones. Overall, Hurtado et al. underline the importance of Sox9 function in retinal and corneal cells.

      Strengths:

      The authors have characterized a myriad of striking phenotypes due to Sox9 deletion in the retina and limbal stem cells which will serve as a basis for future studies.

      Weaknesses:

      Hurtado et al. investigate the importance of Sox9 in the retina and limbal stem cells. However, the overall experimental narrative appears dispersed.

      (1) The authors begin by characterizing the phenotype of Sox9 deletion in the retina and show that the absence of the ON layer is due to photoreceptor apoptosis and a reduction in other retinal cell types. The authors also note that Müller glia undergoes gliosis in the Sox9 deletion condition. These striking observations are never investigated further, and instead, the authors switch to lineage-tracing experiments in the limbus that seem disconnected from the first three figures of the paper. Another example of this disconnect is the comparison of Sox9 high and Sox9 low populations using an existing scRNA-seq dataset and the subsequent GO term analysis, which does not directly tie in with the lineage-tracing data of the succeeding Sox9∆/∆ experiments.

      We thank the reviewer for their thoughtful observations. We would like to clarify the rationale behind the structure of our study and how the different parts are conceptually connected.

      Our central aim was to investigate the role of Sox9 in the adult eye. Given that Sox9 has been extensively studied during embryonic development, we specifically chose to use an inducible conditional knockout strategy (CAG-CreERTM) in order to assess its function postnatally, in the adult eye. This approach revealed a severe retinal phenotype, whereas the cornea showed no overt phenotype. A major strength of our experimental design is that it allowed us to examine the role of Sox9 specifically in the adult eye, avoiding confounding effects from embryonic development. Nevertheless, this approach entails an inherent limitation: the mosaic nature of the CAG-CreERTM system leads to substantial variability in both the extent and distribution of Sox9 inactivation among individual animals. We invested considerable effort over extended periods to obtain reliable and biologically meaningful data despite this variability. We did not proceed further because this mosaicism poses a significant limitation when attempting to dissect downstream mechanisms in a consistent and reproducible manner, making it extremely challenging to perform in-depth mechanistic studies.

      Regarding the cornea, given the absence of a clear phenotype upon Sox9 deletion, we expanded our investigation by adding lineage-tracing and transcriptomic analyses to better understand Sox9’s potential role in adult limbal epithelial stem cells. These additional experiments provided valuable insight into Sox9 function in the adult cornea, even in the absence of gross morphological changes. Thus, while the retinal and corneal data stem from different experimental approaches, they are unified by a shared goal: understanding the cell type-specific and tissue-specific functions of Sox9 in the adult eye.

      To ensure that other readers do not perceive this apparent disconnect, and to avoid overstating our conclusions, we have modified the manuscript. In the Introduction section, we have included the main findings from studies conducted to date on the role of Sox9 in the cornea and retina, and we have removed the corresponding section from the Discussion. We believe it is now clear that our study focuses on the role of Sox9 in the adult eye, in contrast to previous studies, which focused on the developing eye.

      In the Discussion section, we have added a new paragraph at the beginning and end that explicitly addresses the relationship between the retinal and limbal findings, illustrating how a single transcription factor can play distinct roles in different tissues within the same organ.

      Regarding the reviewer’s comment that the scRNA-seq analyses appear disconnected from the lineage-tracing data, we respectfully disagree. These analyses provide independent transcriptional confirmation that Sox9 is a marker of limbal stem cells, reinforcing the conclusions drawn from our in vivo experiments. These approaches are complementary and they converge on the same biological insight: Sox9 marks a population with stem-like properties in the adult limbus. Nevertheless, we acknowledge the reviewer’s concern and have moderated the tone of our statements in the revised version of the manuscript to better reflect the supporting nature of the scRNA-seq data, without overstating its functional implications.

      (2) A major concern is that a single Sox9∆/∆ limbal clone has a sufficiently large size, comparable to wild-type clones, as seen in Figure 6D. This singular result is contrary to their conclusion, which states that Sox9-deficient stem cells minimally contribute to the maintenance of the cornea.

      We thank the reviewer for this important observation.

      Ligand-independent activity of Cre-ER fusion proteins has been repeatedly reported in various mouse models (Vooijs et al., 2001; Kemp et al., 2004; Haldar et al., 2009). This basal recombinase activity is thought to arise from inappropriate nuclear translocation or proteolysis of the Cre-ER fusion protein, leading to low-level recombination even in the absence of tamoxifen. Consistent with this, prior studies using the same CAGG-CreERTM; R26R-LacZ system for clonal analysis in the cornea have observed sparse reporter expression before tamoxifen administration (Dorà et al., 2015).

      In line with these findings, we also detected minimal background LacZ staining in Sox9Δ/Δ-LacZ corneas (mean surface area: 0.85%; n = 8 eyes). This low-level staining likely reflects recombination events in transient amplifying or more differentiated cells, which are not expected to generate long-lived clones. However, in the rare instance of a large clone, as shown in Figure 6D, we believe that a spontaneous recombination event may have occurred in a bona fide limbal stem cell, giving rise to a sustained contribution. To rigorously address this potential artefact and assess the true contribution of Sox9-deficient stem cells, we conducted a comparative analysis of 8 control (Sox9Δ/+-LacZ) and 5 mutant (Sox9Δ/Δ-LacZ) corneas. This analysis revealed a highly significant 8-fold reduction in the LacZ-positive surface area in mutant samples (Sox9Δ/+-LacZ: 6.65 ± 1.77%; Sox9Δ/Δ-LacZ: 0.85 ± 0.85%; paired t-test, p = 0.00017; Figs. 6E and F; Table S12).
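      As a quick check of the arithmetic behind the "8-fold" figure, the ratio of the two quoted means can be computed directly. This is only an illustrative sketch: the function name `fold_reduction` is our own, and only the two mean values are taken from the response above.

```python
def fold_reduction(control_mean: float, mutant_mean: float) -> float:
    """Return how many times smaller the mutant mean is than the control mean."""
    if mutant_mean <= 0:
        raise ValueError("mutant mean must be positive")
    return control_mean / mutant_mean


# Mean LacZ-positive surface areas (%) as quoted in the response.
control_area = 6.65  # Sox9Δ/+-LacZ
mutant_area = 0.85   # Sox9Δ/Δ-LacZ

fold = fold_reduction(control_area, mutant_area)
print(f"{fold:.1f}-fold reduction")  # prints "7.8-fold reduction"
```

      The ratio of the means is about 7.8, which rounds to the 8-fold reduction reported.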

      We chose to include the image of the large clone in the main figure precisely because it does not align with our working hypothesis. We believe that showing such exceptions transparently is scientifically important and may be valuable for other researchers using similar inducible systems. Nonetheless, based on previous literature, the number of samples analyzed, and the statistically significant reduction in clonal contribution, we maintain that the observed phenotype reflects a true biological effect of Sox9 loss, supporting our conclusion that Sox9-deficient stem cells contribute minimally to corneal maintenance. To make that point clearer, we have introduced the following sentence in lines 462-464 of the revised version of the manuscript:

      “A possible explanation for this clone may be that spontaneous ligand-independent activity of Cre-ER fusion may have occurred in a bona fide limbal stem cell, as previously reported (Vooijs et al., 2001; Kemp et al., 2004; Haldar et al., 2009; Dorà et al., 2015).”

      Reviewer #2 (Public review):

      Sox9 is a transcription factor crucial for development and tissue homeostasis, and its expression continues in various adult eye cell types, including retinal pigmented epithelium cells, Müller glial cells, and limbal and corneal basal epithelia. To investigate its functional roles in the adult eye, this study employed inducible mouse mutagenesis. Adult-specific Sox9 depletion led to severe retinal degeneration, including the loss of Müller glial cells and photoreceptors. Further, lineage tracing revealed that Sox9 is expressed in a basal limbal stem cell population that supports stem cell maintenance and homeostasis. Mosaic analysis confirmed that Sox9 is essential for the differentiation of limbal stem cells. Overall, the study highlights that Sox9 is critical for both retinal integrity and the differentiation of limbal stem cells in the adult mouse eye.

      Strengths:

      In general, inducible genetic approaches in the adult mouse nervous system are rare and difficult to carry out. Here, the authors employ tamoxifen-inducible mouse mutagenesis to uncover the functional roles of Sox9 in the adult mouse eye.

      Careful analysis suggests that two degeneration phenotypes (mild and severe) are detected in the adult mouse eye upon tamoxifen-dependent Sox9 depletion. Phenotype severity nicely correlates with the efficiency of Cre-mediated Sox9 depletion.

      Molecular marker analysis provides strong evidence of Mueller cell loss and photoreceptor degeneration.

      A clever genetic tracing strategy uncovers a critical role for Sox9 in limbal stem cell differentiation.

      Weaknesses:

      (1) The Introduction can be improved by explaining clearly what was previously known about Sox9 in the eye. A lot of this info is mentioned in a single, 3-page long paragraph in the Discussion. However, the current study's significance and novelty would become clearer if the authors articulated in more detail in the Introduction what was already known about Sox9 in retina cell types (in vitro and in vivo).

      We appreciate this insightful comment. Following the reviewer's suggestion, we have reorganized the manuscript to provide a clearer scientific context in the Introduction. Specifically, we have moved the relevant background information on Sox9 in different retinal cell types, previously included in a single, extended paragraph in the Discussion, into the Introduction. This helps to better frame our study within the context of existing knowledge.

      Additionally, we have emphasized more explicitly that our work does not focus on embryonic development, as most previous studies on Sox9 have done, but instead investigates its role in the adult mouse retina and limbus/cornea. We believe this represents an important and novel aspect of our study, as the mechanisms of retinal maintenance and limbal stem cell differentiation in the adult have been less extensively studied.

      (2) Because a ubiquitous tamoxifen-inducible CreER line is employed, non-cell autonomous mechanisms possibly contribute to the observed retina degeneration. There is precedence for this in the literature. For example, RPE-specific ablation of Otx2 results in photoreceptor degeneration (PMID: 23761884). Have the authors considered the possibility of non-cell autonomous effects upon ubiquitous Sox9 deletion?

      Given the similar phenotypes between animals lacking Otx2 and Sox9 in specific cell types of the eye, the authors are encouraged to evaluate Otx2 expression in the tamoxifen-induced Sox9 adult retina.

      We appreciate the insightful comment of the reviewer regarding the potential contribution of non-cell autonomous mechanisms to the retinal degeneration observed upon ubiquitous Sox9 deletion. We agree that this is an important consideration, particularly in the context of findings showing that RPE-specific deletion of Otx2 results in secondary photoreceptor degeneration.

      However, we would like to emphasize that RPE-specific deletion of Sox9 does not lead to photoreceptor loss or retinal degeneration, as previously shown (Masuda et al., 2014; Goto et al., 2018; Cohen-Tayar et al., 2018) [PMID: 24634209; PMID: 29609731; PMID: 29986868]. In addition, it was shown that Sox9 deletion in the RPE caused downregulation of visual cycle genes but did not compromise photoreceptor integrity or survival. Interestingly, Otx2 expression was found to be upregulated in the absence of Sox9, further supporting the view that Sox9 is not a simple upstream regulator of Otx2 in the adult RPE (Masuda et al., 2014). These findings suggest that RPE dysfunction alone cannot account for the severe retinal phenotype we observe in our model.

      In our study, we observed that photoreceptor degeneration correlates strongly with the depletion of Sox9-positive Müller glial cells. Given the well-established supportive and neuroprotective roles of Müller glia, we interpret the retinal degeneration in our model to be primarily a consequence of Müller cell dysfunction (confirmed by the loss of Müller glia markers, such as SOX8 and S100). This interpretation is further supported by previous studies showing that selective ablation of Müller glia can lead to photoreceptor degeneration through non-cell-autonomous mechanisms (Shen et al., 2012) [PMID: 23136411].

      Nevertheless, we agree that this possibility deserves further investigation, and we have acknowledged it in the following paragraph that has been added to the Discussion section (lines 511-523 of the revised ms):

      “An important consideration in our model is the potential contribution of non-cell autonomous mechanisms to photoreceptor degeneration. Sox9 is expressed in both MG and RPE cells, and both cell types are known to support photoreceptor viability (Poché et al., 2008; Masuda et al., 2014). Notably, Sox9 and Otx2 cooperate to regulate visual cycle gene expression in the RPE (Masuda et al., 2014), and loss of Otx2 specifically in the adult RPE leads to secondary photoreceptor degeneration through non-cell autonomous mechanisms (Housset et al., 2013). However, RPE-specific deletion of Sox9 does not induce retinal degeneration and in fact results in Otx2 upregulation (Masuda et al., 2014; Goto et al., 2018; Cohen-Tayar et al., 2018), suggesting that Sox9 is not an upstream regulator of Otx2 in this context. Further investigation into the molecular and cellular interactions between MG, RPE, and photoreceptors may help to clarify the indirect pathways contributing to degeneration in the absence of Sox9.”

      Consistent with the above, a new citation has been included:

      Housset M, Samuel A, Ettaiche M, Bemelmans A, Béby F, Billon N, Lamonerie T. 2013. Loss of Otx2 in the adult retina disrupts retinal pigment epithelium function, causing photoreceptor degeneration. J Neurosci 33:9890–904. doi:10.1523/JNEUROSCI.1099-13.2013.

      (3) The most parsimonious explanation for the dual role of Sox9 in retinal cell types and limbal stem cells is that the cell context is different. For example, Sox9 may cooperate with TF1 in photoreceptors, TF2 in Mueller cells, and TF3 in limbal stem cells, and such cell type-specific cooperation may result in different outcomes (retinal integrity, stem cell differentiation). The authors are encouraged to add a paragraph to the discussion and share their thoughts on the dual role of Sox9.

      We thank the reviewer for this thoughtful and constructive suggestion. We have added a paragraph at the end of the Discussion addressing the potential dual role of Sox9 in the cornea and retina. In this new section, we discuss how Sox9 might exert distinct functions depending on the cellular context, possibly through interactions with different transcriptional partners in specific cell types. This may help explain the contrasting roles of Sox9 in maintaining retinal integrity versus regulating stem cell differentiation in the limbal epithelium.

      (4) One more molecular marker for Mueller glial cells would strengthen the conclusion that these cells are lost upon Sox9 deletion.

      We thank the reviewer for this constructive suggestion. To reinforce our conclusion that most Müller glial cells are lost following Sox9 deletion, we analysed the expression of S100, a well-established cytoplasmic marker of Müller glia. As S100 is primarily localized to the innermost Müller cell processes and not restricted to cell bodies, direct cell counting was not feasible. Instead, we quantified the S100+ signal intensity across defined retinal surface areas. This analysis revealed a statistically significant reduction in S100 signal in Sox9<sup>Δ/Δ</sup> retinas compared to controls. These new data, included in the revised Figure 1 (panels F and G), support and extend our previous observations using SOX8, further confirming the loss of Müller glial cells in Sox9-deficient retinas.

      We have also modified the manuscript based on this new evidence as follows:

      In the Results section, lines 168-177 of the revised ms, we have added the following paragraph: “To independently validate the loss of MG cells in Sox9-deficient retinas, we examined the expression of S100, a cytoplasmic marker that labels the processes of adult Müller cells. In control retinas, strong S100 immunoreactivity was observed across the inner retina, outlining the typical radial projections of Müller glia (Fig. 1F). In contrast, Sox9Δ/Δ retinas with an extreme phenotype exhibited a marked reduction in S100 signal (Fig. 1G). Given the diffuse cytoplasmic localization of S100, we quantified its expression by measuring the fluorescence signal within a defined surface area of the retina. This analysis revealed a statistically significant reduction in S100 signal intensity in mutant samples (including both mild and extreme phenotypes) compared to controls (Fig. 1G; Table S4), further supporting the loss of MG cells upon Sox9 deletion.”

      In Methods, line 684 of the revised ms, the anti-S100 antibody reference and its working dilution have been added.

      (5) Using opsins as markers, the authors conclude that the photoreceptors are lost upon Sox9 deletion. However, an alternate possibility is that the photoreceptors are still present and that Sox9 is required for the transcription of opsin genes. In that case, Sox9 (like Otx2) may act as a terminal selector in photoreceptor cells. This point is particularly important because vertebrate terminal selectors (e.g., Nurr1, Otx2, Brn3a) initially affect neuron type identity and eventually lead to cell loss.

      We fully understand the reviewer’s point. However, we believe that the possibility that Sox9 regulates opsin gene expression without affecting photoreceptor survival is very unlikely in our model. The primary evidence comes from the histological analysis shown in Figure 1B, where hematoxylin and eosin staining clearly demonstrates the complete loss of the ONL in Sox9<sup>Δ/Δ</sup> retinas exhibiting the extreme phenotype. Similarly, DAPI counterstaining also shows the absence of the ONL in many of our immunofluorescence images of these samples. This morphological disappearance of the ONL strongly supports the conclusion that photoreceptor cells are not merely transcriptionally silent but are physically absent.

      Furthermore, TUNEL assays in two retinas with a mild phenotype revealed extensive apoptosis within the ONL, suggesting a progressive degeneration process rather than a selective transcriptional effect. While we acknowledge that transcriptional regulation of opsin genes by Sox9 cannot be entirely ruled out, the observed phenotype is more consistent with a structural loss of photoreceptors rather than a change in their molecular identity alone. Therefore, our data support the interpretation that Sox9 is required for photoreceptor survival, likely through non-cell autonomous mechanisms related to Müller glia dysfunction, rather than acting as a terminal selector within photoreceptor cells themselves.

      (6) Quantification is needed for the TUNEL and GFAP analysis in Figure 3.

      We have quantified the GFAP immunofluorescence signal across defined surface areas of the retina and found a statistically significant increase in GFAP expression in Sox9<sup>Δ/Δ</sup> mutants compared to controls (Mann-Whitney U test, P = 0.0240; n = 4 controls, 10 mutants). These quantification data are now included in the revised Figure 3.
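      For readers less familiar with the test named above, the following minimal sketch shows how a two-sided Mann-Whitney U comparison between a small control group and a larger mutant group can be computed by exact enumeration. The response does not state which software or which variant (exact or asymptotic) was used, so this is an assumption-laden illustration rather than the authors' procedure, and the intensity values are hypothetical placeholders chosen only to match the group sizes (4 controls, 10 mutants), not data from the study.

```python
from itertools import combinations


def mann_whitney_u(x, y):
    """U statistic for x: count of (xi, yj) pairs with xi > yj; ties count 0.5."""
    return sum(
        1.0 if xi > yj else 0.5 if xi == yj else 0.0
        for xi in x
        for yj in y
    )


def exact_two_sided_p(x, y):
    """Exact permutation p-value for the Mann-Whitney U test.

    Enumerates every way of relabeling the pooled values into groups of the
    original sizes and returns the fraction whose U lies at least as far from
    the null mean (n_x * n_y / 2) as the observed U.
    """
    pooled = list(x) + list(y)
    n_x = len(x)
    null_mean = n_x * len(y) / 2
    observed = abs(mann_whitney_u(x, y) - null_mean)
    extreme = 0
    total = 0
    for idx in combinations(range(len(pooled)), n_x):
        chosen = set(idx)
        group_x = [pooled[i] for i in idx]
        group_y = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        total += 1
        if abs(mann_whitney_u(group_x, group_y) - null_mean) >= observed - 1e-12:
            extreme += 1
    return extreme / total


# Hypothetical area-normalized GFAP intensities (arbitrary units), NOT study data.
controls = [10.2, 11.5, 9.8, 12.1]
mutants = [13.0, 15.4, 11.9, 18.2, 16.7, 14.1, 12.8, 17.5, 19.0, 10.9]

u_controls = mann_whitney_u(controls, mutants)
p_value = exact_two_sided_p(controls, mutants)
print(f"U (controls) = {u_controls}, exact two-sided p = {p_value:.4f}")
```

      Exact enumeration is practical here because 4 versus 10 samples give only 1001 possible relabelings; statistical packages typically switch to a normal approximation for larger groups.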

      Regarding the TUNEL assay, although extensive apoptosis was clearly observed in two Sox9<sup>Δ/Δ</sup> retinas with a mild phenotype (as shown in Figure 3A), this pattern was not consistent across the full study mouse cohort. Out of 15 mutant samples analyzed (5 of them previously analyzed and 10 additional ones that have been newly analyzed), only two exhibited this pronounced apoptotic pattern. However, in the remaining 13 mutants, we did observe a small but statistically significant increase in the number of TUNEL+ cells compared to controls (zero-inflated Poisson test, P = 0.028, n = 5 controls, 13 mutants). These results are now included in Figure 3 and in Tables S7 and S8.

      This pattern likely reflects the transient nature of apoptosis in the degenerative process, which may occur rapidly and thus be difficult to capture consistently at a single time point. Nevertheless, the quantification supports our conclusion that Sox9 loss is associated with increased photoreceptor cell death.
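      To illustrate why a zero-inflated Poisson model suits counts with many zeros, the sketch below implements the textbook probability mass function: with probability `pi` a count is a structural zero, and otherwise it follows a Poisson distribution with mean `lam`. The response does not describe how the model was parameterized or fitted, so the function and the parameter values here are hypothetical and serve only to show the shape of the distribution.

```python
import math


def zip_pmf(k: int, lam: float, pi: float) -> float:
    """Zero-inflated Poisson probability of observing count k.

    pi  : probability of a structural (excess) zero, 0 <= pi <= 1
    lam : mean of the Poisson component, lam > 0
    """
    if k < 0:
        return 0.0
    poisson = math.exp(-lam) * lam ** k / math.factorial(k)
    structural_zero = pi if k == 0 else 0.0
    return structural_zero + (1.0 - pi) * poisson


# Hypothetical parameters, chosen only for illustration.
lam, pi = 2.0, 0.6

plain_zero = math.exp(-lam)            # P(0) under an ordinary Poisson
inflated_zero = zip_pmf(0, lam, pi)    # P(0) with zero inflation
print(f"P(0) Poisson = {plain_zero:.3f}, P(0) ZIP = {inflated_zero:.3f}")
```

      With these illustrative parameters the zero-inflated model assigns far more probability to a count of zero than a plain Poisson with the same mean parameter, which is the situation described for TUNEL counts dominated by zeros.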

      Based on the above, we have included the following paragraphs in the Results section of the manuscript:

      In lines 224-252 of the revised ms, the final version of the paragraph is as follows: “Since photoreceptors are absent in severely affected Sox9-mutant retinas, we conducted TUNEL assays to study the role of cell death in the process of retinal degeneration. In control samples (n=5), almost no TUNEL signal was observed in the retina. In contrast, Sox9<sup>Δ/Δ</sup> mice (n=15) showed numerous TUNEL+ cells, mainly located in the persisting ONL, indicating that photoreceptor cells were dying (Fig. 3A). Although extensive TUNEL staining in the ONL was clearly observed in two Sox9<sup>Δ/Δ</sup> retinas with mild phenotypes, this pattern was not consistently present across the full cohort. In the remaining 13 mutant retinas, we observed a modest but noticeable increase in the number of apoptotic cells compared to controls (Fig. 3B; Table S7). Despite a high frequency of zero counts (particularly among controls), the difference between groups reached statistical significance when analyzed using a zero-inflated Poisson model (P = 0.028; n = 5 controls, 13 mutants). These findings suggest that photoreceptor apoptosis following Sox9 deletion may occur acutely and within a narrow temporal window, making it challenging to capture the full degenerative process at a single time point”.

      Lines 263-269 of the revised ms: “To support these observations quantitatively, we measured GFAP fluorescence intensity across defined retinal surface areas in control and Sox9<sup>Δ/Δ</sup> mice (Fig. 3D; Table S8). This analysis revealed a statistically significant increase in GFAP signal in mutant retinas compared to controls (Mann-Whitney U test, P = 0.0240; n = 4 controls, 10 mutants). These results are consistent with a progressive gliotic response following Sox9 deletion and provide further evidence that MG cells become reactive in the absence of Sox9”.

      Similarly, the section “Estimation of the percentage of tamoxifen-induced, Cre-mediated recombination” has been expanded as follows:

      Lines 660-665 of the revised ms: “In parallel, to quantify GFAP expression as a measure of MG reactivity, we analyzed GFAP immunofluorescence intensity across defined retinal surface areas. Given the cytoplasmic distribution of GFAP within glial processes, direct cell counting was not feasible. Instead, fluorescence intensity was measured using ImageJ, within full-thickness retinal regions in 20x microphotographs of retinal sections stained for GFAP. The total GFAP signal was normalized to the measured area for each section”.
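
The quantification described above (total GFAP signal divided by measured area, followed by a Mann-Whitney U comparison) can be sketched in a few lines. This is an illustrative sketch only, not the analysis pipeline used for the manuscript, which relied on ImageJ measurements. The function names `normalized_intensity`, `midranks` and `mann_whitney_exact` are hypothetical, and the exact two-sided P value is obtained here by enumerating all rank assignments, which is an assumption about how one might compute it for groups this small.

```python
from itertools import combinations


def normalized_intensity(total_signal, area):
    """Total fluorescence signal divided by the measured area of the section."""
    return total_signal / area


def midranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return ranks


def mann_whitney_exact(x, y):
    """Return (U for group x, exact two-sided P) by enumerating rank subsets."""
    ranks = midranks(list(x) + list(y))
    n1, n = len(x), len(ranks)
    observed = sum(ranks[:n1])
    expected = n1 * (n + 1) / 2  # mean rank sum of group x under the null
    deviation = abs(observed - expected)
    hits = total = 0
    for subset in combinations(range(n), n1):
        total += 1
        rank_sum = sum(ranks[i] for i in subset)
        if abs(rank_sum - expected) >= deviation - 1e-9:
            hits += 1
    u_statistic = observed - n1 * (n1 + 1) / 2
    return u_statistic, hits / total
```

With 4 controls and 10 mutants, as in the GFAP comparison, the enumeration covers only 1,001 subsets, so an exact test is cheap; larger groups would call for a normal approximation or a statistics library instead.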

      (7) Line 269-320: The authors examined available scRNA-Seq data on adult retina. This data provides evidence for Sox9 expression in distinct cell types. However, the dataset does not inform about the functional role of Sox9 because Sox9 mutant cells were not analyzed with RNA-Seq. Hence, all the data that claim that this experiment provides insights into possible Sox9 functional roles must be removed. This includes panels F, G, and H in Figure 5. In general, this section of the paper (Lines 269-320) needs a major revision. Similarly, lines 442-454 in the Discussion should be removed.

      We thank the reviewer for this important observation. We agree that the scRNA-Seq dataset used in this section does not include Sox9 mutant cells and therefore does not allow us to assess the consequences of Sox9 loss-of-function. However, we believe that this analysis still provides valuable complementary information. Specifically, it confirms that Sox9 is expressed in a distinct population of limbal stem cells, and that its expression dynamically changes along differentiation trajectories. Although we do not infer causality or phenotypic consequences, the ability to observe how gene expression programs shift as Sox9 is downregulated offers insights into potential transcriptional programs associated with Sox9 activity.

      We have carefully revised Lines 269–320 to remove any overinterpretations, and eliminated the corresponding lines in the Discussion (Lines 442–454). However, we have retained Panels G and H in Figure 5 with updated text that reflects the descriptive nature of these findings, specifically to illustrate that the Sox9-positive cell signature is consistent with a stem cell genetic program, and that when Sox9 is downregulated some gene pathways involved in stem cell differentiation are upregulated.

      Reviewer #1 (Recommendations for the authors):

      Major points

      (1) Figure 1C shows the proportions of Sox9+cells that express Sox8 in control, mild and extreme phenotypes. However, as no quantitative classification of mild and extreme phenotypes is reported along with Figure 1A, the large standard deviation for Sox9∆/∆ mild retina might be due to a misclassification of the sample. Therefore, the authors must ascribe each sample to "mild" or "extreme" based on a quantitative metric.

      We appreciate the reviewer’s suggestion to clarify the classification criteria used to distinguish “mild” and “extreme” phenotypes in Sox9<sup>Δ/Δ</sup> retinas. As noted, our classification was based on a qualitative, phenotypic assessment of retinal morphology in hematoxylin/eosin-stained sections. Specifically, retinas were classified as “extreme” when the outer nuclear layer (ONL) was completely absent, and as “mild” when the ONL was present, although often reduced in thickness. This classification reflects the observable structural depletion of the ONL and aligns well with the extent of Sox9 loss in Müller glial cells, as shown in Figure 1. We acknowledge that some variability exists within the “mild” group, likely due to differences in recombination efficiency and the mosaic nature of tamoxifen-induced deletion.

      The phenotypic classification of each individual sample is explicitly provided in Supplementary Table S1. We have also added a statement in the Results section clarifying that this classification was based on qualitative histological criteria rather than a numerical threshold.

      Lines 104-113 of the revised ms: “We categorized Sox9<sup>Δ/Δ</sup> retinas into “mild” and “extreme” phenotypes in order to facilitate interpretation of our data. Classification was based on a qualitative assessment of ONL integrity in histological sections. Specifically, samples were classified as “extreme” when the ONL was completely depleted, and as “mild” when the ONL persisted, albeit variably reduced in thickness. This phenotypic classification reflects observable structural differences rather than a fixed quantitative threshold. Some variability exists within the “mild” group, likely due to differences in recombination efficiency and the mosaic nature of tamoxifen-induced Cre-mediated Sox9 deletion.”

      (2) The authors infer Sox9 high and Sox9 low groups of limbal stem cells using an existing scRNA-seq dataset. However, an immunohistology-based validation of this difference is missing. Given that limbal stem cells express Sox9, the authors must examine the heterogeneity in Sox9 levels within the Sox8+ population to demonstrate their claim: "...Sox9 expression decreases as transiently amplifying progenitors undergo progressive differentiation from limbal to peripheral corneal cells." in Line 292. Ideally, this must be further validated using differentiation markers corresponding to CB and ILB populations that show lower Sox9 expression according to the pseudotime graph.

      To validate the Sox9 expression results obtained with scRNA-seq, we performed double immunofluorescence for Sox9 and P63, the latter expressed by the basal cells of the limbal epithelium, but not by transient amplifying cells covering the corneal surface (Pellegrini et al., 2001, https://www.pnas.org/doi/abs/10.1073/pnas.061032098). These results can be observed in the new panel 5F. Accordingly, we have included a new paragraph in lines 369-396 of the revised version of the ms:

      “To validate these results, we decided to closely examine Sox9 expression in the limbus using immunofluorescence. Previous analyses revealed that the outer limbus is approximately 100 μm wide, while the inner limbus is wider, around 240 μm (Altshuler 2021). We observed that in the region corresponding to the OLB, most cells showed strong Sox9 expression. In the area corresponding to the ILB, this immunoreactivity appeared weaker in the basal layer (corresponding to the ILB proper), and no expression was detected in the suprabasal layers (flattened cells; Fig 5F left). Double immunofluorescence for SOX9 and P63, which is expressed in basal cells of the limbal epithelium, but not by transient amplifying cells covering the corneal surface (Pellegrini et al., 2001), revealed that Sox9 expression was restricted to P63-positive cells (Fig 5F right). These observations confirm that Sox9 is expressed in a basal cell population within both the OLB and ILB, and that its expression decreases in differentiated transient amplifying cells.”

      We have also deleted “This expression pattern is consistent with our immunofluorescence observations" from line 356 of the revised ms.

      (3) The authors' claim of "...Sox9-null cells cannot survive or proliferate as well as their wildtype neighbors, and are hence outcompeted over time, leading to an essentially wild-type cornea" does not seem very convincing in the light of Fig.6D and S3B where Sox9 deletion can still allow for a large LacZ+ clone. Their claim of wild-type cornea due to out-competing neighbors must be validated by increasing the number of Sox9-null progenitors, which can be tested by administering tamoxifen for a significantly longer duration, leading to a majority Sox9 deficient progenitor population, and then examining limbal and corneal defects.

      As previously discussed, we observed only one instance of a large LacZ+ clone across 8 Sox9<sup>Δ/Δ</sup>-LacZ eyes. Based on prior reports of ligand-independent Cre activity (Vooijs et al., 2001; Kemp et al., 2004; Haldar et al., 2009; Dorà et al., 2015), we believe this rare event likely resulted from spontaneous recombination in a bona fide limbal stem cell, independent of tamoxifen administration. For this reason, we do not expect that increasing the dose or duration of tamoxifen would eliminate such rare events. Furthermore, due to the mosaic and highly variable recombination efficiency of the CAGG-CreERTM system in the adult eye (see McMahon et al., 2008), attempting to increase the TX dosage would likely lead to systemic toxicity or lethality, without guaranteeing full inactivation of the gene in the limbus. Thus, this system is not well-suited for generating a fully Sox9-deficient limbal epithelium. To overcome this limitation, we crossed our mice with the R26R-LacZ reporter line to track the clonal behavior of Sox9-deficient cells. In control animals (Sox9<sup>Δ/+</sup>-LacZ), LacZ+ stripes originating from limbal stem cells are readily observed. In contrast, in Sox9<sup>Δ/Δ</sup>-LacZ mutants, these clones are either absent or drastically reduced. This suggests that Sox9-null cells have a severely impaired ability to form and sustain clones. To rigorously quantify this effect, we compared 8 control and 5 mutant corneas, revealing a highly significant 8-fold reduction in LacZ-positive area in the mutants (6.65 ± 1.77% vs. 0.85 ± 0.85%; p = 0.00017; Fig. 6F; Table S12; Supp. Fig. X???), supporting our claim that Sox9-null cells cannot survive or proliferate as well as their wild-type neighbors, and are hence outcompeted over time, leading to an essentially wild-type cornea.

      Minor points

      (1) Quantification for Figure 2C and 2D is missing.

      We have now included quantification of BRN3A+ retinal ganglion cells (Figure 2E) across control and Sox9<sup>Δ/Δ</sup> retinas. Cell counts were performed on matched retinal sections, and the difference between groups was found to be statistically significant through Mann–Whitney U test (Table S5).

      Regarding PAX6/AP2a, we quantified inner retinal neurons by analyzing AP2α+ amacrine cells and PAX6+/AP2α- horizontal cells as distinct subpopulations, rather than simply comparing total PAX6 or AP2α immunoreactivity. This approach allowed us to better resolve specific neuronal subtype changes. Both populations showed a statistically significant reduction in Sox9-deficient retinas relative to controls. The quantification for these analyses has now been incorporated into the revised Figure 2F and G (Table S6).

      In line with the above, the following paragraph of the Results section (line 210 of the revised ms):

      “We also studied the status of other retinal cell types. The transcription factor BRN3A was used to identify ganglion cells (Nadal-Nicolás et al., 2009), which were shown to decrease in number in the mutant retinas, compared to control ones (Fig. 2C). Similarly, double immunodetection of the transcription factors PAX6 and AP2A was used to identify both amacrine and horizontal cells, as previously described (Marquardt et al., 2001; Barnstable et al., 1985; Edqvist and Hallböök, 2004), showing a similar reduction in both cell types in degenerated retinas (Fig. 2D).”

      Has been modified as follows:

      “We also studied the status of other retinal cell types. The transcription factor BRN3A was used to identify ganglion cells (Nadal-Nicolás et al., 2009), which were shown to decrease in number in the mutant retinas, compared to control ones (Figs. 2C and 2D and Table S5; n = 5 controls, n = 12 mutants; Mann-Whitney U test, P = 3 × 10<sup>-4</sup>). Similarly, double immunodetection of the transcription factors PAX6 and AP2A was used to identify both amacrine and horizontal cells (Fig. 2E), as previously described (Marquardt et al., 2001; Barnstable et al., 1985; Edqvist and Hallböök, 2004), showing a similar reduction in both cell types in degenerated retinas (Figs. 2F and 2G and Table S6; AP2α+ amacrine cells: n = 3 controls, n = 8 mutants; two-sample t-test, P = 0.029; PAX6+/AP2α− horizontal cells: n = 3 controls, n = 8 mutants; Mann-Whitney U test, P = 0.021). These findings indicate that the loss of Sox9 in the adult retina ultimately leads to the degeneration of multiple inner retinal neuronal populations, beyond the previously described effects on photoreceptors and Müller glia.”

      (2) Figure 4G & H: The authors must mention that the dashed lines enclose the limbal area.

      Done

      (3) The authors infer from an existing scRNA-seq dataset that OLB cells have high Sox9 expression as compared to ILB and corneal populations. However, Figures 4A and B do not indicate the anatomical positions of these cell types. The authors must label these for the reader's reference as they state that "[Sox9] expression pattern is consistent with our immunofluorescence observations" in Line 280.

      As previously indicated, we have generated a new panel 5F and a corresponding paragraph to illustrate Sox9 expression pattern in the limbus. Accordingly, we have removed the sentence from line 280.

      (4) Quantification for Figures 6A and 6B is missing.

      We have quantified the number of Sox9 and P63 positive cells in the limbus between mutant and control corneas and found no difference in the number of positive cells. We have included these data in new panel 6C and Table S11.

      Reviewer #2 (Recommendations for the authors):

      Line 24: "synapsis" should be "synapses".

      Done

      (1) Consider starting a new paragraph after line 30.

      Done

      (2) Lines 42-48: make clear that this paragraph provides information only for HUMAN SOX9.

      We now distinguish which studies were conducted in humans and which in mice.

      (3) Line 55: explain to the non-expert reader what the "visual cycle" is.

      Done (lines 64-65 of the revised ms)

      (4) Line 66: consider "inactivate" instead of "suppress".

      We substituted “suppress” with “inactivate”

      (5) Line 90-92: ONLY PCR for the cGMP will provide formal evidence that this is not present in the mouse line.

      We agree with the reviewer that PCR genotyping is the most straightforward method to exclude the presence of the Pde6b<sup>rd1</sup> allele. Although retinal degeneration was never observed in untreated or control animals in our study, we have now removed the term “formal possibility” from the text to better reflect this limitation.

      We have modified the following paragraph (page 116 in the revised version of the manuscript): “Retinal degeneration was never observed in mice that had not been tamoxifen-treated, nor any other controls, eliminating the formal possibility that the retinal degeneration allele of photoreceptor cGMP phosphodiesterase 6b (Pde6brd1) was present in our mice (Bowes et al., 1990).”

      As follows: “Retinal degeneration was never observed in mice that had not been tamoxifen-treated, nor any other control groups, making the presence of the retinal degeneration allele of photoreceptor cGMP phosphodiesterase 6b (Pde6b<sup>rd1</sup>) unlikely in our mice (Bowes et al., 1990). However, we acknowledge that definitive exclusion of this possibility would require PCR-based genotyping.”

      (6) Line 160-166: This paragraph needs a conclusion.

      We agree with the reviewer and have added the following sentence at the end of the paragraph:

      “These findings indicate that the loss of Sox9 in the adult retina ultimately leads to the degeneration of multiple inner retinal neuronal populations, beyond the previously described effects on photoreceptors and Müller glia”

      (7) Line: 240-265: This paragraph ends without a conclusion.

      We have included the following conclusion:

      “Thus, Sox9 is expressed in a basal limbal stem cell population with the ability to form two types of long-lived cell clones involved in stem cell maintenance and homeostasis.”

      (8) In Results, it needs to be specified when exactly in adulthood the tamoxifen treatment started. This information is only provided in the Methods.

      We have specified the age of the mice at the onset of tamoxifen treatment (two months) and included it in the schemes of Figs 1A, 4C, 4H, 6E.

      (9) Line 250: Because live imaging is not conducted, the word "dynamics" is not suitable.

      We substituted “dynamics” with “contribution”

      (10) Panel C in Figure 6 is nice and helpful. Consider adding a similar panel in Figure 1.

      Done.

      (11) Line 420: is this the human Sox9 enhancer?

      Yes. It is a human enhancer. We have indicated it in the text.

      (12) Line 459: typo "detected tissue".

      Corrected

      (13) Line 448 and 468: citations are needed.

      Line 448 is deleted in the revised version of the ms.

      (14) 479: typo "clones clones'.

      Corrected.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      This work computationally characterized the threat-reward learning behavior of mice in a  recent study (Akiti et al.), which had prominent individual differences. The authors  constructed a Bayes-adaptive Markov decision process model and fitted the behavioral data  by the model. The model assumed (i) hazard function starting from a prior (with free mean  and SD parameters) and updated in a Bayesian manner through experience (actually no real  threat or reward was given in the experiment), (ii) risk-sensitive evaluation of future  outcomes (calculating lower 𝛼 quantile of outcomes with free 𝛼 parameter), and (iii) heuristic  exploration bonus. The authors found that (i) brave animals had more widespread hazard  priors than timid animals and thereby quickly learned that there was in fact little real threat,  (ii) brave animals may also be less risk-aversive than timid animals in future outcome  evaluation, and (iii) the exploration bonus could explain the observed behavioral features,  including the transition of behavior from the peak to steady-state frequency of bout. Overall,  this work is a novel interesting analysis of threat-reward learning, and provides useful  insights for future experimental and theoretical work. However, there are several issues that I  think need to be addressed.

      Strengths:

      (1) This work provides a normative Bayesian account for individual differences in  braveness/timidity in reward-threat learning behavior, which complements the analysis by  Akiti et al. based on model-free threat reinforcement learning.

      (2) Specifically, the individual differences were characterized by (i) the difference in the  variance of hazard prior and potentially also (ii) the difference in the risk-sensitivity in the  evaluation of future returns.

      Weakness:

      (1) Theoretically the effect of prior is diluted over experience whereas the effect of biased  (risk-aversive) evaluation persists, but these two effects could not be teased apart in the  fitting analysis of the current data.

      (2) It is currently unclear how (whether) the proposed model corresponds to neurobiological (rather than behavioral) findings, different from the analysis by Akiti et al.
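
The observation in Weakness (1), that the effect of a prior is diluted over experience, can be illustrated with a toy calculation. The sketch below assumes a Beta prior over a per-bout hazard probability, parameterized by its mean and SD, and updated after bouts in which no threat materializes. This Beta-Bernoulli form and the function names `beta_from_mean_sd` and `posterior_hazard_mean` are assumptions made for illustration; they are not the hazard function of the model under review.

```python
def beta_from_mean_sd(mean, sd):
    """Convert a mean and SD on (0, 1) into Beta(a, b) shape parameters."""
    variance = sd ** 2
    if not 0 < mean < 1 or variance >= mean * (1 - mean):
        raise ValueError("mean and sd do not describe a valid Beta distribution")
    concentration = mean * (1 - mean) / variance - 1  # equals a + b
    return mean * concentration, (1 - mean) * concentration


def posterior_hazard_mean(mean, sd, safe_bouts):
    """Posterior mean hazard after `safe_bouts` bouts with no threat observed."""
    a, b = beta_from_mean_sd(mean, sd)
    return a / (a + b + safe_bouts)


# Two hypothetical animals with the same prior mean but different prior spread.
wide = posterior_hazard_mean(mean=0.5, sd=0.25, safe_bouts=10)    # about 0.12
narrow = posterior_hazard_mean(mean=0.5, sd=0.05, safe_bouts=10)  # about 0.45
```

Under these assumptions, the animal with the widespread prior revises its hazard estimate far more after the same ten safe bouts, whereas a risk-averse evaluation rule would keep biasing choices however much evidence accumulates.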

      We thank reviewer #1 for their useful feedback which we’ve used to improve the discussion,  formatting and clarity of the paper, and for highlighting important questions for future  extensions of our work.

      Major points:

      (1) Line 219

      It was assumed that the exploration bonus was replenished at a steady rate when the animal  was at the nest. An alternative way would be assuming that the exploration bonus slowly  degraded over time or experience, and if doing so, there appears to be a possibility that the  transition of the bout rate from peak to steady-state could be at least partially explained by  such a decrease in the exploration bonus.

      Section 2.2.3 explains the mechanism of the exploration bonus which motivates approach.  We think that the mechanism suggested by the reviewer is, in essence, what is happening in  the model. The exploration pool is indeed depleted over time or bouts of experience at the  object. In the peak confident phase for brave animals and the peak cautious phase for timid  animals, the rate of depletion exceeds the rate of regeneration, since the agent spends only  a single turn at the nest between bouts. In the steady-state phase, the exploration pool has  depleted so much previously that the agent must wait multiple turns at the nest for the pool  to regenerate to a sufficiently high value to justify approaching the object again.

      We have updated section 2.2.3 to explain that agents spend one turn at the nest during peak  phase but multiple turns during steady-state phase. Hopefully, this makes our mechanism  clear:

      “In simulations, when 𝐺(𝑡) is high, the agent has a high motivation to explore the object, spending only a single turn in the nest state between bouts. In other words, the depletion from 𝐺0 substantially influences the time point at which approach makes a transition from peak to steady-state; the steady-state time then depends on the dynamics of depletion (when at the object) and replenishment (when at the nest). In particular, in the steady-state phases, the agent must wait multiple turns at the nest for 𝐺(𝑡) to regenerate so that informational reward once again exceeds the potential cost of hazard.”
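
The depletion and replenishment mechanism described above can be made concrete with a toy simulation. This is a minimal sketch under strong assumptions: the pool loses a fixed amount per bout, gains a fixed amount per nest turn, is capped at its initial value, and the agent approaches whenever the pool reaches a fixed threshold that stands in for the cost of hazard. The real model plans in a Bayes-adaptive MDP, so the threshold rule, the parameter values and the function name `simulate_nest_waits` are all hypothetical.

```python
def simulate_nest_waits(g0=10.0, depletion=3.0, replenish=1.0,
                        threshold=4.0, n_bouts=8):
    """Return the number of nest turns spent before each approach bout."""
    pool = g0
    waits = []
    for _ in range(n_bouts):
        turns = 0
        while True:
            # One turn at the nest replenishes the pool, capped at g0.
            pool = min(g0, pool + replenish)
            turns += 1
            # Approach once the exploration bonus outweighs the assumed cost.
            if pool >= threshold:
                break
        waits.append(turns)
        # A bout at the object depletes the pool.
        pool -= depletion
    return waits


print(simulate_nest_waits())  # [1, 1, 1, 1, 3, 3, 3, 3]
```

With these made-up numbers the agent spends a single nest turn between the first four bouts (depletion outpaces regeneration while the pool is still large) and then settles into waiting three turns per bout, mirroring the peak to steady-state transition.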

      (2) Line 237- (Section 2.2.6, 2.2.7, Figures 7, 9)

      I was confused by the descriptions about nCVaR. I looked at the cited original literature  Gagne & Dayan 2022, and understood that nCVaR is a risk-sensitive version of expected  future returns (equation 4) with parameter α (α-bar) (ranging from 0 to 1) representing risk  preference. Line 269-271 and Section 4.2 of the present manuscript described (in my  understanding) that α was a parameter of the model. Then, isn't it more natural to report  estimated values of α, rather than nCVaR, for individual animals in Section 2.2.6, 2.2.7,  Figures 7, 9 (even though nCVaR monotonically depends on α)? In Figures 7 and 9, nCVaR  appears to be upper-bounded to 1. The upper limit of α is 1 by definition, but I have no idea why nCVaR was also bounded by 1. So I would like to ask the authors to add more detailed  explanations on nCVaR. Currently, CVaR is explained in Lines 237-243, but actually, there is  no explanation about nCVaR rather than its formal name 'nested conditional value at risk' in  Line 237.

      Thank you for pointing out this error. We have corrected the paper to use nCVaR to refer to  the objective and nCVaR's α, or sometimes just α, to refer to the risk sensitivity parameter  and thus the degree of risk sensitivity.
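
To make the role of α concrete, the following sketch computes a single-step CVaR for a discrete distribution of returns: the average over the worst α-fraction of outcomes. It is illustrative only. The nested objective (nCVaR) discussed above applies risk sensitivity across the steps of the decision process and is not implemented here, and the function name `cvar` and the example numbers are hypothetical.

```python
def cvar(outcomes, probs, alpha):
    """Mean of the worst alpha-fraction of a discrete return distribution."""
    if not 0 < alpha <= 1:
        raise ValueError("alpha must lie in (0, 1]")
    remaining = alpha  # probability mass still to be collected from the lower tail
    total = 0.0
    for value, prob in sorted(zip(outcomes, probs)):
        take = min(prob, remaining)
        total += take * value
        remaining -= take
        if remaining <= 1e-12:
            break
    return total / alpha


returns = [-10.0, 0.0, 5.0]   # hypothetical returns
chances = [0.1, 0.4, 0.5]     # hypothetical probabilities
risk_neutral = cvar(returns, chances, alpha=1.0)  # ordinary expectation
risk_averse = cvar(returns, chances, alpha=0.5)   # worst half of outcomes only
```

At α = 1 the measure reduces to the ordinary expected return, which is why α = 1 corresponds to risk neutrality; smaller α focuses the evaluation on progressively worse outcomes.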

      (3) Line 333 (and Abstract)

      Given that animals' behaviors could be equally well fitted by the model having both nCVaR (free α) and hazard prior and the alternative model having only hazard prior (with α = 1), may it be difficult to confidently claim that brave (/timid) animals had risk-neutral (/risk-aversive) preference in addition to widespread (/low-variance) hazard prior? Then, it might be good to somewhat weaken the corresponding expression in the Abstract (e.g., add 'potentially also' to the result for risk sensitivity) or mention the inseparability of risk sensitivity and prior belief pessimism (e.g., "... although risk sensitivity and prior belief pessimism could not be teased apart").

      Thank you for this suggestion. We have duly weakened the wording in the Abstract to say “potentially more risk neutral”:

      “Some animals begin with cautious exploration, and quickly transition to confident approach  to maximize exploration for reward; we classify them as potentially more risk neutral, and  enjoying a flexible hazard prior. By contrast, other animals only ever approach in a cautious  manner and display a form of  self-censoring; they are characterized by potential risk  aversion and high and inflexible hazard priors.”

      Reviewer #2 (Public Review):

      Shen and Dayan build a Bayes adaptive Markov decision process model with three key components: an adaptive hazard function capturing potential predation, an intrinsic reward function providing the urge to explore, and a conditional value at risk (CVaR, closely related to probability distortion explanations of risk traits). The model itself is very interesting and has many strengths including considering different sources of risk preference in generating behavior under uncertainty. I think this model will be useful to consider for those studying approach/avoid behaviors in dynamic contexts.

      The authors argue that the model explains behavior in a very simple and unconstrained behavioral task in which animals are shown novel objects and retreat from them in various manners (different body postures and patterns of motor chunks/syllables). The model itself does capture lots of the key mouse behavioral variability (at least on average on a mouse-by-mouse basis) which is interesting and potentially useful. However, the variables in the model - and the internal states it implies the mice have during the behavior - are relatively unconstrained given the wide range of explanations one can offer for the mouse behavior in the original study (Akiti et al). This reviewer commends the authors on an original and innovative expansion of existing models of animal behaviour, but recommends that the authors revise their study to reflect the obvious challenges. I would also recommend a reduction in claiming that this exercise gives a normative-like or at least quantitative account of mental disorders.

      We thank reviewer #2 for highlighting some of the strengths of our paper as well as pointing  out important limitations of Akiti et al’s original study which we’ve inherited as well as some  limitations of our own method. We address their concerns below.

      We have added a paragraph to the discussion discussing the limitations of the state  representation we adopted from Akiti’s study.

      (Reviewer #1 had the same concern, see above) “Motivated by tail-behind versus  tail-exposed in Akiti et al. (2022), we model approach using a dichotomy between cautious  and confident approach states [...]”

      We have reduced the suggestion that our model provides an account of mental disorders in  the abstract.

      Before:

      “On the other hand, “timid” animals, characterized by risk aversion and high and inflexible  hazard priors, display self-censoring that leads to the sort of asymptotic maladaptive  behavior that is often associated with psychiatric illnesses such as anxiety and depression.”

      After:

      “By contrast, other animals only ever approach in a cautious manner and display a form of self-censoring; they are characterized by potential risk aversion and high and inflexible hazard priors.”

      My main comment is that this paper is a very nice model creation that can characterize the heterogeneity of rodent behavior in a very simple approach/avoid context (Akiti et al; when a novel object is placed in an arena) that itself can be interpreted in a multitude of ways. The use of terms like "exploration", "brave", etc in this context is tricky because the task does not allow the original authors (Akiti et al) to quantify these "internal states" or "traits" with the appropriate level of quantitative detail to say whether this model is correct or not in capturing the internal states that result in the rodent behavior. That said, the original behavioral setup is so simple that one could imagine capturing the behavioral variability in multiple ways (potentially without evoking complex computations that the original authors never showed the mouse brain performs). I would recommend reframing the paper as a new model that proposes a set of internal states that could give rise to the behavioral heterogeneity observed in Akiti et al, but nonetheless is at this time only a hypothesis. Furthermore, an explanation of what would be really required to test this would be appreciated to make the point clearer.

      We thought very hard about using terms that might be considered to be anthropomorphic such as ‘timid’ and ‘brave’. We are, of course, aware of the concerns articulated by investigators such as LeDoux about this. However, we think that, provided that we are clear on the first appearance (using ‘scare’ quotes) that we are using them as labels for latent characteristics that capture correlations in various aspects of behaviour, they are more helpful than harmful in making our descriptions understandable.

      Reviewer #3 (Public Review):

      Summary:

      The manuscript presents computational modelling of the behaviour of mice during encounters with novel and familiar objects, originally reported by Akiti et al. (Neuron 110, 2022). Mice typically perform short bouts of approach followed by a retreat to a safe distance, presumably to balance exploration to discover possible rewards with the potential risk of predation. However, there is considerable heterogeneity in this exploratory behaviour, both across time as an individual subject becomes more confident in approaching the object, and across subjects; with some mice rapidly becoming confident to closely explore the object, while other timid mice never become fully confident that the object is safe. The current work aims to explain both the dynamics of adaptation of individual animals over time, and the quantitative and qualitative differences in behaviour between subjects, by modelling their behaviour as arising from model-based planning in a Bayes adaptive Markov Decision Process (BAMDP) framework, in which the subjects maintain and update probabilistic estimates of the uncertain hazard presented by the object, and rationally balance the potential reward from exploring the object with the potential risk of predation it presents.

      In order to fit these complex models to the behaviour the authors necessarily make  substantial simplifying assumptions, including coarse-graining the exploratory behaviour into  phases quantified by a set of summary statistics related to the approach bouts of the animal.  Inter-individual variation between subjects is modelled both by differences in their prior  beliefs about the possible hazard presented by the object and by differences in their risk  preference, modelled using a conditional value at risk (CVaR) objective, which focuses the  subject's evaluation on different quantiles of the expected distribution of outcomes.  Interestingly these two conceptually different possible sources of inter-subject variation in  brave vs timid exploratory behaviour turn out not to be dissociable in the current dataset as  they can largely compensate for each other in their effects on the measured behaviour.  Nonetheless, the modelling captures a wide range of quantitative and qualitative differences  between subjects in the dynamics of how they explore the object, essentially through  differences in how subject's beliefs about the potential risk and reward presented by the  object evolve over the course of exploration, and are combined to drive behaviour.

      Exploration in the face of risk is a ubiquitous feature of the decision-making problem faced  by organisms, with strong clinical relevance, yet remains poorly understood and  under-studied, making this work a timely and welcome addition to the literature.

      Strengths:

      (1) Individual differences in exploratory behaviour are an interesting, important, and  under-studied topic.

      (2) Application of cutting-edge modelling methods to a rich behavioural dataset, successfully accounting for diverse quantitative and qualitative features of the data in a normative framework.

      (3) Thoughtful discussion of the results in the context of prior literature.

      Limitations:

      (1) The model-fitting approach used of coarse-graining the behaviour into phases and fitting  to their summary statistics may not be applicable to exploratory behaviours in more complex  environments where coarse-graining is less straightforward.

      (2) Some aspects of the work could be more usefully clarified within the manuscript.

      We thank reviewer #3 for their positive feedback and for helping us to improve the clarity of our paper. We have added the discussion they felt was missing.

      Reviewer #1 (Recommendations for the authors):

      (1) Line 25-28

      This part of the Abstract might give an impression that timidity (but not braveness) is  potentially associated with psychiatric illness and even that timidity is thus inferior to  braveness. However, even though extreme timidity might indeed be associated with anxiety  or depression, extreme braveness could also be associated with other psychiatric or  behavioral problems. Moreover, as a population, the existence of both timid and brave  individuals could be advantageous, and it could be a reason why both types of individuals  evolutionarily survived in the case of wild animals (although Akiti et al. used mice, which may  have no or very limited genetic varieties, and so things may be different). So I would like to  encourage the authors to elaborate on the expression of this part of the Abstract and/or  enrich the related discussion in the Discussion.

      This is an important point. We note on line 38 that excessive novelty seeking (potentially  caused by excessive braveness) could also be maladaptive.

      Additionally, we have added a paragraph to the discussion discussing heterogeneity in risk  sensitivity within a population.

      “Our data show that there is substantial variation in the degrees of risk sensitivity across the mice. Previous works have reported substantial interpopulation and intrapopulation differences in risk-sensitivity in humans which depend on gender, age, socioeconomic status, personality characteristics, wealth and culture (Rieger et al., 2015; Frey et al., 2017). Despite the normative appeal of 𝛼 = 1, it is possible that a population may benefit from including individuals with 𝛼 different from 1.0 or highly negative priors. For example, more cautious individuals could learn from merely observing the risky behavior of less cautious individuals. Furthermore, we have only considered risk-sensitivity under epistemic uncertainty in our work. Risk averse individuals, for instance with 𝛼 < 1, may be more successful than risk-neutral agents in environments where there are unexpected dangers (unknown unknowns). Risk-aversion is thus a temperament of ecological and evolutionary significance (Réale et al., 2007).”

      (2) Line 149

      Section 2.2 consists of eight subsections. I think this organization may not be very  appealing, because there are a bit too many subsections, and their relations are not  immediately clear to readers. So I would like to encourage the authors to make an  elaboration. For example, since 2.2.1 - 2.2.5 describes a summary of model construction  and model fitting whereas 2.2.6-2.2.8 shows the results, it could be good to divide these into  separate sections (2.2.1 - 2.2.5 and 2.3.1 - 2.3.3).

      Thank you for pointing this out. We’ve renumbered the sections as you’ve suggested.

      (3) Line 347-8

      Theoretically, the effect of prior is diluted over experience whereas the effect of biased  (risk-aversive) evaluation persists, as the authors mentioned in Lines 393-394. Then isn't it  possible to consider environments/conditions in which the two effects can be separated?

      We appreciate this suggestion. Indeed, our original thought in modeling this experiment was  that this would be exactly the case here - with epistemic uncertainty reducing as the object  became more familiar. However, proving to an animal that a single environment is  completely stationary/fixed is hard - reflected in our conclusion here that the exploration  bonus pool replenishes. Thus, we argued in the discussion that a series of environments  would be necessary to separate risk sensitivity from priors.

      (4) Line 407

      It would be nice to add a brief phrase explaining how (in what sense) this model's  assumption was consistent with the reported behavior. Also, should the assumption of  having two discrete approach states (cautious and confident) itself be regarded as a  limitation of the model? If the tail-behind and tail-exposure approaches were not merely  operationally categorized but were indicated to be two qualitatively distinct behaviors in the  experiment by Akiti et al., it is reasonable to model them as two discrete states, but  otherwise, the assumption of two discrete states would need to be mentioned as a  simplification/limitation.

      We have now removed line 407, and now have an additional  paragraph in the discussion  discussing the limitations of the tail-behind and tail-exposure state representation: “Motivated by tail-behind versus tail-exposed in Akiti et al. (2022), we model approach using  a dichotomy between cautious and confident approach states. This is likely a crude  approximation to the continuous and multifaceted nature of animal approach behavior. For  example, during approach animals likely adjust their levels of vigilance continuously (or  discretely; Lloyd and Dayan (2018)) to  monitor threat, and choose different velocities for  movement, and different attentional strategies for inspecting the novel object. We hope  future works will model these additional behavioral complexities, perhaps with additional  internal states, and corroborate these states with neurobiological data.”

      (5) Line 418

      The authors contrasted their model-based analyses with the model-free analyses of Akiti et  al. Another aspect of differences between the authors' model and the model of Akiti et al. is  whether it is normative or mechanistic: while how the model of Akiti et al. can be biologically  implemented appears to be clear (TS dopamine represents threat TD error, and TS  dopamine-dependent cortico-striatal plasticity implements TD error-based update of  model-free threat prediction), biological implementation of the authors' model seems more  elusive. Given this, it might be a fruitful direction to explore how these two models can be  integrated in the future.

      We enthusiastically agree that it would be most interesting in the future to explore the integration of the two models and, in the discussion (Lines 537-548, 454-461), point to some first steps that might be fruitful along these lines. There are two separate considerations here: one is that our account is mostly computational and algorithmic, whereas Akiti’s model is mostly algorithmic and implementational; the second is, as noted by the reviewer, that our account is model-based, whereas Akiti’s model is model-free (in the sense of reinforcement learning; RL). These are related: thanks in no small part to the work from the group including Akiti, we know a lot more about the implementation of model-free than model-based RL. However, our model-based account does reach additional features of behavior not captured in Akiti et al.’s model, such as bout duration, frequency, and approach type. Hence the temptation of unification.

      (6) Line 426

      Related to the previous point, it would be nice to more specifically describe what variable TS  dopamine can represent in the authors' model if possible.

      In the discussion (Lines 454-461), we speculate that TS dopamine could still respond to the physical salience of the novel object and affect choices by determining the potential cost of the encountered threat or the prior on the hazard function. For example, perhaps ablating TS dopamine reduces the hazard priors, which leads to faster transition from cautious to confident approach and longer bout durations, consistent with the optogenetics behavioral data reported in Akiti et al.

      Reviewer #2 (Recommendations for the authors):

      My guess is simpler versions of the model would not fit the data well. But this does not mean for example that the mice have probability distortions (CVaR) or that even probabilistic reasoning and the internal models necessary to support them are acting in the behavioral context studied by Akiti. So related to the above, I would ask what other models would fit and would not fit the data? And what does this mean?

      These are good points. Our model provides an approximately normative account of the  animals’ behavior  in terms of what it achieves relative to a utility function. In practice, the  animals could deploy a precompiled model-free policy (which does not rely on probabilistic  computations) that is exactly equivalent to our model-based policy. With the current  experiment, we cannot conclude whether or not the animals are performing the prospective  calculations in an online manner. Of course, the extent to which animals or humans are  performing probabilistic computations online and have internal models are on-going  questions of study.

      Model comparison is difficult because we currently do not know of any other risk-sensitive exploration models. We cannot directly compare to the model in Akiti et al. since our model explains additional features of behavior: bout duration, frequency, and approach type. Indeed, our model is as simple as it can be in the sense that, with the exception of nCVaR, removing any of the other parameters makes it difficult to fit some animals in our dataset. In the future, our model could be used to fit other datasets of risk-sensitive exploration and, ideally, be compared to other models.

      Explaining why animals avoid the novel object in what the authors call a benign environment is a very tricky issue. In Akiti et al, the readers are not yet convinced that the mice know that this environment is benign. Being placed in an arena with a novel object presents mice with a great uncertainty and we do not know whether they treat this as benign. Therefore, the alternative explanations in this study need to be carefully discussed in light of the limitations of the initial study.

      It is certainly true that it is unclear if the arena is completely benign to the animals. However, the amount of time the animal spends in the center of the arena decreases significantly from habituation to novelty days. This suggests that the animals avoid the novel object largely because of the object itself, rather than the potential danger associated with the arena. Furthermore, the animals are not reported as exhibiting more extreme behaviours such as freezing. In any case, our account is relative in the sense that we are comparing the time the animal spends at the object versus elsewhere in the environment, driven by the relative novelty and relative risk of the environment versus the object. Trying to get more absolute measures of these quantities would require a richer experimental set-up, for instance with different degrees of habituation or experience of the occurrence of (other) novel objects in general.

      We added a short note to the discussion to explain this:

      “Fourth, we modeled the relative amount of time the animal spends at the object versus  elsewhere in the environment which depends on the differential risk in the two states.  However, it is likely the animals avoid the novel object largely because of the object itself,  rather than the potential danger associated with the arena since they spend much less time  at the center of the arena during novelty than habituation days.”

      Figure 2 - how confident are the authors that each mouse differs from y=1? Related to this,  the behavior in Akiti is very noisy and changes across time. I am not sure if the authors fully  describe at what levels their model captures the behavior vs not in a detailed enough  fashion.

      We have performed a random permutation test on the minute-to-minute data. We have  updated Figure 2 so that brave animals that pass the Benjamini–Hochberg procedure y>1 at  level q=0.05 are represented with solid green dots and animals that don’t pass are  represented with hollow dots. 8 out of 11 brave animals passed Benjamini–Hochberg.
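      For readers unfamiliar with the correction named above, the following is a minimal sketch of the Benjamini–Hochberg step-up procedure. It is not the authors' released code: the function name and the p-values are hypothetical, and the permutation test that would produce the per-animal p-values is not specified in this response, so it is not reproduced here.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return one boolean per hypothesis: True if rejected at FDR level q.

    Standard step-up rule: sort the p-values, find the largest rank k
    (1-based) such that p_(k) <= (k / m) * q, and reject the k smallest.
    """
    m = len(p_values)
    # Indices of the hypotheses, ordered from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])

    # Largest rank whose p-value falls under its rank-dependent threshold.
    max_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * q:
            max_rank = rank

    # Reject every hypothesis ranked at or below max_rank.
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_rank:
            rejected[idx] = True
    return rejected


# Hypothetical per-animal p-values for the one-sided test of y > 1.
example_p = [0.01, 0.03, 0.035, 0.5]
example_passed = benjamini_hochberg(example_p, q=0.05)
```

      Because the rule is step-up, a p-value that misses its own threshold (0.03 at rank 2 in the example) is still rejected when a larger p-value passes at a higher rank. In the figure convention described above, animals flagged True would be drawn as solid dots and the rest as hollow dots.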

      Reviewer #3 (Recommendations for the authors):

      (1) I could not find information in the preprint about code availability. Please consider making  the code public to help others apply these modelling methods.

      We have released code and included the url in the paper in the Methods section.

      (2) Though the manuscript was generally clearly written, there were a number of places  where some additional information or clarification would be useful:

      a) Please define and explain the terms 'tail-behind' and 'tail-exposed' (used to describe  approach bout types) when first used.

      We have added definitions when we first mention these terms:

      “[...] 'tail-behind' (bouts where the animal's nose was closer to the object than the tail for the  entire bout) and 'tail-exposed' (bouts where the animal's tail is closer to the object than the  nose at some point during the bout), associated respectively with cautious risk-assessment  and engagement”

      b) At lines 57-58 when contrasting the 'model-free' account of Akiti et al with the 'model-based' account of the current work, it would be worth clarifying that these terms are  being used in the RL sense rather than e.g. a model-based analysis of the data.  

      We have updated the relevant lines to say “model-free/based reinforcement learning”.

      c) Line 61, the phrase 'the significant long-run approach of timid animals despite having  reached the "avoid" state' is unclear as the 'avoid' state has not been defined.

      We updated the terminology to “avoidance behavior” to be consistent with Akiti et al.  Avoidance refers to the animal routinely avoiding the object and therefore being unable to  learn whether it is safe.

      d) It was not completely clear to me how the coarse-graining of the behaviour was  implemented. Specifically, how were animals assigned to the brave, intermediate, or timid  group, and how were the parameters of the resulting behavioural phases fit?

      Sorry that this was not clear. Section 2.1 explains how the minute-to-minute behavioral data  was coarse-grained and how animal groups were assigned. We have added further  explanation of Figure 2 to the main text:

      “Fig 2 summarizes our categorization of the animals into the three groups: brave, intermediate, and timid based on the phases identified in the animal's exploratory trajectories. Timid animals spend no time in confident approach and are plotted in orange at the origin of Fig 2. Brave animals differ from intermediate animals in that their approach time during the first ten minutes of the confident phase is greater than the last ten minutes (steady-state phase). Brave animals are plotted in green above and intermediate animals are plotted in black below the y=1 line in Fig 2.”

      We also added extra information to outline the goal, and methodology of coarse-graining and  animal grouping:

      “We sought to capture these qualitative differences (cautious versus confident) as well as aspects of the quantitative changes in bout durations and frequencies as the animal learns about their environment. To make this readily possible, we abstracted the data in two ways: averaging bout statistics over time, and clustering the animals into three groups with operationally distinct behaviors.”

      e) What purpose does the 'retreat' state serve in the BAMDP model (as opposed to  transitioning directly from 'object' to 'nest' states), and why do subjects not pass through it  following 'detect' states?

      Thank you for pointing this out. We have updated Figure 3 to note that the two “detected  states” also point to the “retreat” state. The reviewer is correct that there could be alternative  versions of the state diagram, and the ‘retreat’ state could indeed have been eliminated.  However, we thought that it was helpful to structure the animal’s progress through state  space.

      f) Why was the hazard function parameterised via the mean and SD at each time step rather  than with a parametric form of the mean and SD as a function of time?

      Since the agent can only spend 2, 3, or 4 turns at the object states, we didn’t see a need to  parameterize the mean and SD as a function of time. Doing so is a good solution to scaling  up the hazard function to more time-steps.

      (3) There were also a couple of points that could potentially be usefully touched on in the  discussion:

      a) What, if any, is the relationship between the CVaR objective and distributional RL? They  seem potentially related due to both focussing on quantiles of the outcome distribution.

      We have added a paragraph to the discussion discussing the connection between  distributional RL and CVaR:

      “CVaR is known to come in different flavors in the case of temporally-extended behavior. Gagne and Dayan (2021) introduce two alternative time-consistent formulations of CVaR: nested CVaR (nCVaR) and precommitted CVaR (pCVaR). nCVaR and pCVaR both enjoy Bellman equations which make it possible to compute approximately optimal policies without directly computing whole distributions of the outcomes. We use nCVaR in this study for its computational efficiency. There is, of course, great current interest in distributional reinforcement learning (Bellemare et al., 2023b) which does acquire such whole distributions, not least because of prominent observations linking non-linearities in the response functions of dopamine neurons to methods for learning distributions of outcomes (Dabney et al., 2020; Masset et al., 2023; Sousa et al., 2023). One functional motivation for considering entire outcome distributions is the possibility of using them to determine risk-sensitive policies (Gagne and Dayan, 2021).

      While it is possible to compute CVaR directly from return distributions, Gagne and Dayan  (2021) showed that this can lead to temporally inconsistent policies where the agent  deviates from its original plans (the authors called this the fixed CVaR or fCVaR measure).

      Rather further removed from our model-based methods is work from Antonov and Dayan  (2023), who consider a model-free exploration strategy which exploits full return distributions  to compute the value of perfect information which is used as a heuristic for trying actions  with uncertain consequences. Future works can examine risk-sensitive versions of Antonov  and Dayan (2023)'s computationally efficient model-free algorithm as one solution to the  burdensome computations in our model-based method.”
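      As a concrete illustration of what it means to compute CVaR directly from a return distribution, here is a minimal sketch for a discrete distribution. It shows only the static, single-step lower-tail CVaR; it is not the nested CVaR (nCVaR) recursion used in the paper, and the function name and the example outcomes are hypothetical.

```python
def cvar(outcomes, probs, alpha):
    """Lower-tail CVaR of a discrete outcome distribution.

    Returns the mean of the worst alpha-fraction of the probability mass.
    alpha = 1 recovers the ordinary (risk-neutral) expectation, and
    smaller alpha puts more weight on the worst outcomes.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must lie in (0, 1]")

    remaining = alpha   # probability mass still to be collected from the tail
    total = 0.0
    # Walk through the outcomes from worst to best.
    for x, p in sorted(zip(outcomes, probs)):
        mass = min(p, remaining)
        total += mass * x
        remaining -= mass
        if remaining <= 1e-12:
            break
    return total / alpha


# Hypothetical returns: a rare large loss, a neutral outcome, a modest reward.
returns = [-10.0, 0.0, 5.0]
weights = [0.1, 0.6, 0.3]

risk_neutral = cvar(returns, weights, alpha=1.0)   # plain expectation
risk_averse = cvar(returns, weights, alpha=0.5)    # mean of the worst half
```

      With these numbers the risk-neutral value is positive while the risk-averse value is negative, which is the sense in which an agent with 𝛼 < 1 can decline an option that a risk-neutral agent would accept.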

      b) Why normatively might subjects have non-neutral risk preference as captured by the CVaR?

      We also added a paragraph to the discussion discussing the advantage of heterogeneity in  risk sensitivity within a population:

      (Reviewer #1 had the same question, see above) “Our data show that there is substantial  variation in the degrees of risk sensitivity across the mice.  Previous works have reported  substantial interpopulation and intrapopulation differences in risk-sensitivity in humans which  depend on gender, age, socioeconomic status, personality characteristics, wealth and culture [...]”

      c) Relevance of the current modelling work to clinical conditions characterised by dysregulation of risk assessment (e.g. anxiety or PTSD).

      We’ve added a paragraph to the discussion:

      “Inter-individual differences in risk sensitivity are also of critical importance in psychiatry, reflected in a panoply of anxiety disorders (Butler and Mathews, 1983; Giorgetta et al., 2012; Maner et al., 2007; Charpentier et al., 2017), along with worry and rumination (Gagne and Dayan, 2022). Understanding the spectrum of extreme priors and extreme values of 𝛼 could have therapeutic implications, adding significance to the search for tasks that can more cleanly separate them.”

      d) Is it surprising to see differences in risk preference (nCVaR) between the familiar object  and novel object condition, given that risk preference might be conceptualised as a trait  rather than a state variable?

      Thank you for raising this point. You are right that we expected risk sensitivity (nCVaR alpha)  to be the same between FONC and UONC animals on average. It is difficult to know if alpha  is higher for FONC than UONC animals due to the non-identifiability between alpha and  hazard priors. We have added this discussion to the paper:

      “This is surprising if we interpret 𝛼 as a trait that is stable through time. Unfortunately, due to  the non-identifiability between 𝛼 and hazard priors, we cannot verify whether 𝛼 is actually  higher for FONC animals than UONC animals.”

    1. Author response:

      The following is the authors’ response to the current reviews.

      Response to Reviewer #3:

      We thank reviewer 3 for spending their valuable time on commenting on our revised paper.

      We would like to reiterate the central conclusion of this work, which appears to have been missed by Reviewer 3. Using a BFP-expressing lineage tracer hPSC line for tracking LMX1A+ midbrain-patterned neural progenitors and their differentiated progeny, we discovered a loss of the LMX1A lineage during pluripotent stem cell differentiation into astrocytes, despite BFP+ neural progenitors being the dominant population at the onset of astrocyte induction.

      Hence, the take-home message of this study is, as summarized in the abstract, ‘the lineage composition of iPSC-derived astrocytes may not accurately recapitulate the founder progenitor population’ and that one should not take for granted that in vitro/stem cell-derived astrocytes are the descendants of the dominant starting neural progenitors (which is a general assumption in PSC publications as described in the paper and our response to reviewers).

      Please find below our point-by-point response to reviewer comments. We have re-ordered the points according to their relative importance to our main conclusions.

      …. They used lineage tracing with a LMX1A-Cre/AAVS1-BFP iPSCs line, where the initial expression of LMX1A and Cre allows the long-lasting expression of BFP, yielding BFP+ and BFP- populations, that were sorted when in the astrocytic progenitor expansion. BFP+ showed significantly higher number of cells positive to NFIA and SOX9 than BFP- cells …

      This is a misunderstanding by reviewer 3. As indicated in the first sentence of the second section, the BFP- populations used for functional and transcriptomic analysis were not sorted BFP- cells, but were derived from unsorted, BFP+-enriched populations. Our scRNAseq analysis indicated that they were transcriptomically aligned to human midbrain astrocytes. This finding is consistent with the fact that they are derived from midbrain-patterned neural progenitors, presumably minority LMX1A- progenitors.

      Reviewer 3’s comments indicate that they misunderstood the primary aims of our study as a mere functional and transcriptomic comparison of the two astrocyte populations.

      (9) BFP+ cells did not show higher levels of transcripts for LMX1A nor FOXA2. This fact jeopardizes the claim that these cells are still patterned. In the same line, there are not significant differences with cortical astrocytes, indicating a wider repertoire of the initially patterned cells, that seems to lose the midbrain phenotype. Furthermore, common DGE shared by BFP- and BFP+ cells when compared to non-patterned cells indicate that after culture, the pre-pattern in BFP+ cells is somehow lost, and coincides with the progression of BFP- cells.

      The reviewer seems to assume that astrocytes derived from LMX1A+ ventral midbrain progenitors must retain LMX1A expression. We do not take this view and do not claim this in this study. Moreover, we have discussed in the paper that, owing to a lack of transcriptomic studies of regional progenitors tracked in vivo (such as LMX1A), it remains unknown whether and to what extent patterning gene expression is maintained in astrocytes of different brain regions.

      Our findings on the lack of LMX1A and FOXA2 in BFP+ astrocytes are supported by several published single-cell transcriptomic studies of human midbrain astrocytes (La Manno et al. 2016; Agarwal et al. 2020; Kamath et al. 2022). We have a paragraph of discussion on this topic in both the original and updated versions of the paper with the relevant publications cited.

      Other points raised by reviewer 3

      (1) It is very intriguing that GFAP is not expressed in late BFP- nor in BFP+ cultures, when authors designated them as mature astrocytes.

      We did not designate our cells as ‘mature’ astrocytes but ‘astrocytes’ based on their global gene expression with the human fetal and adult brain astrocytes as references.

      Moreover, ‘mature’ only appeared once in the paper indicating that our cells lie in between the fetal and adult astrocytes in maturity.

      (2) In Fig. 2D, authors need to change the designation "% of positive nuclei".

      To be corrected in the version of record.

      (3) In Fig. 2E, the text describes a decrease caused by 2APB on the rise elicited by ATP, but the graph shows an increase with ATP+2APB. However, in Fig. 2F, the peak amplitude for BFP+ cells is higher in ATP than in ATP+2APB, which is mentioned in the text, but this is inconsistent with the graph in 2E.

      To be corrected in the version of record.

      (4) The description of Results in the single-cell section is confusing, particularly in the sorted CD49 and unsorted cultures. Where do these cells come from? Are they BFP-, BFP+, unsorted for BFP, or non-patterned? Which are the "all three astrocyte populations"? A more complete description of the "iPSC-derived neurons" is required in this section to allow the reader to understand the type and maturation stage of neurons, and if they are patterned or not.

      As previously reported in the reference cited, CD49f is a novel human astrocyte marker. This is independent of BFP expression. For all three astrocyte populations studied here (BFP+, BFP-, and non-patterned astrocytes), we included both CD49f+ sorted and unsorted samples to account for selection bias caused by FACS. iPSC-derived neurons were included in the sequencing study to provide a reference for cell-type annotation. They were generated following a GABAergic neuron differentiation protocol. However, their maturation stages and/or regional characteristics are not relevant to astrocytes.

      (5) A puzzling fact is that both BFP- and BFP+ cells have similar levels of LMX1A, as shown in Fig. S6F. How do authors explain this observation?

      This figure panel shows that LMX1A, LMX1B and FOXA2 are essentially NOT expressed in these astrocytes.

      (6) In Fig. 3B, the non-patterned cells cluster away from the BFP+ and BFP-; on the other hand, early and late BFP- are close and the same is true for early and late BFP+. A possible interpretation of these results is that patterned astrocytes have different paths for differentiation, compared to non-patterned cells. If that can be implied from these data, authors should discuss the alternative ways for astrocytes to differentiate.

      Both BFP+ and BFP- astrocytes are derived from ventral midbrain-patterned neural progenitors, while non-patterned neural progenitors are more akin to those of the forebrain. The clustering in Figure 3B is therefore expected and confirms the patterning effect.

      (7) Fig. 3D shows that cluster 9 is the only one with detectable and coincident expression of both S100B and GFAP expression. Please discuss why these widely-accepted astrocyte transcripts are not found in the other astrocytes clusters. Also, Sox9 is expressed in neurons, astrocyte precursors and astrocytes. Why is that?

      S100B and GFAP are classic astrocyte markers in certain states. We are not relying only on two markers but the genome-wide expression profile as the criteria for astrocytes. As shown in the unbiased reference mapping to multiple human brain astrocyte scRNA-seq datasets, all our astrocyte clusters were mapped with high confidence to human astrocytes.

      SOX9 is an important regulator of astrogenesis, so its expression is expected in precursors (doi.org/10.1016/j.neuron.2012.01.024). In addition, recent studies have reported SOX9 expression in foetal striatal projection neurons and early postnatal cortical neurons, where SOX9 regulates neuronal synaptogenesis and morphogenesis (doi: 10.1016/j.fmre.2024.02.019; 10.1016/j.neuron.2018.10.008). Therefore, the expression of SOX9 in multiple cell types was expected. Instead of using a few selected markers for cell-type annotation, we employed a genome-wide approach that combines unbiased reference mapping with various markers to ascertain our annotation results.

      (8) Line 337, why did the authors select a log2 change of 0.25? Typically, 1 or a higher number is used to ensure at least a 2-fold increase, or a 50% decrease. A volcano plot generated by the comparison of BFP+ with BFP- cells would be appropriate. The validation of differences by immunocytochemistry, between BFP+ and BFP-, is inconclusive. The staining is blurred in the images presented in Fig. S8C. Quantification of the positive cells, without significant background signal, in both populations is required.

      We used a lenient threshold owing to the following considerations: 1) High FC does not necessarily mean biological relevance, as gene expression does not necessarily translate to protein expression. Therefore, a smaller FC value could also be biologically meaningful. 2) Balance between noise and biological differences. Any threshold was chosen arbitrarily. 3) We are identifying a trend rather than pinpointing a specific set of

      The quality was unfortunately reduced due to restrictions on file size upon submission. A high resolution Fig. S8C is available.

      (10) For the GO analyses, How did authors select 1153 genes? The previous section mentioned 287 genes unique for BFP+ cells. The Results section should include a rationale for performing a wider search for the enriched processes.

      GO enrichment using unique DEGs may not capture the wider landscape of the transcriptomic characteristics of BFP+ astrocytes. The 287 unique genes were only differentially expressed in BFP+ astrocytes. However, apart from these 287 genes, other genes among the 1187 DEGs were differentially expressed in BFP+ astrocytes and in one other population.

      (11) For Fig. 4C and 4D, both p values and the number of genes should be indicated in the graph. I would advise to select the 10 or 15 most significant categories, these panels are very difficult to read. Whereas the listed processes for BFP+ have a relation to Parkinson disease, the ones detected for BFP- cells are related to extracellular matrix and tissue development. Does it mean that BFP+ cells have impaired formation of this matrix, or defective tissue development? This is in contradiction of enhanced calcium responses of BFP+ cells compared to BFP- cells.

      Information on all DEGs, including p values and numbers, is provided in Supplementary data 1-5.

      BFP+ astrocytes do show enrichment for GO terms related to extracellular matrix and tissue development, although it is less pronounced than in BFP- astrocytes. Previous work has shown that both in vitro and in vivo derived astrocytes are functionally heterogeneous, containing functionally distinct subtypes that exhibit different GO enrichment profiles (doi: 10.1016/j.ygeno.2021.01.008; 10.1038/s41598-024-74732-7).

      (12) Both the comparison between midbrain and cortical astrocytes in Fig. S8A, and the volcano plot in S8B do not show consistent changes. For example, RCAN2 in Fig. S8A has the same intensity for cortical and midbrain cells, but is marked as an enriched gene in midbrain in the p vs log2FC graph in Fig. S8B.

      These are integrated analyses of published human datasets. S8A and S8B show the same data in different formats. The differences are better shown in the volcano plot, where they are more easily detected by eye. RCAN2 had a higher average expression in the midbrain than in the telencephalon; the difference was small but statistically significant (as shown in the volcano plot).


      The following is the authors’ response to the original reviews

      Reviewer 1:

      In vitro nature of this work being the fundamental weakness of this paper

      We disagree with this statement. As explained in the provisional response, the aim of this study was to test the validity of a general concept applied in pluripotent stem cell research that pluripotent stem cell-derived astrocytes faithfully represent the lineage heterogeneity of their ancestral neural progenitors and hence preserve the regionality of such progenitors. Our genetic lineage study is justified for addressing this in vitro-driven question. However, we have highlighted the rationale where appropriate in the revised paper.

      If regional identity is not maintained, so what? Don't we already know that this can happen? The authors acknowledge that this is known in the discussion.

      Importance of regional identity: Growing evidence demonstrates the functional heterogeneity of brain astrocytes in health and disease. Therefore, for in vitro disease modeling, it is believed that one should use astrocytes that represent the anatomy of the disease pathology; for example, midbrain astrocytes for studying dopamine neurodegeneration and Parkinson’s disease. Understanding the dynamics of stem cell-derived astrocytes and identifying astrocyte subtypes is important for their biomedical applications.

      Regional identity change/Discussion: It seems that the reviewer misunderstood the context in which the ‘identity change’ was discussed. The literature referred to (in the Discussion) concerns shifts in regional gene expression in bulk-cultured cells. Before single-cell analysis and lineage tracking were available, one could not distinguish whether such a shift was due to a change in the transcriptomic landscape in progenies of the same lineage or to alterations in lineage heterogeneity; taken at face value, it was interpreted as a failure to maintain regional identity. In the revised paper, we have made an effort to indicate that ‘regional identity’ is used broadly to refer to lineage relationships and/or traits rather than static gene expression.

      validation of the markers/additional work

      The scRNAseq analysis performed in this study compared the profiles of astrocytes derived from LMX1A+ and LMX1A- ventral midbrain-patterned neural progenitors. Since it is not possible to perform genetic lineage tracking in humans and an analogous mouse lineage tracer line is not available, in vivo validation of these markers with respect to their lineage relationship is not currently feasible. However, we took advantage of abundant single-cell human astrocyte transcriptomic datasets and validated our genes in silico. We also validated the differential expression of selected markers in late BFP+ and BFP- astrocytes using immunocytochemistry, where reliable antibodies are available. The results of the additional analyses are presented in Figure S8 and Supplemental Data 5.

      Knowledge gaps concerning astrocyte development

      Reviewer 1 pointed out a number of knowledge gaps concerning astrocyte development, such as the transcriptomic landscape trajectories of midbrain floor plate cells as they progress towards astrocytes. Indeed, the limited knowledge of regional astrocyte molecular heterogeneity restricts the objective validation of in vitro-derived astrocyte subtypes and the development of novel approaches for their generation in vitro. We agree with the need for in-depth in vivo studies using model organisms, although these are beyond the scope of the current work.

      Reviewer 2:

      (1) The authors argue that the depletion of BFP seen in the unsorted population immediately after the onset of astrogenic induction is due to the growth advantage of the derivatives of the residual LMX1A- population. However, no objective data supporting this idea is provided, and one could also hypothesize that the residual LMX1A- cells could affect the overall LMX1A expression in the culture through negative paracrine regulation.

      We acknowledge the lack of evidence-based explanation for the depletion of BFP+ cells in mixed cultures. We were unable to perform additional experiments because of resource limitations. The design of the LMX1A-Cre/AAVS1-BFP lineage tracer line determines that BFP is expressed irreversibly in LMX1A-expressing cells or their derivatives regardless of their LMX1A expression status. Therefore, the potential negative paracrine regulation of LMX1A by residual LMX1A- cells should not affect cells that have already turned on BFP. We have highlighted the working principles of the LMX1A tracer line in the revised manuscript.

      (2) Furthermore, on line 124 it is stated that: "Interestingly, the sorted BFP+ cells exhibited similar population growth rate to that of unsorted cultures...". In the face of the suggested growth disadvantage of those cells, this statement needs clarification.

      To avoid confusion, we have removed the statement.

      (3) Regarding the fidelity of the model system, it is not clear to me how the TagBFP expression was detected in the BFP+ population supposedly in d87 and d136 pooled astrocytes (Fig S6C) while no LMX1A expression was observed in the same cells (Fig S6F).

      The TagBFP tracer is expressed in the progenies of LMX1A+ cells, regardless of their LMX1A expression status. We have gone through the MS text to ensure that this information has been provided.

      (4) The generated single-cell RNASeq dataset is extremely valuable. However, given the number of conditions included in this study (i.e. early vs late astrocytes, BFP+ vs BFP-, sorted vs unsorted, plus non-patterned and neuronal samples) the resulting analysis lacks detail. For instance, from a developmental perspective and to better grasp the functional significance of astrocytic heterogeneity, it would be interesting to map the identified clusters to early vs late populations and to the BFP status.

      We performed additional bioinformatics analysis, which provided independent support for the relative developmental maturity suggested by functional assays. The additional data are now provided in the revised Figure 3B, C, E.

      Moreover, although comprehensive, Figure S7 is complex to understand given that citations rather than the reference populations are depicted.

      The information is now provided in the revised Figure S7.

      (5) Do the authors have any consideration regarding the morphology of the astrocytes obtained in this study? None of the late astrocyte images depict a prototypical stellate morphology, which is reported in many other studies involving the generation of iPSC-derived astrocytes and which is associated with the maturity status of the cell.

      The morphology of our astrocytes was not unique to the present study. Many factors may influence the morphology of astrocytes, such as the culture media and supplements used, and maturity status. Based on the functional assays and limited GFAP expression, our astrocytes were relatively immature.

    1. Author response:

      Reviewer #1 (Public review):

      Summary:

      The study is methodologically solid and introduces a compelling regulatory model. However, several mechanistic aspects and interpretations require clarification or additional experimental support to strengthen the conclusions.

      Strengths:

      (1) The manuscript presents a compelling structural and biochemical analysis of human glutamine synthetase, offering novel insights into product-induced filamentation.

      (2) The combination of cryo-EM, mutational analysis, and molecular dynamics provides a multifaceted view of filament assembly and enzyme regulation.

      (3) The contrast between human and E. coli GS filamentation mechanisms highlights a potentially unique mode of metabolic feedback in higher organisms.

      Weaknesses:

      (1) The mechanism underlying spontaneous di-decamer formation in the absence of glutamine is insufficiently explored and lacks quantitative biophysical validation.

      (2) Claims of decamer-only behavior in mutants rely solely on negative-stain EM and are not supported by orthogonal solution-based methods.

      We thank the reviewer for the summary and for noting the strengths. We agree that the evolutionary divergence of metabolic feedback in GS homologs is a fruitful avenue for future studies. With regard to the weaknesses, the di-decamer in the absence of glutamine only forms under high (higher than physiological) concentrations of enzyme. Our primary evidence for the mutant behavior was the lack of crosslinking (Figure 1E), with supplementary support from the negative stain. In the revised version we will soften the language to say “reduced” rather than “did not support” filament formation.

      Reviewer #2 (Public review):

      The authors set out to resolve the high-resolution structure of a glutamine synthetase (GS) decamer using cryo-EM, investigate glutamine binding at the decamer interface, and validate structural observations through biochemical assays of ATP hydrolysis linked to enzyme activity. Their work sits at the intersection of structural and functional biology, aiming to bridge atomic-level details with biological mechanisms - a goal with clear relevance to researchers studying enzyme catalysis and metabolic regulation.

      Strengths and weaknesses of methods and results:

      A key strength of the study lies in its use of cryo-EM, a technique well-suited for resolving large, dynamic macromolecular complexes like the GS decamer. The reported resolutions (down to 2.15 Å) initially suggest the potential for detailed structural insights, such as side-chain interactions and ligand density. However, several methodological limitations significantly undermine the reliability of the results:

      (1) Cryo-EM data processing: The absence of critical details about B-factor sharpening - a standard step to enhance map interpretability - is a major concern. For high-resolution maps (<3 Å), sharpening is typically applied to resolve side-chain features, yet the submitted maps (e.g., those in Figures 1D, 2D, and supplementary figures) appear unprocessed, with density quality inconsistent with the claimed resolutions. This makes it difficult to evaluate whether observed features (e.g., glutamine binding) are genuine or artifacts of unsharpened data.

      (2) Modeling and density consistency: The structural models, particularly for glutamine binding at the decamer interface, do not align with the reported resolution. The maps shown in Figure 2D and Supplementary Figure S7 lack sufficient density to confidently place glutamine or even surrounding residues, conflicting with claims of 2.15 Å resolution. Additionally, fitting a non-symmetric ligand (glutamine) into a symmetry-refined map requires justification, as symmetry constraints may distort ligand placement.

      (3) Biochemical assay controls: While the enzyme activity assays aim to link structure to function, they lack essential controls (e.g., blank reactions without GS or substrates, substrate omission tests) to confirm that ATP hydrolysis is GS-dependent. The use of TCEP, a reducing agent, is also not paired with experiments to rule out unintended effects on the PK/LDH system, further limiting confidence in activity measurements.

      Achievement of aims and support for conclusions:

      The study falls short of convincingly achieving its goals. The claimed high-resolution structural details (e.g., side-chain densities, ligand binding) are not supported by the provided maps, which lack sharpening and show inconsistencies in density quality. Similarly, the biochemical data do not robustly validate the structural claims due to missing controls. As a result, the evidence is insufficient to confirm glutamine binding at the decamer interface or the functional relevance of the observed structural features.

      Likely impact and utility:

      If these methodological gaps are addressed, the work could make a meaningful contribution to the field. A well-resolved GS decamer structure would advance understanding of enzyme assembly and ligand recognition, while validated biochemical assays would strengthen the link between structure and function. Improved data processing and clearer reporting of validation steps would also make the structural data more reliable for the community, providing a resource for future studies on GS or related enzymes.

      We disagree with the reviewer’s overall assessment.

      With regard to sharpening and resolution: we examined sharpened maps and in a revised version will present additional supplementary figures showing these maps side by side. We note that the resolutions reported are global and that the most interesting features are, of course, in the periphery and subject to conformational and compositional heterogeneity. In the revision, we will include supplementary figures of core side chain densities that are closer to what the reviewer expects.

      With regard to modeling: the apo filament and turnover filament datasets were handled nearly identically. The additional density is therefore unlikely to be an artifact of the symmetry operator; however, the lower resolution in this region noted by the reviewer is worthy of further exploration. The maps are public, and we think this is the most plausible interpretation of the density, which we based primarily on the biochemical data. We will include more speculation in the revised version.

      With regard to the biochemical controls: we point the reviewer to Figure S1, which shows that omission of ammonia or glutamate in the wild-type (tagless) system removes any coupling of the reactions. We will perform the additional controls to publication quality in the revised version along with the TCEP control. We note that the reducing agent is present across all experiments, ruling out an effect on any specific result. The inclusion of TCEP is also very standard in other published uses of the coupled ATPase assay (e.g. PMID: 31778111 and PMID: 32483380 by our first author).

      Additional context:

      Cryo-EM has transformed structural biology by enabling high-resolution analysis of large complexes, but its success hinges on rigorous data processing and validation steps that are critical to ensuring reproducibility. The challenges highlighted here are not unique to this study; they reflect broader issues in the field where incomplete reporting of methods can obscure the reliability of results. By addressing these points, the authors would not only strengthen their current work but also set a positive example for transparent and rigorous structural biology research.

      All the data is public and the reviewer or anyone is free to reinterpret the maps and models - and we encourage that rather than just an interpretation of our static figures. In addition, we will upload the raw micrograph data for the apo filament and turnover filament datasets to EMPIAR prior to submitting the revision.

      Reviewer #3 (Public review):

      In this manuscript, the authors propose a product-dependent negative-feedback mechanism of human glutamine synthetase, whereby the product glutamine facilitates filament formation, leading to reduced catalytic specificity for ammonia. Using time-resolved cryo-EM, the authors demonstrate filament formation under product-rich conditions. Multiple high-quality structures, including decameric and di-decameric assemblies, were resolved under different biochemical states and combined with MD simulations, revealing that the conformational space of the active site loop is critical for the GS catalysis. The study also includes extensive steady-state kinetic assays, supporting the view that glutamine regulates GS assembly and its catalytic activity. Overall, this is a detailed and comprehensive study. However, I would advise that a few points be addressed and clarified.

      (1) In Figure 2D and Supplementary Figure 7, the extra density observed between the two decamers does not appear to have the defining features of a glutamine. A less defined density may be expected given the nature of the complex, but even though mutagenesis assays were performed to support this assignment, none of these results constitutes direct and conclusive evidence for glutamine binding at this site. I would thus suggest showing the density maps at multiple contour thresholds to allow readers to also better evaluate the various small molecules under turnover conditions that cannot be well fitted based on this density map, helping to provide a more balanced interpretation of the results.

      (2) On the same point regarding the density for the enzyme under turnover conditions, more details should be provided about the symmetry expansion and classification performed, and also show the approximate ratio of reconstructions that include this density. Did you try symmetry expansion followed by focused classification, especially on the interface region?

      (3) The interface between the two decamers of the model needs to be double-checked and reassigned, especially for the residues surrounding the fitted glutamine. For example, the side chain of the Lys residue shown in the attached figure is most likely modeled incorrectly.

      We thank the reviewer for the feedback. As noted above, we will include supplemental figures that show maps at multiple thresholds and sharpening schemes. We noted in the manuscript and above that our interpretation here is based on integrating biochemical evidence alongside the density and will make that even more clear in the revised manuscript. The filaments +/- the putative glutamine density were processed nearly identically, but we will attempt various schemes of focused classification/symmetry expansion in the revision as well. However, we point out that there is extensive averaging there that makes modeling a bit trickier than expected given the global resolution.

    1. Author response:

      Reviewer #1 (Public review):

      The authors tried to quantify the difference between human complex traits by calculating genetic overlap scores between a pair of traits. Sherlock-II was devised to integrate GWAS with eQTL signals. The authors claim that Sherlock-II is superior to the previous version (robustness, accuracy, etc). It appears that their framework provides a reasonable solution to this important question, although the study needs further clarification and improvements.

      (1) Sherlock-II incorporates GWAS and eQTL signals to better quantify genetic signals for a given complex trait. However, this approach is based on the hypothesis that "all GWAS signals confer association to complex trait via eQTL", which is not true (PMID: 37857933). This should be acknowledged (through mentioning in the text) and incorporated into the current setup (through differential analysis - for example, with or without eQTL signals, or with strong colocalization only). 

      The reviewer is correct that in this version of the tool, we focused on SNPs that affect gene expression, as the majority of the SNPs identified by GWASs are non-coding SNPs. In a future improvement, we should also include coding SNPs that change the amino acid sequence of genes. We will discuss this point more in the revised manuscript.

      (2) When incorporating eQTL, why did the authors use the top p-value tissues for eQTL? This approach seems simpler and probably more robust. But many eQTLs are tissue-specific. Therefore, it would also be important to know if eQTLS from appropriate tissues were incorporated instead. 

      This is a simple scheme to incorporate eQTL data from multiple tissues, assuming that the tissue that gives the strongest association is the most relevant, or mainly mediates the effect of the SNP on the phenotype. This is a reasonable approach given that the tissues of origin for most of the phenotypes are unknown. In a future improvement, we should incorporate eQTL data from the appropriate tissue(s) if they are known.
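
      The selection rule described in this response (keep, for each SNP-gene pair, the tissue with the strongest eQTL association) can be sketched as follows. This is a minimal illustration under assumed inputs, not the Sherlock-II implementation: the table layout, the function name, and the toy identifiers and p-values are all hypothetical.

```python
from typing import Dict, Tuple

# Hypothetical input layout: (snp, gene, tissue) -> eQTL p-value.
EqtlTable = Dict[Tuple[str, str, str], float]


def strongest_tissue_per_pair(
    eqtl: EqtlTable,
) -> Dict[Tuple[str, str], Tuple[str, float]]:
    """For each (snp, gene) pair, keep the tissue with the smallest p-value."""
    best: Dict[Tuple[str, str], Tuple[str, float]] = {}
    for (snp, gene, tissue), p_value in eqtl.items():
        key = (snp, gene)
        # Replace the stored entry only when this tissue is more significant.
        if key not in best or p_value < best[key][1]:
            best[key] = (tissue, p_value)
    return best


# Toy data, invented purely for illustration.
toy_eqtl: EqtlTable = {
    ("snpA", "GENE1", "blood"): 1e-4,
    ("snpA", "GENE1", "brain"): 1e-7,
    ("snpB", "GENE2", "blood"): 3e-6,
}
selected = strongest_tissue_per_pair(toy_eqtl)
print(selected)
```

      A tissue-aware variant, as the response suggests for future work, would restrict the input table to the tissues known to be relevant for the trait before applying the same rule.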

      (3) One of the main examples is the novel association between Alzheimer's disease and breast cancer. Although the authors provided a molecular clue underlying the association, it is still hard to comprehend the association easily, as the two diseases are generally known to be exclusive to each other. This is probably because breast cancer GWAS is performed for germline variants and does not consider the contribution of somatic variants. 

      This is due to one of the limitations of the current algorithm: no direction of association is predicted explicitly. It could be that increasing the expression of a gene reduces the risk of one disease but increases the risk of another. Currently we have to analyze the details of the SNPs to infer direction once overlapping genes are found. This needs improvement in the future.

      (4) It would help readers understand the story better if a summary figure of the entire process were provided. The current Figure 1 does not fulfil that role. 

      We plan to incorporate the reviewer's suggestion in the revised manuscript.

      (5) Figure 2 is not very informative. The readers would want to know more quantitative information rather than a heatmap-style display. Is there directionality to the relationship, or is it always unidirectional? 

      We will consider a different presentation in the revised manuscript.

      (6) In Figure 3, readers may want to know more specific information. For example, what gene signals are really driving the hypoxia signal in Alzheimer's disease vs breast cancer? And what SNP signals are driving these gene-level signals? 

      We will add this information in the revised manuscript.

      Reviewer #2 (Public review):

      Summary:

      The authors introduce a gene-level framework to detect shared genetic architecture between complex traits by integrating GWAS summary statistics with eQTL data via a new algorithm, Sherlock-II, which aggregates signals from multiple (cis/trans) eSNPs to produce gene-phenotype p-values. Shared pathways are identified with Partial-Pearson-Correlation Analysis (PPCA).

      Strengths:

      The authors show the gene-based approach is complementary and often more sensitive than SNP-level methods, and discuss limitations (in terms of no directionality, dependence on eQTL coverage).

      Weaknesses:

      (1) How do the authors explain data where missing tissues or sparse eQTL mapping are available? Would that bias as to which genes/traits can be linked and may produce false negatives or tissue-specific false positives?

      Missing tissues or sparse eQTL certainly can produce false negatives as the signals linking the two phenotypes are simply not captured in the data. It is less likely to produce false positives as long as the statistical test is well controlled.   

      (2) Aggregating SNP-level signals into gene scores can be confounded by LD; for example, a nearby causal variant for a different gene or non-expression mechanism may drive a gene's score, producing spurious gene-trait links. How do the authors prevent this? 

      When there are multiple SNPs in LD with multiple genes nearby, it is generally difficult to map the causal SNP and the causal gene it affects, and thus there will be spurious gene-trait links. When we calculate the global similarity based on the gene-trait association profiles, we try to control for this by simulating random GWASs that have the same power as the real GWAS and preserve the LD structure. The spurious links will also be present in the simulated data (although they may appear at different loci), and the simulated data are used to calibrate the statistical significance.
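
      The calibration idea in this response, comparing an observed similarity score against scores obtained from simulated null GWASs, can be illustrated with a generic Monte Carlo empirical p-value. This is a sketch of the standard technique, not the authors' exact procedure: the function name and the toy scores are hypothetical, and the simulation of power-matched, LD-preserving GWASs is assumed to happen upstream.

```python
from typing import Sequence


def empirical_p_value(observed: float, null_scores: Sequence[float]) -> float:
    """One-sided Monte Carlo p-value for an observed score.

    The +1 in numerator and denominator counts the observed score as one
    draw from the null, so the estimate is never exactly zero.
    """
    exceed = sum(1 for score in null_scores if score >= observed)
    return (exceed + 1) / (len(null_scores) + 1)


# Toy null distribution standing in for similarity scores from simulated GWASs.
toy_null = [1.0, 2.0, 3.0, 4.0]
print(empirical_p_value(3.0, toy_null))
```

      Because spurious LD-driven links inflate the null scores as well as the observed one, they raise the bar the observed score must clear, which is the sense in which the simulation controls for them.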

      (3) How the SNPs are assigned to genes would affect results, this is because different choices can change which genes appear shared between traits. The authors can expand on these. 

      We assign SNPs to genes based on their strongest eQTL association from the available data. Improvement can be made if the relevant tissues for a trait are known (see response to Reviewer 1 above).

      (4) Many reported novel trait links remain speculative without functional or orthogonal validation (e.g., colocalization, perturbation data). Thus, the manuscript's claims are inconclusive and speculative. 

      We agree with the reviewer that the reported trait links are speculative, and they should be treated as hypotheses generated from the computational analyses. To truly validate some of these proposed relationships, deeper functional analyses and experimental tests are needed.

      (5) It would be best to run LD-aware colocalization and power-matched simulations to check for robustness. 

      We agree that better control for LD and power-matched simulations will be important for testing the robustness of the predictions.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this review, the author covered several aspects of the inflammation response, mainly focusing on the mechanisms controlling leukocyte extravasation and inflammation resolution.

      Strengths:

      This review is based on an impressive number of sources, trying to comprehensively present a very broad and complex topic.

      Weaknesses:

      (1) This reviewer feels that, despite the title, this review is quite broad and not centred on the role of the extracellular matrix.

      (2) The review will benefit from a stronger focus on the specific roles of matrix components and dynamics, with more informative subheadings.

      (3) The macrophage phenotype section doesn't seem well integrated with the rest of the review (and is not linked to the ECM).

      (4) Table 1 is difficult to follow. It could be reformatted to facilitate reading and understanding

      (5) Figure 2 appears very complex and broad.

      (6) Spelling and grammar should be thoroughly checked to improve the readability.

      This review focuses on the whole extravasation journey of leukocytes and highlights the involvement of the extracellular matrix (ECM) in multiple phases of the process. The ECM may exert its roles either as a collective structure or through individual components. In the revision, for those functions involving specific matrix components, we will emphasize the matrix components and incorporate this information into the subheadings as suggested. The macrophage phenotype sections (Sections 10-11) are included because of the pivotal role of macrophages in deciding tissue fate following inflammation (i.e., to resolve/regenerate the damage incurred or to sustain inflammation), which is an important aspect of this review. The ECM could modify macrophage phenotypes either directly (Section 10) or indirectly via modulation of tissue stiffness or of other cell types such as fibroblasts (Section 9). However, as also pointed out by other reviewers, we acknowledge that Section 11 is not integrated well enough with the rest of the review. We plan to reorganize this part and to emphasize its link to the ECM during the revision for better integration. We will reformat Table 1 for easier comprehension. We will consider restructuring Figure 2, which outlines the various events influencing the tissue decision between resolution and inflammation, perhaps by splitting it into two separate figures, to better focus the message. We will also check the language to improve readability.

      Reviewer #2 (Public review):

      Summary:

      The manuscript is a timely and comprehensive review of how the extracellular matrix (ECM), particularly the vascular basement membrane, regulates leukocyte extravasation, migration, and downstream immune function. It integrates molecular, mechanical, and spatial aspects of ECM biology in the context of inflammation, drawing from recent advances. The framing of ECM as an active instructor of immune cell fate is a conceptual strength.

      Strengths:

      (1) Comprehensive synthesis of ECM functions across leukocyte extravasation and post-transmigration activity.

      (2) Incorporation of recent high-impact findings alongside classical literature.

      (3) Conceptually novel framing of ECM as an active regulator of immune function.

      (4) Effective integration of molecular, mechanical, and spatial perspectives.

      Weaknesses:

      (1) Insufficient narrative linkage between the vascular phase (Sections 2-6) and the in-tissue phase (Sections 7-10).

      (2) Underrepresentation of lymphocyte biology despite mention in early sections.

      (3) The MIKA macrophage identity framework is only loosely tied to ECM mechanisms.

      (4) Limited discussion of translational implications and therapeutic strategies.

      (5) Overly dense figure insets and underdeveloped links between ECM carryover and downstream immune phenotypes.

      (6) Acronyms and some mechanistic details may limit accessibility for a broader readership.

      We will add a transition paragraph between Section 6 and Section 7 to provide a narrative on how the extravasation processes affect downstream leukocyte functions. While lymphocytes follow a similar extravasation principle, their in-tissue activities differ from those of innate leukocytes. We will thus add a discussion of lymphocyte-ECM crosstalk to Section 8 and/or 9 in the revision. We will restructure Section 11 and Figure 3 to integrate them better with the rest of the review. In the current manuscript, we merely describe the capability of the MIKA framework to describe the identity of any tissue macrophage, such that the framework could serve as a roadmap to facilitate identity normalization of pathological macrophages. In the revision, we plan to use the MIKA framework to discuss and demonstrate the linkage between macrophage identities and the expression/production of modulators of the functional ECM effectors described in Sections 8-9. Regarding the comment on the limited discussion of translational implications and therapeutic strategies, we will try to enrich this aspect throughout the manuscript where appropriate, in addition to the existing passages (e.g., lines 293-297, 388-391, 460-463, and 512-517). We will also revise the figure structure in general to avoid overly dense information and to improve clarity. We will consider providing a glossary explaining specialized terms to broaden accessibility.

      Reviewer #3 (Public review):

      Summary & Strengths:

      This review by Yu-Tung Li sheds new light on the processes involved in leukocyte extravasation, with a focus on the interaction between leukocytes and the extracellular matrix. In doing so, it presents a fresh perspective on the topic of leukocyte extravasation, which has been extensively covered in numerous excellent reviews. Notably, the role of the extracellular matrix in leukocyte extravasation has received relatively little attention until recently, with a few exceptions, such as a study focusing on the central nervous system (J Inflamm 21, 53 (2024) doi.org/10.1186/s12950-024-00426-6) and another on transmigration hotspots (J Cell Sci (2025) 138 (11): jcs263862 doi.org/10.1242/jcs.263862). This review synthesizes the substantial knowledge accumulated over the past two decades in a novel and compelling manner.

      The author dedicates two sections to discussing the relevant barriers, namely, endothelial cell-cell junctions and the basement membrane. The following three paragraphs address how leukocytes interact with and transmigrate through endothelial junctions, the mechanisms supporting extravasation, and how minimal plasma leakage is achieved during this process. The subsequent question of whether the extravasation process affects leukocyte differentiation and properties is original and thought-provoking, having received limited consideration thus far. The consequences of the interaction between leukocytes and the extracellular matrix, particularly regarding efferocytosis, macrophage polarization, and the outcome of inflammation, are explored in the subsequent three chapters. The review concludes by examining tissue-specific states of macrophage identity.

      Weaknesses:

      Firstly, the first ten sections provide a comprehensive overview of the topic, presenting logical and well-formulated arguments that are easily accessible to a general audience. In stark contrast, the final section (Chapter 11) fails to connect coherently with the preceding review and is nearly incomprehensible without prior knowledge of the author's recent publication in Cell. Mol. Life Sci. CMLS 772 82, 14 (2024). This chapter requires significantly more background information for the general reader, including an introduction to the Macrophage Identity Kinetics Archive (MIKA), which is not even introduced in this review, its basis (meta-analysis of published scRNA-seq data), its significance (identification of major populations), and the reasons behind the revision of the proposed macrophage states and their further development. Secondly, while the attempt to integrate a vast amount of information into fewer figures is commendable, it results in figures that resemble a complex puzzle. The author may consider increasing the number of figures and providing additional, larger "zoom-in" panels, particularly for the topics of clot formation at transmigration hotspots and the interaction between ECM/ECM fragments and integrins. Specifically, the color coding (purple for leukocyte α6-integrins, blue for interacting laminins, also blue for EC α6 integrins, and red for interacting 5-1-1 laminins) is confusing, and the structures are small and difficult to recognize.

      We agree with and appreciate the specific and helpful suggestions by the reviewer. During the revision, we will provide the requested background description of MIKA to enhance accessibility for a general readership. As pointed out by other reviewers, since this part (Section 11) is less well integrated with the rest of the review, we will restructure it by linking tissue macrophage identities under the MIKA framework to the modulation of functional ECM effectors described in previous sections (Sections 8-9). We acknowledge that the current figure organization might be overly information-dense and will consider breaking down the contents into multiple figures. The size and color-coding issues will also be addressed.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This work aims to elucidate the molecular mechanisms affected in hypoxic conditions, causing reduced cortical interneuron migration. They use human assembloids as a migratory assay of subpallial interneurons into cortical organoids and show substantially reduced migration upon 24 hours of hypoxia. Bulk and scRNA-seq show adrenomedullin (ADM) up-regulation, as well as its receptor RAMP2, confirmed at the protein level. Adding ADM to the culture medium after hypoxic conditions rescues the migration deficits, even though the subtype of interneurons affected is not examined. However, the authors demonstrate very clearly that ineffective ADM does not rescue the phenotype, and blocking RAMP2 also interferes with the rescue. The authors are also applauded for using 4 different cell lines and using human fetal cortex slices as an independent method to explore the DLXi1/2GFP-labelled iPSC-derived interneuron migration in this substrate with and without ADM addition (after confirming that also in this system ADM is up-regulated). Finally, the authors demonstrate PKA-CREB signalling mediating the effect of ADM addition, which also leads to up-regulation of GABA receptors. Taken together, this is a very carefully done study on an important subject - how hypoxia affects cortical interneuron migration. In my view, the study is of great interest.

      Strengths:

      The strengths of the study are the novelty and the thorough work using several culture methods and 4 independent lines.

      Weaknesses:

      The main weakness is that other genes regulated upon hypoxia are not confirmed, such that readers will not know up to which fold change/stats cut-off the data are reliable.

      Reviewer #2 (Public review):

      Summary

      The manuscript by Puno and colleagues investigates the impact of hypoxia on cortical interneuron migration and downstream signaling pathways. They establish two models to test hypoxia, cortical forebrain assembloids, and primary human fetal brain tissue. Both of these models provide a robust assay for interneuron migration. In addition, they find that ADM signaling mediates the migration deficits and rescue using exogenous ADM.

      Strengths:

      The findings are novel and very interesting to the neurodevelopmental field, revealing new insights into how cortical interneurons migrate and establishing exciting models for future studies. The authors use sufficient iPSC lines including both XX and XY, so the analysis is robust. In addition, the RNAseq data with re-oxygenation is a nice control to see what genes are changed specifically due to hypoxia. Further, the overall level of validation of the sequencing data and involvement of ADM signaling is convincing, including the validation of ADM at the protein level. Overall, this is a very nice manuscript.

      Weaknesses:

      I have a few comments and suggestions for the authors. See below.

      Reviewer #3 (Public review):

      Summary:

      The authors aimed to test whether hypoxia disrupts the migration of human cortical interneurons, a process long suspected to underlie brain injury in preterm infants but previously inaccessible for direct study. Using human forebrain assembloids and ex vivo developing brain tissue, they visualized and quantified interneuron migration under hypoxic conditions, identified molecular components of the response, and explored the effect of pharmacological intervention (specifically ADM) on restoring the migration deficits.

      Strengths:

      The major strength of this study lies in its use of human forebrain assembloids and ex vivo prenatal brain tissue, which provide a direct system to study interneuron migration under hypoxic conditions. The authors combine multiple approaches: long-term live imaging to directly visualize interneuron migration, bulk and single-cell transcriptomics to identify hypoxia-induced molecular responses, pharmacological rescue experiments with ADM to establish therapeutic potential, and mechanistic assays implicating the cAMP/PKA/pCREB pathway and GABA receptor expression in mediating the effect. Together, this rigorous and multifaceted strategy convincingly demonstrates that hypoxia disrupts interneuron migration and that ADM can restore this defect through defined molecular mechanisms.

      Overall, the authors achieve their stated aims, and the results strongly support their conclusions. The work has a significant impact by providing the first direct evidence of hypoxia-induced interneuron migration deficits in the human context, while also nominating a candidate therapeutic avenue. Beyond the specific findings, the methodological platform - particularly the combination of assembloids and live imaging - will be broadly useful to the community for probing neurodevelopmental processes in health and disease.

      Weaknesses:

      The main weakness of the study lies in the extent to which forebrain assembloids recapitulate in vivo conditions, as the migration of interneurons from hSO to hCO does not fully reflect the native environment or migratory context of these cells. Nevertheless, this limitation is tempered by the fact that the work provides the first direct observation of human interneuron migration under hypoxia, representing a major advance for the field. In addition, while the transcriptomic analyses are valuable and highlight promising candidates, more in-depth exploration will be needed to fully elucidate the molecular mechanisms governing neuronal migration and maturation under hypoxic conditions.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) The authors should examine if all cortical interneurons are affected by ADM or only subtypes (Parvalbumin/Somatostatin).

      We thank the reviewer for raising this important question. In our study, we utilized the Dlx1/2b::eGFP reporter to broadly label cortical interneurons; however, this system does not distinguish specific interneuron subtypes. To address this, in the revised version of the manuscript we will use the single-cell RNA sequencing data and immunostainings to provide this information. Based on previous analyses from Birey et al. (Cell Stem Cell, 2022), we expect interneurons within assembloids to express mostly calbindin (CALB2) and somatostatin (SST) at this in vitro stage of development; the parvalbumin subtype appears later, based on data from Birey et al. (Nature, 2017) and more recently from Varela et al. (bioRxiv, 2025).

      In parallel, we will analyze available scRNA-seq data from developing human primary brain tissue of a similar age to that used in the manuscript, and check whether these subtypes of interneurons are similar to the ones within assembloids.

      (2) The authors should test more candidates from their bulk RNA-seq data with different fold changes for regulation after hypoxia, to allow the reader to judge at which cut-off the DEGs may be reproducible. This would make this database much more valuable for the field of hypoxia research.

      We appreciate the reviewer’s thoughtful suggestion. In addition to the bulk RNA-seq analysis, we did validate several upregulated hypoxia-responsive genes with varying fold changes by qPCR; these include PDK1, PFKP, and VEGFA (Figure S1).

      We do agree that an in-depth investigation of specific cut-offs would be interesting; however, this could be the focus of a different manuscript.

      Reviewer #2 (Recommendations for the authors):

      (1) Can the authors comment on the possibility of inflammatory response pathways being activated by hypoxia? Has this been shown before? While not the focus of the manuscript, it could be discussed in the Discussion as an interesting finding and potential involvement of other cells in the Hypoxic response.

      We thank the reviewer for this important comment about inflammation. Indeed, hypoxia has been shown to activate inflammatory response pathways. Various studies have found that HIF-1α can interact with NF-κB signaling, leading to the upregulation of pro-inflammatory cytokines such as IL-1β, IL-6, and TNF-α (Rius et al., Cell, 2008; Hagberg et al., Nat Rev Neurol, 2015).

      In our transcriptomics data (Figure 2D), and to the reviewer’s point, we identified enrichment of inflammatory signaling responses following the hypoxic exposure. Since hSOs at the time of analysis do contain astrocytes, we think these glia contribute to the observed pro-inflammatory changes. Based on these results, and because ADM is known to have strong anti-inflammatory properties, the effects of ADM on hypoxic astrocytes should be investigated in future studies focused on hypoxia-induced inflammation. In the revision, we will address this comment in the Discussion section and cite the appropriate papers.

      (2) Could the authors comment on the mechanism at play here with respect to ADM and binding to RAMP2 receptors - is this a potential autocrine loop, or is the source of ADM from other cell types besides inhibitory neurons? Given the scRNA-seq data, what cell-to-cell mechanisms can be at play? Since different cells express ADM, there could be different mechanisms in place in ventral vs dorsal areas.

      Based on our scRNA-seq data in hSOs showing significant upregulation of ADM expression in astrocytes and progenitors, we speculate that the primary mechanism is likely to involve paracrine interactions. However, we cannot exclude autocrine mechanisms with the included experiments. Dissecting these interactions in a cell-type specific manner could be an important focus for future ADM-related studies.

      To address the question about the possible different mechanisms in ventral versus dorsal areas, in the revision we will plot and include in the figures the data about the cell-type expression of ADM and its receptors in hCOs.

      (3) For data from Figure 6 - while the ELISA assays are informative to determine which pathways (PKA, AKT, ERK) are active, there is no positive control to indicate these assays are "working" - therefore, if possible, western blot analysis from assembloid tissue could be used (perhaps using the same lysates from Figure 3) as an alternative to validate changes at the protein level (however, this might prove difficult); further to this, is P-CREB activated at the protein level using WB?

      We thank the reviewer for this comment and the observation. Although we did not include a traditional positive control in these ELISA assays, several lines of evidence indicate that the measurements are reliable. First, the standard curves behaved as expected, and all sample values fell within the assay’s dynamic range. Second, technical replicates showed low variability, and the observed changes across experimental conditions (e.g., hypoxia vs. control) were consistent with the expected biological responses based on previous literature. We agree that including western blot validation would strengthen the findings, and we will note this for our future studies focused on CREB and ADM.

      (4) Could the authors comment further on the mechanism and what biological pathways and potential events are downstream of ADM binding to RAMP2 in inhibitory neurons? What functional impact would this have linked to the CREB pathway proposed? While the link to GABA receptors is proposed, CREB has many targets beyond this.

      We appreciate the reviewer’s insightful question. Currently, not much is known about the molecular pathways and downstream cellular events triggered by ADM binding to RAMP2 in inhibitory neurons, or in brain cells in general. Our data provide the first information about the cell-type-specific expression of ADM in baseline and hypoxic conditions, which is one of the key novelties of our study.

      While the signaling landscape of ADM in interneurons is largely unexplored, several studies in other (non-brain) cell types have demonstrated that ADM binding to RAMP2 can activate downstream cascades such as the cAMP/PKA/CREB pathway, PI3K/AKT, and ERK/MAPK, all of which are also known to be critical regulators of neuronal development and survival. These previously published data, along with our CREB-targeted findings in hypoxic interneurons, suggest that ADM–RAMP2 signaling could influence multiple aspects of interneuron biology, but these remain to be evaluated in future studies.

      We agree with the reviewer that CREB has a wide range of transcriptional targets. We decided to focus on GABA as a target of CREB for two main reasons: (i) GABA signaling has been previously shown to play an important role in the migration of cortical interneurons, and (ii) a previous study by Birey et al. (Cell Stem Cell, 2022) demonstrated that CREB pathway activity is essential for regulating interneuron migration in assembloid models of Timothy syndrome, thus providing further evidence that dysregulation of CREB activity disrupts migration dynamics.

      While our study provides a first step toward uncovering the mechanisms of interneuron migration protection by ADM, we fully acknowledge that future work will be needed to delineate the full spectrum of ADM–RAMP2 downstream signaling events in inhibitory neurons and other brain cells.

      (5) Does hypoxia cause any changes to inhibitory neurogenesis (earlier stages than migration?) - this might already be known, but was not discussed.

      We appreciate this question from the reviewer; however, this was not something that we focused on in this manuscript due to the already large amount of data included. A separate study focusing on neurogenesis defects and the molecular mechanisms of injury for that specific developmental process would be an important next step.

      (6) In the Discussion section, it might be worth detailing to the readers what the functional impact of delayed/reduced migration of inhibitory neurons into the cortex might result in, in terms of functional consequences for neural circuit development.

      We thank the Reviewer for the suggestion of detailing the functional impact of reduced inhibitory neuron migration. We will revise the manuscript by incorporating a paragraph about this in the Discussion section.

      Reviewer #3 (Recommendations for the authors):

      Most of the evidence presented is convincing in supporting the conclusions, and I have only minor suggestions for improvement:

      (1) The bulk RNA-seq was performed in hSOs only, which may not fully capture the phenotypes of migrating or migrated interneurons. It would be valuable, if feasible, to sort migrated cells from hSO-hCO assembloids and specifically examine their molecular mediators.

      We thank the reviewer for this suggestion. While it is likely that the cellular environment will have some influence on a subset of the molecular changes, based on all the data from the manuscript and our specific target, the RNA-sequencing on hCOs was sufficient to capture essential changes like ADM upregulation. An in-depth exploration of the differential responses of migrated versus non-migrated interneurons to hypoxia could be the focus of a different project.

      (2) In Figure 3, it is striking that cell-type heterogeneity dominates over hypoxia vs. control conditions. A joint embedding of hSO and hCO cells could provide further insight into molecular differences between migrated and non-migrated interneurons.

      We thank the reviewer for this observation and the opportunity to clarify. Because we manually separated the assembloids before the analyses, we processed these samples separately, which is why they separate in this way. In the revision, we will add data about the expression of ADM and its receptors in the hCOs.

      (3) It would be helpful to expand the discussion on how closely the migration observed in hSO-hCO assembloids reflects in vivo conditions, and what environmental aspects are absent from this model. This would better frame the interpretation and translational relevance of the findings.

      We thank the Reviewer for bringing up this important point. Although the assembloid model offers the unique advantage of allowing the direct investigation of migration patterns of hypoxic interneurons, we agree that it does not fully recapitulate the in vivo environment. While there are multiple aspects that cannot be recapitulated in vitro at this time (e.g., cellular complexity, vasculature, immune response), we are encouraged by the validation of our main findings in ex vivo developing human brain tissue, which strongly supports the validity of our findings for in vivo conditions.

      We will expand our discussion to include more details and the need to validate these findings using in vivo models, while also acknowledging that different species (e.g. rodents versus non-human primates versus humans) might have different responses to hypoxia.

      (4) The authors suggest that hypoxia is also associated with delayed interneuron maturation, yet the bulk RNA-seq data primarily reveal stress and hypoxia-related genes. A more detailed discussion of why genes linked to interneuron maturation and function were not strongly affected would clarify this point.

      We thank the Reviewer for the opportunity to clarify.

      The RNA-seq was performed during the acute stages of hypoxia/reoxygenation, and we think a maturation phenotype might be difficult to capture at this point and would require analysis at later in vitro assembloid maturation stages.

      Our speculation about a possible maturation defect is based on previous studies in developmental biology showing that failure of interneurons to reach their final cortical location within a specified developmental window impairs their integration within the neuronal network, and thus leads to maturation defects and possible elimination by apoptosis.

      Since preterm infants suffer from countless hypoxic events over multiple months, we suggest these repetitive events are likely to induce cumulative delays in migration, inability of interneurons to reach their target in time, followed by abnormal integration within the excitatory network, and eventual elimination of some of these interneurons through apoptosis. However, the direct demonstration of this effect following a hypoxic insult would require prolonged in vivo experiments in rodents to follow the migration, network integration and apoptosis of interneurons; to our knowledge this experimental design is not technically feasible at this time.

      (5) Relatedly, while the focus on interneuron migration is well justified, acknowledging how hypoxia might also impact other aspects of cortical development (e.g., progenitor proliferation, neuronal maturation, or circuit integration) would place the findings in a broader developmental framework and strengthen their relevance.

      We appreciate the Reviewer’s suggestion to discuss the role of hypoxia on other processes during cortical development. In the revised manuscript, we will include citations about the effects of hypoxia on interneuron proliferation, maturation and circuit integration as available, and also expand to other cell types known to be affected.

      (6) Very minor: in Figure S3C and D, it was not stated what the colors mean (grey: control, yellow: hypoxia)

      Thank you for pointing out this error; we will correct it in our revision.